As intrepid explorers of cutting-edge technology, we find ourselves perpetually scaling new peaks. Today, our focus is the Automatic1111 User Interface and the WebUI Forge User Interface. If you’ve dabbled in Stable Diffusion models and have your fingers on the pulse of AI art creation, chances are you’ve encountered these two popular Web UIs. Their power, myriad options, and tantalizing dropdown menus promise exhilarating new ways to create content.
But what does it all mean? What do these tabs, options, and sliders do? I empathize; I’ve been there too. That’s precisely why we’re going to dive deep into the WebUI Forge and Automatic1111 user interfaces (the two are very similar). From the subtle intricacies of the txt2img tab to the img2img section, not to mention the ever-useful Upscaler options, we’re dissecting them all, understanding their functions, and unlocking their potential.
This guide is tailor-made for novices like you. Whether you’re dipping your toes into Automatic1111 or you’re an intermediate user hungry for every last detail, this series is your compass. Now, let’s get to work and learn what all these buttons and sliders do.
What is the Automatic1111 Web UI?
Automatic1111 Web UI is a browser-based interface for Stable Diffusion, a powerful open-source AI art generator. It provides a user-friendly way to interact with Stable Diffusion models and create AI images. Here are some key features of the Automatic1111 Web UI:
To install the Web UI, check out this guide below:
What is WebUI Forge?
Stable Diffusion WebUI Forge is a platform built on top of Stable Diffusion WebUI, which is based on the Gradio framework. The name “Forge” draws inspiration from “Minecraft Forge.” This project aims to enhance the functionality and efficiency of the original Stable Diffusion WebUI by making development easier, optimizing resource management, and speeding up inference.
Key Features of Stable Diffusion WebUI Forge:
Speed Improvements:
- The GPU memory peak (as seen in the task manager) will decrease by about 700MB to 1.3GB.
- The maximum diffusion resolution (that won’t cause out-of-memory errors) will increase by approximately 2x to 3x.
- The maximum diffusion batch size (that won’t cause out-of-memory errors) will increase by about 4x to 6x.
- For less powerful GPUs (e.g., 6GB VRAM), the speed-up is even more significant.
- Even powerful GPUs (e.g., 4090 with 24GB VRAM) benefit from some improvements, albeit to a lesser extent.
How to Install Automatic1111 Web UI for Stable Diffusion
Installing the Automatic1111 Web UI for Stable Diffusion requires a solid groundwork. If you’ve been following our guide series, you’ve likely laid down this essential foundation. This tutorial builds upon the preparatory steps detailed in our previous blog.
Installing the WebUI Forge for Stable Diffusion requires a solid groundwork. If you’ve been following our guide series, you’ve likely laid down this essential foundation. This tutorial builds upon the preparatory steps detailed in our previous blog so that you can learn how to Install WebUI Forge for Stable Diffusion.
In summary, Stable Diffusion WebUI Forge provides a faster and more efficient experience for image synthesis and manipulation, especially for low-VRAM GPUs. To install the Web UI, check out this guide below:
Mastering the Automatic1111 User Interface & WebUI Forge: A Comprehensive Guide for Stable Diffusion
WEBUI Forge & Automatic1111 User Interface
The Forge and Automatic1111 user interface serves as the foundation for your AI art creation. Within this interface, you’ll find an array of tabs and options. While they might appear daunting initially, take the time to explore them: each tab has a distinct purpose, all aimed at simplifying your workflow.
Let’s look at the main tabs within the interface and their fundamental purposes. Here’s an overview of the General section in both Automatic1111 and WebUI Forge: Txt2img serves as the foundational area, encompassing essential sliders and parameters. Many of these settings are also present in the img2img tab. The Extras tab provides additional features and functionalities, such as upscaling existing images, while the PNG Info section reads the generation settings stored in PNG files.
For handling checkpoint merging tasks, there’s the Checkpoint Merger tab. The Train section pertains to training processes, and the Settings tab allows customization and configuration. Finally, the Extensions tab expands capabilities through extensions. These basic tabs streamline your experience, whether you’re working with text-to-image or image-to-image transformations. 🚀🎨
Download Model to Follow Along
Prompt (press Ctrl+Enter or Alt+Enter to generate) |
beautiful lady, freckles, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, expired polaroid |
Negative Prompt (press Ctrl+Enter or Alt+Enter to generate) |
(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art)++++, (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name)+, (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur)++, (3D ,3D Game, 3D Game Scene, 3D Character), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities)++ |
Sampling Method: DPM++ 2M Karras | Seed: 2810691589 |
Sampling Steps: 10 | CFG Scale: 7 |
Width: 768 | Height: 768 |
Generation
txt2img
img2img
Extras
PNG Info
Checkpoint Merger
Train
Settings
Extensions
Stable Diffusion Checkpoint
- The Stable Diffusion Checkpoint dropdown allows you to select the specific checkpoint or model you want to use. It’s an important setting that determines the starting point for your experiments.
- When working with Stable Diffusion, you can choose from various pre-trained checkpoints or even fine-tune your own models.
How to Install Stable Diffusion Models for Automatic1111 Web UI
Learn how to install Stable Diffusion Models for AUTOMATIC1111 Web UI. Access powerful art generators with convenient web interfaces.
LoRA (Add Network to Prompt)
- LoRA stands for Low-Rank Adaptation. Think of it as a smaller, more lightweight model compared to the full-sized ones.
- It’s designed to be easier to train and is well suited to specific tasks. You can use a LoRA on top of a trained checkpoint to enhance performance or explore different creative possibilities.
- To enable LoRA, make sure you turn it on in the settings.
- Go to Settings > User Interface > [info] Quicksettings list > add sd_lora
Clip Skip
- CLIP Skip is a feature in Stable Diffusion that allows users to skip layers of the CLIP model when generating images. Let me break it down for you:
- CLIP Model: The CLIP model is a neural network trained on a massive dataset of paired text and images. It learns to match images to text descriptions, and Stable Diffusion uses its text encoder to interpret your prompt.
- Embedding Process: When using Stable Diffusion, the CLIP embedding process is an important step. It takes your text prompt and converts it into a numerical representation that the model can understand.
- CLIP Skip: This feature lets you skip some layers of the CLIP embedding process. Imagine the CLIP model as having multiple layers, each becoming more specific than the last. For example, if layer 1 represents “Person,” layer 2 could be “male” and “female.” Going further, layer 3 might include specific terms like “Man,” “boy,” “lad,” “father,” and “grandpa.” The CLIP Skip option allows you to stop at a specific layer, effectively controlling the depth of the embedding process.
- Example: If you set CLIP Skip to 2, you’ll stop at the 11th layer, one short of the last (assuming the CLIP model has 12 layers). A setting of 1 uses all 12 layers. Essentially, you’re stopping just before the deepest level of specificity. [Source 1] [Source 2]
- To enable Clip Skip, make sure you turn it on in the settings.
- Go to Settings > User Interface > [info] Quicksettings list > add CLIP_stop_at_last_layers
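The layer arithmetic behind CLIP Skip can be sketched in a few lines of Python. This is only an illustration, assuming a 12-layer CLIP text encoder and the convention that a setting of 1 skips nothing; the helper name `clip_layer_used` is made up for this example and is not part of either Web UI.

```python
def clip_layer_used(clip_skip: int, total_layers: int = 12) -> int:
    """Return the 1-based index of the last CLIP layer whose output is used.

    Hypothetical helper for illustration only. A CLIP Skip of 1 means
    "skip nothing" (use the final layer); 2 means stop one layer early.
    """
    if not 1 <= clip_skip <= total_layers:
        raise ValueError("clip_skip must be between 1 and total_layers")
    return total_layers - clip_skip + 1


for skip in (1, 2, 3):
    print(f"CLIP Skip {skip} -> output taken from layer {clip_layer_used(skip)}")
```

Higher values stop earlier, so the prompt is interpreted in a broader, less specific way.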
SD VAE
Stable Diffusion (SD) incorporates a technique called Variational Autoencoder (VAE) to enhance the quality of AI-generated images. Let’s learn more:
- What is VAE?
- VAE stands for Variational Autoencoder. It’s a neural network component that encodes images into, and decodes them from, a smaller latent space. This compression allows for faster computation.
- In the context of Stable Diffusion, VAEs play a crucial role in improving image quality by recovering fine details.
- Do I Need a VAE?
- By default, Stable Diffusion models (whether v1, v2, or custom) come with a built-in VAE.
- However, when people refer to downloading and using a VAE, they mean using an improved version of it. These improved versions result from further fine-tuning the VAE part of the model with additional data.
- Instead of releasing an entirely new model, only the updated VAE portion is provided as a smaller file.
- Effect of Using VAE:
- The impact of using VAE is usually subtle but significant.
- An improved VAE enhances image decoding from the latent space, leading to better recovery of fine details.
- It particularly benefits rendering features like eyes and text, where fine details matter.
- Stability AI’s Fine-Tuned VAEs:
- Stability AI has released two variants of fine-tuned VAE decoders:
- EMA (Exponential Moving Average): Produces sharper images.
- MSE (Mean Square Error): Yields somewhat smoother outputs.
- Both versions improve image quality without any degradation.
- Comparison (256×256 images):
- Original VAE: Used in Stable Diffusion.
- ft-EMA VAE: Slightly better overall.
- ft-MSE VAE: Emphasizes MSE reconstruction.
- Example Comparison (512×512 images):
- Rendering eyes improves, especially for small faces.
- Text rendering improvements are less pronounced.
- Neither EMA nor MSE performs worse; they either do better or maintain the same quality.
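To make the “smaller latent space” idea concrete, here is a rough Python sketch of the size reduction. It assumes the usual Stable Diffusion 1.x layout, where the VAE shrinks each side of the image by a factor of 8 and stores 4 latent channels; `latent_shape` is a hypothetical helper written for this guide, not a Web UI function.

```python
def latent_shape(width: int, height: int, downscale: int = 8, channels: int = 4):
    """Return (channels, latent_height, latent_width) for an image size.

    Assumes an 8x spatial downscale and 4 latent channels (SD 1.x style).
    """
    if width % downscale or height % downscale:
        raise ValueError("width and height must be multiples of the downscale factor")
    return (channels, height // downscale, width // downscale)


c, h, w = latent_shape(512, 512)
pixel_values = 512 * 512 * 3   # RGB values in the full-size image
latent_values = c * h * w      # values the diffusion process actually works on
print((c, h, w), "compression factor:", pixel_values // latent_values)
```

Under these assumptions the sampler works on far fewer values than the final image contains, and the VAE decoder is what turns that compact result back into pixels. That is why a better decoder shows up mostly in fine details.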
Download VAE Here.
- Download the required file and place it in the stable-diffusion-webui/models/VAE/ directory.
- Navigate to the Settings tab in the A1111 Web UI, select Stable Diffusion from the left-side menu, click on SD VAE, and then choose ‘vae-ft-mse-840000-ema-pruned’.
- Hit the ‘Apply Settings’ button and wait for the message confirming that the settings were applied.
- Proceed to generate your image in the usual manner, using any Stable Diffusion model in either the txt2img or img2img options.
- To enable SD VAE, make sure you turn it on in the settings.
- Go to Settings > User Interface > [info] Quicksettings list > add sd_vae
Generation Settings
Prompt
Prompt Styles
Sampling Method
Sampling Steps
Hires.fix
Width/Height
Batch count
Batch size
CFG Scale
Seed
Script
Below is the WebUI Forge User Interface. A key difference between Forge and Automatic1111 is that Forge comes pre-installed with SVD and Z123. SVD stands for Stable Video Diffusion, which allows you to turn still images into animated videos. Z123 stands for Stable Zero123, an AI-powered model for generating novel views of 3D objects with improved quality. Released for non-commercial and research purposes, it uses an improved dataset and elevation conditioning for higher-quality predictions.
txt2img (Text to Image)
This tab, short for Text-to-Image, stands as the cornerstone of the Automatic1111 user interface. It’s the favored tool of a new generation of creators called Prompt Engineers. The concept is straightforward yet powerful: you describe an image using words in the prompt box, and the underlying Stable Diffusion algorithm does its best to materialize your textual description into a tangible image. This feature has truly revolutionized digital art creation, bridging the gap between imagination and visualization.
Prompt Engineering
Prompt Engineering is a comprehensive discipline within artificial intelligence (AI) that involves the systematic design, refinement, and optimization of prompts. It plays an important role in guiding AI models to produce specific outputs. Essentially, a prompt serves as the starting point for an AI model to generate a response. It can be as simple as a few words or as complex as an entire paragraph.
Here’s a breakdown of the key aspects related to prompt engineering:
Anatomy of a Good Prompt:
Prompt (press Ctrl+Enter or Alt+Enter to generate) |
The Prompt box is where you write your textual description of the image you wish to generate. The Stable Diffusion algorithm takes this prompt as input and, using its complex learning and pattern recognition capabilities, attempts to create an image that corresponds to your description. |
A good prompt needs to be detailed and specific. When creating a prompt, consider various keyword categories:
- Subject: Clearly define what you want to see in the image or output.
- Medium: Specify the material or style used for the artwork (e.g., digital art, oil painting).
- Style: Describe the desired artistic style.
- Resolution: Indicate the desired image resolution.
- Additional Details: Include any relevant specifics, such as color, lighting, and context.
Use these keywords as a checklist to guide your prompt creation. The more precise your prompt, the better the AI model’s understanding of your intent.
Prompt Analysis:
Subject: Lion man with muscular build.
Medium: Digital Concept Art
Style: Art by Jim Lee
Resolution: 8K
Additional Details: Ambient Lighting
You can see the difference when you use one of these resolution-type keywords. It doesn’t literally increase the resolution, but the model associates such keywords with more detailed images. With newer models these keywords are often unnecessary, but you can still experiment with them.
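The checklist above maps naturally onto a small helper that assembles a prompt from its categories. The sketch below is only an illustration in Python; `build_prompt` is a made-up name, and joining the keywords with commas is simply the convention used in the example prompts in this guide.

```python
def build_prompt(subject, medium="", style="", resolution="", details=()):
    """Join the keyword categories into one comma-separated prompt string.

    Hypothetical helper: empty categories are simply skipped.
    """
    parts = [subject, medium, style, resolution, *details]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="lion man with muscular build",
    medium="digital concept art",
    style="art by Jim Lee",
    resolution="8K",
    details=["ambient lighting"],
)
print(prompt)
```

Keeping the categories separate like this makes it easy to swap one element, such as the style, while leaving the rest of the prompt unchanged.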
Negative Prompts:
Negative Prompt (press Ctrl+Enter or Alt+Enter to generate) |
The Negative Prompt box is a complementary tool that refines your image generation process. Here, you can input what you specifically don’t want in your image. This allows for a higher degree of control and precision over the output, helping you to avoid unwanted elements in your generated image. |
Negative prompts steer the AI model away from certain outcomes.
- Instead of specifying what you want, you indicate what you don’t want. These are essential for newer AI models (v2) to improve image quality and alignment with expectations.
- For example, if you want to avoid certain attributes (e.g., “ugly,” “deformed”), use negative prompts to guide the model away from those aspects.
Prompt: beautiful lady, freckles, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, expired polaroid | Prompt: beautiful lady, freckles, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, expired polaroid |
Negative Prompts: None | Negative Prompts: (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art) |
Prompt engineering doesn’t require coding experience; creativity and persistence are key. As AI models continue to evolve, exploring different prompt strategies will enhance your results. Many people use LLMs like ChatGPT to write their prompts.
Learn how to prompt in my guides below:
Learn the ins and outs of Stable Diffusion Prompt Weights for Automatic1111. I’ll be sharing my findings, breaking down complex concepts into easy-to-understand language, and providing practical examples along the way.
Words of Power: The Essential Cheat Sheet for Super Prompts Techniques
Mastering the art of prompting can be a game-changer. This comprehensive guide serves as a toolkit to enhance your image prompting skills. Leveraging power words, super prompts, and strategically curated techniques, this handbook provides a foundational blueprint for creating prompts that yield phenomenal results. Some of these prompts may already be embedded in your AI…
What are Tokens for?
In the context of Stable Diffusion, a token serves as a fundamental unit of text input that you can feed to the model. When constructing prompts, you’ll notice a number like “0/75” displayed at the far end of your prompt bar. This represents the standard maximum limit of tokens you can use in a single prompt. If you exceed this limit, the model cleverly divides your prompt into smaller pieces, each consisting of 75 tokens or fewer. These chunks are then processed independently and later combined to form the complete output.
For instance, if you have a 120-token prompt, it gets split into a 75-token chunk and a 45-token chunk, which are processed separately and then merged.
Tokens and the 0/75 Indicator
In the context of Stable Diffusion, a ‘token’ is essentially a unit of text input you can feed to the model. The number “0/75” displayed at the far end of your prompt bar shows how many tokens you have used out of the standard limit of 75 for a single chunk of your prompt.
Tokens Beyond 75 – Infinite Prompt Length
But what if you have more to say? That’s where ‘Infinite Prompt Length’ comes in. If you type more than 75 tokens, the model cleverly divides your prompt into smaller pieces, each consisting of 75 tokens or fewer, and processes each chunk independently. So, if you have a 120-token prompt, it gets divided into a 75-token chunk and a 45-token chunk, which are processed separately and then combined. This allows you to provide more complex instructions to the model.
The ‘BREAK’ Keyword
If you want to start a new chunk of tokens before reaching the 75-token limit, you can use the ‘BREAK’ keyword. This command will fill the remainder of the current chunk with ‘empty’ tokens, and any text you type afterwards will start a new chunk. This gives you more control over how your input is processed. And there you have it! A simplified overview of tokens, infinite prompt length, and the BREAK keyword in Stable Diffusion.
Note: A token is not the same as a character. Common words usually map to a single token, longer or rarer words may be split into several, and punctuation marks such as commas count as tokens too. Spaces do not count. The counter in the prompt bar shows the actual total.
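The chunking rule is easy to check with a little arithmetic. The Python sketch below works on token counts only, since the real tokenization is done by the CLIP tokenizer inside the Web UI. Both function names are hypothetical, and the second one assumes that each BREAK-separated segment always starts a fresh chunk.

```python
CHUNK_SIZE = 75  # token limit per chunk, as shown by the "0/75" counter


def chunk_sizes(token_count: int, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Split a token count into chunk lengths of at most chunk_size."""
    full, rest = divmod(token_count, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def chunk_sizes_with_break(segment_token_counts: list[int]) -> list[int]:
    """Chunk each BREAK-separated segment on its own.

    The returned numbers are the real tokens per chunk; the remainder of
    each chunk would be filled with padding.
    """
    sizes = []
    for count in segment_token_counts:
        sizes.extend(chunk_sizes(count))
    return sizes


print(chunk_sizes(120))                  # one long prompt
print(chunk_sizes(80))                   # 80 tokens with no BREAK
print(chunk_sizes_with_break([30, 50]))  # the same 80 tokens split by BREAK
```

The last two lines show why BREAK is useful: without it, the split falls wherever the 75th token happens to be, while with it you decide which words stay together.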
Prompt Style
The Prompt Style section in Automatic1111/Forge provides tools for managing and customizing prompt styles. Here’s a breakdown of its components:
Buttons and Dropdown Menu:
- Four buttons and a dropdown menu constitute this section.
- These elements allow you to create presets or upload CSV files containing attributes related to image styles.
Button Functions:
Click on the paintbrush icon to the right and the menu below will appear. Here, you type in your information and save your presets. This saves time as you gain experience and build up a collection of styles.
- Blue Button (“Read generation parameters from prompt or last generation if prompt is empty”):
- This button extracts generation parameters from the prompt or the last generated image (if the prompt is empty) and populates them in the user interface.
- Trash Bin Icon (“Clears the prompts in the prompt and negative prompt box”):
- Clicking this icon clears the content in the prompt and negative prompt boxes.
- Paint Brush Icon (“Edit Style”):
- Clicking this icon opens a menu where you can edit and save custom styles.
- Dropdown Menu:
- The dropdown menu allows you to select from the available prompt styles for your creative process.
Sampling Method
Sampling methods play an important role in how Stable Diffusion, as run through AUTOMATIC1111, turns noise into a clean image. Let’s take a look at the details:
Examples of Sampling Methods
1. What is Sampling?
In the context of Stable Diffusion, sampling refers to the process of generating a sequence of cleaner and cleaner images from a noisy starting point.
Here’s how it works:
- Initially, a completely random image is generated in the latent space.
- A noise predictor estimates the noise present in the image.
- The predicted noise is subtracted from the image.
- This denoising process is repeated multiple times (usually a dozen times) until a clean image is obtained.
The method used for this denoising process is called the sampler or sampling method.
2. Noise Schedule:
The noise schedule controls the noise level at each sampling step.
Noise is highest at the first step and gradually reduces to zero at the last step.
Increasing the number of sampling steps results in smaller noise reduction between each step, which helps reduce truncation errors.
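As a rough illustration of this idea, the Python sketch below builds a deliberately simplified schedule in which the noise level falls in equal increments. That linear shape is an assumption made for readability: the schedules used by actual samplers (Karras, exponential, and so on) space their levels differently. The function name is hypothetical.

```python
def linear_noise_schedule(steps: int, max_noise: float = 1.0) -> list[float]:
    """Noise level before each step, plus the final level of zero.

    Simplified assumption: noise falls in equal increments. Actual
    schedules (Karras, exponential, ...) space the levels differently.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [max_noise * (steps - i) / steps for i in range(steps + 1)]


for steps in (10, 20):
    schedule = linear_noise_schedule(steps)
    drop_per_step = schedule[0] - schedule[1]
    print(steps, "steps: start", schedule[0], "end", schedule[-1],
          "drop per step", round(drop_per_step, 3))
```

Doubling the number of steps halves the amount of noise removed at each one, which is the “smaller noise reduction between each step” described above.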
3. Samplers Overview:
AUTOMATIC1111/WebUI Forge offers a variety of samplers, each with its own characteristics.
- Let’s explore some of them:
While these samplers are essential for Stable Diffusion, they are just one part of the entire model. If you want to learn more, I recommend reading my guide on Sampling Methods for Stable Diffusion. Below is a list of sampling methods available in both Automatic1111 and WebUI Forge.
Sampling Methods in Automatic1111 & WebUI Forge
DPM++ 2M Karras | DPM++ SDE Karras | DPM++ 2M SDE Exponential | DPM++ 2M SDE Karras | Euler a |
Euler | LMS | Heun | DPM2 | DPM2 a |
DPM++ 2S a | DPM++ 2M | DPM++ SDE | DPM++ 2M SDE | DPM++ 2M SDE Heun |
DPM++ 2M SDE Heun Karras | DPM++ 2M SDE Heun Exponential | DPM++ 3M SDE | DPM++ 3M SDE Karras | DPM++ 3M SDE Exponential |
DPM fast | DPM adaptive | LMS Karras | DPM2 Karras | DPM2 a Karras |
DPM++ 2S a Karras | Restart | DDIM | PLMS | UniPC |
Sampling Methods in WebUI Forge
DDPM | DDPM Karras | DPM++ 2M SDE SGMUniform | DPM++ 2M SDE Turbo | DPM++ 2M SGMUniform |
DPM++ 2M Turbo | Euler A SGMUniform | Euler A Turbo | Euler SGMUniform | LCM |
LCM Karras |
To learn more about all the available sampling methods used for Stable Diffusion, please check out the guide below.
Sampling Step
In Stable Diffusion, a Sampling Step refers to one iteration of the denoising process used to generate an image from random noise. The model starts with a noisy image and progressively refines it through a series of steps, each time reducing the noise and adding detail to the image. The number of sampling steps can affect the quality and clarity of the final image; more steps generally mean more detail at the cost of longer processing time.
Different samplers may have optimal ranges for the number of steps to produce the best results. For instance, some users find that most images stabilize around 30 steps, while others may increase the steps to 64 or even higher if they’re seeking more coherency or correcting issues like distorted limbs.
The choice of sampler and the number of steps used can significantly influence the outcome, and users often experiment to find the best combination for their specific needs. There are various samplers available, such as Euler, Heun, and DDIM, each with its own characteristics and settings. Each sampler and model performs differently at different step counts. This is something you will need to experiment with yourself, or you can follow the recommended settings for the models you use.
I’m working on a guide on how to save Prompt Styles:
Hires.fix
HiRes.fix in Automatic1111 is a powerful feature that allows you to upscale your generated images while they’re being created. Keep reading to learn the details:
Upscaling During Generation:
- Normally, you’d generate an image first and then upscale it using img2img or other options in Automatic1111.
- However, HiRes.fix streamlines this process by increasing the resolution during image generation itself.
- This approach helps avoid the artifacts and irregularities that upscaling can introduce.
- Additionally, it enhances the overall image quality.
Setting the Target Resolution:
- When you open HiRes.fix, you’ll notice that it’s initially set to ‘Upscale by 2’.
- This means that the default width and height (512×512) will be doubled, resulting in a 1024×1024 image.
- You can further upscale by a maximum factor of 4x.
- Be aware that the more you upscale, the longer it will take to generate the image.
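The target resolution is simply the base size multiplied by the ‘Upscale by’ factor. Here is a minimal Python sketch of that arithmetic; `hires_target` is a hypothetical helper, and truncating fractional results to whole pixels is an assumption of this example.

```python
def hires_target(width: int, height: int, upscale_by: float = 2.0) -> tuple[int, int]:
    """Return the final (width, height) after upscaling by the given factor."""
    if not 1.0 <= upscale_by <= 4.0:
        raise ValueError("upscale_by is expected to be between 1 and 4")
    return (int(width * upscale_by), int(height * upscale_by))


base = (512, 512)
target = hires_target(*base)  # default 'Upscale by 2'
pixel_ratio = (target[0] * target[1]) / (base[0] * base[1])
print(target, "pixels to render:", pixel_ratio, "x the base image")
print(hires_target(512, 768, 1.5))
```

Note that doubling each side quadruples the number of pixels, which is a large part of why higher factors take noticeably longer.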
Choosing an Upscaler:
- HiRes.fix allows you to select from various upscalers available in a dropdown menu.
- Each upscaler has unique characteristics, so experimentation is key.
- Here are some options:
- Latent Upscaler: A good general-purpose choice.
- R-ESRGAN 4x+: Ideal for photorealistic images.
- R-ESRGAN 4x+ Anime6B: Works well for animated/cartoon-style images.
HiRes Steps: Post-Sampling Brilliance:
- HiRes steps refine image quality after the initial sampling.
- These steps occur after the sampling steps and contribute to the overall image quality.
- You can set HiRes steps in the range of 0–150.
- Keeping it at 0 makes HiRes steps equal to the sampling steps.
- For example, if you have 20 sampling steps and 0 HiRes steps, the total steps would be 40.
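The step counting described above can be written out as a small Python function. It is a sketch of the arithmetic in this section only, with a made-up name, and it ignores other settings (such as denoising strength) that can influence how much work the second pass really does.

```python
def total_steps(sampling_steps: int, hires_steps: int = 0) -> int:
    """Sum of first-pass and second-pass steps.

    A hires_steps value of 0 is treated as "reuse the sampling step count",
    following the rule described above.
    """
    second_pass = hires_steps if hires_steps > 0 else sampling_steps
    return sampling_steps + second_pass


print(total_steps(20, 0))   # HiRes steps left at 0
print(total_steps(20, 10))  # HiRes steps set explicitly
```

Setting HiRes steps explicitly to a lower number than the sampling steps is therefore a simple way to shorten generation time.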
In summary, HiRes.fix simplifies the process of creating high-resolution Stable Diffusion images directly during generation, resulting in better quality and smoother workflows.
Learn more about upscaling below:
ControlNet Upscale: Learn How to Tile and Upscale with Ultimate SD Upscale
Learn how to upscale Stable Diffusion art with ControlNet Upscale. Ever tried to transform your favorite small image into a massive printable billboard, only to be met with unnatural smoothness, loss of details, or quirky artifacts?
Width/Height
The width and height parameters in Automatic1111 and WebUI Forge do what their names suggest, but a few details are worth knowing.
Width and Height Parameters:
- In Stable Diffusion, the Width and Height parameters determine the size of the generated images.
- By default, Stable Diffusion produces images with dimensions of 512×512 pixels.
- You can customize these dimensions to create rectangular images with specific aspect ratios.
- Keep in mind that both the width and height should be multiples of 8 to avoid distortions.
- Aspect ratios can differ. Common dimensions are 512×768 or vice versa. This depends on the models you’re using.
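If you work out a size by hand, two small checks help: rounding each side to a multiple of 8 and confirming the resulting aspect ratio. The Python helpers below are hypothetical and exist only for this illustration. The Web UI sliders already move in fixed increments, so this matters mostly when you type numbers in directly.

```python
from fractions import Fraction


def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round value to the nearest multiple, rounding halves up."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def aspect_ratio(width: int, height: int) -> str:
    """Return the reduced width:height ratio, e.g. '2:3'."""
    ratio = Fraction(width, height)
    return f"{ratio.numerator}:{ratio.denominator}"


width, height = snap_to_multiple(500), snap_to_multiple(770)
print(width, height)
print(aspect_ratio(512, 768))
```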
Model Training and Image Size:
- Versions like Stable Diffusion 1.5 were trained on 512×512 images. They work well with 768 pixel images as well.
- However, newer models like SDXL 1.0, Stable Cascade, and Stable Diffusion 3.0 are trained on 1024×1024 images.
- When generating images using these newer models, sticking to an aspect ratio they were trained on (e.g., 1:1 or 4:3) tends to yield the best results.
- For example, if you choose a width of 1024 and a height of 1024, you’ll maintain the same 1:1 aspect ratio as the training data.
Automatic1111 & WebUI Forge:
- The Height/Width parameters in Automatic1111 allow you to specify the desired image size.
- To achieve op
- Width: 512 pixels
- Height: 1024 pixels
Remember that adjusting the image size affects both computational requirements and output quality. Sticking to the aspect ratio used during training ensures consistent and visually pleasing results in Stable Diffusion.
Batch Sizes and Counts
In Stable Diffusion, both batch size and batch count play important roles in generating images. Let’s learn what these parameters mean:
Batch Size:
- Batch size determines the number of images generated in a single batch.
- When you set the batch size, you’re essentially specifying how many images will be processed together in parallel.
- Larger batch sizes require more VRAM (graphics card memory) because each image in the batch needs to be processed simultaneously.
- If your batch size is too high, your graphics card might run out of memory, resulting in an “out-of-memory” error during image generation.
- Generally, 8GB of VRAM is sufficient for Stable Diffusion unless you’re generating very large batches of images.
- Increasing the batch size can improve efficiency, but it also increases VRAM usage and processing time.
- Remember that VRAM is separate from your main system RAM and is essential for handling complex AI models.
- Setting a higher batch size means more parallel image processing, but it also requires more VRAM.
Batch Count:
- Batch count specifies the number of batches of images you want to generate.
- In the Stable Diffusion WebUI, the batch size can go up to 8 images, while the batch count can go much higher (up to 100).
- The total number of images generated is determined by multiplying the batch size by the batch count.
- For example, if you set the batch count to 10 and the batch size to 1, Stable Diffusion will generate 10 images from the same text prompt.
- Adjusting the batch count allows you to control the overall number of image generations.
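The multiplication is trivial, but writing it down makes the trade-off easy to see. This Python sketch uses a hypothetical helper name, and the example values are illustrative rather than recommendations.

```python
def total_images(batch_size: int, batch_count: int) -> int:
    """Images produced in one run: images per batch times number of batches."""
    if batch_size < 1 or batch_count < 1:
        raise ValueError("batch_size and batch_count must be at least 1")
    return batch_size * batch_count


# Same total, different VRAM demands: only batch_size images are held
# in memory at once, so the first option is gentler on the GPU.
print(total_images(batch_size=1, batch_count=8))
print(total_images(batch_size=4, batch_count=2))
print(total_images(batch_size=1, batch_count=10))
```

If you hit out-of-memory errors, lowering the batch size and raising the batch count keeps the total the same while reducing peak memory use.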
In summary, batch size affects parallel processing and VRAM usage, while batch count determines how many batches are run. Together they set the total number of images produced. Balancing these settings ensures efficient and successful image generation in Stable Diffusion.
CFG Scale | Classifier Free Guidance
In Stable Diffusion, the CFG Scale, which stands for Classifier Free Guidance scale, is a setting that influences how closely the generated image adheres to the text prompt you provide. A higher CFG Scale value means the output will more strictly align with the input prompt or image, potentially at the cost of introducing distortions. Conversely, a lower CFG Scale value may result in higher quality images that are less faithful to the prompt.
Purpose:
- The CFG Scale controls the guidance provided to Stable Diffusion during image generation.
- It plays a crucial role in both text-to-image (txt2img) and image-to-image (img2img) processes.
Functionality:
- When you adjust the CFG Scale, you influence how closely the generated image aligns with your text prompt.
- Higher CFG values result in images that closely match the prompt, while lower values allow for more creative deviations.
- Striking the right balance is essential: too high, and the image may overcomplicate; too low, and it might stray from your intent.
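Under the hood, classifier-free guidance is usually described with one short formula: the model makes a noise prediction without the prompt and another with it, and the CFG Scale decides how far to push from the first toward (and past) the second. The Python sketch below applies that formula to toy lists of numbers. Real predictions are large tensors, and the function name `apply_cfg` is made up for this example.

```python
def apply_cfg(uncond: list[float], cond: list[float], cfg_scale: float) -> list[float]:
    """Combine unconditional and prompt-conditioned noise predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction that ignores the prompt
cond = [1.0, 1.0]    # toy prediction that follows the prompt

for scale in (0.0, 1.0, 7.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1 the result is just the prompt-conditioned prediction. Larger values exaggerate whatever the prompt changes, which fits the observation that very high settings can produce harsh or distorted images.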
Experimentation:
- The ideal CFG Scale varies based on factors like the model and samplers you’re using and your desired outcome.
- Some models prefer higher CFG for consistency, while others yield abstract results with lower CFG.
- Experiment with different values to find what works best for your creative vision.
Remember, the CFG Scale acts as a balancing tool, shaping the interplay between prompt guidance and artistic exploration!
Imagine CFG as a sliding scale that controls your guide’s attentiveness to your instructions. The default position, at a CFG value of 7, offers a balance, granting Stable Diffusion enough liberty to interpret your prompt creatively, while also ensuring it doesn’t stray too far off course. If you notch it down to 1, you’re essentially giving your guide free rein. Cranking it up above 15, however, is like you’re micromanaging the guide to stick strictly to your instructions.
While the Stable Diffusion Web UI limits the CFG scale between 1 and 30, there are no such limits if you’re working via a Terminal. You can push it all the way up to 999, or even enter negative territory!
CFG Scale Balancing
It’s a balancing act to find the right CFG Scale value that provides the best combination of fidelity to the prompt and image quality.Users often experiment with different values to find the sweet spot for their particular project.
Color Saturation and Contrast:
- As you adjust the CFG value, you’ll notice changes in color saturation and contrast within the generated images.
- Higher CFG values often result in more vivid colors and pronounced contrasts.
Detail and Artifacts:
- However, there’s a delicate balance to strike.
- Pushing the CFG value too high can lead to unintended consequences:
- Loss of Detail: Images may lose fine details, becoming less intricate.
- Blurrier Output: High CFG values might introduce blurriness.
- Artifacts: Unwanted artifacts may appear, affecting image quality.
Scenario Example:
- Consider using a DPM++ Karras sampler with 20 Sampling Steps (the default Stable Diffusion Web UI settings).
- Experiment with CFG values to find the sweet spot that aligns with your creative vision.
Prompt: redhead woman running on a beach CFG Scale
Sampling Steps: 20
The CFG Scale, ranging from 1 to 30, demonstrates noticeable differences at each level. However, these variations are also influenced by other factors and settings. Does that mean you’re stuck with blurry images if you want to stick to higher CFG values? Not quite! You can counterbalance this by:
Increasing Sampling Steps:
Adding more sampling steps typically adds more detail to the output image, but be mindful of processing times, which could increase as well.
Switching Sampling Methods:
Some sampling methods perform better at specific CFG and sampling steps. For instance, UniPC tends to deliver good results at a CFG as low as 3, while DPM++ SDE Karras excels at providing detailed images at CFG values greater than 7.
To squeeze the best image quality from Stable Diffusion without blowing up memory and processing times, it’s essential to strike the right balance between CFG, sampling steps, and the sampling method. The XYZ Plots technique is a handy tool to neatly display all your results in an organized grid. Be sure to check out my dedicated blog post on mastering XYZ Plots, linked below.
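To get a feel for how quickly a comparison grid grows, here is a small sketch that lists every combination of CFG, sampling steps, and sampler you might want to test. The function `build_grid` and its dictionary keys are hypothetical names for illustration; the built-in XYZ Plot script does this bookkeeping for you.

```python
from itertools import product


def build_grid(cfg_values, step_values, samplers):
    """Return one settings dict per cell of a CFG x steps x sampler grid."""
    return [
        {"sampler": sampler, "steps": steps, "cfg_scale": cfg}
        for sampler, steps, cfg in product(samplers, step_values, cfg_values)
    ]


grid = build_grid(
    cfg_values=[3, 7, 12],
    step_values=[20, 40],
    samplers=["UniPC", "DPM++ SDE Karras"],
)
print(len(grid), "images to generate")
for cell in grid[:3]:
    print(cell)
```

Three CFG values, two step counts, and two samplers already mean twelve images, so it pays to keep each axis short.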
Hires Fix
Upscaler
- Latent
- Latent (antialiased)
- Latent (bicubic)
- Latent (bicubic antialiased)
- Latent (nearest)
- Latent (nearest-exact)
- None
- Lanczos
- Nearest
- DAT x2, x3, x4
- ESRGAN_4x
- LDSR
- R-ESRGAN 4x+
- R-ESRGAN 4x+ Anime6B
- ScuNET GAN
- ScuNET PSNR
Hires steps
When you enable Hires.fix in Automatic1111, it reveals a set of sliders that let you fine-tune how your image is upscaled. The Hires steps slider controls a second sampling pass that runs after the initial sampling and contributes significantly to the overall image quality.
- What Are Hires Steps?
- Hires steps refine image quality after the initial sampling process.
- They occur after the sampling steps, during the pass that works on the upscaled image.
- The total number of steps in the process includes both the sampling steps and the Hires steps.
- You can set the Hires steps in the range of 0 to 150.
- If you keep the Hires steps at 0, the second pass uses the same number of steps as the initial sampling.
- Why Are Hires Steps Important?
- Hires steps allow for further enhancement of image quality.
- They help refine the denoised image obtained from the sampling process.
- By adjusting the number of Hires steps, you can control the trade-off between image quality and computational resources.
- Note that the impact of Hires steps on image generation diminishes beyond a certain point.
Denoising Strength
- The denoising strength determines how much the Hires pass is allowed to change the upscaled image.
- A value of 0 leaves the image unchanged, while a value of 1 tends to alter it significantly.
- By default, Automatic1111 sets the denoising strength to 0.7.
Upscale by
- The “Upscale By” slider controls the factor by which the image is upscaled during generation.
- By default, Automatic1111 sets the width and height at 512 x 512 pixels.
- When you open Hires.fix, it’s initially set to ‘Upscale by 2’, resulting in a 1024 x 1024 image.
- You can upscale it by a maximum factor of4x.
- Keep in mind that the more you upscale, the longer it will take to generate your image.
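The arithmetic behind the slider is easy to check yourself. Below is a minimal sketch; `hires_target_size` is a hypothetical helper, and the 4x ceiling simply mirrors the slider limit described above.

```python
def hires_target_size(width, height, upscale_by):
    """Return the final (width, height) after applying the 'Upscale by' factor."""
    if not 0 < upscale_by <= 4:
        raise ValueError("upscale_by is expected to be greater than 0 and at most 4")
    return int(width * upscale_by), int(height * upscale_by)


base = (512, 512)
for factor in (1.5, 2, 4):
    w, h = hires_target_size(*base, factor)
    pixel_ratio = (w * h) / (base[0] * base[1])
    print(f"x{factor}: {w} x {h} ({pixel_ratio:.2f}x the pixels)")
```

Doubling the factor quadruples the pixel count, which is a big part of why higher factors take noticeably longer.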
Resize width/Height to
- This setting allows you to specify the target resolution for your image.
- You can either:
- Set the width and height individually (not recommended).
- Use the ‘Upscale By’ slider to achieve the desired resolution.
- Experiment with different options to find the ideal balance between quality and processing time.
- Remember that resizing during generation, rather than afterwards, helps avoid introducing artifacts or irregularities in the image.
HiRes Checkpoint/Sampling Method
In Automatic1111 version 1.8, two new options have been introduced in the Hires Fix section: Hires Checkpoint and Hires Sampling Method. These options provide additional flexibility when upscaling and fixing images.
- Hires Checkpoint: This feature allows you to select a different checkpoint for the upscaling process. It’s particularly useful if you want to introduce a distinct style to your image during upscaling. You can choose a checkpoint that aligns better with your desired outcome.
- Hires Sampling Method: The sampling method complements different models. Experimenting with various combinations of checkpoints and sampling methods will help you find the optimal approach for upscaling your images. Additionally, you have the option to prompt the model for specific checkpoints and samplers, providing fine-grained control over the upscaling process.
In summary, Hires.fix provides a way to upscale your images during the generation process, improving quality and avoiding post-generation artifacts. Play around with these settings to create stellar images in Stable Diffusion using Automatic1111! 🌟🖼️
Finding the right balance between denoising and upscaling is essential for achieving optimal results in AUTOMATIC1111. Experiment with different settings to achieve the desired output! 🌟
Employing the Hires.fix function effectively can help avoid issues such as twinning and loss of composition in your upscaled images. “Twinning,” in this context, refers to the unwanted duplication or multiplication of features in your creations. For instance, this might result in characters with two faces or two heads, which can be visually disruptive unless intentionally desired.
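If you prefer scripting to clicking, the same Hires settings can be sent to the Web UI's optional HTTP API (available when the UI is started with the `--api` flag). The sketch below only builds the request body; the field names follow my understanding of the `/sdapi/v1/txt2img` endpoint, so treat them as assumptions and confirm them against the `/docs` page of your own install. `build_hires_payload` is a hypothetical helper.

```python
import json


def build_hires_payload(prompt, seed=-1):
    """Assemble a txt2img request body with Hires.fix switched on."""
    return {
        "prompt": prompt,
        "seed": seed,                   # -1 asks for a random seed
        "steps": 20,                    # initial sampling steps
        "cfg_scale": 7,
        "width": 512,
        "height": 512,
        "enable_hr": True,              # turn on Hires.fix
        "hr_scale": 2,                  # the 'Upscale by' factor
        "hr_upscaler": "Latent",        # any name from the Upscaler dropdown
        "hr_second_pass_steps": 0,      # 0 reuses the initial step count
        "denoising_strength": 0.7,
    }


payload = build_hires_payload("redhead woman running on a beach")
print(json.dumps(payload, indent=2))
```

You would POST this JSON to your local Web UI address; the sending step is left out here so the sketch runs without a server.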
Restore Faces Button – How does restore faces work in Stable Diffusion?
Restore Faces in Automatic1111
The Restore Faces feature in Automatic1111 is designed to enhance and restore facial features in images and videos. Specifically, it focuses on improving the appearance of faces by applying advanced restoration techniques. When enabled, it can significantly enhance the quality of facial details, reduce imperfections, and create more visually appealing results.
Face Restoration Models
- CodeFormer:
- This model utilizes a deep learning architecture to restore faces. It is trained on a large dataset of facial images and can handle various lighting conditions, poses, and expressions.
- The goal of CodeFormer is to produce realistic and high-quality results by learning from diverse face images.
- GFPGAN:
- GFPGAN is another face restoration model that uses generative adversarial networks (GANs).
- It enhances facial features by learning from a wide range of face images, resulting in improved visual quality.
How to Turn On Restore Faces
To enable the Restore Faces feature in Automatic1111, follow these steps:
- Open Automatic1111/WebUI Forge.
- Navigate to the Settings section.
- Look for the Postprocessing settings on the left sidebar.
- Under Face Restoration, you’ll find the Restore Faces option.
- Click on the Restore Faces button to enable it.
- Click Apply settings, then Reload UI.
CodeFormer Weight Slider
- The weight slider allows you to control the intensity of the face restoration effect. Note that the scale is inverted: 0 gives the maximum effect and 1 the minimum. By adjusting the slider:
- Moving it towards 0 increases the restoration effect.
- Moving it towards 1 reduces the effect, preserving more of the original facial features.
Move Face Restoration Model from VRAM After Processing
- When this option is enabled, the face restoration model is moved from VRAM to RAM once face restoration (e.g., applying enhancements to an image or video frame) has finished.
- Here’s how it works:
- During Processing:
- The model resides in VRAM for fast access by the GPU during active processing.
- After Processing:
- Once the processing is complete, the model is moved from VRAM to RAM.
- By moving the model to RAM:
- It frees up VRAM for other tasks.
- It keeps the model loaded in system memory, so it is available for future frames or subsequent processing steps without being reloaded from disk.
Sometimes you might be better off without it.
In some instances, leaving the ‘Restore Face’ function off could yield better results. On the flip side, if you notice an image where a face appears incomplete or unrefined, that might be a perfect scenario to engage the ‘Restore Face’ feature. In the end, it’s all about experimenting and finding the balance that works best for your creative process. Or you can use After Detailer below for better face detailing.
ADetailer 101: A Detection and Inpainting Tool for Image Quality Improvement in Stable Diffusion
Discover the power of ADetailer, a web-UI extension for image enhancement. Fix distorted faces, change backgrounds, add or remove objects, and more!
Understanding the Seed
Seeds are numbers that define the starting point of your image generation in Automatic1111, a GUI for Stable Diffusion. They control the content of the image, so changing the seed will result in a different image, even if you keep all the other settings the same. Seeds can be set to any integer value, or -1 for a random seed.
You can think of seeds as coordinates in a very large and complex map of possible images. Each seed corresponds to a unique location on this map, and each location has a different image. When you use Automatic1111, you are exploring this map by moving from one location to another, using your text prompt and other settings as guidance.
The seed value is important because it determines what kind of images you will see when you use Automatic1111. For example, if you use the same prompt but different seeds, you will get different variations of the same concept. If you use the same seed but different prompts, you will get different interpretations of the same image. If you use the same seed and the same prompt, you will get the same image every time.
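The "same seed, same image" behaviour comes from seeded random number generation. Here is a toy sketch using Python's standard library; the real Web UI draws its noise with its own tensor-based generators, so `starting_noise` is only a hypothetical stand-in for the idea.

```python
import random


def starting_noise(seed, size=4):
    """Return a short list of Gaussian noise values derived from the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


first = starting_noise(1234)
again = starting_noise(1234)
other = starting_noise(1235)

print(first == again)  # the same seed reproduces the same noise
print(first == other)  # a different seed gives different noise
```

Because generation starts from that noise, repeating the seed with identical settings retraces the same path, while changing it by even one sends you somewhere else on the map.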
You can find the seed used to generate an image in Automatic1111 by looking at the output window, the filename, or the PNG info tab. You can also change the seed value manually by typing it in the seed box, or clicking the random button to generate a new random seed. By changing the seed value, you can explore different images that match your prompt and settings.
Additional Seed Tools: Dice, Recycle and ‘Extra’
How to use the Seed in Automatic1111/Forge
The Seed is a number that defines the starting point of your image generation in Automatic1111, a GUI for Stable Diffusion. It controls the content of the image, so changing the seed will result in a different image, even if you keep all the other settings the same. Seeds can be set to any integer value, or -1 for a random seed.
Seed icons
Next to the Seed box, there are a few icons that you should know about:
- The dice icon: This sets your Seed to -1, which means Automatic1111 will choose a random Seed for each image.
- The recycle icon: This copies the Seed from your previous image.
Extra Seed menu
If you want to have more control over the Seed, you can click on the ‘Extra’ option. This will show you the Extra Seed menu with some advanced settings:
- Variation seed: This is another Seed that you can use to modify your image.
- Variation strength: This is a slider that lets you adjust how much of the Variation seed you want to apply to your image. A value of 0 means you only use the original Seed, while a value of 1 means you only use the Variation seed.
- Resize seed from width/height: This is a checkbox that you should enable when you change the size of your image. Otherwise, changing the width or height will also change the content of your image, even if you keep the same Seed. By checking this option, you preserve the content of your image when resizing.
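To picture what Variation strength does, here is a sketch that blends the noise from the main Seed with the noise from the Variation seed. It uses spherical interpolation, which is my understanding of how the Web UI mixes the two; treat that as an assumption. The function `slerp` and the two-value noise vectors are illustrative only.

```python
import math


def slerp(t, a, b):
    """Spherically interpolate from vector a (t=0) to vector b (t=1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    dot = sum((x / norm_a) * (y / norm_b) for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 0.9995:
        # Nearly parallel vectors: plain linear blending is stable enough.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    omega = math.acos(dot)
    weight_a = math.sin((1 - t) * omega) / math.sin(omega)
    weight_b = math.sin(t * omega) / math.sin(omega)
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


seed_noise = [1.0, 0.0]       # stand-in for noise from the main Seed
variation_noise = [0.0, 1.0]  # stand-in for noise from the Variation seed

for strength in (0.0, 0.5, 1.0):
    print(strength, [round(v, 3) for v in slerp(strength, seed_noise, variation_noise)])
```

A strength of 0 returns the original noise untouched, 1 returns the variation noise, and values in between slide smoothly from one to the other.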
Finding and changing the Seed
You can find the Seed used to generate an image in Automatic1111 by looking at the output window, the filename, or the PNG info tab. You can also change the Seed value manually by typing it in the Seed box, or clicking the random button to generate a new random Seed. By changing the Seed value, you can explore different images that match your prompt and settings.
How to use the ‘Script’ in Automatic1111
The ‘Script’ is a feature that allows you to automate the image generation process in Automatic1111, a GUI for Stable Diffusion. It lets you run custom scripts that can modify the settings, prompts, seeds, and outputs of your images. Scripts are written in Python and are loaded from the Web UI’s scripts folder. Scripts can also be shared and installed from online sources.
Script options
Next to the Script box, there is a dropdown menu that shows you the available scripts that you can use. By default, you have four options:
- None: This means you don’t use any script, and you generate images manually by clicking the buttons.
- Prompt Matrix: This is a built-in script that generates a grid of images based on different combinations of prompt parts. You separate the parts with the | character: the first part is always used, and every combination of the remaining parts is added to it. For example, if you type cat|dog|bird, you will get four images: cat; cat, dog; cat, bird; and cat, dog, bird.
- Prompts from file or textbox: This is a built-in script that generates images based on a list of prompts that you provide in an uploaded file or in the script’s text box. Each prompt should be on a separate line. A line can also carry options such as --negative_prompt to set a negative prompt for that line. For example, you could keep your prompts in a file named animals.txt, one per line.
- X/Y/Z plot: This allows you to generate variations of images based on settings for each parameter within Automatic1111, such as seed, scale, prompt weights, denoising strength, etc. You can also use range expressions such as 1-5 to define the values.
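To see which prompts a Prompt Matrix run would produce, here is a minimal sketch. It assumes the `|` separator syntax of the built-in script, where the first part is always kept and the remaining parts are switched on and off in every combination; `prompt_matrix` is a hypothetical helper, not the script's own code.

```python
def prompt_matrix(text):
    """Expand 'base|option1|option2' into every combination of the options."""
    base, *options = [part.strip() for part in text.split("|")]
    prompts = []
    for combo in range(2 ** len(options)):
        # Bit n of combo decides whether option n is included.
        chosen = [opt for n, opt in enumerate(options) if combo & (1 << n)]
        prompts.append(", ".join([base] + chosen))
    return prompts


for prompt in prompt_matrix("a busy city street|illustration|cinematic lighting"):
    print(prompt)
```

Every extra option doubles the number of images, so two options give four prompts and three give eight.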
Maximize Your Workflow Efficiency: Pre-Planning with Stable Diffusion’s XYZ Plot Image Grid
Successful AI creations hinge not only on talent, but also on the effective use of sophisticated tools. Among them, the XYZ Plot of Stable Diffusion shines bright. This tool streamlines your creative process, offering a tangible roadmap for your AI visual generation. How do you use XYZ plot in Stable Diffusion? The XYZ plot, found…
IMG2IMG (Image to Image)
The Img2Img tab in Automatic1111 is a feature that allows you to generate new images from an input image and a text prompt. The output image preserves the color and composition of the input image, but modifies it according to the text prompt. For example, you can use the Img2Img tab to turn a simple sketch into a realistic painting, or to add details to an existing image. Under the hood, img2img uses the same Stable Diffusion model as txt2img, but it starts from your input image instead of from pure noise.
You can learn more about img2img in the guide below. It is a deep topic and deserves a guide of its own. 👇
Inpainting 101: How to Inpaint Anything in Stable Diffusion using Automatic1111
Learn about Stable Diffusion Inpainting in Automatic1111! Explore the unique features, tools, and techniques for flawless image editing and content replacement.
PNG Info
PNG info is a feature in Automatic1111 that allows you to view and edit the text information stored in PNG files. This information can include the prompt, the negative prompt, and other generation parameters used to create the image with Stable Diffusion. You can also send the image and the parameters to other tabs in Automatic1111, such as img2img, inpainting, or upscaling, to modify or enhance the image. PNG info can help you understand how an image was generated, recreate it, or use it as a starting point for your own creations.
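You can also read this information outside the Web UI. The sketch below walks through a PNG's chunks with only the standard library and collects plain `tEXt` entries; Automatic1111 stores its generation settings under the `parameters` keyword. The demo bytes are hand-built for illustration rather than a real image, and text stored in other chunk types (such as `iTXt`) is not handled here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(png_bytes):
    """Return a dict of keyword -> text for every tEXt chunk in the PNG data."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


def make_chunk(chunk_type, data):
    """Build one PNG chunk (used here only to create demo data)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


demo = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"parameters\x00a cat\nSteps: 20, CFG scale: 7, Seed: 42")
    + make_chunk(b"IEND", b"")
)
print(read_text_chunks(demo)["parameters"])
```

For a real file, you would pass the result of `open(path, "rb").read()` to `read_text_chunks`.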
Extras Tab for Upscaling Your Images
The Extras tab in Automatic1111 is a section with features that allow you to enhance and customize your images using various options. You can use this tab to upscale your images by choosing the scale factor or the target size. You can also use CodeFormer, a robust face restoration algorithm, to improve the quality and fidelity of faces in your images. Additionally, you can split oversized images into smaller pieces, crop them automatically based on focal points or size, create flipped copies for data augmentation, and generate captions using Deepbooru, a deep learning-based tag estimation system.
The Extras tab is useful for creating high-resolution, realistic, and diverse images for your projects. I will explore this further in the future, but for now, check out my guide on upscaling using the features in Extras.
ControlNet Upscale: Learn How to Tile and Upscale with Ultimate SD Upscale
Learn how to upscale Stable Diffusion art with ControlNet Upscale. Ever tried to transform your favorite small image into a massive printable billboard, only to be met with unnatural smoothness, loss of details, or quirky artifacts?
This means that you will ignore some of the details of the cat, such as its fur pattern or eye color, and focus more on its general features, such as its shape or size. This might give you a more abstract or creative image of a cat, but it might also be less realistic or accurate.
Checkpoint Merger
Checkpoint Merger is a complex but intriguing aspect of this toolset, requiring its own dedicated tutorial for complete understanding. In essence, this feature allows you to amalgamate different fine-tuned models, generating novel ones. It can save the merged model in checkpoint or Safetensors format, and it even supports the integration of VAEs.
- Be sure to revisit for a comprehensive guide on how to effectively utilize the Checkpoint Merger in the near future.
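Until that guide arrives, here is a tiny sketch of the simplest kind of merge, a weighted sum, where each weight of the new model is A * (1 - M) + B * M. Real checkpoints hold millions of values in tensors; the dictionaries, the layer name, and the `weighted_sum` helper below are toy stand-ins for illustration.

```python
def weighted_sum(model_a, model_b, multiplier):
    """Merge two models key by key as A * (1 - M) + B * M."""
    return {
        name: [
            (1 - multiplier) * a + multiplier * b
            for a, b in zip(model_a[name], model_b[name])
        ]
        for name in model_a
    }


# Hypothetical two-value "layers" standing in for real weight tensors.
model_a = {"layer.weight": [0.0, 2.0]}
model_b = {"layer.weight": [2.0, 4.0]}

for m in (0.0, 0.5, 1.0):
    print(m, weighted_sum(model_a, model_b, m))
```

A multiplier of 0 keeps model A as it is, 1 gives you model B, and 0.5 lands halfway between the two.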
Learn to Train Your Own AI Art Models
The Train tab, a gateway to personalized AI art, enables you to train your own model. This complex area, equipped with tools like Hypernetwork, Textual Inversion, and Dreambooth, demands its own focused tutorial, which we’ll be providing in due time.
Training your model paves the way for a personalized approach to AI art, replacing the hit-and-miss method with precision. By using machine learning to customize your style and coupling it with other tools like Runway ML Gen 2 and ControlNet, you can unlock a plethora of creation possibilities, ranging from comics to full-blown films. The beauty of Stable Diffusion is directly proportional to your dedication.
Despite the brilliant video generation capabilities of tools like RunwayML, Stable Diffusion Automatic1111, equipped with extensions and free of charge, brings similar functionality right to your fingertips. With the right hardware, you can let your creativity run wild.
For a guide on training your own diffusion models, check out this link.
- Related – ‘A Guide to Training Your Personalized Stable Diffusion Models’ (Coming Soon)
Stable Diffusion: Captioning for Training Data Sets
This captions and data sets guide is intended for those who seek to deepen their knowledge of Captioning for Training Data Sets in Stable Diffusion. It will assist you in preparing and structuring your captions for training datasets.
Extensions – Uncovered Features in Automatic1111
The Extension tab in AUTOMATIC1111’s Stable Diffusion Web UI is where you can manage and install extensions created by the community. These extensions enhance the functionality of the web interface and provide additional features beyond the default capabilities. You can learn how to install extensions below:
How to Install Automatic1111 Extensions for Stable Diffusion
Enhance your Stable Diffusion experience with Automatic1111 Extensions. Follow our comprehensive guide to install Automatic1111 extensions.
GitHub Automatic1111 v1.8.0 update information
How to Install SDXL 1.0 for Automatic1111: A Step-by-Step Guide
Welcome to this step-by-step guide on How to install SDXL 1.0 for Automatic1111. This blog post aims to streamline the installation process for you, so you can quickly utilize the power of this cutting-edge image generation model released by Stability AI.
Official Automatic1111 Web UI guide found here.
Learning Stable Diffusion 101:
- How to Install Stable Diffusion Web UI
- Mastering the Automatic1111 User Interface: A Comprehensive Guide for Beginners
- Words of Power: The Essential Cheat Sheet for Super Prompts Techniques
- Maximize Your Workflow Efficiency: Pre-Planning with Stable Diffusion’s XYZ Plot