Stable Diffusion
Stable Diffusion, open-source generative artificial intelligence (AI) diffusion model that generates images, videos, and animations from users’ text prompts. Developed by researchers at the Ludwig Maximilian University of Munich, Stable Diffusion was managed by the British company Stability AI before its public release in August 2022.
Deep learning
Deep-learning models consist of neural networks with four or more layers that can discover features in data without initial prompting. (A neural network is a machine-learning system, inspired by the human brain, that emulates the brain’s pattern-recognition skills.) Diffusion models, a type of deep-learning model, are designed to generate new data based on training data, which typically consists of image-word pairs. They are named for their resemblance to diffusion in physics, a process in which random molecular movement causes a net flow of matter from a region of high concentration to a region of low concentration. Diffusion models, however, are trained to apply diffusion in reverse. During training, a model adds “noise,” or random values (which appear as static in an image), until the original data is unrecognizable. The model must then “reverse” the noise in order to recover the original data. Learning to denoise gradually in this way enables the model to generate high-quality data.
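The forward half of this process, blending data with random noise, can be sketched in a few lines. The code below is a toy illustration only: the 4 x 4 array, the linear blending schedule, and the `add_noise` helper are all assumptions made for this sketch, not part of any real diffusion model, which uses carefully tuned noise schedules and far larger data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 4x4 array standing in for real training data.
image = np.ones((4, 4))

def add_noise(x, t, num_steps=10):
    """Blend the data with Gaussian noise. At t = 0 the signal is
    untouched; at t = num_steps it is pure noise (a simplified,
    linear schedule used only for illustration)."""
    alpha = 1.0 - t / num_steps          # fraction of signal kept
    noise = rng.standard_normal(x.shape)
    return alpha * x + (1.0 - alpha) * noise, noise

# Early step: mostly signal.  Late step: mostly static.
slightly_noisy, _ = add_noise(image, t=1)
very_noisy, _ = add_noise(image, t=9)
```

During training, a real model sees such noised samples at many steps and learns to predict (and thus remove) the added noise, which is the "reverse" direction described above.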
Stable Diffusion differs from many diffusion models in its speed. A program that runs the diffusion process directly in pixel space must work with every dimension of the image. For a 512 x 512 image with three color values (RGB) per pixel, that means more than 780,000 dimensions. Stable Diffusion instead uses a latent diffusion model: a variational autoencoder (VAE) compresses the image into the latent space, a lower-dimensional space that captures only the image’s essential features. The latent space has about one-fiftieth the dimensions of the standard image space, so the program requires far less computation time than a standard diffusion model. After the image is compressed, latent noise is added to the compressed representation. The noise is then removed, as in other diffusion models, and the image is restored to full quality in the final result.
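The arithmetic behind the "one-fiftieth" figure can be checked directly. The latent shape used below (64 x 64 with 4 channels) is the commonly cited figure for Stable Diffusion's latent grid and is treated here as an assumption for illustration.

```python
# Pixel space: a 512 x 512 image with 3 color values (RGB) per pixel.
pixel_dims = 512 * 512 * 3        # 786,432 dimensions

# Assumed latent grid: 64 x 64 spatial positions with 4 channels.
latent_dims = 64 * 64 * 4         # 16,384 dimensions

# How many times smaller the latent space is than the pixel space.
ratio = pixel_dims / latent_dims  # 48, i.e. roughly one-fiftieth
```

Running the diffusion process over roughly 16,000 dimensions instead of nearly 800,000 is what makes the latent approach so much cheaper.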
When prompted by a user to generate an image, video, or animation from a text prompt, Stable Diffusion executes the following process:
- Stable Diffusion translates the user’s text prompt into a text representation; that is, the words in the prompt are expressed as groups of numbers.
- The software then translates the text representation into an image representation, or vector, that correlates with the text prompt. Over a process of 50 to 100 steps, randomly generated noise is gradually removed from a latent image space until an image emerges that aligns with the image representation.
- Finally, Stable Diffusion uses the VAE’s decoder to refine its result to produce a high-resolution image in the pixel space. The resulting image, video, or animation is then revealed to the user.
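The three steps above can be sketched as a toy pipeline. Every component here is a stand-in made up for this sketch: `encode_text`, `predict_noise`, and `decode` only mimic the roles of the real text encoder, denoising network, and VAE decoder, and the shapes and the 50-step loop are illustrative assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def encode_text(prompt):
    """Stand-in for a real text encoder: maps each word to a fixed
    pseudo-random vector and averages them (purely illustrative)."""
    vecs = [np.random.default_rng(abs(hash(w)) % 2**32).standard_normal(8)
            for w in prompt.split()]
    return np.mean(vecs, axis=0)

def predict_noise(latent, text_vec, step):
    """Stand-in for the trained denoising network; a real model
    predicts the noise conditioned on the text embedding and step."""
    return 0.1 * latent  # toy rule: shrink the latent toward zero

def decode(latent):
    """Stand-in for the VAE decoder: upsample the latent back to
    'pixel' resolution by simple repetition."""
    return np.repeat(np.repeat(latent, 8, axis=0), 8, axis=1)

# Step 1: the prompt becomes a numeric representation.
text_vec = encode_text("a red fox")

# Step 2: start from pure latent noise and remove predicted noise
# over repeated steps (50 here, matching the low end cited above).
latent = rng.standard_normal((8, 8))
for step in range(50):
    latent = latent - predict_noise(latent, text_vec, step)

# Step 3: decode the refined latent back into pixel space.
image = decode(latent)
```

The real system's denoiser is a large trained network and its decoder reconstructs genuine image detail, but the overall flow (encode text, iteratively denoise a latent, then decode) follows this shape.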
Limits and availability
Stable Diffusion was the second major artificial intelligence text-to-image generator released to the public, following OpenAI’s DALL-E 2, which was made widely available in July 2022. Stability AI launched Stable Diffusion that August. DALL-E 2, Stable Diffusion, and the image generator Midjourney (named after the company that created it) all struggled to portray such small human features as hands, fingers, teeth, and earlobes. More prominent features, such as faces or body shapes, were generated more competently. This was generally attributed to a lack of training data consisting of clear images of hands. Research scientist Patrick Esser, who worked on Stable Diffusion’s core model, told the AI lab Runway that generative AI is capable of creating “really high quality outputs” but that its outputs would not be “100 percent consistent.”
As an open-source model, Stable Diffusion is free for research, noncommercial, and limited commercial uses by individuals or corporations with less than $1 million in annual revenue. After the October 2024 release of Stable Diffusion version 3.5, Stability AI encouraged individuals and businesses to distribute and monetize work created by Stable Diffusion. Commercial entities making more than $1 million annually can access Stable Diffusion through paid subscriptions.