Stable Diffusion
Stable Diffusion, open-source generative artificial intelligence (AI) diffusion model that generates images, videos, and animations from users’ text prompts. Developed by researchers at the Ludwig Maximilian University of Munich, Stable Diffusion was managed by the British company Stability AI before its public release in August 2022.
Deep learning
Deep-learning models consist of neural networks with four or more layers that can discover features in data without initial prompting. (A neural network is a machine-learning system, inspired by the human brain, that emulates the brain’s pattern-recognition skills.) Diffusion models, a type of deep-learning model, are designed to generate new data based on training data, which typically consists of image-word pairs. They are named for their resemblance to diffusion in physics, a process in which random molecular movement causes a net flow of matter from a region of high concentration to a region of low concentration. Diffusion models, however, are trained to apply diffusion in reverse. During training, a model adds “noise,” or random values (which appear as static in an image), until the original data is unrecognizable. The model must then “reverse” the noise in order to recover the original data. Learning to denoise gradually in this way enables the model to generate high-quality data.
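The forward half of this process, blending data with random noise, can be sketched in a few lines. The code below is a toy illustration only: the 4 x 4 array, the linear blending schedule, and the `add_noise` helper are all assumptions made for this sketch, not part of any real diffusion model, which uses carefully tuned noise schedules and far larger data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 4x4 array standing in for real training data.
image = np.ones((4, 4))

def add_noise(x, t, num_steps=10):
    """Blend the data with Gaussian noise. At t = 0 the signal is
    untouched; at t = num_steps it is pure noise (a simplified,
    linear schedule used only for illustration)."""
    alpha = 1.0 - t / num_steps          # fraction of signal kept
    noise = rng.standard_normal(x.shape)
    return alpha * x + (1.0 - alpha) * noise, noise

# Early step: mostly signal.  Late step: mostly static.
slightly_noisy, _ = add_noise(image, t=1)
very_noisy, _ = add_noise(image, t=9)
```

During training, a real model sees such noised samples at many steps and learns to predict (and thus remove) the added noise, which is the "reverse" direction described above.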
Stable Diffusion differs from many diffusion models in its speed. A program that runs the diffusion process directly in pixel space must work with every dimension of the image. For a 512 x 512 image with three color values (RGB) per pixel, that means more than 780,000 dimensions. Stable Diffusion instead uses a latent diffusion model: a variational autoencoder (VAE) compresses the image into the latent space, a lower-dimensional space that captures only the image’s essential features. The latent space has about one-fiftieth the dimensions of the standard image space, so the program requires far less computation time than a standard diffusion model. After the image is compressed, latent noise is added to the compressed representation. The noise is then removed, as in other diffusion models, and the image is restored to full quality in the final result.
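The arithmetic behind the "one-fiftieth" figure can be checked directly. The latent shape used below (64 x 64 with 4 channels) is the commonly cited figure for Stable Diffusion's latent grid and is treated here as an assumption for illustration.

```python
# Pixel space: a 512 x 512 image with 3 color values (RGB) per pixel.
pixel_dims = 512 * 512 * 3        # 786,432 dimensions

# Assumed latent grid: 64 x 64 spatial positions with 4 channels.
latent_dims = 64 * 64 * 4         # 16,384 dimensions

# How many times smaller the latent space is than the pixel space.
ratio = pixel_dims / latent_dims  # 48, i.e. roughly one-fiftieth
```

Running the diffusion process over roughly 16,000 dimensions instead of nearly 800,000 is what makes the latent approach so much cheaper.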
When prompted by a user to generate an image, video, or animation from a text prompt, Stable Diffusion executes the following process:
- Stable Diffusion translates the user’s text prompt into a text representation; that is, the words in the prompt are expressed as groups of numbers.
- The software then translates the text representation into an image representation, or vector, that correlates with the text prompt. Over a process of 50 to 100 steps, randomly generated noise is gradually removed from a latent image space until an image emerges that aligns with the image representation.
- Finally, Stable Diffusion uses the VAE’s decoder to refine its result to produce a high-resolution image in the pixel space. The resulting image, video, or animation is then revealed to the user.
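The three steps above can be sketched as a toy pipeline. Every component here is a stand-in made up for this sketch: `encode_text`, `predict_noise`, and `decode` only mimic the roles of the real text encoder, denoising network, and VAE decoder, and the shapes and the 50-step loop are illustrative assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def encode_text(prompt):
    """Stand-in for a real text encoder: maps each word to a fixed
    pseudo-random vector and averages them (purely illustrative)."""
    vecs = [np.random.default_rng(abs(hash(w)) % 2**32).standard_normal(8)
            for w in prompt.split()]
    return np.mean(vecs, axis=0)

def predict_noise(latent, text_vec, step):
    """Stand-in for the trained denoising network; a real model
    predicts the noise conditioned on the text embedding and step."""
    return 0.1 * latent  # toy rule: shrink the latent toward zero

def decode(latent):
    """Stand-in for the VAE decoder: upsample the latent back to
    'pixel' resolution by simple repetition."""
    return np.repeat(np.repeat(latent, 8, axis=0), 8, axis=1)

# Step 1: the prompt becomes a numeric representation.
text_vec = encode_text("a red fox")

# Step 2: start from pure latent noise and remove predicted noise
# over repeated steps (50 here, matching the low end cited above).
latent = rng.standard_normal((8, 8))
for step in range(50):
    latent = latent - predict_noise(latent, text_vec, step)

# Step 3: decode the refined latent back into pixel space.
image = decode(latent)
```

The real system's denoiser is a large trained network and its decoder reconstructs genuine image detail, but the overall flow (encode text, iteratively denoise a latent, then decode) follows this shape.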
Limits and availability
Stable Diffusion was the second major artificial intelligence text-to-image generator released to the public, following OpenAI’s DALL-E 2, which was made widely available in July 2022. Stability AI launched Stable Diffusion that August. DALL-E 2, Stable Diffusion, and the image generator Midjourney (named after the company that created it) all struggled to portray such small human features as hands, fingers, teeth, and earlobes. More prominent features, such as faces or body shapes, were generated more competently. This was generally attributed to a lack of training data consisting of clear images of hands. Research scientist Patrick Esser, who worked on Stable Diffusion’s core model, told the AI lab Runway that generative AI is capable of creating “really high quality outputs” but that its outputs would not be “100 percent consistent.”
As an open-source model, Stable Diffusion is free for research, noncommercial, and limited commercial uses by individuals or corporations with less than $1 million in annual revenue. After the October 2024 release of Stable Diffusion version 3.5, Stability AI encouraged individuals and businesses to distribute and monetize work created by Stable Diffusion. Commercial entities making more than $1 million annually can access Stable Diffusion through paid subscriptions.