Generative Adversarial Networks (GANs)
HL Only. This entire page covers GANs, which are HL-only content in the case study. SL students are not expected to know GANs and can skip this page.
Booklet section: Generative adversarial networks (GANs). Syllabus link: A4 (machine learning). GANs are one of the two HL-only model families, alongside hybrid models.
A generative adversarial network (GAN) generates images in a completely different way from a diffusion model. Instead of denoising, a GAN sets two neural networks against each other and lets them improve through competition. Understanding that competition, and why it is hard to control, is the heart of HL challenge 3.
Table of Contents
- Two networks: generator and discriminator
- The adversarial dynamic
- The D-dimensional noise vector
- Why GANs are hard to train
- Strengths, and when a GAN makes sense
- Quick check
- Practice exercises
- Connections
Two networks: generator and discriminator
A GAN has two parts with opposite jobs.
- The generator creates synthetic images. It takes a random input and transforms it into a candidate image, trying to make something realistic enough to pass as genuine.
- The discriminator judges images. It is shown a mix of real images (from the training dataset) and fake images (from the generator), and it has to decide which is which.
The two are trained together. The generator is rewarded when it fools the discriminator; the discriminator is rewarded when it catches a fake. Each one’s success pushes the other to improve.
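The opposing rewards can be made concrete with a small sketch. It assumes the discriminator outputs a probability between 0 and 1 that an image is real, and that both networks are scored with binary cross-entropy. That is a common choice, but the booklet does not specify one, and the function names here are illustrative only.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Loss for the discriminator (lower is better).

    p_real: probability the discriminator assigns to "real" for a real image.
    p_fake: probability the discriminator assigns to "real" for a fake image.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Loss for the generator (lower is better): small when the fake is judged real."""
    return -math.log(p_fake)

# Case 1: the discriminator catches the fake (it rates the fake as 10% likely real).
caught = (discriminator_loss(0.9, 0.1), generator_loss(0.1))

# Case 2: the discriminator is fooled (it rates the fake as 90% likely real).
fooled = (discriminator_loss(0.9, 0.9), generator_loss(0.9))
```

In the first case the discriminator's loss is low and the generator's is high; in the second the situation reverses. One network's gain is the other's loss, which is what "adversarial" means here.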
The adversarial dynamic
This competitive push-and-pull is the adversarial dynamic, and it is what drives a GAN to produce sharp, high-quality images. As the discriminator gets better at spotting fakes, the generator is forced to make more convincing images; as the generator improves, the discriminator has to get sharper still. In effect the generator is always trying to deceive a judge that keeps getting harder to fool.
flowchart LR
NV["D-dimensional noise vector"] --> G["Generator"]
G --> F["Fake images"]
R["Real images (training data)"] --> Dsc["Discriminator"]
F --> Dsc
Dsc --> P["Predicted labels: real or fake"]
P -. feedback .-> G
P -. feedback .-> Dsc
The GAN training loop. The generator turns a noise vector into fake images; the discriminator compares them with real images and predicts real or fake; both networks learn from the result. (Original diagram, drawn for this page.)
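The loop in the diagram can be sketched as a toy program. This is a minimal illustration under heavy assumptions, not how a production GAN is built: the "images" are single numbers, the real data is assumed to cluster around 4.0, the generator is a straight line G(z) = a*z + b, and the discriminator is a single sigmoid unit. All names and settings (`train_toy_gan`, the learning rate, the batch size) are hypothetical.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(steps, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D 'images'."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: G(z) = a*z + b
    w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Discriminator step: see real and fake samples, learn to tell them apart.
        reals = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fakes = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in reals:                      # loss term: -log D(x)
            err = -(1.0 - sigmoid(w * x + c))
            grad_w += err * x
            grad_c += err
        for x in fakes:                      # loss term: -log(1 - D(x))
            err = sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: the feedback reaches it through the discriminator.
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = a * z + b
            # loss: -log D(G(z)); its derivative with respect to x is -(1 - D(x)) * w
            dx = -(1.0 - sigmoid(w * x + c)) * w
            grad_a += dx * z
            grad_b += dx
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
    return {"a": a, "b": b, "w": w, "c": c}

params = train_toy_gan(steps=1)
```

After a single step the discriminator's weight `w` turns positive (larger values look more "real" to it) and the generator's offset `b` starts moving toward the real data. Over longer runs the parameters may keep chasing each other rather than settle, which hints at the training difficulties covered later on this page.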
The D-dimensional noise vector
The generator does not start from a blank input. It starts from a D-dimensional noise vector, a random vector with D values. This vector is the seed: the generator learns to map it to an image. Because the vector is random, sampling a new one gives a different image, which is how a GAN produces variety. The size D is a design choice: a larger D gives the generator more independent values to vary, and so more room to represent different outputs. The mapping from vector to image is learned, not human-readable, so you cannot simply read off what a given vector “means.”
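A short sketch shows the idea of "different vector in, different image out". It assumes D = 8 and a 4x4 image flattened to 16 values, and it uses random numbers as a stand-in for the weights a real generator would learn during training. The helper names are hypothetical.

```python
import random

def sample_noise(dim, rng):
    """Draw a D-dimensional noise vector with standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(z, weights):
    """Map a noise vector to a flat list of pixel values (one linear layer)."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in weights]

rng = random.Random(42)
D = 8          # size of the noise vector (assumed for illustration)
PIXELS = 16    # a 4x4 'image', flattened

# Stand-in for learned weights: a real generator would learn these in training.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(PIXELS)]

z1 = sample_noise(D, rng)
z2 = sample_noise(D, rng)
image1 = toy_generator(z1, weights)   # one candidate image
image2 = toy_generator(z2, weights)   # a different vector gives a different image
```

The same vector always produces the same output, while a fresh vector produces a new one. Looking at the numbers in `z1` tells you nothing obvious about `image1`, which is the "not human-readable" point in miniature.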
Why GANs are hard to train
The adversarial dynamic is powerful but fragile. The two networks have to stay balanced, and that balance is easy to lose. Two failure modes matter for the exam.
- Instability. GAN training is highly sensitive to the balance between the two networks. If the discriminator becomes too strong, it confidently rejects every generated image and the generator gets no useful signal to learn from; if it is too weak, the generator is never pushed to improve. Keeping the two in step is the central difficulty.
- Mode collapse. The generator can take a shortcut: instead of learning the full variety of the training data, it produces only a few outputs that reliably fool the discriminator. This is mode collapse, and it means the model loses diversity, generating near-identical images rather than the full range the data contains.
Mitigation techniques exist, but the need to manage instability and mode collapse is exactly why a studio might think twice before choosing a GAN.
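The "no useful signal" problem can be illustrated numerically. The sketch below assumes the generator is trained to minimise log(1 - D(G(z))), the loss used in the original GAN formulation, and that the discriminator's probability is a sigmoid of a raw score. Neither detail comes from the booklet, and `generator_signal` is a hypothetical name.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator_signal(score):
    """Size of the gradient of log(1 - sigmoid(score)) with respect to score.

    `score` is the discriminator's raw output for a fake image;
    a very negative score means 'confidently fake'.
    """
    # d/ds log(1 - sigmoid(s)) = -sigmoid(s), so the magnitude is sigmoid(s).
    return sigmoid(score)

balanced = generator_signal(0.0)       # discriminator unsure about the fake
too_strong = generator_signal(-10.0)   # discriminator certain the image is fake
```

When the discriminator is unsure, the signal has size 0.5; when it is certain the image is fake, the signal shrinks to almost nothing, so the generator's updates become tiny and learning stalls.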
flowchart LR
NV["Varied noise vectors"] --> G["Generator"]
G -->|"healthy training"| H["Varied, diverse images"]
G -->|"mode collapse"| M["A few near-identical images"]
With healthy training, varied noise vectors produce varied images. Under mode collapse the generator outputs only a few near-identical images, losing the dataset’s diversity. (Original diagram.)
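One rough way to spot the loss of diversity is to measure how far apart generated samples are from each other. The sketch below is a crude check under stated assumptions, not a standard evaluation metric: each "image" is reduced to two made-up numbers, and the sample values are invented purely for illustration.

```python
import itertools
import math

def mean_pairwise_distance(samples):
    """Average Euclidean distance between every pair of generated samples."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Invented outputs from a healthy generator: spread across the space.
healthy = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.9]]

# Invented outputs under mode collapse: nearly the same point every time.
collapsed = [[0.50, 0.50], [0.51, 0.50], [0.50, 0.49], [0.51, 0.49]]

healthy_score = mean_pairwise_distance(healthy)
collapsed_score = mean_pairwise_distance(collapsed)
```

The collapsed set scores far lower than the healthy one. A score that falls toward zero during training would be one warning sign that the generator has settled on a few near-identical outputs.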
Strengths, and when a GAN makes sense
The reward for that difficulty is sharpness: GANs can produce very crisp, high-quality images, which is why they have been widely used for image synthesis, face generation, and style transfer. For Visionary Studios, a GAN could be the right tool where very sharp output matters and the team can absorb the harder, less predictable training. Where stable, reliable generation matters more, a diffusion model is usually the safer choice, and a hybrid model may combine the strengths of both.
Quick check
Q1. What are the two networks that make up a GAN?
Q2. What does the discriminator do?
Q3. A GAN keeps generating almost identical images instead of varied ones. What has most likely happened?
Q4. What does a GAN's generator take as its starting input?
Q5. Why is GAN training described as unstable?
Practice exercises
HL practice. Mark allocations and command terms match the case study’s exam style.
Core
- Outline (2 marks) - Outline the roles of the generator and the discriminator in a GAN.
- Describe (3 marks) - Describe the adversarial dynamic that drives a GAN to improve.
Extension
- Outline (4 marks) - Outline two reasons a GAN can be difficult to train.
- Explain (4 marks) - Explain what mode collapse is and why it reduces the usefulness of a GAN for Visionary Studios. Write in prose, with no diagram.
Challenge
- Evaluate (8 marks) - Evaluate whether Visionary Studios should use a GAN rather than a diffusion model to generate a set of varied, high-quality images for a campaign. Weigh image sharpness against training instability and the risk of mode collapse, and reach a conclusion.
Connections
- Previous: Diffusion models - the stable, iterative alternative every student must know.
- Next (HL): Hybrid models - combining GANs with other approaches.
- Related: Evaluating generative AI models - where training stability is compared.