Hybrid Models

HL Only. This entire page covers hybrid models, which are HL-only content in the case study. SL students are not expected to know them and can skip this page.

HL. Booklet section: Hybrid models. Syllabus link: A4 (machine learning). For VAEs and flow-based models a broad understanding is enough, even at HL: know what they are and what they are good for, not their internal mathematics.

No single generative model is best at everything. Hybrid models combine more than one approach to get the benefits of each and cover each other’s weaknesses. HL challenge 4 asks you to weigh how a hybrid model balances the strengths and weaknesses of VAEs, GANs, flow-based models, and diffusion models, so the goal here is to understand why you would combine approaches, not to memorise every detail.

Why combine approaches
Variational autoencoders (VAEs)
    Latent space, briefly
Flow-based models
Putting the pieces together
The cost of combining
Why this matters to Visionary Studios
Quick check
Practice exercises
    Core
    Extension
    Challenge
Connections

Why combine approaches

Each model family has a characteristic trade-off. Diffusion models give high quality and stable training but are slow and compute-heavy. GANs give sharp images but train unstably and risk mode collapse. A hybrid model combines components so that the strengths of one offset the weaknesses of another. The booklet gives three reasons to do this: higher quality, better control over output attributes (such as style or a recurring character), and more flexibility.

The two component types you should be able to name are variational autoencoders and flow-based models.

Variational autoencoders (VAEs)

A variational autoencoder (VAE) represents images in a latent space: a compressed, organised version of the data in which similar images sit close together. The VAE learns to encode an image into this space and decode a point in the space back into an image. Because the space is organised, you can generate or edit images by moving through it, blending smoothly between styles rather than jumping between unrelated outputs.

VAEs are valued for flexibility, experimentation, and artistic exploration. Their outputs can be blurrier than those from diffusion models or GANs, which is one reason they are often used as a component in a hybrid system rather than on their own. (Broad understanding is enough here.)

Latent space, briefly

Latent space is worth understanding on its own because it explains the flexibility VAEs bring. It is a lower-dimensional representation where meaningful directions and neighbourhoods exist: nearby points decode to similar images. Moving smoothly through latent space lets a studio interpolate between two concept styles, or nudge an image toward a desired attribute, in a controllable way.
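To make "moving smoothly through latent space" concrete, here is a minimal sketch of interpolating between two latent points. The two latent vectors and their three dimensions are made-up values for illustration; in a real VAE each point on the path would be passed to the trained decoder to produce an image, and that decoder is not shown here.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors.

    t = 0 returns z_a, t = 1 returns z_b, values in between blend the two.
    """
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent points from z_a to z_b inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


# Hypothetical latent codes for two concept styles (illustrative values only).
z_style_a = [0.0, 1.0, -2.0]
z_style_b = [4.0, -1.0, 2.0]

path = interpolation_path(z_style_a, z_style_b, 5)
for point in path:
    # A real system would call the VAE decoder on each point here.
    print(point)
```

Because neighbouring points in an organised latent space decode to similar images, decoding each step of this path would give a gradual blend from one style to the other rather than an abrupt jump.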

Flow-based models

A flow-based model learns an exact, reversible mapping between random noise and realistic images. Because the mapping is invertible, the model can run in both directions without loss: it can generate an image from noise, and it can map a given image back to the exact noise values that produce it, which shows precisely how that image was formed. That makes flow-based models valued for precision, reproducibility, and transparency. They tend to be computationally heavier and less widely used than diffusion models or GANs, which again is why they often appear as part of a hybrid rather than alone. (Broad understanding is enough here.)
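The idea of a reversible mapping can be shown with a toy example. The sketch below is not a trained flow-based model: the `AffineFlow` class, its fixed scale and shift values, and the two-number "image" are all assumptions chosen for illustration. Real models learn far more expressive invertible layers, but the principle of stacking invertible steps and undoing them in reverse order is the same.

```python
import math


class AffineFlow:
    """One invertible step: x = scale * z + shift, applied element by element."""

    def __init__(self, scale, shift):
        if any(s == 0 for s in scale):
            raise ValueError("scale values must be non-zero to stay invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        """Noise-to-data direction."""
        return [s * zi + b for s, b, zi in zip(self.scale, self.shift, z)]

    def inverse(self, x):
        """Data-to-noise direction: the exact inverse of forward."""
        return [(xi - b) / s for s, b, xi in zip(self.scale, self.shift, x)]


def generate(layers, z):
    """Run noise through every layer in order to produce data."""
    for layer in layers:
        z = layer.forward(z)
    return z


def recover_noise(layers, x):
    """Undo the layers in reverse order to recover the original noise."""
    for layer in reversed(layers):
        x = layer.inverse(x)
    return x


# Illustrative fixed parameters; a real model would learn these from data.
layers = [
    AffineFlow(scale=[2.0, 0.5], shift=[1.0, -3.0]),
    AffineFlow(scale=[4.0, 2.0], shift=[0.5, 0.25]),
]

noise = [0.25, -1.5]
sample = generate(layers, noise)
recovered = recover_noise(layers, sample)

print(sample)
print(recovered)
```

Running the same noise through the same layers always gives the same sample, and the sample maps back to the noise it came from. That round trip is what the reproducibility and transparency claims rest on.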

Putting the pieces together

Combining VAEs, flow-based models, GANs, and diffusion models can produce output that is more reliable, more consistent, and higher quality than any one approach alone. For example, a system might use a VAE’s organised latent space for controllable editing while relying on a diffusion model for the final, photorealistic output. The point for the exam is the reasoning: a hybrid is an attempt to buy the strengths of several models at once.

flowchart LR
    VAE["VAE"] --> H["Hybrid model"]
    FB["Flow-based model"] --> H
    DIFF["Diffusion model"] --> H
    GAN["GAN"] --> H
    H --> OUT["Higher quality, more control,<br/>more consistency"]

A hybrid combines components so each one’s strengths offset another’s weaknesses, at the cost of added complexity. (Original diagram.)

The cost of combining

Hybrids are not free wins. Putting several models together makes a system more complex to design, train, and reason about, and that complexity usually means more engineering effort and more computation. A more complex architecture can also be harder to debug and to keep stable. So challenge 4 is a genuine trade-off: a hybrid can give better control and consistency, but only if the extra complexity and cost are worth it for the job at hand.

Exam framing. When you evaluate a hybrid model, do not just say “it combines the best of both.” Name which strengths it borrows and which weaknesses it adds (complexity, cost, harder training), then judge whether the trade is worth it for Visionary Studios’ specific need.
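The habit of naming borrowed strengths and added weaknesses can be practised with a simple checklist. The sketch below is a revision aid, not an evaluation method: the `MODELS` table uses informal tags paraphrased from this page, and the `plan_hybrid` function and the "extra system complexity" label are assumptions made for illustration.

```python
# Strengths and weaknesses as summarised on this page (informal tags only).
MODELS = {
    "VAE": {
        "strengths": {"control", "flexibility"},
        "weaknesses": {"blurrier output"},
    },
    "flow-based": {
        "strengths": {"precision", "reproducibility", "transparency"},
        "weaknesses": {"heavier", "less common"},
    },
    "GAN": {
        "strengths": {"sharp images"},
        "weaknesses": {"unstable training", "mode collapse"},
    },
    "diffusion": {
        "strengths": {"high quality", "stable training"},
        "weaknesses": {"slow", "compute-heavy"},
    },
}


def plan_hybrid(needs):
    """List the components that cover `needs` and the weaknesses they bring."""
    chosen = sorted(
        name for name, info in MODELS.items() if info["strengths"] & needs
    )
    covered = set().union(*(MODELS[n]["strengths"] for n in chosen)) & needs
    added = set().union(*(MODELS[n]["weaknesses"] for n in chosen))
    if len(chosen) > 1:
        # Combining components is itself a cost, whatever the components are.
        added.add("extra system complexity")
    return {
        "components": chosen,
        "uncovered": needs - covered,
        "weaknesses": sorted(added),
    }


# Example need: photorealistic quality plus fine control over style.
plan = plan_hybrid({"high quality", "control"})
print(plan["components"])
print(plan["weaknesses"])
```

The output lists what the hybrid borrows and what it costs. The judgement about whether that trade is worthwhile for a particular brief still has to be argued in prose.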

Why this matters to Visionary Studios

If Visionary Studios needs both photorealistic quality and fine control over a consistent brand style across a campaign, a single model may not deliver both well. A hybrid model is then an option to consider, accepting more complexity in exchange for control and consistency. Whether that trade is worthwhile is exactly the kind of justified judgement the 12-mark response rewards.

Quick check

Q1. What is the main reason to use a hybrid model?

Q2. A variational autoencoder (VAE) represents images in a:

Q3. What is distinctive about a flow-based model?

Q4. What is the main drawback of a hybrid model?

Practice exercises

HL practice. For VAEs and flow-based models, a broad understanding is enough; do not get pulled into their mathematics.

Core

Outline (2 marks) - Outline why a studio might use a hybrid model instead of a single model.
State (2 marks) - State one strength of a VAE and one strength of a flow-based model.

Extension

Describe (3 marks) - Describe what a latent space is and how it supports controllable image generation.
Explain (4 marks) - Explain one benefit and one cost of combining several models into a hybrid. Write in prose, with no diagram.

Challenge

Evaluate (10 marks) - Evaluate how a hybrid model could balance the strengths and weaknesses of VAEs, GANs, flow-based models, and diffusion models for Visionary Studios. Reach a conclusion on when the added complexity is justified. (This is HL challenge 4.)

Connections

Previous (HL): GANs - one of the components a hybrid can combine.
Related: Diffusion models - the stable, high-quality component most hybrids build on.
Next: Evaluating generative AI models - the factors used to judge any of these options.
