Evaluating Generative AI Models
SL and HL. Booklet section: Evaluating generative AI models. Syllabus links: A4 and A1 (computational resources). This is the comparison framework that turns “which model” into a structured judgement.
Visionary Studios cannot choose a model on image quality alone. It has to weigh several factors at once, because a model that produces beautiful images but cannot run on the studio’s hardware, or cannot keep a character consistent across a campaign, is not fit for the job. The booklet names five factors, and they double as a ready-made checklist for the extended-response question.
Table of Contents
- 1. Output quality
- 2. Computational efficiency
- 3. Training stability
- 4. Flexibility and scalability
- 5. Consistency
- The five factors at a glance
- HL Section: comparing the model families
- Quick check
- Practice exercises
- Connections
1. Output quality
The most visible factor. Output quality asks whether the images are high-resolution, photorealistic where that is wanted, stylistically coherent, and on-brief. For advertising and concept art this is non-negotiable, but it is only the starting point, not the whole decision.
2. Computational efficiency
Computational efficiency asks whether the studio’s existing infrastructure can train and run the model without causing delays or hitting resource limits. This is where the case study connects to Theme A1. Diffusion models, for example, are expensive because generation is iterative: each image is built up over many denoising steps, so producing many images for a campaign takes real time and money. Generative models are usually run on hardware built for parallel work, such as GPUs, and the choice of hardware affects both cost and speed. A model that is wonderful in a demo but too slow or too costly at production scale fails this factor.
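The link between iterative generation and cost can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `campaign_generation_cost` and every figure passed to it are placeholder assumptions, not values from the booklet or from any real hardware.

```python
def campaign_generation_cost(num_images, steps_per_image,
                             seconds_per_step, cost_per_gpu_hour):
    """Estimate GPU-hours and cost for an iterative generator.

    Assumes total time is simply images x steps x time per step,
    which ignores batching, start-up time and failed generations.
    """
    total_seconds = num_images * steps_per_image * seconds_per_step
    gpu_hours = total_seconds / 3600
    return gpu_hours, gpu_hours * cost_per_gpu_hour


# Placeholder figures, chosen only to show the calculation.
hours, cost = campaign_generation_cost(
    num_images=1000,
    steps_per_image=50,
    seconds_per_step=0.1,
    cost_per_gpu_hour=2.0,
)
print(f"{hours:.2f} GPU-hours, cost {cost:.2f}")
```

Because the step count multiplies everything else, doubling the number of steps per image doubles both the time and the cost in this simple model, which is why iterative generation weighs so heavily on this factor.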
3. Training stability
Training stability asks how reliably a model trains. Some models give repetitive or suboptimal output when training goes wrong; others train reliably but more slowly and at greater cost. Diffusion models are generally stable; GANs are notoriously sensitive. A studio that cannot afford a long, uncertain training process will weight this factor heavily.
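One rough way to spot the “repetitive output” symptom is to measure how different a batch of generated images is from one another. The sketch below assumes each image has already been reduced to a short feature vector by some other tool (not shown); the vectors and the function name `mean_pairwise_distance` are hypothetical, and this is not a standard named metric.

```python
import math
from itertools import combinations


def mean_pairwise_distance(vectors):
    """Average Euclidean distance between every pair of feature vectors.

    A value near zero suggests the outputs are almost identical,
    which is one sign that training may have gone wrong.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors to compare")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature vectors for two batches of images.
varied_batch = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
repetitive_batch = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]

print(mean_pairwise_distance(varied_batch))
print(mean_pairwise_distance(repetitive_batch))
```

A healthy model should score well above zero on a varied set of prompts; what counts as “too low” depends on the feature extractor and would have to be judged by the studio.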
4. Flexibility and scalability
Flexibility and scalability asks whether the model can adapt across different jobs (advertising campaigns, concept art, branded assets) and scale up to meet future demand. A model locked to one narrow task is less valuable to a studio that needs to reuse it across very different briefs.
5. Consistency
Consistency is critical for brand identity, and it has two parts.
- Character consistency is keeping a recurring character or motif looking the same across many images. Generative models naturally vary their output from run to run, so this is genuinely hard. Embedding-based approaches help: the character’s identity is captured as a learned vector that every image can be conditioned on.
- Style adherence is keeping to a defined artistic style or theme. Conditional and text-to-image models have to match the style described in the prompt, image after image, so the campaign looks like one coherent body of work.
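The idea of a character captured as a vector can also be used to check consistency after the images are made. Note that this is a different use from the conditioning described above: the sketch compares embeddings rather than steering generation. It assumes a separate tool (not shown) has already produced an embedding for each image; the vectors, the threshold and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length vector")
    return dot / (norm_a * norm_b)


def off_model_images(reference, generated, threshold):
    """Return the indexes of images whose embedding drifts from the reference."""
    return [
        i for i, embedding in enumerate(generated)
        if cosine_similarity(reference, embedding) < threshold
    ]


# Hypothetical embeddings: one reference mascot and three adverts.
mascot = [1.0, 0.0, 0.0]
adverts = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

print(off_model_images(mascot, adverts, threshold=0.8))
```

Images flagged by a check like this would be candidates for regeneration before the campaign goes out.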
The five factors at a glance
| Factor | The question Visionary Studios asks |
|---|---|
| Output quality | Are the images high-resolution, photorealistic, and on-brief? |
| Computational efficiency | Can our infrastructure train and run this without delays or resource limits? |
| Training stability | Does it train reliably, or give repetitive or suboptimal output? |
| Flexibility and scalability | Can it adapt across jobs and scale to future demand? |
| Consistency | Can it keep a character and a style consistent across many images? |
These five are the backbone of a strong extended response. When a question asks you to choose or evaluate a model, walk the factors, say how the options compare on each, and let that drive your conclusion.
flowchart TD
OQ["Output quality"] --> DEC{"Which model<br/>for this job?"}
CE["Computational efficiency"] --> DEC
TS["Training stability"] --> DEC
FS["Flexibility and scalability"] --> DEC
CO["Consistency"] --> DEC
The five factors are weighed together to choose a model. No single factor decides it on its own. (Original diagram.)
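Weighing the five factors together can be sketched as a weighted score. This is only one possible way to structure the judgement, not a method taken from the booklet: the candidate names `model_a` and `model_b`, the 1 to 5 scores and the weights are all invented for illustration.

```python
FACTORS = (
    "output_quality",
    "computational_efficiency",
    "training_stability",
    "flexibility_scalability",
    "consistency",
)


def weighted_score(scores, weights):
    """Weighted average of a model's factor scores."""
    missing = [f for f in FACTORS if f not in scores or f not in weights]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    total_weight = sum(weights[f] for f in FACTORS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[f] * weights[f] for f in FACTORS) / total_weight


def rank_models(candidates, weights):
    """Return candidate names ordered from best to worst weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical scores (1 = poor, 5 = excellent) for two made-up candidates.
candidates = {
    "model_a": dict(zip(FACTORS, (5, 2, 4, 3, 3))),
    "model_b": dict(zip(FACTORS, (4, 4, 3, 3, 4))),
}
# Hypothetical weights for a large campaign: efficiency and consistency count most.
campaign_weights = dict(zip(FACTORS, (1, 3, 1, 1, 3)))

print(rank_models(candidates, campaign_weights))
```

Changing the weights changes the ranking, which is the point: the same two candidates can swap places when the job changes, so an exam answer should always justify the weighting as well as the scores.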
HL Section: comparing the model families
HL Only. This comparison involves GANs and hybrid models, which are HL-only. SL students can skip to the quick check.
For HL, the five factors give you a way to compare the three model families directly. Keep every claim grounded in what the booklet actually says, and avoid inventing precise numbers.
| Factor | Diffusion | GANs | Hybrid |
|---|---|---|---|
| Output quality | Photorealistic, high quality | Very sharp | Can be highest, by combining strengths |
| Computational efficiency | High cost (iterative generation) | Fast single-pass generation, but training is sensitive and hard to tune | Highest cost, most complex to build |
| Training stability | Generally stable | Unstable, mode-collapse risk | More components to keep stable |
| Flexibility and scalability | Good, supports several modes | Good | Highest, fine control over attributes |
| Consistency | Good with conditioning and embeddings | Variable | Best control over style and character |
The pattern to take into the exam: diffusion is the stable, high-quality default; a GAN trades stability for sharpness; a hybrid trades simplicity for control. Which wins depends on the specific job.
Quick check
Q1. Which of these is not one of the five evaluation factors in the case study?
Q2. Visionary Studios needs its mascot to look identical across ten adverts. Which factor is this, and what helps?
Q3. A model produces stunning images but is far too slow and costly to run at the scale the studio needs. Which factor does it fail?
Q4. (HL) On the training-stability factor, how do diffusion models and GANs typically compare?
Practice exercises
Mark allocations and command terms match the case study’s exam style.
Core
- Identify (2 marks) - Identify two factors Visionary Studios uses to evaluate generative AI models.
- Outline (2 marks) - Outline what “computational efficiency” means for the studio.
Extension
- Describe (4 marks) - Describe why consistency is important for a brand, covering both character consistency and style adherence.
- Explain (4 marks) - Explain why output quality on its own is not enough to choose a model. Write in prose, with no diagram.
Challenge
- Justify (6 marks) - Visionary Studios must pick one model for a large, ongoing campaign. Justify which two of the five factors should matter most, and why, for this particular job.
Connections
- Previous: Diffusion models - the model these factors are most often applied to.
- Next: Ethics and law - the considerations that sit alongside the five factors.
- Related (HL): Hybrid models - where the consistency and flexibility factors are strongest.