Evaluating Generative AI Models

SL and HL. Booklet section: Evaluating generative AI models. Syllabus links: A4 and A1 (computational resources). This is the comparison framework that turns “which model” into a structured judgement.

Visionary Studios cannot choose a model on image quality alone. It has to weigh several factors at once, because a model that produces beautiful images but cannot run on the studio’s hardware, or cannot keep a character consistent across a campaign, is not fit for the job. The booklet names five factors, and they double as a ready-made checklist for the extended-response question.


1. Output quality

The most visible factor. Output quality asks whether the images are high-resolution, photorealistic where that is wanted, stylistically coherent, and on-brief. For advertising and concept art this is non-negotiable, but it is only the starting point, not the whole decision.

2. Computational efficiency

Computational efficiency asks whether the studio’s existing infrastructure can train and run the model without causing delays or hitting resource limits. This is where the case study connects to Theme A1. Diffusion models, for example, are expensive because generation is iterative: each image is refined over many steps rather than produced in a single pass, so producing many images for a campaign takes real time and money. Generative models are usually run on hardware built for parallel work, such as GPUs, and the choice of hardware affects both cost and speed. A model that is wonderful in a demo but too slow or too costly at production scale fails this factor.
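To see why iterative generation matters at campaign scale, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the function name and every number below (steps per image, seconds per step, hourly GPU price) are hypothetical placeholders, not figures from the booklet or from any real model.

```python
def generation_cost(num_images, steps_per_image, seconds_per_step, cost_per_gpu_hour):
    """Rough GPU time (hours) and cost for generating a batch of images.

    Assumes images are generated one after another on a single GPU and that
    every step takes the same time. Real systems batch work and vary.
    """
    total_seconds = num_images * steps_per_image * seconds_per_step
    gpu_hours = total_seconds / 3600
    return gpu_hours, gpu_hours * cost_per_gpu_hour


# Hypothetical campaign of 1,000 images; all values are placeholders.
iterative_hours, iterative_cost = generation_cost(1000, 50, 0.1, 2.0)
single_pass_hours, single_pass_cost = generation_cost(1000, 1, 0.1, 2.0)

print(f"Iterative (50 steps): {iterative_hours:.2f} GPU hours, cost {iterative_cost:.2f}")
print(f"Single pass (1 step): {single_pass_hours:.2f} GPU hours, cost {single_pass_cost:.2f}")
```

The point is the multiplication, not the numbers: with everything else equal, a model that needs fifty steps per image needs about fifty times the compute of one that needs a single pass.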

3. Training stability

Training stability asks how reliably a model trains. Some models give repetitive or suboptimal output when training goes wrong; others train reliably but more slowly and at greater cost. Diffusion models are generally stable; GANs are notoriously sensitive. A studio that cannot afford a long, uncertain training process will weight this factor heavily.

4. Flexibility and scalability

Flexibility and scalability asks whether the model can adapt across different jobs (advertising campaigns, concept art, branded assets) and scale up to meet future demand. A model locked to one narrow task is less valuable to a studio that needs to reuse it across very different briefs.

5. Consistency

Consistency is critical for brand identity, and it has two parts.

Character consistency is keeping a recurring character or motif looking the same across many images. Generative models naturally vary their output from run to run, so this is genuinely hard. Embedding-based approaches help: the character’s identity is captured as a learned vector that every image can be conditioned on.
Style adherence is keeping to a defined artistic style or theme. Conditional and text-to-image models have to match the style described in the prompt, image after image, so the campaign looks like one coherent body of work.
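The idea of identity as a vector can be made concrete with a small Python sketch. It does not show conditioning itself. Instead it shows one way a studio might check for drift afterwards, by comparing each generated image's embedding with a reference embedding for the character using cosine similarity. The three-number vectors, the function names, and the 0.9 threshold are all assumptions for illustration; in practice the embeddings would come from an image encoder and would be much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def consistency_report(reference, image_embeddings, threshold=0.9):
    """Return (index, similarity, passes) for each generated image."""
    report = []
    for index, embedding in enumerate(image_embeddings):
        similarity = cosine_similarity(reference, embedding)
        report.append((index, similarity, similarity >= threshold))
    return report


# Hypothetical embeddings: the first image stays close to the mascot,
# the second has drifted to something unrelated.
mascot_reference = [1.0, 0.0, 0.0]
generated = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

report = consistency_report(mascot_reference, generated)
for index, similarity, passes in report:
    print(f"image {index}: similarity {similarity:.3f}, consistent: {passes}")
```

A check like this only flags drift. Keeping the character consistent in the first place still depends on the model being conditioned on the character's embedding.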

The five factors at a glance

Factor	The question Visionary Studios asks
Output quality	Are the images high-resolution, photorealistic, and on-brief?
Computational efficiency	Can our infrastructure train and run this without delays or resource limits?
Training stability	Does it train reliably, or give repetitive or suboptimal output?
Flexibility and scalability	Can it adapt across jobs and scale to future demand?
Consistency	Can it keep a character and a style consistent across many images?

These five are the backbone of a strong extended response. When a question asks you to choose or evaluate a model, walk through the factors, say how the options compare on each, and let that comparison drive your conclusion.

flowchart TD
    OQ["Output quality"] --> DEC{"Which model<br/>for this job?"}
    CE["Computational efficiency"] --> DEC
    TS["Training stability"] --> DEC
    FS["Flexibility and scalability"] --> DEC
    CO["Consistency"] --> DEC

The five factors are weighed together to choose a model. No single factor decides it on its own. (Original diagram.)
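Weighing the factors together can be pictured as a simple weighted decision matrix. The Python sketch below is one possible way to model that, not a method from the booklet. The candidate names, the 1 to 5 scores, and the weights are hypothetical and exist only to show how changing the weights for a job can change which model comes out on top.

```python
FACTORS = [
    "output_quality",
    "computational_efficiency",
    "training_stability",
    "flexibility_scalability",
    "consistency",
]


def weighted_score(scores, weights):
    """Weighted average of the five factor scores for one candidate model."""
    missing = [f for f in FACTORS if f not in scores or f not in weights]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(scores[f] * weights[f] for f in FACTORS) / total_weight


def rank_models(candidates, weights):
    """Candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical scores on a 1 (weak) to 5 (strong) scale.
candidates = {
    "model_a": dict(zip(FACTORS, [5, 4, 4, 4, 2])),
    "model_b": dict(zip(FACTORS, [4, 3, 3, 3, 5])),
}

equal_weights = {f: 1 for f in FACTORS}
# A long-running mascot campaign might weight consistency more heavily.
consistency_heavy = {**equal_weights, "consistency": 3}

print(rank_models(candidates, equal_weights))
print(rank_models(candidates, consistency_heavy))
```

In an exam answer the same reasoning is done in words: state which factors matter most for the job, then show how each option performs against them. The numbers here are only a way of making that weighting visible.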

HL Section: comparing the model families

HL Only. This comparison involves GANs and hybrid models, which are HL-only. SL students can skip to the quick check.

For HL, the five factors give you a way to compare the three model families directly. Keep every claim grounded in what the booklet actually says, and avoid inventing precise numbers.

Factor	Diffusion	GANs	Hybrid
Output quality	Photorealistic, high quality	Very sharp	Can be highest, by combining strengths
Training stability	Generally stable	Unstable, mode-collapse risk	More components to keep stable
Computational cost	High (iterative generation)	Sensitive, hard-to-tune training	Highest, most complex to build
Flexibility	Good, supports several modes	Good	Highest, fine control over attributes
Consistency	Good with conditioning and embeddings	Variable	Best control over style and character

The pattern to take into the exam: diffusion is the stable, high-quality default; a GAN trades stability for sharpness; a hybrid trades simplicity for control. Which wins depends on the specific job.

Quick check

Q1. Which of these is not one of the five evaluation factors in the case study?

Q2. Visionary Studios needs its mascot to look identical across ten adverts. Which factor is this, and what helps?

Q3. A model produces stunning images but is far too slow and costly to run at the scale the studio needs. Which factor does it fail?

Q4. (HL) On the training-stability factor, how do diffusion models and GANs typically compare?

Practice exercises

Mark allocations and command terms match the case study’s exam style.

Core

Identify (2 marks) - Identify two factors Visionary Studios uses to evaluate generative AI models.
Outline (2 marks) - Outline what “computational efficiency” means for the studio.

Extension

Describe (4 marks) - Describe why consistency is important for a brand, covering both character consistency and style adherence.
Explain (4 marks) - Explain why output quality on its own is not enough to choose a model. Write in prose, with no diagram.

Challenge

Justify (6 marks) - Visionary Studios must pick one model for a large, ongoing campaign. Justify which two of the five factors should matter most, and why, for this particular job.

Connections

Previous: Diffusion models - the model these factors are most often applied to.
Next: Ethics and law - the considerations that sit alongside the five factors.
Related (HL): Hybrid models - where the consistency and flexibility factors are strongest.