Evaluating Generative AI Models

SL and HL. Booklet section: Evaluating generative AI models. Syllabus links: A4 and A1 (computational resources). This is the comparison framework that turns “which model” into a structured judgement.

Visionary Studios cannot choose a model on image quality alone. It has to weigh several factors at once, because a model that produces beautiful images but cannot run on the studio’s hardware, or cannot keep a character consistent across a campaign, is not fit for the job. The booklet names five factors, and they double as a ready-made checklist for the extended-response question.

Table of Contents

  1. Output quality
  2. Computational efficiency
  3. Training stability
  4. Flexibility and scalability
  5. Consistency
  6. The five factors at a glance
  7. HL Section: comparing the model families
  8. Quick check
  9. Practice exercises
    1. Core
    2. Extension
    3. Challenge
  10. Connections

1. Output quality

Output quality is the most visible factor. It asks whether the images are high-resolution, photorealistic where that is wanted, stylistically coherent, and on-brief. For advertising and concept art this is non-negotiable, but it is only the starting point, not the whole decision.

2. Computational efficiency

Computational efficiency asks whether the studio’s existing infrastructure can train and run the model without delays or hitting resource limits. This is where the case study connects to Theme A1. Diffusion models, for example, are expensive because generation is iterative: each image is refined over many denoising steps, so producing many images for a campaign takes real time and money. Generative models are usually run on hardware built for parallel work, such as GPUs, and the choice of hardware affects both cost and speed. A model that is wonderful in a demo but too slow or too costly at production scale fails this factor.
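A rough sketch of why iterative generation adds up at campaign scale. The function name, the step counts, and the timing are illustrative assumptions, not measurements of any real model or GPU and not figures from the booklet.

```python
def generation_seconds(num_images, steps_per_image, seconds_per_step):
    """Total time when every image needs a fixed number of sequential steps."""
    return num_images * steps_per_image * seconds_per_step


# Illustrative values only: a hypothetical 1000-image campaign.
single_pass = generation_seconds(num_images=1000, steps_per_image=1, seconds_per_step=0.5)
iterative = generation_seconds(num_images=1000, steps_per_image=50, seconds_per_step=0.5)

print(f"single pass: {single_pass / 3600:.2f} hours")
print(f"iterative:   {iterative / 3600:.2f} hours")
```

The point for an exam answer is the shape of the relationship rather than the numbers: cost grows with both the number of images and the number of steps each image needs.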

3. Training stability

Training stability asks how reliably a model trains. Some models give repetitive or suboptimal output when training goes wrong; others train reliably but more slowly and at greater cost. Diffusion models are generally stable; GANs are notoriously sensitive. A studio that cannot afford a long, uncertain training process will weight this factor heavily.

4. Flexibility and scalability

Flexibility and scalability asks whether the model can adapt across different jobs (advertising campaigns, concept art, branded assets) and scale up to meet future demand. A model locked to one narrow task is less valuable to a studio that needs to reuse it across very different briefs.

5. Consistency

Consistency is critical for brand identity, and it has two parts.

  • Character consistency is keeping a recurring character or motif looking the same across many images. Generative models naturally vary their output run to run, so this is genuinely hard. Embedding-based approaches help: the character’s identity is captured as a learned vector that every image can be conditioned on.
  • Style adherence is keeping to a defined artistic style or theme. Conditional and text-to-image models have to match the style described in the prompt, image after image, so the campaign looks like one coherent body of work.
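One way to picture the embedding idea is to treat each image of the character as a vector and check how close the vectors are. The sketch below is an assumption for illustration: it uses tiny hand-written vectors in place of learned embeddings, a hypothetical threshold of 0.9, and it checks outputs after generation rather than showing how a model is conditioned.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors by direction: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_consistent(reference, candidates, threshold=0.9):
    """True if every candidate embedding is close enough to the reference."""
    return all(cosine_similarity(reference, c) >= threshold for c in candidates)


# Hypothetical 3-dimensional embeddings; real ones have far more dimensions.
mascot = [1.0, 0.0, 0.0]
on_model = [0.9, 0.1, 0.0]   # looks like the mascot
off_model = [0.0, 1.0, 0.0]  # drifted to a different character

print(is_consistent(mascot, [on_model]))
print(is_consistent(mascot, [on_model, off_model]))
```

The same comparison could in principle be applied to a style vector, which is why both parts of consistency are often discussed together.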

The five factors at a glance

| Factor | The question Visionary Studios asks |
| --- | --- |
| Output quality | Are the images high-resolution, photorealistic, and on-brief? |
| Computational efficiency | Can our infrastructure train and run this without delays or resource limits? |
| Training stability | Does it train reliably, or give repetitive or suboptimal output? |
| Flexibility and scalability | Can it adapt across jobs and scale to future demand? |
| Consistency | Can it keep a character and a style consistent across many images? |

These five are the backbone of a strong extended response. When a question asks you to choose or evaluate a model, walk the factors, say how the options compare on each, and let that drive your conclusion.

flowchart TD
    OQ["Output quality"] --> DEC{"Which model<br/>for this job?"}
    CE["Computational efficiency"] --> DEC
    TS["Training stability"] --> DEC
    FS["Flexibility and scalability"] --> DEC
    CO["Consistency"] --> DEC

The five factors are weighed together to choose a model. No single factor decides it on its own. (Original diagram.)
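Weighing the factors together can be sketched as a weighted decision matrix. This is not a method the booklet prescribes: the candidate names, the 1 to 5 scores, and the weights below are placeholders standing in for qualitative judgements, and an exam answer would argue these points in prose.

```python
FACTORS = [
    "output_quality",
    "computational_efficiency",
    "training_stability",
    "flexibility_scalability",
    "consistency",
]


def weighted_score(scores, weights):
    """Weighted average of the five factor scores."""
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(scores[f] * weights[f] for f in FACTORS) / total_weight


# Hypothetical weights for a long-running brand campaign: consistency matters most.
weights = {
    "output_quality": 2,
    "computational_efficiency": 1,
    "training_stability": 1,
    "flexibility_scalability": 1,
    "consistency": 3,
}

# Placeholder scores (1 = poor, 5 = excellent) for two unnamed candidates.
candidates = {
    "model_a": {"output_quality": 5, "computational_efficiency": 2,
                "training_stability": 4, "flexibility_scalability": 3,
                "consistency": 3},
    "model_b": {"output_quality": 4, "computational_efficiency": 3,
                "training_stability": 4, "flexibility_scalability": 3,
                "consistency": 5},
}

best = max(candidates, key=lambda name: weighted_score(candidates[name], weights))
for name, scores in candidates.items():
    print(name, weighted_score(scores, weights))
print("best fit:", best)
```

Changing the weights changes the winner, which mirrors the idea that the right model depends on the job: here model_a has the better output quality but loses once consistency is weighted heavily.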

HL Section: comparing the model families

HL Only. This comparison involves GANs and hybrid models, which are HL-only. SL students can skip to the quick check.

For HL, the five factors give you a way to compare the three model families directly. Keep every claim grounded in what the booklet actually says, and avoid inventing precise numbers.

| Factor | Diffusion | GANs | Hybrid |
| --- | --- | --- | --- |
| Output quality | Photorealistic, high quality | Very sharp | Can be highest, by combining strengths |
| Training stability | Generally stable | Unstable, mode-collapse risk | More components to keep stable |
| Computational cost | High (iterative generation) | Sensitive, hard-to-tune training | Highest, most complex to build |
| Flexibility | Good, supports several modes | Good | Highest, fine control over attributes |
| Consistency | Good with conditioning and embeddings | Variable | Best control over style and character |

The pattern to take into the exam: diffusion is the stable, high-quality default; a GAN trades stability for sharpness; a hybrid trades simplicity for control. Which wins depends on the specific job.


Quick check

Q1. Which of these is not one of the five evaluation factors in the case study?

Q2. Visionary Studios needs its mascot to look identical across ten adverts. Which factor is this, and what helps?

Q3. A model produces stunning images but is far too slow and costly to run at the scale the studio needs. Which factor does it fail?

Q4. (HL) On the training-stability factor, how do diffusion models and GANs typically compare?


Practice exercises

Mark allocations and command terms match the case study’s exam style.

Core

  1. Identify (2 marks) - Identify two factors Visionary Studios uses to evaluate generative AI models.
  2. Outline (2 marks) - Outline what “computational efficiency” means for the studio.

Extension

  1. Describe (4 marks) - Describe why consistency is important for a brand, covering both character consistency and style adherence.
  2. Explain (4 marks) - Explain why output quality on its own is not enough to choose a model. Write in prose, with no diagram.

Challenge

  1. Justify (6 marks) - Visionary Studios must pick one model for a large, ongoing campaign. Justify which two of the five factors should matter most, and why, for this particular job.

Connections

  • Previous: Diffusion models - the model these factors are most often applied to.
  • Next: Ethics and law - the considerations that sit alongside the five factors.
  • Related (HL): Hybrid models - where the consistency and flexibility factors are strongest.

© EduCS.me — A resource hub for Computer Science education
