Challenges and Exam Prep

SL and HL. Booklet section: Challenges faced, plus how Section B is examined. This page turns everything you have learned into marks: the challenges, the markbands, an answer scaffold, the common mistakes, and worked questions for both levels.

The booklet ends with a list of challenges. These are not a side note; they pre-announce the territory the exam questions will sit on. Prepare them well and you will recognise the question on the day. This page shows you how the marks work and how to write answers that score.

The four challenges
How Section B is marked
The command terms you will meet
The five things the markband rewards
A scaffold for the extended response
Plan, then write
Common mistakes to avoid
Researching the case study
Worked example: SL Section B (12 marks)
1. Mark scheme and commentary (SL)
HL Section: Worked example, HL Section B (24 marks)
1. Mark scheme and commentary (HL)
Quick check
Practice exercises
1. Core
2. Extension
3. Challenge
Connections

The four challenges

The challenges are tiered. SL prepares the first two; HL prepares all four. Read each as a likely extended-response topic.

#	Challenge	SL	HL
1	Managing the iterative denoising process in diffusion models and their computational demands for photorealistic images	Yes	Yes
2	Navigating the ethics of large training datasets, focusing on intellectual property and bias mitigation	Yes	Yes
3	Balancing the generator and discriminator in GANs to produce realistic images	-	Yes
4	Evaluating how a hybrid model could balance the strengths and weaknesses of VAEs, GANs, flow-based models, and diffusion models	-	Yes

Challenge 1 maps onto diffusion models and challenge 2 onto ethics and law. Challenges 3 and 4 map onto GANs and hybrid models respectively.

How Section B is marked

Section B is one question with several parts, split into two kinds of marks.

Short, point-marked parts use Identify, Outline, and Describe (AO2). You state a precise point and expand it. “Outline two ways” means two distinct points, each developed. The mark scheme awards a mark for the point and a mark for the expansion.
One extended response uses Discuss, Evaluate, Justify, or To what extent (AO3). It is marked holistically against a level descriptor, not by counting points. SL writes a 6-mark response that tops out at Competent; HL writes a 12-mark response that can reach Proficient.

	SL	HL
Total Section B	12 marks	24 marks
Short parts	About 2	About 4
Extended response	6 marks, 4-band, caps at Competent	12 marks, 5-band, reaches Proficient

The command terms you will meet

Use	Command terms	What they ask for
Short parts (AO2)	Identify, Outline, Describe	A precise point, expanded with a reason or consequence
Extended response (AO3)	Discuss, Evaluate, Justify, To what extent	A balanced, reasoned argument that reaches a conclusion

Follow the command term exactly. “Identify” wants a named point; “Describe” wants that point developed; “Evaluate” and “To what extent” want both sides weighed and a judgement reached. Doing what a different command term asks, rather than the one in the question, is one of the most common ways to lose marks.

The five things the markband rewards

The extended response is where most marks are won or lost. Five qualities separate a top answer from a weak one:

Balanced analysis. Argue more than one side. A one-sided answer cannot reach the top band.
Case-study anchoring. Refer to Visionary Studios and the specific models, not generic “AI.”
Terminology throughout. Use the glossary terms accurately and often.
Evidence of research. Bring in real tools, events, or examples beyond the booklet. HL’s top band needs extensive research.
A conclusion linked to the analysis. Reach a reasoned judgement that follows from your own points, not a restatement of the question.

The single most common failure is to list points on one side and stop. “Discuss” and “To what extent” are not “list the issues.” They are “weigh the issues and decide.”

A scaffold for the extended response

A reliable structure that hits all five criteria:

Position. State the question as a decision and signal where you will land.
Argument for, with evidence. Develop genuine points in favour, anchored to the scenario.
Argument against, with evidence. Develop genuine points on the other side.
Weigh. Compare the strongest points on each side. This is the analysis the markband rewards.
Justified conclusion. Decide, and justify it by reference to the weighing you just did. A conditional conclusion (“yes, but only if…”) is often the strongest.

flowchart TD
    POS["1. Position"] --> FOR["2. Argument for, with evidence"]
    FOR --> AGN["3. Argument against, with evidence"]
    AGN --> WEIGH["4. Weigh the two sides"]
    WEIGH --> CONC["5. Justified conclusion"]

The extended-response scaffold: both sides with evidence, then a conclusion that follows from the weighing. (Original diagram.)

Plan, then write

A few minutes of planning is worth far more than the same minutes spent writing more. Before the extended response, spend about five minutes sketching your two or three points per side, your evidence, and your conclusion. A planned answer is balanced and concluded; an unplanned one tends to run out of time mid-argument with no judgement reached.

Common mistakes to avoid

Just retelling the booklet. Marks come from analysis and judgement, not from summarising the scenario back.
Copying definitions from the booklet. Restate terms in your own words; copied definitions are rejected.
Arguing only one side. Balance is required for the top band.
No conclusion, or a conclusion that ignores your own analysis.
Fabricating “facts” or citations. Made-up statistics and fake sources are worse than none. Use real, accurate research, and if unsure, say what you would verify.
Ignoring the command term. Do exactly what the term asks.

Researching the case study

Research is assessed, so treat the booklet as a starting point. Build an evidence base over the year:

Plan it systematically. Take one challenge at a time and gather how it works, the trade-offs, the ethics, and at least one real example or named source.
Judge your sources. Prefer reliable, current sources, and note where a claim comes from so you can cite it accurately.
Use AI tools honestly. A generative AI tool can help you summarise or get oriented, but you must verify its claims independently and never present unverified output as fact. That habit is especially fitting given the subject of this case study.
Bring real tools and real debates into your answers: DALL-E, Stable Diffusion, Midjourney, copyright disputes over training data, documented dataset bias, and content-disclosure approaches.

Worked example: SL Section B (12 marks)

Original practice in the case study’s exam style. Attempt it under timed conditions (about 18 minutes) before reading the mark scheme.

5. (a) Outline one benefit of using text-to-image generation to create advertising images. [2]

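(b) Outline two ethical considerations Visionary Studios must address when curating its training data. [4]

(c) To what extent should Visionary Studios use a diffusion model to generate photorealistic images for its advertising campaigns? [6]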

Mark scheme and commentary (SL)

(a) [2] One mark for the point, one for the expansion. For example: text-to-image generation creates an image directly from a written prompt [1], so designers can produce concept images quickly from a description without drawing skill [1]. Accept other genuine benefits (speed, exploring many ideas).

(b) [4] Two considerations, point plus expansion each. Accept any two of: intellectual property (avoid copyrighted material [1] to comply with the law and not reproduce others’ work [1]); bias and fairness (check the dataset for bias [1] or generated images may exclude or misrepresent groups [1]); transparency and disclosure (disclose AI-generated content [1] to keep client and audience trust [1]).

(c) [6], markband. Command term To what extent (AO3). A strong answer weighs benefits (photorealistic quality, stable generation, flexibility across campaign needs) against drawbacks (computational cost from iterative denoising, plus the ethical and legal concerns around training data), then reaches a conclusion on whether, for this studio’s goals, the quality justifies the cost and the effort of managing the ethics.

SL level descriptor (6 marks):

Marks	Descriptor
0	No knowledge or understanding; no appropriate terminology.
Basic 1-2	Minimal knowledge and understanding; minimal terminology; may be little more than a list; no reference to the case study or independent research.
Adequate 3-4	Descriptive, limited knowledge and understanding; limited terminology; some analysis; evidence of some research.
Competent 5-6	Knowledge and understanding of the issues and concepts; terminology used appropriately in places; evidence of analysis; evidence of research; there is a conclusion.

SL tops out at Competent: there is no Proficient band. To reach 5-6 you need both sides, accurate terminology, some research, and a conclusion.

HL Section: Worked example, HL Section B (24 marks)

HL Only. This worked example uses GANs and hybrid models. SL students can stop at the SL example above.

Original practice in the case study’s exam style. Attempt it under timed conditions (about 35 minutes) before reading the mark scheme.

5. (a) (i) Identify two factors Visionary Studios uses to evaluate generative AI models. [2]

(a) (ii) Outline one reason diffusion models are computationally expensive. [2]

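(b) Outline two ethical or legal considerations Visionary Studios must address when building its training datasets. [4]

(c) Outline two challenges Visionary Studios faces when training a GAN to produce realistic images. [4]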

(d) Evaluate whether Visionary Studios should use a diffusion model, a GAN, or a hybrid model to generate high-quality images for its advertising campaigns. [12]

Mark scheme and commentary (HL)

(a)(i) [2] One mark each for any two of: output quality, computational efficiency, training stability, flexibility and scalability, consistency.

(a)(ii) [2] Diffusion models work by an iterative denoising process [1], and each step runs the neural denoiser, so many repeated passes demand a lot of computation [1].
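To make the cost argument in (a)(ii) concrete, here is a toy Python sketch. It is not a real diffusion model: `CountingDenoiser` is a hypothetical stand-in for the trained neural network, and its "denoising" is a placeholder that simply shrinks the values. What it does show is the shape of the computation: generation starts from random noise and calls the denoiser once per step, so the number of network passes per image grows in direct proportion to the number of steps. The step counts are chosen only for illustration.

```python
import random


class CountingDenoiser:
    """Hypothetical stand-in for a trained neural denoiser that counts its own calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        # Placeholder only: a real model would predict the noise to remove at step t.
        return [v * 0.9 for v in x]


def generate(denoiser, num_steps, size=4, seed=0):
    """Toy reverse-diffusion loop: one denoiser pass per step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from pure noise
    for t in reversed(range(num_steps)):  # from the noisiest step to the cleanest
        x = denoiser(x, t)
    return x


for steps in (10, 50, 1000):
    d = CountingDenoiser()
    generate(d, steps)
    print(steps, "steps ->", d.calls, "denoiser passes")
```

Running it prints one line per step count, for example `50 steps -> 50 denoiser passes`. A standard GAN generator, by contrast, produces an image in a single forward pass, which is one reason diffusion's quality comes at a higher compute cost per image.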

(b) [4] Two of: intellectual property, bias and fairness, transparency and disclosure, point plus expansion each (as in the SL example).

(c) [4] Two of: instability (the generator and discriminator must stay balanced [1] or training breaks down [1]); mode collapse (the generator produces limited variation [1], failing to capture the dataset’s full diversity [1]); adversarial balance (if the discriminator becomes too strong or too weak [1], the generator stops improving usefully [1]).
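Mode collapse is easier to picture with numbers. The sketch below is a toy "mode coverage" count, assuming (hypothetically) that each generated image has already been labelled with which of eight styles in the training data it most resembles. A healthy generator spreads its samples across the styles; a collapsed one keeps producing the same style. Real evaluations are more involved; the function name, the eight styles, and the labels are illustrative assumptions, not a standard metric.

```python
def mode_coverage(samples, num_modes):
    """Fraction of the dataset's modes that appear at least once among the samples."""
    return len(set(samples)) / num_modes


# Hypothetical labels: which of 8 training-data styles each generated image resembles.
healthy = [0, 1, 2, 3, 4, 5, 6, 7, 2, 5]
collapsed = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

print(mode_coverage(healthy, 8))    # 1.0: every style is represented
print(mode_coverage(collapsed, 8))  # 0.125: only one style out of eight
```

Both generators produced ten images, but only the first captures the dataset's full diversity, which is exactly the failure the mode-collapse point in (c) describes.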

(d) [12], markband. Command term Evaluate (AO3). A strong answer weighs the three options: a diffusion model (photorealistic, stable, but compute-heavy), a GAN (very sharp, but unstable with mode-collapse risk and the need to balance generator and discriminator, challenge 3), and a hybrid model (combines strengths for control and consistency, but more complex and costly, challenge 4). It judges them against the studio’s goals (quality, compute budget, stability, consistency, and the ethics of the training data) and reaches a justified conclusion, commonly favouring diffusion for quality-with-stability, with a hybrid considered where finer control is worth the complexity.

HL level descriptor (12 marks):

Marks	Descriptor
0	No knowledge or understanding; no appropriate terminology.
Basic 1-3	Minimal knowledge and understanding; minimal terminology; may be little more than a list; no reference to the case study or independent research.
Adequate 4-6	Descriptive, limited knowledge and understanding; limited terminology; limited analysis; evidence of limited research.
Competent 7-9	Knowledge and understanding of the issues and concepts; terminology in places; some analysis; evidence of research.
Proficient 10-12	Detailed knowledge and clear understanding; terminology throughout; competent and balanced analysis; conclusions linked to the analysis; clear evidence of extensive research.

To reach Proficient you need balanced analysis of more than one option, anchoring to Visionary Studios, accurate terminology throughout, evidence of extensive research, and a conclusion that follows from your analysis.

Quick check

Q1. How many of the four challenges does an SL student prepare?

Q2. How is the extended response marked?

Q3. What happens to an extended response that makes no reference to the case study or independent research?

Q4. What is the highest band an SL 6-mark response can reach?

Q5. (HL) Which challenges are most likely to appear inside HL's 12-mark extended response?

Practice exercises

The best practice is a full, timed Section B. Mark your answers against the descriptors above, checking them for the five criteria.

Core

Outline (4 marks) - Outline, in your own words, the two challenges that both SL and HL prepare.
To what extent (6 marks) - Write a full SL extended response: To what extent should Visionary Studios prioritise output quality over computational efficiency when choosing a model? Plan first, then write in prose.

Extension

Self-assessment (no marks) - Mark your Core extended response (the 6-mark To what extent question) against the SL descriptor. Identify which of the five criteria are present and which are missing.
Discuss (6 marks) - Discuss whether disclosing AI-generated content should be required for all of Visionary Studios’ campaigns.

Challenge

Evaluate (12 marks) (HL) - Write a full HL extended response: Evaluate whether a hybrid model is worth its added complexity for Visionary Studios. Bring in real research, weigh both sides, and reach a justified conclusion. Mark it against the HL descriptor.

Connections

Start of this edition: 2027 overview and scenario - the scenario and how it is examined.
The challenges in depth: diffusion models, ethics and law, GANs (HL), hybrid models (HL).
Course link: Ethics of Machine Learning - the Discuss command term, worked in detail.