Teacher Guide: Ethics of Machine Learning

Student page: Ethics of Machine Learning
IB Syllabus: A4.4.1 – Discuss the ethical implications of machine learning in real-world scenarios (Discuss, AO3, SL + HL)
Estimated periods: 2-3 (45-50 min each)
Prerequisites: A working idea of what machine learning is (a model learns patterns from training data). No HL ML detail is required – A4.4.1 is core.


Why this topic is harder than it looks

Students arrive thinking ethics is the “easy, no-right-answer” part of the course. The opposite is true at the top of the mark range. A4.4.1 is examined with Discuss (AO3), so a list of dangers earns a low-to-middle mark no matter how long it is. The whole lesson should drive toward one habit: two developed sides plus a conclusion that weighs them. Treat the nine dimensions as the vocabulary that lets students name issues precisely; treat the Discuss structure as the skill that earns the marks.

Watch for two predictable failure modes:

One-sidedness. Students moralise (“AI is dangerous and should be banned”) instead of weighing. Insist on a genuine benefit for every system, however uncomfortable.
Vagueness. “It’s bad for privacy” is worth little; “it raises a consent problem because the data was gathered for X and reused for Y without asking” is worth full marks. Reward the named concept every time.

Lesson Plan

Period 1: The nine dimensions and where bias comes from

Phase	Time	Activity	Student Page Section
Hook	5 min	Show a single scenario (e.g. a hiring model that screens CVs). Ask “what could go wrong?” and collect raw, unlabelled worries on the board	–
Teach	15 min	Introduce the nine dimensions, mapping the students’ raw worries onto the correct named dimension as you go. Emphasise overlap (the gate-camera example)	The Nine Ethical Dimensions
Teach	10 min	Where bias comes from: the five named bias types. Use the hiring model to show historical bias specifically	Where Bias Comes From
Quick Check	8 min	Q1-Q3 + Q6 on EduCS.me	Quick Check
Practice	7 min	Core 1 (define + exemplify three dimensions)	Practice Exercises

Period 2: Online communication ethics + the Discuss structure

Phase	Time	Activity	Student Page Section
Recap	4 min	Cold-call: name a dimension, give a one-line test question for it	–
Teach	8 min	Online communication ethics: misinformation, bias, harassment, anonymity, privacy. Anchor on the anonymity trade-off	Online Communication Ethics
Teach	12 min	The Discuss structure (one side / other side / conclusion). Walk the worked example live, building it on the board from a blank scenario	How to Answer a “Discuss” Question
Model	8 min	Read the worked example aloud; highlight where the conclusion is conditional (“yes, but only if…”)	Worked Example
Practice	13 min	Extension 5 (face-recognition attendance, 8-mark Discuss) – students draft the three-part skeleton, not the full prose	Practice Exercises

Period 3 (optional, recommended before assessment): timed Discuss practice

Use Challenge 6 (predictive policing, a Discuss question) or Challenge 7 (train-from-scratch vs fine-tune, an Evaluate question) as a timed 15-minute write, then peer-mark against the top-band checklist. The peer-marking is where the structure finally lands.

Teaching notes:

Build the worked example live, do not just show it. The value is watching a blank scenario become a structured argument. If you reveal the finished version first, students copy its content instead of learning its shape.
The conclusion is the lesson. Most students can list pros and cons by Period 2. Almost none write a conclusion that references their own points. Spend disproportionate time here. Model the move: “I have said X (benefit) and Y (cost); X is weightier because…, therefore…”.
Keep the IB framing in callouts, not the explanation. The site is school-agnostic. Use the Discuss vocabulary freely with your class, but the dimensions themselves are universal – teach them as transferable judgment, which also happens to be exam-optimal.
Misconception to pre-empt: students think a “mathematical” model is automatically objective. The hiring example dismantles this fastest – the maths is fine; the data encodes the unfairness.

Differentiation

Supporting students who struggle:

Strategy	When to use	Example
Dimension cue cards	Cannot recall the nine	One card per dimension: name + one-line test question + one example. Students sort scenarios onto cards
Sentence stems for Discuss	Freeze on open prose	“One benefit is… This matters because…” / “However, this raises a … concern because…” / “On balance, … because…”
Two-column then bridge	Cannot reach a conclusion	Fill a benefits/risks T-chart first, then write only the sentence “The most important point is __ because __” as the conclusion seed

Extending stronger students:

Strategy	When to use	Example
Force a hard conclusion	Writes balanced analysis but sits on the fence	“You must recommend deploy or not deploy, and state the exact safeguards that change your answer”
Conflicting fairness definitions	Grasps bias quickly	Introduce that fairness definitions can conflict: when groups have different base rates, equal error rates and equal selection rates generally can’t both hold; have them argue which a hospital should prioritise
TOK link	Philosophically minded	“Can a decision be fair to individuals and unfair to a group at the same time? Who should decide which definition of fairness a public system uses?”
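
Optional demo for the “Conflicting fairness definitions” extension: a minimal Python sketch of one precise version of the conflict. All numbers (the two base rates, the true-positive rate and the false-positive rate) are invented for illustration and are not real hospital data. It shows that a model with identical error rates in two groups flags different fractions of each group when their base rates differ.

```python
# Hypothetical numbers for illustration only; not real data.

def selection_rate(base_rate, tpr, fpr):
    """Fraction of a group that the model flags as positive.

    flagged = true positives + false positives
            = tpr * base_rate + fpr * (1 - base_rate)
    """
    return tpr * base_rate + fpr * (1 - base_rate)


# The same error rates are applied to both groups ("equal error rates").
TPR, FPR = 0.8, 0.1

# Assumed base rates: the condition is more common in group A than in group B.
rate_a = selection_rate(base_rate=0.5, tpr=TPR, fpr=FPR)
rate_b = selection_rate(base_rate=0.2, tpr=TPR, fpr=FPR)

print(f"Group A selection rate: {rate_a:.2f}")
print(f"Group B selection rate: {rate_b:.2f}")
```

With these assumed values the selection rates come out at 0.45 and 0.24. Students can try to equalise them by changing TPR or FPR for one group only, and then see that the error rates are no longer equal.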

IB exam relevance

Discuss the ethical implications of [ML scenario] – the headline question type, typically high-mark AO3. Needs both sides + linked conclusion.
Explain how bias enters a model – name the bias type (historical, sampling, selection, labelling, measurement), do not just say “bad data.”
Outline / Explain a single dimension – lower-mark warm-ups; reward the named concept + one developed point.
The top-band checklist: detailed accurate knowledge, correct terminology throughout, balanced analysis of more than one side, conclusion linked to the analysis. Teach these four as a self-check students run before they stop writing.

Answer Keys

These are non-programming questions, so the keys give indicative content and marking notes, not code. For the Discuss questions, accept any well-argued position: marks come from the quality and balance of the argument and the linkage of the conclusion, not from which side the student lands on.

Quick Check (on-page MCQs)

Q	Answer	Key teaching point
Q1	c	Historical bias – the data accurately records a biased past
Q2	b	Accountability gap – responsibility falls between parties
Q3	d	Transparency – an unexplainable decision can’t be challenged
Q4	a	Discuss must be two-sided with a linked conclusion
Q5	c	Environmental impact – retraining from scratch wastes energy
Q6	b	Consent must be informed and specific, even for public data

Match the Dimension (fill-in)

privacy · sampling bias · transparency · security · environmental impact

Marking note: for the skin-cancer item the cleaner answer is “sampling bias” (the sample under-represents dark skin); accept “selection bias” only if the student justifies it.

Core 1: Define and exemplify three dimensions (6 marks)

1 mark for each correct definition, 1 mark for each apt original example (3 × 2).

Accountability – who is responsible when an automated decision causes harm, and whether it can be appealed. Example: a credit system wrongly flags someone; a named officer, not “the algorithm,” must own and review the decision.
Consent – informed, specific permission for one’s data to be used in a stated way. Example: a fitness app sharing heart-rate data with insurers needs explicit consent for that specific use.
Transparency – whether a specific decision can be explained in human terms. Example: a rejected applicant is told which factors counted against them.

Common mistakes: examples that just restate the definition; reusing the page’s exact examples (push for the student’s own). Marking note: generic definition with no example caps at 1 of 2 for that dimension.

Core 2: Outline two ways bias enters via training data (4 marks)

1 mark per way + 1 mark per correctly named type (2 × 2). Any two of:

Historical bias – data records past unfair decisions.
Sampling bias – some group is under-represented in the data.
Selection bias – the way cases entered the dataset skews who appears.
Labelling bias – the “correct answers” carry human prejudice.
Measurement bias – a measured feature stands in for the real target as a proxy, and the proxy is less accurate for some groups than for others.

Marking note: “the data was bad” without a named type scores the way-mark only, not the type-mark.

Core 3: Why average accuracy can still be unfair (4 marks)

Indicative points (any two developed): a model optimised for overall accuracy can be accurate for the majority group while performing poorly on a minority group, because errors on a small group barely move the average. The aggregate figure hides per-group performance. The fix is to measure accuracy separately for each group, not just overall.

Marking note: reward the insight that “average hides the minority”; full marks need the per-group evaluation point.
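
Optional classroom demo (not part of the expected answer): a minimal Python sketch of the “average hides the minority” point. The evaluation records are invented for illustration, and the group names and helper functions are hypothetical. It computes overall accuracy and then accuracy separately for each group.

```python
from collections import defaultdict

# Hypothetical evaluation records: (group, prediction_was_correct).
# 900 majority cases (855 correct) and 100 minority cases (60 correct).
records = (
    [("majority", True)] * 855 + [("majority", False)] * 45
    + [("minority", True)] * 60 + [("minority", False)] * 40
)


def accuracy(rows):
    """Share of rows where the prediction was correct."""
    return sum(correct for _, correct in rows) / len(rows)


def accuracy_by_group(rows):
    """Accuracy computed separately for each group label."""
    grouped = defaultdict(list)
    for group, correct in rows:
        grouped[group].append((group, correct))
    return {group: accuracy(items) for group, items in grouped.items()}


overall = accuracy(records)
per_group = accuracy_by_group(records)

print(f"Overall accuracy: {overall:.1%}")
for group, value in per_group.items():
    print(f"{group}: {value:.1%}")
```

With these made-up counts the overall figure is 91.5%, while the per-group figures are 95.0% for the majority and 60.0% for the minority. The 40 minority errors barely move the average because the group is only a tenth of the data.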

Extension 4: Recommendation feedback loop and misinformation (6 marks)

Indicative chain: optimising for watch time → sensational/false content tends to be more engaging → the model surfaces it more → users watch more of it → that behaviour becomes new training signal → the model learns to push it harder (a feedback loop) → misinformation is amplified without anyone intending it. Strong answers name the loop explicitly and note the system amplifies rather than merely reflects.

Marking note: 6 marks need the loop (behaviour feeding back into training), not just “the algorithm shows bad content.”
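
Optional classroom demo (not part of the expected answer): a toy Python simulation of the chain above. It is a sketch under strong assumptions: there are only two content types, the engagement rates are invented, and “retraining” is reduced to setting the next round’s exposure in proportion to observed watch time. The function name and parameters are hypothetical.

```python
def simulate_feedback_loop(rounds, engagement_sensational=0.6, engagement_measured=0.4):
    """Return the share of recommendations given to sensational content per round.

    Assumption: each round the system reallocates exposure in proportion to the
    watch time it observed in the previous round (the behaviour feeds back in).
    """
    share = 0.5  # start by recommending both content types equally
    history = [share]
    for _ in range(rounds):
        # Watch time observed = how often it was shown * how engaging it is.
        watched_sensational = share * engagement_sensational
        watched_measured = (1 - share) * engagement_measured
        # "Retrain": next exposure follows the observed watch time.
        share = watched_sensational / (watched_sensational + watched_measured)
        history.append(share)
    return history


history = simulate_feedback_loop(rounds=10)
for round_number, share in enumerate(history):
    print(f"Round {round_number}: {share:.1%} of recommendations are sensational")
```

Under these assumptions the sensational share rises every round, from 50% to 60% after one round and above 90% by round ten, even though no rule anywhere says “promote false content”. That is the distinction between amplifying and merely reflecting.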

Extension 5: Face-recognition attendance – Discuss (8 marks)

Mark on balance + linked conclusion, not position. Indicative content:

For: automatic, fast, removes manual roll-call, hard to spoof, consistent record.
Against / concerns: privacy (biometric data on minors), consent (can children meaningfully consent? do parents?), security (where is the biometric data stored; breach is permanent – you can’t reissue a face), bias (face recognition is often less accurate for some skin tones / ages → some students mis-flagged more often), accountability (who answers for a wrong match that marks a present student absent?), societal impact (normalising surveillance of children).
Conclusion: a defended position with conditions – e.g. acceptable only with strong consent, secure on-device storage, per-group accuracy testing, and an easy manual override; otherwise the privacy/consent cost outweighs the convenience.

Marking bands (apply the AO3 logic): one side only = lower band; both sides but weak/absent conclusion = middle; both sides developed + conclusion linked to the analysis + correct terminology = top. A top-band answer must also cover at least three dimensions and be written in continuous prose.

Challenge 6: Predictive policing – Discuss (10 marks)

The strongest single scenario for this topic. Indicative content:

For: efficient allocation of limited patrols; potential deterrence; data-driven rather than hunch-driven.
Against: historical/measurement bias – arrest records reflect where police already patrolled, not where crime occurs, so the model sends more patrols to already-over-policed areas → more arrests there → the data “confirms” the bias (a feedback loop); accountability – who answers for a community wrongly targeted; transparency – residents can’t see or challenge why their area is flagged; societal impact – erodes trust, entrenches inequality.
Conclusion: a calibrated judgment. Defensible positions include “not justifiable while trained on arrest data” and “justifiable only if trained on victim-report/calls-for-service data, audited for per-area fairness, transparent, and with human accountability.” Reward the conditions.

Marking note: top band requires the feedback-loop insight and a conclusion that states conditions rather than a flat verdict.
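
Optional classroom demo (not part of the expected answer): a toy Python sketch of how arrest data can “confirm” itself. It assumes two areas with the same true crime rate, so each patrol produces the same number of arrests wherever it goes, and a starting record that is skewed only because area A was patrolled more in the past. All numbers, the area names and the function are hypothetical.

```python
def simulate_patrols(periods, recorded, patrols_per_period=10, arrests_per_patrol=2):
    """Allocate patrols in proportion to recorded arrests, then update the records.

    Assumption: true crime is identical in every area, so arrests depend only
    on how many patrols an area receives.
    """
    recorded = dict(recorded)  # copy so the caller's data is not modified
    for _ in range(periods):
        total = sum(recorded.values())
        # The "model": send patrols where the records show the most arrests.
        allocation = {
            area: patrols_per_period * count / total
            for area, count in recorded.items()
        }
        # More patrols means more arrests, which become next period's data.
        for area, patrols in allocation.items():
            recorded[area] += patrols * arrests_per_patrol
    return recorded


start = {"A": 60, "B": 40}  # skewed by past patrol patterns, not by crime
final = simulate_patrols(periods=5, recorded=start)

share_a = final["A"] / (final["A"] + final["B"])
print(f"Recorded arrests after 5 periods: {final}")
print(f"Area A share of recorded arrests: {share_a:.0%}")
```

In this sketch area A keeps 60% of recorded arrests and the absolute gap between the areas widens each period, although the two areas are identical by construction. The records never reveal that, because they measure where patrols went rather than where crime occurred.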

Challenge 7: Train-from-scratch vs fine-tune – Evaluate (10 marks)

Indicative content: weigh (a) maximum accuracy of a new large model against (b) the far lower energy/hardware cost of fine-tuning a smaller existing one for slightly lower accuracy. Bring in environmental impact (compute energy, cooling, hardware), societal benefit (does the accuracy gain actually help users, or is it marginal?), and cost/sustainability. A strong Evaluate reaches a recommendation and justifies it against criteria – typically fine-tuning unless the accuracy gain delivers a benefit large enough to justify the energy cost.

Marking note: “Evaluate” needs an explicit recommendation with weighed criteria; balanced discussion with no recommendation caps below top band.

Integration notes

In class: the nine dimensions, the bias types, and the live-built worked example. The structure must be modelled, not assigned.
Homework: Core 1-3 after Period 1; one Extension Discuss after Period 2.
Assessment prep: a timed Challenge Discuss with peer-marking against the four-point checklist.

Connects to: Computing in Daily Life (the societal half, A4.4.2); Databases (data protection, privacy); Networks (encryption, security).