Criterion E: Evaluation

Marks: 4  •  Recommended words: 400

Criterion E is where you look back at the finished product and say honestly how well it met the problem specification. The evaluation must contain two things:

  1. An evaluation of the extent to which each success criterion was met.
  2. Improvements to the product, with the reasoning behind them.

The evaluation must be consistent with the problem specification and success criteria in Criterion A. Both halves of the criterion refer back to A – there is no way to write a good evaluation without strong success criteria.

A table alone is not enough for top-band marks. A short table summarising whether each success criterion was met can sit at the top of this section, but the upper band requires prose that actually evaluates: how partially something was met, which inputs caused the partial result, what the impact on the user would be, and how the criteria compare to one another. Without that prose, the work scores in the 1–2 band regardless of how thorough the table looks.


1. Evaluating against the success criteria

Go through the success criteria from Criterion A one at a time. For each, say:

  • Extent – was it met fully, partially, or not at all?
  • Evidence – what in the product, testing results, or video shows this?
  • Why – what about the implementation led to this outcome?

A summary table is a common format, but the 2027 guide expects more than a table alone for the upper band. Pair the table with a short prose section that does the real analytical work.

Example table (meal planner)

| Success criterion | Met? | Evidence |
|---|---|---|
| SC1: CRUD recipes | Fully | Tests 1–3 passed; video 00:45–01:15 demonstrates add, edit, delete |
| SC2: Weekly plan assignment | Fully | Test 4 passed; video 01:15–01:40 |
| SC3: Shopping list aggregation | Fully | Tests 5–6 passed; video 02:10–02:40 |
| SC4: Pantry comparison | Partially | Aggregation works (tests 5–6), but unit conversion (grams vs. kilograms) not implemented; edge cases in test 7 revealed the mismatch |
| SC5: Swap recipe live update | Fully | Test 8 passed; video 02:40–03:00 |
| SC6: Persistence between sessions | Fully | File-save test 9 passed; video 03:00–03:15 shows close-and-reopen |
| SC7: Export shopping list to text | Not met | Not implemented – deprioritised in favour of fixing the SC4 mismatch |

Then the prose does the evaluation:

Five of the seven success criteria were fully met; SC4 was only partially met and SC7 was not implemented. The pantry comparison works correctly when both recipe ingredients and pantry items use the same unit, but fails when a recipe specifies grams and the pantry tracks kilograms. This was uncovered late in testing (test 7) and was deprioritised in favour of correctness for the other cases, but it undermines SC4 in any realistic use. SC7 (text export) was dropped when time ran short; the core meal-planning workflow was preserved at the cost of the export feature.

What counts as evidence

  • Test outcomes from your testing strategy – including failures.
  • Video timestamps showing specific functionality.
  • Screenshots or code references from the documentation / appendix.

Client or user feedback is not required by the 2027 syllabus, but including it (if you gathered any during the project) strengthens the evaluation. Reference it in the evaluation prose rather than just appending it as an appendix.

Do not invent feedback. Quoting feedback you did not actually collect during the 35 hours, or gathering it after the fact specifically to dress up the evaluation, is academic-integrity territory. Only include feedback that informed your evaluation.


2. Improvements to the product

For each improvement:

  • State what the improvement is – concrete and specific.
  • Justify why – what problem in the evaluation does it address? Does it close a success-criterion gap, handle a real edge case, or extend the product in a user-motivated way?
  • Keep it realistic – in scope, in technology, and in complexity. Recommending “add network multiplayer” to a single-user desktop app reads as unrealistic.

The 2027 guide does not specify a minimum number of improvements. Quality over quantity – two well-justified improvements score better than five generic ones.

Example (meal planner)

  1. Add unit-conversion handling in pantry comparison. The partial failure of SC4 is traceable to a single assumption that all quantities are in the same unit. Adding a unit-normalisation layer (kg → g, L → ml) before comparison would close the gap. This is a targeted fix: the pantry and recipe data structures already store quantity-unit pairs, so only the comparison function needs to change (see the sketch after this list).

  2. Implement the shopping-list export (SC7). The export format is straightforward (aggregated list, one ingredient per line with quantity and unit). The underlying data is already computed; only the file-writing and a file-picker dialog need to be added. Estimated effort ~1–2 hours, based on comparable file I/O elsewhere in the project.
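
To show the level of specificity the upper band rewards, here is a minimal sketch of what both fixes might look like. It assumes the planner is written in Python and already stores quantity-unit pairs, as the improvement notes claim; every name in it (BASE_UNITS, normalise, still_needed, export_shopping_list) is hypothetical, not taken from a real project.

```python
# Hypothetical sketch of improvements 1 and 2. Assumes quantities are
# stored as (value, unit) pairs; all names here are illustrative.

# Improvement 1: normalise every quantity to a base unit before the
# pantry comparison, closing the SC4 gap found in test 7.
BASE_UNITS = {
    "g": ("g", 1),
    "kg": ("g", 1000),
    "ml": ("ml", 1),
    "l": ("ml", 1000),
}

def normalise(value: float, unit: str) -> tuple[float, str]:
    """Convert a quantity to its base unit, e.g. (1.5, 'kg') -> (1500, 'g')."""
    base, factor = BASE_UNITS[unit.lower()]
    return value * factor, base

def still_needed(required: tuple[float, str], in_pantry: tuple[float, str]) -> float:
    """Amount left to buy, in the base unit, once pantry stock is counted."""
    need, need_base = normalise(*required)
    have, have_base = normalise(*in_pantry)
    if need_base != have_base:
        raise ValueError(f"Incompatible units: {need_base} vs {have_base}")
    return max(need - have, 0.0)

# Improvement 2 (SC7): the aggregated list is already computed, so the
# export is just a file write, one ingredient per line.
def export_shopping_list(items: dict[str, tuple[float, str]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for name, (value, unit) in sorted(items.items()):
            f.write(f"{name}: {value} {unit}\n")
```

Tying each function back to the evaluation evidence (test 7 for the normalisation, the dropped SC7 for the export) is what turns a described improvement into a justified one.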

Compare this to the low-band version:

The program could be made better by adding more features.

That earns no marks because it is neither specific nor justified.


Mark bands

| Marks | Level descriptor |
|---|---|
| 0 | The response does not reach the standard described below. |
| 1–2 | The response: states the extent to which the success criteria were met; describes improvements to the product. |
| 3–4 | The response: evaluates the extent to which the success criteria were met; justifies improvements to the product. |

The two command-term shifts from 1–2 to 3–4 drive everything:

  • States → evaluates – 1–2 says “SC4 was partially met.” 3–4 says how partially, which specific inputs caused the partial result, what the impact on the user would be, and how this compares to the other criteria.
  • Describes → justifies – 1–2 says “I would add a text export.” 3–4 says why the text export matters for the problem, what evidence from testing or evaluation motivates it, and what the trade-offs of implementing it would be.

Word count and formatting

  • Recommended word count: 400 (excluding tables).
  • The evaluation table can be short; invest the 400 words in the prose that does the analysis and justification.
  • Suggested section heading in the single-PDF documentation: “Criterion E: Evaluation”.

Common pitfalls

  • Relying on a table alone. The upper band needs evaluation prose, not just ticks in a table.
  • Generic improvements. “Add more features” or “improve the UI” do not justify anything. Every improvement must be specific, motivated by something in the evaluation, and realistic.
  • Contradicting Criteria C/D. If you said in C that the testing strategy was robust and in D that all techniques were justified, but then in E you discover the product hardly works, one of those was untrue. Keep the three sections honest.
  • Over-claiming fulfilment. If a success criterion was only partially met, say so. Examiners reading the video against the evaluation will catch over-claims, and it damages the credibility of the whole evaluation.
  • Client feedback without analysis. Quoted feedback does not count as evaluation unless you analyse what it means for the product.

