Normalisation
IB Syllabus: A3.2.5: Explain the difference between normal forms (1NF, 2NF, 3NF). A3.2.6: Construct a database normalised to 3NF for a range of real-world scenarios. A3.2.7: Evaluate the need for denormalising databases.
Key Concepts
What Normalisation Is
Normalisation is the process of arranging the columns of a relational database into tables so that each fact about the world is stored in exactly one place. The result is a set of small, focused tables linked by keys, rather than one giant table that repeats values everywhere.
You normalise a database in stages, each producing a stricter structure than the last:
- First Normal Form (1NF): atomic values, no repeating groups, every row uniquely identifiable
- Second Normal Form (2NF): 1NF plus no partial-key dependencies
- Third Normal Form (3NF): 2NF plus no transitive dependencies
A database in 3NF is the standard target most production designs aim for. Higher forms (BCNF, 4NF, 5NF) exist but are rarely needed in practice.
Why Normalise?
Picture a single mega-table for a small sushi takeaway, Wave Sushi, that records every order on one row:
ORDER (unnormalised)
| OrderID | Customer | CustPhone | Items | ItemPx | Qty |
|---|---|---|---|---|---|
| 7001 | Aroha K. | 022-555-0182 | “Spicy Tuna, Rainbow Roll” | “5.00, 9.50” | “2, 1” |
| 7002 | Marlon T. | 022-555-0419 | “California Roll” | “6.50” | “3” |
| 7003 | Aroha K. | 022-555-0182 | “Rainbow Roll, Salmon Nigiri” | “9.50, 4.00” | “1, 2” |
This table is full of problems:
| Problem | Example |
|---|---|
| Data duplication | Aroha’s phone number 022-555-0182 appears in every one of her order rows. |
| Multi-valued cells | Items and Qty each hold a comma-separated list. |
| Update anomaly | When Aroha changes phone numbers, the new number must be typed into every row, miss one and the data disagrees with itself. |
| Insert anomaly | A new menu item with no orders cannot be added at all, there is no row to add it to. |
| Delete anomaly | Deleting all of Marlon’s orders deletes his customer details with them. |
Normalisation is the systematic process that eliminates these problems by splitting the mega-table into smaller, focused tables. Mark schemes for “Outline a reason to normalise” reward five families of benefit: reduced data duplication, eliminated update/insert/delete anomalies, improved data integrity, improved query performance, and easier scaling.
Functional, Partial-Key, and Transitive Dependencies
Normalisation is defined in terms of how one column’s value depends on another’s. Three named dependencies matter for 1NF -> 2NF -> 3NF:
Functional dependency (FD). Column B functionally depends on column A (written A -> B) if every value of A always corresponds to the same value of B. Example: CustomerID -> CustPhone, a given customer has one phone number, so given the customer ID you know the phone number.
Partial-key dependency. A column depends on only part of a composite primary key, rather than the whole key. This is the 2NF violation. Example: in a table with composite key (OrderID, ItemID), if ItemName depends only on ItemID (not on OrderID), that is a partial-key dependency. ItemName belongs in a separate ITEM table where ItemID is the full key.
Transitive dependency. A non-key column depends on another non-key column, rather than directly on the primary key. This is the 3NF violation. Example: in a table with primary key OrderID and columns CustomerID, CustPhone, the CustPhone depends on CustomerID rather than on OrderID. The chain is OrderID -> CustomerID -> CustPhone. CustPhone belongs in a separate CUSTOMER table where CustomerID is the primary key.
(The new 2027 syllabus also mentions multi-valued dependencies, which are addressed at 4NF, beyond the scope of A3.2.5, but the symptom of multi-valued cells is what 1NF fixes.)
First Normal Form (1NF)
A table is in 1NF when:
- Every cell holds a single, atomic value. No comma-separated lists, no nested records, no JSON blobs.
- There are no repeating groups of columns. If you find yourself naming columns
Item1,Item2,Item3, you have a repeating group that should become its own table. - Every row is uniquely identifiable. Add a primary key if one does not exist naturally.
- Column names are distinct. No two columns share a name.
Applied to the Wave Sushi mega-table, 1NF means:
- Splitting the
Items,ItemPx, andQtylists so each order line is its own row. - Adding a primary key. The natural key is the combination of
(OrderID, ItemID), a composite key, because the same order can contain several different items.
ORDER_LINE (1NF) composite primary key: (OrderID, ItemID)
| OrderID | ItemID | Customer | CustPhone | ItemNm | ItemPx | Qty |
|---|---|---|---|---|---|---|
| 7001 | M01 | Aroha K. | 022-555-0182 | Spicy Tuna | 5.00 | 2 |
| 7001 | M07 | Aroha K. | 022-555-0182 | Rainbow Roll | 9.50 | 1 |
| 7002 | M03 | Marlon T. | 022-555-0419 | California | 6.50 | 3 |
| 7003 | M07 | Aroha K. | 022-555-0182 | Rainbow Roll | 9.50 | 1 |
| 7003 | M12 | Aroha K. | 022-555-0182 | Salmon Nigiri | 4.00 | 2 |
This is in 1NF, atomic, no repeating groups, every row uniquely identifiable by (OrderID, ItemID). But there are still big problems: the same customer name and phone are repeated across rows, and so are the item name and price.
Second Normal Form (2NF)
A table is in 2NF when:
- It is already in 1NF.
- Every non-key column depends on the whole primary key, not just part of it.
2NF only applies to tables with composite primary keys, if the primary key is a single column, there is nothing to be only “partially” dependent on, and a 1NF table with a single-column PK is automatically in 2NF.
Looking at our 1NF ORDER_LINE table with composite key (OrderID, ItemID):
Customerdepends only onOrderID(the order determines which customer placed it; the item does not matter).CustPhonedepends only onOrderID(same reasoning, transitively).ItemNmdepends only onItemID(each item has one fixed name regardless of which order it appears in).ItemPxdepends only onItemID(the menu price for an item is the same in every order).Qtydepends on the whole key(OrderID, ItemID), “how many California Rolls in order 7002” needs both pieces.
To reach 2NF, split off the partial-key dependencies into their own tables:
ORDER (2NF) PK: OrderID
| OrderID | Customer | CustPhone |
|---|---|---|
| 7001 | Aroha K. | 022-555-0182 |
| 7002 | Marlon T. | 022-555-0419 |
| 7003 | Aroha K. | 022-555-0182 |
ITEM (2NF) PK: ItemID
| ItemID | ItemNm | ItemPx |
|---|---|---|
| M01 | Spicy Tuna | 5.00 |
| M03 | California | 6.50 |
| M07 | Rainbow Roll | 9.50 |
| M12 | Salmon Nigiri | 4.00 |
ORDER_LINE (2NF) composite PK: (OrderID, ItemID); FKs: OrderID -> ORDER, ItemID -> ITEM
| OrderID | ItemID | Qty |
|---|---|---|
| 7001 | M01 | 2 |
| 7001 | M07 | 1 |
| 7002 | M03 | 3 |
| 7003 | M07 | 1 |
| 7003 | M12 | 2 |
Now every non-key column depends on the whole key of its table. No more partial-key dependencies.
Third Normal Form (3NF)
A table is in 3NF when:
- It is already in 2NF.
- No non-key column depends on another non-key column (no transitive dependencies).
Look at the 2NF ORDER table: OrderID -> Customer -> CustPhone. The phone number depends on the customer, not on the order. Two different orders by the same person would store the same phone twice, and updating Aroha’s phone would still require touching multiple rows.
Split out a CUSTOMER table:
CUSTOMER (3NF) PK: CustID
| CustID | Customer | CustPhone |
|---|---|---|
| 4101 | Aroha K. | 022-555-0182 |
| 4102 | Marlon T. | 022-555-0419 |
ORDER (3NF) PK: OrderID; FK: CustID -> CUSTOMER
| OrderID | CustID |
|---|---|
| 7001 | 4101 |
| 7002 | 4102 |
| 7003 | 4101 |
Now every column in every table depends on the key, the whole key, and nothing but the key, the classic phrasing for 3NF.
The “Key, Whole Key, Nothing But the Key” Mantra
A widely quoted way to remember the three normal forms:
- 1NF: every column depends on the key (single value per cell, every row uniquely identifiable).
- 2NF: every column depends on the whole key (no partial-key dependencies on composite keys).
- 3NF: every column depends on nothing but the key (no transitive dependencies via other non-key columns).
Update, Insert, and Delete Anomalies
Normalisation exists primarily to eliminate three kinds of anomaly. Each is a concrete failure that the unnormalised mega-table suffers from but a 3NF schema does not:
| Anomaly | Unnormalised version | 3NF version |
|---|---|---|
| Update | Changing Aroha’s phone number requires updating every order row she appears in. Miss one and the data disagrees with itself. | Update CUSTOMER.CustPhone once. Every order automatically reflects the new number through the foreign-key link. |
| Insert | Cannot add a new menu item until someone orders it, there is no row to put it on. | Add a row to ITEM. No order is required. |
| Delete | Deleting the last order of a customer also deletes the customer’s contact details. | Delete the rows in ORDER and ORDER_LINE. The CUSTOMER row stays intact. |
When a question asks “Outline why the table is not in 3NF” or “Explain why this table should be normalised,” cite the specific anomaly that the redundancy enables.
Process for Constructing a 3NF Schema
For a typical 5–8 mark “Construct the 3NF” question, follow this routine:
- Identify all the attributes in the unnormalised table.
- Make every value atomic: if a column contains a list, split each list element onto its own row.
- Add or identify a primary key: often a composite key on what was the original row identifier plus the item being repeated.
- Look for partial-key dependencies (only if the PK is composite). Move columns that depend on only part of the key into their own table; the partial key becomes the new table’s primary key.
- Look for transitive dependencies. Any column that depends on another non-key column moves into a new table keyed by that non-key column.
- Add the foreign keys that re-link the new tables back together.
- Sanity check. Read each table aloud as
TABLE(PK, attr1, attr2, ...). Every attribute should be a fact about the primary key, nothing else.
Evaluating Denormalisation (A3.2.7)
Normalisation is the default. But there are situations where you deliberately re-introduce some redundancy, called denormalisation, to gain something in return. This is the topic of A3.2.7.
When denormalisation is sometimes worth it:
- Read-heavy workloads. A reporting dashboard that runs the same JOIN of six tables thousands of times per minute may be much faster if the joined view is materialised into a single denormalised table.
- Analytical / warehouse systems. Data warehouses (covered in Data Warehouses) are deliberately denormalised because they are queried far more often than they are written.
- Storing historical snapshots. The price of an item on the day it was ordered is not the same as the current menu price. The order row should store the historical price even though that price is also in the
ITEMtable, otherwise re-pricing a menu item would silently rewrite the value of every past order. - Eliminating expensive joins on a hot path. Pre-joining customer address into an order row trades disk space for query speed when latency matters.
What you lose:
- Update anomalies come back. A denormalised customer-address-on-every-order needs to be maintained somewhere, usually via triggers or application code.
- Storage and write cost go up. Every redundant column costs disk space and slows down writes.
- Risk of inconsistency. Without careful maintenance, the two copies of the data drift apart and you no longer know which is correct.
How to evaluate the decision:
| Question | If yes, lean towards… |
|---|---|
| Does the same expensive JOIN dominate the workload? | Denormalise |
| Are reads much more frequent than writes? | Denormalise |
| Is the data historically immutable (a past invoice)? | Denormalise (snapshot the value) |
| Do you need a single, easy-to-explain canonical version of each fact? | Normalise |
| Are writes frequent and updates must be atomic? | Normalise |
| Are storage and disk-write costs already tight? | Normalise |
The command term Evaluate asks you to weigh both sides and reach a justified conclusion. A strong answer names a specific scenario, identifies the trade-off, and recommends one option with reasoning.
Worked Examples
Example 1: Normalising Wave Sushi to 3NF
We have already worked through this above in stages. The final 3NF schema is:
CUSTOMER(CustID, CustomerName, CustPhone)
ORDER(OrderID, CustID*, OrderDate) * = foreign key
ITEM(ItemID, ItemName, ItemPrice)
ORDER_LINE(OrderID*, ItemID*, Qty, LinePrice) composite PK + 2 FKs
Notes:
ORDER_LINE.LinePriceis the snapshotted price at order time, a deliberate denormalisation that preserves history even if the menu price later changes.CUSTOMER.CustIDis a surrogate key (auto-increment integer), not the customer name, names are not unique and change over time.ORDER_LINEhas a composite primary key(OrderID, ItemID)because the same item can appear in many orders and an order contains many items.
Example 2: Spotting Dependencies in a Sports Club Table
The Harbour Run sports club starts with a single spreadsheet of bookings:
BOOKING (unnormalised)
| BookingID | MemberID | MemberName | ClassID | ClassTitle | CoachName |
|---|---|---|---|---|---|
| B9001 | 1001 | Aroha K. | C401 | Sunrise 5k | Lila O. |
| B9002 | 1002 | Marlon T. | C401 | Sunrise 5k | Lila O. |
| B9003 | 1002 | Marlon T. | C402 | Track Sprints | Lila O. |
| B9004 | 1004 | Priya N. | C403 | Hill Repeats | Daniel P. |
Primary key: BookingID is a single column that uniquely identifies a booking. Because the PK is a single column, 2NF is automatically satisfied (no partial-key dependencies possible). The remaining question is whether the table is in 3NF.
Transitive dependencies:
MemberID -> MemberName(MemberNamedepends onMemberID, not directly onBookingID). Violates 3NF.ClassID -> ClassTitleandClassID -> CoachName(both depend onClassID, not onBookingID). Violates 3NF.
To reach 3NF:
MEMBER(MemberID, MemberName)
CLASS(ClassID, ClassTitle, CoachName)
BOOKING(BookingID, MemberID*, ClassID*, BookedOn)
Each table now has every non-key column depending only on the primary key.
Example 3: A Mark-Scheme-Style 3NF Construction
A small art gallery wants to track which artworks have been displayed in which exhibitions, by which artist, and at which price. Their starting (unnormalised) table:
GALLERY_LOG (unnormalised)
| ExhibitID | ArtworkID | Title | ArtistID | ArtistName | ArtistCountry | DisplayPrice |
|---|---|---|---|---|---|---|
| EX01 | A1001 | Tide | AR05 | Sebastián Q. | Peru | 1200.00 |
| EX01 | A1002 | Anchor | AR05 | Sebastián Q. | Peru | 850.00 |
| EX01 | A1003 | Mistral | AR12 | Priya N. | India | 2400.00 |
| EX02 | A1001 | Tide | AR05 | Sebastián Q. | Peru | 1350.00 |
Step 1: Primary key. The same artwork appears in different exhibitions at different prices. The natural composite PK is (ExhibitID, ArtworkID).
Step 2: Partial-key dependencies.
Titledepends only onArtworkID.ArtistIDdepends only onArtworkID(each artwork has one artist).ArtistNamedepends onArtistID, which depends onArtworkID, a chain.ArtistCountrydepends onArtistIDsimilarly.DisplayPricedepends on the whole key (the price is specific to this exhibit + this artwork).
So Title, ArtistID, ArtistName, ArtistCountry are partial-key dependent.
Step 3: Transitive dependencies. Within the soon-to-be ARTWORK table, ArtistName -> ArtistCountry is transitive (artist country depends on the artist, not the artwork). So ArtistName and ArtistCountry should move into an ARTIST table keyed by ArtistID.
Step 4: 3NF schema.
ARTIST(ArtistID, ArtistName, ArtistCountry)
ARTWORK(ArtworkID, Title, ArtistID*)
EXHIBIT(ExhibitID, ExhibitName, OpenDate, CloseDate)
DISPLAY(ExhibitID*, ArtworkID*, DisplayPrice)
Mark-scheme structure (8 marks roughly):
- 1 mark per correct table (4 tables = 4 marks)
- 1 mark for each correctly identified primary key
- 1 mark for each correctly identified foreign key
- 1 mark for atomic, well-named fields throughout
Quick Check
Q1. Which of these is a requirement of first normal form (1NF)?
Q2. Moving from 1NF to 2NF means eliminating which kind of dependency?
Q3. Moving from 2NF to 3NF means eliminating which kind of dependency?
Q4. Aroha's phone number appears in 47 rows of an unnormalised order table. When her number changes and one of those rows is not updated, the resulting problem is an example of which kind of anomaly?
Q5. A table is in 1NF and has a single-column primary key. Which is true?
Q6. Which scenario most strongly favours denormalising a database?
Spot the Dependency
For each scenario, type the kind of dependency violated.
Type one of: functional, partial-key, transitive.
| Scenario | Dependency type |
|---|---|
Table ORDER_LINE(OrderID, ItemID, ItemName, Qty) with composite PK (OrderID, ItemID). ItemName depends only on ItemID. | |
Table EMPLOYEE(EmpID, DeptID, DeptName) with PK EmpID. DeptName depends on DeptID, which depends on EmpID. | |
Table CUSTOMER(CustID, Email), knowing the CustID always gives you the same one Email. | |
Table ENROLMENT(StudentID, ClassID, ClassTitle) with composite PK. ClassTitle depends only on ClassID. | |
Table BOOK(BookID, PublisherID, PublisherCountry) with PK BookID. PublisherCountry depends on PublisherID, which depends on BookID. |
Fill in the Blanks
Complete the description of the three normal forms.
NORMAL FORMS
============
1NF requires every cell to hold a single, value
and that there are no repeating groups of columns.
2NF requires the table to be in 1NF and that there are no
dependencies. This is only relevant for
tables with a composite primary key.
3NF requires the table to be in 2NF and that there are no
dependencies between non-key columns.
The mnemonic phrase is "the key, the key,
and but the key."
Spot the Error
A student wrote revision notes about the three normal forms. One line contains an error. Click the line with the error, then choose the correct fix.
Pick the correct fix for line 4:
Identify the Normal Form
The table ORDER(OrderID, OrderDate, CustomerID, CustomerEmail) has primary key OrderID. The value of CustomerEmail always matches the email of the customer identified by CustomerID.
What is the highest normal form this table is currently in? (Type 1NF, 2NF, or 3NF.)
The table ORDER_LINE(OrderID, ItemID, Qty, ItemName) has composite primary key (OrderID, ItemID). ItemName always matches the name of the menu item identified by ItemID, regardless of which order it appears in.
What is the highest normal form this table is currently in? (Type 1NF, 2NF, or 3NF.)
Practice Exercises
Core
-
[Core]Normalisation[3 marks]State what is required for a table to be in each of 1NF, 2NF, and 3NF (one sentence per form). -
[Core]Normalisation[2 marks]Outline one reason why a relational database is typically normalised to 3NF. -
[Core]Normalisation[2 marks]Distinguish between 1NF and 2NF using one specific structural difference.
Extension
-
[Extension]Normalisation[4 marks]A booking table is given asBOOKING(BookingID, MemberID, MemberName, ClassID, ClassTitle, CoachName)with primary keyBookingID. Outline why this table is not in 3NF and identify the transitive dependencies that violate it. -
[Extension]Normalisation[5 marks]The following table is in 1NF with composite primary key(OrderID, ItemID):ORDER_LINE(OrderID, ItemID, OrderDate, ItemName, ItemPrice, Qty)Construct the 2NF schema by splitting the table appropriately. Show the primary key and any foreign keys.
-
[Extension]Normalisation[6 marks]Take the 2NF schema from question 5 and continue normalising. Construct the 3NF schema, identifying any transitive dependencies you remove. Show every primary key and every foreign key.
Challenge
-
[Challenge]Normalisation[8 marks]A community music school records lessons using this single table:LESSON(LessonID, StudentID, StudentName, StudentPhone, TeacherID, TeacherName, Instrument, LessonDate, RoomNumber, RoomBuilding)Assume each student takes lessons from many teachers, each teacher teaches many students, each lesson uses one room, and the room’s building is determined by the room number alone.
Construct the 3NF schema. Identify all primary keys, foreign keys, and the dependencies you removed at each step.
-
[Challenge]Normalisation[6 marks]An online news site’s homepage displays each headline alongside the author’s name and photograph. The site loads the homepage millions of times per day; authors change their photograph perhaps once a year. Evaluate whether the developers should denormalise by storing the author’s name and photo URL on each article row instead of joining to anAUTHORtable on every page load.
Exam-Style
[Exam-Style]Normalisation[5 marks]The unnormalised tableEVENT(EventName, Type, SubType, Gender, OlympicRecord, WorldRecord)has the sameTypeandSubTypefor eachEventNameregardless of gender. Construct the 2NF schema and state the partial-key dependency you removed.
Note for IB CS learners: A3.2.5 and A3.2.6 are central to most exam papers. The mark scheme for “Construct the 3NF” rewards: 1 mark per correct table with PK and FK, 1 mark for splitting the three description fields atomically, 1 mark for identifying the right foreign keys. A typical 6-8 mark construction question requires three or four tables in 3NF. A3.2.7 (denormalisation) is new in the 2027 syllabus, expect questions on read-heavy vs write-heavy trade-offs, historical-snapshot decisions, and the canonical ‘integrity vs performance’ tension.
Connections
- Previous: Data Types and Keys. Normalisation produces tables; the keys are what link them.
- Related: Entity-Relationship Diagrams. A 3NF schema can usually be drawn as an ERD with one entity per table and foreign keys as relationships.
- Next: SQL Queries.
JOINclauses traverse the foreign-key links that normalisation creates. - Forward (HL): Data Warehouses. The canonical example of a deliberately denormalised database, optimised for analytical queries rather than transactional updates.
- Forward (HL): Transactions and Views. Views can provide a denormalised-looking interface over a normalised schema, giving you the best of both worlds.