Data Types and Keys

IB Syllabus: A3.2.3: Outline the data types used in relational databases. A3.2.4: Construct tables for relational databases (primary, foreign, composite, concatenated keys).

Key Concepts

Why Data Types Matter

A data type declares what kind of values a column is allowed to hold, whole numbers only, text up to a certain length, true/false, a calendar date, and so on. The DBMS enforces the declared type: a DATE column refuses to store the value "next Tuesday", an INT column refuses "hello", and a BOOLEAN column refuses 42.

Choosing the right type is a design decision with three knock-on effects:

Validity. Bad data is rejected at the boundary, before it has a chance to corrupt downstream calculations.
Operations. The type determines what you can do with the value. Numeric types support arithmetic and range comparison; date types support chronological sorting and date arithmetic; strings support pattern matching with LIKE.
Storage. A 4-byte integer takes less space than a 20-character string of the same number, and indexes on small fixed-size types are faster.

There is no single standard list of types to memorise (different DBMSs offer slightly different palettes), but you should recognise the categories below and be able to pick an appropriate type for a given attribute.

The Main Data Type Categories

Category	Examples in MySQL/PostgreSQL	Use it for
Whole numbers (integer)	`INT`, `BIGINT`, `SMALLINT`, `TINYINT`	Counts, IDs, ranks, ages in whole years, pay grade
Decimal numbers	`DECIMAL(p, s)`, `NUMERIC`	Money, exact measurements where precision matters
Floating-point numbers	`FLOAT`, `REAL`, `DOUBLE PRECISION`	Scientific values, ratios, approximate measurements
Fixed-length text	`CHAR(n)`	Codes that are always exactly n characters (e.g. country codes `"GB"`, `"NZ"`)
Variable-length text	`VARCHAR(n)`, `TEXT`	Names, addresses, descriptions, IDs containing letters
Boolean	`BOOLEAN`, `BIT`	True/false flags (is_active, is_completed)
Date / time	`DATE`, `TIME`, `DATETIME`, `TIMESTAMP`	Birth date, booking date and time, server log entries
Binary / blob	`BLOB`, `BYTEA`	Images, PDFs, raw byte data (use sparingly, consider object storage instead)

CHAR(n) always uses n characters of storage; VARCHAR(n) uses only what is needed plus a small overhead, with n as the maximum. DECIMAL(p, s) stores a number with p total digits and s of them after the decimal point, DECIMAL(7, 2) accepts values up to 99999.99 with no rounding.

Picking the Right Type: Three Standard Traps

Trap 1: leading zeros lost in a numeric type.

UK phone numbers start with 0. A British mobile number 07911123456 stored in an INT column becomes 7911123456, the leading zero is gone and the number is wrong. The fix is to store phone numbers as strings (VARCHAR), not numbers, because they are identifiers you never do arithmetic on.

The same trap catches student IDs that begin with a letter (S1023), postcodes (SW1A 1AA), and most product codes. Anything that looks like a number but is really an identifier should be a string.

Trap 2: decimal precision lost in an integer type.

A column storing times of 9.63, 19.30, 22.52 cannot be INT, the decimal point disappears and 19.30 becomes 19. Use DECIMAL (exact) or FLOAT/REAL (approximate, faster). For money, prefer DECIMAL so rounding errors do not accumulate.

Trap 3: dates stored as strings in the wrong order.

A date stored as the string "15/06/2026" cannot be sorted chronologically, the DBMS sorts strings character-by-character, so "15/06/2026" sorts after "14/12/2025". Either:

Use a real DATE column (preferred, supports date arithmetic and validation), or
Store dates as strings in ISO 8601 format ("2026-06-15"), which happens to sort chronologically because the year comes first.

Importance of Data-Type Consistency

When two columns in different tables hold the same kind of value (for example, a MemberID in MEMBER and a MemberID in BOOKING), they must have the same declared type. If one is INT and the other is VARCHAR(8), the database cannot enforce the foreign key cleanly and joins become slow or impossible.

Consistency also matters across rows of the same table: if a Price column is DECIMAL(7, 2), every row stores its price to the same precision. A column with mixed precision (some rows storing 12.5, some 12.500) is a symptom that the type was not chosen carefully.

Justifying a Type Choice in an Exam

When a question says “Outline why X is an appropriate data type for the Y field”, the mark scheme rewards three angles:

Domain fit: the type matches the nature of the value (PayGrade is a whole number, so INT).
Operational benefit: the type enables an operation you need (INT supports sorting, arithmetic, range queries).
Storage / efficiency: the type uses less space than the alternatives (a 4-byte INT is smaller than a 5-character string).

For “outline why the wrong type causes a problem”, flip the same three angles: domain mismatch (leading zero lost), operation broken (cannot sort dates), storage waste (storing a 4-byte number as a 20-character string).

Keys: The Identifiers That Link Tables

A key in a relational database is a column (or small group of columns) used to identify rows uniquely or to link tables together. The four kinds you need to know for A3.2.4 are:

Key type	Purpose
Primary key (PK)	Uniquely identifies each row within its own table.
Foreign key (FK)	Holds the primary-key value of a row in another table, creating a link between the two.
Composite key	A primary key made of two or more columns together. Used when no single column is unique on its own.
Concatenated key	A primary key whose value is built by joining several pieces of information into one string. Closely related to composite keys; see below.

Primary Keys

A primary key is the column whose value tells the DBMS “this is row X and no other.” It has two strict rules:

Unique: no two rows share the same primary-key value.
Not null: every row must have a value; missing or unknown is not allowed.

Examples: MemberID in a MEMBER table, ISBN in a BOOK table, OrderID in an ORDER table.

Primary keys come in two flavours:

Type	What it is	Trade-off
Natural key	A real-world identifier with meaning to humans (`ISBN`, `Email`, `Passport_Number`)	Familiar and human-readable, but may change over time (people change email addresses), and may not be globally unique (ISBN switched from 10 to 13 digits in 2007)
Surrogate key	An artificial identifier generated by the database (`MemberID` as an auto-increment integer)	Always unique, never changes, fast for joins; but meaningless to humans and exposes nothing about the row

Most modern designs use surrogate keys for primary keys (small, stable, easy to join on) and add unique constraints on natural identifiers like email separately.

Foreign Keys

A foreign key is a column whose values must match a primary-key value in another table. It is the mechanism by which one table refers to another.

In the Harbour Run schema:

BOOKING.MemberID is a foreign key referencing MEMBER.MemberID.
BOOKING.ClassID is a foreign key referencing CLASS.ClassID.

Foreign keys enforce referential integrity: the DBMS rejects any attempt to insert a booking whose MemberID does not exist in MEMBER. This is one of the headline benefits of the relational model, you cannot accidentally create orphan rows that point nowhere.

Referential integrity also covers what happens when a referenced row is updated or deleted. Common options are:

CASCADE: propagate the change to dependent rows. Deleting a member also deletes their bookings.
SET NULL: set the foreign-key column to NULL in dependent rows. The bookings survive but no longer belong to anyone.
RESTRICT / NO ACTION: block the delete or update if there are dependent rows. The DBMS refuses the operation until the dependents are dealt with first.

The DBA picks one of these for each foreign-key constraint based on the business meaning of the relationship. Bookings probably should cascade-delete with the member; orders probably should not be deleted with the customer because a deleted customer’s order history still matters for accounting.

Composite Keys

A composite key is a primary key made of two or more columns together, when no single column is unique on its own. The combination must be unique; individual columns may repeat.

Example: a junction table BOOKING_ROOM records which room each booking uses, and the same member may use different rooms across different bookings. The natural primary key is (BookingID, RoomID) together.

A common scenario where composite keys are clearly needed is an attendance log:

ATTENDANCE composite primary key: (StudentID, ClassID, Date)

StudentID	ClassID	Date	Status	Note
1001	C401	2026-05-28	present
1001	C401	2026-05-29	absent	same student, same class, different date
1002	C401	2026-05-28	present

Without Date in the key, a student could only have one attendance record per class, which is wrong. With the date included, the three columns together uniquely identify every row.

Mark-scheme reasoning for “explain why X is included in the composite primary key” follows a standard pattern:

Without column X, two rows could share the same key values.
Give a concrete example (same nurse, same patient, different days; or same student, same class, different dates).
State that adding X restores uniqueness.

Concatenated Keys

A concatenated key is closely related to a composite key. Two interpretations appear in the literature; both are commonly accepted:

Concatenated = composite. Many sources use the two terms interchangeably, a primary key formed from two or more columns. Mark schemes routinely accept either word.
Concatenated = combined into one column. Some traditions reserve “concatenated key” for the case where you take the values from several columns and join them into a single string (e.g. "S1001-C401-20260528") which becomes the primary key in its own right.

Treat composite and concatenated as the same idea: a key that combines multiple pieces of information to guarantee uniqueness. Both terms are accepted in exam answers, and most resources do not draw a hard distinction between them.

Putting It Together: Table Construction

A worked construction of a small schema, showing all four key types:

erDiagram
    STUDENT ||--o{ ENROLMENT : "enrols in"
    CLASS   ||--o{ ENROLMENT : "is taken by"
    STUDENT {
        int StudentID PK
        string FirstName
        string Surname
        date DOB
    }
    CLASS {
        string ClassID PK
        string Title
        string Coach
    }
    ENROLMENT {
        int StudentID PK "FK to STUDENT"
        string ClassID PK "FK to CLASS"
        date EnrolledOn
    }

STUDENT.StudentID and CLASS.ClassID are simple primary keys (one column each).
ENROLMENT has a composite primary key of (StudentID, ClassID) because the combination of those two columns uniquely identifies each enrolment record.
The same two columns are each foreign keys, StudentID references STUDENT, ClassID references CLASS. The same column can play two roles at once.
EnrolledOn is a regular attribute of the enrolment itself.

This is the canonical pattern: junction tables get composite primary keys made of foreign keys pointing into the entities they link.

Worked Examples

Example 1: Picking Types for a Bookshop Schema

A small bookshop tracks its inventory in a BOOK table with these columns:

Column	ISBN	Title	Author	Price	Stock
Sample value	`?`	`?`	`?`	`?`	`?`

Justify the data type for each column:

Column	Type	Reasoning
`ISBN`	`VARCHAR(13)` (or `CHAR(13)` if always 13 digits)	A 13-digit identifier with no arithmetic. Numeric type would drop leading zeros and miss the dash if `978-0...` formatting is kept.
`Title`	`VARCHAR(200)`	Variable-length text, much shorter than the maximum on average. Cannot use a numeric type.
`Author`	`VARCHAR(120)`	Same reasoning. (In a normalised design, `Author` would be a foreign key to an `AUTHOR` table, but the question is about types, not normalisation.)
`Price`	`DECIMAL(6, 2)`	Money needs exact precision, `FLOAT` can lose pennies due to rounding. `DECIMAL(6, 2)` accepts values up to `9999.99`, more than enough for a book.
`Stock`	`INT`	A whole number with no upper bound assumption, supports arithmetic for stock-takes and ordering reorder reports.

Example 2: Composite Primary Key

A music streaming service tracks every play of every track by every user. The natural primary key is not a surrogate PlayID, it is the combination of who, what, and when:

PLAY composite primary key: (UserID, TrackID, PlayedAt)

UserID	TrackID	PlayedAt	DurationSec
9012	T0451	2026-05-22 18:14:03	207
9012	T0451	2026-05-22 19:02:47	207
9012	T0892	2026-05-22 18:18:00	184
9013	T0451	2026-05-22 18:14:03	207

Question: Why is PlayedAt part of the composite key?

Answer (mark-scheme style):

Without PlayedAt, the same user playing the same track twice would produce two rows with identical (UserID, TrackID), violating uniqueness.
The first two rows in the table above are exactly that case (user 9012 played T0451 at 18:14 and again at 19:02).
Including PlayedAt distinguishes the two plays, so the composite key (UserID, TrackID, PlayedAt) is unique on every row.

The two foreign keys (UserID -> USER, TrackID -> TRACK) are part of the composite primary key but each also has its own role for referential integrity.

Example 3: Choosing the Wrong Type, Consequences

A school sets up a phone-number column for emergency contacts. The developer picks INT because “phone numbers are made of digits.”

What goes wrong:

Leading zero lost. 0207 946 0958 becomes 2079460958, the school’s switchboard cannot reach the number stored.
Formatting lost. Spaces, dashes, and country codes (+44 20...) all disappear.
Overflow. An INT in MySQL holds up to about 2.1 billion; international numbers exceed that and either truncate or trigger an error.
No partial search. Searching for “all numbers starting with 020” needs string operations, which a numeric type does not support.

The fix is VARCHAR(20) or similar, with the international prefix included if relevant. This is a classic exam question that appears in many guises: phone-number formatting, student-ID with letter prefix, school phone numbers with leading zeros.

Quick Check

Q1. Which type of key uniquely identifies each row within its own table?

Q2. Which is the most appropriate data type for a UK phone number stored as 0207 946 0958?

Q3. Which is the most appropriate data type for a price column storing values like 19.99 and 120.50?

Q4. An ENROLMENT table has columns StudentID, ClassID, and EnrolledOn. Each student-class pair appears at most once. Which best describes the primary key?

Q5. Which feature ensures that every foreign-key value matches an existing primary-key value in the referenced table?

Q6. A column stores dates as the strings '15/06/2026' and '14/12/2025'. Why is this a poor design choice?

Pick the Type

For each column, type the most appropriate data-type category, one of INT, DECIMAL, VARCHAR, DATE, or BOOLEAN.

Column	Best type
`MemberID`, auto-increment whole number starting at 1001
`PostCode`, values like `SW1A 1AA`, `EH3 6BX`
`UnitPrice`, values like `9.99`, `120.50`
`DateOfBirth`, a calendar date
`IsActive`, true if the membership is currently active
`PayGrade`, a whole number from 1 to 12 used for sorting and arithmetic

Fill in the Blanks

Complete the description of relational keys.

KEYS IN A RELATIONAL DATABASE
=============================
A  uniquely identifies each row within its table.
It cannot be null and cannot be duplicated.

A  is a column whose values must match a primary key
in another table, creating a link between the two.

A  is a primary key made of two or more columns
together, used when no single column is unique on its own.

The rule that every foreign-key value must match an existing
primary-key value is called .

A  is an artificial identifier generated by the
database (e.g. an auto-increment integer), as opposed to a
real-world identifier like an email address.

Spot the Error

A junior developer wrote a CREATE TABLE statement for a contacts list. One line has the wrong data type. Click the line with the error, then choose the correct fix.

1CREATE TABLE Contact ( 2 ContactID INT PRIMARY KEY, Phone INT NOT NULL, 3 FullName VARCHAR(120) NOT NULL, 4 DateOfBirth DATE, 5 IsPrimary BOOLEAN DEFAULT FALSE

Pick the correct fix for line 2:

Identify the Key Type

"BOOKING.MemberID is a column whose values must always match an existing MemberID value in the MEMBER table. It is the mechanism by which the BOOKING table links to MEMBER."

Which kind of key is BOOKING.MemberID? (Type the answer, e.g. primary key.)

"An ATTENDANCE table uses (StudentID, ClassID, Date) together as its primary key. No single one of those three columns is unique on its own, but the combination of all three is."

What kind of key is this? (Type the answer.)

Practice Exercises

Core

[Core] Data types [4 marks] For each column, state the most appropriate data type and justify your choice in one short sentence.
- (a) StudentID, values like S1023, S1024
- (b) Height_cm, a person’s height to one decimal place
- (c) IsActive, true if the student is currently enrolled
- (d) DateOfBirth, a calendar date
[Core] Data types [2 marks] A staff database has a PayGrade column holding values 1 to 12. Outline why integer is an appropriate data type for PayGrade.
[Core] Keys [3 marks] Look at the following table:

RENTAL(RentalID, CustomerID, CarID, StartDate, ReturnDate)
- (a) State the primary key.
- (b) Identify the two foreign keys you would expect, and the tables they reference.

Extension

[Extension] Data types [2 marks] An athletics database has an OlympicRecord column holding values like 9.63 and 19.30 (seconds). Outline why integer is not an appropriate data type for this column.
[Extension] Keys [3 marks] An attendance system has the table ATTENDANCE(StudentID, ClassID, Date, Status) with primary key (StudentID, ClassID, Date). Explain why all three columns are needed in the composite primary key.
[Extension] Data types [3 marks] A booking system stores BookingDate as the string "dd/mm/yyyy". Outline two problems this causes and suggest a better approach.

Challenge

[Challenge] Keys [6 marks] A school library is moving from a spreadsheet to a relational database. It needs to track MEMBER, BOOK, and LOAN. Construct a schema for the three tables, showing the primary key of each, all foreign keys, and any composite keys. State the data type you would use for every column and briefly justify each non-obvious choice.
[Challenge] Keys [6 marks] Discuss the advantages and disadvantages of using a surrogate primary key (e.g. an auto-increment integer) compared with a natural primary key (e.g. email address) for a USER table. Refer to uniqueness, stability over time, join performance, and human readability.

Note for IB CS learners: A3.2.4 recall items (State the primary key, Identify the foreign key) are 1-mark gifts, scan the schema, find the underlined or labelled column, write the name. Composite-key explanation items (2-3 marks) reward the identify -> example -> uniqueness structure shown above.

Connections

Previous: Entity-Relationship Diagrams. The ERD shows which entities link to which; the keys are how that link is actually stored.
Related: Relational Database Fundamentals. The overall vocabulary of tables, rows, and keys.
Next: Normalisation. The process of arranging columns into tables so each fact lives in one place, with the right keys to glue them back together.
Forward: SQL Queries. JOIN clauses follow the foreign-key links defined here.
Forward (HL): Transactions and Views. Transactions are how the DBMS preserves referential integrity under concurrent updates.