Data Types and Keys
IB Syllabus: A3.2.3: Outline the data types used in relational databases. A3.2.4: Construct tables for relational databases (primary, foreign, composite, concatenated keys).
Key Concepts
Why Data Types Matter
A data type declares what kind of values a column is allowed to hold, whole numbers only, text up to a certain length, true/false, a calendar date, and so on. The DBMS enforces the declared type: a DATE column refuses to store the value "next Tuesday", an INT column refuses "hello", and a BOOLEAN column refuses 42.
Choosing the right type is a design decision with three knock-on effects:
- Validity. Bad data is rejected at the boundary, before it has a chance to corrupt downstream calculations.
- Operations. The type determines what you can do with the value. Numeric types support arithmetic and range comparison; date types support chronological sorting and date arithmetic; strings support pattern matching with
LIKE. - Storage. A 4-byte integer takes less space than a 20-character string of the same number, and indexes on small fixed-size types are faster.
There is no single standard list of types to memorise (different DBMSs offer slightly different palettes), but you should recognise the categories below and be able to pick an appropriate type for a given attribute.
The Main Data Type Categories
| Category | Examples in MySQL/PostgreSQL | Use it for |
|---|---|---|
| Whole numbers (integer) | INT, BIGINT, SMALLINT, TINYINT | Counts, IDs, ranks, ages in whole years, pay grade |
| Decimal numbers | DECIMAL(p, s), NUMERIC | Money, exact measurements where precision matters |
| Floating-point numbers | FLOAT, REAL, DOUBLE PRECISION | Scientific values, ratios, approximate measurements |
| Fixed-length text | CHAR(n) | Codes that are always exactly n characters (e.g. country codes "GB", "NZ") |
| Variable-length text | VARCHAR(n), TEXT | Names, addresses, descriptions, IDs containing letters |
| Boolean | BOOLEAN, BIT | True/false flags (is_active, is_completed) |
| Date / time | DATE, TIME, DATETIME, TIMESTAMP | Birth date, booking date and time, server log entries |
| Binary / blob | BLOB, BYTEA | Images, PDFs, raw byte data (use sparingly, consider object storage instead) |
CHAR(n) always uses n characters of storage; VARCHAR(n) uses only what is needed plus a small overhead, with n as the maximum. DECIMAL(p, s) stores a number with p total digits and s of them after the decimal point, DECIMAL(7, 2) accepts values up to 99999.99 with no rounding.
Picking the Right Type: Three Standard Traps
Trap 1: leading zeros lost in a numeric type.
UK phone numbers start with 0. A British mobile number 07911123456 stored in an INT column becomes 7911123456, the leading zero is gone and the number is wrong. The fix is to store phone numbers as strings (VARCHAR), not numbers, because they are identifiers you never do arithmetic on.
The same trap catches student IDs that begin with a letter (S1023), postcodes (SW1A 1AA), and most product codes. Anything that looks like a number but is really an identifier should be a string.
Trap 2: decimal precision lost in an integer type.
A column storing times of 9.63, 19.30, 22.52 cannot be INT, the decimal point disappears and 19.30 becomes 19. Use DECIMAL (exact) or FLOAT/REAL (approximate, faster). For money, prefer DECIMAL so rounding errors do not accumulate.
Trap 3: dates stored as strings in the wrong order.
A date stored as the string "15/06/2026" cannot be sorted chronologically, the DBMS sorts strings character-by-character, so "15/06/2026" sorts after "14/12/2025". Either:
- Use a real
DATEcolumn (preferred, supports date arithmetic and validation), or - Store dates as strings in ISO 8601 format (
"2026-06-15"), which happens to sort chronologically because the year comes first.
Importance of Data-Type Consistency
When two columns in different tables hold the same kind of value (for example, a MemberID in MEMBER and a MemberID in BOOKING), they must have the same declared type. If one is INT and the other is VARCHAR(8), the database cannot enforce the foreign key cleanly and joins become slow or impossible.
Consistency also matters across rows of the same table: if a Price column is DECIMAL(7, 2), every row stores its price to the same precision. A column with mixed precision (some rows storing 12.5, some 12.500) is a symptom that the type was not chosen carefully.
Justifying a Type Choice in an Exam
When a question says “Outline why X is an appropriate data type for the Y field”, the mark scheme rewards three angles:
- Domain fit: the type matches the nature of the value (
PayGradeis a whole number, soINT). - Operational benefit: the type enables an operation you need (
INTsupports sorting, arithmetic, range queries). - Storage / efficiency: the type uses less space than the alternatives (a 4-byte
INTis smaller than a 5-character string).
For “outline why the wrong type causes a problem”, flip the same three angles: domain mismatch (leading zero lost), operation broken (cannot sort dates), storage waste (storing a 4-byte number as a 20-character string).
Keys: The Identifiers That Link Tables
A key in a relational database is a column (or small group of columns) used to identify rows uniquely or to link tables together. The four kinds you need to know for A3.2.4 are:
| Key type | Purpose |
|---|---|
| Primary key (PK) | Uniquely identifies each row within its own table. |
| Foreign key (FK) | Holds the primary-key value of a row in another table, creating a link between the two. |
| Composite key | A primary key made of two or more columns together. Used when no single column is unique on its own. |
| Concatenated key | A primary key whose value is built by joining several pieces of information into one string. Closely related to composite keys; see below. |
Primary Keys
A primary key is the column whose value tells the DBMS “this is row X and no other.” It has two strict rules:
- Unique: no two rows share the same primary-key value.
- Not null: every row must have a value; missing or unknown is not allowed.
Examples: MemberID in a MEMBER table, ISBN in a BOOK table, OrderID in an ORDER table.
Primary keys come in two flavours:
| Type | What it is | Trade-off |
|---|---|---|
| Natural key | A real-world identifier with meaning to humans (ISBN, Email, Passport_Number) | Familiar and human-readable, but may change over time (people change email addresses), and may not be globally unique (ISBN switched from 10 to 13 digits in 2007) |
| Surrogate key | An artificial identifier generated by the database (MemberID as an auto-increment integer) | Always unique, never changes, fast for joins; but meaningless to humans and exposes nothing about the row |
Most modern designs use surrogate keys for primary keys (small, stable, easy to join on) and add unique constraints on natural identifiers like email separately.
Foreign Keys
A foreign key is a column whose values must match a primary-key value in another table. It is the mechanism by which one table refers to another.
In the Harbour Run schema:
BOOKING.MemberIDis a foreign key referencingMEMBER.MemberID.BOOKING.ClassIDis a foreign key referencingCLASS.ClassID.
Foreign keys enforce referential integrity: the DBMS rejects any attempt to insert a booking whose MemberID does not exist in MEMBER. This is one of the headline benefits of the relational model, you cannot accidentally create orphan rows that point nowhere.
Referential integrity also covers what happens when a referenced row is updated or deleted. Common options are:
- CASCADE: propagate the change to dependent rows. Deleting a member also deletes their bookings.
- SET NULL: set the foreign-key column to
NULLin dependent rows. The bookings survive but no longer belong to anyone. - RESTRICT / NO ACTION: block the delete or update if there are dependent rows. The DBMS refuses the operation until the dependents are dealt with first.
The DBA picks one of these for each foreign-key constraint based on the business meaning of the relationship. Bookings probably should cascade-delete with the member; orders probably should not be deleted with the customer because a deleted customer’s order history still matters for accounting.
Composite Keys
A composite key is a primary key made of two or more columns together, when no single column is unique on its own. The combination must be unique; individual columns may repeat.
Example: a junction table BOOKING_ROOM records which room each booking uses, and the same member may use different rooms across different bookings. The natural primary key is (BookingID, RoomID) together.
A common scenario where composite keys are clearly needed is an attendance log:
ATTENDANCE composite primary key: (StudentID, ClassID, Date)
| StudentID | ClassID | Date | Status | Note |
|---|---|---|---|---|
| 1001 | C401 | 2026-05-28 | present | |
| 1001 | C401 | 2026-05-29 | absent | same student, same class, different date |
| 1002 | C401 | 2026-05-28 | present |
Without Date in the key, a student could only have one attendance record per class, which is wrong. With the date included, the three columns together uniquely identify every row.
Mark-scheme reasoning for “explain why X is included in the composite primary key” follows a standard pattern:
- Without column X, two rows could share the same key values.
- Give a concrete example (same nurse, same patient, different days; or same student, same class, different dates).
- State that adding X restores uniqueness.
Concatenated Keys
A concatenated key is closely related to a composite key. Two interpretations appear in the literature; both are commonly accepted:
- Concatenated = composite. Many sources use the two terms interchangeably, a primary key formed from two or more columns. Mark schemes routinely accept either word.
- Concatenated = combined into one column. Some traditions reserve “concatenated key” for the case where you take the values from several columns and join them into a single string (e.g.
"S1001-C401-20260528") which becomes the primary key in its own right.
Treat composite and concatenated as the same idea: a key that combines multiple pieces of information to guarantee uniqueness. Both terms are accepted in exam answers, and most resources do not draw a hard distinction between them.
Putting It Together: Table Construction
A worked construction of a small schema, showing all four key types:
erDiagram
STUDENT ||--o{ ENROLMENT : "enrols in"
CLASS ||--o{ ENROLMENT : "is taken by"
STUDENT {
int StudentID PK
string FirstName
string Surname
date DOB
}
CLASS {
string ClassID PK
string Title
string Coach
}
ENROLMENT {
int StudentID PK "FK to STUDENT"
string ClassID PK "FK to CLASS"
date EnrolledOn
}
STUDENT.StudentIDandCLASS.ClassIDare simple primary keys (one column each).ENROLMENThas a composite primary key of(StudentID, ClassID)because the combination of those two columns uniquely identifies each enrolment record.- The same two columns are each foreign keys,
StudentIDreferencesSTUDENT,ClassIDreferencesCLASS. The same column can play two roles at once. EnrolledOnis a regular attribute of the enrolment itself.
This is the canonical pattern: junction tables get composite primary keys made of foreign keys pointing into the entities they link.
Worked Examples
Example 1: Picking Types for a Bookshop Schema
A small bookshop tracks its inventory in a BOOK table with these columns:
| Column | ISBN | Title | Author | Price | Stock |
|---|---|---|---|---|---|
| Sample value | ? | ? | ? | ? | ? |
Justify the data type for each column:
| Column | Type | Reasoning |
|---|---|---|
ISBN | VARCHAR(13) (or CHAR(13) if always 13 digits) | A 13-digit identifier with no arithmetic. Numeric type would drop leading zeros and miss the dash if 978-0... formatting is kept. |
Title | VARCHAR(200) | Variable-length text, much shorter than the maximum on average. Cannot use a numeric type. |
Author | VARCHAR(120) | Same reasoning. (In a normalised design, Author would be a foreign key to an AUTHOR table, but the question is about types, not normalisation.) |
Price | DECIMAL(6, 2) | Money needs exact precision, FLOAT can lose pennies due to rounding. DECIMAL(6, 2) accepts values up to 9999.99, more than enough for a book. |
Stock | INT | A whole number with no upper bound assumption, supports arithmetic for stock-takes and ordering reorder reports. |
Example 2: Composite Primary Key
A music streaming service tracks every play of every track by every user. The natural primary key is not a surrogate PlayID, it is the combination of who, what, and when:
PLAY composite primary key: (UserID, TrackID, PlayedAt)
| UserID | TrackID | PlayedAt | DurationSec |
|---|---|---|---|
| 9012 | T0451 | 2026-05-22 18:14:03 | 207 |
| 9012 | T0451 | 2026-05-22 19:02:47 | 207 |
| 9012 | T0892 | 2026-05-22 18:18:00 | 184 |
| 9013 | T0451 | 2026-05-22 18:14:03 | 207 |
Question: Why is PlayedAt part of the composite key?
Answer (mark-scheme style):
- Without
PlayedAt, the same user playing the same track twice would produce two rows with identical(UserID, TrackID), violating uniqueness. - The first two rows in the table above are exactly that case (user
9012playedT0451at 18:14 and again at 19:02). - Including
PlayedAtdistinguishes the two plays, so the composite key(UserID, TrackID, PlayedAt)is unique on every row.
The two foreign keys (UserID -> USER, TrackID -> TRACK) are part of the composite primary key but each also has its own role for referential integrity.
Example 3: Choosing the Wrong Type, Consequences
A school sets up a phone-number column for emergency contacts. The developer picks INT because “phone numbers are made of digits.”
What goes wrong:
- Leading zero lost.
0207 946 0958becomes2079460958, the school’s switchboard cannot reach the number stored. - Formatting lost. Spaces, dashes, and country codes (
+44 20...) all disappear. - Overflow. An
INTin MySQL holds up to about 2.1 billion; international numbers exceed that and either truncate or trigger an error. - No partial search. Searching for “all numbers starting with 020” needs string operations, which a numeric type does not support.
The fix is VARCHAR(20) or similar, with the international prefix included if relevant. This is a classic exam question that appears in many guises: phone-number formatting, student-ID with letter prefix, school phone numbers with leading zeros.
Quick Check
Q1. Which type of key uniquely identifies each row within its own table?
Q2. Which is the most appropriate data type for a UK phone number stored as 0207 946 0958?
Q3. Which is the most appropriate data type for a price column storing values like 19.99 and 120.50?
Q4. An ENROLMENT table has columns StudentID, ClassID, and EnrolledOn. Each student-class pair appears at most once. Which best describes the primary key?
Q5. Which feature ensures that every foreign-key value matches an existing primary-key value in the referenced table?
Q6. A column stores dates as the strings '15/06/2026' and '14/12/2025'. Why is this a poor design choice?
Pick the Type
For each column, type the most appropriate data-type category, one of INT, DECIMAL, VARCHAR, DATE, or BOOLEAN.
| Column | Best type |
|---|---|
MemberID, auto-increment whole number starting at 1001 | |
PostCode, values like SW1A 1AA, EH3 6BX | |
UnitPrice, values like 9.99, 120.50 | |
DateOfBirth, a calendar date | |
IsActive, true if the membership is currently active | |
PayGrade, a whole number from 1 to 12 used for sorting and arithmetic |
Fill in the Blanks
Complete the description of relational keys.
KEYS IN A RELATIONAL DATABASE
=============================
A uniquely identifies each row within its table.
It cannot be null and cannot be duplicated.
A is a column whose values must match a primary key
in another table, creating a link between the two.
A is a primary key made of two or more columns
together, used when no single column is unique on its own.
The rule that every foreign-key value must match an existing
primary-key value is called .
A is an artificial identifier generated by the
database (e.g. an auto-increment integer), as opposed to a
real-world identifier like an email address.
Spot the Error
A junior developer wrote a CREATE TABLE statement for a contacts list. One line has the wrong data type. Click the line with the error, then choose the correct fix.
Pick the correct fix for line 2:
Identify the Key Type
"BOOKING.MemberID is a column whose values must always match an existing MemberID value in the MEMBER table. It is the mechanism by which the BOOKING table links to MEMBER."
Which kind of key is BOOKING.MemberID? (Type the answer, e.g. primary key.)
"An ATTENDANCE table uses (StudentID, ClassID, Date) together as its primary key. No single one of those three columns is unique on its own, but the combination of all three is."
What kind of key is this? (Type the answer.)
Practice Exercises
Core
[Core]Data types[4 marks]For each column, state the most appropriate data type and justify your choice in one short sentence.- (a)
StudentID, values likeS1023,S1024 - (b)
Height_cm, a person’s height to one decimal place - (c)
IsActive, true if the student is currently enrolled - (d)
DateOfBirth, a calendar date
- (a)
-
[Core]Data types[2 marks]A staff database has aPayGradecolumn holding values 1 to 12. Outline why integer is an appropriate data type forPayGrade. -
[Core]Keys[3 marks]Look at the following table:RENTAL(RentalID, CustomerID, CarID, StartDate, ReturnDate)- (a) State the primary key.
- (b) Identify the two foreign keys you would expect, and the tables they reference.
Extension
-
[Extension]Data types[2 marks]An athletics database has anOlympicRecordcolumn holding values like9.63and19.30(seconds). Outline why integer is not an appropriate data type for this column. -
[Extension]Keys[3 marks]An attendance system has the tableATTENDANCE(StudentID, ClassID, Date, Status)with primary key(StudentID, ClassID, Date). Explain why all three columns are needed in the composite primary key. -
[Extension]Data types[3 marks]A booking system storesBookingDateas the string"dd/mm/yyyy". Outline two problems this causes and suggest a better approach.
Challenge
-
[Challenge]Keys[6 marks]A school library is moving from a spreadsheet to a relational database. It needs to trackMEMBER,BOOK, andLOAN. Construct a schema for the three tables, showing the primary key of each, all foreign keys, and any composite keys. State the data type you would use for every column and briefly justify each non-obvious choice. -
[Challenge]Keys[6 marks]Discuss the advantages and disadvantages of using a surrogate primary key (e.g. an auto-increment integer) compared with a natural primary key (e.g. email address) for aUSERtable. Refer to uniqueness, stability over time, join performance, and human readability.
Note for IB CS learners: A3.2.4 recall items (
State the primary key,Identify the foreign key) are 1-mark gifts, scan the schema, find the underlined or labelled column, write the name. Composite-key explanation items (2-3 marks) reward the identify -> example -> uniqueness structure shown above.
Connections
- Previous: Entity-Relationship Diagrams. The ERD shows which entities link to which; the keys are how that link is actually stored.
- Related: Relational Database Fundamentals. The overall vocabulary of tables, rows, and keys.
- Next: Normalisation. The process of arranging columns into tables so each fact lives in one place, with the right keys to glue them back together.
- Forward: SQL Queries.
JOINclauses follow the foreign-key links defined here. - Forward (HL): Transactions and Views. Transactions are how the DBMS preserves referential integrity under concurrent updates.