Relational Database Fundamentals
IB Syllabus: A3.1.1: Explain the features, benefits and limitations of a relational database.
Key Concepts
What a Database Is
A database is a structured, persistent collection of related data that can be queried, updated, and shared by many users and applications. The software that manages it (creating tables, enforcing rules, running queries, handling concurrent users) is called a Database Management System (DBMS). Examples include MySQL, PostgreSQL, Oracle Database, and Microsoft SQL Server.
A relational database is the most common kind. It organises data into tables of rows and columns, and uses keys to express how the tables link together. The relational model was introduced by Edgar Codd in a 1970 research paper and has dominated business systems ever since.
Core Vocabulary
| Term | Meaning | Everyday equivalent |
|---|---|---|
| Table | A two-dimensional grid that stores data about one type of thing | A single spreadsheet sheet |
| Record (or row) | One entry in a table, one specific instance of the thing | A single row of a spreadsheet |
| Field (or column, attribute) | A single piece of information that every record carries | A single column header |
| Entity | The real-world thing the table represents (Member, Class, Booking) | The name on the sheet tab |
| Instance | One occurrence of an entity (member #4291, Marlon Tan) | One row’s “real” identity |
| Schema | The structure of the database: which tables exist, which fields they have, which types those fields are, and how they link | The blueprint, not the data |
These are the words exam question stems and mark schemes use. Knowing the precise definitions saves marks on the recall items (State, Define).
“Field,” “column,” and “attribute” all mean the same thing, and so do “record” and “row”. Mark schemes accept any of them, but pick one set and stay consistent within an answer. Avoid the older term “relation” for a table, since the same word is also used for relationships (1:N, M:N), which is a different idea.
Five Core Features of the Relational Model
Five features together define what makes a database “relational”:
- Tables: data lives in named tables with a fixed set of columns. Each row in a table represents one instance of the entity that table is about.
- Primary keys: every table has a column (or small set of columns) whose value uniquely identifies each row. No two rows share the same primary key value, and a primary key value cannot be null.
- Foreign keys: a column in one table that holds the primary-key value of a row in another table. This is how the relational model expresses links between tables.
- Composite keys: a primary key made up of two or more columns together, when no single column is unique on its own (covered in detail in Data Types and Keys).
- Relationships: the logical connections between tables that the foreign keys implement. Relationships come in three shapes: one-to-one (one passport per person), one-to-many (one teacher, many students), and many-to-many (students take many classes, each class has many students).
The word “relational” comes from mathematical relation theory: in Codd’s 1970 paper, each table is modelled as a mathematical relation (a set of records). It is not named after the relationships in feature 5 between tables. Because the underlying word has two meanings in this field, prefer “table” for the data structure and “relationship” for the link between tables.
Benefits of the Relational Model
Ten benefits matter in practice. The first column uses the wording mark schemes tend to use; the second explains what each one actually buys you.
| Benefit | What it means |
|---|---|
| Data consistency | The same fact is stored in one place, so it cannot disagree with itself. Updating a customer’s address once updates it everywhere it appears. |
| Data integrity | Built-in rules (data types, primary keys, foreign keys, constraints) reject invalid data before it gets in, a date column will not accept “next Tuesday.” |
| Reduced data duplication | Repeating values (e.g. the supplier’s full address on every order) are factored out into their own tables and referenced by key. |
| Reduced redundancy | Closely related: each fact lives in exactly one place. Avoids the situation where the same fact is updated in some rows but not others. |
| Data retrieval | SQL lets you ask precise questions over millions of rows and get answers in milliseconds, much faster and more expressive than searching through a spreadsheet. |
| Concurrency control | The DBMS coordinates many users editing at once, using locks so that two simultaneous updates do not corrupt each other. |
| Reliable transaction processing | Groups of operations that must succeed or fail together (e.g. a bank transfer) are run as atomic transactions (covered in Transactions and Views at HL). |
| Security features | User accounts, roles, permissions, views, and audit logs let the database control who can see and change which rows. |
| Scalability | Well-designed relational databases handle millions to billions of rows on modest hardware; tuning options (indexes, partitioning, replication) extend that further. |
| Community support | The relational model has been the industry default for fifty years, huge knowledge bases, mature tooling, plentiful training, and a long list of stable products to choose from. |
When a question says “Explain two benefits” for a specific scenario, the mark scheme pattern is identify -> expand -> consequence per benefit, e.g. “Concurrency control allows multiple receptionists to take bookings at the same time without one overwriting the other, which is essential because the club has three front-desk staff working in parallel.” Pick benefits the scenario actually needs; do not list generic features.
Limitations of the Relational Model
Six limitations are worth discussing alongside the benefits. These are not “the relational model is bad”, they are the situations where a different approach may fit better.
| Limitation | What goes wrong |
|---|---|
| Rigid schema | The set of columns is fixed at design time. Adding a new field to a table with billions of rows is expensive, and varying the columns row-by-row is not allowed. |
| Design complexity | Producing a clean, normalised design takes skill, bad ERDs and missed normal forms lead to redundancy and update anomalies down the line. |
| Hierarchical data handling | Tree- or graph-shaped data (organisation charts, social networks, comment threads with replies) is awkward to represent and slow to traverse using joins. |
| Object-relational impedance mismatch | OOP code thinks in objects (a Member with a List<Booking> field). The database thinks in rows split across joined tables. Translating between the two takes glue code (ORMs, mappers) and often forces compromises in either the class design or the schema. |
| “Big data” scalability issues | Relational databases scale vertically very well (bigger machine) but horizontally badly (many machines). At web-scale workloads measured in petabytes, the join-and-transaction guarantees become a bottleneck. |
| Unstructured data handling | Free-text documents, images, video, and sensor streams do not fit cleanly into rows and columns; relational databases can store them as blobs but cannot query into them effectively. |
The HL extension topic Alternative Databases covers the NoSQL, cloud, spatial, and in-memory models that have grown up to address these specific limitations.
Database vs Spreadsheet
A common SL exam pattern is to ask why a real-world organisation should use a database instead of a spreadsheet. The contrast matters:
| Concern | Spreadsheet | Relational database |
|---|---|---|
| Data types | Loose, a cell can hold anything | Strict, each column has a declared type that rejects bad input |
| Duplicates | Easy and common (the same address typed into 500 rows) | Designed out by normalisation |
| Concurrent edits | One person at a time, or chaos | DBMS coordinates many writers safely |
| Querying | Filter and sort, plus formulas | Full SQL, including joins across multiple tables |
| Size | Practical limit around hundreds of thousands of rows | Comfortable with hundreds of millions; tuned databases reach billions |
| Security | All-or-nothing access; whoever opens the file sees everything | Per-user, per-row, per-column permissions |
| Data integrity | Up to the user to enforce | Enforced by primary keys, foreign keys, constraints, and transactions |
For a single user keeping a personal list, a spreadsheet is fine. For any shared, multi-user, multi-table dataset that has to stay correct over time, the trade-offs strongly favour a relational database.
The Object-Relational Impedance Mismatch
This concept is new in the 2027 syllabus and worth a careful read.
Object-oriented programs and relational databases were designed independently and represent data in fundamentally different ways:
| OOP world | Relational world |
|---|---|
| Objects have identity (the same object is the same object) | Rows are identified by their primary-key value |
| Objects hold references to other objects directly | Tables hold foreign keys that must be resolved by joins |
| Classes form inheritance hierarchies | Tables are flat, there is no built-in concept of “table B extends table A” |
| Objects can contain collections as fields | Collections are modelled as separate tables linked by foreign keys |
| Objects encapsulate behaviour (methods) alongside data | Tables hold data only; behaviour lives in the application or in stored procedures |
| Method calls follow object graphs naturally | The same traversal becomes a chain of JOINs that may be slow |
The friction caused by mapping back and forth between these two worlds is the object-relational impedance mismatch. It is the reason whole frameworks (Hibernate in Java, Entity Framework in C#, Django ORM in Python) exist purely to translate between objects and tables, and it is one of the main reasons developers sometimes choose document-oriented NoSQL databases instead.
The OOP model — one class that contains its bookings directly:
classDiagram
class Member {
-int id
-String name
-List~Booking~ bookings
}
The relational model — the same data split across three tables, linked by foreign keys:
erDiagram
MEMBER ||--o{ BOOKING : "makes"
CLASS ||--o{ BOOKING : "is booked in"
MEMBER {
int MemberID PK
string Name
}
BOOKING {
int BookingID PK
int MemberID FK
int ClassID FK
date BookedOn
}
CLASS {
int ClassID PK
string Title
datetime StartsAt
}
The single Member object with its embedded list of Booking objects becomes three separate tables joined together. Every save and load has to translate between the two shapes.
Worked Examples
Example 1: Identifying the Parts of a Schema
A small community sports club, Harbour Run, keeps the following table of members.
MEMBER table
| MemberID | FirstName | Surname | DateOfBirth | MembershipTier |
|---|---|---|---|---|
| 1001 | Aroha | Kahurangi | 2008-03-14 | Standard |
| 1002 | Marlon | Tan | 1995-07-22 | Premium |
| 1003 | Sebastián | Quispe | 1980-11-02 | Standard |
| 1004 | Priya | Nair | 2003-05-30 | Premium |
Analysis:
- Entity:
Member, the table holds one record per member. - Records: there are four (
1001to1004). - Fields: there are five (
MemberID,FirstName,Surname,DateOfBirth,MembershipTier). - Primary key:
MemberID, a non-null, unique integer assigned to each member. - Schema (notation: underline the primary key):
MEMBER(MemberID, FirstName, Surname, DateOfBirth, MembershipTier).
Example 2: Spotting a Foreign Key
Harbour Run also runs fitness classes, and tracks which member booked which class.
CLASS table
| ClassID | Title | Coach | StartsAt |
|---|---|---|---|
| C401 | Sunrise 5k | Lila Okafor | 2026-06-01 06:30 |
| C402 | Track Sprints | Lila Okafor | 2026-06-02 17:30 |
| C403 | Hill Repeats | Daniel Park | 2026-06-03 18:00 |
BOOKING table
| BookingID | MemberID | ClassID | BookedOn |
|---|---|---|---|
| B9001 | 1001 | C401 | 2026-05-28 |
| B9002 | 1002 | C401 | 2026-05-28 |
| B9003 | 1002 | C402 | 2026-05-29 |
| B9004 | 1004 | C403 | 2026-05-30 |
Analysis:
- The
BOOKINGtable’sMemberIDcolumn is a foreign key, every value in it must appear as aMemberIDinMEMBER. Likewise,ClassIDinBOOKINGis a foreign key intoCLASS. - This is how the database expresses “this booking belongs to this member, for this class” without copying the member’s full name or the class title into every booking row.
- The relationship between
MEMBERandBOOKINGis one-to-many (one member can have many bookings; one booking belongs to one member). Same forCLASSandBOOKING. - Together, the two one-to-many links resolve a many-to-many relationship between
MEMBERandCLASS(one member books many classes; each class has many members). TheBOOKINGtable is the junction table that makes this possible.
Example 3: Database vs Spreadsheet for a Real Scenario
A school librarian currently tracks book loans in a single shared spreadsheet. Each row has the book title, the borrower’s name and form class, the date borrowed, and the date due back. Two librarians edit the file from their own laptops.
Three problems a relational database would solve:
- Conflicting edits. If both librarians have the file open and each adds a loan, one of those loans will be lost when the file is next saved. A relational database serialises concurrent writes, both loans land safely. (Benefit: concurrency control.)
- Redundant student data. Every loan row repeats the borrower’s name and form class. When a student changes form class at the start of a new year, the entry has to be updated in dozens of rows by hand, and it is easy to miss one. A relational design factors
STUDENTinto its own table so the form class lives in exactly one row. (Benefit: reduced data duplication, reduced redundancy.) - No protection from bad input. A librarian could accidentally type
"Yesterday"into the Date Borrowed cell and the spreadsheet would accept it. A relationalDATEcolumn rejects anything that is not a valid calendar date. (Benefit: data integrity.)
A spreadsheet is still appropriate when there is one user, the dataset is small, and the data only needs to live for a short time. The trade-offs flip the moment more than one of those assumptions fails.
Quick Check
Q1. In a relational database, what term describes a single row in a table?
Q2. Which type of key creates a link between two tables by referring to the primary key of another table?
Q3. Two receptionists are taking bookings for the same fitness class at the same time on different laptops. Which feature of a relational database ensures their simultaneous updates do not overwrite each other?
Q4. Which best describes the object-relational impedance mismatch?
Q5. In Worked Example 2 above, what type of relationship exists between MEMBER and CLASS (resolved through the BOOKING junction table)?
Q6. Which of the following is a limitation rather than a benefit of the relational model?
Read the Schema
Use the Harbour Run schema from Worked Example 2 (MEMBER, CLASS, BOOKING) to fill in the missing values.
Schema reading practice. For each table or column listed, fill in the correct identifying term.
| Question | Answer |
|---|---|
The primary key of the MEMBER table is | |
One foreign key in the BOOKING table is | |
The other foreign key in the BOOKING table is | |
Number of records in the CLASS table | |
Number of fields in the BOOKING table | |
Relationship type between MEMBER and BOOKING (use the form 1:1, 1:M, or M:M) |
Fill in the Blanks
Complete the following statements about the relational model.
CORE TERMINOLOGY
================
A is a two-dimensional grid of rows and columns
that stores data about one type of entity.
The column (or small set of columns) that uniquely identifies each
row in a table is called the .
A column whose values must match a primary key in another table
is called a .
A relationship in which one member can book many classes AND
each class can have many members is called a
relationship.
The DBMS feature that lets multiple users update the same data
at the same time without losing changes is called
.
Spot the Error
A student wrote revision notes about keys in a relational database. One line is wrong. Click the line with the error, then choose the correct fix.
Pick the correct fix for line 3:
Identify the Term
Read the description and type the correct term.
A small intermediate table that holds two foreign keys, one into each of the two main tables, in order to resolve a many-to-many relationship into two one-to-many relationships. The BOOKING table in Worked Example 2 is an example.
Read the description and type the correct term.
The benefit of a relational database that comes from declared data types and constraints rejecting invalid input, for example, a DATE column refusing to store the value "next Tuesday", or a NOT NULL constraint refusing to accept a missing required field.
Practice Exercises
Core
[Core]Vocabulary[4 marks]Define each of the following terms in the context of a relational database:- (a) record
- (b) field
- (c) primary key
- (d) foreign key
-
[Core]Benefits and limits[6 marks]A regional medical clinic is replacing its paper-based patient record system with a relational database. Explain two benefits the clinic will gain from the new system. [Core]Benefits and limits[4 marks]Outline two reasons why a community sports club managing 800 members and weekly class bookings would use a relational database rather than a shared spreadsheet.
Extension
[Extension]Schemas[5 marks]Look at the Harbour Run schema in Worked Example 2. Identify:- (a) the primary key of each table
- (b) every foreign key in the schema, and the table it references
- (c) the type (1:1, 1:M, M:M) of each direct relationship
- (d) the entity that is represented as a junction table
[Extension]Benefits and limits[4 marks]Describe two limitations of the relational model that would push the developers of a global social media platform (with billions of users and a feed of free-text posts, images, and videos) towards a non-relational solution. Explain why each limitation matters for that specific workload.
Challenge
-
[Challenge]Benefits and limits[6 marks]A student is building a school library application in Java. They have aBookclass with aList<Loan>field, aMemberclass with aList<Loan>field, and aLoanclass withBookandMemberfields. They want to persist this data to a relational database. Explain how the object-relational impedance mismatch makes this mapping difficult. Refer to at least three differences between the OOP and relational worlds (identity vs primary key, references vs joins, collections, inheritance, or behaviour-with-data). -
[Challenge]Alternative DBs (HL)[8 marks]A start-up is choosing between a relational database and a NoSQL document store for its core product. Discuss the trade-offs they should consider. Refer to at least three benefits and three limitations of the relational model. Include a justified recommendation that depends on the type of product being built.
Exam-Style
-
[Exam-Style]Keys[3 marks]The following table holds enrolment information for an after-school programming club.ENROLMENT(EnrolmentID, StudentID, CourseID, EnrolmentDate)- (a) State the primary key of
ENROLMENT. [1] - (b) Identify the two foreign keys you would expect in this table, and the tables they point to. [2]
- (a) State the primary key of
Note for IB CS learners: A3.1.1 questions cluster around three formats: 1-mark “Define” or “State” recall items on terminology; 2-mark “Outline” items on a single benefit; and 6-mark “Explain two” items where each benefit must be developed across three marks (identify, expand, consequence). Practising the 3-mark expansion structure on a real scenario is the single best preparation.
Connections
- Next: Database Schemas. The three layers (conceptual, logical, physical) that separate “what we model” from “how it is stored.”
- Related: Data Types and Keys. The detailed treatment of primary, foreign, composite and concatenated keys introduced here.
- Related: Entity-Relationship Diagrams. The visual notation for tables, relationships, cardinality, and modality.
- Related: Normalisation. The process that produces a clean, low-redundancy set of tables from a messy starting point.
- Forward (HL): Alternative Databases, NoSQL, cloud, spatial, and in-memory models that address the relational limitations covered here.
- Programming: OOP Fundamentals. The object world that the object-relational impedance mismatch tries to bridge.