Databases Layer
IB Syllabus: A3: Databases (relational fundamentals, design, programming, alternative approaches)
Overview
A database is an organised, persistent collection of related data that can be queried, updated, and shared by many applications and users at the same time. Almost every system you interact with on a normal day, school timetables, online shopping carts, hospital records, train booking portals, social media feeds, is backed by a database.
The Databases layer covers four ideas:
- What a relational database is: the model used by the overwhelming majority of business systems, where data is stored in linked tables of rows and columns.
- How to design one: schemas, entity-relationship diagrams, data types, keys, and the process of normalisation that produces a clean, non-redundant set of tables.
- How to talk to one, SQL (Structured Query Language), the standard language for asking questions and making changes.
- How alternative approaches differ (HL), NoSQL, cloud, spatial, and in-memory databases; data warehouses for analytical work; and distributed databases for scale and resilience.
By the end of these pages you should be able to design a normalised relational database for a real-world scenario, write SQL queries that combine several tables, and reason about when a relational approach is the right tool and when something else fits better.
Sub-pages
A3.1, Database Fundamentals (SL + HL)
| # | Topic | Syllabus | Key Concepts | Level |
|---|---|---|---|---|
| 1 | Relational Database Fundamentals | A3.1.1 | Tables, records, fields, attributes; features, benefits, limitations; object-relational impedance mismatch | SL + HL |
A3.2, Database Design (SL + HL)
| # | Topic | Syllabus | Key Concepts | Level |
|---|---|---|---|---|
| 2 | Database Schemas | A3.2.1 | Conceptual, logical, and physical schema layers | SL + HL |
| 3 | Entity-Relationship Diagrams | A3.2.2 | Entities, attributes, relationships, cardinality, modality | SL + HL |
| 4 | Data Types and Keys | A3.2.3, A3.2.4 | Integer, decimal, char, varchar, date, boolean; primary, foreign, composite, concatenated keys; referential integrity | SL + HL |
| 5 | Normalisation | A3.2.5, A3.2.6, A3.2.7 | Functional, partial, transitive dependencies; 1NF, 2NF, 3NF; constructing 3NF; when to denormalise | SL + HL |
A3.3, Database Programming with SQL
| # | Topic | Syllabus | Key Concepts | Level |
|---|---|---|---|---|
| 6 | SQL Queries | A3.3.1, A3.3.2, A3.3.4 (HL) | DDL vs DML; SELECT, FROM, WHERE, JOIN, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING, AND/OR/NOT; aggregate functions (HL) | SL + HL |
| 7 | SQL Updates | A3.3.3 | INSERT INTO, UPDATE SET, DELETE; effect of indexed columns on performance | SL + HL |
| 8 | Transactions and Views | A3.3.5, A3.3.6 | ACID properties; BEGIN TRANSACTION, COMMIT, ROLLBACK; virtual vs materialised views | HL |
A3.4, Alternative Databases and Data Warehouses (HL only)
| # | Topic | Syllabus | Key Concepts | Level |
|---|---|---|---|---|
| 9 | Alternative Databases | A3.4.1 | NoSQL, cloud, spatial, in-memory databases and their use-cases | HL |
| 10 | Data Warehouses | A3.4.2, A3.4.3 | Warehouse objectives; OLTP vs OLAP; data mining techniques (clustering, classification, association, anomaly detection) | HL |
| 11 | Distributed Databases | A3.4.4 | Concurrency control, data consistency, partitioning, security, transparency, fault tolerance, replication, scalability | HL |
Teaching Sequence
Databases are typically taught in a single block of around 11 hours at SL and 18 hours at HL. A natural order is:
- Fundamentals (A3.1.1), what a relational database is and why it dominates.
- Design (A3.2.1 to A3.2.4), the design vocabulary: schemas, ERDs, data types, keys.
- Normalisation (A3.2.5 to A3.2.7), the process that turns a messy spreadsheet into a clean relational database, plus when to break the rules.
- SQL (A3.3.1 to A3.3.3), writing queries and updates against an existing design.
- HL extensions: aggregates and views, transactions, alternative database models, data warehouses, and distributed databases.
The HL extension topics build on the SL core, so HL students should work through A3.1 to A3.3 first.
Learning Objectives
After working through these pages, you should be able to:
- Explain the features, benefits and limitations of a relational database, and recognise where the relational model is and is not a good fit (A3.1.1)
- Describe the conceptual, logical and physical schema layers and explain why they exist as separate concerns (A3.2.1)
- Construct an ERD for a real-world scenario, showing entities, attributes, relationship types, cardinality and modality (A3.2.2)
- Outline the common data types used in relational databases and justify the choice of type for each attribute (A3.2.3)
- Construct relational tables with appropriate primary, foreign, composite and concatenated keys, and explain how referential integrity is enforced (A3.2.4)
- Explain the differences between 1NF, 2NF and 3NF in terms of functional, partial-key and transitive dependencies (A3.2.5)
- Construct a database normalised to 3NF for a range of real-world scenarios (A3.2.6)
- Evaluate the need for denormalising a database in specific situations (A3.2.7)
- Outline the differences between DDL and DML within SQL (A3.3.1)
- Construct SQL queries between two tables using SELECT, JOIN, WHERE, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING and the AND/OR/NOT operators (A3.3.2)
- Explain how INSERT INTO, UPDATE SET and DELETE are used to modify data in a database (A3.3.3)
- Construct SQL queries using AVG, COUNT, MAX, MIN and SUM aggregate functions (A3.3.4 HL)
- Describe the difference between virtual and materialised views (A3.3.5 HL)
- Describe how transactions and the ACID properties maintain data integrity (A3.3.6 HL)
- Outline the characteristics of NoSQL, cloud, spatial and in-memory databases (A3.4.1 HL)
- Explain the primary objectives of data warehouses and the role of OLAP and data mining (A3.4.2, A3.4.3 HL)
- Describe the features of distributed databases (A3.4.4 HL)
Connections
- Information Layer, Number Systems and Representing Data explain how the values in each cell are physically stored as bits.
- Hardware Layer, Secondary Storage is where database files actually live; Primary Memory and caching explain why in-memory databases are dramatically faster for some workloads.
- Operating Systems Layer, OS Fundamentals covers the file-system and process management that a DBMS sits on top of; Multitasking underpins the concurrency control that transactions rely on.
- Communication Layer, Network Architecture underlies cloud and distributed databases; Network Security and Encryption cover the access controls that protect database contents.
- Programming Layer: the object-relational impedance mismatch is most visible when an OOP program reads and writes rows from a relational store; File Processing is the simpler alternative when full database machinery is overkill.
- Machine Learning: data warehouses and OLAP supply the historical, clean data that supervised ML models need for training.