Databases Layer

IB Syllabus: A3: Databases (relational fundamentals, design, programming, alternative approaches)

Overview

A database is an organised, persistent collection of related data that can be queried, updated, and shared by many applications and users at the same time. Almost every system you interact with on a normal day, school timetables, online shopping carts, hospital records, train booking portals, social media feeds, is backed by a database.

The Databases layer covers four ideas:

What a relational database is: the model used by the overwhelming majority of business systems, where data is stored in linked tables of rows and columns.
How to design one: schemas, entity-relationship diagrams, data types, keys, and the process of normalisation that produces a clean, non-redundant set of tables.
How to talk to one, SQL (Structured Query Language), the standard language for asking questions and making changes.
How alternative approaches differ (HL), NoSQL, cloud, spatial, and in-memory databases; data warehouses for analytical work; and distributed databases for scale and resilience.

By the end of these pages you should be able to design a normalised relational database for a real-world scenario, write SQL queries that combine several tables, and reason about when a relational approach is the right tool and when something else fits better.

Sub-pages

A3.1, Database Fundamentals (SL + HL)

#	Topic	Syllabus	Key Concepts	Level
1	Relational Database Fundamentals	A3.1.1	Tables, records, fields, attributes; features, benefits, limitations; object-relational impedance mismatch	SL + HL

A3.2, Database Design (SL + HL)

#	Topic	Syllabus	Key Concepts	Level
2	Database Schemas	A3.2.1	Conceptual, logical, and physical schema layers	SL + HL
3	Entity-Relationship Diagrams	A3.2.2	Entities, attributes, relationships, cardinality, modality	SL + HL
4	Data Types and Keys	A3.2.3, A3.2.4	Integer, decimal, char, varchar, date, boolean; primary, foreign, composite, concatenated keys; referential integrity	SL + HL
5	Normalisation	A3.2.5, A3.2.6, A3.2.7	Functional, partial, transitive dependencies; 1NF, 2NF, 3NF; constructing 3NF; when to denormalise	SL + HL

A3.3, Database Programming with SQL

#	Topic	Syllabus	Key Concepts	Level
6	SQL Queries	A3.3.1, A3.3.2, A3.3.4 (HL)	DDL vs DML; SELECT, FROM, WHERE, JOIN, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING, AND/OR/NOT; aggregate functions (HL)	SL + HL
7	SQL Updates	A3.3.3	INSERT INTO, UPDATE SET, DELETE; effect of indexed columns on performance	SL + HL
8	Transactions and Views	A3.3.5, A3.3.6	ACID properties; BEGIN TRANSACTION, COMMIT, ROLLBACK; virtual vs materialised views	HL

A3.4, Alternative Databases and Data Warehouses (HL only)

#	Topic	Syllabus	Key Concepts	Level
9	Alternative Databases	A3.4.1	NoSQL, cloud, spatial, in-memory databases and their use-cases	HL
10	Data Warehouses	A3.4.2, A3.4.3	Warehouse objectives; OLTP vs OLAP; data mining techniques (clustering, classification, association, anomaly detection)	HL
11	Distributed Databases	A3.4.4	Concurrency control, data consistency, partitioning, security, transparency, fault tolerance, replication, scalability	HL

Teaching Sequence

Databases are typically taught in a single block of around 11 hours at SL and 18 hours at HL. A natural order is:

Fundamentals (A3.1.1), what a relational database is and why it dominates.
Design (A3.2.1 to A3.2.4), the design vocabulary: schemas, ERDs, data types, keys.
Normalisation (A3.2.5 to A3.2.7), the process that turns a messy spreadsheet into a clean relational database, plus when to break the rules.
SQL (A3.3.1 to A3.3.3), writing queries and updates against an existing design.
HL extensions: aggregates and views, transactions, alternative database models, data warehouses, and distributed databases.

The HL extension topics build on the SL core, so HL students should work through A3.1 to A3.3 first.

Learning Objectives

After working through these pages, you should be able to:

Explain the features, benefits and limitations of a relational database, and recognise where the relational model is and is not a good fit (A3.1.1)
Describe the conceptual, logical and physical schema layers and explain why they exist as separate concerns (A3.2.1)
Construct an ERD for a real-world scenario, showing entities, attributes, relationship types, cardinality and modality (A3.2.2)
Outline the common data types used in relational databases and justify the choice of type for each attribute (A3.2.3)
Construct relational tables with appropriate primary, foreign, composite and concatenated keys, and explain how referential integrity is enforced (A3.2.4)
Explain the differences between 1NF, 2NF and 3NF in terms of functional, partial-key and transitive dependencies (A3.2.5)
Construct a database normalised to 3NF for a range of real-world scenarios (A3.2.6)
Evaluate the need for denormalising a database in specific situations (A3.2.7)
Outline the differences between DDL and DML within SQL (A3.3.1)
Construct SQL queries between two tables using SELECT, JOIN, WHERE, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING and the AND/OR/NOT operators (A3.3.2)
Explain how INSERT INTO, UPDATE SET and DELETE are used to modify data in a database (A3.3.3)
Construct SQL queries using AVG, COUNT, MAX, MIN and SUM aggregate functions (A3.3.4 HL)
Describe the difference between virtual and materialised views (A3.3.5 HL)
Describe how transactions and the ACID properties maintain data integrity (A3.3.6 HL)
Outline the characteristics of NoSQL, cloud, spatial and in-memory databases (A3.4.1 HL)
Explain the primary objectives of data warehouses and the role of OLAP and data mining (A3.4.2, A3.4.3 HL)
Describe the features of distributed databases (A3.4.4 HL)

Connections

Information Layer, Number Systems and Representing Data explain how the values in each cell are physically stored as bits.
Hardware Layer, Secondary Storage is where database files actually live; Primary Memory and caching explain why in-memory databases are dramatically faster for some workloads.
Operating Systems Layer, OS Fundamentals covers the file-system and process management that a DBMS sits on top of; Multitasking underpins the concurrency control that transactions rely on.
Communication Layer, Network Architecture underlies cloud and distributed databases; Network Security and Encryption cover the access controls that protect database contents.
Programming Layer: the object-relational impedance mismatch is most visible when an OOP program reads and writes rows from a relational store; File Processing is the simpler alternative when full database machinery is overkill.
Machine Learning: data warehouses and OLAP supply the historical, clean data that supervised ML models need for training.