Databases Layer

IB Syllabus: A3: Databases (relational fundamentals, design, programming, alternative approaches)


Overview

A database is an organised, persistent collection of related data that can be queried, updated, and shared by many applications and users at the same time. Almost every system you interact with on a normal day, school timetables, online shopping carts, hospital records, train booking portals, social media feeds, is backed by a database.

The Databases layer covers four ideas:

  1. What a relational database is: the model used by the overwhelming majority of business systems, where data is stored in linked tables of rows and columns.
  2. How to design one: schemas, entity-relationship diagrams, data types, keys, and the process of normalisation that produces a clean, non-redundant set of tables.
  3. How to talk to one, SQL (Structured Query Language), the standard language for asking questions and making changes.
  4. How alternative approaches differ (HL), NoSQL, cloud, spatial, and in-memory databases; data warehouses for analytical work; and distributed databases for scale and resilience.

By the end of these pages you should be able to design a normalised relational database for a real-world scenario, write SQL queries that combine several tables, and reason about when a relational approach is the right tool and when something else fits better.


Sub-pages

A3.1, Database Fundamentals (SL + HL)

# Topic Syllabus Key Concepts Level
1 Relational Database Fundamentals A3.1.1 Tables, records, fields, attributes; features, benefits, limitations; object-relational impedance mismatch SL + HL

A3.2, Database Design (SL + HL)

# Topic Syllabus Key Concepts Level
2 Database Schemas A3.2.1 Conceptual, logical, and physical schema layers SL + HL
3 Entity-Relationship Diagrams A3.2.2 Entities, attributes, relationships, cardinality, modality SL + HL
4 Data Types and Keys A3.2.3, A3.2.4 Integer, decimal, char, varchar, date, boolean; primary, foreign, composite, concatenated keys; referential integrity SL + HL
5 Normalisation A3.2.5, A3.2.6, A3.2.7 Functional, partial, transitive dependencies; 1NF, 2NF, 3NF; constructing 3NF; when to denormalise SL + HL

A3.3, Database Programming with SQL

# Topic Syllabus Key Concepts Level
6 SQL Queries A3.3.1, A3.3.2, A3.3.4 (HL) DDL vs DML; SELECT, FROM, WHERE, JOIN, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING, AND/OR/NOT; aggregate functions (HL) SL + HL
7 SQL Updates A3.3.3 INSERT INTO, UPDATE SET, DELETE; effect of indexed columns on performance SL + HL
8 Transactions and Views A3.3.5, A3.3.6 ACID properties; BEGIN TRANSACTION, COMMIT, ROLLBACK; virtual vs materialised views HL

A3.4, Alternative Databases and Data Warehouses (HL only)

# Topic Syllabus Key Concepts Level
9 Alternative Databases A3.4.1 NoSQL, cloud, spatial, in-memory databases and their use-cases HL
10 Data Warehouses A3.4.2, A3.4.3 Warehouse objectives; OLTP vs OLAP; data mining techniques (clustering, classification, association, anomaly detection) HL
11 Distributed Databases A3.4.4 Concurrency control, data consistency, partitioning, security, transparency, fault tolerance, replication, scalability HL

Teaching Sequence

Databases are typically taught in a single block of around 11 hours at SL and 18 hours at HL. A natural order is:

  1. Fundamentals (A3.1.1), what a relational database is and why it dominates.
  2. Design (A3.2.1 to A3.2.4), the design vocabulary: schemas, ERDs, data types, keys.
  3. Normalisation (A3.2.5 to A3.2.7), the process that turns a messy spreadsheet into a clean relational database, plus when to break the rules.
  4. SQL (A3.3.1 to A3.3.3), writing queries and updates against an existing design.
  5. HL extensions: aggregates and views, transactions, alternative database models, data warehouses, and distributed databases.

The HL extension topics build on the SL core, so HL students should work through A3.1 to A3.3 first.


Learning Objectives

After working through these pages, you should be able to:

  • Explain the features, benefits and limitations of a relational database, and recognise where the relational model is and is not a good fit (A3.1.1)
  • Describe the conceptual, logical and physical schema layers and explain why they exist as separate concerns (A3.2.1)
  • Construct an ERD for a real-world scenario, showing entities, attributes, relationship types, cardinality and modality (A3.2.2)
  • Outline the common data types used in relational databases and justify the choice of type for each attribute (A3.2.3)
  • Construct relational tables with appropriate primary, foreign, composite and concatenated keys, and explain how referential integrity is enforced (A3.2.4)
  • Explain the differences between 1NF, 2NF and 3NF in terms of functional, partial-key and transitive dependencies (A3.2.5)
  • Construct a database normalised to 3NF for a range of real-world scenarios (A3.2.6)
  • Evaluate the need for denormalising a database in specific situations (A3.2.7)
  • Outline the differences between DDL and DML within SQL (A3.3.1)
  • Construct SQL queries between two tables using SELECT, JOIN, WHERE, BETWEEN, LIKE, ORDER BY, GROUP BY, HAVING and the AND/OR/NOT operators (A3.3.2)
  • Explain how INSERT INTO, UPDATE SET and DELETE are used to modify data in a database (A3.3.3)
  • Construct SQL queries using AVG, COUNT, MAX, MIN and SUM aggregate functions (A3.3.4 HL)
  • Describe the difference between virtual and materialised views (A3.3.5 HL)
  • Describe how transactions and the ACID properties maintain data integrity (A3.3.6 HL)
  • Outline the characteristics of NoSQL, cloud, spatial and in-memory databases (A3.4.1 HL)
  • Explain the primary objectives of data warehouses and the role of OLAP and data mining (A3.4.2, A3.4.3 HL)
  • Describe the features of distributed databases (A3.4.4 HL)

Connections

  • Information Layer, Number Systems and Representing Data explain how the values in each cell are physically stored as bits.
  • Hardware Layer, Secondary Storage is where database files actually live; Primary Memory and caching explain why in-memory databases are dramatically faster for some workloads.
  • Operating Systems Layer, OS Fundamentals covers the file-system and process management that a DBMS sits on top of; Multitasking underpins the concurrency control that transactions rely on.
  • Communication Layer, Network Architecture underlies cloud and distributed databases; Network Security and Encryption cover the access controls that protect database contents.
  • Programming Layer: the object-relational impedance mismatch is most visible when an OOP program reads and writes rows from a relational store; File Processing is the simpler alternative when full database machinery is overkill.
  • Machine Learning: data warehouses and OLAP supply the historical, clean data that supervised ML models need for training.

Table of contents


© EduCS.me — A resource hub for Computer Science education

This site uses Just the Docs, a documentation theme for Jekyll.