Database Schemas

IB Syllabus: A3.2.1: Describe database schemas (conceptual, logical, physical).

Key Concepts

What a Schema Is

A schema is the structure of a database, the blueprint that says which tables exist, which columns each table has, what type those columns are, which columns identify rows, and how the tables link together. The schema does not contain any actual data; it describes the shape the data will take when it arrives.

Designing a database means producing the schema before a single row is stored. Skipping this step is what produces the painful redesigns, redundant data, and broken integrity that come up in every “why we use a database” discussion.

Schema is to a database what an architect’s drawings are to a building. You can change the drawings cheaply on paper; once the foundations are poured, changes get expensive fast. A rough, wrong schema may seem to work on day one and only cause trouble months later when the database has grown.

Three Levels of Schema

The standard split is three layers of schema. Each layer describes the same database, but at a different level of abstraction. Together they separate “what we are modelling” (conceptual) from “how the data is organised inside the database” (logical) from “how the bits sit on disk” (physical).

Level	What it covers	Typical artefact
Conceptual	High-level view: which entities exist and the main relationships between them. Vocabulary-first; almost no implementation detail.	A high-level entity-relationship sketch with entity names only
Logical	Full detail of tables, columns, data types, primary keys, foreign keys, normalised structure. DBMS-independent.	An ERD with attributes plus a normalised relational model
Physical	How those tables are actually stored: file layout, indexes, partitioning, page size, storage engine, on-disk format. DBMS- and hardware-specific.	DDL statements with `CREATE INDEX`, tablespace assignments, storage parameters

A useful mnemonic: conceptual = “what” / logical = “how” / physical = “where and how fast”.

Conceptual Schema (the “What”)

The conceptual schema gives a high-level, business-facing picture of the database. It answers questions like:

What are the main things we need to keep track of? (Member, Class, Booking)
How do they relate to each other at a glance? (Members make bookings; classes have bookings)

The conceptual schema:

Names the entities but does not yet list every attribute
Shows the main relationships but not the full cardinality and modality detail
Is DBMS-independent: nothing here would change if you switched from MySQL to PostgreSQL
Is easy to discuss with non-technical stakeholders because it is in plain business terms

Think of it as a one-page sketch that a head teacher, club manager, or business owner can read without any database background.

Logical Schema (the “How”)

The logical schema fills in all the structural detail needed to actually build the database, but still without committing to a specific storage technology. It shows:

Every entity that becomes a table
The full set of attributes (columns) on each table, with data types
The primary key of each table
Every foreign key and the table it references
Relationships with their full cardinality (1:1, 1:M, M:M) and modality (mandatory or optional)
The result of normalisation, typically a fully 3NF structure

The logical schema is the artefact a developer hands to a database designer and says “build me this.” It is DBMS-independent: the same logical schema can be implemented in MySQL, PostgreSQL, SQL Server, or Oracle without rewriting it.

Physical Schema (the “Where and How Fast”)

The physical schema describes how the data is laid out on the storage medium. It depends on the chosen DBMS and on the performance characteristics the system needs. It covers:

How rows are organised on disk (heap files, clustered indexes, columnar storage)
Which columns have indexes (which speed up reads but slow down writes)
Partitioning: splitting a large table across multiple files or machines
Storage engine choices in DBMSs that offer several (InnoDB vs MyISAM in MySQL, for instance)
Page size, fill factor, compression options, tablespace placement
Replication and backup arrangements

Two databases with identical logical schemas can have wildly different physical schemas, one tuned for a small embedded system, the other for a high-traffic web platform serving millions of users, and behave very differently in practice.

The Three Levels Together

Conceptual schema What we model

What entities and relationships?

Member, Class, Booking
Members make Bookings
Classes have Bookings

↓

Logical schema How we organise

What tables, columns, keys, types?

MEMBER(MemberID, FirstName, Surname, DOB, Tier)
CLASS(ClassID, Title, Coach, StartsAt)
BOOKING(BookingID, MemberID*, ClassID*, BookedOn)
* = foreign key

↓

Physical schema Where, how fast

How is it stored on disk?

InnoDB storage engine, 16 KB pages
Clustered index on BOOKING(BookedOn)
Secondary index on MEMBER(Surname)
Partition BOOKING by year

Each lower level adds detail to the one above. Each higher level is independent of the choices below, moving from MySQL to PostgreSQL changes the physical schema completely but leaves the logical and conceptual schemas alone.

Why Three Layers?

The three-layer split exists because it separates concerns that change at different speeds and that different people care about:

The business changes the conceptual schema when it adds a new entity (“we now also track Trainers”). This requires re-modelling and is rare.
The developers change the logical schema when they refactor tables or add new fields. This is more common but still deliberate.
The DBA changes the physical schema when they tune performance, adding indexes, repartitioning, switching storage engines. This can happen without users noticing.

Without the separation, every performance tweak would touch the business-facing model, and every business change would force a rewrite of the storage layer.

Why Schemas Matter (Data Modelling)

Mark schemes for “Explain the importance of data modelling” reward the following points:

Identifies entities and tables correctly: if the wrong things are entities, the database cannot answer the questions the business will ask.
Forces deliberate choice of attributes: ensures columns are necessary (the data is actually needed) and sufficient (nothing important is missing).
Establishes keys early: primary and foreign keys are picked while the design is still cheap to change.
Captures relationships explicitly: so the right joins are possible when users want cross-table queries.
Drives normalisation: reduces redundancy and the anomalies that come with it (covered in Normalisation).
Provides a shared artefact: a diagram or schema document that analysts, developers, DBAs, and clients can all read and reason about.
Catches problems on paper: it is far cheaper to redraw an ERD than to migrate a live production database.

A database built without a schema design phase tends to end up with redundant data, missing constraints, and queries that are far slower than they need to be.

Worked Examples

Example 1: One Database, Three Schemas

The Harbour Run sports club (introduced in Relational Fundamentals) wants a database to track members, classes, and bookings. Here are the same system viewed at all three levels.

Conceptual schema (a one-paragraph business description that a manager would sign off):

The system tracks members of the club, the classes that coaches run, and the bookings that link members to classes. Each member can book many classes; each class can be booked by many members. A booking always belongs to exactly one member and exactly one class. Bookings are made on a specific date and have a status (confirmed, cancelled, attended, no-show).

Notice that nothing here talks about columns, types, or storage. It is purely about entities and relationships in business language.

Logical schema (DBMS-independent, ready to implement):

Table	Columns	Notes
`MEMBER`	`MemberID` (INT, PK), `FirstName` (VARCHAR 50), `Surname` (VARCHAR 50), `DateOfBirth` (DATE), `MembershipTier` (VARCHAR 20)
`CLASS`	`ClassID` (VARCHAR 8, PK), `Title` (VARCHAR 80), `Coach` (VARCHAR 60), `StartsAt` (DATETIME)
`BOOKING`	`BookingID` (VARCHAR 8, PK), `MemberID` (INT, FK -> MEMBER), `ClassID` (VARCHAR 8, FK -> CLASS), `BookedOn` (DATE), `Status` (VARCHAR 20)	Junction table resolving the M:M relationship between MEMBER and CLASS

Physical schema (specific to MySQL on the club’s small Linux server):

- Storage engine: InnoDB (transactional, supports foreign keys)
- Default page size: 16 KB
- Indexes:
    PRIMARY KEY on each table (auto-clustered by InnoDB)
    Secondary index on MEMBER(Surname, FirstName) for member search
    Secondary index on BOOKING(BookedOn) for daily booking reports
    Composite index on BOOKING(MemberID, BookedOn) for "my bookings" queries
- Backup: nightly mysqldump to off-site cloud bucket
- Replication: none initially (single-server deployment)

Example 2: Same Logical Schema, Two Physical Implementations

A national chain of sports clubs runs the same software at every venue. The logical schema is identical for all 200 clubs. The physical schema differs sharply between the two extremes:

Small village club, 80 members, one server:

Single MySQL instance on a shared Linux box
No partitioning, the entire BOOKING table is one file
Three indexes total
Daily file-based backup; recovery time is hours, which is acceptable

National head-office reporting database, 1.2 million members, 40 million bookings per year:

Clustered PostgreSQL deployment across three machines
BOOKING partitioned by year for fast range queries
Eight indexes including covering indexes for the most common reports
Continuous streaming replication to a hot standby
Read replicas serve the dashboards so the primary stays free for writes

Same conceptual and logical schemas, completely different physical schemas. This is exactly the situation the three-layer model was designed to handle.

Example 3: Distinguishing “What” From “How”

Sort the following design decisions into the correct schema layer:

Decision	Layer
“The system needs to know about Customers, Orders, and Products.”	Conceptual
“The `Customer` table will have columns `CustomerID`, `Name`, `Email`, `JoinDate`.”	Logical
“We will add a B-tree index on `Order.CustomerID` to speed up the customer-history report.”	Physical
“Each Order belongs to exactly one Customer and one Customer can have many Orders.”	Conceptual (relationship at high level)
“`Customer.Email` will be `VARCHAR(120)` and have a unique constraint.”	Logical
“Orders older than 5 years will be moved to an archive tablespace on slower disks.”	Physical

A reliable test: if the decision describes what the business wants tracked, it is conceptual. If it describes how the data is organised into tables, columns, keys, and types, it is logical. If it describes how the data is stored on disk for performance, it is physical.

Quick Check

Q1. Which best defines the term schema?

Q2. Which schema level identifies the main entities and the main relationships between them, without showing column names or data types?

Q3. Adding a B-tree index on a frequently searched column to speed up queries is a decision at which schema level?

Q4. Which is NOT a characteristic of a logical schema?

Q5. What is the typical order in which the three schema levels are designed?

Classify Each Decision

For each design decision, identify which schema layer it belongs to. Use conceptual, logical, or physical.

Decision	Schema layer
"The system tracks Patients, Doctors, and Appointments."
`Appointment.DurationMinutes` will be type `INT` with a CHECK constraint that the value is positive.
Add a covering index on `(DoctorID, AppointmentDate)` to speed up daily-schedule queries.
The primary key of `Patient` will be `PatientID`.
One Doctor may have many Appointments; one Appointment is held by exactly one Doctor.
Partition the `Appointment` table by month so old records sit on slower archival disks.

Fill in the Blanks

Complete the description of the three-schema architecture.

THREE-SCHEMA ARCHITECTURE
=========================
The  schema is the high-level view: it identifies
the main entities and the main relationships, in business language.

The  schema shows full detail of tables, columns,
data types, primary keys, and foreign keys, but is DBMS-independent.

The  schema describes how the data is stored on
disk: file layout, indexes, partitioning, and storage engine.

The three are usually designed in the order
 ->  -> .

Spot the Error

A student wrote revision notes about the three-schema architecture. One line is wrong. Click the line with the error, then choose the correct fix.

1A schema is the structure of the database, not the data itself 2The conceptual schema names entities and main relationships without column-level detail 3The logical schema is DBMS-specific because it picks the storage engine and indexes 4The physical schema covers indexes, partitioning, and how rows are stored on disk 5The three schemas are usually designed conceptual -> logical -> physical

Pick the correct fix for line 3:

Identify the Layer

"The database tracks Books, Members, and Loans. Each Loan involves exactly one Book and one Member. A Member can have many Loans; a Book can be involved in many Loans over time."

Which schema layer does this description correspond to? (Type conceptual, logical, or physical.)

"The DBA decides to add a B-tree index on Loan.LoanDate because the monthly overdue-loans report has become too slow."

Which schema layer does this decision belong to? (Type conceptual, logical, or physical.)

Practice Exercises

Core

[Core] Schemas [2 marks] Define the term schema as used in a relational database, and state the three levels of schema.
[Core] Schemas [4 marks] Describe the difference between a conceptual schema and a logical schema. Give one example of what each would and would not include.
[Core] Schemas [3 marks] Identify three characteristics of a logical schema.

Extension

[Extension] Schemas [6 marks] Distinguish between the conceptual, logical, and physical schemas of a database. For each level, state one decision that belongs at that level and one decision that does not.
[Extension] Schemas [5 marks] Explain the importance of data modelling in the design of a new relational database for a chain of veterinary clinics.

Challenge

[Challenge] Schemas [6 marks] A chain of bookshops runs the same shop-management software in two contexts: one branch is a small village shop, and one is the chain’s head-office reporting hub. Explain why these two installations might share a logical schema yet require very different physical schemas. Refer to at least three physical-level concerns (indexes, partitioning, storage engine, replication, backup, etc.).
[Challenge] Schemas [8 marks] To what extent is the three-schema architecture still useful in a world where many small applications are built rapidly using a single DBMS and a single developer? Refer to specific advantages of the separation and at least one case where it might be over-engineering.

Note for IB CS learners: A3.2.1 questions tend to be short (1-4 marks) and focused on terminology or layer characteristics. The most common formats are “Define schema” (1 mark), “Identify characteristics of the X schema” (2-3 marks), and “Distinguish between X and Y schemas” (3-4 marks). For Distinguish items, structure your answer as paired contrasts (“X does …, whereas Y does …”) rather than two separate paragraphs.

Connections

Previous: Relational Database Fundamentals. The model the schema describes.
Next: Entity-Relationship Diagrams. The standard visual notation for capturing the conceptual and logical schemas.
Related: Data Types and Keys. The logical-schema decisions about column types and key choices.
Related: Normalisation. The process that refines the logical schema into a low-redundancy form.
Forward: SQL Queries. The DDL CREATE TABLE statements that turn a logical schema into a working database, and CREATE INDEX statements that implement physical-schema choices.