Database Schemas
IB Syllabus: A3.2.1: Describe database schemas (conceptual, logical, physical).
Key Concepts
What a Schema Is
A schema is the structure of a database, the blueprint that says which tables exist, which columns each table has, what type those columns are, which columns identify rows, and how the tables link together. The schema does not contain any actual data; it describes the shape the data will take when it arrives.
Designing a database means producing the schema before a single row is stored. Skipping this step is what produces the painful redesigns, redundant data, and broken integrity that come up in every “why we use a database” discussion.
Schema is to a database what an architect’s drawings are to a building. You can change the drawings cheaply on paper; once the foundations are poured, changes get expensive fast. A rough, wrong schema may seem to work on day one and only cause trouble months later when the database has grown.
Three Levels of Schema
The standard split is three layers of schema. Each layer describes the same database, but at a different level of abstraction. Together they separate “what we are modelling” (conceptual) from “how the data is organised inside the database” (logical) from “how the bits sit on disk” (physical).
| Level | What it covers | Typical artefact |
|---|---|---|
| Conceptual | High-level view: which entities exist and the main relationships between them. Vocabulary-first; almost no implementation detail. | A high-level entity-relationship sketch with entity names only |
| Logical | Full detail of tables, columns, data types, primary keys, foreign keys, normalised structure. DBMS-independent. | An ERD with attributes plus a normalised relational model |
| Physical | How those tables are actually stored: file layout, indexes, partitioning, page size, storage engine, on-disk format. DBMS- and hardware-specific. | DDL statements with CREATE INDEX, tablespace assignments, storage parameters |
A useful mnemonic: conceptual = “what” / logical = “how” / physical = “where and how fast”.
Conceptual Schema (the “What”)
The conceptual schema gives a high-level, business-facing picture of the database. It answers questions like:
- What are the main things we need to keep track of? (Member, Class, Booking)
- How do they relate to each other at a glance? (Members make bookings; classes have bookings)
The conceptual schema:
- Names the entities but does not yet list every attribute
- Shows the main relationships but not the full cardinality and modality detail
- Is DBMS-independent: nothing here would change if you switched from MySQL to PostgreSQL
- Is easy to discuss with non-technical stakeholders because it is in plain business terms
Think of it as a one-page sketch that a head teacher, club manager, or business owner can read without any database background.
Logical Schema (the “How”)
The logical schema fills in all the structural detail needed to actually build the database, but still without committing to a specific storage technology. It shows:
- Every entity that becomes a table
- The full set of attributes (columns) on each table, with data types
- The primary key of each table
- Every foreign key and the table it references
- Relationships with their full cardinality (1:1, 1:M, M:M) and modality (mandatory or optional)
- The result of normalisation, typically a fully 3NF structure
The logical schema is the artefact a developer hands to a database designer and says “build me this.” It is DBMS-independent: the same logical schema can be implemented in MySQL, PostgreSQL, SQL Server, or Oracle without rewriting it.
Physical Schema (the “Where and How Fast”)
The physical schema describes how the data is laid out on the storage medium. It depends on the chosen DBMS and on the performance characteristics the system needs. It covers:
- How rows are organised on disk (heap files, clustered indexes, columnar storage)
- Which columns have indexes (which speed up reads but slow down writes)
- Partitioning: splitting a large table across multiple files or machines
- Storage engine choices in DBMSs that offer several (InnoDB vs MyISAM in MySQL, for instance)
- Page size, fill factor, compression options, tablespace placement
- Replication and backup arrangements
Two databases with identical logical schemas can have wildly different physical schemas, one tuned for a small embedded system, the other for a high-traffic web platform serving millions of users, and behave very differently in practice.
The Three Levels Together
What entities and relationships?
- Member, Class, Booking
- Members make Bookings
- Classes have Bookings
What tables, columns, keys, types?
MEMBER(MemberID, FirstName, Surname, DOB, Tier)CLASS(ClassID, Title, Coach, StartsAt)BOOKING(BookingID, MemberID*, ClassID*, BookedOn)
* = foreign key
How is it stored on disk?
- InnoDB storage engine, 16 KB pages
- Clustered index on
BOOKING(BookedOn) - Secondary index on
MEMBER(Surname) - Partition
BOOKINGby year
Each lower level adds detail to the one above. Each higher level is independent of the choices below, moving from MySQL to PostgreSQL changes the physical schema completely but leaves the logical and conceptual schemas alone.
Why Three Layers?
The three-layer split exists because it separates concerns that change at different speeds and that different people care about:
- The business changes the conceptual schema when it adds a new entity (“we now also track Trainers”). This requires re-modelling and is rare.
- The developers change the logical schema when they refactor tables or add new fields. This is more common but still deliberate.
- The DBA changes the physical schema when they tune performance, adding indexes, repartitioning, switching storage engines. This can happen without users noticing.
Without the separation, every performance tweak would touch the business-facing model, and every business change would force a rewrite of the storage layer.
Why Schemas Matter (Data Modelling)
Mark schemes for “Explain the importance of data modelling” reward the following points:
- Identifies entities and tables correctly: if the wrong things are entities, the database cannot answer the questions the business will ask.
- Forces deliberate choice of attributes: ensures columns are necessary (the data is actually needed) and sufficient (nothing important is missing).
- Establishes keys early: primary and foreign keys are picked while the design is still cheap to change.
- Captures relationships explicitly: so the right joins are possible when users want cross-table queries.
- Drives normalisation: reduces redundancy and the anomalies that come with it (covered in Normalisation).
- Provides a shared artefact: a diagram or schema document that analysts, developers, DBAs, and clients can all read and reason about.
- Catches problems on paper: it is far cheaper to redraw an ERD than to migrate a live production database.
A database built without a schema design phase tends to end up with redundant data, missing constraints, and queries that are far slower than they need to be.
Worked Examples
Example 1: One Database, Three Schemas
The Harbour Run sports club (introduced in Relational Fundamentals) wants a database to track members, classes, and bookings. Here are the same system viewed at all three levels.
Conceptual schema (a one-paragraph business description that a manager would sign off):
The system tracks members of the club, the classes that coaches run, and the bookings that link members to classes. Each member can book many classes; each class can be booked by many members. A booking always belongs to exactly one member and exactly one class. Bookings are made on a specific date and have a status (confirmed, cancelled, attended, no-show).
Notice that nothing here talks about columns, types, or storage. It is purely about entities and relationships in business language.
Logical schema (DBMS-independent, ready to implement):
| Table | Columns | Notes |
|---|---|---|
MEMBER | MemberID (INT, PK), FirstName (VARCHAR 50), Surname (VARCHAR 50), DateOfBirth (DATE), MembershipTier (VARCHAR 20) | |
CLASS | ClassID (VARCHAR 8, PK), Title (VARCHAR 80), Coach (VARCHAR 60), StartsAt (DATETIME) | |
BOOKING | BookingID (VARCHAR 8, PK), MemberID (INT, FK -> MEMBER), ClassID (VARCHAR 8, FK -> CLASS), BookedOn (DATE), Status (VARCHAR 20) | Junction table resolving the M:M relationship between MEMBER and CLASS |
Physical schema (specific to MySQL on the club’s small Linux server):
- Storage engine: InnoDB (transactional, supports foreign keys)
- Default page size: 16 KB
- Indexes:
PRIMARY KEY on each table (auto-clustered by InnoDB)
Secondary index on MEMBER(Surname, FirstName) for member search
Secondary index on BOOKING(BookedOn) for daily booking reports
Composite index on BOOKING(MemberID, BookedOn) for "my bookings" queries
- Backup: nightly mysqldump to off-site cloud bucket
- Replication: none initially (single-server deployment)
Example 2: Same Logical Schema, Two Physical Implementations
A national chain of sports clubs runs the same software at every venue. The logical schema is identical for all 200 clubs. The physical schema differs sharply between the two extremes:
Small village club, 80 members, one server:
- Single MySQL instance on a shared Linux box
- No partitioning, the entire
BOOKINGtable is one file - Three indexes total
- Daily file-based backup; recovery time is hours, which is acceptable
National head-office reporting database, 1.2 million members, 40 million bookings per year:
- Clustered PostgreSQL deployment across three machines
BOOKINGpartitioned by year for fast range queries- Eight indexes including covering indexes for the most common reports
- Continuous streaming replication to a hot standby
- Read replicas serve the dashboards so the primary stays free for writes
Same conceptual and logical schemas, completely different physical schemas. This is exactly the situation the three-layer model was designed to handle.
Example 3: Distinguishing “What” From “How”
Sort the following design decisions into the correct schema layer:
| Decision | Layer |
|---|---|
| “The system needs to know about Customers, Orders, and Products.” | Conceptual |
“The Customer table will have columns CustomerID, Name, Email, JoinDate.” | Logical |
“We will add a B-tree index on Order.CustomerID to speed up the customer-history report.” | Physical |
| “Each Order belongs to exactly one Customer and one Customer can have many Orders.” | Conceptual (relationship at high level) |
“Customer.Email will be VARCHAR(120) and have a unique constraint.” | Logical |
| “Orders older than 5 years will be moved to an archive tablespace on slower disks.” | Physical |
A reliable test: if the decision describes what the business wants tracked, it is conceptual. If it describes how the data is organised into tables, columns, keys, and types, it is logical. If it describes how the data is stored on disk for performance, it is physical.
Quick Check
Q1. Which best defines the term schema?
Q2. Which schema level identifies the main entities and the main relationships between them, without showing column names or data types?
Q3. Adding a B-tree index on a frequently searched column to speed up queries is a decision at which schema level?
Q4. Which is NOT a characteristic of a logical schema?
Q5. What is the typical order in which the three schema levels are designed?
Classify Each Decision
For each design decision, identify which schema layer it belongs to. Use conceptual, logical, or physical.
| Decision | Schema layer |
|---|---|
| "The system tracks Patients, Doctors, and Appointments." | |
Appointment.DurationMinutes will be type INT with a CHECK constraint that the value is positive. | |
Add a covering index on (DoctorID, AppointmentDate) to speed up daily-schedule queries. | |
The primary key of Patient will be PatientID. | |
| One Doctor may have many Appointments; one Appointment is held by exactly one Doctor. | |
Partition the Appointment table by month so old records sit on slower archival disks. |
Fill in the Blanks
Complete the description of the three-schema architecture.
THREE-SCHEMA ARCHITECTURE
=========================
The schema is the high-level view: it identifies
the main entities and the main relationships, in business language.
The schema shows full detail of tables, columns,
data types, primary keys, and foreign keys, but is DBMS-independent.
The schema describes how the data is stored on
disk: file layout, indexes, partitioning, and storage engine.
The three are usually designed in the order
-> -> .
Spot the Error
A student wrote revision notes about the three-schema architecture. One line is wrong. Click the line with the error, then choose the correct fix.
Pick the correct fix for line 3:
Identify the Layer
"The database tracks Books, Members, and Loans. Each Loan involves exactly one Book and one Member. A Member can have many Loans; a Book can be involved in many Loans over time."
Which schema layer does this description correspond to? (Type conceptual, logical, or physical.)
"The DBA decides to add a B-tree index on Loan.LoanDate because the monthly overdue-loans report has become too slow."
Which schema layer does this decision belong to? (Type conceptual, logical, or physical.)
Practice Exercises
Core
-
[Core]Schemas[2 marks]Define the term schema as used in a relational database, and state the three levels of schema. -
[Core]Schemas[4 marks]Describe the difference between a conceptual schema and a logical schema. Give one example of what each would and would not include. -
[Core]Schemas[3 marks]Identify three characteristics of a logical schema.
Extension
-
[Extension]Schemas[6 marks]Distinguish between the conceptual, logical, and physical schemas of a database. For each level, state one decision that belongs at that level and one decision that does not. -
[Extension]Schemas[5 marks]Explain the importance of data modelling in the design of a new relational database for a chain of veterinary clinics.
Challenge
-
[Challenge]Schemas[6 marks]A chain of bookshops runs the same shop-management software in two contexts: one branch is a small village shop, and one is the chain’s head-office reporting hub. Explain why these two installations might share a logical schema yet require very different physical schemas. Refer to at least three physical-level concerns (indexes, partitioning, storage engine, replication, backup, etc.). -
[Challenge]Schemas[8 marks]To what extent is the three-schema architecture still useful in a world where many small applications are built rapidly using a single DBMS and a single developer? Refer to specific advantages of the separation and at least one case where it might be over-engineering.
Note for IB CS learners: A3.2.1 questions tend to be short (1-4 marks) and focused on terminology or layer characteristics. The most common formats are “Define schema” (1 mark), “Identify characteristics of the X schema” (2-3 marks), and “Distinguish between X and Y schemas” (3-4 marks). For Distinguish items, structure your answer as paired contrasts (“X does …, whereas Y does …”) rather than two separate paragraphs.
Connections
- Previous: Relational Database Fundamentals. The model the schema describes.
- Next: Entity-Relationship Diagrams. The standard visual notation for capturing the conceptual and logical schemas.
- Related: Data Types and Keys. The logical-schema decisions about column types and key choices.
- Related: Normalisation. The process that refines the logical schema into a low-redundancy form.
- Forward: SQL Queries. The DDL
CREATE TABLEstatements that turn a logical schema into a working database, andCREATE INDEXstatements that implement physical-schema choices.