SQL Queries
IB Syllabus: A3.3.1: Outline DDL vs DML within SQL. A3.3.2: Construct queries between two tables in SQL. A3.3.4: Construct calculations using aggregate functions (HL).
Key Concepts
What SQL Is
SQL (Structured Query Language) is the standard language for relational databases. Almost every relational DBMS, MySQL, PostgreSQL, SQL Server, Oracle, SQLite, uses SQL with only small dialect differences.
SQL is a declarative language: you describe what data you want, not how the DBMS should fetch it. The DBMS’s query planner decides on the execution strategy. This is why the same SQL query can run sub-millisecond on one schema and grind to a halt on another with no syntax changes, the data structure and indexes matter as much as the query itself.
DDL vs DML (A3.3.1)
SQL is split into two language types you should be able to distinguish:
| Language type | Stands for | What it does | Key commands |
|---|---|---|---|
| DDL | Data Definition Language | Defines the structure of the database, creates, alters, or drops tables and indexes | CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, DROP INDEX |
| DML | Data Manipulation Language | Reads and modifies the data inside the structure | SELECT, INSERT, UPDATE, DELETE |
Two other categories exist (DCL for permissions, GRANT, REVOKE; TCL for transactions, BEGIN, COMMIT, ROLLBACK), but the main split worth learning is DDL versus DML. TCL is covered separately in Transactions and Views at HL.
A useful test: if the command changes the shape of the database (tables, columns, indexes), it is DDL. If it changes the contents (rows of data), it is DML.
A Minimal Working Example
We will use the following 3NF schema throughout the page. It is the Harbour Run sports club from Relational Database Fundamentals, normalised down to 3NF:
-- Created with DDL
CREATE TABLE Member (
MemberID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
Surname VARCHAR(50) NOT NULL,
DateOfBirth DATE NOT NULL,
Tier VARCHAR(20) NOT NULL
);
CREATE TABLE Class (
ClassID VARCHAR(8) PRIMARY KEY,
Title VARCHAR(80) NOT NULL,
Coach VARCHAR(60) NOT NULL,
StartsAt DATETIME NOT NULL,
Capacity INT NOT NULL
);
CREATE TABLE Booking (
BookingID VARCHAR(8) PRIMARY KEY,
MemberID INT NOT NULL,
ClassID VARCHAR(8) NOT NULL,
BookedOn DATE NOT NULL,
Status VARCHAR(20) NOT NULL,
FOREIGN KEY (MemberID) REFERENCES Member(MemberID),
FOREIGN KEY (ClassID) REFERENCES Class(ClassID)
);
The four CREATE TABLE statements above are pure DDL. Everything else on this page, the SELECT, INSERT, UPDATE examples, is DML.
The SELECT Statement: Anatomy
A SELECT query has up to seven clauses, evaluated in a specific order. You do not need every clause every time, but when several appear they must be in this sequence:
SELECT [DISTINCT] columns
FROM tables (with JOINs)
WHERE row-level conditions
GROUP BY grouping columns -- aggregate queries only
HAVING group-level conditions -- aggregate queries only
ORDER BY sort columns [ASC | DESC]
LIMIT row count -- optional, not strictly required
Each clause filters or transforms what the previous ones produced. The DBMS evaluates roughly: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.
SELECT and FROM, the basics
The minimum SELECT statement names columns and a table:
SELECT FirstName, Surname FROM Member;
SELECT * returns every column. Avoid it in real code (the result changes if you add a column later), but it is useful when you want to inspect the data quickly.
SELECT * FROM Member;
WHERE, filtering rows
WHERE removes rows that do not match the condition. Common operators:
| Operator | Example |
|---|---|
= <> | WHERE Tier = 'Premium', WHERE Status <> 'cancelled' |
< <= > >= | WHERE Capacity >= 20 |
BETWEEN ... AND ... | WHERE BookedOn BETWEEN '2026-05-01' AND '2026-05-31' |
IN (...) | WHERE Tier IN ('Premium', 'VIP') |
LIKE '...%...' | WHERE Surname LIKE 'K%' (pattern match) |
IS NULL / IS NOT NULL | WHERE Coach IS NOT NULL |
Boolean operators combine conditions: AND, OR, NOT. Use parentheses when mixing them to make precedence explicit.
SELECT FirstName, Surname
FROM Member
WHERE Tier = 'Premium' AND DateOfBirth < '2005-01-01';
LIKE and the % wildcard
LIKE matches a string against a pattern. The % wildcard means zero or more characters; _ (rarely needed at SL) means exactly one character.
| Pattern | Matches |
|---|---|
'K%' | starts with K (Kahurangi, Knight) |
'%berg' | ends with berg |
'%ohan%' | contains ohan anywhere |
'_ohan' | one character then ohan (Mohan, Tohan) |
SELECT FirstName, Surname
FROM Member
WHERE Surname LIKE 'K%';
ORDER BY, sorting the result
ORDER BY sorts the output. ASC is ascending (the default), DESC is descending. You can sort by several columns; ties are broken by the next column in the list.
SELECT FirstName, Surname, DateOfBirth
FROM Member
ORDER BY Surname ASC, FirstName ASC;
DISTINCT, removing duplicates
SELECT DISTINCT collapses repeated rows in the result:
SELECT DISTINCT Coach FROM Class; -- each coach appears once
JOIN, combining two tables
A JOIN brings rows from two tables together based on a matching column, normally a foreign-key / primary-key link. The minimum you should be able to write is a query that joins two tables.
Modern explicit syntax (preferred):
SELECT Member.FirstName, Member.Surname, Class.Title
FROM Member
INNER JOIN Booking ON Member.MemberID = Booking.MemberID
INNER JOIN Class ON Booking.ClassID = Class.ClassID
WHERE Class.StartsAt > '2026-06-01';
(That query joins three tables. For just two, use a single INNER JOIN.)
Older implicit syntax (also widely accepted by examiners):
SELECT Member.FirstName, Member.Surname, Booking.BookedOn
FROM Member, Booking
WHERE Member.MemberID = Booking.MemberID
AND Member.Tier = 'Premium';
The implicit form lists tables separated by commas in FROM and puts the join condition in WHERE. The explicit INNER JOIN ... ON ... form keeps join logic separate from filter logic, which is easier to read for larger queries. Both forms are accepted in exam answers.
Qualified column names. When two tables share a column name (MemberID in both Member and Booking), you must qualify it as Member.MemberID or Booking.MemberID. The DBMS rejects an unqualified MemberID because it does not know which one you mean.
Mark-Scheme Anatomy of a Two-Table Query
Mark schemes for “Construct an SQL query between two tables” award marks in a predictable pattern. A 4-mark question typically gives:
- 1 mark for selecting the right columns.
- 1 mark for naming the correct tables and including a JOIN.
- 1 mark for the correct join condition (PK = FK).
- 1 mark for each correct
WHEREfilter (often two filters in a single question).
When you write the query, label each line mentally against this rubric, it forces you to include every piece.
GROUP BY and HAVING, summarising data
GROUP BY collapses rows that share a value in a particular column into single summary rows. Use it with aggregate functions (COUNT, SUM, AVG, MIN, MAX). HAVING filters those groups (it is to GROUP BY what WHERE is to FROM).
SELECT ClassID, COUNT(*) AS BookingCount
FROM Booking
GROUP BY ClassID
HAVING COUNT(*) > 10
ORDER BY BookingCount DESC;
The query above counts bookings per class, keeps only classes with more than 10 bookings, and sorts by the booking count, biggest first.
WHERE runs before grouping (removes rows). HAVING runs after grouping (removes groups). A common mistake is putting an aggregate condition in WHERE, the DBMS rejects it because aggregates do not exist yet at the WHERE stage.
Aggregate Functions (A3.3.4, HL only)
HL Only. A3.3.4 is an HL-only learning statement. SL students should recognise these functions but are not required to construct queries with them.
The five aggregate functions worth knowing:
| Function | What it returns |
|---|---|
COUNT(col) | Number of non-null values in the column. COUNT(*) counts all rows including those with nulls. |
SUM(col) | Sum of the values in the column. Only works on numeric columns. |
AVG(col) | Arithmetic mean of the values in the column. Some style guides spell this AVERAGE; both are acceptable. |
MIN(col) | Smallest value (numeric, date, or alphabetical). |
MAX(col) | Largest value (numeric, date, or alphabetical). |
These are typically used in two patterns:
Pattern 1, whole-table aggregate (one row output):
SELECT COUNT(*) AS TotalMembers,
MIN(DateOfBirth) AS Oldest,
MAX(DateOfBirth) AS Youngest
FROM Member;
Pattern 2, grouped aggregate (one row per group):
SELECT Tier, COUNT(*) AS MembersInTier
FROM Member
GROUP BY Tier;
Sample output:
| Tier | MembersInTier |
|---|---|
| Premium | 312 |
| Standard | 588 |
| VIP | 21 |
Adding HAVING filters the groups themselves:
SELECT Coach, COUNT(*) AS ClassesRun
FROM Class
GROUP BY Coach
HAVING COUNT(*) >= 5
ORDER BY ClassesRun DESC;
BETWEEN, IN, and Date Ranges
BETWEEN ... AND ... is shorthand for two >= and <= checks. It is inclusive on both ends.
SELECT Title, StartsAt
FROM Class
WHERE StartsAt BETWEEN '2026-06-01' AND '2026-06-30';
IN (...) matches any value in a list:
SELECT FirstName, Surname
FROM Member
WHERE Tier IN ('Premium', 'VIP');
For “any time on a particular day”, either BETWEEN '2026-06-01 00:00' AND '2026-06-01 23:59' or LIKE '2026-06-01%' works.
Standard Mistakes That Lose Marks
==instead of=. SQL equality is a single equals sign. (==is a programming-language habit that markers tend to accept reluctantly.)- Quoting numeric literals. Use
WHERE Capacity = 20, notWHERE Capacity = '20'. - Quoting columns and reserved words inconsistently. Use single quotes (or double, depending on dialect) for string values, never for column or table names.
- Aggregate in
WHERE.WHERE COUNT(*) > 5is invalid, useHAVING. - Unqualified ambiguous columns. If
MemberIDexists in two tables you joined, you must sayMember.MemberIDto disambiguate. - Missing semicolon. Most DBMSs require statements to end with
;. Examiners are lenient on this, but the habit is worth forming.
Worked Examples
Example 1: Single-Table SELECT with WHERE and ORDER BY
List the first name, surname, and tier of every Premium-tier member, sorted alphabetically by surname.
SELECT FirstName, Surname, Tier
FROM Member
WHERE Tier = 'Premium'
ORDER BY Surname ASC;
Mark-scheme breakdown for a 3-mark version of this question:
- 1 mark for the correct
SELECTcolumn list. - 1 mark for the correct
WHEREfilter. - 1 mark for the correct
ORDER BYclause.
Example 2: Two-Table JOIN with Compound WHERE
List the first name, surname, and class title of every booking made by a member named “Aroha Kahurangi”.
SELECT Member.FirstName, Member.Surname, Class.Title
FROM Member
INNER JOIN Booking ON Member.MemberID = Booking.MemberID
INNER JOIN Class ON Booking.ClassID = Class.ClassID
WHERE Member.FirstName = 'Aroha'
AND Member.Surname = 'Kahurangi';
This is a three-table join but follows the same pattern. For a strictly two-table version, ask “list every booking for any member whose surname is Kahurangi”:
SELECT Member.FirstName, Member.Surname, Booking.BookedOn
FROM Member
INNER JOIN Booking ON Member.MemberID = Booking.MemberID
WHERE Member.Surname = 'Kahurangi';
Mark-scheme breakdown for a 4-mark version:
- 1 mark for the
SELECTcolumns. - 1 mark for naming both tables and using
JOIN. - 1 mark for the join condition
Member.MemberID = Booking.MemberID. - 1 mark for the
WHEREfilter.
Example 3: Pattern Match with LIKE and Multiple Conditions
List every class whose title contains the word “Sunrise”, run by a coach whose surname begins with “O”, starting in June 2026.
SELECT ClassID, Title, Coach, StartsAt
FROM Class
WHERE Title LIKE '%Sunrise%'
AND Coach LIKE '% O%'
AND StartsAt BETWEEN '2026-06-01' AND '2026-06-30';
The pattern '% O%' matches a coach whose name contains a space followed by O (catching “Lila Okafor” or similar).
Example 4 (HL): Aggregate with GROUP BY and HAVING
For every coach, count the bookings made for their classes in June 2026, but only list coaches with at least 50 bookings. Sort by booking count, biggest first.
SELECT Class.Coach, COUNT(*) AS BookingCount
FROM Booking
INNER JOIN Class ON Booking.ClassID = Class.ClassID
WHERE Booking.BookedOn BETWEEN '2026-06-01' AND '2026-06-30'
GROUP BY Class.Coach
HAVING COUNT(*) >= 50
ORDER BY BookingCount DESC;
Mark-scheme breakdown:
- 1 mark for the join + correct join condition.
- 1 mark for the date-range filter in
WHERE. - 1 mark for
GROUP BY Class.Coach. - 1 mark for
COUNT(*)andHAVING COUNT(*) >= 50. - 1 mark for
ORDER BY BookingCount DESC.
Example 5 (HL): Aggregate Variety
Compute summary statistics about June 2026 bookings: total bookings, average bookings per day, oldest booking date in the month, and newest.
SELECT COUNT(*) AS TotalBookings,
AVG(BookingsPerDay) AS AvgPerDay,
MIN(BookedOn) AS Earliest,
MAX(BookedOn) AS Latest
FROM Booking
WHERE BookedOn BETWEEN '2026-06-01' AND '2026-06-30';
(The AVG(BookingsPerDay) example assumes a per-day pre-computed view, in pure form the average of bookings-per-day would be a slightly more complex grouped query; included here to illustrate the AVG function specifically.)
Quick Check
Q1. Which of the following is a DDL command?
Q2. Which strings does the pattern WHERE Surname LIKE '%berg' match?
Q3. Two tables, Member and Booking, both contain a column called MemberID. What must you do when selecting MemberID in a query that joins them?
Q4. Which clause filters groups produced by GROUP BY?
Q5. The condition WHERE BookedOn BETWEEN '2026-06-01' AND '2026-06-30' matches which dates?
Q6 (HL). Which of the following is NOT one of the five core SQL aggregate functions?
Trace the Query
Use the small sample data below to predict the output of each query.
Member
| MemberID | FirstName | Surname | DateOfBirth | Tier |
|---|---|---|---|---|
| 1001 | Aroha | Kahurangi | 2008-03-14 | Standard |
| 1002 | Marlon | Tan | 1995-07-22 | Premium |
| 1003 | Sebastián | Quispe | 1980-11-02 | Standard |
| 1004 | Priya | Nair | 2003-05-30 | Premium |
Booking
| BookingID | MemberID | ClassID | BookedOn |
|---|---|---|---|
| B9001 | 1001 | C401 | 2026-05-28 |
| B9002 | 1002 | C401 | 2026-05-28 |
| B9003 | 1002 | C402 | 2026-05-29 |
| B9004 | 1004 | C403 | 2026-05-30 |
Type the answer (a number, or a comma-separated list of names, depending on the question).
| Query | Result |
|---|---|
SELECT COUNT(*) FROM Member WHERE Tier = 'Premium'; | |
SELECT FirstName FROM Member WHERE Surname LIKE 'K%'; | |
SELECT COUNT(*) FROM Booking WHERE BookedOn = '2026-05-28'; | |
SELECT MemberID FROM Booking WHERE ClassID = 'C402'; | |
SELECT COUNT(DISTINCT MemberID) FROM Booking; |
Fill in the Blanks
Complete the SQL query that lists the first name and surname of every Premium-tier member who has at least one booking, sorted alphabetically by surname.
SELECT Member.FirstName, Member.Surname
FROM Member
INNER Booking ON Member.MemberID = Booking.MemberID
Member.Tier = 'Premium'
Member.MemberID, Member.FirstName, Member.Surname
HAVING COUNT(*) >= 1
Member.Surname ;
Spot the Error
A student is writing an SQL query to find classes with more than 10 bookings. One line is wrong. Click the line with the error, then choose the correct fix.
Pick the correct fix for line 4:
Predict the Output
Given the sample Member table from the Trace section, what does the following query return?
SELECT FirstName FROM Member WHERE Tier = 'Premium';Type the first names of the matching members, separated by commas in the order they appear in the table.
Given the sample Booking table from the Trace section, what does the following query return?
SELECT COUNT(*) FROM Booking WHERE BookedOn = '2026-05-28';Type the result as a single integer.
Practice Exercises
Core
-
[Core]SQL: SELECT[3 marks]Outline the difference between DDL and DML, naming two commands from each category. -
[Core]SQL: SELECT[3 marks]Write an SQL query against theMembertable that lists the first name and surname of every member born before 1 January 2000, sorted by surname alphabetically. -
[Core]SQL: SELECT[3 marks]Write an SQL query that lists every class whoseTitlecontains the word “Run”, sorted byStartsAt(earliest first).
Extension
-
[Extension]SQL: SELECT[4 marks]Construct an SQL query that lists the first name, surname, and booking date of every booking made by members whose tier is'Premium'. -
[Extension]SQL: SELECT[4 marks]Construct an SQL query that lists the class title of every class booked by Aroha Kahurangi (FirstName = 'Aroha',Surname = 'Kahurangi') in May 2026. -
[Extension]SQL: SELECT[3 marks]Write an SQL query that lists the distinct coaches of every class scheduled in June 2026, sorted alphabetically.
Challenge (HL)
-
[Challenge]Aggregates (HL)[5 marks]Construct an SQL query that, for each membership tier, returns the tier name and the average age (in years) of members in that tier. UseMember.DateOfBirthand assume the functionYEAR(date)extracts the year. -
[Challenge]Aggregates (HL)[5 marks]Construct an SQL query that lists every coach who has run more than 5 classes in 2026, ordered by the number of classes they have run, biggest first. -
[Challenge]Aggregates (HL)[6 marks]Construct an SQL query that lists, for each class title, the total number of confirmed bookings (Status = 'confirmed') made in June 2026, including only classes with at least 10 such bookings, sorted by booking count descending.
Exam-Style
[Exam-Style]SQL: SELECT[2 marks]Given the sample tables above, state the output of the querySELECT COUNT(DISTINCT MemberID) FROM Booking;
Note for IB CS learners: A3.3.2 has a major scope shift in the 2027 syllabus: students must now write SQL, not just describe queries in plain English. A3.3.3 (INSERT/UPDATE/DELETE) and A3.3.4 (HL aggregates) are also new SQL-writing expectations. Expect entirely new question patterns from 2027 onwards.
Connections
- Previous: Normalisation. A 3NF schema is what SQL queries operate on; the JOIN clauses follow the foreign-key links created during normalisation.
- Related: Data Types and Keys. Column data types determine which SQL operators and pattern matches make sense.
- Related: Entity-Relationship Diagrams. A JOIN traces one of the relationship lines on an ERD.
- Next: SQL Updates. The DML commands
INSERT INTO.UPDATE SET, andDELETEthat modify data. - Forward (HL): Transactions and Views. Transactions group multiple SQL statements into an all-or-nothing unit; views package complex SELECT queries as named, reusable objects.