GPU
IB Syllabus: A1.1.2 — GPU, A1.1.3 (HL) — CPU vs GPU comparison
Table of Contents
- Key Concepts
- Worked Examples
- Quick Code Check
- Spot the Error
- Fill in the Blanks
- Predict the Output
- Practice Exercises
- Connections
Key Concepts
What is a GPU?
A GPU (Graphics Processing Unit) is a specialised processor designed for parallel computation. While a CPU has a small number of complex cores optimised for sequential tasks, a GPU contains thousands of simple cores that execute the same operation on many data points simultaneously.
GPUs were originally built for graphics rendering — calculating the colour and position of millions of pixels every frame. Today they are also used for general-purpose computing (GPGPU), where any problem that involves repetitive calculations on large datasets can benefit from GPU acceleration.
Real-World Applications
- Gaming — rendering 3D graphics in real-time; calculating lighting, shadows, and textures for millions of pixels per frame
- AI / Machine Learning — training neural networks involves massive matrix multiplications that map perfectly to parallel execution
- Scientific simulations — weather modelling, molecular dynamics, fluid simulations — all require the same calculation applied to millions of data points
- Video rendering — encoding and decoding video frames in parallel dramatically reduces processing time
- Cryptocurrency mining — performing billions of hash calculations in parallel (historical note: this drove GPU shortages in 2017-2021)
GPU Architecture Overview
A GPU is organised into Streaming Multiprocessors (SMs), each containing many simple cores. Key architectural features include:
- Massive parallelism — thousands of cores working simultaneously
- High memory bandwidth — optimised for moving large amounts of data quickly (throughput), rather than accessing individual values quickly (latency)
- SIMD processing — Single Instruction, Multiple Data — every core executes the same instruction on different data points at the same time
- Simple cores — each core is far less capable than a CPU core, but the sheer number of them compensates when the workload is parallelisable
Enrichment: Rendering Pipeline
This goes beyond the IB syllabus but helps build understanding.
When a GPU renders a 3D scene, data flows through a series of stages called the rendering pipeline:
- Vertex processing — transforms 3D coordinates of each vertex (corner point) into 2D screen positions; applies rotation, scaling, and perspective
- Rasterization — converts the geometric shapes (triangles) into individual pixels on the screen; determines which pixels each triangle covers
- Fragment processing — calculates the final colour of each pixel by applying textures, lighting, and shading effects
- Display — the completed frame is sent to the monitor’s frame buffer for display
Each stage processes many elements in parallel — thousands of vertices, then millions of pixels — which is why GPUs need so many cores.
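The vertex-processing stage can be illustrated with a toy sketch. This is plain Python, not real GPU code, and every name in it is our own invention; a real pipeline uses matrix transforms and runs this per-vertex across thousands of cores.

```python
# Illustrative sketch (our own simplified model, not real GPU code):
# the vertex-processing stage maps a 3D point to a 2D screen position
# using perspective projection.

def project_vertex(x, y, z, focal_length=1.0):
    """Perspective projection: points farther away (larger z)
    land closer to the centre of the screen."""
    return (focal_length * x / z, focal_length * y / z)

# The same transformation is applied independently to every vertex --
# exactly the kind of uniform, repetitive work a GPU parallelises.
vertices = [(1.0, 2.0, 2.0), (2.0, 4.0, 4.0), (-1.0, 1.0, 1.0)]
screen_points = [project_vertex(x, y, z) for x, y, z in vertices]
print(screen_points)  # [(0.5, 1.0), (0.5, 1.0), (-1.0, 1.0)]
```

Note how the first two vertices project to the same screen point: the second is twice as far away and twice as large, so perspective cancels out — the effect the projection stage exists to create.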
Enrichment: Specialised GPU Cores
This goes beyond the IB syllabus but helps build understanding.
Modern GPUs contain different types of specialised cores:
- CUDA cores (NVIDIA) / Stream processors (AMD) — general-purpose parallel processing cores that handle a wide range of computations
- Tensor cores — optimised specifically for matrix operations used in AI and machine learning; can perform mixed-precision matrix multiply-and-accumulate in a single operation
- RT cores — dedicated hardware for real-time ray tracing, which simulates realistic lighting by tracing the path of light rays as they bounce off surfaces
HL Section: CPU vs GPU (A1.1.3)
HL Only — The following section on CPU vs GPU comparison is assessed at HL level only.
Detailed Comparison
| Aspect | CPU | GPU |
|---|---|---|
| Design philosophy | Latency-optimised | Throughput-optimised |
| Core count | Few (4-64) complex cores | Many (thousands) simple cores |
| Core complexity | Full ALU, branch prediction, out-of-order execution | Simple ALU, in-order execution |
| Processing model | Sequential / few parallel threads | Massively parallel (SIMD/SIMT) |
| Memory access | Large caches, low latency | High bandwidth, higher latency |
| Best for | Complex logic, branching, sequential tasks | Repetitive calculations on large datasets |
| Power efficiency | Better per-core | Better per-operation (for parallel workloads) |
| Task coordination | OS handles threads | GPU scheduler assigns warps/wavefronts |
Why GPUs Struggle with Branching
GPUs achieve their speed by executing the same instruction across all cores simultaneously (SIMD — Single Instruction, Multiple Data). When code contains many if/else branches, different cores may need to follow different paths. The GPU handles this by running both branches and discarding the unused results, which wastes processing power. This is called branch divergence.
For tasks with heavy conditional logic — such as compiling code, running an operating system, or processing complex business rules — a CPU’s sophisticated branch prediction and out-of-order execution make it far more efficient.
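Branch divergence can be modelled with a short sketch. This is a deliberately simplified picture in plain Python (function names are our own): a SIMD unit cannot let each lane take a different path, so it evaluates both branches for every element and uses a mask to keep the result each lane actually needed.

```python
# Sketch of branch divergence (our own simplified model, not real GPU code).
# Every "lane" computes BOTH branches; a mask selects which result to keep,
# so the other branch's work is wasted.

def simd_if_else(data, condition, then_op, else_op):
    mask = [condition(x) for x in data]          # which lanes take the 'if' path
    then_results = [then_op(x) for x in data]    # ALL lanes run the if-branch
    else_results = [else_op(x) for x in data]    # ALL lanes run the else-branch too
    # keep the result each lane needed; the other computation is discarded
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]

# Example: double even numbers, negate odd ones.
values = [1, 2, 3, 4]
print(simd_if_else(values, lambda x: x % 2 == 0,
                   lambda x: x * 2, lambda x: -x))  # [-1, 4, -3, 8]
```

Because both branches always execute, heavily conditional code roughly doubles (or worse, with nested branches) the work per element — which is why such code belongs on a CPU.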
SIMD Concept
Single Instruction, Multiple Data means one instruction is broadcast to many cores, each operating on a different piece of data. For example, to brighten an image, the instruction “add 20 to pixel value” is sent once but executed on thousands of pixels simultaneously. This is why GPUs excel at image processing, matrix math, and physics simulations — they all involve applying the same operation to large datasets.
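The brightening example can be sketched in plain Python. A real GPU would execute the addition across thousands of pixels at once; here a single list comprehension stands in for that broadcast.

```python
# Conceptual SIMD sketch: one instruction ("add 20") applied to every
# pixel value, clamped to the valid 0-255 range. On a GPU each pixel
# would be handled by a separate core simultaneously.

def brighten(pixels, amount=20):
    return [min(p + amount, 255) for p in pixels]

pixels = [10, 100, 250, 0]
print(brighten(pixels))  # [30, 120, 255, 20]
```

The clamp to 255 matters: 250 + 20 would overflow an 8-bit colour channel, so the value saturates at the maximum instead.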
Worked Examples
Example 1: Which Processor?
For each task, determine whether a CPU or GPU would be more suitable, and explain why.
| Task | Best processor | Reasoning |
|---|---|---|
| Word processing | CPU | Sequential operations with complex branching (spell-check, formatting rules); low parallelism |
| Training a neural network | GPU | Involves massive matrix multiplications — the same operation applied to millions of weights in parallel |
| Compiling code | CPU | Complex dependency analysis between code files; sequential stages with heavy branching |
| Real-time 3D rendering | GPU | Millions of pixels must be calculated every frame; same lighting/shading calculation applied to each pixel in parallel |
The pattern: if a task involves repeating the same simple calculation on many data points, the GPU wins. If the task involves complex decision-making, branching, or sequential dependencies, the CPU wins.
Example 2: GPU Parallelism Analogy
Consider a school analogy to understand the difference:
- CPU approach = 1 expert teacher grading 30 essays one at a time. Each essay requires deep reading, nuanced judgment, and different feedback. The teacher is fast per essay (complex reasoning) but must work sequentially. Total time: 30 essays at 10 minutes each = 300 minutes.
- GPU approach = 1,000 teaching assistants each grading 1 simple multiple-choice quiz simultaneously. Each quiz is identical in format and requires only comparing answers to an answer key (same operation, different data). Each TA is slower than the expert teacher at complex tasks, but the work is done in parallel. Total time: 1,000 quizzes in about 1 minute.
The key insight: GPUs are not “faster” than CPUs in general. They are faster at specific types of work — simple, repetitive operations that can be split across many cores.
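The analogy's arithmetic can be checked directly (the numbers below come straight from the analogy above):

```python
# Sequential (CPU-style) vs parallel (GPU-style) total time,
# using the teacher/TA numbers from the analogy.

essays, minutes_per_essay = 30, 10
cpu_total = essays * minutes_per_essay               # one worker, one at a time

quizzes, minutes_per_quiz, workers = 1000, 1, 1000
gpu_total = (quizzes / workers) * minutes_per_quiz   # all workers at once

print(cpu_total)  # 300 minutes
print(gpu_total)  # 1.0 minute
```

The GPU-style total only stays low while the work divides evenly across workers; if the quizzes depended on each other, the parallelism would collapse back toward the sequential case.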
Quick Code Check
Q1. What is the main advantage of a GPU compared to a CPU?
Q2. Which type of task is least suitable for GPU processing, and why?
Q3. What does the acronym SIMD stand for?
Q4. Why are GPUs widely used for training AI models?
Spot the Error
A student wrote this description of GPU architecture for their exam. One line contains a common misconception. Identify the incorrect statement and write a corrected version.
Fill in the Blanks
Complete the description of GPU architecture by filling in each blank.
Fill in the blanks to complete this summary of how GPUs work:
A GPU contains __________ of simple cores, while a CPU has __________ complex cores.
GPUs are optimised for __________ rather than latency.
The processing model used by GPUs is called __________ (same instruction on multiple data points).
Two major GPU applications are training __________ models and rendering __________ graphics, because these involve repetitive calculations on large datasets.
Predict the Output
A program needs to sort a list of 1,000 names alphabetically, with many conditional comparisons and string operations. Each comparison depends on the result of previous swaps. Would a CPU or GPU be more suitable for this task?
A photo editing application needs to increase the brightness of every pixel in a 4K image (over 8 million pixels) by adding 20 to each colour channel. Would a CPU or GPU be more suitable?
Practice Exercises
Core
- Real-World Scenarios — Describe the GPU's role in three different real-world scenarios. For each, explain what kind of data the GPU processes in parallel and why a CPU alone would be too slow.
- Architecture Differences — List three key differences between CPU and GPU architecture. For each difference, explain how it affects what tasks the processor is best suited for.
- Gaming and GPUs — Explain why GPUs are essential for modern gaming. In your answer, reference the number of pixels that must be calculated per frame and why parallel processing is necessary.
Extension
- Database Evaluation — A school runs a student database system that handles enrollment records, generates report cards, and processes individual student queries. Evaluate whether adding a GPU would significantly improve performance. Justify your answer by analysing the types of operations involved.
- Image Processing Comparison — Compare how a CPU and GPU would each handle processing 1 million image pixels (e.g., converting a photo from colour to greyscale). Describe the approach each processor would take and estimate the relative time difference.
Challenge
- System Design — A company wants to train an AI model during the day and run their office productivity software (email, spreadsheets, word processing) throughout the day. Design a system that uses both CPU and GPU effectively, explaining the role of each processor for each type of workload.
- (HL) Branch Divergence — Explain why a GPU would struggle with a program that has many if/else branches and sequential dependencies. Use the concept of SIMD and branch divergence in your answer, and give a specific example of such a program.
Connections
- Prerequisites: CPU Architecture — understand CPU components and the FDE cycle before comparing to GPU
- Related: Logic Gates — logic gates form the fundamental building blocks of both CPU and GPU cores
- Forward: Cloud Computing — cloud providers offer GPU instances (e.g., AWS, Google Cloud) for AI/ML workloads
- Forward: Applications Layer — machine learning applications rely on GPU acceleration for training and inference