GPU
IB Syllabus: A1.1.2 — GPU, A1.1.3 (HL) — CPU vs GPU comparison
Table of Contents
- Key Concepts
- Worked Examples
- Quick Code Check
- Spot the Error
- Fill in the Blanks
- Predict the Output
- Practice Exercises
- Connections
Key Concepts
What is a GPU?
A GPU (Graphics Processing Unit) is a specialised processor designed for parallel computation. While a CPU has a small number of complex cores optimised for sequential tasks, a GPU contains thousands of simple cores that execute the same operation on many data points simultaneously.
GPUs were originally built for graphics rendering — calculating the colour and position of millions of pixels every frame. Today they are also used for general-purpose computing (GPGPU), where any problem that involves repetitive calculations on large datasets can benefit from GPU acceleration.
Real-World Applications
- Gaming — rendering 3D graphics in real-time; calculating lighting, shadows, and textures for millions of pixels per frame
- AI / Machine Learning — training neural networks involves massive matrix multiplications that map perfectly to parallel execution
- Scientific simulations — weather modelling, molecular dynamics, fluid simulations — all require the same calculation applied to millions of data points
- Video rendering — encoding and decoding video frames in parallel dramatically reduces processing time
- Cryptocurrency mining — performing billions of hash calculations in parallel (historical note: this drove GPU shortages in 2017-2021)
GPU Architecture Overview
A GPU is organised into Streaming Multiprocessors (SMs), each containing many simple cores. Key architectural features include:
- Massive parallelism — thousands of cores working simultaneously
- High memory bandwidth — optimised for moving large amounts of data quickly (throughput), rather than accessing individual values quickly (latency)
- SIMD processing — Single Instruction, Multiple Data — every core executes the same instruction on different data points at the same time
- Simple cores — each core is far less capable than a CPU core, but the sheer number of them compensates when the workload is parallelisable
Enrichment: Rendering Pipeline
This goes beyond the IB syllabus but helps build understanding.
When a GPU renders a 3D scene, data flows through a series of stages called the rendering pipeline:
- Vertex processing — transforms 3D coordinates of each vertex (corner point) into 2D screen positions; applies rotation, scaling, and perspective
- Rasterization — converts the geometric shapes (triangles) into individual pixels on the screen; determines which pixels each triangle covers
- Fragment processing — calculates the final colour of each pixel by applying textures, lighting, and shading effects
- Display — the completed frame is sent to the monitor’s frame buffer for display
Each stage processes many elements in parallel — thousands of vertices, then millions of pixels — which is why GPUs need so many cores.
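The vertex-processing stage can be illustrated with a toy sketch. This is plain Python, not real GPU code, and every name in it is our own invention; a real pipeline uses matrix transforms and runs this per-vertex across thousands of cores.

```python
# Illustrative sketch (our own simplified model, not real GPU code):
# the vertex-processing stage maps a 3D point to a 2D screen position
# using perspective projection.

def project_vertex(x, y, z, focal_length=1.0):
    """Perspective projection: points farther away (larger z)
    land closer to the centre of the screen."""
    return (focal_length * x / z, focal_length * y / z)

# The same transformation is applied independently to every vertex --
# exactly the kind of uniform, repetitive work a GPU parallelises.
vertices = [(1.0, 2.0, 2.0), (2.0, 4.0, 4.0), (-1.0, 1.0, 1.0)]
screen_points = [project_vertex(x, y, z) for x, y, z in vertices]
print(screen_points)  # [(0.5, 1.0), (0.5, 1.0), (-1.0, 1.0)]
```

Note how the first two vertices project to the same screen point: the second is twice as far away and twice as large, so perspective cancels out — the effect the projection stage exists to create.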
Enrichment: Specialised GPU Cores
This goes beyond the IB syllabus but helps build understanding.
Modern GPUs contain different types of specialised cores:
- CUDA cores (NVIDIA) / Stream processors (AMD) — general-purpose parallel processing cores that handle a wide range of computations
- Tensor cores — optimised specifically for matrix operations used in AI and machine learning; can perform mixed-precision matrix multiply-and-accumulate in a single operation
- RT cores — dedicated hardware for real-time ray tracing, which simulates realistic lighting by tracing the path of light rays as they bounce off surfaces
HL Section: CPU vs GPU (A1.1.3)
HL Only — The following section on CPU vs GPU comparison is assessed at HL level only.
Detailed Comparison
| Aspect | CPU | GPU |
|---|---|---|
| Design philosophy | Latency-optimised | Throughput-optimised |
| Core count | Few (4-64) complex cores | Many (thousands) simple cores |
| Core complexity | Full ALU, branch prediction, out-of-order execution | Simple ALU, in-order execution |
| Processing model | Sequential / few parallel threads | Massively parallel (SIMD/SIMT) |
| Memory access | Large caches, low latency | High bandwidth, higher latency |
| Best for | Complex logic, branching, sequential tasks | Repetitive calculations on large datasets |
| Power efficiency | Better per-core | Better per-operation (for parallel workloads) |
| Task coordination | OS handles threads | GPU scheduler assigns warps/wavefronts |
Why GPUs Struggle with Branching
GPUs achieve their speed by executing the same instruction across all cores simultaneously (SIMD — Single Instruction, Multiple Data). When code contains many if/else branches, different cores may need to follow different paths. The GPU handles this by running both branches and discarding the unused results, which wastes processing power. This is called branch divergence.
For tasks with heavy conditional logic — such as compiling code, running an operating system, or processing complex business rules — a CPU’s sophisticated branch prediction and out-of-order execution make it far more efficient.
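Branch divergence can be modelled with a short sketch. This is a deliberately simplified picture in plain Python (function names are our own): a SIMD unit cannot let each lane take a different path, so it evaluates both branches for every element and uses a mask to keep the result each lane actually needed.

```python
# Sketch of branch divergence (our own simplified model, not real GPU code).
# Every "lane" computes BOTH branches; a mask selects which result to keep,
# so the other branch's work is wasted.

def simd_if_else(data, condition, then_op, else_op):
    mask = [condition(x) for x in data]          # which lanes take the 'if' path
    then_results = [then_op(x) for x in data]    # ALL lanes run the if-branch
    else_results = [else_op(x) for x in data]    # ALL lanes run the else-branch too
    # keep the result each lane needed; the other computation is discarded
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]

# Example: double even numbers, negate odd ones.
values = [1, 2, 3, 4]
print(simd_if_else(values, lambda x: x % 2 == 0,
                   lambda x: x * 2, lambda x: -x))  # [-1, 4, -3, 8]
```

Because both branches always execute, heavily conditional code roughly doubles (or worse, with nested branches) the work per element — which is why such code belongs on a CPU.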
SIMD Concept
Single Instruction, Multiple Data means one instruction is broadcast to many cores, each operating on a different piece of data. For example, to brighten an image, the instruction “add 20 to pixel value” is sent once but executed on thousands of pixels simultaneously. This is why GPUs excel at image processing, matrix math, and physics simulations — they all involve applying the same operation to large datasets.
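The brightening example can be sketched in plain Python. A real GPU would execute the addition across thousands of pixels at once; here a single list comprehension stands in for that broadcast.

```python
# Conceptual SIMD sketch: one instruction ("add 20") applied to every
# pixel value, clamped to the valid 0-255 range. On a GPU each pixel
# would be handled by a separate core simultaneously.

def brighten(pixels, amount=20):
    return [min(p + amount, 255) for p in pixels]

pixels = [10, 100, 250, 0]
print(brighten(pixels))  # [30, 120, 255, 20]
```

The clamp to 255 matters: 250 + 20 would overflow an 8-bit colour channel, so the value saturates at the maximum instead.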
Worked Examples
Example 1: Which Processor?
For each task, determine whether a CPU or GPU would be more suitable, and explain why.
| Task | Best processor | Reasoning |
|---|---|---|
| Word processing | CPU | Sequential operations with complex branching (spell-check, formatting rules); low parallelism |
| Training a neural network | GPU | Involves massive matrix multiplications — the same operation applied to millions of weights in parallel |
| Compiling code | CPU | Complex dependency analysis between code files; sequential stages with heavy branching |
| Real-time 3D rendering | GPU | Millions of pixels must be calculated every frame; same lighting/shading calculation applied to each pixel in parallel |
The pattern: if a task involves repeating the same simple calculation on many data points, the GPU wins. If the task involves complex decision-making, branching, or sequential dependencies, the CPU wins.
Example 2: GPU Parallelism Analogy
Consider a school analogy to understand the difference:
- CPU approach = 1 expert teacher grading 30 essays one at a time. Each essay requires deep reading, nuanced judgment, and different feedback. The teacher is fast per essay (complex reasoning) but must work sequentially. Total time: 30 essays at 10 minutes each = 300 minutes.
- GPU approach = 1,000 teaching assistants each grading 1 simple multiple-choice quiz simultaneously. Each quiz is identical in format and requires only comparing answers to an answer key (same operation, different data). Each TA is slower than the expert teacher at complex tasks, but the work is done in parallel. Total time: 1,000 quizzes in about 1 minute.
The key insight: GPUs are not “faster” than CPUs in general. They are faster at specific types of work — simple, repetitive operations that can be split across many cores.
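The analogy's arithmetic can be checked directly (the numbers below come straight from the analogy above):

```python
# Sequential (CPU-style) vs parallel (GPU-style) total time,
# using the teacher/TA numbers from the analogy.

essays, minutes_per_essay = 30, 10
cpu_total = essays * minutes_per_essay               # one worker, one at a time

quizzes, minutes_per_quiz, workers = 1000, 1, 1000
gpu_total = (quizzes / workers) * minutes_per_quiz   # all workers at once

print(cpu_total)  # 300 minutes
print(gpu_total)  # 1.0 minute
```

The GPU-style total only stays low while the work divides evenly across workers; if the quizzes depended on each other, the parallelism would collapse back toward the sequential case.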
Quick Code Check
Q1. What is the main advantage of a GPU compared to a CPU?
Q2. Which type of task is least suitable for GPU processing, and why?
Q3. What does the acronym SIMD stand for?
Q4. Why are GPUs widely used for training AI models?
Spot the Error
A student wrote this description of GPU architecture for their exam. One line contains a common misconception. Identify the incorrect statement and write a corrected version.
Fill in the Blanks
Complete the description of GPU architecture by filling in each blank.
Fill in the blanks to complete this summary of how GPUs work:
A GPU contains __________ of simple cores, while a CPU has __________ complex cores.
GPUs are optimised for __________ rather than latency.
The processing model used by GPUs is called __________ (same instruction on multiple data points).
Two major GPU applications are training __________ models and rendering __________ graphics, because these involve repetitive calculations on large datasets.
Predict the Output
A program needs to sort a list of 1,000 names alphabetically, with many conditional comparisons and string operations. Each comparison depends on the result of previous swaps. Would a CPU or GPU be more suitable for this task?
A photo editing application needs to increase the brightness of every pixel in a 4K image (over 8 million pixels) by adding 20 to each colour channel. Would a CPU or GPU be more suitable?
Practice Exercises
Core
- Real-World Scenarios — Describe the GPU's role in three different real-world scenarios. For each, explain what kind of data the GPU processes in parallel and why a CPU alone would be too slow.
- Architecture Differences — List three key differences between CPU and GPU architecture. For each difference, explain how it affects what tasks the processor is best suited for.
- Gaming and GPUs — Explain why GPUs are essential for modern gaming. In your answer, reference the number of pixels that must be calculated per frame and why parallel processing is necessary.
Extension
- Database Evaluation — A school runs a student database system that handles enrollment records, generates report cards, and processes individual student queries. Evaluate whether adding a GPU would significantly improve performance. Justify your answer by analysing the types of operations involved.
- Image Processing Comparison — Compare how a CPU and GPU would each handle processing 1 million image pixels (e.g., converting a photo from colour to greyscale). Describe the approach each processor would take and estimate the relative time difference.
Challenge
- System Design — A company wants to train an AI model during the day and run their office productivity software (email, spreadsheets, word processing) throughout the day. Design a system that uses both CPU and GPU effectively, explaining the role of each processor for each type of workload.
- (HL) Branch Divergence — Explain why a GPU would struggle with a program that has many if/else branches and sequential dependencies. Use the concept of SIMD and branch divergence in your answer, and give a specific example of such a program.
Connections
- Prerequisites: CPU Architecture — understand CPU components and the FDE cycle before comparing to GPU
- Related: Logic Gates — logic gates form the fundamental building blocks of both CPU and GPU cores
- Forward: Cloud Computing — cloud providers offer GPU instances (e.g., AWS, Google Cloud) for AI/ML workloads
- Forward: Applications Layer — machine learning applications rely on GPU acceleration for training and inference