CPU Architecture
IB Syllabus: A1.1.1 – CPU components, A1.1.5 – FDE cycle, A1.1.6 (HL) – Pipelining
Table of Contents
- Key Concepts
- Fetch-Decode-Execute Cycle (A1.1.5)
- Enrichment: Half Adder and Full Adder
- Enrichment: Little Man Computer (LMC)
- HL Section: Pipelining (A1.1.6)
- Worked Examples
- Quick Code Check
- Trace Exercise
- Spot the Error
- Fill in the Blanks
- Predict the Output
- Practice Exercises
- Connections
Key Concepts
CPU Components (A1.1.1)
The Central Processing Unit (CPU) is the brain of the computer. It fetches instructions from memory, decodes them, and executes them. Every program you run – from a web browser to a game – is ultimately a sequence of instructions processed by the CPU.
The CPU contains three main functional units:
ALU (Arithmetic Logic Unit)
- Performs arithmetic operations: addition, subtraction, multiplication, division
- Performs logic operations: AND, OR, NOT, comparisons (equal, greater than, less than)
- Takes inputs from registers and writes results back to the Accumulator (AC)
CU (Control Unit)
- Directs the operation of the entire CPU
- Sends control signals to other components (ALU, memory, I/O devices)
- Manages the timing and sequencing of operations
- Interprets (decodes) instructions fetched from memory
Registers – small, ultra-fast storage locations inside the CPU:
| Register | Full Name | Purpose |
|---|---|---|
| PC | Program Counter | Holds the address of the next instruction to fetch |
| MAR | Memory Address Register | Holds the address currently being accessed in memory |
| MDR | Memory Data Register | Holds the data being read from or written to memory |
| IR | Instruction Register | Holds the current instruction being decoded and executed |
| AC | Accumulator | Stores the result of ALU operations |
A common exam mistake is confusing MAR and MDR. Remember: MAR holds the address (where to look), MDR holds the data (what was found there).
Buses
Buses are electrical pathways that carry signals between CPU components and memory:
| Bus | Direction | Purpose |
|---|---|---|
| Address Bus | Unidirectional (CPU -> memory) | Carries the memory address being accessed |
| Data Bus | Bidirectional (CPU <-> memory) | Carries the data being read or written |
| Control Bus | Bidirectional | Carries control signals: read/write, clock pulse, interrupt |
The address bus is the only unidirectional bus – the CPU tells memory where to look, but memory never sends addresses back. The data bus must be bidirectional because data flows in both directions (reading and writing).
CPU Architecture Diagram
+------------------- CPU -------------------+
| +---------+ +------------------+ |
| | CU | | Registers | |
| | | | PC MAR MDR | |
| | | | IR AC | |
| +----+----+ +--------+---------+ |
| | | |
| +----+-----------------------+-----+ |
| | ALU | |
| +----------------------------------+ |
+-------------------+-------------------+---+
|
+----------+----------+
| Address | Data | Control
| Bus | Bus | Bus
+----------+----------+
|
+-----+-----+
| Memory |
+-----------+
Single-Core vs Multi-Core Processors
A single-core processor has one processing unit and can execute one instruction stream at a time. It gives the illusion of multitasking by rapidly switching between tasks (context switching).
A multi-core processor contains two or more independent cores on the same chip. Each core can execute instructions simultaneously, enabling true parallel processing. A quad-core CPU can genuinely run four independent tasks at the same time.
However, not all tasks benefit equally from multiple cores. Tasks that are sequential (each step depends on the previous result) cannot be parallelised – they run on a single core regardless of how many cores are available.
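A rough way to reason about multi-core speedup is to compare the total time when tasks run back to back on one core versus spread across several. The sketch below is illustrative only (real OS schedulers are far more sophisticated); it assigns independent tasks greedily to whichever core is least loaded:

```python
# Illustrative sketch: estimated finish time for INDEPENDENT tasks
# spread across n cores using a greedy "least-loaded core" rule.
import heapq

def makespan(task_times, cores):
    """Return the time at which the last task finishes."""
    loads = [0] * cores                          # finish time of each core
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):   # longest tasks first
        lightest = heapq.heappop(loads)          # least-loaded core
        heapq.heappush(loads, lightest + t)
    return max(loads)

# One core: everything runs sequentially.
print(makespan([4, 3], cores=1))   # 7
# Two cores: independent tasks overlap.
print(makespan([4, 3], cores=2))   # 4
```

Note the model only applies when the tasks are independent: if the 3-second task depends on the 4-second task's output, both configurations take 7 seconds, which is exactly the sequential-dependency point made above.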
Co-processors
A co-processor is a specialised processor that works alongside the CPU to handle specific types of computation more efficiently:
- GPU (Graphics Processing Unit) – optimised for parallel graphics and mathematical operations
- DSP (Digital Signal Processor) – optimised for audio, video, and signal processing
- TPU (Tensor Processing Unit) – optimised for machine learning operations
Co-processors free the CPU to focus on general-purpose tasks while offloading specialised work.
Fetch-Decode-Execute Cycle (A1.1.5)
The fetch-decode-execute (FDE) cycle is the fundamental process by which the CPU processes every instruction. It repeats continuously while the computer is running.
Phase 1: Fetch
- The address stored in the PC is copied to the MAR
- The CPU sends a read signal along the control bus and the address along the address bus
- The instruction stored at that memory address is fetched via the data bus into the MDR
- The instruction is copied from the MDR into the IR
- The PC is incremented by 1 (pointing to the next instruction)
Phase 2: Decode
- The CU examines the instruction in the IR
- The CU determines the type of operation (e.g., load, add, store, branch)
- The CU identifies any operands (data or memory addresses the instruction needs)
- The CU prepares the necessary control signals for execution
Phase 3: Execute
- The ALU or other components carry out the decoded instruction
- For arithmetic/logic operations, the ALU performs the calculation and stores the result in the AC
- For memory operations (load/store), data is transferred between registers and memory
- For branch instructions, the PC is updated to the target address
The cycle then repeats from Fetch, using the updated PC to get the next instruction.
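The register transfers above can be traced with a minimal simulator. This is a sketch, not real hardware: memory is a Python dict, instructions are `(opcode, operand)` tuples, and the loop mirrors the three phases just described:

```python
# Minimal FDE-cycle sketch. Memory is a dict mapping addresses to
# either (opcode, operand) instruction tuples or plain data values.

def run(memory, pc=0):
    mar = mdr = ir = None
    ac = 0
    while True:
        # --- Fetch ---
        mar = pc                      # PC -> MAR
        mdr = memory[mar]             # Memory[MAR] -> MDR (via data bus)
        ir = mdr                      # MDR -> IR
        pc += 1                       # PC now points at the next instruction
        # --- Decode ---
        opcode, operand = ir          # CU splits the instruction
        # --- Execute ---
        if opcode == "LOAD":
            mar, mdr = operand, memory[operand]
            ac = mdr                  # MDR -> AC
        elif opcode == "ADD":
            mar, mdr = operand, memory[operand]
            ac += mdr                 # AC = AC + MDR
        elif opcode == "STORE":
            mar, mdr = operand, ac
            memory[operand] = mdr     # MDR -> Memory[MAR]
        elif opcode == "HLT":
            return ac

mem = {0: ("LOAD", 20), 1: ("ADD", 21), 2: ("STORE", 22), 3: ("HLT", None),
       20: 42, 21: 8, 22: 0}
print(run(mem))      # 50
print(mem[22])       # 50
```

Running it reproduces the worked FDE trace later in this section: the accumulator ends at 42 + 8 = 50, and the result is written back to address 22.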
The FDE cycle is one of the most frequently examined topics on IB exams. You must be able to describe each phase step by step, naming the specific registers involved.
Enrichment: Half Adder and Full Adder
This goes beyond the IB syllabus but helps build understanding of how the ALU performs arithmetic using logic gates.
The ALU does not “know” how to add numbers the way humans do. Instead, it uses combinations of logic gates to perform binary addition.
Half Adder
A half adder adds two single binary digits (A and B) and produces:
- Sum = A XOR B
- Carry = A AND B
| A | B | Sum (A XOR B) | Carry (A AND B) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
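The two gate equations translate directly into code. A sketch in Python, using `^` for XOR and `&` for AND on single bits:

```python
def half_adder(a, b):
    """Add two single bits; return (sum, carry)."""
    return a ^ b, a & b   # Sum = A XOR B, Carry = A AND B

# Reproduces the truth table above:
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
```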
Full Adder
A full adder extends the half adder by accepting a carry-in from a previous addition. This allows multiple adders to be chained together to add multi-bit numbers.
- Sum = A XOR B XOR Carry-in
- Carry-out = (A AND B) OR (Carry-in AND (A XOR B))
Connection to the ALU
By chaining full adders together (one for each bit), the ALU can add numbers of any bit width. An 8-bit adder uses 8 full adders in sequence, with the carry-out of each feeding into the carry-in of the next. This is a ripple-carry adder.
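The chaining just described can be sketched directly: a full adder built from the gate equations, then a ripple-carry adder that feeds each carry-out into the next stage's carry-in:

```python
def full_adder(a, b, cin):
    """Add two bits plus a carry-in; return (sum, carry_out)."""
    s = a ^ b ^ cin                     # Sum = A XOR B XOR Cin
    cout = (a & b) | (cin & (a ^ b))    # Cout = (A AND B) OR (Cin AND (A XOR B))
    return s, cout

def ripple_carry_add(x, y, bits=8):
    """Add two integers bit by bit, as `bits` chained full adders would."""
    result, carry = 0, 0
    for i in range(bits):
        a = (x >> i) & 1                # i-th bit of x
        b = (y >> i) & 1                # i-th bit of y
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result                       # final carry-out (overflow) is discarded

print(ripple_carry_add(42, 8))          # 50
```

Like the hardware it models, the software version drops the final carry-out, so a sum that does not fit in the chosen bit width wraps around (overflow).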
This is why understanding logic gates is the foundation for understanding how CPUs actually compute.
Enrichment: Little Man Computer (LMC)
The LMC is a simplified model that demonstrates how a CPU processes instructions. It is not part of the IB syllabus but is a useful teaching tool.
The Little Man Computer is an instructional model of a simple CPU. It imagines a “little man” inside the computer who follows a fixed procedure to process instructions.
The LMC has a limited instruction set:
| Mnemonic | Code | Description |
|---|---|---|
| LDA | 5xx | Load the value from address xx into the accumulator |
| STA | 3xx | Store the accumulator value at address xx |
| ADD | 1xx | Add the value at address xx to the accumulator |
| SUB | 2xx | Subtract the value at address xx from the accumulator |
| INP | 901 | Read input into the accumulator |
| OUT | 902 | Output the value in the accumulator |
| BRA | 6xx | Branch (jump) to address xx unconditionally |
| BRZ | 7xx | Branch to address xx if accumulator is zero |
| BRP | 8xx | Branch to address xx if accumulator is positive or zero |
| HLT | 000 | Halt the program |
The LMC follows the same fetch-decode-execute pattern as a real CPU, making it an excellent way to trace through programs and understand how registers change at each step.
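A sketch of an LMC interpreter makes the instruction table concrete. Each cycle fetches a three-digit instruction, splits it into opcode (hundreds digit) and address, and executes it; input is supplied as a list rather than read interactively:

```python
# Sketch of a Little Man Computer interpreter. Memory is 100 cells of
# three-digit numbers; the opcode is the hundreds digit of each instruction.

def run_lmc(program, inputs=()):
    mem = program + [0] * (100 - len(program))
    pc, acc = 0, 0
    inputs, outputs = list(inputs), []
    while True:
        instr = mem[pc]                            # fetch
        pc += 1
        op, addr = divmod(instr, 100)              # decode: opcode + address
        if instr == 0:                             # HLT
            return outputs
        elif op == 1: acc += mem[addr]             # ADD
        elif op == 2: acc -= mem[addr]             # SUB
        elif op == 3: mem[addr] = acc              # STA
        elif op == 5: acc = mem[addr]              # LDA
        elif op == 6: pc = addr                    # BRA
        elif op == 7: pc = addr if acc == 0 else pc    # BRZ
        elif op == 8: pc = addr if acc >= 0 else pc    # BRP
        elif instr == 901: acc = inputs.pop(0)     # INP
        elif instr == 902: outputs.append(acc)     # OUT

# Add two inputs and output the result:
# INP / STA 10 / INP / ADD 10 / OUT / HLT
print(run_lmc([901, 310, 901, 110, 902, 0], inputs=[42, 8]))   # [50]
```

Tracing the example program by hand, the way the "little man" would, is itself a good FDE exercise: the first input is stored at cell 10, the second is added to it, and 50 is output.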
HL Section: Pipelining (A1.1.6)
HL Only – The following section on pipelining is assessed at HL level only.
What is Pipelining?
Without pipelining, the CPU completes all phases of one instruction before starting the next. This means the fetch unit sits idle while the ALU executes, and the ALU sits idle while the next instruction is fetched.
Pipelining overlaps the phases of multiple instructions so that different parts of the CPU are active simultaneously.
4-Stage Pipeline
| Clock Cycle | Fetch | Decode | Execute | Write-back |
|---|---|---|---|---|
| 1 | I1 | – | – | – |
| 2 | I2 | I1 | – | – |
| 3 | I3 | I2 | I1 | – |
| 4 | I4 | I3 | I2 | I1 |
| 5 | I5 | I4 | I3 | I2 |
| 6 | – | I5 | I4 | I3 |
| 7 | – | – | I5 | I4 |
| 8 | – | – | – | I5 |
Without pipelining, 5 instructions would take 20 clock cycles (4 stages x 5 instructions). With pipelining, the same 5 instructions complete in just 8 cycles. Once the pipeline is full (cycle 4 onwards), one instruction completes every clock cycle.
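The cycle counts in the table follow a simple formula: with k pipeline stages and n instructions (and no hazards), the pipelined total is k + (n - 1) cycles, versus k × n without pipelining. A quick sketch:

```python
def sequential_cycles(n_instructions, n_stages):
    """Cycles without pipelining: each instruction runs all stages alone."""
    return n_stages * n_instructions

def pipeline_cycles(n_instructions, n_stages):
    """Cycles with an ideal k-stage pipeline: fill it once (k cycles),
    then one instruction completes per cycle for the remaining n-1."""
    return n_stages + (n_instructions - 1)

print(sequential_cycles(5, 4))   # 20
print(pipeline_cycles(5, 4))     # 8
```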
Why Pipelining Improves Throughput
- Each CPU component (fetch unit, decoder, ALU, write-back unit) is kept busy rather than idle
- Throughput increases – more instructions completed per unit of time
- Individual instruction latency does not decrease (each still takes 4 cycles), but overall program execution is faster
Pipeline Hazards
Pipelining does not always work perfectly. Hazards are situations that prevent the next instruction from executing in the expected clock cycle:
Data Dependency Hazard
Instruction 2 needs the result of Instruction 1, but Instruction 1 has not finished executing yet. Example: ADD R1, R2 followed by STORE R1 – the second instruction needs the updated value of R1.
Branch Hazard
A conditional branch instruction means the CPU does not know which instruction comes next until the branch condition is evaluated. Instructions fetched speculatively into the pipeline may need to be discarded.
Solutions: CPUs use techniques like forwarding (passing results directly between pipeline stages), stalling (inserting a wait), and branch prediction (guessing which way a branch will go).
Why Multi-Core Processors Help
Even with pipelining, a single core can only execute one instruction stream. Multi-core processors run multiple independent pipelines simultaneously, each processing a separate thread of execution. This is especially beneficial for:
- Running multiple applications at the same time
- Parallelisable tasks (e.g., rendering different parts of an image)
- Server workloads handling many simultaneous requests
Worked Examples
Example 1: Labelling the CPU
Label each component in the CPU diagram below:
+------------------- CPU -------------------+
| +---------+ +------------------+ |
| | (A) | | (B) Registers | |
| | | | (C) (D) (E) | |
| | | | (F) (G) | |
| +----+----+ +--------+---------+ |
| | | |
| +----+-----------------------+-----+ |
| | (H) | |
| +----------------------------------+ |
+-------------------+-------------------+---+
|
+----------+----------+
| (I) | (J) | (K)
+----------+----------+
|
+-----+-----+
| (L) |
+-----------+
Answers:
| Label | Component | Role |
|---|---|---|
| (A) | Control Unit (CU) | Directs operations, sends control signals |
| (B) | Registers | Fast internal storage |
| (C) | Program Counter (PC) | Address of next instruction |
| (D) | Memory Address Register (MAR) | Address being accessed |
| (E) | Memory Data Register (MDR) | Data being read/written |
| (F) | Instruction Register (IR) | Current instruction being decoded |
| (G) | Accumulator (AC) | Stores ALU results |
| (H) | Arithmetic Logic Unit (ALU) | Performs arithmetic and logic operations |
| (I) | Address Bus | Carries memory addresses (unidirectional) |
| (J) | Data Bus | Carries data (bidirectional) |
| (K) | Control Bus | Carries control signals (bidirectional) |
| (L) | Memory (RAM) | Stores instructions and data |
Example 2: FDE Cycle Trace
Trace the FDE cycle for these three instructions. Assume the initial memory state:
| Address | Contents |
|---|---|
| 5 | LOAD 20 |
| 6 | ADD 21 |
| 7 | STORE 22 |
| 20 | 42 |
| 21 | 8 |
| 22 | 0 |
PC starts at 5.
Instruction 1: LOAD 20 (load value from address 20 into AC)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 5 | 5 | – | – | – |
| Fetch | Memory[5] -> MDR | 5 | 5 | LOAD 20 | – | – |
| Fetch | MDR -> IR; PC+1 | 6 | 5 | LOAD 20 | LOAD 20 | – |
| Decode | CU reads IR | 6 | 5 | LOAD 20 | LOAD 20 | – |
| Execute | MAR = 20; Memory[20] -> MDR | 6 | 20 | 42 | LOAD 20 | – |
| Execute | MDR -> AC | 6 | 20 | 42 | LOAD 20 | 42 |
Instruction 2: ADD 21 (add value from address 21 to AC)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 6 | 6 | – | – | 42 |
| Fetch | Memory[6] -> MDR | 6 | 6 | ADD 21 | – | 42 |
| Fetch | MDR -> IR; PC+1 | 7 | 6 | ADD 21 | ADD 21 | 42 |
| Decode | CU reads IR | 7 | 6 | ADD 21 | ADD 21 | 42 |
| Execute | MAR = 21; Memory[21] -> MDR | 7 | 21 | 8 | ADD 21 | 42 |
| Execute | AC = AC + MDR | 7 | 21 | 8 | ADD 21 | 50 |
Instruction 3: STORE 22 (store AC value to address 22)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 7 | 7 | – | – | 50 |
| Fetch | Memory[7] -> MDR | 7 | 7 | STORE 22 | – | 50 |
| Fetch | MDR -> IR; PC+1 | 8 | 7 | STORE 22 | STORE 22 | 50 |
| Decode | CU reads IR | 8 | 7 | STORE 22 | STORE 22 | 50 |
| Execute | MAR = 22; MDR = AC | 8 | 22 | 50 | STORE 22 | 50 |
| Execute | MDR -> Memory[22] | 8 | 22 | 50 | STORE 22 | 50 |
After execution, Memory[22] = 50 (which is 42 + 8).
Quick Code Check
Q1. Which register holds the address of the next instruction to be fetched?
Q2. What is the role of the ALU?
Q3. Which bus is unidirectional?
Q4. What happens during the Fetch phase of the FDE cycle?
Q5. (HL) What is the main benefit of pipelining?
Trace Exercise
Trace the FDE cycle for two instructions. Fill in the register values at each phase.
Initial memory state:
| Address | Contents |
|---|---|
| 10 | LOAD 20 |
| 11 | ADD 21 |
| 20 | 42 |
| 21 | 8 |
PC starts at 10. All other registers start empty (shown as –).
Instruction 1: LOAD 20
| Phase | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|
| Before Fetch | 10 | – | – | – | – |
| After Fetch | | | | | |
| After Execute | | | | | |
Instruction 2: ADD 21
| Phase | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|
| After Fetch | | | | | |
| After Execute | | | | | |
Final value in the Accumulator:
Spot the Error
A student describes the Fetch phase of the FDE cycle but gets the register roles wrong. One line contains an error about which register receives the fetched instruction:
1. The address in the PC is copied to the MAR.
2. The CPU sends a read signal along the control bus and the address along the address bus.
3. The instruction is fetched from memory into the MAR.
4. The instruction is copied into the IR, and the PC is incremented by 1.
Identify the line with the error and state the correct fix.
Remember the fetch pathway: PC -> MAR -> Memory -> MDR -> IR. The MAR only ever holds addresses. The MDR is the “middle step” that receives data from memory before it goes to its final destination (IR for instructions, AC for data values).
Fill in the Blanks
Complete the description of the FDE cycle by filling in the correct register names.
The Fetch-Decode-Execute cycle describes how the CPU processes each instruction. Fill in the blanks with the correct register or component names:
FETCH PHASE:
1. The address in the ________ is copied to the ________.
2. The instruction at that address is fetched from memory into the ________.
3. The instruction is then copied from the MDR into the ________.
4. The ________ is incremented by 1.
DECODE PHASE:
5. The ________ interprets the instruction stored in the IR.
EXECUTE PHASE:
6. The ________ performs the operation.
7. The result is stored in the ________.
Predict the Output
Given the following initial CPU state and memory contents, the CPU executes two complete FDE cycles. The first instruction loads a value into the accumulator, and the second adds another value to it. What is the final value in the AC after both instructions execute?
PC = 0
Memory:
Address 0: LOAD 5
Address 1: ADD 6
Address 5: 7
Address 6: 3
AC before execution: 0
Final value in AC:
The CPU processes three instructions in sequence. The first loads a value, the second subtracts another value from it, and the third stores the result in memory. What value is stored at memory address 22 after all three instructions execute?
PC = 10, AC = 15
Memory:
Address 10: LOAD 20
Address 11: SUB 21
Address 12: STORE 22
Address 20: 42
Address 21: 12
Address 22: 0
Value stored at address 22:
Practice Exercises
Core
1. Register roles – Describe the role of each of the following registers in one sentence each: PC, MAR, MDR, IR, AC.
2. FDE phases – Describe the three phases of the fetch-decode-execute cycle. For each phase, name the registers involved and explain what happens to them.
3. Bus types – Name the three types of bus in a computer system. For each bus, state its direction (unidirectional or bidirectional) and what it carries.
Extension
1. FDE trace – Given the following memory contents and PC = 0, trace all register values through each phase of the FDE cycle for all three instructions:
| Address | Contents |
|---|---|
| 0 | LOAD 10 |
| 1 | ADD 11 |
| 2 | STORE 12 |
| 10 | 25 |
| 11 | 17 |
| 12 | 0 |
2. Bus analysis – A program loads a value from memory address 100. Explain which buses are used during this operation, what signals travel on each bus, and in which direction.
Challenge
1. Single-core vs multi-core – A program has two tasks: Task A takes 4 seconds and Task B takes 3 seconds. Task B depends on the output of Task A (it cannot start until A finishes). Compare the total execution time on a single-core processor versus a dual-core processor. Then consider a different scenario where Task A and Task B are independent – how does the comparison change?
2. (HL) Pipeline hazards – Consider this sequence of instructions:
I1: ADD R1, R2, R3   (R1 = R2 + R3)
I2: SUB R4, R1, R5   (R4 = R1 - R5)
I3: LOAD R6, [100]   (R6 = Memory[100])
Explain why a data dependency hazard occurs between I1 and I2. Describe two techniques the CPU could use to handle this hazard. Would I3 be affected by the same hazard? Why or why not?
Connections
- Prerequisites: Logic Gates – gates build the ALU; understanding AND, OR, XOR is essential for understanding how arithmetic works at the hardware level
- Related: Primary Memory – the CPU reads instructions and data from memory and writes results back; understanding the memory hierarchy explains why registers are faster than RAM
- Related: GPU – a co-processor that works alongside the CPU; the HL comparison (A1.1.3) requires understanding CPU architecture first
- Forward: Operating Systems Layer – the OS schedules processes on CPU cores, manages context switching between tasks, and handles interrupts that interact with the FDE cycle