CPU Architecture
IB Syllabus: A1.1.1 – CPU components, A1.1.5 – FDE cycle, A1.1.6 (HL) – Pipelining
Table of Contents
- Key Concepts
- Fetch-Decode-Execute Cycle (A1.1.5)
- Enrichment: Half Adder and Full Adder
- Enrichment: Little Man Computer (LMC)
- HL Section: Pipelining (A1.1.6)
- Worked Examples
- Quick Code Check
- Trace Exercise
- Spot the Error
- Fill in the Blanks
- Predict the Output
- Practice Exercises
- Connections
Key Concepts
CPU Components (A1.1.1)
The Central Processing Unit (CPU) is the brain of the computer. It fetches instructions from memory, decodes them, and executes them. Every program you run – from a web browser to a game – is ultimately a sequence of instructions processed by the CPU.
The CPU contains three main functional units:
ALU (Arithmetic Logic Unit)
- Performs arithmetic operations: addition, subtraction, multiplication, division
- Performs logic operations: AND, OR, NOT, comparisons (equal, greater than, less than)
- Takes inputs from registers and writes results back to the Accumulator (AC)
CU (Control Unit)
- Directs the operation of the entire CPU
- Sends control signals to other components (ALU, memory, I/O devices)
- Manages the timing and sequencing of operations
- Interprets (decodes) instructions fetched from memory
Registers – small, ultra-fast storage locations inside the CPU:
| Register | Full Name | Purpose |
|---|---|---|
| PC | Program Counter | Holds the address of the next instruction to fetch |
| MAR | Memory Address Register | Holds the address currently being accessed in memory |
| MDR | Memory Data Register | Holds the data being read from or written to memory |
| IR | Instruction Register | Holds the current instruction being decoded and executed |
| AC | Accumulator | Stores the result of ALU operations |
A common exam mistake is confusing MAR and MDR. Remember: MAR holds the address (where to look), MDR holds the data (what was found there).
Buses
Buses are electrical pathways that carry signals between CPU components and memory:
| Bus | Direction | Purpose |
|---|---|---|
| Address Bus | Unidirectional (CPU -> memory) | Carries the memory address being accessed |
| Data Bus | Bidirectional (CPU <-> memory) | Carries the data being read or written |
| Control Bus | Bidirectional | Carries control signals: read/write, clock pulse, interrupt |
The address bus is the only unidirectional bus – the CPU tells memory where to look, but memory never sends addresses back. The data bus must be bidirectional because data flows in both directions (reading and writing).
CPU Architecture Diagram
+------------------- CPU -------------------+
| +---------+ +------------------+ |
| | CU | | Registers | |
| | | | PC MAR MDR | |
| | | | IR AC | |
| +----+----+ +--------+---------+ |
| | | |
| +----+-----------------------+-----+ |
| | ALU | |
| +----------------------------------+ |
+-------------------+-------------------+---+
|
+----------+----------+
| Address | Data | Control
| Bus | Bus | Bus
+----------+----------+
|
+-----+-----+
| Memory |
+-----------+
Single-Core vs Multi-Core Processors
A single-core processor has one processing unit and can execute one instruction stream at a time. It gives the illusion of multitasking by rapidly switching between tasks (context switching).
A multi-core processor contains two or more independent cores on the same chip. Each core can execute instructions simultaneously, enabling true parallel processing. A quad-core CPU can genuinely run four independent tasks at the same time.
However, not all tasks benefit equally from multiple cores. Tasks that are sequential (each step depends on the previous result) cannot be parallelised – they run on a single core regardless of how many cores are available.
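A rough way to reason about multi-core speedup is to compare the total time when tasks run back to back on one core versus spread across several. The sketch below is illustrative only (real OS schedulers are far more sophisticated); it assigns independent tasks greedily to whichever core is least loaded:

```python
# Illustrative sketch: estimated finish time for INDEPENDENT tasks
# spread across n cores using a greedy "least-loaded core" rule.
import heapq

def makespan(task_times, cores):
    """Return the time at which the last task finishes."""
    loads = [0] * cores                          # finish time of each core
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):   # longest tasks first
        lightest = heapq.heappop(loads)          # least-loaded core
        heapq.heappush(loads, lightest + t)
    return max(loads)

# One core: everything runs sequentially.
print(makespan([4, 3], cores=1))   # 7
# Two cores: independent tasks overlap.
print(makespan([4, 3], cores=2))   # 4
```

Note the model only applies when the tasks are independent: if the 3-second task depends on the 4-second task's output, both configurations take 7 seconds, which is exactly the sequential-dependency point made above.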
Co-processors
A co-processor is a specialised processor that works alongside the CPU to handle specific types of computation more efficiently:
- GPU (Graphics Processing Unit) – optimised for parallel graphics and mathematical operations
- DSP (Digital Signal Processor) – optimised for audio, video, and signal processing
- TPU (Tensor Processing Unit) – optimised for machine learning operations
Co-processors free the CPU to focus on general-purpose tasks while offloading specialised work.
Fetch-Decode-Execute Cycle (A1.1.5)
The fetch-decode-execute (FDE) cycle is the fundamental process by which the CPU processes every instruction. It repeats continuously while the computer is running.
Phase 1: Fetch
- The address stored in the PC is copied to the MAR
- The CPU sends a read signal along the control bus and the address along the address bus
- The instruction stored at that memory address is fetched via the data bus into the MDR
- The instruction is copied from the MDR into the IR
- The PC is incremented by 1 (pointing to the next instruction)
Phase 2: Decode
- The CU examines the instruction in the IR
- The CU determines the type of operation (e.g., load, add, store, branch)
- The CU identifies any operands (data or memory addresses the instruction needs)
- The CU prepares the necessary control signals for execution
Phase 3: Execute
- The ALU or other components carry out the decoded instruction
- For arithmetic/logic operations, the ALU performs the calculation and stores the result in the AC
- For memory operations (load/store), data is transferred between registers and memory
- For branch instructions, the PC is updated to the target address
The cycle then repeats from Fetch, using the updated PC to get the next instruction.
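The register transfers above can be traced with a minimal simulator. This is a sketch, not real hardware: memory is a Python dict, instructions are `(opcode, operand)` tuples, and the loop mirrors the three phases just described:

```python
# Minimal FDE-cycle sketch. Memory is a dict mapping addresses to
# either (opcode, operand) instruction tuples or plain data values.

def run(memory, pc=0):
    mar = mdr = ir = None
    ac = 0
    while True:
        # --- Fetch ---
        mar = pc                      # PC -> MAR
        mdr = memory[mar]             # Memory[MAR] -> MDR (via data bus)
        ir = mdr                      # MDR -> IR
        pc += 1                       # PC now points at the next instruction
        # --- Decode ---
        opcode, operand = ir          # CU splits the instruction
        # --- Execute ---
        if opcode == "LOAD":
            mar, mdr = operand, memory[operand]
            ac = mdr                  # MDR -> AC
        elif opcode == "ADD":
            mar, mdr = operand, memory[operand]
            ac += mdr                 # AC = AC + MDR
        elif opcode == "STORE":
            mar, mdr = operand, ac
            memory[operand] = mdr     # MDR -> Memory[MAR]
        elif opcode == "HLT":
            return ac

mem = {0: ("LOAD", 20), 1: ("ADD", 21), 2: ("STORE", 22), 3: ("HLT", None),
       20: 42, 21: 8, 22: 0}
print(run(mem))      # 50
print(mem[22])       # 50
```

Running it reproduces the worked FDE trace later in this section: the accumulator ends at 42 + 8 = 50, and the result is written back to address 22.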
The FDE cycle is one of the most frequently examined topics on IB exams. You must be able to describe each phase step by step, naming the specific registers involved.
Enrichment: Half Adder and Full Adder
This goes beyond the IB syllabus but helps build understanding of how the ALU performs arithmetic using logic gates.
The ALU does not “know” how to add numbers the way humans do. Instead, it uses combinations of logic gates to perform binary addition.
Half Adder
A half adder adds two single binary digits (A and B) and produces:
- Sum = A XOR B
- Carry = A AND B
| A | B | Sum (A XOR B) | Carry (A AND B) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
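The two gate equations translate directly into code. A sketch in Python, using `^` for XOR and `&` for AND on single bits:

```python
def half_adder(a, b):
    """Add two single bits; return (sum, carry)."""
    return a ^ b, a & b   # Sum = A XOR B, Carry = A AND B

# Reproduces the truth table above:
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
```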
Full Adder
A full adder extends the half adder by accepting a carry-in from a previous addition. This allows multiple adders to be chained together to add multi-bit numbers.
- Sum = A XOR B XOR Carry-in
- Carry-out = (A AND B) OR (Carry-in AND (A XOR B))
Connection to the ALU
By chaining full adders together (one for each bit), the ALU can add numbers of any bit width. An 8-bit adder uses 8 full adders in sequence, with the carry-out of each feeding into the carry-in of the next. This is a ripple-carry adder.
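The chaining just described can be sketched directly: a full adder built from the gate equations, then a ripple-carry adder that feeds each carry-out into the next stage's carry-in:

```python
def full_adder(a, b, cin):
    """Add two bits plus a carry-in; return (sum, carry_out)."""
    s = a ^ b ^ cin                     # Sum = A XOR B XOR Cin
    cout = (a & b) | (cin & (a ^ b))    # Cout = (A AND B) OR (Cin AND (A XOR B))
    return s, cout

def ripple_carry_add(x, y, bits=8):
    """Add two integers bit by bit, as `bits` chained full adders would."""
    result, carry = 0, 0
    for i in range(bits):
        a = (x >> i) & 1                # i-th bit of x
        b = (y >> i) & 1                # i-th bit of y
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result                       # final carry-out (overflow) is discarded

print(ripple_carry_add(42, 8))          # 50
```

Like the hardware it models, the software version drops the final carry-out, so a sum that does not fit in the chosen bit width wraps around (overflow).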
This is why understanding logic gates is the foundation for understanding how CPUs actually compute.
Enrichment: Little Man Computer (LMC)
The LMC is a simplified model that demonstrates how a CPU processes instructions. It is not part of the IB syllabus but is a useful teaching tool.
The Little Man Computer is an instructional model of a simple CPU. It imagines a “little man” inside the computer who follows a fixed procedure to process instructions.
The LMC has a limited instruction set:
| Mnemonic | Code | Description |
|---|---|---|
| LDA | 5xx | Load the value from address xx into the accumulator |
| STA | 3xx | Store the accumulator value at address xx |
| ADD | 1xx | Add the value at address xx to the accumulator |
| SUB | 2xx | Subtract the value at address xx from the accumulator |
| INP | 901 | Read input into the accumulator |
| OUT | 902 | Output the value in the accumulator |
| BRA | 6xx | Branch (jump) to address xx unconditionally |
| BRZ | 7xx | Branch to address xx if accumulator is zero |
| BRP | 8xx | Branch to address xx if accumulator is positive or zero |
| HLT | 000 | Halt the program |
The LMC follows the same fetch-decode-execute pattern as a real CPU, making it an excellent way to trace through programs and understand how registers change at each step.
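A sketch of an LMC interpreter makes the instruction table concrete. Each cycle fetches a three-digit instruction, splits it into opcode (hundreds digit) and address, and executes it; input is supplied as a list rather than read interactively:

```python
# Sketch of a Little Man Computer interpreter. Memory is 100 cells of
# three-digit numbers; the opcode is the hundreds digit of each instruction.

def run_lmc(program, inputs=()):
    mem = program + [0] * (100 - len(program))
    pc, acc = 0, 0
    inputs, outputs = list(inputs), []
    while True:
        instr = mem[pc]                            # fetch
        pc += 1
        op, addr = divmod(instr, 100)              # decode: opcode + address
        if instr == 0:                             # HLT
            return outputs
        elif op == 1: acc += mem[addr]             # ADD
        elif op == 2: acc -= mem[addr]             # SUB
        elif op == 3: mem[addr] = acc              # STA
        elif op == 5: acc = mem[addr]              # LDA
        elif op == 6: pc = addr                    # BRA
        elif op == 7: pc = addr if acc == 0 else pc    # BRZ
        elif op == 8: pc = addr if acc >= 0 else pc    # BRP
        elif instr == 901: acc = inputs.pop(0)     # INP
        elif instr == 902: outputs.append(acc)     # OUT

# Add two inputs and output the result:
# INP / STA 10 / INP / ADD 10 / OUT / HLT
print(run_lmc([901, 310, 901, 110, 902, 0], inputs=[42, 8]))   # [50]
```

Tracing the example program by hand, the way the "little man" would, is itself a good FDE exercise: the first input is stored at cell 10, the second is added to it, and 50 is output.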
HL Section: Pipelining (A1.1.6)
HL Only – The following section on pipelining is assessed at HL level only.
What is Pipelining?
Without pipelining, the CPU completes all phases of one instruction before starting the next. This means the fetch unit sits idle while the ALU executes, and the ALU sits idle while the next instruction is fetched.
Pipelining overlaps the phases of multiple instructions so that different parts of the CPU are active simultaneously.
4-Stage Pipeline
| Clock Cycle | Fetch | Decode | Execute | Write-back |
|---|---|---|---|---|
| 1 | I1 | – | – | – |
| 2 | I2 | I1 | – | – |
| 3 | I3 | I2 | I1 | – |
| 4 | I4 | I3 | I2 | I1 |
| 5 | I5 | I4 | I3 | I2 |
| 6 | – | I5 | I4 | I3 |
| 7 | – | – | I5 | I4 |
| 8 | – | – | – | I5 |
Without pipelining, 5 instructions would take 20 clock cycles (4 stages x 5 instructions). With pipelining, the same 5 instructions complete in just 8 cycles. Once the pipeline is full (cycle 4 onwards), one instruction completes every clock cycle.
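The cycle counts in the table follow a simple formula: with k pipeline stages and n instructions (and no hazards), the pipelined total is k + (n - 1) cycles, versus k × n without pipelining. A quick sketch:

```python
def sequential_cycles(n_instructions, n_stages):
    """Cycles without pipelining: each instruction runs all stages alone."""
    return n_stages * n_instructions

def pipeline_cycles(n_instructions, n_stages):
    """Cycles with an ideal k-stage pipeline: fill it once (k cycles),
    then one instruction completes per cycle for the remaining n-1."""
    return n_stages + (n_instructions - 1)

print(sequential_cycles(5, 4))   # 20
print(pipeline_cycles(5, 4))     # 8
```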
Why Pipelining Improves Throughput
- Each CPU component (fetch unit, decoder, ALU, write-back unit) is kept busy rather than idle
- Throughput increases – more instructions completed per unit of time
- Individual instruction latency does not decrease (each still takes 4 cycles), but overall program execution is faster
Pipeline Hazards
Pipelining does not always work perfectly. Hazards are situations that prevent the next instruction from executing in the expected clock cycle:
Data Dependency Hazard
Instruction 2 needs the result of Instruction 1, but Instruction 1 has not finished executing yet. Example: ADD R1, R2 followed by STORE R1 – the second instruction needs the updated value of R1.
Branch Hazard
A conditional branch instruction means the CPU does not know which instruction comes next until the branch condition is evaluated. Instructions fetched speculatively into the pipeline may need to be discarded.
Solutions: CPUs use techniques like forwarding (passing results directly between pipeline stages), stalling (inserting a wait), and branch prediction (guessing which way a branch will go).
Why Multi-Core Processors Help
Even with pipelining, a single core can only execute one instruction stream. Multi-core processors run multiple independent pipelines simultaneously, each processing a separate thread of execution. This is especially beneficial for:
- Running multiple applications at the same time
- Parallelisable tasks (e.g., rendering different parts of an image)
- Server workloads handling many simultaneous requests
Worked Examples
Example 1: Labelling the CPU
Label each component in the CPU diagram below:
+------------------- CPU -------------------+
| +---------+ +------------------+ |
| | (A) | | (B) Registers | |
| | | | (C) (D) (E) | |
| | | | (F) (G) | |
| +----+----+ +--------+---------+ |
| | | |
| +----+-----------------------+-----+ |
| | (H) | |
| +----------------------------------+ |
+-------------------+-------------------+---+
|
+----------+----------+
| (I) | (J) | (K)
+----------+----------+
|
+-----+-----+
| (L) |
+-----------+
Answers:
| Label | Component | Role |
|---|---|---|
| (A) | Control Unit (CU) | Directs operations, sends control signals |
| (B) | Registers | Fast internal storage |
| (C) | Program Counter (PC) | Address of next instruction |
| (D) | Memory Address Register (MAR) | Address being accessed |
| (E) | Memory Data Register (MDR) | Data being read/written |
| (F) | Instruction Register (IR) | Current instruction being decoded |
| (G) | Accumulator (AC) | Stores ALU results |
| (H) | Arithmetic Logic Unit (ALU) | Performs arithmetic and logic operations |
| (I) | Address Bus | Carries memory addresses (unidirectional) |
| (J) | Data Bus | Carries data (bidirectional) |
| (K) | Control Bus | Carries control signals (bidirectional) |
| (L) | Memory (RAM) | Stores instructions and data |
Example 2: FDE Cycle Trace
Trace the FDE cycle for these three instructions. Assume the initial memory state:
| Address | Contents |
|---|---|
| 5 | LOAD 20 |
| 6 | ADD 21 |
| 7 | STORE 22 |
| 20 | 42 |
| 21 | 8 |
| 22 | 0 |
PC starts at 5.
Instruction 1: LOAD 20 (load value from address 20 into AC)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 5 | 5 | – | – | – |
| Fetch | Memory[5] -> MDR | 5 | 5 | LOAD 20 | – | – |
| Fetch | MDR -> IR; PC+1 | 6 | 5 | LOAD 20 | LOAD 20 | – |
| Decode | CU reads IR | 6 | 5 | LOAD 20 | LOAD 20 | – |
| Execute | MAR = 20; Memory[20] -> MDR | 6 | 20 | 42 | LOAD 20 | – |
| Execute | MDR -> AC | 6 | 20 | 42 | LOAD 20 | 42 |
Instruction 2: ADD 21 (add value from address 21 to AC)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 6 | 6 | – | – | 42 |
| Fetch | Memory[6] -> MDR | 6 | 6 | ADD 21 | – | 42 |
| Fetch | MDR -> IR; PC+1 | 7 | 6 | ADD 21 | ADD 21 | 42 |
| Decode | CU reads IR | 7 | 6 | ADD 21 | ADD 21 | 42 |
| Execute | MAR = 21; Memory[21] -> MDR | 7 | 21 | 8 | ADD 21 | 42 |
| Execute | AC = AC + MDR | 7 | 21 | 8 | ADD 21 | 50 |
Instruction 3: STORE 22 (store AC value to address 22)
| Phase | Step | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|---|
| Fetch | PC -> MAR | 7 | 7 | – | – | 50 |
| Fetch | Memory[7] -> MDR | 7 | 7 | STORE 22 | – | 50 |
| Fetch | MDR -> IR; PC+1 | 8 | 7 | STORE 22 | STORE 22 | 50 |
| Decode | CU reads IR | 8 | 7 | STORE 22 | STORE 22 | 50 |
| Execute | MAR = 22; MDR = AC | 8 | 22 | 50 | STORE 22 | 50 |
| Execute | MDR -> Memory[22] | 8 | 22 | 50 | STORE 22 | 50 |
After execution, Memory[22] = 50 (which is 42 + 8).
Quick Code Check
Q1. Which register holds the address of the next instruction to be fetched?
Q2. What is the role of the ALU?
Q3. Which bus is unidirectional?
Q4. What happens during the Fetch phase of the FDE cycle?
Q5. (HL) What is the main benefit of pipelining?
Trace Exercise
Trace the FDE cycle for two instructions. Fill in the register values at each phase.
Initial memory state:
| Address | Contents |
|---|---|
| 10 | LOAD 20 |
| 11 | ADD 21 |
| 20 | 42 |
| 21 | 8 |
PC starts at 10. All other registers start empty (shown as –).
Instruction 1: LOAD 20
| Phase | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|
| Before Fetch | 10 | – | – | – | – |
| After Fetch | | | | | |
| After Execute | | | | | |
Instruction 2: ADD 21
| Phase | PC | MAR | MDR | IR | AC |
|---|---|---|---|---|---|
| After Fetch | | | | | |
| After Execute | | | | | |
Final value in the Accumulator:
Spot the Error
A student describes the Fetch phase of the FDE cycle but gets the register roles wrong. One line contains an error about which register receives the fetched instruction:
1. The address in the PC is copied to the MAR.
2. The CPU sends a read signal along the control bus and the address along the address bus.
3. The instruction is fetched from memory into the MAR.
4. The instruction is copied into the IR, and the PC is incremented by 1.
Identify the line with the error and state the correct fix.
Remember the fetch pathway: PC -> MAR -> Memory -> MDR -> IR. The MAR only ever holds addresses. The MDR is the “middle step” that receives data from memory before it goes to its final destination (IR for instructions, AC for data values).
Fill in the Blanks
Complete the description of the FDE cycle by filling in the correct register names.
The Fetch-Decode-Execute cycle describes how the CPU processes each instruction. Fill in the blanks with the correct register or component names:
FETCH PHASE:
1. The address in the ________ is copied to the ________.
2. The instruction at that address is fetched from memory into the ________.
3. The instruction is then copied from the MDR into the ________.
4. The ________ is incremented by 1.
DECODE PHASE:
5. The ________ interprets the instruction stored in the IR.
EXECUTE PHASE:
6. The ________ performs the operation.
7. The result is stored in the ________.
Predict the Output
Given the following initial CPU state and memory contents, the CPU executes two complete FDE cycles. The first instruction loads a value into the accumulator, and the second adds another value to it. What is the final value in the AC after both instructions execute?
PC = 0
Memory:
Address 0: LOAD 5
Address 1: ADD 6
Address 5: 7
Address 6: 3
AC before execution: 0
Final value in AC:
The CPU processes three instructions in sequence. The first loads a value, the second subtracts another value from it, and the third stores the result in memory. What value is stored at memory address 22 after all three instructions execute?
PC = 10, AC = 15
Memory:
Address 10: LOAD 20
Address 11: SUB 21
Address 12: STORE 22
Address 20: 42
Address 21: 12
Address 22: 0
Value stored at address 22:
Practice Exercises
Core
1. Register roles – Describe the role of each of the following registers in one sentence each: PC, MAR, MDR, IR, AC.
2. FDE phases – Describe the three phases of the fetch-decode-execute cycle. For each phase, name the registers involved and explain what happens to them.
3. Bus types – Name the three types of bus in a computer system. For each bus, state its direction (unidirectional or bidirectional) and what it carries.
Extension
1. FDE trace – Given the following memory contents and PC = 0, trace all register values through each phase of the FDE cycle for all three instructions:
| Address | Contents |
|---|---|
| 0 | LOAD 10 |
| 1 | ADD 11 |
| 2 | STORE 12 |
| 10 | 25 |
| 11 | 17 |
| 12 | 0 |
2. Bus analysis – A program loads a value from memory address 100. Explain which buses are used during this operation, what signals travel on each bus, and in which direction.
Challenge
1. Single-core vs multi-core – A program has two tasks: Task A takes 4 seconds and Task B takes 3 seconds. Task B depends on the output of Task A (it cannot start until A finishes). Compare the total execution time on a single-core processor versus a dual-core processor. Then consider a different scenario where Task A and Task B are independent – how does the comparison change?
2. (HL) Pipeline hazards – Consider this sequence of instructions:
I1: ADD R1, R2, R3   (R1 = R2 + R3)
I2: SUB R4, R1, R5   (R4 = R1 - R5)
I3: LOAD R6, [100]   (R6 = Memory[100])
Explain why a data dependency hazard occurs between I1 and I2. Describe two techniques the CPU could use to handle this hazard. Would I3 be affected by the same hazard? Why or why not?
Connections
- Prerequisites: Logic Gates – gates build the ALU; understanding AND, OR, XOR is essential for understanding how arithmetic works at the hardware level
- Related: Primary Memory – the CPU reads instructions and data from memory and writes results back; understanding the memory hierarchy explains why registers are faster than RAM
- Related: GPU – a co-processor that works alongside the CPU; the HL comparison (A1.1.3) requires understanding CPU architecture first
- Forward: Operating Systems Layer – the OS schedules processes on CPU cores, manages context switching between tasks, and handles interrupts that interact with the FDE cycle