Choosing the Right Structure
IB Syllabus: B4.1.1 (HL) – Explain the properties and purpose of ADTs. B4.1.2 (HL) – Evaluate linked lists. B4.1.4 (HL) – Explain the structures and properties of BSTs. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors), mechanics of sets, HashMap/HashSet in Java.
This page is HL only.
Key Concepts
Choosing the right data structure is one of the most important decisions in programming. Each structure has strengths and weaknesses – there is no single “best” structure. The right choice depends on what operations you need and how often you perform them.
The Big Comparison Table
| Structure | Search | Insert | Delete | Sorted? | Duplicates? | Memory | Best for |
|---|---|---|---|---|---|---|---|
| Array | O(n) | O(n)* | O(n)* | No (unless sorted) | Yes | Fixed, compact | Fixed-size collections, index-based access |
| ArrayList | O(n) | O(1) amortized (end) | O(n) | No | Yes | Dynamic, some overhead | Growing lists, frequent appends |
| Singly linked list | O(n) | O(1) at head | O(n)** | No | Yes | Extra pointer per node | Frequent insert/remove at front |
| Doubly linked list | O(n) | O(1) at head/tail | O(1) with ref** | No | Yes | Two pointers per node | Frequent insert/remove at both ends |
| Stack | O(n) | O(1) push | O(1) pop | No | Yes | Minimal | LIFO: undo, backtracking, brackets |
| Queue | O(n) | O(1) enqueue | O(1) dequeue | No | Yes | Minimal | FIFO: scheduling, BFS, buffers |
| BST (balanced) | O(log n) | O(log n) | O(log n) | Yes (in-order) | No | Two pointers per node | Fast search + sorted output |
| HashSet | O(1) avg | O(1) avg | O(1) avg | No | No | Hash table overhead | Membership testing, deduplication |
| HashMap | O(1) avg by key | O(1) avg | O(1) avg | No | Keys: no, Values: yes | Hash table overhead | Key-value lookup, counting |
* Array insert/delete at a specific position requires shifting elements. ** Linked list delete is O(1) if you already have a reference to the node, but O(n) to find it first.
BST performance depends on balance. A skewed BST degrades to O(n) for all operations – the same as a linked list. The O(log n) figures assume a reasonably balanced tree.
Decision Flowchart
Use these questions to narrow down your choice:
1. Do you need key-value pairs?
- Yes – HashMap
2. Do you need only unique elements?
- Yes, and order does not matter – HashSet
- Yes, and you need sorted order – BST
3. Do you need LIFO (last in, first out) access?
- Yes – Stack
4. Do you need FIFO (first in, first out) access?
- Yes – Queue
5. Do you need fast search?
- By key/value in O(1) – HashSet / HashMap
- In sorted order, O(log n) – BST
- By index in O(1) – Array / ArrayList
6. Do you need frequent insertion/deletion?
- At the front – Linked list
- At both ends – Doubly linked list
- At the end – ArrayList
- In sorted position – Ordered linked list or BST
7. Is the size known and fixed?
- Yes – Array
- No – ArrayList, linked list, or BST
Common Trade-offs
| Trade-off | Explanation |
|---|---|
| Speed vs memory | Hash tables give O(1) lookup but waste memory on empty buckets. Linked lists use memory per element but need extra pointers. |
| Sorted vs fast | BSTs maintain sorted order but O(log n) operations. HashSets give O(1) but no order. |
| Fixed vs dynamic | Arrays are fast and compact but cannot resize. ArrayList and linked lists grow as needed. |
| Simple vs flexible | Stacks and queues are restrictive by design – that is their strength. Restricting operations prevents misuse. |
Worked Examples
Example 1: Choosing a Structure for a Task
For each scenario, identify the best data structure and explain why.
Scenario A: Browser history (back button)
- Need: add pages as you visit them, go back to the previous page
- Pattern: last in, first out
- Best choice: Stack – push each page, pop to go back
Scenario B: Print job queue
- Need: add jobs as they arrive, process in arrival order
- Pattern: first in, first out
- Best choice: Queue – enqueue new jobs, dequeue to process
Scenario C: Spell checker dictionary
- Need: check if a word exists in the dictionary, very fast lookup
- Pattern: membership testing, no order needed
- Best choice: HashSet – O(1) contains, no duplicates
Scenario D: Student grade book
- Need: look up a student’s grade by name
- Pattern: key (name) to value (grade) mapping
- Best choice: HashMap – O(1) lookup by key
Scenario E: Music playlist sorted by song title
- Need: keep songs sorted, add new songs in the right position, display in order
- Pattern: sorted collection with insertion
- Best choice: BST – O(log n) insert in sorted position, in-order traversal for display
Scenario F: Tracking the last 100 sensor readings
- Need: fixed-size buffer, fast access by index
- Pattern: fixed-size, index-based access
- Best choice: Array – known size, O(1) index access
Example 2: Performance Comparison
This example measures how many operations different structures need for common tasks on a collection of 1000 elements.
| Task | Array | LinkedList | BST (balanced) | HashSet |
|---|---|---|---|---|
| Find element | 500 avg (linear) | 500 avg (linear) | 10 avg (log2 1000) | 1 avg (hash) |
| Insert at end | 1 (if space) | 1 (with tail ref) | 10 avg | 1 avg |
| Insert in sorted position | 500 avg (shift) | 500 avg (traverse) | 10 avg | N/A (unordered) |
| Delete specific element | 500 avg (shift) | 500 avg (traverse) | 10 avg | 1 avg |
| Get sorted output | n log n (sort first) | n log n (sort first) | n (in-order traversal) | n log n (sort first) |
O(1) does not mean “instant.” It means the time does not grow with the input size. A hash table lookup might be slower than an array index access for small collections, but it scales much better for large ones.
Example 3: Wrong Structure, Right Structure
import java.util.ArrayList;
import java.util.HashSet;
public class DuplicateCheck {
// SLOW approach: ArrayList contains() is O(n) per check
public static boolean hasDuplicatesSlow(String[] words) {
ArrayList<String> seen = new ArrayList<>();
for (String w : words) {
if (seen.contains(w)) { // O(n) each time
return true;
}
seen.add(w);
}
return false;
}
// FAST approach: HashSet contains() is O(1) per check
public static boolean hasDuplicatesFast(String[] words) {
HashSet<String> seen = new HashSet<>();
for (String w : words) {
if (seen.contains(w)) { // O(1) each time
return true;
}
seen.add(w);
}
return false;
}
public static void main(String[] args) {
String[] words = {"apple", "banana", "cherry", "date", "banana"};
System.out.println("Slow: " + hasDuplicatesSlow(words));
System.out.println("Fast: " + hasDuplicatesFast(words));
}
}
Output:
Slow: true
Fast: true
Both methods give the same answer, but the ArrayList approach is O(n^2) overall (n words, each checking up to n previous words). The HashSet approach is O(n) overall (n words, each checking in O(1)). For 10,000 words, the ArrayList approach would need up to 50,000,000 comparisons; the HashSet approach needs about 10,000.
Quick Code Check
Q1. Which structure is best for looking up a student's grade by their name?
Q2. An application needs to support an "undo" feature that reverses the most recent action first. Which structure is best?
Q3. A spell checker needs to test whether a word exists in a dictionary of 1 million words. Which gives the fastest lookup?
Q4. You need a structure that supports fast search, fast insertion, AND can produce sorted output. Which is best?
Q5. A printer processes jobs in the order they were submitted. Which structure models this?
Trace Exercise
For each scenario, fill in the best data structure and its key advantage.
| Scenario | Best structure | Key advantage |
|---|---|---|
| Remove duplicate email addresses from a list | ||
| Process customer service tickets in arrival order | ||
| Count how many times each word appears in a book | ||
| Navigate browser back/forward history | ||
| Store a phone book sorted by name with fast lookup |
Fill in the time complexity for each operation on each structure.
| Operation | Array | LinkedList | BST (balanced) | HashSet |
|---|---|---|---|---|
| Search | ||||
| Insert | O(log n) | |||
| Delete (by value) |
Code Completion
Complete the code that chooses the right structure for a duplicate-detection task.
Check if an array contains any duplicate values using the fastest approach:
public static boolean hasDuplicates(int[] arr) {
<Integer> seen = new HashSet<>();
for (int val : arr) {
if (seen.(val)) {
return true;
}
seen.(val);
}
return false;
}
Complete the code that uses a HashMap to count character frequencies.
Count how many times each character appears in a string:
HashMap<Character, Integer> freq = new HashMap<>();
for (int i = 0; i < text.length(); i++) {
char c = text.charAt(i);
if (freq.(c)) {
freq.put(c, freq.get(c) + );
} else {
freq.(c, 1);
}
}
Spot the Bug
This duplicate checker works correctly but is far too slow for large arrays (takes minutes for 100,000 elements). The bug is a poor structure choice:
Pick the correct fix:
This print queue processes jobs in the wrong order -- the last job submitted prints first instead of the first job:
Pick the correct fix:
Output Prediction
What does this code print?
HashSet<String> set = new HashSet<>();
set.add("A");
set.add("B");
set.add("C");
set.add("A");
set.add("B");
System.out.println(set.size());
System.out.println(set.contains("D"));Type the exact output:
What does this code print?
Stack<String> stack = new Stack<>();
stack.push("Doc1");
stack.push("Doc2");
stack.push("Doc3");
System.out.println(stack.pop());
System.out.println(stack.pop());
System.out.println(stack.size());Type the exact output:
Practice Exercises
Core
- Match the structure – For each scenario, state the best data structure and give one reason: (a) storing usernames that must be unique, (b) processing hospital patients in arrival order, (c) mapping country codes to country names, (d) implementing the back button in a web browser.
- Complexity table – Copy and complete this table from memory (without looking at notes): Search, Insert, and Delete complexity for Array, LinkedList, BST (balanced), and HashSet.
- Justify the choice – A teacher stores student marks in an
ArrayList<Integer>. They want to quickly check if a specific mark (e.g., 95) exists. Explain why this is slow and suggest a better structure.
Extension
- Refactor for performance – You are given code that uses a
LinkedListto detect duplicates in a 10,000-element array (O(n^2)). Rewrite it using aHashSetto achieve O(n). Measure the improvement by counting the number ofcontainscalls in each version. - Multi-structure solution – A school administration system needs to: (a) look up any student by ID (fast), (b) display all students in alphabetical order, (c) process enrolment applications in FIFO order. State which structure you would use for each requirement and explain why no single structure handles all three.
- BST vs HashMap – Insert 1000 random integers into both a BST and a HashSet. Then search for 100 random values. Compare the number of comparisons. When would you prefer a BST over a HashSet?
Challenge
- Design a library system – A library needs to: track all books (with title, author, ISBN), look up books by ISBN, find all books by a specific author, and process hold requests in FIFO order. Design the data structures for this system. State which structure you would use for each requirement and justify your choices.
- Exam-style question – “A company stores employee records. The system must support: (i) adding new employees, (ii) finding an employee by ID, (iii) listing all employees sorted by name. Evaluate the use of an array, a linked list, a BST, and a HashMap for this system. For each structure, state one advantage and one disadvantage. Recommend a combination of structures that would satisfy all three requirements.” (8 marks)
Connections
- Prerequisites: All ADT pages – this page synthesises knowledge from the entire section
- What Are ADTs? – interface vs implementation
- Linked Lists, Ordered Linked Lists, Doubly Linked Lists
- Stacks from Scratch, Queues from Scratch
- Binary Search Trees, BST Operations
- Sets and HashMaps
- Related: Stacks and Queues (B2.2 – using built-in classes)
- Related: Searching Algorithms – performance analysis connects to structure choice
- Related: Big O Notation – the framework for comparing structures