Choosing the Right Structure

IB Syllabus: B4.1.1 (HL) – Explain the properties and purpose of ADTs. B4.1.2 (HL) – Evaluate linked lists. B4.1.4 (HL) – Explain the structures and properties of BSTs. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors), mechanics of sets, HashMap/HashSet in Java.

This page is HL only.

Key Concepts

Choosing the right data structure is one of the most important decisions in programming. Each structure has strengths and weaknesses – there is no single “best” structure. The right choice depends on what operations you need and how often you perform them.

The Big Comparison Table

Structure	Search	Insert	Delete	Sorted?	Duplicates?	Memory	Best for
Array	O(n)	O(n)*	O(n)*	No (unless sorted)	Yes	Fixed, compact	Fixed-size collections, index-based access
ArrayList	O(n)	O(1) amortized (end)	O(n)	No	Yes	Dynamic, some overhead	Growing lists, frequent appends
Singly linked list	O(n)	O(1) at head	O(n)**	No	Yes	Extra pointer per node	Frequent insert/remove at front
Doubly linked list	O(n)	O(1) at head/tail	O(1) with ref**	No	Yes	Two pointers per node	Frequent insert/remove at both ends
Stack	O(n)	O(1) push	O(1) pop	No	Yes	Minimal	LIFO: undo, backtracking, brackets
Queue	O(n)	O(1) enqueue	O(1) dequeue	No	Yes	Minimal	FIFO: scheduling, BFS, buffers
BST (balanced)	O(log n)	O(log n)	O(log n)	Yes (in-order)	No	Two pointers per node	Fast search + sorted output
HashSet	O(1) avg	O(1) avg	O(1) avg	No	No	Hash table overhead	Membership testing, deduplication
HashMap	O(1) avg by key	O(1) avg	O(1) avg	No	Keys: no, Values: yes	Hash table overhead	Key-value lookup, counting

* Array insert/delete at a specific position requires shifting elements. ** Linked list delete is O(1) if you already have a reference to the node, but O(n) to find it first.

BST performance depends on balance. A skewed BST degrades to O(n) for all operations – the same as a linked list. The O(log n) figures assume a reasonably balanced tree.

Decision Flowchart

Use these questions to narrow down your choice:

1. Do you need key-value pairs?

Yes – HashMap

2. Do you need only unique elements?

Yes, and order does not matter – HashSet
Yes, and you need sorted order – BST

3. Do you need LIFO (last in, first out) access?

Yes – Stack

4. Do you need FIFO (first in, first out) access?

Yes – Queue

5. Do you need fast search?

By key/value in O(1) – HashSet / HashMap
In sorted order, O(log n) – BST
By index in O(1) – Array / ArrayList

6. Do you need frequent insertion/deletion?

At the front – Linked list
At both ends – Doubly linked list
At the end – ArrayList
In sorted position – Ordered linked list or BST

7. Is the size known and fixed?

Yes – Array
No – ArrayList, linked list, or BST

Common Trade-offs

Trade-off	Explanation
Speed vs memory	Hash tables give O(1) lookup but waste memory on empty buckets. Linked lists use memory per element but need extra pointers.
Sorted vs fast	BSTs maintain sorted order but O(log n) operations. HashSets give O(1) but no order.
Fixed vs dynamic	Arrays are fast and compact but cannot resize. ArrayList and linked lists grow as needed.
Simple vs flexible	Stacks and queues are restrictive by design – that is their strength. Restricting operations prevents misuse.

Worked Examples

Example 1: Choosing a Structure for a Task

For each scenario, identify the best data structure and explain why.

Scenario A: Browser history (back button)

Need: add pages as you visit them, go back to the previous page
Pattern: last in, first out
Best choice: Stack – push each page, pop to go back

Scenario B: Print job queue

Need: add jobs as they arrive, process in arrival order
Pattern: first in, first out
Best choice: Queue – enqueue new jobs, dequeue to process

Scenario C: Spell checker dictionary

Need: check if a word exists in the dictionary, very fast lookup
Pattern: membership testing, no order needed
Best choice: HashSet – O(1) contains, no duplicates

Scenario D: Student grade book

Need: look up a student’s grade by name
Pattern: key (name) to value (grade) mapping
Best choice: HashMap – O(1) lookup by key

Scenario E: Music playlist sorted by song title

Need: keep songs sorted, add new songs in the right position, display in order
Pattern: sorted collection with insertion
Best choice: BST – O(log n) insert in sorted position, in-order traversal for display

Scenario F: Tracking the last 100 sensor readings

Need: fixed-size buffer, fast access by index
Pattern: fixed-size, index-based access
Best choice: Array – known size, O(1) index access

Example 2: Performance Comparison

This example measures how many operations different structures need for common tasks on a collection of 1000 elements.

Task	Array	LinkedList	BST (balanced)	HashSet
Find element	500 avg (linear)	500 avg (linear)	10 avg (log2 1000)	1 avg (hash)
Insert at end	1 (if space)	1 (with tail ref)	10 avg	1 avg
Insert in sorted position	500 avg (shift)	500 avg (traverse)	10 avg	N/A (unordered)
Delete specific element	500 avg (shift)	500 avg (traverse)	10 avg	1 avg
Get sorted output	n log n (sort first)	n log n (sort first)	n (in-order traversal)	n log n (sort first)

O(1) does not mean “instant.” It means the time does not grow with the input size. A hash table lookup might be slower than an array index access for small collections, but it scales much better for large ones.

Example 3: Wrong Structure, Right Structure

import java.util.ArrayList;
import java.util.HashSet;

public class DuplicateCheck {
    // SLOW approach: ArrayList contains() is O(n) per check
    public static boolean hasDuplicatesSlow(String[] words) {
        ArrayList<String> seen = new ArrayList<>();
        for (String w : words) {
            if (seen.contains(w)) {    // O(n) each time
                return true;
            }
            seen.add(w);
        }
        return false;
    }

    // FAST approach: HashSet contains() is O(1) per check
    public static boolean hasDuplicatesFast(String[] words) {
        HashSet<String> seen = new HashSet<>();
        for (String w : words) {
            if (seen.contains(w)) {    // O(1) each time
                return true;
            }
            seen.add(w);
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry", "date", "banana"};
        System.out.println("Slow: " + hasDuplicatesSlow(words));
        System.out.println("Fast: " + hasDuplicatesFast(words));
    }
}

# SLOW approach: "in" on a plain list is O(n) per check (like ArrayList contains)
def has_duplicates_slow(words):
    seen = []
    for w in words:
        if w in seen:     # O(n) each time
            return True
        seen.append(w)
    return False

# FAST approach: "in" on a set is O(1) per check (Python's set is like HashSet)
def has_duplicates_fast(words):
    seen = set()
    for w in words:
        if w in seen:     # O(1) each time
            return True
        seen.add(w)
    return False

words = ["apple", "banana", "cherry", "date", "banana"]
print("Slow:", has_duplicates_slow(words))    # Python prints True/False with a capital letter
print("Fast:", has_duplicates_fast(words))

Output:

Slow: true
Fast: true

Both methods give the same answer, but the ArrayList approach is O(n^2) overall (n words, each checking up to n previous words). The HashSet approach is O(n) overall (n words, each checking in O(1)). For 10,000 words, the ArrayList approach would need up to 50,000,000 comparisons; the HashSet approach needs about 10,000.

Quick Code Check

Q1. Which structure is best for looking up a student's grade by their name?

Q2. An application needs to support an "undo" feature that reverses the most recent action first. Which structure is best?

Q3. A spell checker needs to test whether a word exists in a dictionary of 1 million words. Which gives the fastest lookup?

Q4. You need a structure that supports fast search, fast insertion, AND can produce sorted output. Which is best?

Q5. A printer processes jobs in the order they were submitted. Which structure models this?

Trace Exercise

For each scenario, fill in the best data structure and its key advantage.

Scenario	Best structure	Key advantage
Remove duplicate email addresses from a list
Process customer service tickets in arrival order
Count how many times each word appears in a book
Navigate browser back/forward history
Store a phone book sorted by name with fast lookup

Fill in the time complexity for each operation on each structure.

Operation	Array	LinkedList	BST (balanced)	HashSet
Search
Insert			O(log n)
Delete (by value)

Code Completion

Complete the code that chooses the right structure for a duplicate-detection task.

Check if an array contains any duplicate values using the fastest approach:

public static boolean hasDuplicates(int[] arr) {
    <Integer> seen = new HashSet<>();
    for (int val : arr) {
        if (seen.(val)) {
            return true;
        }
        seen.(val);
    }
    return false;
}

Complete the code that uses a HashMap to count character frequencies.

Count how many times each character appears in a string:

HashMap<Character, Integer> freq = new HashMap<>();
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (freq.(c)) {
        freq.put(c, freq.get(c) + );
    } else {
        freq.(c, 1);
    }
}

Spot the Bug

This duplicate checker works correctly but is far too slow for large arrays (takes minutes for 100,000 elements). The bug is a poor structure choice:

1public static boolean hasDuplicates(int[] arr) { 2 ArrayList<Integer> seen = new ArrayList<>(); 3 for (int val : arr) { 4 if (seen.contains(val)) return true; 5 seen.add(val); 6 } 7 return false; 8}

Pick the correct fix:

This print queue processes jobs in the wrong order -- the last job submitted prints first instead of the first job:

1// Print queue: should process first-come, first-served 2Stack<String> printJobs = new Stack<>(); 3printJobs.push("Report.pdf"); 4printJobs.push("Photo.jpg"); 5String next = printJobs.pop(); // gets Photo.jpg, not Report.pdf

Pick the correct fix:

Output Prediction

What does this code print?

HashSet<String> set = new HashSet<>();
set.add("A");
set.add("B");
set.add("C");
set.add("A");
set.add("B");
System.out.println(set.size());
System.out.println(set.contains("D"));

Type the exact output:

What does this code print?

Stack<String> stack = new Stack<>();
stack.push("Doc1");
stack.push("Doc2");
stack.push("Doc3");
System.out.println(stack.pop());
System.out.println(stack.pop());
System.out.println(stack.size());

Type the exact output:

Practice Exercises

Core

Match the structure – For each scenario, state the best data structure and give one reason: (a) storing usernames that must be unique, (b) processing hospital patients in arrival order, (c) mapping country codes to country names, (d) implementing the back button in a web browser.
Complexity table – Copy and complete this table from memory (without looking at notes): Search, Insert, and Delete complexity for Array, LinkedList, BST (balanced), and HashSet.
Justify the choice – A teacher stores student marks in an ArrayList<Integer>. They want to quickly check if a specific mark (e.g., 95) exists. Explain why this is slow and suggest a better structure.

Extension

Refactor for performance – You are given code that uses a LinkedList to detect duplicates in a 10,000-element array (O(n^2)). Rewrite it using a HashSet to achieve O(n). Measure the improvement by counting the number of contains calls in each version.
Multi-structure solution – A school administration system needs to: (a) look up any student by ID (fast), (b) display all students in alphabetical order, (c) process enrolment applications in FIFO order. State which structure you would use for each requirement and explain why no single structure handles all three.
BST vs HashMap – Insert 1000 random integers into both a BST and a HashSet. Then search for 100 random values. Compare the number of comparisons. When would you prefer a BST over a HashSet?

Challenge

Design a library system – A library needs to: track all books (with title, author, ISBN), look up books by ISBN, find all books by a specific author, and process hold requests in FIFO order. Design the data structures for this system. State which structure you would use for each requirement and justify your choices.
Exam-style question – “A company stores employee records. The system must support: (i) adding new employees, (ii) finding an employee by ID, (iii) listing all employees sorted by name. Evaluate the use of an array, a linked list, a BST, and a HashMap for this system. For each structure, state one advantage and one disadvantage. Recommend a combination of structures that would satisfy all three requirements.” (8 marks)

Connections

Prerequisites: All ADT pages – this page synthesises knowledge from the entire section
- What Are ADTs? – interface vs implementation
- Linked Lists, Ordered Linked Lists, Doubly Linked Lists
- Stacks from Scratch, Queues from Scratch
- Binary Search Trees, BST Operations
- Sets and HashMaps
Related: Stacks and Queues (B2.2 – using built-in classes)
Related: Searching Algorithms – performance analysis connects to structure choice
Related: Big O Notation – the framework for comparing structures