Choosing the Right Structure

IB Syllabus: B4.1.1 (HL) – Explain the properties and purpose of ADTs. B4.1.2 (HL) – Evaluate linked lists. B4.1.4 (HL) – Explain the structures and properties of BSTs. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors), mechanics of sets, HashMap/HashSet in Java.

This page is HL only.


Key Concepts

Choosing the right data structure is one of the most important decisions in programming. Each structure has strengths and weaknesses – there is no single “best” structure. The right choice depends on what operations you need and how often you perform them.

The Big Comparison Table

Structure Search Insert Delete Sorted? Duplicates? Memory Best for
Array O(n) O(n)* O(n)* No (unless sorted) Yes Fixed, compact Fixed-size collections, index-based access
ArrayList O(n) O(1) amortized (end) O(n) No Yes Dynamic, some overhead Growing lists, frequent appends
Singly linked list O(n) O(1) at head O(n)** No Yes Extra pointer per node Frequent insert/remove at front
Doubly linked list O(n) O(1) at head/tail O(1) with ref** No Yes Two pointers per node Frequent insert/remove at both ends
Stack O(n) O(1) push O(1) pop No Yes Minimal LIFO: undo, backtracking, brackets
Queue O(n) O(1) enqueue O(1) dequeue No Yes Minimal FIFO: scheduling, BFS, buffers
BST (balanced) O(log n) O(log n) O(log n) Yes (in-order) No Two pointers per node Fast search + sorted output
HashSet O(1) avg O(1) avg O(1) avg No No Hash table overhead Membership testing, deduplication
HashMap O(1) avg by key O(1) avg O(1) avg No Keys: no, Values: yes Hash table overhead Key-value lookup, counting

* Array insert/delete at a specific position requires shifting elements. ** Linked list delete is O(1) if you already have a reference to the node, but O(n) to find it first.

BST performance depends on balance. A skewed BST degrades to O(n) for all operations – the same as a linked list. The O(log n) figures assume a reasonably balanced tree.

Decision Flowchart

Use these questions to narrow down your choice:

1. Do you need key-value pairs?

  • Yes – HashMap

2. Do you need only unique elements?

  • Yes, and order does not matter – HashSet
  • Yes, and you need sorted order – BST

3. Do you need LIFO (last in, first out) access?

  • Yes – Stack

4. Do you need FIFO (first in, first out) access?

  • Yes – Queue

5. Do you need fast search?

  • By key/value in O(1) – HashSet / HashMap
  • In sorted order, O(log n) – BST
  • By index in O(1) – Array / ArrayList

6. Do you need frequent insertion/deletion?

  • At the front – Linked list
  • At both ends – Doubly linked list
  • At the end – ArrayList
  • In sorted position – Ordered linked list or BST

7. Is the size known and fixed?

  • Yes – Array
  • No – ArrayList, linked list, or BST

Common Trade-offs

Trade-off Explanation
Speed vs memory Hash tables give O(1) lookup but waste memory on empty buckets. Linked lists use memory per element but need extra pointers.
Sorted vs fast BSTs maintain sorted order but O(log n) operations. HashSets give O(1) but no order.
Fixed vs dynamic Arrays are fast and compact but cannot resize. ArrayList and linked lists grow as needed.
Simple vs flexible Stacks and queues are restrictive by design – that is their strength. Restricting operations prevents misuse.

Worked Examples

Example 1: Choosing a Structure for a Task

For each scenario, identify the best data structure and explain why.

Scenario A: Browser history (back button)

  • Need: add pages as you visit them, go back to the previous page
  • Pattern: last in, first out
  • Best choice: Stack – push each page, pop to go back

Scenario B: Print job queue

  • Need: add jobs as they arrive, process in arrival order
  • Pattern: first in, first out
  • Best choice: Queue – enqueue new jobs, dequeue to process

Scenario C: Spell checker dictionary

  • Need: check if a word exists in the dictionary, very fast lookup
  • Pattern: membership testing, no order needed
  • Best choice: HashSet – O(1) contains, no duplicates

Scenario D: Student grade book

  • Need: look up a student’s grade by name
  • Pattern: key (name) to value (grade) mapping
  • Best choice: HashMap – O(1) lookup by key

Scenario E: Music playlist sorted by song title

  • Need: keep songs sorted, add new songs in the right position, display in order
  • Pattern: sorted collection with insertion
  • Best choice: BST – O(log n) insert in sorted position, in-order traversal for display

Scenario F: Tracking the last 100 sensor readings

  • Need: fixed-size buffer, fast access by index
  • Pattern: fixed-size, index-based access
  • Best choice: Array – known size, O(1) index access

Example 2: Performance Comparison

This example measures how many operations different structures need for common tasks on a collection of 1000 elements.

Task Array LinkedList BST (balanced) HashSet
Find element 500 avg (linear) 500 avg (linear) 10 avg (log2 1000) 1 avg (hash)
Insert at end 1 (if space) 1 (with tail ref) 10 avg 1 avg
Insert in sorted position 500 avg (shift) 500 avg (traverse) 10 avg N/A (unordered)
Delete specific element 500 avg (shift) 500 avg (traverse) 10 avg 1 avg
Get sorted output n log n (sort first) n log n (sort first) n (in-order traversal) n log n (sort first)

O(1) does not mean “instant.” It means the time does not grow with the input size. A hash table lookup might be slower than an array index access for small collections, but it scales much better for large ones.

Example 3: Wrong Structure, Right Structure

import java.util.ArrayList;
import java.util.HashSet;

public class DuplicateCheck {
    // SLOW approach: ArrayList contains() is O(n) per check
    public static boolean hasDuplicatesSlow(String[] words) {
        ArrayList<String> seen = new ArrayList<>();
        for (String w : words) {
            if (seen.contains(w)) {    // O(n) each time
                return true;
            }
            seen.add(w);
        }
        return false;
    }

    // FAST approach: HashSet contains() is O(1) per check
    public static boolean hasDuplicatesFast(String[] words) {
        HashSet<String> seen = new HashSet<>();
        for (String w : words) {
            if (seen.contains(w)) {    // O(1) each time
                return true;
            }
            seen.add(w);
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry", "date", "banana"};
        System.out.println("Slow: " + hasDuplicatesSlow(words));
        System.out.println("Fast: " + hasDuplicatesFast(words));
    }
}

Output:

Slow: true
Fast: true

Both methods give the same answer, but the ArrayList approach is O(n^2) overall (n words, each checking up to n previous words). The HashSet approach is O(n) overall (n words, each checking in O(1)). For 10,000 words, the ArrayList approach would need up to 50,000,000 comparisons; the HashSet approach needs about 10,000.


Quick Code Check

Q1. Which structure is best for looking up a student's grade by their name?

Q2. An application needs to support an "undo" feature that reverses the most recent action first. Which structure is best?

Q3. A spell checker needs to test whether a word exists in a dictionary of 1 million words. Which gives the fastest lookup?

Q4. You need a structure that supports fast search, fast insertion, AND can produce sorted output. Which is best?

Q5. A printer processes jobs in the order they were submitted. Which structure models this?


Trace Exercise

For each scenario, fill in the best data structure and its key advantage.

ScenarioBest structureKey advantage
Remove duplicate email addresses from a list
Process customer service tickets in arrival order
Count how many times each word appears in a book
Navigate browser back/forward history
Store a phone book sorted by name with fast lookup

Fill in the time complexity for each operation on each structure.

OperationArrayLinkedListBST (balanced)HashSet
Search
Insert O(log n)
Delete (by value)

Code Completion

Complete the code that chooses the right structure for a duplicate-detection task.

Check if an array contains any duplicate values using the fastest approach:

public static boolean hasDuplicates(int[] arr) {
    <Integer> seen = new HashSet<>();
    for (int val : arr) {
        if (seen.(val)) {
            return true;
        }
        seen.(val);
    }
    return false;
}

Complete the code that uses a HashMap to count character frequencies.

Count how many times each character appears in a string:

HashMap<Character, Integer> freq = new HashMap<>();
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (freq.(c)) {
        freq.put(c, freq.get(c) + );
    } else {
        freq.(c, 1);
    }
}

Spot the Bug

This duplicate checker works correctly but is far too slow for large arrays (takes minutes for 100,000 elements). The bug is a poor structure choice:

1public static boolean hasDuplicates(int[] arr) { 2 ArrayList<Integer> seen = new ArrayList<>(); 3 for (int val : arr) { 4 if (seen.contains(val)) return true; 5 seen.add(val); 6 } 7 return false; 8}

Pick the correct fix:


This print queue processes jobs in the wrong order -- the last job submitted prints first instead of the first job:

1// Print queue: should process first-come, first-served 2Stack<String> printJobs = new Stack<>(); 3printJobs.push("Report.pdf"); 4printJobs.push("Photo.jpg"); 5String next = printJobs.pop(); // gets Photo.jpg, not Report.pdf

Pick the correct fix:


Output Prediction

What does this code print?

HashSet<String> set = new HashSet<>();
set.add("A");
set.add("B");
set.add("C");
set.add("A");
set.add("B");
System.out.println(set.size());
System.out.println(set.contains("D"));

Type the exact output:


What does this code print?

Stack<String> stack = new Stack<>();
stack.push("Doc1");
stack.push("Doc2");
stack.push("Doc3");
System.out.println(stack.pop());
System.out.println(stack.pop());
System.out.println(stack.size());

Type the exact output:


Practice Exercises

Core

  1. Match the structure – For each scenario, state the best data structure and give one reason: (a) storing usernames that must be unique, (b) processing hospital patients in arrival order, (c) mapping country codes to country names, (d) implementing the back button in a web browser.
  2. Complexity table – Copy and complete this table from memory (without looking at notes): Search, Insert, and Delete complexity for Array, LinkedList, BST (balanced), and HashSet.
  3. Justify the choice – A teacher stores student marks in an ArrayList<Integer>. They want to quickly check if a specific mark (e.g., 95) exists. Explain why this is slow and suggest a better structure.

Extension

  1. Refactor for performance – You are given code that uses a LinkedList to detect duplicates in a 10,000-element array (O(n^2)). Rewrite it using a HashSet to achieve O(n). Measure the improvement by counting the number of contains calls in each version.
  2. Multi-structure solution – A school administration system needs to: (a) look up any student by ID (fast), (b) display all students in alphabetical order, (c) process enrolment applications in FIFO order. State which structure you would use for each requirement and explain why no single structure handles all three.
  3. BST vs HashMap – Insert 1000 random integers into both a BST and a HashSet. Then search for 100 random values. Compare the number of comparisons. When would you prefer a BST over a HashSet?

Challenge

  1. Design a library system – A library needs to: track all books (with title, author, ISBN), look up books by ISBN, find all books by a specific author, and process hold requests in FIFO order. Design the data structures for this system. State which structure you would use for each requirement and justify your choices.
  2. Exam-style question – “A company stores employee records. The system must support: (i) adding new employees, (ii) finding an employee by ID, (iii) listing all employees sorted by name. Evaluate the use of an array, a linked list, a BST, and a HashMap for this system. For each structure, state one advantage and one disadvantage. Recommend a combination of structures that would satisfy all three requirements.” (8 marks)

Connections


© EduCS.me — A resource hub for Computer Science education

This site uses Just the Docs, a documentation theme for Jekyll.