Sets and HashMaps

IB Syllabus: B4.1.5 (HL) – Construct and apply sets as an ADT: unordered, unique elements; union, intersection, difference; check membership, add, remove, subset/superset. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors); mechanics of sets; HashMap/HashSet in Java.

This page is HL only.


Key Concepts

So far you have built data structures where finding a value requires traversal – linked lists need O(n), and even balanced BSTs need O(log n). Hash-based structures take a fundamentally different approach: they compute the storage location directly from the value, giving O(1) average-case lookup.

Sets

A set is an unordered collection of unique elements. Adding a duplicate has no effect.

Operation Java HashSet method Description
Add add(element) Insert element (ignored if already present)
Remove remove(element) Delete element
Contains contains(element) Check membership – returns true/false
Size size() Number of elements
Union addAll(otherSet) All elements from both sets
Intersection retainAll(otherSet) Only elements in both sets
Difference removeAll(otherSet) Elements in this set but not in the other

retainAll and removeAll modify the original set. If you need to keep the original, copy it first: HashSet<String> result = new HashSet<>(setA);

HashMaps

A HashMap stores key-value pairs. Each key maps to exactly one value. Keys must be unique; values can repeat.

Operation Java HashMap method Description
Put put(key, value) Add or update a key-value pair
Get get(key) Retrieve the value for a key (returns null if absent)
Contains key containsKey(key) Check if a key exists
Contains value containsValue(value) Check if a value exists (slow – O(n))
Remove remove(key) Delete a key-value pair
Size size() Number of key-value pairs
Keys keySet() Returns a Set of all keys

How Hashing Works

Both HashSet and HashMap use a hash table internally. The idea:

  1. A hash function converts a key into an integer (the hash code)
  2. The hash code is mapped to a bucket index (usually hashCode % tableSize)
  3. The element is stored in that bucket
Key: "Alice"  →  hashCode: 63650101  →  bucket: 63650101 % 10 = 1
Key: "Bob"    →  hashCode: 66965     →  bucket: 66965 % 10 = 5
Key: "Carol"  →  hashCode: 65184341  →  bucket: 65184341 % 10 = 1  ← collision!

Collision Resolution

A collision occurs when two different keys hash to the same bucket. Two common strategies:

Strategy How it works Trade-off
Chaining Each bucket holds a linked list of entries Simple, handles many collisions, but uses extra memory
Open addressing If bucket is full, probe the next empty bucket No extra memory, but performance degrades as table fills

Java’s HashMap uses chaining (linked lists in each bucket; switches to a tree when a chain gets long).

Load Factor

The load factor is the ratio of entries to buckets:

load factor = number of entries / number of buckets

When the load factor exceeds a threshold (Java default: 0.75), the table resizes (doubles its capacity and rehashes all entries). A higher load factor saves memory but increases collisions. A lower load factor wastes memory but keeps operations fast.

When to Use Each

Need Best structure Why
Unique elements, fast lookup HashSet O(1) contains, automatic duplicate removal
Key-value mapping HashMap O(1) get/put by key
Sorted order needed TreeSet / BST Hash tables do not maintain order
Counting occurrences HashMap<String, Integer> Map each item to its count
Finding common elements Two HashSets + retainAll Set intersection

Worked Examples

Example 1: HashSet – Unique Words

import java.util.HashSet;

public class UniqueWords {
    public static void main(String[] args) {
        String[] words = {"the", "cat", "sat", "on", "the", "mat", "the"};

        HashSet<String> unique = new HashSet<>();
        for (String w : words) {
            unique.add(w);
        }

        System.out.println("Unique words: " + unique);
        System.out.println("Count: " + unique.size());
        System.out.println("Contains 'cat'? " + unique.contains("cat"));
        System.out.println("Contains 'dog'? " + unique.contains("dog"));
    }
}

Output (element order may vary):

Unique words: [on, cat, the, mat, sat]
Count: 5
Contains 'cat'? true
Contains 'dog'? false

The word “the” appeared three times in the array but only once in the set. The output order is not the insertion order – HashSet is unordered, so elements may appear in any order.

Example 2: Set Operations – Mutual Friends

import java.util.HashSet;

public class MutualFriends {
    public static void main(String[] args) {
        HashSet<String> aliceFriends = new HashSet<>();
        aliceFriends.add("Bob");
        aliceFriends.add("Carol");
        aliceFriends.add("Dave");
        aliceFriends.add("Eve");

        HashSet<String> bobFriends = new HashSet<>();
        bobFriends.add("Alice");
        bobFriends.add("Carol");
        bobFriends.add("Frank");
        bobFriends.add("Eve");

        // Intersection: mutual friends
        HashSet<String> mutual = new HashSet<>(aliceFriends);
        mutual.retainAll(bobFriends);
        System.out.println("Mutual friends: " + mutual);

        // Union: all friends combined
        HashSet<String> allFriends = new HashSet<>(aliceFriends);
        allFriends.addAll(bobFriends);
        System.out.println("All friends: " + allFriends);

        // Difference: Alice's friends who are not Bob's friends
        HashSet<String> aliceOnly = new HashSet<>(aliceFriends);
        aliceOnly.removeAll(bobFriends);
        System.out.println("Alice only: " + aliceOnly);
    }
}

Output (element order may vary):

Mutual friends: [Carol, Eve]
All friends: [Bob, Alice, Carol, Dave, Eve, Frank]
Alice only: [Bob, Dave]

Always copy the set before calling retainAll, removeAll, or addAll if you need to keep the original. These methods modify the set in place.

Example 3: HashMap – Frequency Counter

import java.util.HashMap;

public class FrequencyCounter {
    public static void main(String[] args) {
        String[] votes = {"Red", "Blue", "Red", "Green", "Blue", "Red", "Blue", "Blue"};

        HashMap<String, Integer> counts = new HashMap<>();
        for (String vote : votes) {
            if (counts.containsKey(vote)) {
                counts.put(vote, counts.get(vote) + 1);
            } else {
                counts.put(vote, 1);
            }
        }

        System.out.println("Vote counts: " + counts);
        System.out.println("Red votes: " + counts.get("Red"));

        // Find the winner
        String winner = "";
        int maxVotes = 0;
        for (String key : counts.keySet()) {
            if (counts.get(key) > maxVotes) {
                maxVotes = counts.get(key);
                winner = key;
            }
        }
        System.out.println("Winner: " + winner + " with " + maxVotes + " votes");
    }
}

Output:

Vote counts: {Red=3, Blue=4, Green=1}
Red votes: 3
Winner: Blue with 4 votes

The frequency counter pattern is one of the most common uses of HashMap. Each unique item becomes a key, and its count becomes the value.

Example 4: How a Hash Table Works Internally

This simplified hash table uses chaining (linked lists) to handle collisions.

public class SimpleHashTable {
    private String[] table;
    private int size;

    public SimpleHashTable(int capacity) {
        table = new String[capacity];
        size = 0;
    }

    private int hash(String key) {
        int code = 0;
        for (int i = 0; i < key.length(); i++) {
            code = code + key.charAt(i);
        }
        return code % table.length;
    }

    public void put(String key) {
        int index = hash(key);
        System.out.println("'" + key + "' -> bucket " + index);
        table[index] = key;    // simplified: overwrites on collision
        size++;
    }

    public static void main(String[] args) {
        SimpleHashTable ht = new SimpleHashTable(7);
        ht.put("cat");
        ht.put("dog");
        ht.put("bat");
        ht.put("tac");          // anagram of "cat" -- same letter sum
    }
}

Output:

'cat' -> bucket 4
'dog' -> bucket 6
'bat' -> bucket 3
'tac' -> bucket 4

“cat” and “tac” both hash to bucket 4 – a collision. They have the same letters (c+a+t = 312), so the sum-based hash produces the same value (312 % 7 = 4). In this simplified version, “tac” overwrites “cat”. A real implementation would use chaining (a linked list at each bucket) to store both values.


Quick Code Check

Q1. What happens when you call set.add("hello") on a HashSet that already contains "hello"?

Q2. If a HashMap already contains the key "age" with value 25, what happens when you call map.put("age", 30)?

Q3. What is a collision in a hash table?

Q4. A hash table has 12 entries and 16 buckets. What is the load factor?

Q5. Which set operation does setA.retainAll(setB) perform?


Trace Exercise

Trace the set operations on these two sets:

setA = {1, 3, 5, 7, 9}
setB = {2, 3, 5, 8}
OperationJava methodResult
Union (A ∪ B) addAll
Intersection (A ∩ B) retainAll
Difference (A − B) removeAll
Is B a subset of A? containsAll

Trace the HashMap operations step by step.

OperationMap state afterReturn value
put("A", 10) {A=10}
put("B", 20) null
put("A", 30)
get("B") {A=30, B=20}
get("C") {A=30, B=20}

Code Completion

Complete the frequency counter pattern using a HashMap.

Count how many times each word appears in an array:

HashMap<String, Integer> counts = new HashMap<>();
for (String word : words) {
    if (counts.(word)) {
        counts.put(word, counts.(word) + 1);
    } else {
        counts.put(word, );
    }
}

Complete the set intersection using a copy to preserve the original set.

Find mutual friends without modifying the original sets:

HashSet<String> mutual = new (aliceFriends);
mutual.(bobFriends);
System.out.println("Mutual: " + mutual);

Spot the Bug

This code is supposed to print the intersection AND the original set A, but set A is wrong after the intersection:

1HashSet<String> setA = new HashSet<>(Arrays.asList("X", "Y", "Z")); 2HashSet<String> setB = new HashSet<>(Arrays.asList("Y", "Z", "W")); 3 4setA.retainAll(setB); 5System.out.println("Intersection: " + setA); 6System.out.println("Original A: " + setA); // expected {X, Y, Z}

Pick the correct fix:


This frequency counter does not compile:

1HashMap<String, Integer> map = new HashMap<>(); 2for (String item : items) { 3 if (map.contains(item)) { 4 map.put(item, map.get(item) + 1); 5 } else { 6 map.put(item, 1); 7 } 8}

Pick the correct fix:


Output Prediction

What does this code print?

HashSet<String> fruits = new HashSet<>();
fruits.add("apple");
fruits.add("banana");
fruits.add("cherry");
fruits.add("apple");
fruits.remove("banana");
System.out.println(fruits.size());
System.out.println(fruits.contains("banana"));

Type the exact output:


What does this code print?

HashMap<String, Integer> map = new HashMap<>();
map.put("X", 5);
map.put("Y", 10);
map.put("X", 15);
System.out.println(map.get("X"));
System.out.println(map.get("Z"));
System.out.println(map.size());

Type the exact output:


Practice Exercises

Core

  1. Remove duplicates – Given an array of integers {4, 2, 7, 2, 9, 4, 7, 1}, use a HashSet to remove duplicates. Print the unique values and the count. (Expected count: 5)
  2. Word frequency – Given a sentence stored as a String[] of words, use a HashMap<String, Integer> to count how many times each word appears. Print each word and its count.
  3. Student lookup – Create a HashMap<String, Integer> mapping student names to their grades. Add 5 students, then look up a name and print “Found: [grade]” or “Not found”.

Extension

  1. Set operations – Create two sets: classA = {"Alice", "Bob", "Carol", "Dave"} and classB = {"Bob", "Dave", "Eve", "Frank"}. Print: (a) students in both classes (intersection), (b) all students (union), (c) students only in class A (difference). Preserve the original sets.
  2. Most frequent character – Given a string, use a HashMap<Character, Integer> to count each character’s frequency, then find and print the most frequent character and its count.
  3. Hash function exploration – Write a simple hash function that sums the character values of a string and returns the result modulo a table size. Test with 10 words and a table of size 7. How many collisions occur? Try a table of size 13. Does it reduce collisions?

Challenge

  1. Two-sum problem – Given an array of integers and a target sum, use a HashMap to find two numbers that add up to the target. For each number, check if target - number is already in the map. Return the pair. This is O(n) with a HashMap vs O(n^2) with nested loops.
  2. Simple spell checker – Load a dictionary of valid words into a HashSet. Given a paragraph (array of words), check each word against the dictionary and print any misspelled words. Measure: how many lookups would this require with a list (O(n) each) vs a HashSet (O(1) each)?

Connections

  • Prerequisites: What Are ADTs? – the concept of interface vs implementation
  • Prerequisites: OOP Fundamentals – classes, objects, encapsulation
  • Related: Binary Search Trees – another fast-lookup structure, but maintains sorted order (O(log n) vs O(1))
  • Related: ArrayList – dynamic collections, but without the uniqueness guarantee or O(1) lookup
  • Next: Choosing the Right Structure – comparing all ADTs to pick the right tool

© EduCS.me — A resource hub for Computer Science education

This site uses Just the Docs, a documentation theme for Jekyll.