Sets and HashMaps

IB Syllabus: B4.1.5 (HL) – Construct and apply sets as an ADT: unordered, unique elements; union, intersection, difference; check membership, add, remove, subset/superset. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors); mechanics of sets; HashMap/HashSet in Java.

This page is HL only.

Key Concepts

So far you have built data structures where finding a value requires traversal – linked lists need O(n), and even balanced BSTs need O(log n). Hash-based structures take a fundamentally different approach: they compute the storage location directly from the value, giving O(1) average-case lookup.

Sets

A set is an unordered collection of unique elements. Adding a duplicate has no effect.

Operation	Java `HashSet` method	Description
Add	`add(element)`	Insert element (ignored if already present)
Remove	`remove(element)`	Delete element
Contains	`contains(element)`	Check membership – returns `true`/`false`
Size	`size()`	Number of elements
Union	`addAll(otherSet)`	All elements from both sets
Intersection	`retainAll(otherSet)`	Only elements in both sets
Difference	`removeAll(otherSet)`	Elements in this set but not in the other

retainAll and removeAll modify the original set. If you need to keep the original, copy it first: HashSet<String> result = new HashSet<>(setA); (Python’s set operators &, |, and - return a new set instead, so the originals are safe.)

HashMaps

A HashMap stores key-value pairs. Each key maps to exactly one value. Keys must be unique; values can repeat.

Operation	Java `HashMap` method	Description
Put	`put(key, value)`	Add or update a key-value pair
Get	`get(key)`	Retrieve the value for a key (returns `null` if absent)
Contains key	`containsKey(key)`	Check if a key exists
Contains value	`containsValue(value)`	Check if a value exists (slow – O(n))
Remove	`remove(key)`	Delete a key-value pair
Size	`size()`	Number of key-value pairs
Keys	`keySet()`	Returns a Set of all keys

How Hashing Works

Both HashSet and HashMap use a hash table internally. The idea:

A hash function converts a key into an integer (the hash code)
The hash code is mapped to a bucket index (usually hashCode % tableSize)
The element is stored in that bucket

Key: "Alice"  →  hashCode: 63650101  →  bucket: 63650101 % 10 = 1
Key: "Bob"    →  hashCode: 66965     →  bucket: 66965 % 10 = 5
Key: "Carol"  →  hashCode: 65184341  →  bucket: 65184341 % 10 = 1  ← collision!

Collision Resolution

A collision occurs when two different keys hash to the same bucket. Two common strategies:

Strategy	How it works	Trade-off
Chaining	Each bucket holds a linked list of entries	Simple, handles many collisions, but uses extra memory
Open addressing	If bucket is full, probe the next empty bucket	No extra memory, but performance degrades as table fills

Java’s HashMap uses chaining (linked lists in each bucket; switches to a tree when a chain gets long).

Load Factor

The load factor is the ratio of entries to buckets:

load factor = number of entries / number of buckets

When the load factor exceeds a threshold (Java default: 0.75), the table resizes (doubles its capacity and rehashes all entries). A higher load factor saves memory but increases collisions. A lower load factor wastes memory but keeps operations fast.

When to Use Each

Need	Best structure	Why
Unique elements, fast lookup	`HashSet`	O(1) contains, automatic duplicate removal
Key-value mapping	`HashMap`	O(1) get/put by key
Sorted order needed	`TreeSet` / BST	Hash tables do not maintain order
Counting occurrences	`HashMap<String, Integer>`	Map each item to its count
Finding common elements	Two `HashSet`s + `retainAll`	Set intersection

Worked Examples

Example 1: HashSet – Unique Words

import java.util.HashSet;

public class UniqueWords {
    public static void main(String[] args) {
        String[] words = {"the", "cat", "sat", "on", "the", "mat", "the"};

        HashSet<String> unique = new HashSet<>();
        for (String w : words) {
            unique.add(w);
        }

        System.out.println("Unique words: " + unique);
        System.out.println("Count: " + unique.size());
        System.out.println("Contains 'cat'? " + unique.contains("cat"));
        System.out.println("Contains 'dog'? " + unique.contains("dog"));
    }
}

words = ["the", "cat", "sat", "on", "the", "mat", "the"]

unique = set()    # set() makes an empty set -- {} would make an empty dict
for w in words:
    unique.add(w)

# Python shows a set as {'on', 'cat', ...} rather than [on, cat, ...] -- order may vary either way
print("Unique words:", unique)
print("Count:", len(unique))
# Membership uses the in operator; Python prints True/False, Java prints true/false
print("Contains 'cat'?", "cat" in unique)
print("Contains 'dog'?", "dog" in unique)

Output (element order may vary):

Unique words: [on, cat, the, mat, sat]
Count: 5
Contains 'cat'? true
Contains 'dog'? false

The word “the” appeared three times in the array but only once in the set. The output order is not the insertion order – HashSet is unordered, so elements may appear in any order.

Example 2: Set Operations – Mutual Friends

import java.util.HashSet;

public class MutualFriends {
    public static void main(String[] args) {
        HashSet<String> aliceFriends = new HashSet<>();
        aliceFriends.add("Bob");
        aliceFriends.add("Carol");
        aliceFriends.add("Dave");
        aliceFriends.add("Eve");

        HashSet<String> bobFriends = new HashSet<>();
        bobFriends.add("Alice");
        bobFriends.add("Carol");
        bobFriends.add("Frank");
        bobFriends.add("Eve");

        // Intersection: mutual friends
        HashSet<String> mutual = new HashSet<>(aliceFriends);
        mutual.retainAll(bobFriends);
        System.out.println("Mutual friends: " + mutual);

        // Union: all friends combined
        HashSet<String> allFriends = new HashSet<>(aliceFriends);
        allFriends.addAll(bobFriends);
        System.out.println("All friends: " + allFriends);

        // Difference: Alice's friends who are not Bob's friends
        HashSet<String> aliceOnly = new HashSet<>(aliceFriends);
        aliceOnly.removeAll(bobFriends);
        System.out.println("Alice only: " + aliceOnly);
    }
}

alice_friends = {"Bob", "Carol", "Dave", "Eve"}    # a set literal builds the set directly
bob_friends = {"Alice", "Carol", "Frank", "Eve"}

# The set operators return a NEW set -- no copy needed, unlike Java's retainAll/addAll/removeAll

# Intersection: mutual friends
mutual = alice_friends & bob_friends
print("Mutual friends:", mutual)

# Union: all friends combined
all_friends = alice_friends | bob_friends
print("All friends:", all_friends)

# Difference: Alice's friends who are not Bob's friends
alice_only = alice_friends - bob_friends
print("Alice only:", alice_only)

Output (element order may vary):

Mutual friends: [Carol, Eve]
All friends: [Bob, Alice, Carol, Dave, Eve, Frank]
Alice only: [Bob, Dave]

Always copy the set before calling retainAll, removeAll, or addAll if you need to keep the original. These methods modify the set in place.

Example 3: HashMap – Frequency Counter

import java.util.HashMap;

public class FrequencyCounter {
    public static void main(String[] args) {
        String[] votes = {"Red", "Blue", "Red", "Green", "Blue", "Red", "Blue", "Blue"};

        HashMap<String, Integer> counts = new HashMap<>();
        for (String vote : votes) {
            if (counts.containsKey(vote)) {
                counts.put(vote, counts.get(vote) + 1);
            } else {
                counts.put(vote, 1);
            }
        }

        System.out.println("Vote counts: " + counts);
        System.out.println("Red votes: " + counts.get("Red"));

        // Find the winner
        String winner = "";
        int maxVotes = 0;
        for (String key : counts.keySet()) {
            if (counts.get(key) > maxVotes) {
                maxVotes = counts.get(key);
                winner = key;
            }
        }
        System.out.println("Winner: " + winner + " with " + maxVotes + " votes");
    }
}

votes = ["Red", "Blue", "Red", "Green", "Blue", "Red", "Blue", "Blue"]

counts = {}    # an empty dict -- Python's HashMap
for vote in votes:
    if vote in counts:                     # in checks the KEYS, like containsKey
        counts[vote] = counts[vote] + 1    # counts[vote] reads/updates the value for that key
    else:
        counts[vote] = 1

# Python shows a dict as {'Red': 3, ...} rather than {Red=3, ...}
print("Vote counts:", counts)
print("Red votes:", counts["Red"])

# Find the winner
winner = ""
max_votes = 0
for key in counts:                         # looping over a dict visits its keys, like keySet()
    if counts[key] > max_votes:
        max_votes = counts[key]
        winner = key
print("Winner:", winner, "with", max_votes, "votes")

Output:

Vote counts: {Red=3, Blue=4, Green=1}
Red votes: 3
Winner: Blue with 4 votes

The frequency counter pattern is one of the most common uses of HashMap. Each unique item becomes a key, and its count becomes the value.

Example 4: How a Hash Table Works Internally

This simplified hash table uses chaining (linked lists) to handle collisions.

public class SimpleHashTable {
    private String[] table;
    private int size;

    public SimpleHashTable(int capacity) {
        table = new String[capacity];
        size = 0;
    }

    private int hash(String key) {
        int code = 0;
        for (int i = 0; i < key.length(); i++) {
            code = code + key.charAt(i);
        }
        return code % table.length;
    }

    public void put(String key) {
        int index = hash(key);
        System.out.println("'" + key + "' -> bucket " + index);
        table[index] = key;    // simplified: overwrites on collision
        size++;
    }

    public static void main(String[] args) {
        SimpleHashTable ht = new SimpleHashTable(7);
        ht.put("cat");
        ht.put("dog");
        ht.put("bat");
        ht.put("tac");          // anagram of "cat" -- same letter sum
    }
}

class SimpleHashTable():
    def __init__(self, capacity):
        self._table = [None] * capacity    # a fixed-size list of empty buckets
        self._size = 0                     # the leading underscore marks these as private (IB convention)

    def hash(self, key):
        code = 0
        for i in range(len(key)):
            code = code + ord(key[i])    # ord() gives the character code -- Java adds chars as numbers automatically
        return code % len(self._table)

    def put(self, key):
        index = self.hash(key)
        print("'" + key + "' -> bucket " + str(index))
        self._table[index] = key    # simplified: overwrites on collision
        self._size = self._size + 1

ht = SimpleHashTable(7)
ht.put("cat")
ht.put("dog")
ht.put("bat")
ht.put("tac")          # anagram of "cat" -- same letter sum

Output:

'cat' -> bucket 4
'dog' -> bucket 6
'bat' -> bucket 3
'tac' -> bucket 4

“cat” and “tac” both hash to bucket 4 – a collision. They have the same letters (c+a+t = 312), so the sum-based hash produces the same value (312 % 7 = 4). In this simplified version, “tac” overwrites “cat”. A real implementation would use chaining (a linked list at each bucket) to store both values.

Quick Code Check

Q1. What happens when you call set.add("hello") on a HashSet that already contains "hello"?

Q2. If a HashMap already contains the key "age" with value 25, what happens when you call map.put("age", 30)?

Q3. What is a collision in a hash table?

Q4. A hash table has 12 entries and 16 buckets. What is the load factor?

Q5. Which set operation does setA.retainAll(setB) perform?

Trace Exercise

Trace the set operations on these two sets:

setA = {1, 3, 5, 7, 9}
setB = {2, 3, 5, 8}

Operation	Java method	Result
Union (A ∪ B)	`addAll`
Intersection (A ∩ B)	`retainAll`
Difference (A − B)	`removeAll`
Is B a subset of A?	`containsAll`

Trace the HashMap operations step by step.

Operation	Map state after	Return value
`put("A", 10)`	{A=10}
`put("B", 20)`		null
`put("A", 30)`
`get("B")`	{A=30, B=20}
`get("C")`	{A=30, B=20}

Code Completion

Complete the frequency counter pattern using a HashMap.

Count how many times each word appears in an array:

HashMap<String, Integer> counts = new HashMap<>();
for (String word : words) {
    if (counts.(word)) {
        counts.put(word, counts.(word) + 1);
    } else {
        counts.put(word, );
    }
}

Complete the set intersection using a copy to preserve the original set.

Find mutual friends without modifying the original sets:

HashSet<String> mutual = new (aliceFriends);
mutual.(bobFriends);
System.out.println("Mutual: " + mutual);

Spot the Bug

This code is supposed to print the intersection AND the original set A, but set A is wrong after the intersection:

1HashSet<String> setA = new HashSet<>(Arrays.asList("X", "Y", "Z")); 2HashSet<String> setB = new HashSet<>(Arrays.asList("Y", "Z", "W")); 3 4setA.retainAll(setB); 5System.out.println("Intersection: " + setA); 6System.out.println("Original A: " + setA); // expected {X, Y, Z}

Pick the correct fix:

This frequency counter does not compile:

1HashMap<String, Integer> map = new HashMap<>(); 2for (String item : items) { 3 if (map.contains(item)) { 4 map.put(item, map.get(item) + 1); 5 } else { 6 map.put(item, 1); 7 } 8}

Pick the correct fix:

Output Prediction

What does this code print?

HashSet<String> fruits = new HashSet<>();
fruits.add("apple");
fruits.add("banana");
fruits.add("cherry");
fruits.add("apple");
fruits.remove("banana");
System.out.println(fruits.size());
System.out.println(fruits.contains("banana"));

Type the exact output:

What does this code print?

HashMap<String, Integer> map = new HashMap<>();
map.put("X", 5);
map.put("Y", 10);
map.put("X", 15);
System.out.println(map.get("X"));
System.out.println(map.get("Z"));
System.out.println(map.size());

Type the exact output:

Practice Exercises

Core

Remove duplicates – Given an array of integers {4, 2, 7, 2, 9, 4, 7, 1}, use a HashSet to remove duplicates. Print the unique values and the count. (Expected count: 5)
Word frequency – Given a sentence stored as a String[] of words, use a HashMap<String, Integer> to count how many times each word appears. Print each word and its count.
Student lookup – Create a HashMap<String, Integer> mapping student names to their grades. Add 5 students, then look up a name and print “Found: [grade]” or “Not found”.

Extension

Set operations – Create two sets: classA = {"Alice", "Bob", "Carol", "Dave"} and classB = {"Bob", "Dave", "Eve", "Frank"}. Print: (a) students in both classes (intersection), (b) all students (union), (c) students only in class A (difference). Preserve the original sets.
Most frequent character – Given a string, use a HashMap<Character, Integer> to count each character’s frequency, then find and print the most frequent character and its count.
Hash function exploration – Write a simple hash function that sums the character values of a string and returns the result modulo a table size. Test with 10 words and a table of size 7. How many collisions occur? Try a table of size 13. Does it reduce collisions?

Challenge

Two-sum problem – Given an array of integers and a target sum, use a HashMap to find two numbers that add up to the target. For each number, check if target - number is already in the map. Return the pair. This is O(n) with a HashMap vs O(n^2) with nested loops.
Simple spell checker – Load a dictionary of valid words into a HashSet. Given a paragraph (array of words), check each word against the dictionary and print any misspelled words. Measure: how many lookups would this require with a list (O(n) each) vs a HashSet (O(1) each)?

Connections

Prerequisites: What Are ADTs? – the concept of interface vs implementation
Prerequisites: OOP Fundamentals – classes, objects, encapsulation
Related: Binary Search Trees – another fast-lookup structure, but maintains sorted order (O(log n) vs O(1))
Related: ArrayList – dynamic collections, but without the uniqueness guarantee or O(1) lookup
Next: Choosing the Right Structure – comparing all ADTs to pick the right tool