Sets and HashMaps
IB Syllabus: B4.1.5 (HL) – Construct and apply sets as an ADT: unordered, unique elements; union, intersection, difference; check membership, add, remove, subset/superset. B4.1.6 (HL) – Explain the core principles of ADTs: hash tables (hashing functions, collision resolution, load factors); mechanics of sets; HashMap/HashSet in Java.
This page is HL only.
Key Concepts
So far you have built data structures where finding a value requires traversal – linked lists need O(n), and even balanced BSTs need O(log n). Hash-based structures take a fundamentally different approach: they compute the storage location directly from the value, giving O(1) average-case lookup.
Sets
A set is an unordered collection of unique elements. Adding a duplicate has no effect.
| Operation | Java HashSet method | Description |
|---|---|---|
| Add | add(element) | Insert element (ignored if already present) |
| Remove | remove(element) | Delete element |
| Contains | contains(element) | Check membership – returns true/false |
| Size | size() | Number of elements |
| Union | addAll(otherSet) | All elements from both sets |
| Intersection | retainAll(otherSet) | Only elements in both sets |
| Difference | removeAll(otherSet) | Elements in this set but not in the other |
retainAllandremoveAllmodify the original set. If you need to keep the original, copy it first:HashSet<String> result = new HashSet<>(setA);
HashMaps
A HashMap stores key-value pairs. Each key maps to exactly one value. Keys must be unique; values can repeat.
| Operation | Java HashMap method | Description |
|---|---|---|
| Put | put(key, value) | Add or update a key-value pair |
| Get | get(key) | Retrieve the value for a key (returns null if absent) |
| Contains key | containsKey(key) | Check if a key exists |
| Contains value | containsValue(value) | Check if a value exists (slow – O(n)) |
| Remove | remove(key) | Delete a key-value pair |
| Size | size() | Number of key-value pairs |
| Keys | keySet() | Returns a Set of all keys |
How Hashing Works
Both HashSet and HashMap use a hash table internally. The idea:
- A hash function converts a key into an integer (the hash code)
- The hash code is mapped to a bucket index (usually
hashCode % tableSize) - The element is stored in that bucket
Key: "Alice" → hashCode: 63650101 → bucket: 63650101 % 10 = 1
Key: "Bob" → hashCode: 66965 → bucket: 66965 % 10 = 5
Key: "Carol" → hashCode: 65184341 → bucket: 65184341 % 10 = 1 ← collision!
Collision Resolution
A collision occurs when two different keys hash to the same bucket. Two common strategies:
| Strategy | How it works | Trade-off |
|---|---|---|
| Chaining | Each bucket holds a linked list of entries | Simple, handles many collisions, but uses extra memory |
| Open addressing | If bucket is full, probe the next empty bucket | No extra memory, but performance degrades as table fills |
Java’s HashMap uses chaining (linked lists in each bucket; switches to a tree when a chain gets long).
Load Factor
The load factor is the ratio of entries to buckets:
load factor = number of entries / number of buckets
When the load factor exceeds a threshold (Java default: 0.75), the table resizes (doubles its capacity and rehashes all entries). A higher load factor saves memory but increases collisions. A lower load factor wastes memory but keeps operations fast.
When to Use Each
| Need | Best structure | Why |
|---|---|---|
| Unique elements, fast lookup | HashSet | O(1) contains, automatic duplicate removal |
| Key-value mapping | HashMap | O(1) get/put by key |
| Sorted order needed | TreeSet / BST | Hash tables do not maintain order |
| Counting occurrences | HashMap<String, Integer> | Map each item to its count |
| Finding common elements | Two HashSets + retainAll | Set intersection |
Worked Examples
Example 1: HashSet – Unique Words
import java.util.HashSet;
public class UniqueWords {
public static void main(String[] args) {
String[] words = {"the", "cat", "sat", "on", "the", "mat", "the"};
HashSet<String> unique = new HashSet<>();
for (String w : words) {
unique.add(w);
}
System.out.println("Unique words: " + unique);
System.out.println("Count: " + unique.size());
System.out.println("Contains 'cat'? " + unique.contains("cat"));
System.out.println("Contains 'dog'? " + unique.contains("dog"));
}
}
Output (element order may vary):
Unique words: [on, cat, the, mat, sat]
Count: 5
Contains 'cat'? true
Contains 'dog'? false
The word “the” appeared three times in the array but only once in the set. The output order is not the insertion order – HashSet is unordered, so elements may appear in any order.
Example 2: Set Operations – Mutual Friends
import java.util.HashSet;
public class MutualFriends {
public static void main(String[] args) {
HashSet<String> aliceFriends = new HashSet<>();
aliceFriends.add("Bob");
aliceFriends.add("Carol");
aliceFriends.add("Dave");
aliceFriends.add("Eve");
HashSet<String> bobFriends = new HashSet<>();
bobFriends.add("Alice");
bobFriends.add("Carol");
bobFriends.add("Frank");
bobFriends.add("Eve");
// Intersection: mutual friends
HashSet<String> mutual = new HashSet<>(aliceFriends);
mutual.retainAll(bobFriends);
System.out.println("Mutual friends: " + mutual);
// Union: all friends combined
HashSet<String> allFriends = new HashSet<>(aliceFriends);
allFriends.addAll(bobFriends);
System.out.println("All friends: " + allFriends);
// Difference: Alice's friends who are not Bob's friends
HashSet<String> aliceOnly = new HashSet<>(aliceFriends);
aliceOnly.removeAll(bobFriends);
System.out.println("Alice only: " + aliceOnly);
}
}
Output (element order may vary):
Mutual friends: [Carol, Eve]
All friends: [Bob, Alice, Carol, Dave, Eve, Frank]
Alice only: [Bob, Dave]
Always copy the set before calling
retainAll,removeAll, oraddAllif you need to keep the original. These methods modify the set in place.
Example 3: HashMap – Frequency Counter
import java.util.HashMap;
public class FrequencyCounter {
public static void main(String[] args) {
String[] votes = {"Red", "Blue", "Red", "Green", "Blue", "Red", "Blue", "Blue"};
HashMap<String, Integer> counts = new HashMap<>();
for (String vote : votes) {
if (counts.containsKey(vote)) {
counts.put(vote, counts.get(vote) + 1);
} else {
counts.put(vote, 1);
}
}
System.out.println("Vote counts: " + counts);
System.out.println("Red votes: " + counts.get("Red"));
// Find the winner
String winner = "";
int maxVotes = 0;
for (String key : counts.keySet()) {
if (counts.get(key) > maxVotes) {
maxVotes = counts.get(key);
winner = key;
}
}
System.out.println("Winner: " + winner + " with " + maxVotes + " votes");
}
}
Output:
Vote counts: {Red=3, Blue=4, Green=1}
Red votes: 3
Winner: Blue with 4 votes
The frequency counter pattern is one of the most common uses of HashMap. Each unique item becomes a key, and its count becomes the value.
Example 4: How a Hash Table Works Internally
This simplified hash table uses chaining (linked lists) to handle collisions.
public class SimpleHashTable {
private String[] table;
private int size;
public SimpleHashTable(int capacity) {
table = new String[capacity];
size = 0;
}
private int hash(String key) {
int code = 0;
for (int i = 0; i < key.length(); i++) {
code = code + key.charAt(i);
}
return code % table.length;
}
public void put(String key) {
int index = hash(key);
System.out.println("'" + key + "' -> bucket " + index);
table[index] = key; // simplified: overwrites on collision
size++;
}
public static void main(String[] args) {
SimpleHashTable ht = new SimpleHashTable(7);
ht.put("cat");
ht.put("dog");
ht.put("bat");
ht.put("tac"); // anagram of "cat" -- same letter sum
}
}
Output:
'cat' -> bucket 4
'dog' -> bucket 6
'bat' -> bucket 3
'tac' -> bucket 4
“cat” and “tac” both hash to bucket 4 – a collision. They have the same letters (c+a+t = 312), so the sum-based hash produces the same value (312 % 7 = 4). In this simplified version, “tac” overwrites “cat”. A real implementation would use chaining (a linked list at each bucket) to store both values.
Quick Code Check
Q1. What happens when you call set.add("hello") on a HashSet that already contains "hello"?
Q2. If a HashMap already contains the key "age" with value 25, what happens when you call map.put("age", 30)?
Q3. What is a collision in a hash table?
Q4. A hash table has 12 entries and 16 buckets. What is the load factor?
Q5. Which set operation does setA.retainAll(setB) perform?
Trace Exercise
Trace the set operations on these two sets:
setA = {1, 3, 5, 7, 9}
setB = {2, 3, 5, 8}
| Operation | Java method | Result |
|---|---|---|
| Union (A ∪ B) | addAll | |
| Intersection (A ∩ B) | retainAll | |
| Difference (A − B) | removeAll | |
| Is B a subset of A? | containsAll |
Trace the HashMap operations step by step.
| Operation | Map state after | Return value |
|---|---|---|
put("A", 10) | {A=10} | |
put("B", 20) | null | |
put("A", 30) | ||
get("B") | {A=30, B=20} | |
get("C") | {A=30, B=20} |
Code Completion
Complete the frequency counter pattern using a HashMap.
Count how many times each word appears in an array:
HashMap<String, Integer> counts = new HashMap<>();
for (String word : words) {
if (counts.(word)) {
counts.put(word, counts.(word) + 1);
} else {
counts.put(word, );
}
}
Complete the set intersection using a copy to preserve the original set.
Find mutual friends without modifying the original sets:
HashSet<String> mutual = new (aliceFriends);
mutual.(bobFriends);
System.out.println("Mutual: " + mutual);
Spot the Bug
This code is supposed to print the intersection AND the original set A, but set A is wrong after the intersection:
Pick the correct fix:
This frequency counter does not compile:
Pick the correct fix:
Output Prediction
What does this code print?
HashSet<String> fruits = new HashSet<>();
fruits.add("apple");
fruits.add("banana");
fruits.add("cherry");
fruits.add("apple");
fruits.remove("banana");
System.out.println(fruits.size());
System.out.println(fruits.contains("banana"));Type the exact output:
What does this code print?
HashMap<String, Integer> map = new HashMap<>();
map.put("X", 5);
map.put("Y", 10);
map.put("X", 15);
System.out.println(map.get("X"));
System.out.println(map.get("Z"));
System.out.println(map.size());Type the exact output:
Practice Exercises
Core
- Remove duplicates – Given an array of integers
{4, 2, 7, 2, 9, 4, 7, 1}, use aHashSetto remove duplicates. Print the unique values and the count. (Expected count: 5) - Word frequency – Given a sentence stored as a
String[]of words, use aHashMap<String, Integer>to count how many times each word appears. Print each word and its count. - Student lookup – Create a
HashMap<String, Integer>mapping student names to their grades. Add 5 students, then look up a name and print “Found: [grade]” or “Not found”.
Extension
- Set operations – Create two sets:
classA = {"Alice", "Bob", "Carol", "Dave"}andclassB = {"Bob", "Dave", "Eve", "Frank"}. Print: (a) students in both classes (intersection), (b) all students (union), (c) students only in class A (difference). Preserve the original sets. - Most frequent character – Given a string, use a
HashMap<Character, Integer>to count each character’s frequency, then find and print the most frequent character and its count. - Hash function exploration – Write a simple hash function that sums the character values of a string and returns the result modulo a table size. Test with 10 words and a table of size 7. How many collisions occur? Try a table of size 13. Does it reduce collisions?
Challenge
- Two-sum problem – Given an array of integers and a target sum, use a
HashMapto find two numbers that add up to the target. For each number, check iftarget - numberis already in the map. Return the pair. This is O(n) with a HashMap vs O(n^2) with nested loops. - Simple spell checker – Load a dictionary of valid words into a
HashSet. Given a paragraph (array of words), check each word against the dictionary and print any misspelled words. Measure: how many lookups would this require with a list (O(n) each) vs a HashSet (O(1) each)?
Connections
- Prerequisites: What Are ADTs? – the concept of interface vs implementation
- Prerequisites: OOP Fundamentals – classes, objects, encapsulation
- Related: Binary Search Trees – another fast-lookup structure, but maintains sorted order (O(log n) vs O(1))
- Related: ArrayList – dynamic collections, but without the uniqueness guarantee or O(1) lookup
- Next: Choosing the Right Structure – comparing all ADTs to pick the right tool