How does a hash table achieve near constant-time lookup, and how are collisions handled?
Describe a hash table and hash function, explain collision resolution by chaining and open addressing, and analyse the effect of load factor
A focused answer to the H2 Computing outcome on hash tables. Hash functions mapping keys to buckets, average O(1) lookup, collisions resolved by chaining or open addressing, and how the load factor affects performance.
Reviewed by: AI editorial process; not yet individually human-reviewed
Have a quick question? Jump to the Q&A page
Jump to a section
What this dot point is asking
SEAB wants you to describe a hash table and its hash function, explain how collisions are resolved by chaining and open addressing, and analyse how the load factor affects performance. The central idea is that a hash function turns a key directly into an array index, giving average lookup - provided collisions are kept rare.
The answer
The idea: a hash function as an index
A hash table stores key-value pairs in an array of buckets. A hash function maps a key to a bucket index, so you compute the location directly instead of searching:
for a table of buckets. To store or find a key, compute its bucket and go straight there - this is why average lookup, insertion and deletion are .
Collisions
Two different keys can hash to the same bucket - a collision. Since a finite table cannot give every possible key its own slot, collisions are inevitable and must be resolved. Two standard strategies:
Chaining. Each bucket holds a list of all keys that hashed there. On collision, append to the list; on lookup, scan the short list. Simple and degrades gracefully:
# bucket 3 after collisions: [23, 33]
Open addressing. Each bucket holds at most one key. On collision, probe for the next free bucket by a rule (linear probing tries the next index, wrapping around). Lookup follows the same probe sequence until it finds the key or an empty slot.
Load factor
The load factor measures how full the table is:
As rises, collisions become more likely, so chains lengthen or probes take longer and average lookup slows. Tables therefore resize (allocate more buckets and rehash every key) when passes a threshold, restoring near- performance.
A good hash function
A good hash function distributes keys uniformly across buckets (few collisions), is fast to compute, and is deterministic (same key, same bucket every time). Poor distribution clusters keys and pushes the worst case toward .
Examples in context
Example 1. A dictionary or set. Python's dict and set are hash tables: looking up a value by key is average , which is why membership tests like key in d are effectively instant even for millions of entries. The hash function does the work that a search would otherwise require.
Example 2. Caching and deduplication. A web cache keyed by URL, or a system removing duplicate records, hashes each key to check presence in constant time on average. Hashing is the standard tool whenever you need fast "have I seen this before?" lookups and do not need the data in sorted order.
Try this
Q1. With , which bucket does key 23 hash to? [1 mark]
- Cue. , so bucket 3.
Q2. Define the load factor of a hash table and state its effect as it increases. [2 marks]
- Cue. (keys over buckets); as it rises, collisions become more frequent, lengthening chains or probes and slowing average lookup.
Q3. Distinguish between chaining and open addressing for collision resolution. [2 marks]
- Cue. Chaining keeps a list of all colliding keys in the bucket; open addressing stores one key per bucket and probes for another free bucket on collision.
Exam-style practice questions
Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Original6 marksA hash table has 10 buckets (indices 0 to 9) and uses the hash function . Insert the keys 23, 45, 12, 33, 25 using **chaining** for collisions. Show the contents of each bucket after all insertions.Show worked answer →
Compute each key's bucket as :
| Key | Bucket | |
|---|---|---|
| 23 | 3 | 3 |
| 45 | 5 | 5 |
| 12 | 2 | 2 |
| 33 | 3 | 3 (collision with 23) |
| 25 | 5 | 5 (collision with 45) |
With chaining, each bucket holds a list of all keys that hash to it:
0: []
1: []
2: [12]
3: [23, 33]
4: []
5: [45, 25]
6-9: []
Buckets 3 and 5 each hold a chain of two keys because of collisions.
Markers reward correct bucket computation, placing colliding keys into the same bucket's list, and the final bucket contents.
Original5 marks(a) State the average and worst-case time complexity of search in a hash table. (b) Define the load factor and explain its effect on performance. (c) State one property of a good hash function.Show worked answer →
(a) Average case ; worst case (if all keys collide into one bucket, search degrades to scanning a list of all keys).
(b) The load factor is , the number of stored keys divided by the number of buckets . As the load factor rises, collisions become more frequent, so chains lengthen (or probing takes longer), and average lookup slows. Tables are typically resized (rehashed into more buckets) when the load factor exceeds a threshold to keep operations near .
(c) A good hash function distributes keys uniformly across the buckets (minimising collisions), is fast to compute, and is deterministic (the same key always hashes to the same bucket).
Markers reward the average and worst , the load-factor definition with its collision effect (and resizing), and a valid hash-function property such as uniform distribution.
Related dot points
- Describe arrays and records as data structures, explain constant-time array indexing and contrast static with dynamic arrays
A focused answer to the H2 Computing outcome on arrays and records. Contiguous storage and constant-time indexing, one and two dimensional arrays, records as fields of mixed type, and static versus dynamic arrays.
- Describe a linked list as nodes joined by pointers, implement insertion and deletion, and contrast it with an array
A focused answer to the H2 Computing outcome on linked lists. Nodes and pointers, traversal, inserting and deleting nodes by relinking, and the trade-offs against arrays for access and modification.
- Describe a binary search tree, perform insertion, search and in-order traversal, and explain how balance affects complexity
A focused answer to the H2 Computing outcome on binary search trees. The ordering property, inserting and searching nodes, in-order traversal giving sorted output, and why an unbalanced tree degrades to O(n).
- Implement and trace linear search and binary search, state their complexities, and justify when each applies
A focused answer to the H2 Computing outcome on searching. Linear search and binary search, tracing each on an example, their O(n) and O(log n) complexities, and the precondition that binary search needs sorted data.
- Analyse the time and space complexity of an algorithm using Big-O notation, and compare common growth rates
A focused answer to the H2 Computing outcome on algorithmic complexity. Big-O notation, deriving the order of growth from loops and recursion, the common complexity classes, and the trade-off between time and space.