Skip to main content
SingaporeComputer ScienceSyllabus dot point

How does a hash table achieve near constant-time lookup, and how are collisions handled?

Describe a hash table and hash function, explain collision resolution by chaining and open addressing, and analyse the effect of load factor

A focused answer to the H2 Computing outcome on hash tables. Hash functions mapping keys to buckets, average O(1) lookup, collisions resolved by chaining or open addressing, and how the load factor affects performance.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Have a quick question? Jump to the Q&A page

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

SEAB wants you to describe a hash table and its hash function, explain how collisions are resolved by chaining and open addressing, and analyse how the load factor affects performance. The central idea is that a hash function turns a key directly into an array index, giving average O(1)O(1) lookup - provided collisions are kept rare.

The answer

The idea: a hash function as an index

A hash table stores key-value pairs in an array of buckets. A hash function h(k)h(k) maps a key to a bucket index, so you compute the location directly instead of searching:

h(k)=kmodmh(k) = k \bmod m

for a table of mm buckets. To store or find a key, compute its bucket and go straight there - this is why average lookup, insertion and deletion are O(1)O(1).

Collisions

Two different keys can hash to the same bucket - a collision. Since a finite table cannot give every possible key its own slot, collisions are inevitable and must be resolved. Two standard strategies:

Chaining. Each bucket holds a list of all keys that hashed there. On collision, append to the list; on lookup, scan the short list. Simple and degrades gracefully:

# bucket 3 after collisions: [23, 33]

Open addressing. Each bucket holds at most one key. On collision, probe for the next free bucket by a rule (linear probing tries the next index, wrapping around). Lookup follows the same probe sequence until it finds the key or an empty slot.

Load factor

The load factor measures how full the table is:

α=nm(keys÷buckets)\alpha = \frac{n}{m} \quad (\text{keys} \div \text{buckets})

As α\alpha rises, collisions become more likely, so chains lengthen or probes take longer and average lookup slows. Tables therefore resize (allocate more buckets and rehash every key) when α\alpha passes a threshold, restoring near-O(1)O(1) performance.

A good hash function

A good hash function distributes keys uniformly across buckets (few collisions), is fast to compute, and is deterministic (same key, same bucket every time). Poor distribution clusters keys and pushes the worst case toward O(n)O(n).

Examples in context

Example 1. A dictionary or set. Python's dict and set are hash tables: looking up a value by key is average O(1)O(1), which is why membership tests like key in d are effectively instant even for millions of entries. The hash function does the work that a search would otherwise require.

Example 2. Caching and deduplication. A web cache keyed by URL, or a system removing duplicate records, hashes each key to check presence in constant time on average. Hashing is the standard tool whenever you need fast "have I seen this before?" lookups and do not need the data in sorted order.

Try this

Q1. With h(k)=kmod5h(k) = k \bmod 5, which bucket does key 23 hash to? [1 mark]

  • Cue. 23mod5=323 \bmod 5 = 3, so bucket 3.

Q2. Define the load factor of a hash table and state its effect as it increases. [2 marks]

  • Cue. α=n/m\alpha = n/m (keys over buckets); as it rises, collisions become more frequent, lengthening chains or probes and slowing average lookup.

Q3. Distinguish between chaining and open addressing for collision resolution. [2 marks]

  • Cue. Chaining keeps a list of all colliding keys in the bucket; open addressing stores one key per bucket and probes for another free bucket on collision.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original6 marksA hash table has 10 buckets (indices 0 to 9) and uses the hash function h(k)=kmod10h(k) = k \bmod 10. Insert the keys 23, 45, 12, 33, 25 using **chaining** for collisions. Show the contents of each bucket after all insertions.
Show worked answer →

Compute each key's bucket as kbmod10k \\bmod 10:

Key kbmod10k \\bmod 10 Bucket
23 3 3
45 5 5
12 2 2
33 3 3 (collision with 23)
25 5 5 (collision with 45)

With chaining, each bucket holds a list of all keys that hash to it:

0: []
1: []
2: [12]
3: [23, 33]
4: []
5: [45, 25]
6-9: []

Buckets 3 and 5 each hold a chain of two keys because of collisions.

Markers reward correct bucket computation, placing colliding keys into the same bucket's list, and the final bucket contents.

Original5 marks(a) State the average and worst-case time complexity of search in a hash table. (b) Define the load factor and explain its effect on performance. (c) State one property of a good hash function.
Show worked answer →

(a) Average case O(1)O(1); worst case O(n)O(n) (if all keys collide into one bucket, search degrades to scanning a list of all nn keys).

(b) The load factor is alpha=n/m\\alpha = n / m, the number of stored keys nn divided by the number of buckets mm. As the load factor rises, collisions become more frequent, so chains lengthen (or probing takes longer), and average lookup slows. Tables are typically resized (rehashed into more buckets) when the load factor exceeds a threshold to keep operations near O(1)O(1).

(c) A good hash function distributes keys uniformly across the buckets (minimising collisions), is fast to compute, and is deterministic (the same key always hashes to the same bucket).

Markers reward the average O(1)O(1) and worst O(n)O(n), the load-factor definition with its collision effect (and resizing), and a valid hash-function property such as uniform distribution.

Related dot points