SingaporeComputer ScienceSyllabus dot point

Why do computers use several kinds of memory, and how does a cache speed things up?

Describe the memory hierarchy from registers to secondary storage, and explain how caching exploits locality of reference

A focused answer to the H2 Computing outcome on the memory hierarchy. Registers, cache, main memory (RAM) and secondary storage, the trade-off between speed, cost and capacity, and how caching uses locality of reference.

Generated by Claude Opus 4.88 min answerUpdated 2026-06-06

Reviewed by: AI editorial process; not yet individually human-reviewed

Have a quick question? Jump to the Q&A page

Jump to a section

What this dot point is asking
The answer
Examples in context
Try this

What this dot point is asking

SEAB wants you to describe the memory hierarchy from registers down to secondary storage, and explain how caching exploits locality of reference. The central idea is that no single memory can be fast, cheap and large at once, so computers layer several kinds and use a small fast cache - relying on the predictable way programs access memory - to get close to fast-memory speed at large-memory cost.

The answer

The memory hierarchy

Computer memory is arranged in levels, fastest at the top:

Registers - inside the CPU, the fastest, but only a handful.
Cache (L1, L2, L3) - small, very fast memory close to the CPU.
Main memory (RAM) - larger, slower, volatile working memory.
Secondary storage (SSD, hard disk) - very large, much slower, non-volatile.

Below these sit offline/archival storage. The CPU works with registers and cache at full speed and reaches down the hierarchy only as needed.

The trade-off

The levels trade three things against each other:

Moving up: faster, but more expensive per byte and smaller.
Moving down: slower, but cheaper per byte and far larger.

You cannot have fast, cheap and large simultaneously, so the hierarchy combines a little fast memory with a lot of slow memory to approximate "fast and large" at acceptable cost.

Cache, hits and misses

Cache is small fast memory between the CPU and RAM, holding copies of recently or frequently used data and instructions:

A cache hit - the needed data is in the cache, retrieved quickly.
A cache miss - it is not, so it is fetched from slower main memory (and copied into the cache for next time).

A high hit rate keeps the average access time close to cache speed.

Locality of reference

Caching works because programs access memory predictably - locality of reference:

Temporal locality - recently used data is likely to be used again soon (a loop counter, a frequently called function). Keep it in cache.
Spatial locality - data near a recent access is likely to be used soon (the next array element). So a cache loads a whole block of nearby data on a miss, not just one item.

These patterns are why a small cache can serve the great majority of accesses.

Worked example

A program sums all elements of a large array in a loop. Explain how the cache and locality of reference make this fast.

Step 1: First access is a miss

The first element accessed is not yet in the cache - a cache miss. The CPU fetches it from main memory.

Step 2: Spatial locality loads a block

On the miss, the cache loads not just that element but a block of neighbouring elements, because spatial locality predicts the next elements will be needed soon.

Step 3: Subsequent accesses are hits

The loop then reads the next array elements, which are already in the cache from the block load - a run of cache hits, each served at cache speed without touching RAM.

Step 4: Temporal locality on the accumulator

The running total (and the loop variable) are accessed every iteration; temporal locality keeps them in cache too. So after rare misses to load fresh blocks, almost every access is a fast hit, and the sum runs near cache speed.

Examples in context

Example 1. Why arrays beat scattered data. Iterating a contiguous array is fast because spatial locality means each cache block load serves many subsequent elements as hits. Data scattered across memory (as in a poorly laid-out structure) causes frequent misses, which is one reason cache-friendly layouts can dramatically speed up real programs.

Example 2. Loading a program. Starting an application copies it from slow secondary storage (SSD) into faster RAM, and the parts the CPU runs are then cached. Each step up the hierarchy trades capacity for speed, so the working code ends up in the fast levels while the bulk stays in cheaper, larger storage.

Try this

Q1. List the memory hierarchy from fastest to slowest. [2 marks]

Cue. Registers, cache, main memory (RAM), secondary storage.

Q2. Distinguish between temporal and spatial locality of reference. [2 marks]

Cue. Temporal: recently used data is likely used again soon. Spatial: data near a recent access is likely used soon (so caches load blocks).

Q3. What happens on a cache miss? [1 mark]

Cue. The required data is fetched from main memory into the cache (usually as a block) and then delivered to the CPU.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original6 marks(a) List the levels of the memory hierarchy from fastest to slowest. (b) State the general trade-off between the levels in terms of speed, cost per byte and capacity. (c) Explain the purpose of cache memory.

Show worked answer →

(a) From fastest to slowest: registers, cache (L1, L2, L3), main memory (RAM), secondary storage (SSD/hard disk), then offline/archival storage.

(b) The trade-off: as you move up the hierarchy (toward registers) memory is faster but more expensive per byte and smaller in capacity; as you move down, it is slower but cheaper per byte and far larger. You cannot have all of fast, cheap and large at once, so a hierarchy combines them.

(c) Cache is small, very fast memory between the CPU and main memory. It holds copies of frequently or recently used data and instructions so the CPU can get them quickly, avoiding the slower trip to RAM. This reduces average access time and helps relieve the von Neumann bottleneck.

Markers reward the ordered levels, the speed-up versus cost-and-capacity trade-off, and cache as fast memory holding frequently used data to cut average access time.

Original5 marks(a) Define a cache hit and a cache miss. (b) Explain how locality of reference makes caching effective, distinguishing temporal and spatial locality. (c) What happens on a cache miss?

Show worked answer →

(a) A cache hit is when the data the CPU needs is already in the cache, so it is retrieved quickly. A cache miss is when it is not in the cache, so it must be fetched from the slower main memory.

(b) Locality of reference is the tendency of programs to access memory in predictable patterns, which caching exploits:

Temporal locality - recently accessed data is likely to be accessed again soon (for example, a loop variable), so keeping it in cache pays off.
Spatial locality - data near a recently accessed location is likely to be accessed soon (for example, the next array element), so caches load a block of nearby data, not just one item.

(c) On a cache miss, the required data is fetched from main memory into the cache (usually as a block, exploiting spatial locality), and then delivered to the CPU; a future access to the same or nearby data is then a hit.

Markers reward the hit/miss definitions, temporal and spatial locality correctly distinguished, and a miss fetching a block from RAM into the cache.