Why do computers use several kinds of memory, and how does a cache speed things up?
Describe the memory hierarchy from registers to secondary storage, and explain how caching exploits locality of reference
A focused answer to the H2 Computing outcome on the memory hierarchy. Registers, cache, main memory (RAM) and secondary storage, the trade-off between speed, cost and capacity, and how caching uses locality of reference.
Reviewed by: AI editorial process; not yet individually human-reviewed
Have a quick question? Jump to the Q&A page
Jump to a section
What this dot point is asking
SEAB wants you to describe the memory hierarchy from registers down to secondary storage, and explain how caching exploits locality of reference. The central idea is that no single memory can be fast, cheap and large at once, so computers layer several kinds and use a small fast cache - relying on the predictable way programs access memory - to get close to fast-memory speed at large-memory cost.
The answer
The memory hierarchy
Computer memory is arranged in levels, fastest at the top:
- Registers - inside the CPU, the fastest, but only a handful.
- Cache (L1, L2, L3) - small, very fast memory close to the CPU.
- Main memory (RAM) - larger, slower, volatile working memory.
- Secondary storage (SSD, hard disk) - very large, much slower, non-volatile.
Below these sit offline/archival storage. The CPU works with registers and cache at full speed and reaches down the hierarchy only as needed.
The trade-off
The levels trade three things against each other:
- Moving up: faster, but more expensive per byte and smaller.
- Moving down: slower, but cheaper per byte and far larger.
You cannot have fast, cheap and large simultaneously, so the hierarchy combines a little fast memory with a lot of slow memory to approximate "fast and large" at acceptable cost.
Cache, hits and misses
Cache is small fast memory between the CPU and RAM, holding copies of recently or frequently used data and instructions:
- A cache hit - the needed data is in the cache, retrieved quickly.
- A cache miss - it is not, so it is fetched from slower main memory (and copied into the cache for next time).
A high hit rate keeps the average access time close to cache speed.
Locality of reference
Caching works because programs access memory predictably - locality of reference:
- Temporal locality - recently used data is likely to be used again soon (a loop counter, a frequently called function). Keep it in cache.
- Spatial locality - data near a recent access is likely to be used soon (the next array element). So a cache loads a whole block of nearby data on a miss, not just one item.
These patterns are why a small cache can serve the great majority of accesses.
Examples in context
Example 1. Why arrays beat scattered data. Iterating a contiguous array is fast because spatial locality means each cache block load serves many subsequent elements as hits. Data scattered across memory (as in a poorly laid-out structure) causes frequent misses, which is one reason cache-friendly layouts can dramatically speed up real programs.
Example 2. Loading a program. Starting an application copies it from slow secondary storage (SSD) into faster RAM, and the parts the CPU runs are then cached. Each step up the hierarchy trades capacity for speed, so the working code ends up in the fast levels while the bulk stays in cheaper, larger storage.
Try this
Q1. List the memory hierarchy from fastest to slowest. [2 marks]
- Cue. Registers, cache, main memory (RAM), secondary storage.
Q2. Distinguish between temporal and spatial locality of reference. [2 marks]
- Cue. Temporal: recently used data is likely used again soon. Spatial: data near a recent access is likely used soon (so caches load blocks).
Q3. What happens on a cache miss? [1 mark]
- Cue. The required data is fetched from main memory into the cache (usually as a block) and then delivered to the CPU.
Exam-style practice questions
Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Original6 marks(a) List the levels of the memory hierarchy from fastest to slowest. (b) State the general trade-off between the levels in terms of speed, cost per byte and capacity. (c) Explain the purpose of cache memory.Show worked answer →
(a) From fastest to slowest: registers, cache (L1, L2, L3), main memory (RAM), secondary storage (SSD/hard disk), then offline/archival storage.
(b) The trade-off: as you move up the hierarchy (toward registers) memory is faster but more expensive per byte and smaller in capacity; as you move down, it is slower but cheaper per byte and far larger. You cannot have all of fast, cheap and large at once, so a hierarchy combines them.
(c) Cache is small, very fast memory between the CPU and main memory. It holds copies of frequently or recently used data and instructions so the CPU can get them quickly, avoiding the slower trip to RAM. This reduces average access time and helps relieve the von Neumann bottleneck.
Markers reward the ordered levels, the speed-up versus cost-and-capacity trade-off, and cache as fast memory holding frequently used data to cut average access time.
Original5 marks(a) Define a cache hit and a cache miss. (b) Explain how locality of reference makes caching effective, distinguishing temporal and spatial locality. (c) What happens on a cache miss?Show worked answer →
(a) A cache hit is when the data the CPU needs is already in the cache, so it is retrieved quickly. A cache miss is when it is not in the cache, so it must be fetched from the slower main memory.
(b) Locality of reference is the tendency of programs to access memory in predictable patterns, which caching exploits:
- Temporal locality - recently accessed data is likely to be accessed again soon (for example, a loop variable), so keeping it in cache pays off.
- Spatial locality - data near a recently accessed location is likely to be accessed soon (for example, the next array element), so caches load a block of nearby data, not just one item.
(c) On a cache miss, the required data is fetched from main memory into the cache (usually as a block, exploiting spatial locality), and then delivered to the CPU; a future access to the same or nearby data is then a hit.
Markers reward the hit/miss definitions, temporal and spatial locality correctly distinguished, and a miss fetching a block from RAM into the cache.
Related dot points
- Describe the fetch-decode-execute cycle, the registers involved, and how the program counter sequences instructions
A focused answer to the H2 Computing outcome on the fetch-execute cycle. The fetch, decode and execute stages, the program counter, memory address and data registers, the instruction register, and how branching changes the flow.
- Describe the components of the CPU - control unit, ALU, registers - and the buses connecting it to memory in the von Neumann architecture
A focused answer to the H2 Computing outcome on CPU components. The control unit, arithmetic logic unit and registers, the address, data and control buses, and the stored-program von Neumann architecture.
- Explain interrupts and interrupt handling, contrast them with polling, and describe how the CPU services and resumes after an interrupt
A focused answer to the H2 Computing outcome on interrupts. What an interrupt is, the interrupt service routine, saving and restoring state, interrupt priorities, and the contrast with polling for input and output.
- Construct and interpret logic gates and truth tables, write Boolean expressions, and simplify them using Boolean algebra laws
A focused answer to the H2 Computing outcome on digital logic. The AND, OR, NOT, NAND, NOR and XOR gates, building and reading truth tables, writing Boolean expressions, and simplifying them with Boolean algebra laws and De Morgan.
- Describe arrays and records as data structures, explain constant-time array indexing and contrast static with dynamic arrays
A focused answer to the H2 Computing outcome on arrays and records. Contiguous storage and constant-time indexing, one and two dimensional arrays, records as fields of mixed type, and static versus dynamic arrays.