The mismatch between processor and memory speeds leads us to add a new level to the memory hierarchy: the “memory cache”, or cache for short.

The cache usually sits on the same chip as the CPU. It is faster, but more expensive per byte, than DRAM.

The cache holds a copy of a subset of main memory: the data currently being used.

Most processors have separate caches for instructions and data.

A cache works on the principles of temporal and spatial locality:

| | Temporal locality | Spatial locality |
| --- | --- | --- |
| Idea | If we use it now, chances are that we’ll want to use it again soon. | If we use a piece of memory, chances are we’ll use the neighboring pieces soon. |
| Library analogy | We keep a book on the desk while we check out another book. | If we check out vol. 1 of a book, we’ll also check out vol. 2 while we’re at it. |
| Memory | If a memory location is referenced, it will tend to be referenced again soon, so keep the most recently accessed data items closer to the processor. | If a memory location is referenced, locations with nearby addresses will tend to be referenced soon, so move blocks of contiguous words closer to the processor. |
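As a minimal sketch of both principles in C (the array name and sizes are ours, chosen only for illustration): summing a 2-D array row by row reuses `sum` on every iteration (temporal locality) and walks consecutive addresses (spatial locality).

```c
#include <stdio.h>

#define N 1024

static int grid[N][N];   /* size is arbitrary, chosen for illustration */

int main(void) {
    long sum = 0;                  /* reused every iteration: temporal locality */
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            sum += grid[r][c];     /* consecutive addresses: spatial locality */
    printf("%ld\n", sum);
    return 0;
}
```

Swapping the two loops walks the array column by column, striding `N * sizeof(int)` bytes between accesses, and typically misses far more often.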

Assume we load a word from memory at address 0x12F0 (register t1 holds 0x12F0):

lw t0, 0(t1)

Memory access with cache (suppose the word stored at 0x12F0 is 1234):

  1. The processor issues address 0x12F0 to the cache.
  2. The cache checks whether it holds a copy of the data at address 0x12F0.
  3. If hit: the cache reads the stored value 1234 and sends it to the processor.
  4. If miss: the cache sends address 0x12F0 to main memory; memory reads 1234 and sends it to the cache; the cache replaces some block to store 1234 along with its address 0x12F0, then sends 1234 to the processor.
  5. The processor loads 1234 into register t0.
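These steps can be acted out with a toy model in C. This is a minimal sketch, assuming a one-entry cache and a pretend memory that holds 1234 at 0x12F0; the helper names are ours, not a real API.

```c
#include <stdint.h>
#include <stdio.h>

/* A pretend main memory that holds 1234 at address 0x12F0. */
static uint32_t memory_read(uint32_t addr) {
    return (addr == 0x12F0) ? 1234 : 0;
}

/* A one-entry "cache": just enough state to act out the steps. */
static struct { int valid; uint32_t addr; uint32_t data; } line;

static uint32_t load_word(uint32_t addr) {
    if (line.valid && line.addr == addr)   /* steps 2-3: hit, return the copy */
        return line.data;
    uint32_t data = memory_read(addr);     /* step 4: miss, ask main memory */
    line.valid = 1;                        /* replace the (only) block ...   */
    line.addr  = addr;                     /* ... remembering its address ... */
    line.data  = data;                     /* ... and the data 1234          */
    return data;                           /* step 5: value goes into t0     */
}

int main(void) {
    printf("%u\n", load_word(0x12F0));     /* first access: miss, prints 1234 */
    printf("%u\n", load_word(0x12F0));     /* second access: hit, prints 1234 */
    return 0;
}
```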

The L1 cache is usually embedded directly on the CPU chip. It is often split into two parts, L1i (instructions) and L1d (data); IMEM and DMEM in the CPU datapath are these two caches.

The L2 cache is located on the same integrated circuit, often adjacent to the CPU.

Hit rate: fraction of accesses that hit in the cache.

Hit time: time (latency) to access cache memory (including tag comparison)

Miss rate: 1 – Hit rate.

Miss penalty: time (latency) to bring a block from a lower level of the memory hierarchy into the cache.

AMAT (Average Memory Access Time)

\[\text{AMAT} = \text{Hit Rate} \cdot \text{Hit Time} + \text{Miss Rate} \cdot (\text{Hit Time} + \text{Miss Penalty})\]

Since Hit Rate = 1 − Miss Rate, this simplifies to

\[\text{AMAT} = \text{Hit Time} + \text{Miss Rate} \cdot \text{Miss Penalty}\]

For example, if hit time = 1 cycle, miss rate = 5%, and miss penalty = 20 cycles, then AMAT = 1 + 0.05 × 20 = 2 cycles.

With two cache levels, the L1 miss penalty is itself the time to access L2:

\[\text{AMAT} = \text{L1 Hit Time} + \text{L1 Miss Rate} \cdot \underbrace{(\text{L2 Hit Time} + \text{L2 Miss Rate} \cdot \text{L2 Miss Penalty})}_{\text{L1 Miss Penalty}}\]
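A small helper makes the arithmetic concrete. The single-level numbers are from the example above; the L2 numbers are made up purely for illustration.

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty (all times in cycles). */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* Single level, as in the example: 1 + 0.05 * 20 = 2 cycles. */
    printf("one level:  %g cycles\n", amat(1.0, 0.05, 20.0));

    /* Two levels: the L1 miss penalty is itself an AMAT over L2.
       These L2 numbers (10-cycle hit, 25% miss, 80-cycle penalty)
       are hypothetical. */
    double l1_miss_penalty = amat(10.0, 0.25, 80.0);        /* = 30 cycles  */
    printf("two levels: %g cycles\n",
           amat(1.0, 0.05, l1_miss_penalty));               /* = 2.5 cycles */
    return 0;
}
```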

Ways to reduce miss rate:

  • Larger cache
  • More places in the cache where each block of memory can go (i.e., higher associativity)

Fully Associative Cache: put a new block anywhere in the cache.

For a given memory address, split it into two fields: the tag and the offset.

To check whether a block (i.e., cache line) has our data: compare the tag, then use the offset to pick out the right bytes within the block, as in the sketch below.
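A minimal sketch of that check in C, assuming the tag and valid bit defined under “Cache Terminology” below; the structure layout and sizes are ours, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES   8    /* illustrative; real caches have many more lines */
#define OFFSET_BITS 2    /* 4-byte blocks, matching the 12-bit example below */

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[1 << OFFSET_BITS];
};

static struct cache_line cache[NUM_LINES];

/* Fully associative lookup: the block could live in any line, so compare
   the tag against every valid line; on a hit, the offset selects the
   byte within the block. Returns true on a hit. */
bool lookup(uint32_t addr, uint8_t *out) {
    uint32_t tag    = addr >> OFFSET_BITS;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    for (int i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[offset];
            return true;
        }
    }
    return false;   /* miss: the block must be fetched from memory */
}
```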

Cache Temperatures

  • Cold: Cache is empty
  • Warming: Cache filling with values you’ll hopefully be accessing again soon
  • Warm: Cache is doing its job, fair % of hits
  • Hot: Cache is doing very well, high % of hits

Cache Terminology

  • Cache block/line: A single entry in the cache; the smallest unit of memory that can be transferred between main memory and the cache
  • Block size / line size: Bytes per cache block. Cache lines are typically 64 bytes.
  • Capacity: Total data bytes that can be stored in a cache = (# lines) × (line size); see the worked example below
  • Tag: Identifies which memory block is stored at a given cache line
  • Valid bit: Indicates whether the data stored at a cache line is valid
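For example (numbers chosen only to illustrate the capacity formula, using the typical 64-byte line size): a cache with 512 lines holds

\[512 \text{ lines} \times 64\ \text{B/line} = 32768\ \text{B} = 32\ \text{KiB}.\]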

Suppose we have a 12-bit memory address; we make the last two bits the offset and use the remaining ten bits as the tag. To read memory at address 0x43F, we search the cache for the tag 0100 0011 11.

\[\underbrace{0100 \,\, 0011\,\, 11}_\text{tag} \underbrace{11}_\text{offset}\]

On a cache miss, we load the 4-byte block spanning 0x43C–0x43F into the cache and set the valid bit. We then read the byte at offset 0x3 and return it to the processor.
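The same split can be computed with shifts and masks; a short sketch whose constants match the 12-bit, 2-bit-offset example above:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr   = 0x43F;         /* 12-bit address 0100 0011 1111     */
    uint32_t offset = addr & 0x3;    /* low 2 bits              -> 0x3    */
    uint32_t tag    = addr >> 2;     /* remaining 10 bits: 0100 0011 11   */
    uint32_t base   = addr & ~0x3u;  /* start of the 4-byte block -> 0x43C */
    printf("tag=0x%03X offset=%u block=0x%03X..0x%03X\n",
           tag, offset, base, base + 3);   /* tag=0x10F offset=3 block=0x43C..0x43F */
    return 0;
}
```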