Cache

Caching is perhaps the most important example of the big idea of prediction (see 8 Great Ideas in Computer Architecture).

Cache (L1 and L2 Cache) is built using SRAM.

Cache represents the level of the memory hierarchy between the processor and main memory. The term also refers to any storage managed to take advantage of locality of access.

It relies on the Principle of Locality to try to find the desired data in the higher levels of the memory hierarchy, and provides mechanisms to ensure that when the prediction is wrong it finds and uses the proper data from the lower levels of the Memory Hierarchy.

  • The hit rates of the cache prediction on modern computers are often above 95%

We use Caching to speed up reading from memory. However, there are two questions:

  • How do we know if a data item is in the cache?
  • If it is, how can we find it?

For Distributed Systems, look at Bus Snooping.

Cache Structures

There are a few ways to answer these questions, and each one is a particular cache structure:

  1. Direct Mapped Cache
  2. Fully Associative Cache
  3. Set Associative Cache

ECE222 mostly emphasized 1 and 3.

Read Access

A referenced address is divided into:

  • A tag field, which is used to compare with the value of the tag field of the cache
  • A cache index, which is used to select the block
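To make the tag/index split concrete, here is a minimal sketch of a read lookup in a direct-mapped cache. The cache geometry (8 blocks of 4 bytes), the byte-offset field, and the names `split_address` and `is_hit` are assumptions made for illustration; they are not from the course material.

```python
# Hypothetical cache geometry, chosen only for illustration.
NUM_BLOCKS = 8   # number of blocks in the cache (must be a power of two)
BLOCK_SIZE = 4   # bytes per block (one 32-bit word)

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # low bits that select a byte in the block
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # next bits that select the block


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def is_hit(cache, addr):
    """cache is a list of (valid, tag) pairs, one entry per block."""
    tag, index, _ = split_address(addr)
    valid, stored_tag = cache[index]
    # A hit needs a valid entry whose stored tag matches the address tag.
    return valid and stored_tag == tag
```

For example, with this geometry address 0x58 selects block 6 with tag 2, so it only hits if block 6 is valid and currently holds tag 2. Address 0x18 maps to the same block but has tag 0, so it would miss.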

Layers of Cache

Handling Cache Misses

I still don’t understand this

Sometimes, we request data from the cache, but the data is not in the cache. This is a cache miss.

The cache miss handling is done in collaboration with the processor control unit and with a separate controller that initiates the memory access and refills the cache.

  • The processing of a cache miss creates a pipeline stall (Chapter 4) in contrast to an exception or interrupt, which would require saving the state of all registers.
  • For a cache miss, we can stall the entire processor, essentially freezing the contents of the temporary and programmer-visible registers, while we wait for memory.

How instruction misses are handled:

  • If an instruction access results in a miss, then the content of the Instruction register is invalid. To get the proper instruction into the cache, we must be able to tell the lower level in the memory hierarchy to perform a read. Since the program counter is incremented in the first clock cycle of execution, the address of the instruction that generates an instruction cache miss is equal to the value of the program counter minus 4. Once we have the address, we need to instruct the main memory to perform a read. We wait for the memory to respond (since the access will take multiple clock cycles).

Steps:

  1. Send the original PC value (the current PC minus 4) to the memory.
  2. Instruct main memory to perform a read and wait for the memory to complete its access.
  3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on if it was not on already.
  4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
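The steps above can be sketched as a small software model. This is only a sketch under assumptions: the class name `InstructionCache`, the 4-block direct-mapped layout, and the dictionary standing in for main memory are all hypothetical, and the `pc` argument is taken to be the address of the instruction being fetched (the "original PC value" in step 1). Real hardware does this with a stall and a memory controller, not a function call.

```python
NUM_BLOCKS = 4   # hypothetical cache size, in one-word blocks


class InstructionCache:
    def __init__(self, memory):
        self.memory = memory               # dict: word address -> instruction
        self.valid = [False] * NUM_BLOCKS
        self.tag = [0] * NUM_BLOCKS
        self.data = [None] * NUM_BLOCKS
        self.misses = 0

    def fetch(self, pc):
        word = pc // 4                     # 4-byte instructions
        index = word % NUM_BLOCKS          # cache index selects the block
        tag = word // NUM_BLOCKS           # upper bits of the address
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index]        # hit: no stall needed
        # Miss: in hardware the processor stalls here.
        self.misses += 1
        # Steps 1-2: send the address to memory and wait for the read.
        instruction = self.memory[word]
        # Step 3: write the data and tag, and turn the valid bit on.
        self.data[index] = instruction
        self.tag[index] = tag
        self.valid[index] = True
        # Step 4: restart the fetch, which now hits in the cache.
        return self.fetch(pc)
```

Fetching the same address twice costs one miss. Fetching two addresses that share an index (here 0 and 16) evicts the first entry, so returning to address 0 misses again.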

Handling Writes

This seems closely related to the idea of having Cache Coherency.

Basically, if you write to the cache, you also want to make sure that value is written to memory, or else the cache and memory would hold different values.

  • Writing to both cache and lower level of Memory Hierarchy is a scheme called write-through

The problem is that a write-through scheme on its own is very slow, because every write also has to wait for main memory. A solution is to use a Write Buffer, which stores data while it is waiting to be written to memory.

There is also the write-back scheme, where new data is written only to the block in the cache, not to memory. The modified block is written to the lower level of the hierarchy only when it is replaced. This improves performance, but is more complex to implement.
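A toy comparison of the two write policies, counting how many writes reach memory. The class names, the `dirty` set, and the explicit `evict` method are assumptions for illustration. The sketch assumes the usual write-back behaviour in which a modified block is copied to memory when it is evicted, and it leaves out the write buffer.

```python
class WriteThroughCache:
    def __init__(self):
        self.cache = {}            # block address -> value
        self.memory = {}
        self.memory_writes = 0

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value  # every write also goes to memory
        self.memory_writes += 1


class WriteBackCache:
    def __init__(self):
        self.cache = {}
        self.dirty = set()         # blocks modified since they were loaded
        self.memory = {}
        self.memory_writes = 0

    def write(self, addr, value):
        self.cache[addr] = value   # only the cached copy is updated
        self.dirty.add(addr)

    def evict(self, addr):
        # Memory is updated only when a dirty block leaves the cache.
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.cache.pop(addr, None)
```

Writing the same block three times costs three memory writes under write-through, but only one under write-back (at eviction). Until that eviction, the write-back cache and memory disagree, which is where the extra complexity comes from.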

Personal Thoughts

Cache: a safe place for hiding or storing things. Caching can be a relatively interesting problem. I worked on using caching with Angular so that we could 20x our load speed and not have to read from the database every time.

At HackWestern, I wanted to load an image, but because the image was being cached, the change was not detected when I overwrote the file.