You hear a lot about cache coherency these days. In fact, at the recent Linley Processor Conference, no fewer than three companies announced new cache-coherent networks-on-chip (NoCs).
The first cache I ever ran into was on a computer at Cambridge University called Titan. It had a 32-word instruction cache, indexed off the lower five bits of the PC; it was a normal direct-mapped cache. If the higher-order bits of the PC (above bit 5) matched the tag stored at that cache entry, then the instruction was pulled from the cache instead of being fetched from memory. Of course, this was much faster; that is the whole point of caches. If the higher-order bits didn't match (a cache miss), the instruction was fetched from memory and the cache entry was updated. These days, when three-level caches are common and cache sizes can be measured in megabytes, this seems almost comically small. Would such a tiny cache make any difference? It turns out, when you think about it, that the architecture of the cache means that any loop of fewer than 32 instructions will run entirely out of the cache. Since processors spend a lot of time in small loops, especially if they lack instructions for clearing or copying areas of memory, this made a big difference.
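To make the mechanism concrete, here is a minimal sketch of how such a direct-mapped lookup works. This is an illustration, not Titan's actual hardware: the class and variable names are invented, and real caches do all of this in parallel logic, not sequential code.

```python
CACHE_SIZE = 32  # a 32-word instruction cache, as on Titan

class DirectMappedICache:
    """Illustrative model: one tag and one instruction word per entry."""

    def __init__(self, memory):
        self.memory = memory              # backing store: address -> instruction
        self.tags = [None] * CACHE_SIZE   # high-order PC bits stored per entry
        self.data = [None] * CACHE_SIZE   # cached instruction words
        self.hits = 0
        self.misses = 0

    def fetch(self, pc):
        index = pc & 0x1F   # lower 5 bits of the PC select the cache entry
        tag = pc >> 5       # remaining high-order bits must match for a hit
        if self.tags[index] == tag:
            self.hits += 1
            return self.data[index]       # hit: no memory access needed
        self.misses += 1                  # miss: fetch from memory...
        self.tags[index] = tag            # ...and update the cache entry
        self.data[index] = self.memory[pc]
        return self.data[index]
```

Running a 16-instruction loop through this model shows why even 32 words helped: the first iteration misses on every instruction, and every later iteration hits on all of them.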
Another key thing to note is that the programmers don't have to do anything. If the cache is turned on, the code runs unchanged, just faster. The cache is invisible to the programmers. The hardware designers worry about it, but they give software engineers the illusion that it doesn't exist.