By Mahmut Kandemir and Nikil Dutt, Embedded.com (01/06/09, 02:17:00 PM EST)
Following on from the review and assessment of various memory architectures in Part 1 of this series, we now survey some research efforts that address the exploration space involving on-chip memories. A number of distinct memory architectures could be devised to exploit different application-specific memory access patterns efficiently.
Even if we restrict the scope of the architecture to those involving on-chip memory only, the exploration space of different possible configurations is too large, making it infeasible to simulate exhaustively the performance and energy characteristics of the application for each configuration. Thus, exploration tools are necessary for rapidly evaluating the impact of several candidate architectures. Such tools can be of great utility to a system designer by giving fast initial feedback on a wide range of memory architectures.

Cache
Two of the most important aspects of data caches that can be customized for an application are: (1) the cache line size and (2) the cache size. In the study referenced above, the cache line size is customized for an application by using an estimation technique that predicts the memory access performance, that is, the total number of processor cycles required for all the memory accesses in the application.
There is a tradeoff in sizing the cache line. If the memory accesses are very regular and consecutive, i.e., exhibit spatial locality, a longer cache line is desirable, since it minimizes the number of off-chip accesses and exploits the locality by prefetching elements that will be needed in the immediate future.
On the other hand, if the memory accesses are irregular, or have large strides, a shorter cache line is desirable, as this reduces off-chip memory traffic by not bringing unnecessary data into the cache. The maximum size of a cache line is the DRAM page size.
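The tradeoff can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the study: it assumes a cold cache, accesses that start at address 0 with a fixed byte stride, and that every touched line is fetched from off-chip memory exactly once (no capacity or conflict misses). The function names are illustrative only.

```python
def lines_fetched(n_accesses, stride_bytes, line_size):
    """Count the distinct cache lines touched by n accesses at a fixed byte stride.

    Assumption: addresses start at 0 and each touched line is fetched once.
    """
    return len({(i * stride_bytes) // line_size for i in range(n_accesses)})


def traffic_bytes(n_accesses, stride_bytes, line_size):
    """Off-chip bytes transferred: one full line per distinct line touched."""
    return lines_fetched(n_accesses, stride_bytes, line_size) * line_size


# Consecutive 4-byte words: a longer line cuts the number of off-chip accesses
# while the total traffic stays the same.
print(lines_fetched(1024, 4, 16), lines_fetched(1024, 4, 64))

# Large stride (256 bytes): every access touches a new line, so a longer line
# only adds unused data to each transfer.
print(traffic_bytes(1024, 256, 16), traffic_bytes(1024, 256, 64))
```

Under these assumptions, the consecutive pattern needs fewer fetches as the line grows, while the large-stride pattern needs the same number of fetches but moves four times as many bytes when the line size goes from 16 to 64 bytes.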
The estimation technique uses data reuse analysis to predict the total number of cache hits and misses inside loop nests so that spatial locality is incorporated into the estimation. An estimate of the impact of conflict misses is also incorporated. The estimation is carried out for the different candidate line sizes, and the best line size is selected for the cache.
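The selection loop over candidate line sizes can be sketched as follows. Note that this sketch substitutes a simple trace-driven simulation of a direct-mapped cache for the analytical data reuse analysis described above, so it illustrates only the "evaluate each candidate, keep the best" step. The cost model and its parameters (`hit_cycles`, `miss_latency`, `bytes_per_cycle`) are hypothetical values chosen for illustration, not figures from the study.

```python
def count_misses(addresses, cache_size, line_size):
    """Simulate a direct-mapped cache over an address trace; return the miss count."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # block number currently held in each line
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:       # cold miss or conflict miss
            tags[index] = block
            misses += 1
    return misses


def estimate_cycles(addresses, cache_size, line_size,
                    hit_cycles=1, miss_latency=20, bytes_per_cycle=4):
    """Total cycles under an assumed cost model (all three parameters are hypothetical)."""
    misses = count_misses(addresses, cache_size, line_size)
    hits = len(addresses) - misses
    # A miss pays a fixed latency plus the time to transfer the whole line.
    miss_cost = miss_latency + line_size // bytes_per_cycle
    return hits * hit_cycles + misses * miss_cost


def best_line_size(addresses, cache_size, candidates):
    """Return the candidate line size with the lowest estimated cycle count."""
    return min(candidates, key=lambda ls: estimate_cycles(addresses, cache_size, ls))


sequential = [4 * i for i in range(1024)]     # consecutive 4-byte words
strided = [256 * i for i in range(1024)]      # large-stride accesses

print(best_line_size(sequential, 1024, [16, 32, 64]))  # -> 64
print(best_line_size(strided, 1024, [16, 32, 64]))     # -> 16
```

With this cost model, the sequential trace favors the longest candidate line and the strided trace favors the shortest, which matches the tradeoff described earlier. In practice the candidate list would also be capped at the DRAM page size.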