By Mahmut Kandemir and Nikil Dutt
Embedded.com (01/07/09, 01:41:00 AM EST)
An optimizing compiler that targets MPSoC environments should tackle a number of critical issues. Building on what was learned in Part 1 and Part 2, we first explain these issues and then study potential solutions. From the performance viewpoint, perhaps the two most important memory-related tasks to be performed in an MPSoC environment are optimizing parallelism and locality. Other important issues relate to power/energy consumption and memory space.
The problem with parallelism
Optimizing parallelism is obviously important, since parallelism is the main reason to employ multiple processors in a single unit. In fact, the parallelization strategy determines how the on-chip processors use memory, and it can be an important factor in achieving acceptable performance. However, maximum parallelism may not always be easy to achieve, for several reasons. For example, intrinsic data dependences in the code may not allow full utilization of all on-chip processors. Similarly, in some cases, interprocessor communication costs can become overwhelming as the number of processors increases.
Finally, the performance benefits of increased interprocessor parallelism may not justify the accompanying increase in power consumption. For all these reasons, it may be preferable to avoid increasing the number of processors arbitrarily. In addition, different parts of the same application may demand different numbers of processors, which makes the problem much harder.
Instruction and Data Locality
An equally important problem is ensuring locality of data/instruction accesses. Although achieving acceptable instruction cache performance is not very difficult (since instructions are read-only and exhibit perfect spatial locality), the same cannot be said for data locality.
This is because straightforward coding of many applications can lead to poor data cache utilization. In addition, in an MPSoC environment, interprocessor communication can lead to frequent cache line invalidations/updates (due to interprocessor data sharing), which in turn increases overall latency.
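As a minimal sketch of how "straightforward coding" can hurt data cache utilization, consider summing a matrix stored in row-major order, as C stores it. The function names, the flat-array layout, and the parameter `n` are illustrative assumptions, not taken from the article. Both functions compute the same sum; only the loop order differs.

```c
#include <stddef.h>

/* Column-by-column traversal of a row-major n x n matrix.
 * Consecutive accesses are n elements apart, so for large n each
 * access is likely to touch a different cache line. */
double sum_column_order(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            sum += a[i * n + j];
    return sum;
}

/* Row-by-row traversal of the same matrix (loops interchanged).
 * Consecutive accesses are adjacent in memory, so each cache line
 * that is fetched is fully used before the next one is needed. */
double sum_row_order(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j];
    return sum;
}
```

Swapping the two loops in this way is usually called loop interchange, and it is one of the transformations a locality-optimizing compiler may apply when dependences permit. Because floating-point addition is not associative, the two versions can differ in the last bits for general inputs. How much the second version gains depends on the cache size and line size of the target.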
These invalidations become particularly problematic when false sharing occurs, i.e., when multiple processors share a cache line but not the same data in it. An important task for the compiler is therefore to minimize false sharing as much as possible.
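One common remedy for false sharing, shown here by hand rather than as the compiler technique the authors have in mind, is to pad or align per-processor data so that each item occupies its own cache line. The sketch below only compares the two data layouts. The 64-byte line size, the core count, and all type and function names are assumptions for illustration, and the line indices assume each structure starts on a cache line boundary.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte lines; target specific */
#define NUM_CORES 4        /* hypothetical number of on-chip processors */

/* Packed layout: the per-core counters sit next to each other, so
 * several of them fall into the same cache line. A write by one core
 * then invalidates the line in the other cores' caches, even though
 * no core ever reads another core's counter. */
struct packed_counters {
    long count[NUM_CORES];
};

/* Padded layout: alignas forces every counter onto its own line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) long count;
};

struct padded_counters {
    struct padded_counter per_core[NUM_CORES];
};

/* Index of the cache line holding a given core's counter, relative
 * to the start of the structure. */
size_t packed_line_of(size_t core)
{
    return (offsetof(struct packed_counters, count)
            + core * sizeof(long)) / CACHE_LINE_SIZE;
}

size_t padded_line_of(size_t core)
{
    return (offsetof(struct padded_counters, per_core)
            + core * sizeof(struct padded_counter)) / CACHE_LINE_SIZE;
}
```

In the packed layout all four counters map to line 0, whereas in the padded layout core k's counter maps to line k. The price is memory: the padded structure occupies `NUM_CORES * CACHE_LINE_SIZE` bytes, which matters on an MPSoC where memory space is itself a constraint.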