Domain-specific accelerators (DSAs) are becoming increasingly common in system-on-chip (SoC) designs. A DSA provides higher performance per watt by optimizing the specialized function it implements. Examples of DSAs include compression/decompression units, random number generators and network packet processors. A DSA is typically connected to the core complex using a standard IO interconnect, such as an AXI bus (Figure 1).
SoCs based on RISC-V offer a unique opportunity to optimize high-bandwidth data transfers between a DSA and memory. DSAs often need to transfer their data to memory, such as DDR, LPDDR or HBM memories. Often this is accomplished using a DMA (Direct Memory Access) engine.
The difficulty in the traditional approach (Figure 1) is that such data transfers often involve allocating the data in the Last-Level Cache first. This can significantly slow down accesses, particularly if the volume of transferred data is greater than the size of the Last-Level Cache.
Click here to read more ...