Why the Memory Subsystem is Critical in Inferencing Chips

By Geoff Tate, Flex Logix
EETimes (December 22, 2019)

Good inferencing chips can move data very quickly.

The number of new inferencing chip companies announced this past year is enough to make your head spin. With so many chips and a lack of quality benchmarks, the industry often forgets one extremely critical piece: the memory subsystem. The truth is, you can’t have a good inference chip unless you have a good memory subsystem. Thus, if an inferencing chip company talks only about TOPS and says very little about SRAM, DRAM and the memory subsystem in general, it probably doesn’t have a very good solution.

It’s All About Data Throughput

Good inferencing chips are architected to move data through them very quickly, which means they have to process that data very fast and move it in and out of memory very quickly. If you compare two models, ResNet-50 and YOLOv3, you will see a striking difference not only in their computational load, but also in how each uses memory.

For each image, ResNet-50 takes 2 billion multiply-accumulates (MACs), but YOLOv3 takes over 200 billion MACs. That is a hundredfold increase. Part of this is due to the fact that YOLOv3 has more weights (62 million versus approximately 23 million for ResNet-50). However, the biggest difference is the image size in the typical benchmark. ResNet-50 uses 224×224, a size no one actually uses, while YOLOv3 uses 2 megapixels. Thus, the computational load is much greater for YOLOv3.
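The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the approximate numbers quoted above; treating "2 megapixels" as exactly 2,000,000 pixels is an assumption made for illustration.

```python
# Approximate per-image figures quoted in the article.
RESNET50_MACS = 2e9        # multiply-accumulates per image
YOLOV3_MACS = 200e9        # "over 200 billion", so this is a lower bound
RESNET50_WEIGHTS = 23e6
YOLOV3_WEIGHTS = 62e6
RESNET50_PIXELS = 224 * 224
YOLOV3_PIXELS = 2e6        # assumption: "2 megapixels" taken as 2,000,000 pixels

mac_ratio = YOLOV3_MACS / RESNET50_MACS
weight_ratio = YOLOV3_WEIGHTS / RESNET50_WEIGHTS
pixel_ratio = YOLOV3_PIXELS / RESNET50_PIXELS

print(f"MACs per image:   {mac_ratio:.0f}x")
print(f"Weights:          {weight_ratio:.1f}x")
print(f"Pixels per image: {pixel_ratio:.1f}x")
```

Under these figures the weight count grows by less than 3x while the input image grows by roughly 40x, which is consistent with image size being the larger contributor to the MAC count.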

Using the example above, you can see that we have two different workloads, and one takes 100 times more computation. The obvious question is: does this mean YOLOv3 runs 100 times slower? The only way you can answer that is by looking at the memory subsystem, because that is going to tell you the actual throughput on any given chip.
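One way to see why the answer need not be "100 times slower" is a simple bound in which throughput is limited by whichever is slower, compute or memory traffic. The sketch below is illustrative only and does not describe any real chip: the peak MAC rate and DRAM bandwidth are hypothetical placeholders, and it assumes one byte per weight, that every weight is fetched from DRAM once per image, and that activation traffic is ignored.

```python
def frames_per_second(macs, bytes_moved, peak_macs_per_s, mem_bytes_per_s):
    """Upper bound on throughput: the slower of compute and memory sets the pace."""
    compute_time = macs / peak_macs_per_s
    memory_time = bytes_moved / mem_bytes_per_s
    return 1.0 / max(compute_time, memory_time)

# Hypothetical chip parameters (placeholders, not measurements).
PEAK_MACS_PER_S = 10e12   # assumed peak compute rate
MEM_BYTES_PER_S = 10e9    # assumed DRAM bandwidth

# Article figures; one byte per weight is an assumption (8-bit weights).
resnet50_fps = frames_per_second(2e9, 23e6, PEAK_MACS_PER_S, MEM_BYTES_PER_S)
yolov3_fps = frames_per_second(200e9, 62e6, PEAK_MACS_PER_S, MEM_BYTES_PER_S)

slowdown = resnet50_fps / yolov3_fps
print(f"ResNet-50: {resnet50_fps:.0f} fps, YOLOv3: {yolov3_fps:.0f} fps")
print(f"YOLOv3 is {slowdown:.1f}x slower under these assumptions")
```

With these placeholder values ResNet-50 is limited by weight traffic and YOLOv3 by compute, so the gap comes out well below 100x. Different assumptions about on-chip SRAM, batching or activation traffic would change the result, which is why the memory subsystem has to be examined before comparing throughput.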
