Industry Expert Blogs
Memory Systems for AI: Part 4
Rambus Blog - Steven Woo, Rambus
Mar. 12, 2020
In part three of this series, we discussed how a Roofline model can help system designers better understand whether the performance of applications running on a specific processor is limited more by compute resources or by memory bandwidth. Rooflines are particularly useful when analyzing machine learning applications like neural networks running on artificial intelligence (AI) processors. In this blog post, we'll take a closer look at a Roofline model that illustrates how AI applications perform on Google's tensor processing unit (TPU), NVIDIA's K80 GPU and Intel's Haswell CPU.
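The Roofline model itself reduces to a simple formula: attainable performance is the minimum of a processor's peak compute throughput and the product of its memory bandwidth and the workload's operational intensity (operations per byte moved to or from memory). A minimal sketch in Python (function and parameter names are my own, not from the paper):

```python
def roofline(peak_ops: float, mem_bw: float, intensity: float) -> float:
    """Attainable performance (ops/s) under the Roofline model.

    peak_ops:  peak compute throughput of the processor (ops/s)
    mem_bw:    peak memory bandwidth (bytes/s)
    intensity: operational intensity of the workload (ops/byte)
    """
    return min(peak_ops, mem_bw * intensity)

# A low-intensity workload is bandwidth-bound: it lands on the
# slanted part of the roof, well below peak compute.
low = roofline(peak_ops=1e12, mem_bw=100e9, intensity=2)    # 200 Gops/s
# A high-intensity workload hits the flat compute ceiling instead.
high = roofline(peak_ops=1e12, mem_bw=100e9, intensity=50)  # 1 Tops/s
```

Plotting this function on log-log axes as intensity varies produces the characteristic slanted-then-flat "roofline" shape seen in the graph below.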
The graph above is featured in a paper published by Google a couple of years ago detailing the first-generation Tensor Processing Unit (TPU). It's a very insightful paper because it compares the performance of Google's TPU against two other processors. You can see three different Rooflines in the graph above: one in red, one in gold and one in blue. The blue Roofline represents the Google TPU, a purpose-built chip designed specifically for AI inference. The NVIDIA K80, a GPU designed to handle a larger class of operations, is in red. Represented in gold is the Roofline for the Intel Haswell CPU, a very general-purpose processor.
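One way to compare the three rooflines numerically is by their ridge points: the operational intensity at which a processor transitions from being bandwidth-bound to compute-bound, given by peak compute divided by memory bandwidth. The sketch below uses illustrative placeholder specs, not the paper's exact figures; consult the original paper for the measured values.

```python
def ridge_point(peak_ops: float, mem_bw: float) -> float:
    """Operational intensity (ops/byte) where the slanted and flat
    parts of a roofline meet: below it, memory bandwidth limits
    performance; above it, compute throughput does."""
    return peak_ops / mem_bw

# Placeholder specs (peak ops/s, memory bytes/s) for illustration only.
processors = {
    "TPU":         (90e12, 34e9),   # specialized inference accelerator
    "K80 GPU":     (2.8e12, 240e9), # more general-purpose accelerator
    "Haswell CPU": (1.3e12, 51e9),  # very general-purpose processor
}

for name, (peak, bw) in processors.items():
    print(f"{name}: ridge point ~{ridge_point(peak, bw):.0f} ops/byte")
```

The pattern this exposes is the interesting part: a specialized accelerator with enormous compute but modest bandwidth has a ridge point far to the right, so it needs workloads with very high operational intensity to reach its peak, while the GPU and CPU saturate at much lower intensities.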