By Geoff Tate, Flex Logix
EE Times (November 12, 2019)
Using multiple inferencing chips can deliver significant improvements in performance, but only when the neural network is designed correctly
The last two years have been extremely busy in the inferencing chip business. For a while, it seemed as if another company introduced a new and better solution every other week. While all this innovation was great, most customers didn’t know what to make of the various solutions because they could not tell which one performed best. With no established benchmarks in this new market, they either had to get up to speed on inference chips very quickly or take the vendors’ performance figures on faith.
Most vendors provided some type of performance figure, and it was usually whatever benchmark made them look good. Some vendors quoted TOPS and TOPS/Watt without specifying the model, batch size, or process/voltage/temperature conditions. Others used the ResNet-50 benchmark, which is a much simpler model than most customers need, so its value in evaluating inference options is questionable.
We’ve come a long way since those early days. Companies have gradually figured out that what really matters when measuring the performance of inference chips is 1) high MAC utilization, 2) low power, and 3) keeping everything small.
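To see why a headline TOPS number says little without knowing MAC utilization, here is a minimal sketch that compares a chip's peak throughput with the useful work it actually does on a model. It assumes the common convention that one multiply-accumulate (MAC) counts as two operations, and every number in it is hypothetical: a chip with 4,096 MAC units at 1 GHz, running a model that needs 4 billion MACs per inference at 300 inferences per second. The function names are illustrative and do not come from any vendor's tools.

```python
def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak TOPS if every MAC unit fires every cycle (1 MAC = 2 operations)."""
    return 2 * num_macs * clock_hz / 1e12


def mac_utilization(macs_per_inference: float, inferences_per_sec: float,
                    num_macs: int, clock_hz: float) -> float:
    """Fraction of the chip's available MAC cycles spent on useful model work."""
    achieved_macs_per_sec = macs_per_inference * inferences_per_sec
    peak_macs_per_sec = num_macs * clock_hz
    return achieved_macs_per_sec / peak_macs_per_sec


def effective_tops(macs_per_inference: float, inferences_per_sec: float) -> float:
    """TOPS actually delivered on a given model at a measured throughput."""
    return 2 * macs_per_inference * inferences_per_sec / 1e12


if __name__ == "__main__":
    # Hypothetical chip: 4,096 MAC units at 1 GHz.
    # Hypothetical model: 4e9 MACs per inference, measured at 300 inferences/s.
    print(f"Peak: {peak_tops(4096, 1e9):.3f} TOPS")
    util = mac_utilization(4e9, 300, 4096, 1e9)
    print(f"MAC utilization: {util:.1%}")
    print(f"Delivered: {effective_tops(4e9, 300):.2f} TOPS")
```

With these made-up numbers, a chip advertised at 8.192 peak TOPS delivers only about 2.4 TOPS of useful work on the model, roughly 29% MAC utilization. Dividing the delivered figure (rather than the peak) by measured power would likewise give a more honest TOPS/Watt.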