Ron Wilson, Altera
March 24, 2015
The scenario is becoming increasingly familiar. You have a working embedded design, perhaps backed by years of deployment with customers and hundreds of thousands of lines of debugged code. Along comes marketing with a new set of performance specifications, or R&D with a new compute-crushing algorithm. Your existing CPU family just can’t handle it.
At this point your options can look uniformly dismal. Perhaps you can move to a higher-performance CPU family without totally losing instruction-set compatibility. But there will almost certainly be enough differences to require a new operating-system (OS) version and re-verification of the code. And the new CPU will have new hardware implications, from increased power consumption to different DRAM interfaces.
Of course you can also move to a faster CPU with a different instruction set. But a new tool chain, from compiler to debugger, plus the task of finding all those hidden instruction-set dependencies in your legacy code, can make this move genuinely frightening. And changing SoC vendors will have system-level hardware implications too.
Or you could try a different approach: you could identify the performance hot spots in your code—or in that new algorithm—and break them into multiple threads that can be executed in parallel. Then you could execute this new code on a multicore CPU cluster. Unless you are currently using something quite unusual, there is a good chance that a chip exists with multiple instances of the CPU core you are using now. Or, if there is inherent parallelism in your data, you could rewrite those threads to run on a graphics processing unit (GPU), a multicore digital signal processing (DSP) chip, or a purpose-built hardware accelerator in an FPGA or ASIC. All these choices require new code—but only for the specific segments you are accelerating. The vast majority of that legacy code can remain unexamined.