In an article on FPGA benchmarking in the September 2011 edition of InsideDSP, BDTI wrote:
In design situations where optimum performance and/or power consumption is required, implementing digital signal processing functions in dedicated hardware versus software becomes an attractive proposition. An FPGA is a particularly compelling silicon platform for realizing this aspiration, because it conceptually combines the inherent hardware attributes of an ASIC with the flexibility and time-to-market advantages of the software alternative running on a CPU, GPU or DSP. As such, FPGAs are increasingly finding use as parallel processing engines for demanding digital signal processing applications. Benchmark results show that on highly parallelizable workloads, FPGAs can achieve very strong performance and performance/cost metrics compared to DSPs and CPUs.
However, it continued:
To date, FPGAs have been used almost exclusively for fixed-point digital signal processing functions. Although FPGA vendors have long offered floating-point primitive libraries, the performance of FPGAs in floating-point applications has historically been very limited. The inefficiency of traditional floating-point FPGA designs is partially due to the deeply pipelined nature and wide arithmetic structures of the floating-point operators, which create large data path latencies and routing congestion. In turn, the latencies can create hard-to-manage problems in designs with high data dependencies. The final result is often a design with a low operating frequency.
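The latency problem in designs with high data dependencies can be made concrete with a small cycle-count model. The sketch below assumes a hypothetical floating-point adder with `latency` pipeline stages that accepts one new addition per cycle; `pipelined_accumulate` and its parameters are illustrative names, not part of any vendor toolset. A single running sum (`lanes=1`) is a chain of dependent additions, so each one must wait for the previous result to leave the pipeline. Splitting the sum across several interleaved partial sums is one common workaround, shown here for comparison; it is not something the article itself describes.

```python
from typing import List, Tuple


def pipelined_accumulate(xs: List[float], latency: int, lanes: int) -> Tuple[float, int]:
    """Sum xs on a hypothetical adder with `latency` pipeline stages that can
    accept one new addition per cycle, using `lanes` interleaved partial sums.

    Returns (sum, cycle at which the last partial-sum update completes).
    The final combination of the partial sums is not included in the timing.
    """
    if latency < 1 or lanes < 1:
        raise ValueError("latency and lanes must be >= 1")
    partial = [0.0] * lanes   # one running sum per lane
    ready = [0] * lanes       # cycle at which each lane's latest result is available
    next_issue = 0            # earliest cycle the adder can accept another operation
    finish = 0
    for i, x in enumerate(xs):
        lane = i % lanes
        # Stall until this lane's previous result has left the pipeline
        # (the data dependency), and never issue more than one add per cycle.
        issue = max(next_issue, ready[lane])
        partial[lane] += x
        ready[lane] = issue + latency
        next_issue = issue + 1
        finish = ready[lane]  # issue times only increase, so this is the latest
    return sum(partial), finish


# Same 64 additions on an 8-stage adder, with 1, 4 and 8 interleaved lanes.
for n_lanes in (1, 4, 8):
    print(n_lanes, pipelined_accumulate([1.0] * 64, latency=8, lanes=n_lanes))
```

With 64 inputs and an 8-stage adder, one lane needs 512 cycles, four lanes need 131, and eight lanes need 71, at the cost of a final reduction of the partial sums. Because floating-point addition is not associative, reordering a sum this way can change the rounded result for general inputs.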
Specialized FPGA toolsets such as Altera's DSP Builder, which BDTI evaluated in advance of publication of the 2011 article and again for a March 2013 follow-up writeup (each evaluation is documented in BDTI white papers), strive to efficiently implement common floating-point DSP structures. But while Altera and its competitors have long included dedicated-function fixed-point DSP acceleration blocks in their FPGAs, floating-point operations have to date required extensive use of generic programmable logic blocks to supplement the capabilities of those fixed-point acceleration blocks. The result, as Altera's recently published artwork acknowledges, is a sub-optimal implementation in terms of both performance and silicon area efficiency.
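To illustrate why a floating-point operator needs more than a fixed-point multiplier, the following Python sketch splits a single-precision (IEEE 754 binary32) multiply into its integer core and the surrounding floating-point bookkeeping. It is an illustration under stated assumptions, not a description of how Altera or any other vendor maps floating-point operators onto its silicon: it handles only normal operands and normal results, and the helper names (`f32_bits`, `f32_from_bits`, `f32_mul_normal`) are hypothetical. The 24×24-bit integer product in step 1 is the kind of work a fixed-point multiplier performs; the exponent add, normalization shift and round-to-nearest-even logic in steps 2 to 5 are examples of the extra circuitry a floating-point operator also requires.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def f32_from_bits(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def f32_mul_normal(a_bits: int, b_bits: int) -> int:
    """Multiply two single-precision values given as bit patterns.

    Sketch only: assumes both inputs and the result are normal numbers
    (no zeros, subnormals, infinities, NaNs, overflow or underflow).
    """
    sign = (a_bits ^ b_bits) >> 31
    ea = (a_bits >> 23) & 0xFF
    eb = (b_bits >> 23) & 0xFF
    # Restore the hidden leading 1 to get 24-bit significands.
    ma = (a_bits & 0x7FFFFF) | (1 << 23)
    mb = (b_bits & 0x7FFFFF) | (1 << 23)

    # Step 1: plain 24x24-bit integer multiply (a 47- or 48-bit product).
    prod = ma * mb

    # Step 2: add the biased exponents and remove one bias.
    exp = ea + eb - 127

    # Step 3: normalize so that exactly 24 significand bits are kept.
    if prod >> 47:
        shift = 24
        exp += 1
    else:
        shift = 23

    # Step 4: round to nearest, ties to even, using the discarded bits.
    keep = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and keep & 1):
        keep += 1
        if keep >> 24:  # rounding carried out of the 24-bit significand
            keep >>= 1
            exp += 1

    # Step 5: repack sign, biased exponent and 23-bit fraction.
    return (sign << 31) | (exp << 23) | (keep & 0x7FFFFF)


print(f32_from_bits(f32_mul_normal(f32_bits(1.5), f32_bits(2.25))))
```

For example, multiplying 1.5 by 2.25 this way yields 3.375, matching Python's own result. In hardware, steps 2 to 5 run alongside or after the wide multiply, which is one reason floating-point operators tend to be deeply pipelined.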