Gen4 architecture delivers 300% faster performance for AI/ML applications while drawing 50% less power and consuming 65% less die area compared to the previous Speedcore eFPGA generation.
The new Speedcore eFPGA IP with Gen4 architecture dramatically improves upon Achronix’s original and highly successful Speedcore offering. Optimized for high-performance AI/ML and hardware-acceleration applications, Speedcore IP with Gen4 architecture delivers 60% faster performance (300% faster for AI/ML applications) while drawing 50% less power and consuming 65% less die area compared to the previous Speedcore eFPGA generation.
High-Performance FPGA Technology
AI/ML applications place heavy processing demands on systems, requiring billions or trillions of operations per second. Meanwhile, cloud and enterprise data-center computing resources and communications infrastructure can no longer keep pace with explosive growth in data bandwidth requirements, rapidly changing security protocols, or emerging networking standards. Today’s multi-core CPUs and SoCs cannot meet these needs unaided. Programmable hardware accelerators are now required to increase system performance by offloading these computations from overburdened server CPUs. Achronix designed the Gen4 architecture specifically to address the needs of these applications.
Reconfigurable Logic Blocks (RLBs)
Logic – 6-input look-up tables (LUTs) that implement all functions with as many as 7 inputs, and some 8-input functions, in a single level of logic. Reducing the need for multiple logic levels improves performance.
Shift chain – Double the number of registers compared to the original Speedcore architecture, plus optimized routing for shift chains.
ALU – A larger ALU now supports 8-bit operations for addition, counting, comparison, and maximum functions.
LUT-based multiplication – Efficient, LUT-based multipliers require half the on-chip resources compared to other leading FPGA products: A 6 × 6 multiply requires only 11 LUTs and runs at 1 GHz. An 8 × 8 multiply requires only 18 LUTs and runs at 500 MHz.
Dedicated buses – A first in the FPGA industry! High-performance, bus-grouped routing channels, separate from the standard eFPGA routing channels, ensure that bus-oriented data traffic (common with memories) does not compete for routing with other types of data traffic carried over the eFPGA’s standard, bit-oriented channels.
Bus muxes – Another first in the FPGA industry! Bus muxes let users build bus-wide multiplexing functions without consuming any LUTs or standard routing. This capability effectively creates a giant, distributed, run-time-configurable switching network that is separate from the eFPGA’s bit-oriented routing network.
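To make the LUT description above concrete, the following Python sketch models a 6-input LUT as a 64-bit truth table and then combines two such tables with a select bit to realize a 7-input function. This is a textbook Shannon expansion shown only for illustration; it is an assumption, not a description of how the Gen4 RLB actually achieves 7-input functions in one logic level. All function names here are hypothetical.

```python
# Illustrative model only: names and structure are assumptions,
# not Achronix's implementation.

def make_lut6(func):
    """Build a 64-bit truth table for a 6-input Boolean function."""
    table = 0
    for idx in range(64):
        bits = [(idx >> i) & 1 for i in range(6)]  # input i is bit i of idx
        if func(*bits):
            table |= 1 << idx
    return table


def lut6_eval(table, *inputs):
    """Look up the output bit selected by six 0/1 inputs."""
    idx = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> idx) & 1


def lut7_eval(table_lo, table_hi, *inputs):
    """Evaluate a 7-input function as two 6-input tables plus a select bit
    (Shannon expansion on the seventh input)."""
    *low_six, select = inputs
    table = table_hi if select else table_lo
    return lut6_eval(table, *low_six)


# Example: 6-input parity, then 7-input parity built from it.
parity6 = make_lut6(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
parity6_inverted = parity6 ^ ((1 << 64) - 1)  # used when the 7th input is 1

print(lut6_eval(parity6, 1, 0, 0, 0, 0, 0))
print(lut7_eval(parity6, parity6_inverted, 1, 0, 0, 0, 0, 0, 1))
```

The point of the model is that any 6-input function costs exactly one table lookup, which is why fitting wider functions into a single logic level reduces delay.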
Configurable multiply precision and count
Trade off performance/power vs. precision – The number of available multipliers increases as the required precision decreases.
Cyclical register file
Double compute performance – Works much like a cache: data is saved for efficient reuse by the machine learning processor (MLP). Optimized for AI/ML functions.
Column bonding and MLP cascade paths
Higher performance – Hard paths between memory and other MLP blocks enable high-performance functionality while freeing up general-purpose routing.
Multiple number formats
Flexibility – Supports mainstream fixed- and floating-point formats and frameworks.
Rounding and saturation
System performance – Support for multiple rounding formats and saturation that would otherwise need to be implemented in LUTs.
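As a rough illustration of what rounding and saturation involve when a wide fixed-point result is narrowed, here is a minimal Python sketch. The two rounding modes shown (round-half-up and round-half-to-even) and the 8-bit output width are assumptions chosen for the example; the announcement does not list which modes or widths the hardware supports, and the function names are hypothetical.

```python
# Illustrative sketch: modes, widths, and names are assumptions.

def round_half_up(value, shift):
    """Drop `shift` fractional bits, rounding ties toward +infinity."""
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift


def round_half_even(value, shift):
    """Drop `shift` fractional bits, rounding ties to the nearest even value."""
    if shift == 0:
        return value
    floor = value >> shift                    # arithmetic shift rounds down
    remainder = value & ((1 << shift) - 1)    # discarded fractional bits
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and (floor & 1)):
        floor += 1
    return floor


def saturate(value, bits):
    """Clamp to the signed two's-complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


# A wide accumulator value narrowed to a signed 8-bit result.
accumulator = 1205                            # has 2 fractional bits
narrowed = saturate(round_half_even(accumulator, 2), 8)
print(narrowed)
```

Each of these steps costs adders, comparators, and multiplexers when built from general-purpose logic, which is the LUT overhead that dedicated support avoids.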
Speedcore eFPGA IP with Gen4 architecture is available immediately, supported by the latest version of Achronix’s ACE design tool. This tool includes preconfigured example instances of Speedcore eFPGAs with Gen4 architecture. Users can evaluate performance, resource usage, and compile times for Speedcore eFPGA IP with Gen4 architecture using these example instances, even before developing their own designs.