Budapest, Hungary, 14th December 2022 – aiMotive, one of the world’s leading suppliers of scalable modular automated driving technologies, today announced the latest release of its award-winning aiWare automotive NPU hardware IP. aiWare4+ builds on the success of aiWare4 in production automotive SoCs, such as Nextchip’s Apache5 and Apache6, by refining the hardware architecture and significantly upgrading the software SDK. Together, these changes enable more efficient execution of a far broader range of workloads, including transformer networks and other emerging AI network topologies. aiWare4+ also adds support for FP8 alongside INT8 computation, together with dedicated sparsity hardware support.
The unique “data-first” scalable hardware architecture combines concepts such as near-memory execution, massively parallel on-chip I/O, hierarchical hardware tiling, and wavefront processing to deliver the highest possible PPA.
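To make the tiling concept concrete, the following is a toy sketch (illustrative only, not the aiWare microarchitecture): a matrix multiply split into small tiles so that each tile’s operands fit in a small near-compute buffer, the access pattern that hierarchical tiling and near-memory execution are designed to exploit.

```python
# Toy illustration of tiled matrix multiplication. All names and the tile
# size are hypothetical; real NPU tiling is done in hardware, not Python.

def tiled_matmul(A, B, tile=2):
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; the inner loops then work entirely on
    # one tile's data, keeping the active working set small and local.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = tiled_matmul(A, B)  # same result as an untiled matmul: [[19, 22], [43, 50]]
```

The result is identical to a plain matrix multiply; only the order of memory accesses changes, which is what determines how much external memory bandwidth a hardware implementation consumes.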
Upgraded capabilities for aiWare4+ include:
- Upgraded Programmability: significant enhancements to the aiWare hardware architecture and SDK tools give users full access to every part of aiWare’s internal execution pipeline, without compromising the high-level, AI-centric approach that makes tools such as the highly interactive aiWare Studio so popular with both research and production engineers
- Full FP8 Support: aiWare4+ adds full support for FP8, in addition to INT8 quantization, for workload execution
- Broader Network Support: SDK upgrades let users achieve higher performance not only for CNNs but also for emerging network types such as transformer networks, occupancy networks and LSTMs. aiWare4+ users also benefit from hardware enhancements that deliver significant performance and efficiency gains for workloads such as transformer networks
- Enhanced Sparsity Support: aiWare4+ hardware upgrades exploit any weight sparsity to reduce NPU power consumption on a per-clock basis, cutting power for the widest possible range of workloads
- Improved Scalability: aiWare4+ is designed to scale from 10 TOPS up to 1000+ TOPS using a multi-core architecture to increase throughput while retaining high efficiency (subject to external memory bandwidth constraints). Furthermore, aiWare4+ brings interleaved multi-tasking that optimizes performance and efficiency with multiple workloads.
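As a rough software analogue of two of the features above, the sketch below (hypothetical, not aiMotive’s implementation) shows symmetric per-tensor INT8 weight quantization and a dot product that skips zero-valued weights entirely, the way per-clock sparsity gating avoids spending energy on multiplications by zero.

```python
# Toy illustration only: INT8 quantization and sparsity-aware accumulation.

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: real value = scale * int8."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def sparse_dot(q_weights, activations, scale):
    """Accumulate only non-zero weights; zero weights cost no MAC."""
    acc = 0
    macs = 0
    for w, a in zip(q_weights, activations):
        if w == 0:          # sparsity: skip the multiply-accumulate entirely
            continue
        acc += w * a
        macs += 1
    return acc * scale, macs

weights = [0.5, 0.0, -0.25, 0.0, 1.0, 0.0]   # 50% sparse
acts = [1.0, 2.0, 4.0, 8.0, 0.5, 16.0]
q, s = quantize_int8(weights)
y, macs = sparse_dot(q, acts, s)
# Only 3 of the 6 weights are non-zero, so only 3 MACs are performed.
```

In hardware the same idea applies per clock cycle: the more zeros in the weight stream, the fewer compute elements toggle, which is what drives the per-clock power reduction described above.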
aiMotive’s team of AI researchers constantly tracks the latest developments in the automotive AI industry and relentlessly benchmarks our methodologies against the best in the industry. aiWare4+ continues to deliver the automotive industry’s highest NPU efficiency of up to 98% for a wide range of AI workloads, enabling superior performance using less silicon and less power.
“When we delivered aiWare4, we knew our highly customized hardware architecture enabled us to deliver superior efficiency and PPA compared to any other automotive inference NPU on the market,” says Mustafa Ali, product director, aiWare for aiMotive. “However, while acknowledging our CNN efficiency leadership, some of our customers were concerned about aiWare’s programmability compared to more conventional architectures such as DSP- or GPU-based NPUs. These latest aiWare4+ and aiWare SDK upgrades ensure that our customers can program aiWare for a broad range of AI workloads, achieving future-proof flexibility comparable to some of the best-known SoCs and DSP-based NPUs, without sacrificing our industry-leading NPU efficiency.”
aiMotive will begin shipping aiWare4+ RTL to lead customers in Q2 2023. The SDK already provides early support for the majority of the new features, with a production-quality release to follow in 2023.
Note 1: PPA: Power, Performance and Area
Note 2: See aiWare3 benchmarks on Nextchip’s Apache5 SoC
For more details about aiWare4+, click here.