The Omnitek DPU (Deep Learning Processing Unit) is a configurable IP core built from a suite of FPGA IP comprising the key components needed to construct inference engines for running the Deep Neural Networks (DNNs) used across a wide range of Machine Learning applications, together with an SDK supporting the development of applications that integrate the DPU functionality. Targets range from small FPGAs with an embedded control processor for edge devices to PCI Express cards with large FPGAs for data centre applications.
The Omnitek DPU is programmed by creating a model of the chosen neural network in C/C++ or Python using standard frameworks such as TensorFlow. The SDK provides an API for integrating DPU inference into a user application. The DPU SDK Compiler converts the model into microcode for execution by the Omnitek DPU, and a quantizer converts the weights and biases into the selected reduced-precision processing format.
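The reduced-precision conversion step can be illustrated with a minimal sketch. The symmetric per-tensor int8 scheme and the function names below are illustrative assumptions only, not the Omnitek quantizer's actual API or algorithm:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8.

    A generic sketch of reduced-precision conversion; the real DPU
    quantizer selects formats and scales for the chosen precision.
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights to measure quantization error.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in weights
q, s = quantize_int8(w)
max_err = float(np.max(np.abs(w - dequantize(q, s))))  # bounded by s / 2
```

Because the scale is chosen from the largest weight magnitude, no value is clipped and the worst-case round-trip error is half a quantization step.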
Implementation in today’s high-performance FPGAs makes the Omnitek DPU both fast and highly adaptable. The architecture Omnitek has developed for its DPU delivers world-class performance across different neural network topologies (including CNNs, RNNs/LSTMs and MLPs) by adapting the FPGA design to a given workload through a range of novel architectural features, making optimal use of the FPGA’s resources and running at the highest possible speed.
For evaluation purposes, Omnitek provides a GoogLeNet example together with a C model that runs the same microcode. It performs inference on 224x224 images using 8-bit integer processing at over 5,300 inferences per second on a Xilinx UltraScale+ VU9P-3 PCI Express card, with a demonstration application running on a PC under Linux.
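The headline figure implies the following derived numbers, shown here as a quick arithmetic check; the 3-channel RGB input is an assumption, as the brief states only the 224x224 image size and 8-bit processing:

```python
# Figures stated in the text: 224x224 8-bit images at over
# 5,300 inferences per second on a VU9P-3 PCI Express card.
INFERENCES_PER_SEC = 5300
IMAGE_SIDE = 224
CHANNELS = 3  # assumption: RGB input, 1 byte per value at int8

# Per-image latency budget implied by the throughput figure.
latency_us = 1e6 / INFERENCES_PER_SEC                        # ~189 us

# Raw input bandwidth needed to keep the engine fed.
bytes_per_image = IMAGE_SIDE * IMAGE_SIDE * CHANNELS
input_mb_per_s = INFERENCES_PER_SEC * bytes_per_image / 1e6  # ~798 MB/s
```

At roughly 189 µs per image, the input stream alone approaches 0.8 GB/s, comfortably within PCI Express bandwidth for a data-centre card.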
Key features:
- Faster than any alternative DNN implementation running on an equivalent FPGA; out-performs GPUs for a given power or cost budget
- Fully software programmable in C/C++ or Python via standard frameworks such as TensorFlow
- Highly efficient FPGA use for optimum performance, cost and power
- Highly flexible:
  - Able to optimise the architecture for the application workload
  - Able to adopt novel topologies and optimisation techniques as they emerge from industry and academia
- Suitable for either Data Centre (FPGA) or Embedded (FPGA SoC) applications
Applications include:
- Autonomous driving
- Object detection
- Smart security cameras
- Language translation
- Upscaling video to 8K
- Medical image analysis
- User interaction in virtual reality
- Big data statistical analysis
Figure: Block Diagram of the High-Performance FPGA-based Engine for Deep Neural Networks