NeuralCAx16 combines 256 multipliers with a carefully crafted datapath. It is flexible enough to handle any filter size and channel configuration. Implemented in a current process technology, NeuralCAx16 can run above 1 GHz and outperform many GPU/CPU-based CNN solutions.
A typical application of NeuralCAx16 is presented in Figure 1. The host processor can be a RISC engine, a microprocessor, or any in-house developed ASIC core. Its functions are handling the memory data, performing the operations of the other layers, and interfacing to the real world.
- Single clock synchronous design
- Straightforward interfaces for SoC applications
- 8/16-bit fixed-point and 16-bit floating-point operations
- Fully synthesizable, technology-independent Verilog RTL code
- High performance
- Programmable for different channel and filter sizes
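To make the fixed-point feature bullet concrete, here is a minimal sketch of how an 8-bit fixed-point multiply works. The Q4.4 format (4 integer bits, 4 fractional bits) and the helper names are assumptions for illustration only; the datasheet does not specify the core's actual number format or datapath.

```python
# Illustrative sketch, NOT the core's actual datapath: 8-bit signed
# fixed-point multiply in an assumed Q4.4 format (scale = 2**4).

FRAC_BITS = 4

def to_q44(x):
    """Quantize a real number to an 8-bit signed Q4.4 integer."""
    q = round(x * (1 << FRAC_BITS))
    return max(-128, min(127, q))          # saturate to the int8 range

def from_q44(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a real number."""
    return q / (1 << frac_bits)

# Multiplying two Q4.4 values yields a Q8.8 product; accumulating in a
# wider register, as hardware MAC arrays typically do, avoids overflow.
a, b = to_q44(1.5), to_q44(-2.25)
acc = a * b                                # Q8.8 accumulator
print(from_q44(acc, frac_bits=8))
```

Here the product 1.5 × (−2.25) = −3.375 is represented exactly, since both operands fit the Q4.4 grid; values off the grid would incur quantization error, which is the usual trade-off of fixed-point inference.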
- The convolutional layers typically consume more than 95% of the computation while a CNN is in operation. With this burden offloaded, a general-purpose processor such as a RISC core can handle the remaining operations. NeuralCAx16 enables chip developers to build their own AI/deep-learning processors or specialized ASIC chips in a matter of months.
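A back-of-envelope count of multiply-accumulate (MAC) operations shows why convolution dominates CNN compute. The layer sizes below describe a toy VGG-like network chosen for illustration; they are not taken from the datasheet.

```python
# Illustrative estimate: count MACs in a small VGG-like CNN to show that
# the convolutional layers dominate total compute.

def conv_macs(h, w, c_in, c_out, k):
    """MACs for one convolution layer ('same' padding, stride 1)."""
    return h * w * c_in * c_out * k * k

def fc_macs(n_in, n_out):
    """MACs for one fully connected layer."""
    return n_in * n_out

# Hypothetical stack on a 224x224 RGB image.
conv_total = (
    conv_macs(224, 224, 3, 64, 3)
    + conv_macs(112, 112, 64, 128, 3)
    + conv_macs(56, 56, 128, 256, 3)
    + conv_macs(28, 28, 256, 512, 3)
)
fc_total = fc_macs(7 * 7 * 512, 4096) + fc_macs(4096, 1000)

fraction = conv_total / (conv_total + fc_total)
print(f"convolution share of MACs: {fraction:.1%}")
```

Even with two large fully connected layers, convolution accounts for well over 95% of the MACs in this sketch, which is consistent with the figure quoted above and motivates offloading exactly that workload to the accelerator.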
- Synthesizable Verilog source code
- Verilog testbench
- CNN, convolutional neural network, GPU, AI, image recognition, computer vision, video recognition, accelerator, convolution, deep learning, machine learning, image classification, image detection, image localization, IoT
Block Diagram of the Convolutional Accelerator for Convolutional Neural Networks (CNN)