Implements a multi-core, multi-threaded graphics processing unit (GPU) with extremely high performance and ultra-low power consumption aimed at both graphics rendering/acceleration and general-purpose computing in embedded applications.
The configurable and scalable Nema™ GPU uses an innovative architecture consisting of one or more processing clusters interconnected by a proprietary Network on Chip (NoC). Each cluster can have one to four floating-point vector processing cores, and each core can run up to 128 threads. The resulting performance is extremely competitive: for example, one four-core cluster delivers 19.2 GFlops at just 533 MHz.
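As a rough sanity check of the quoted figure, the arithmetic below derives the per-cycle throughput implied by 19.2 GFlops at 533 MHz for one four-core cluster, along with the upper bound on hardware threads. The function names are illustrative, and any interpretation of the result (for example as vector multiply-add lanes) would be an assumption; the brief does not break the number down.

```python
def flops_per_cycle(gflops: float, freq_mhz: float) -> float:
    """Floating-point operations per clock cycle implied by a peak rating."""
    return (gflops * 1e9) / (freq_mhz * 1e6)


def resident_threads(clusters: int, cores_per_cluster: int,
                     threads_per_core: int = 128) -> int:
    """Upper bound on hardware threads for a given configuration."""
    if clusters < 1:
        raise ValueError("at least one cluster is required")
    if not 1 <= cores_per_cluster <= 4:
        raise ValueError("the brief specifies one to four cores per cluster")
    return clusters * cores_per_cluster * threads_per_core


# Figures quoted in the brief: 19.2 GFlops at 533 MHz, one four-core cluster.
cluster_rate = flops_per_cycle(19.2, 533.0)  # about 36 flops per cycle
core_rate = cluster_rate / 4                 # about 9 flops per cycle per core
threads = resident_threads(clusters=1, cores_per_cluster=4)  # 512
```

In other words, the quoted rating corresponds to roughly nine floating-point operations per core per clock, with up to 512 threads in flight on a single four-core cluster.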
Nema combines this processing power with ultra-low power consumption. Proprietary compression techniques minimize the bandwidth to the frame buffer (access to which is the major power consumer of any GPU), and intelligent Dynamic Voltage and Frequency Scaling (DVFS) adjusts the power consumption to suit the computation load. Optional custom hardware accelerators for typical graphics processing tasks such as Texture Mapping, Pixel Blending, and Polygon Rasterization further reduce power consumption.
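The benefit of DVFS can be illustrated with the standard CMOS dynamic-power relation, P ≈ C·V²·f. The sketch below is a minimal illustration under stated assumptions, not Nema's actual power manager: the operating points, the effective capacitance, and the function names are hypothetical values chosen for the example.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    freq_mhz: float
    volts: float


# Hypothetical voltage/frequency pairs, sorted by frequency; not Nema data.
OPERATING_POINTS = [
    OperatingPoint(133.0, 0.70),
    OperatingPoint(266.0, 0.80),
    OperatingPoint(533.0, 1.00),
]


def dynamic_power_mw(point: OperatingPoint, c_eff_nf: float = 1.0) -> float:
    """Dynamic power from P = C * V^2 * f; nF * V^2 * MHz yields mW."""
    return c_eff_nf * point.volts ** 2 * point.freq_mhz


def pick_operating_point(required_mhz: float) -> OperatingPoint:
    """Return the slowest point that still meets the required clock rate."""
    for point in OPERATING_POINTS:
        if point.freq_mhz >= required_mhz:
            return point
    return OPERATING_POINTS[-1]  # saturate at the fastest available point


light_load = pick_operating_point(100.0)  # selects the 133 MHz point
heavy_load = pick_operating_point(500.0)  # selects the 533 MHz point
```

With these made-up numbers, the lowest point runs at about a quarter of the top clock rate but draws only about an eighth of the dynamic power, which is why lowering voltage together with frequency pays off under light loads.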
The Nema GPU is easy to program using the included compiler toolchain. It supports the popular OpenGL® ES, OpenCL™, and OpenVX APIs and the Android and Linux operating systems.
- Greater Performance per Area
  - 16 GFlops per mm²
  - 44.4 MTriangles/s per mm²
  - 555.6 Mpixels/s per mm²
- Ultra-Low Power
  - Less silicon area for the same performance equates to lower power consumption
  - Proprietary compression for reduced frame buffer bandwidth
  - Advanced power management with DVFS support
- Scalable & Configurable
  - One to four processing cores organized in clusters
  - One to many clusters
  - Ultra-threaded design, up to 128 threads per core
  - Unified Shader Architecture
  - Configurable pipeline length allows higher frequency
  - Optional graphic function acceleration units
  - Software changes not required as architecture scales to many cores and more threads
- Powerful Graphics Processing
  - Texture Mapping Engine
    - Any texture resolution and color depth
    - Texture Caching
    - Point Sample/Bilinear texture filters
    - Texture compression
    - Mipmaps support
  - Polygon Rasterizer
    - Triangle rasterization
    - Value interpolators
  - PixelBlender Processor
    - Fully programmable for any blending mode
    - Multiple render targets
    - sRGB support
- Compiler Support
  - Customized LLVM/Clang Compiler
  - GNU Binutils Assembler
  - GLSL and OpenCL Compiler
- Software Support
  - OpenCL 1.2 (Pocl API)
  - OpenGL ES 2.0/3.0 (Mesa3D APIs)
  - OpenVG available soon
The core is available in Verilog RTL or as a targeted FPGA netlist. Deliverables include everything required for successful implementation: an extensive testbench, comprehensive documentation, LLVM/Clang compiler, assembler, and low-level device drivers.
Nema GPU’s performance and extreme scalability make it ideal not just for graphics rendering but also as a GPGPU platform executing data- and computation-intensive tasks in industrial, medical, scientific, automotive, and other applications.
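The per-area figures in the feature list can be combined with the 19.2 GFlops example for a back-of-the-envelope area and throughput estimate. This is a sketch under an assumption the brief does not state, namely that all three density figures refer to the same configuration and process as the four-core, 533 MHz example; the function name is illustrative.

```python
# Density figures quoted in the feature list.
GFLOPS_PER_MM2 = 16.0
MTRIANGLES_PER_S_PER_MM2 = 44.4
MPIXELS_PER_S_PER_MM2 = 555.6


def estimate_from_gflops(target_gflops: float) -> dict:
    """Scale the quoted density figures to a target compute rating."""
    area_mm2 = target_gflops / GFLOPS_PER_MM2
    return {
        "area_mm2": area_mm2,
        "mtriangles_per_s": area_mm2 * MTRIANGLES_PER_S_PER_MM2,
        "mpixels_per_s": area_mm2 * MPIXELS_PER_S_PER_MM2,
    }


# One four-core cluster rated at 19.2 GFlops.
estimate = estimate_from_gflops(19.2)
```

Under that assumption, the example cluster would occupy about 1.2 mm² and deliver roughly 53 MTriangles/s and 667 Mpixels/s. Treat these as order-of-magnitude estimates only, since real area depends on the process node and the selected options.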
Block Diagram of the Embedded Graphics Processing Unit IP Core