A Vector Unit is composed of several 'vector cores', each roughly equivalent to a GPU core, that perform multiple calculations in parallel. Each vector core has arithmetic units capable of performing addition, subtraction, fused multiply-add, division, square root, and logic operations.
Our vector core can be tailored to support different data types: FP64, FP32, FP16, BF16, INT64, INT32, INT16 or INT8, depending on the customer’s target application domain.
The size in bits of the largest supported data type defines the vector core width, or ELEN. Customers then select the number of vector cores to implement in the Vector Unit (4, 8, 16 or 32), which covers a very wide range of power, performance and area trade-offs.
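As a rough illustration of how ELEN follows from the data-type choice, the Python sketch below simply takes the widest selected type. The table TYPE_BITS and the function elen_for are hypothetical names used only for this example; the bit widths come from the type names themselves.

```python
# Bit width of each data type a vector core can be configured to support.
TYPE_BITS = {
    "FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16,
    "INT64": 64, "INT32": 32, "INT16": 16, "INT8": 8,
}

def elen_for(data_types):
    """Return ELEN: the width in bits of the widest selected data type."""
    if not data_types:
        raise ValueError("at least one data type must be selected")
    return max(TYPE_BITS[t] for t in data_types)

print(elen_for({"FP32", "FP16", "INT8"}))  # 32
print(elen_for({"BF16", "INT8"}))          # 16
```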
Once these choices are made, the total Vector Unit data path width, or DLEN, is ELEN × the number of vector cores; for example, ELEN = 64 with 32 vector cores gives DLEN = 2048b. We support DLEN configurations from 128b to 2048b.
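The DLEN arithmetic can be sketched the same way. This is a minimal model rather than a configuration tool: dlen and supported_configs are hypothetical names, and treating combinations whose product falls outside 128b to 2048b as unsupported is our assumption, since the text only states the overall range.

```python
CORE_COUNTS = (4, 8, 16, 32)     # vector core counts offered
DLEN_MIN, DLEN_MAX = 128, 2048   # supported DLEN range in bits

def dlen(elen, cores):
    """Total data path width in bits: ELEN x number of vector cores."""
    if cores not in CORE_COUNTS:
        raise ValueError(f"unsupported core count: {cores}")
    return elen * cores

def supported_configs(elens=(64, 32, 16, 8)):
    """(ELEN, cores, DLEN) triples whose DLEN lies in the supported range.

    Assumption (hypothetical): combinations outside DLEN_MIN..DLEN_MAX
    are not offered.
    """
    return [
        (elen, cores, dlen(elen, cores))
        for elen in elens
        for cores in CORE_COUNTS
        if DLEN_MIN <= dlen(elen, cores) <= DLEN_MAX
    ]

for elen, cores, width in supported_configs():
    print(f"ELEN={elen} x {cores} cores -> DLEN={width}b")
```

Under this assumption, a small ELEN combined with few cores (for example, ELEN = 16 with 4 cores, giving 64b) falls below the stated range and is filtered out.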
Our Vector Unit is equipped with a high-performance cross-vector-core network that provides high-bandwidth, all-to-all connectivity between the vector cores, even in the largest, 32-core configuration.
The cross-vector-core network is used by specific instructions in the RISC-V standard that shuffle data between vector cores, such as vrgather and vslide.
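To make the shuffle instructions concrete, the sketch below models the element-level behaviour of vrgather.vv and vslidedown from the RISC-V Vector specification on Python lists (vl = VLMAX, no masking). The cross_core_moves helper and its element-to-core mapping (element i handled by core i % num_cores) are hypothetical, chosen only to show why such instructions need a cross-core network; the actual mapping used in this Vector Unit is not described here.

```python
# Reference semantics of two RISC-V "V" permutation instructions, modelled on
# plain Python lists (one list = one vector register, vl = VLMAX, no masking).

def vrgather(vs2, vs1):
    """vrgather.vv: vd[i] = vs2[vs1[i]], or 0 if vs1[i] >= VLMAX."""
    vlmax = len(vs2)
    return [vs2[idx] if idx < vlmax else 0 for idx in vs1]

def vslidedown(vs2, offset):
    """vslidedown: vd[i] = vs2[i + offset], or 0 once i + offset >= VLMAX."""
    vlmax = len(vs2)
    return [vs2[i + offset] if i + offset < vlmax else 0 for i in range(vlmax)]

def cross_core_moves(sources, num_cores):
    """Count elements whose source lives on a different vector core.

    sources[i] is the source element index for destination element i, or None
    when the element is zero-filled. Hypothetical mapping: element i is
    handled by vector core i % num_cores.
    """
    return sum(
        1
        for dst, src in enumerate(sources)
        if src is not None and src % num_cores != dst % num_cores
    )

data = [10, 11, 12, 13, 14, 15, 16, 17]
reverse = [7, 6, 5, 4, 3, 2, 1, 0]
print(vrgather(data, reverse))   # [17, 16, 15, 14, 13, 12, 11, 10]
print(vslidedown(data, 2))       # [12, 13, 14, 15, 16, 17, 0, 0]

# Source indices of a slide-down by 4 (None where the result is zero-filled).
slide4 = [i + 4 if i + 4 < len(data) else None for i in range(len(data))]
print(cross_core_moves(reverse, 4))  # 8
print(cross_core_moves(slide4, 4))   # 0
```

Under that assumed mapping, reversing a register sends every element to a different core, while a slide by a multiple of the core count keeps each element on its own core.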
We also offer a second key choice in the Vector Unit: the number of bits in each vector register (known as VLEN) can be tailored to the customer’s needs.
While most other vendors assume that VLEN is equal to DLEN (i.e., a 1X ratio), we also offer 2X, 4X and 8X ratios. When VLEN is larger than DLEN, a vector operation takes multiple cycles to execute, because the data path processes DLEN bits per cycle: at a 4X ratio, for example, an operation on a full register takes four cycles. This is a great feature for tolerating long memory latencies and for reducing power.
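The effect of the VLEN/DLEN ratio can be approximated with a simple model, assuming the data path processes one DLEN-wide slice of a register per cycle with no stalls and LMUL = 1. The helpers vlen, cycles_per_op and vlmax are hypothetical names; the VLMAX = VLEN / SEW relation is the standard RISC-V Vector definition for LMUL = 1.

```python
RATIOS = (1, 2, 4, 8)  # VLEN/DLEN ratios offered: 1X, 2X, 4X, 8X

def vlen(dlen_bits, ratio):
    """Vector register length in bits for a given data path width and ratio."""
    if ratio not in RATIOS:
        raise ValueError(f"unsupported VLEN/DLEN ratio: {ratio}")
    return dlen_bits * ratio

def cycles_per_op(vlen_bits, dlen_bits):
    """Cycles one full-register operation occupies the data path
    (simplified model: DLEN bits processed per cycle, no stalls)."""
    return vlen_bits // dlen_bits

def vlmax(vlen_bits, sew_bits):
    """Elements held in one vector register at LMUL = 1: VLEN / SEW."""
    return vlen_bits // sew_bits

DLEN = 512  # example data path width, e.g. ELEN = 32 with 16 vector cores
for ratio in RATIOS:
    v = vlen(DLEN, ratio)
    print(f"{ratio}X: VLEN={v}b, {vlmax(v, 32)} FP32 elements per register, "
          f"{cycles_per_op(v, DLEN)} cycle(s) per operation")
```

In this model the ratio is exactly the number of cycles each vector instruction keeps the data path busy, so a larger ratio means fewer instructions have to be issued for the same amount of data.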