Date: 2020-07-30

The DesignWare® ARC® Fast Floating Point Unit (FPU) adds performance-efficient half-precision, single-precision, and double-precision hardware acceleration for floating point math instructions to all ARC HS4x processors, including ARC HS44, HS45D, HS46, HS47D, and HS48, as well as dual-core and quad-core versions. The FPU accelerates high-precision computation on data sets with a large dynamic range. Table 1 on page 2 lists names and functions of single-precision floating point instructions.



When used with the ARC MetaWare C/C++ Compiler, the ARC HS Fast FPU complies with the IEEE 754-2008 standard for binary floating point arithmetic. The ARC HS4x processors combined with the FPU provide an ideal solution for systems-on-chip (SoCs) that perform complex computations or control algorithms, especially where power and area budgets are constrained.



Small Die Area and Power



The Fast FPU is implemented as an integral part of the dual-issue ARC HS4x pipeline with full dependency checking and operand bypass capabilities. In contrast to the large floating point coprocessors required by competitive cores, the ARC HS Fast FPU instructions are integrated into the ARC HS4x processor pipeline. This unique approach achieves floating point performance comparable to that of a coprocessor, but with significantly smaller die area and power consumption. The FPU is gated off when it is not active, further lowering power consumption.



The DesignWare ARC C/C++ Compiler math library optimizes performance and takes full advantage of the ARC FPU instructions to accelerate transcendental and other functions specified in IEEE 754-2008.

Features

The FPU is designed to match the clock frequency of any ARC HS4x processor, so it can be used for both low-power and high-performance applications. The FPU has direct access to core registers, and all its source and destination dependencies are intermixed into the global dependency logic. This high level of integration results in a more efficient implementation and eliminates unnecessary stall cycles when mixing integer and floating-point instructions.

The HS4x Fast FPU instructions are divided into two groups:

The first group includes all variants of floating-point multiply, add and convert operations. These execute in a pipelined mode and can sustain a throughput of one instruction per clock cycle.

The second group includes all floating-point divide and square-root (sqrt) instructions.

The implementation is based on radix-4 integer divide hardware with additional logic to handle subnormals (fractional significand) and rounding. The hardware for floating-point divide and sqrt is not pipelined, so it can handle only one instruction at a time, like the integer DIV/REM instructions.

The Fast FPU supports conversion between signed or unsigned 32-bit integer types and floating point formats, and optionally supports floating point “fused” multiply-add/multiply-subtract operations. The fused operations perform the multiply without rounding and apply rounding only after the subsequent add or subtract. Optional divide and square root operations are also available. All the operations are designed to support the high-performance speeds of the ARC HS4x processors while minimizing the additional gate count and power consumption required for the FPU functionality.

The ARC Fast FPU for HS4x supports four rounding modes: round to nearest even, round toward positive, round toward negative and round toward zero. Unit-level clock gating automatically disables the FPU functionality when it is not in use.

Benefits