The LEON5 is a synthesizable VHDL model of a 32-bit processor compliant with the SPARC V8 architecture. The model is highly configurable and suitable for system-on-chip (SoC) designs.
LEON5 provides backward compatibility for most software that targets LEON3 and LEON4 processors. The LEON5 primarily targets high-end FPGAs and deep-submicron ASIC technologies. For legacy and lower-performance technologies, the LEON3 processor, which continues to be maintained, remains the recommended choice.
The processor pipeline of the LEON5 is significantly enhanced compared to the earlier LEON3 and LEON4 processors, and initial evaluations show that LEON5 can provide up to 85% faster execution than LEON4 on single-threaded integer benchmarks. The main new feature of the LEON5 pipeline is dual issue, which allows up to two instructions to be executed in parallel per cycle. To support the increased issue rate of the pipeline, the LEON5 has advanced branch prediction capabilities. The cache controller of the LEON5 supports a store buffer FIFO with a sustained throughput of one store per cycle, wide AHB slave support to enable fast stores and fast cache refill, as well as several other enhancements.
The LEON5 is interfaced using the AMBA 2.0 AHB bus (a subsystem with Level-2 cache and AXI4 backend is also available) and supports the IP core plug&play method provided in the Cobham Gaisler IP library (GRLIB). The processor can be efficiently implemented on FPGA and ASIC technologies and uses standard synchronous memory cells for the caches and register file. The processor supports the MUL and DIV instructions, an IEEE-754 floating-point unit (FPU) and a Memory Management Unit (MMU). The cache system consists of separate multi-set Level-1 (L1) instruction and data caches with up to 4 ways per cache, and an optional Level-2 (L2) cache for increased performance in data-intensive applications.
- SPARC V8 instruction set with V8e extensions and compare-and-swap
- Advanced 7-stage dual-issue pipeline
- Complex dynamic branch predictor and small branch target buffer
- Addition of Late ALU to decrease pipeline stalls
- 64-bit single-clock load/store operation
- 64-bit 4-port register file
- Hardware multiply and divide units
- Hardware floating-point support: non-pipelined, area-efficient FPU (NanoFPU) or high-performance, fully pipelined IEEE-754 FPU including hardware support for denormalized numbers (GRFPU5)
- Separate instruction and data L1 cache (Harvard architecture) with snooping
- Optional L2 cache: 256-bit internal, 1-4 ways, 16 Kbyte - 8 Mbyte
- SPARC Reference MMU (SRMMU) with TLB
- AMBA-2.0 AHB bus interface, 32-, 64- or 128-bit wide
- Subsystem including processor and Level-2 cache with AXI4 backend also available
- Advanced on-chip debug support with instruction and data trace buffer, and performance counter
- Symmetric multiprocessor (SMP) support
- Power-down mode and clock gating
- Robust and fully synchronous single-edge clock design
- Large range of software tools: compilers, kernels, simulators and debug monitors
- High performance: 3.12 DMIPS/MHz (-O3, all files combined), 4.52 CoreMark/MHz (-funroll-all-loops -finline-functions -finline-limit=1000)
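
The per-MHz figures in the last item can be turned into rough absolute scores for a given clock frequency. The following is a minimal Python sketch of that arithmetic, not part of any LEON5 tooling. The function name `scaled_scores` and the 100 MHz clock are hypothetical, and the linear scaling is an assumption that ignores memory latency and other system-level effects.

```python
# Per-MHz figures quoted in the feature list above.
DMIPS_PER_MHZ = 3.12
COREMARK_PER_MHZ = 4.52


def scaled_scores(clock_mhz: float) -> dict:
    """Scale the per-MHz benchmark figures to a given core clock.

    Assumption: scores scale linearly with frequency. Real systems may
    deviate because of memory latency and bus contention.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return {
        "dmips": DMIPS_PER_MHZ * clock_mhz,
        "coremark": COREMARK_PER_MHZ * clock_mhz,
    }


if __name__ == "__main__":
    # 100 MHz is a hypothetical clock chosen only for illustration.
    scores = scaled_scores(100.0)
    print(f"{scores['dmips']:.0f} DMIPS, {scores['coremark']:.0f} CoreMark")
```

Under these assumptions, a core clocked at 100 MHz would score roughly 312 DMIPS and 452 CoreMark. The achievable clock frequency depends on the target technology and configuration.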