By Kevin McDermott, Imperas Software Ltd
The topic of RISC-V custom instructions is growing in importance. This article explains why the subject is becoming so significant and outlines some previous approaches to processor hardware and software optimization, to illustrate the techniques that are now possible with RISC-V.
RISC-V is an open ISA (Instruction Set Architecture) that defines the boundary between the hardware of a processor design and the software it executes, such as operating systems and application programs. In traditional or closed-source ISAs, the structure and future roadmap are tightly controlled and guided in a particular direction. At the same time, one common requirement is the priority placed on backwards compatibility: new hardware can be added, but all previous software must still be supported to ensure legacy features are retained in perpetuity. The combination of hardware innovation and software re-use has produced many successful solutions as processors have replaced dedicated hardware in multiple market areas. However, in some cases this one-size-fits-all approach leaves options unexplored.
RISC-V was developed as an open ISA to allow much more freedom of implementation around a common base structure. This modular approach allows better configurability to match the processor hardware to the requirements of the end application. It offers greater design freedom when creating new features, and an openness that allows many participants to develop or modify a processor. Implementations can be started from scratch, or a base design from one of the many commercial IP providers or open-source repositories can be used as a starting point. In addition to the configurable options defined in the ISA, RISC-V also supports custom instructions and extensions that can be added to optimize performance around a particular task, function, or inter-processor communication.
In general terms, the design of an ISA specification is a tradeoff in flexibility: general-purpose software must run efficiently on the dedicated hardware resources of the processor. If the supported hardware functionality is too limited, much more work is performed in the software domain, which affects many aspects of system cost, from memory storage to power efficiency. In contrast, if the hardware is too extensive, with many rarely used functions implemented, the software becomes complex in order to manage all the available resources efficiently, and the hardware overhead becomes excessively large. This tradeoff is fundamental to the difference between the CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) approaches. In a RISC design such as RISC-V, the intention is to minimize the hardware to a limited set of key functional units that can execute very efficiently and so gain performance advantages over larger and more complex CISC designs. But like any method of classification, while these ideas appear clear cut, in the wonderful world of processor ISAs a compromise is always going to cause some interesting discussion on the best balance. So we have seen many ISAs develop over time to the point that these classifications become somewhat academic.
Given the costs and complexities of developing a processor, and the associated software support, the natural approach is to target a more general use case to leverage the widest potential user community. While the hardware starts simple enough, over time the software becomes more complex as it is applied to different applications. So users have looked at ways to improve performance and efficiency. These options typically include:
- Hardware accelerator
Perhaps the most obvious approach is to build a dedicated hardware block to accelerate the critical function, acting as a peripheral to the processor. However, this only works in some limited cases: lacking the shared infrastructure of the processor, this method is somewhat isolated.
Any fixed block will also face issues with interfacing to the software. As a monolithic function, it lacks the flexibility to evolve and adapt as needs change over time.
Depending on the application, it may be possible to partition the system. In many communication systems, the tasks can be divided into a control plane and a data plane, allowing heterogeneous designs with dedicated processors for each style of operation. However, once this partitioning is complete, any further improvement in each processor still needs to be addressed. One area of interest in multicore design is the inter-communication infrastructure between processor nodes: instead of incurring bus overhead, a custom hardware infrastructure supported by custom instructions could dramatically improve the efficiency of the communication.
- Custom instruction
Extending the ISA allows close integration with the memory and sub-system infrastructure while retaining software flexibility. The art of ISA design is the fine balance between gaining performance advantages and maintaining usability. Many custom architectures have been proposed, but achieving the right balance depends on the application software and future development plans. Custom instructions and extensions can offer a worthwhile performance gain while retaining sufficient software flexibility to adapt as the application needs change.
With the many possible architectural choices and options available, an approach that many have adopted to help with evaluation is to use a standard benchmark representing some aspects of the software application. In theory, running the same benchmarks across different configurations provides some guide to relative merit or performance, and after an initial evaluation review this can be a useful first step in the analysis plan. However, as the final application will have its own unique set of constraints, finding a benchmark that represents all of the key considerations can be a challenge. One of the advantages of the custom instruction approach to design optimization is the ability to develop and analyze the software performance on a standard core configuration first. Once the critical loops in the software are identified, a set of custom instructions can be modeled to verify the improvements and fine-tune the hardware/software balance, optimized around the actual application rather than a theoretical benchmark.
More details on this software development approach to custom instruction design and analysis can be found in a recent webinar.