November 16, 2005
Rapid architectural exploration and rapid prototyping are often talked about but rarely achieved. When your design misses its device or interface timing, area, or latency requirements, you want to make significant changes to the architecture or micro-architecture quickly to address the problem. This article highlights a new approach that gives designers the tools to make these changes quickly and safely at the architectural level while staying close to the hardware implementation.
Do humans swim faster or slower in syrup?
The 2005 Ig Nobel Prize in chemistry went to a university team for their study "Will Humans Swim Faster or Slower in Syrup?" While (surprisingly) it turns out there is no difference for swimming, chip design often feels like working in thick syrup – it's a high-viscosity endeavor that impedes quick movement. Once you've done the bulk of your design, it is very difficult to appreciably change your current implementation without taking a significant hit on schedule and resources.
With RTL, architecture and micro-architecture changes are both error-prone and manually intensive to make. Thus, when coding in RTL using Verilog or VHDL, designers are accustomed to making fine-grained, controlled changes to the RTL, with reasonably predictable consequences, so that the design's attributes improve continuously until the targets are met. Of course, changes must not compromise the correctness of the design, which is now usually re-established through extensive verification. This means that designers typically attempt to make changes that are as localized and small as possible without breaking the design.
Having said this, often it is only when your first implementation is complete that you understand whether you are meeting device speeds or timing specifications, and also whether you fit within an area budget. Based on what you've learned in this initial implementation, what you would like to do is explore alternative architectures and micro-architectures; that is, to perform an "intelligent annealing" of the design to meet your targeted latencies, speed, or area. But, with RTL, you are stuck in syrup.
You need an approach that enables you to implement fast, macro-impacting changes without destabilizing the functionality of the design. While designing at a higher level of abstraction has promised to deliver this, until now there hasn't been a solution that can effectively tackle complex control logic in addition to the simple datapaths that such technology has traditionally handled. A common concern about high-level synthesis is a perceived loss of control in the process of improving the design. This is a legitimate concern because, in most traditional approaches to high-level synthesis (for example, compiling from a "behavioral" C-like description into hardware), the semantic model of the source (sequential code) is so different from the semantic model of the hardware that the designer loses predictability. To put this another way, it is hard for the designer to imagine what should be changed in the source to effect a particular desired timing, area, or latency improvement in the hardware. Furthermore, small and apparently similar changes to the source can result in radically different hardware implementations.
This article highlights a new approach to designing at a high level that keeps designers in control, so that they can make tightly managed architecture and micro-architecture changes safely and quickly. Bluespec SystemVerilog (BSV) takes a very different approach to high-level synthesis, one in which the designer retains control. The semantic model of the source (guarded atomic state transitions) maps very naturally into the semantic model of clocked synchronous hardware. The designer can make controlled changes to the source, with predictable effects on timing, latency, and area in the standard Verilog RTL that the tool generates as output. Furthermore, because BSV's extensive static checking helps catch incorrect changes without having to simulate them or run them in an FPGA, these changes can be more dramatic than localized tweaking, thereby allowing the designer to achieve timing goals sooner without compromising correctness.
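To make the phrase "guarded atomic state transitions" concrete, here is a minimal sketch in Python rather than BSV. It is a toy model under stated assumptions, not how the Bluespec tools work: the names `Rule`, `step`, and `run` are hypothetical, the scheduler simply fires the first enabled rule, and the GCD example was chosen only for illustration. Each rule has a guard (a predicate on the current state) and an action that reads the old state and returns all of its register updates at once, much as registers update together on a clock edge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]  # register name -> current value


@dataclass
class Rule:
    """A guarded atomic action (hypothetical model, not BSV syntax)."""
    name: str
    guard: Callable[[State], bool]    # may the rule fire in this state?
    action: Callable[[State], State]  # register updates, computed from the OLD state


def step(state: State, rules: List[Rule]) -> Tuple[State, Optional[str]]:
    """Fire the first enabled rule atomically; return (new state, rule name)."""
    for rule in rules:
        if rule.guard(state):
            new_state = dict(state)
            # All updates are computed from the old state, then applied together.
            new_state.update(rule.action(state))
            return new_state, rule.name
    return state, None  # no guard is true: the state is stable


def run(state: State, rules: List[Rule], max_steps: int = 1000) -> State:
    """Apply rules one at a time until none is enabled."""
    for _ in range(max_steps):
        state, fired = step(state, rules)
        if fired is None:
            return state
    raise RuntimeError("rules did not quiesce within max_steps")


# Illustrative rule set: subtractive GCD for positive inputs; result ends up in x.
gcd_rules = [
    Rule("swap",
         lambda s: s["x"] > s["y"] and s["y"] != 0,
         lambda s: {"x": s["y"], "y": s["x"]}),
    Rule("subtract",
         lambda s: s["x"] <= s["y"] and s["y"] != 0,
         lambda s: {"y": s["y"] - s["x"]}),
]

if __name__ == "__main__":
    print(run({"x": 15, "y": 6}, gcd_rules))
```

Because every rule states its own firing condition and updates state atomically, a change to one rule can be reasoned about without tracing its interaction with every other rule by hand, which is the property the article relies on for safe architectural changes.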
In this article we describe a sampling of the following techniques used by designers to change a design for timing:
- Adding a pipeline stage to an existing pipeline.
- Adding a pipeline stage where pipelining was not anticipated.
- Spreading a calculation over more clocks (longer iteration).
- Moving logic across a register stage (register retiming).
- Restructuring combinational clouds for shallower logic.
- Incorporating hand-optimized logic.
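The first technique in the list, adding a pipeline stage, can be sketched with a small Python model. This is only an illustration under simplifying assumptions: the function `simulate`, the helper `valid`, and the example arithmetic are hypothetical, each stage is followed by exactly one register, and stalls and back-pressure are not modeled. The sketch shows the property a designer wants to preserve: splitting one stage into two shortens the logic in each stage and adds one cycle of latency, but the stream of results is unchanged.

```python
from typing import Callable, List, Optional, Sequence

Stage = Callable[[int], int]  # combinational logic of one pipeline stage


def simulate(stages: Sequence[Stage], inputs: Sequence[int]) -> List[Optional[int]]:
    """Clock a linear pipeline; return the output register value after each cycle.

    None represents a bubble (no valid data in that register).
    """
    regs: List[Optional[int]] = [None] * len(stages)  # one register after each stage
    outputs: List[Optional[int]] = []
    # Append bubbles so the pipeline drains after the last real input.
    feed: List[Optional[int]] = list(inputs) + [None] * len(stages)
    for item in feed:
        # Each stage reads the previous register's OLD value; all registers
        # then update together, as on a clock edge.
        sources = [item] + regs[:-1]
        regs = [None if v is None else f(v) for f, v in zip(stages, sources)]
        outputs.append(regs[-1])
    return outputs


def valid(outputs: Sequence[Optional[int]]) -> List[int]:
    """Drop bubbles, keeping only real results in order."""
    return [v for v in outputs if v is not None]


# Original design: all of the logic sits in a single stage.
one_stage = [lambda x: (x * x + 1) * 3]
# After the change: the same logic split across two shallower stages.
two_stage = [lambda x: x * x + 1, lambda y: y * 3]

if __name__ == "__main__":
    xs = [1, 2, 3]
    print(simulate(one_stage, xs))  # results start in the first cycle
    print(simulate(two_stage, xs))  # same results, one cycle later
```

Comparing `valid(simulate(one_stage, xs))` with `valid(simulate(two_stage, xs))` is the kind of equivalence check that would otherwise be re-established through simulation after each RTL edit.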
Of course, all of these techniques can be performed in traditional RTL – in fact, most of them will be quite familiar to the RTL designer. However, Bluespec makes them much easier and faster to accomplish and much more likely to be correct. As designs grow toward much larger sizes and complexities than these small illustrative examples, Bluespec's formal, powerful semantics allow a hardware designer to greatly outpace what can be achieved in a traditional RTL environment, producing an environment for accelerated time-into-device and rapid architectural exploration.