Third-party reuse of intellectual property meets its toughest test when cores grow beyond 20 million transistors. Cores of this complexity can easily overwhelm traditional design tools, both in the original design phase and during integration into customers' chips. Forget the romantic ideal of "shrink-wrapped" IP.
Yet GigaPixel Corp., using a combination of close customer support and timing-driven, block-based design methodology based on a tool called IC Wizard from Aristo Technology, is now delivering massively complex 3-D graphics cores. In fact, this methodology has become a key enabler for designing, delivering and integrating the IP.
Rapid integration of the soft-IP cores is critical because GigaPixel's business model is based on nonrecurring engineering costs plus royalties. The company therefore gets significant revenues only when customers' chips roll out, providing a strong incentive to make sure customers can integrate the IP as quickly as possible. That's why GigaPixel has chosen to invest so much in the integration and support of its customers.
Rapid integration of such huge cores would be challenging enough on size alone, but making the cores highly configurable stretched the design flow further. Customers can adapt the 3-D graphics cores to a wide range of applications by scaling pipelines, configuring features, adding or removing functional modules, targeting different clock speeds, optimizing for power and making any other change they wish. The cores can therefore scale from the high-end performance needed for workstations to the more modest and power-sensitive requirements of handheld wireless devices.
The starting point for this scalable IP lies in the patent-pending technology that tiles a frame, determines visible pixels and renders the tiles. The rendering engine processes only visible pixels, reducing the amount of internal processing and memory bandwidth needed to render 3-D scenes. As a result, the cores can perform full-scene anti-aliasing at speed, even for high screen resolutions.
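The principle of shading only visible pixels can be illustrated with a toy model. This is not GigaPixel's patent-pending algorithm, which the article does not describe; the sketch assumes the scene has already been rasterized into hypothetical fragments of the form (x, y, depth, triangle id), bins them by screen tile, keeps the nearest fragment at each pixel within a tile, and calls the shading function only for the survivors. Working one tile at a time is what lets hardware keep a tile's depth data on chip instead of in external memory.

```python
from collections import defaultdict

# Hypothetical fragment format: (x, y, depth, triangle_id); smaller depth = closer.
TILE = 4  # assumed tile edge in pixels, chosen only for illustration

def bin_fragments(fragments, tile=TILE):
    """Group fragments by the screen tile that contains them."""
    tiles = defaultdict(list)
    for frag in fragments:
        x, y, _, _ = frag
        tiles[(x // tile, y // tile)].append(frag)
    return tiles

def resolve_tile(frags):
    """Within one tile, keep only the nearest fragment at each pixel."""
    nearest = {}
    for x, y, depth, tri in frags:
        key = (x, y)
        if key not in nearest or depth < nearest[key][0]:
            nearest[key] = (depth, tri)
    return nearest

def render(fragments, shade):
    """Shade only visible fragments; return the framebuffer and the shading count."""
    framebuffer, shaded = {}, 0
    for frags in bin_fragments(fragments).values():
        for pixel, (_, tri) in resolve_tile(frags).items():
            framebuffer[pixel] = shade(tri)
            shaded += 1
    return framebuffer, shaded

fragments = [(0, 0, 0.9, "far"), (0, 0, 0.2, "near"), (5, 1, 0.5, "solo")]
fb, shaded = render(fragments, shade=lambda tri: tri.upper())
print(fb, shaded)  # prints {(0, 0): 'NEAR', (5, 1): 'SOLO'} 2
```

Three fragments arrive, but the hidden "far" fragment is never shaded, so only two shading operations run.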
Equally important, these cores use advanced visibility algorithms, a compliant system interface and rendering pipelines to overcome the incompatibilities of previous tile-based architectures. The cores support all current software standards, including Direct3D and OpenGL, without changes to applications or special driver considerations.
These compatibility, scalability and performance characteristics are crucial for meeting a goal of delivering IP that addresses most of today's 3-D graphics processing applications. At a time when the market expects a new generation of 3-D graphics processors every six months, vendors can no longer afford to focus on a single application. Development costs must be amortized across as many applications as possible.
An adaptable flow
To achieve this goal with IP of this complexity, both the initial design process and the end customer's final chip integration must be supported by a flow that readily accommodates engineering changes, design scalability and reconfigurability. Typically, 80 percent to 90 percent of the final chip consists of GigaPixel's IP, the remainder being the customer's interface logic and value-added features. Because of the predominance of GigaPixel's IP, the customer's back-end flow is virtually identical to GigaPixel's back-end flow.
The original plan had been for 3-D graphics IP to be combined with customer-specific logic and then physically implemented as a flat, gate-level design, but many of the tools in the flow simply broke when confronted with a design of this size. Other tools failed to achieve timing convergence. This experience made it clear that the flat, gate-level approach could not handle the initial design process, much less support an efficient hand-off to customers.
After considering a number of alternatives, GigaPixel's engineers found the solution in a methodology known as block-based design using Aristo Technology's IC Wizard design planner. One way to avoid the capacity limitations of flat, gate-level designs is to divide the design into smaller blocks.
This approach makes it possible to keep using existing tools, rather than migrating to an entirely new flow, because the chip can be partitioned so that each block is well within the capacity limitations of gate-level tools. These tools can then easily achieve timing convergence on a single block, and designers can usually fix problems within any given block in just a few hours.
However, taking advantage of a block-based methodology depends on the ability to ensure timing convergence between the blocks. Designers must be able to manage interblock timing so that they can focus on the individual blocks without worrying about how the entire design will come together.
Essentially, Aristo's IC Wizard creates a timing-compliant floor plan for the modules within the design hierarchy. For designs that are too big to place-and-route efficiently as a single block and still fulfill high-performance demands, IC Wizard can automatically group the modules into larger blocks, each of which is then flattened for gate-level place-and-route.
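IC Wizard's grouping is timing-driven and placement-aware, and the article does not describe its algorithm. The sketch below isolates just one constraint, keeping every block within an assumed gate-level tool capacity, using plain first-fit-decreasing bin packing as a swapped-in heuristic. The module names, gate counts and `MAX_GATES` limit are all hypothetical, and timing and connectivity are ignored entirely.

```python
# Hypothetical per-block capacity of the gate-level place-and-route tools.
MAX_GATES = 500_000

def group_modules(modules, capacity=MAX_GATES):
    """First-fit-decreasing packing of modules into place-and-route blocks.

    modules: dict of module name -> gate count.
    Returns a list of blocks, each a list of module names.
    """
    blocks = []  # each entry: [gates used so far, [module names]]
    for name, gates in sorted(modules.items(), key=lambda kv: kv[1], reverse=True):
        if gates > capacity:
            raise ValueError(f"{name} exceeds block capacity; split it first")
        for block in blocks:
            if block[0] + gates <= capacity:  # first existing block with room
                block[0] += gates
                block[1].append(name)
                break
        else:
            blocks.append([gates, [name]])  # no room anywhere: open a new block
    return [names for _, names in blocks]

modules = {"raster": 300_000, "shader": 400_000, "tile_buf": 150_000,
           "host_if": 90_000, "texture": 250_000}
print(group_modules(modules))
# prints [['shader', 'host_if'], ['raster', 'tile_buf'], ['texture']]
```

A real planner would also prefer to keep tightly connected or timing-critical modules in the same block, which is exactly what a capacity-only heuristic cannot do.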
This process ensures timing convergence by automatically allocating timing budgets to blocks and to chip-level interconnect. These budgets are passed as constraints to block-building tools. Any subsequent changes to block timing are accommodated by automatically reallocating slack along chip-level timing paths.
One of the key differences between this design-planning function and conventional floor planning is that IC Wizard automatically performs timing-driven placement and module reshaping in one pass; the tool then clusters the modules into multiple place-and-route partitions and optimizes the ports for each. In contrast, conventional floor planners act as interactive aids that help the chip designer perform many of these operations. With this more manual approach, designers must interactively place and reshape the modules to fill gaps, taking a number of iterations to reach optimal placement. Conventional floor planners are also prone to placing ports very close together on a boundary or in a corner of a place-and-route partition, causing undue wiring congestion in that part of the chip.
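The article does not describe how IC Wizard optimizes ports. As a rough illustration of the principle, the sketch below legalizes ports along one partition edge: each port has a preferred coordinate (assumed to be, say, the projection of the logic it connects to), and ports are spread to honor an assumed minimum pitch instead of piling up at one spot or corner. One-dimensional spacing is only a proxy for real routing congestion.

```python
def place_ports(preferred, edge_length, pitch):
    """Legalize port positions along one block edge.

    preferred: desired coordinate for each port.
    Returns positions in the same order as `preferred`, at least `pitch`
    apart and within [0, edge_length].
    """
    n = len(preferred)
    if n and (n - 1) * pitch > edge_length:
        raise ValueError("edge too short for this many ports at this pitch")
    order = sorted(range(n), key=lambda i: preferred[i])
    pos = [0.0] * n
    # Forward sweep: push ports right until they respect the minimum pitch.
    prev = None
    for rank, i in enumerate(order):
        p = max(preferred[i], 0.0)
        if prev is not None:
            p = max(p, prev + pitch)
        pos[rank] = p
        prev = p
    # Backward sweep: pull ports left so none runs off the far end of the edge.
    limit = edge_length
    for rank in range(n - 1, -1, -1):
        pos[rank] = min(pos[rank], limit)
        limit = pos[rank] - pitch
    result = [0.0] * n
    for rank, i in enumerate(order):
        result[i] = pos[rank]
    return result

print(place_ports([5, 5, 5, 5], edge_length=10, pitch=1))  # prints [5, 6, 7, 8]
print(place_ports([9.5, 9.5, 9.5], edge_length=10, pitch=1))  # prints [8, 9, 10]
```

In the second call, all three ports want the far corner; the backward sweep spreads them back along the edge rather than stacking them there.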
GigaPixel's IP is coded partly in Verilog and partly in the Module Compiler Language (MCL) that drives Synopsys' Module Compiler (a high-level synthesis tool specialized for datapath modules). The Verilog portions of the design are synthesized using Synopsys' Design Compiler; the entire design then goes to IC Wizard for timing-driven placement and grouping of modules into blocks.
Basically, the IC Wizard design-planning tool automatically places, shapes and packs together the modules to create a chip-level plan. It then groups them into larger blocks that are the appropriate size for gate-level place-and-route. The grouping is then implemented in Design Compiler and timing is optimized to create the final floor plan. This plan includes pin optimization and power planning.
Each block now goes to Cadence's Silicon Ensemble for gate-level place-and-route, power routing, clock tree insertion, parasitics extraction and in-place optimization. Note that each of the blocks is handled separately in this phase until the blocks are glued together, also in Silicon Ensemble. The flow is completed with design-rule checking by Mentor's Calibre.
The most powerful aspect of this methodology is that it enables many designers to work in parallel on different parts of the design right up to the very end of the flow. In addition to the usual parallel work on the modules in the front-end flow, the back-end flow also allows parallel work on the multiple blocks.
Designers can therefore optimize timing and fix bugs in one block while other designers do the same for other blocks. A bug in one block does not require a full back-end iteration for the entire design. This parallelism saves an enormous amount of time. Even if the traditional tools did have the capacity to handle a 20-million-transistor design, we have found that the ability to work independently on multiple blocks makes the block-based methodology highly preferable.
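The payoff of block independence can be sketched as an incremental rebuild: hash each block's netlist, rerun the back end only for blocks whose input changed, and run those jobs concurrently. Here `run_backend` is a hypothetical placeholder standing in for the per-block place-and-route, clock-tree, extraction and optimization steps (it does not invoke Silicon Ensemble), and a thread pool stands in for separate designers or machines. Final assembly of the blocks is not modeled.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def fingerprint(netlist_text):
    """Content hash used to detect whether a block's input changed."""
    return hashlib.sha256(netlist_text.encode()).hexdigest()

def run_backend(block, netlist_text):
    """Hypothetical stand-in for the per-block back-end steps."""
    return f"{block}:layout:{fingerprint(netlist_text)[:8]}"

def rebuild(netlists, cache, max_workers=4):
    """Rerun the back end, in parallel, only for blocks whose netlist changed.

    netlists: block name -> netlist text.
    cache: block name -> (hash, layout); updated in place.
    Returns the sorted list of rebuilt blocks.
    """
    stale = [b for b, text in netlists.items()
             if cache.get(b, (None,))[0] != fingerprint(text)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        layouts = pool.map(lambda b: run_backend(b, netlists[b]), stale)
        for block, layout in zip(stale, layouts):
            cache[block] = (fingerprint(netlists[block]), layout)
    return sorted(stale)

netlists = {"raster": "v1", "shader": "v1", "texture": "v1"}
cache = {}
print(rebuild(netlists, cache))  # prints ['raster', 'shader', 'texture']
netlists["shader"] = "v2 with bug fix"
print(rebuild(netlists, cache))  # prints ['shader']
```

After the bug fix in one block, only that block goes back through the back end; the other layouts are reused unchanged.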
The original IC Wizard plan essentially provides a delivery vehicle for reusing the IP. The IC Wizard database contains information such as block shapes, pin locations and a timing model. At the end of the design flow, the block-based methodology has proved to be a key enabler.
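To make the hand-off concrete, the kind of per-block record described above can be modeled as a small data structure. This is a hypothetical schema, not the IC Wizard database format, which the article does not describe: a rectangular block shape, named pins with coordinates, and a timing model reduced to pin-to-pin delay arcs. The helper checks that every pin sits on the block outline, a basic sanity check a customer might run before integrating a block.

```python
from dataclasses import dataclass, field

@dataclass
class Pin:
    name: str
    x: float
    y: float

@dataclass
class BlockAbstract:
    """Hypothetical hand-off record: shape, pin locations and a timing model."""
    name: str
    width: float
    height: float
    pins: list[Pin] = field(default_factory=list)
    # Timing model: (input pin, output pin) -> delay, in one consistent unit.
    arcs: dict[tuple[str, str], float] = field(default_factory=dict)

    def pins_off_boundary(self):
        """Names of pins that do not lie on the block's rectangular outline."""
        def on_outline(p):
            inside = 0 <= p.x <= self.width and 0 <= p.y <= self.height
            on_edge = p.x in (0, self.width) or p.y in (0, self.height)
            return inside and on_edge
        return [p.name for p in self.pins if not on_outline(p)]

    def delay(self, src, dst):
        """Look up the pin-to-pin delay from the timing model."""
        return self.arcs[(src, dst)]

shader = BlockAbstract("shader", 100.0, 50.0,
                       pins=[Pin("in_a", 0, 10), Pin("out_z", 100, 20), Pin("bad", 40, 20)],
                       arcs={("in_a", "out_z"): 1.8})
print(shader.pins_off_boundary())  # prints ['bad']
```

Real abstracts carry far more (multiple shapes, routing blockages, setup and hold checks), but shape, pins and timing are the three items the article names.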