High-speed fabrics deliver optimal IP implementation
By Richard Terrill, Vice President, Lightspeed Semiconductor, Sunnyvale, Calif., EE Times
December 19, 2002 (10:28 a.m. EST)
When developing a customizable silicon architecture, one of the most important and most often overlooked tasks is defining a technology strategy for managing IP before the silicon details are frozen. All too frequently, an architecture is created without regard for the capabilities offered by, and the resources required by, the third-party IP industry. It is easy to focus on gate propagation and register frequencies and entirely miss the performance delivered at the block and system levels.
In general, there are two choices when it comes to implementing significant blocks of functionality in a customizable device: you can embed hard-coded functions in the form of optimized physical cells, or you can layer a synthesizable (soft or firm) core onto a malleable logic fabric.
In the first instance, you gain superior performance at the risk of embedding the wrong functions, leading to cost inefficiencies for manufacturer and end user alike. In the second, you risk delivering a level of performance that is inferior and unacceptable for a wide range of applications.
Much has been written about platform-based design and preconfigured SoC architectures, and at first blush they are appealing. If you have the correct (optimal) collection of hard functions, and a bit of "customizable stuff" around them, you can make the ultimate hand-tuned ASSP. The question is which blocks to choose, and how to interconnect them. This is compounded by the incompatibility between the full-mask nature of diffused blocks and the few-mask nature of configurable modules.
Another architectural criterion that is magnified by the importance of IP in contemporary circuit design is the balance between I/O performance and fabric performance. A device whose I/O cells can move data across the pins at spectacular speeds, but whose fabric is somewhat less than spectacular, is of little help to most designers. The correct way to manage this is to first develop an I/O cell whose performance meets the target application criteria, and then develop a logic fabric with comparable bandwidth.
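As a rough illustration of that balance, a back-of-the-envelope bandwidth check can be sketched in a few lines. The bus widths and clock rates below are hypothetical assumptions, not figures for any particular device:

```python
# Back-of-the-envelope check that fabric bandwidth keeps up with I/O bandwidth.
# All figures below are illustrative assumptions, not device specifications.

def bandwidth_mbytes_per_sec(bus_width_bits: int, clock_mhz: float) -> float:
    """Aggregate bandwidth in Mbytes/s for a parallel bus."""
    return bus_width_bits * clock_mhz / 8

io_bw = bandwidth_mbytes_per_sec(bus_width_bits=16, clock_mhz=400)      # fast I/O cell
fabric_bw = bandwidth_mbytes_per_sec(bus_width_bits=64, clock_mhz=100)  # internal datapath

# A wider, slower internal datapath can still match the pins' bandwidth.
print(f"I/O: {io_bw:.0f} Mbyte/s, fabric: {fabric_bw:.0f} Mbyte/s, "
      f"matched: {fabric_bw >= io_bw}")
```

The point of the sketch is the trade the article describes: a fabric need not clock as fast as the I/O cells, but its width-times-frequency product must keep pace.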
A customizable device can implement a host of standards-based I/O options, and in modular array ASICs and FPGAs these must be available on all pins, so the design of the I/O cell is crucial in defining the device's performance and capability. The natural path is to think of the I/O cell as a "pseudo-PHY" layer, providing the digital and analog capabilities needed to realize the full range of I/O requirements.
The next architectural step is to build a logic fabric whose performance matches the goals of the target markets and the capabilities of the I/O cells. A smaller, denser fabric may be more cost-efficient, but often trades away too much performance for the size savings. In addition, a slower fabric makes it more likely that hard-diffused IP cores will be the only way to achieve the required performance for certain portions of the design. By selecting a high-performance process and favoring speed in the design, a fabric can be created that implements synthesizable IP cores efficiently while preserving their potential performance for a given technology.
Lightspeed is a modular array ASIC vendor, and as such, our ASICs are customized with only a few metal layers late in the wafer fabrication process. The base devices (slices) of a particular size are identical for all customers.
Given this, we have chosen to support synthesizable Virtual Components in a high-performance logic fabric rather than employ hard-diffused function blocks. To offset the anticipated tradeoff of attenuated performance, we focused heavily on making the logic fabric as fast as possible. This meant we could turn the knob in favor of speed (at some expense of power and area) to ensure that firm and soft IP would "layer" into the device fabric and deliver the desired results.
By way of example, the SPI4.2 interface from Silicon Logic Engineering is targeted to operate in our 0.13-µm devices at 400+ MHz (800+ Mbytes/second) with dynamic deskew and a 16-bit external interface (16-bit TX and 16-bit RX). A path to higher performance is planned as the I/O cells become better characterized. The SPI4.2 macro is delivered in two parts. The PHY portion is a very-firm block, hand-optimized to fit into the I/O cells. It uses logic modules (by definition adjacent to the relevant I/O cells) to implement the digital SERDES and buffering functions, and to present a standard digital interface to the Link Layer. The Link Layer is delivered as a soft IP core, with fewer placement constraints since its performance is not as topologically sensitive as the PHY's. The two are licensed together as one product, but may be thought of as separate parts.
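The quoted numbers are easy to sanity-check: at one 16-bit transfer per cycle, a 400-MHz interface moves 6.4 Gbit/s, or 800 Mbytes/s, per direction. This is a simple arithmetic sketch of the quoted figures, not a statement about SPI4.2 clocking details:

```python
# Sanity-check the quoted SPI4.2 throughput: 16 bits per transfer at 400 MHz.
clock_hz = 400e6          # quoted operating frequency
bus_width_bits = 16       # quoted external interface width (per direction)

bits_per_sec = clock_hz * bus_width_bits   # 6.4e9 bit/s
mbytes_per_sec = bits_per_sec / 8 / 1e6    # convert bits/s to Mbytes/s

print(mbytes_per_sec)  # 800.0
```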
Another example is 3DES encryption/decryption. Amphion, another of our IP partners, has special expertise in this area and offers four "flavors" of the function to match the user's size, performance, or power requirements. If performance is the ultimate goal, their highest-performance 3DES IP core is estimated to operate at 6400 Mbits/second in our 0.13-µm devices. This compares favorably with the 4267 Mbits/s achieved in 0.18-µm standard-cell technology.
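Taking both quoted figures at face value, the move is worth roughly a 50 percent throughput gain. This is a quick comparison of the article's two numbers, nothing more:

```python
# Compare the two quoted 3DES throughput figures.
fabric_013_mbps = 6400    # quoted: highest-performance core in the 0.13-micron fabric
stdcell_018_mbps = 4267   # quoted: 0.18-micron standard-cell result

speedup = fabric_013_mbps / stdcell_018_mbps
print(f"{speedup:.2f}x")  # 1.50x
```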
Where we do favor embedded hard cells is in areas used in essentially all design tasks. All of our modular array ASICs contain high-performance PLLs for clock control and area-optimized embedded SRAM for on-board memory requirements. This is a proper tradeoff: embedding these functions makes the devices somewhat less generic, but the capabilities have broad appeal. More than 90 percent of our tapeouts use the embedded memory, and more than 85 percent use the PLLs. The math works very differently when you consider embedding more specific, focused IP such as processors and analog functions. If you select the blocks that everyone wants, you have a winner. If you choose poorly, or market critical mass never emerges, you have a noncompetitive device.
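That embed-or-not reasoning can be caricatured as a simple utilization threshold. The SRAM and PLL figures come from the article; the break-even threshold and the processor and analog figures are purely illustrative assumptions:

```python
# A crude model of the embed-or-not decision: embedding a hard block pays off
# only when enough tapeouts actually use it. SRAM and PLL usage figures are
# from the article; the threshold and the other entries are assumptions.

EMBED_THRESHOLD = 0.80  # assumed break-even fraction of designs using the block

candidates = {
    "embedded SRAM": 0.90,   # quoted: 90%+ of tapeouts
    "PLL":           0.85,   # quoted: 85%+ of tapeouts
    "CPU core":      0.20,   # hypothetical
    "analog block":  0.10,   # hypothetical
}

for block, usage in candidates.items():
    verdict = "embed" if usage >= EMBED_THRESHOLD else "keep soft"
    print(f"{block}: {usage:.0%} of designs -> {verdict}")
```

Under these assumptions only the near-universal blocks clear the bar, which is the article's point: memory and clocking earn their silicon, while more application-specific IP is better left to the fabric.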
The optimization loop is completed by working closely with IP vendors when developing new device architectures. By soliciting their guidance on standards, bandwidth requirements, and design features, we can provide customizable devices that are very effective at implementing their high-performance soft and firm IP cores.