Security Chip Design Speeds on to Silicon

By Hemanshu Bhatnagar, Neel Das and Satish Anand, Integrated System Design
March 1, 2002 (12:33 p.m. EST)
URL: http://www.eetimes.com/story/OEG20020227S0055

When we founded Corrent in September 2000, existing ASIC solutions were not keeping pace with the Internet infrastructure buildout, causing a performance bottleneck. So we set out to deliver the fastest Internet Protocol security accelerator on the market. Our goal was to build a design of 7 million gates or more in less than a year. Funding and market dynamics dictated that we execute on time, the first time. We had one shot to make it happen.

We were able to deliver a 2.2 million-instance design with the ability to accelerate Internet Protocol Security (IPsec) and Secure Socket Layer (SSL) at gigabit speeds, 14 months from conception.

This article describes the philosophy Corrent adopted and how it was translated into a working flow in such a short time. Specific examples of challenges and workarounds are discussed. Large hierarchical designs like Corrent's present unique challenges to integration, and some of these issues are discussed.

Our first challenge was to quickly find talent with the broad set of necessary skills to implement a quality design. We then focused on putting a methodology in place that everyone would buy into and execute on.

Early on we decided to go with a customer-owned tooling (COT) flow rather than outsource the silicon back-end implementation. Two factors drove this decision: schedule and control. Owning the back-end implementation was key to meeting the schedule, and controlling the design process gave us faster back-end iterations and tighter control over timing, methodology, resources and quality.

There was, however, the issue of cost. In the short run, the cost of implementing a COT flow was significantly higher than outsourcing. But because Corrent had an aggressive road map of products to be developed in a short time, the initial investment in building the COT infrastructure would pay off in future savings. We also saw savings by avoiding costly respins, an intangible benefit of the COT flow that was very difficult to factor into the cost equation. For a startup like Corrent it was a critical decision, and one that ultimately enabled our success in getting product out on time.

Once we decided on the COT flow, our next task was to establish a methodology and create an infrastructure to support it. Our development philosophy rested on five principles: schedule is king; the design must be correct by design; everything scales; timing is everything in methodology; and all internal and external resources must be leveraged.

Schedule was the primary driver for our whole organization. We constantly guarded against feature creep: if it was good enough for production, it was good enough to move forward. In cases where a feature was "less than perfect" and fixing it would have delayed the schedule, we worked closely with our customers to derive the best solution. Our marketing team played a critical role here.

Our design methodology focused on creating a development environment that could be centrally controlled. For example, all the scripts for block-level synthesis referred to a common file for timing and other parameters. As a startup, we were building our team while development work was already under way. We found that the best way to enforce a consistent methodology was to build it into our development environment. New employees walked into a correct-by-design methodology.
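The common-parameter-file idea can be sketched in a few lines of Python. This is a minimal illustration, not Corrent's actual scripts: the parameter names, values, the block name and the helper functions are all hypothetical, and the emitted lines merely follow the general shape of SDC-style constraints.

```python
# Hypothetical shared parameters. The article does not give the real names or values.
COMMON_PARAMS = {
    "clock_period_ns": 5.0,
    "clock_uncertainty_ns": 0.2,
    "max_transition_ns": 0.5,
}


def block_constraints(block_name, overrides=None, common=COMMON_PARAMS):
    """Merge the shared parameters with optional per-block overrides."""
    params = dict(common)            # copy so the shared table is never mutated
    params.update(overrides or {})   # a block may deviate only explicitly
    params["block"] = block_name
    return params


def render_constraints(params):
    """Render the parameters as SDC-style constraint lines (illustrative only)."""
    return [
        f"create_clock -name clk -period {params['clock_period_ns']} [get_ports clk]",
        f"set_clock_uncertainty {params['clock_uncertainty_ns']} [get_clocks clk]",
        f"set_max_transition {params['max_transition_ns']} [current_design]",
    ]


if __name__ == "__main__":
    # "crypto_core" is a made-up block name used only for this example.
    for line in render_constraints(block_constraints("crypto_core")):
        print(line)
```

Because every block script reads the same table, changing a value in one place changes it for every block, which is the property the methodology relies on.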

Everything had to be scalable. Our architecture, our verification methodology and our IT organization were all able to support rapid growth and enable lower costs through reuse.

Corrent's modular silicon architecture was designed to scale in terms of performance, crypto algorithms and chip I/O protocols. For example, our architecture allowed us to quickly incorporate support for new I/Os (such as PCI, PCI-X and PL3) without having to change the rest of our design. Similarly, our pipelined sliced architecture allowed for relatively simple addition of newer encryption algorithms (Fig. 1).

Our goal was to establish a robust, scalable and reusable verification environment that could grow with the company. We staffed our verification team with engineers who combined strong design skills with strong programming skills (C++, C, Verilog, Perl). For example, our lead verification engineer had written a full-blown static timing tool in a previous job.

In the end, we created a self-checking random-simulation framework that allowed for pushbutton creation of real-life test cases. This framework meant that most of the designers' time was spent debugging rather than creating test cases. Designers could selectively randomize values for chip features and create a self-checking testbench in minutes. Our verification methodology also allowed us to run completely random test cases for days without intervention.
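The general shape of a self-checking random test loop with selective randomization can be sketched as follows. This is a toy under stated assumptions, not Corrent's framework: the feature table, the XOR "reference model" and the `device_under_test` stand-in are all hypothetical. In a real flow the device under test would be an RTL simulation and the reference model a software implementation of the protocol.

```python
import random

# Hypothetical feature space. The real chip features are not listed in the article.
FEATURE_CHOICES = {
    "protocol": ["ipsec", "ssl"],
    "packet_len": list(range(64, 1501)),
}


def make_test_case(rng, pinned=None):
    """Randomize every feature, then apply any values the designer pinned."""
    case = {name: rng.choice(choices) for name, choices in FEATURE_CHOICES.items()}
    case.update(pinned or {})
    case["key"] = rng.randrange(1, 256)
    case["payload"] = bytes(rng.randrange(256) for _ in range(case["packet_len"]))
    return case


def reference_model(case):
    """Toy golden model: XOR each payload byte with the key."""
    return bytes(b ^ case["key"] for b in case["payload"])


def device_under_test(case):
    """Stand-in for the simulated design. Here it simply matches the model."""
    return bytes(b ^ case["key"] for b in case["payload"])


def run_random_tests(seed, count, pinned=None, dut=device_under_test):
    """Run `count` random cases and return (seed, index) for each mismatch."""
    rng = random.Random(seed)  # a fixed seed makes any failure reproducible
    failures = []
    for index in range(count):
        case = make_test_case(rng, pinned)
        if dut(case) != reference_model(case):
            failures.append((seed, index))
    return failures


if __name__ == "__main__":
    print(run_random_tests(seed=1, count=20, pinned={"protocol": "ssl"}))
```

Recording the seed and case index with each failure is what lets a long unattended run be replayed later for debugging.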

We supplemented our development/verification framework with an IT infrastructure that was scalable, flexible and controlled. Having an IT expert on your staff is worth every penny! We chose Platform Computing Corp.'s Load Sharing Facility (LSF), a software suite that centrally controls distributed computing resources, to manage our compute farms. These farms were populated with the fastest Linux machines available; large Sun servers were used for jobs requiring 64-bit operating systems and large memory, or for applications unavailable on Linux. Whenever possible, we chose Linux for its affordability and speed.

Our methodology focused on single-pass timing closure. One of the first steps was to do all data preparation work up front. We chose the foundry and the library vendor early on to permit early qualification. That let us check the library for completeness (we discovered it had no integrated clock-gating cells) and spot-check its timing accuracy with SPICE simulations.

Next, we created a baseline tool flow. In this way, the "nuances" of each tool could be exposed and accounted for early in the game. That allowed us to create the all-important glue (scripts) that enabled the data to flow smoothly back and forth between the tools. Fig. 2 gives a summary of our tools. Fig. 3 shows our five-step high-level integration flow.

Corrent decided early on that a back-end test run was necessary to hammer out the methodology to ensure a successful tapeout of the final chip.

Within three months of forming a team, we taped out Project Squall and completed these key tasks:

Tested sensitive circuits including I/Os, memories, PLLs and some IP;

Built up scripts for the design flow, including place and route and Design Compiler synthesis;

Validated the Hercules tool decks;

Established and validated a methodology to first-silicon success;

Validated and correlated the library data;

Validated our foundry flow.

We identified the need for better LVS/DRC rule decks and hired additional help to make sure they accurately reflected the foundry manufacturing rules. This turned into a collaborative effort among Corrent, UMC, Avanti and VST. The result was a win/win relationship that benefited all parties.

With the success of Project Squall, our focus shifted to Typhoon (now known as the CR7020). Typhoon was a 33 million-transistor chip that taped out eight months after putting together the team, resulting in fully functional silicon. There were many "lessons learned" in our First Encounter->Physical Compiler->Apollo->Hercules tool design flow. Some of these pertained to the newer EDA tools.

First Encounter (FE) lacked the level of quality we needed to assemble the pad ring, power grid and other layout entities. The tool did not include DRC checks and opens/shorts checks. Consequently, we had to produce our final pad ring/power grid from Planet and then export it to FE. This increased our run-times, particularly because Planet was slow in reading and writing Verilog.

First Encounter had a number of strong points and some gaps. Its biggest advantage was speed: we could turn around a full-chip place-and-route job in an overnight run, which shaved weeks from our schedule. However, power grid connectivity could not be checked in FE, so we assembled the grid in Planet instead. Once FE addresses this issue, it will be more effective.

Repeater insertion in FE was one of its highlights: We used a rule-based approach that inserted about 20K repeaters at the top level in under five minutes.
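The article does not say which rules were used. One common kind of rule, assumed here purely for illustration, caps the length of any unbuffered wire segment. Under that assumption, the number and placement of repeaters on a net can be sketched in Python as follows (the function names and the length limit are hypothetical, and this is not FE's algorithm):

```python
import math


def repeaters_needed(net_length_um, max_unbuffered_um):
    """Repeaters required so that no wire segment exceeds max_unbuffered_um."""
    if net_length_um <= 0:
        return 0
    segments = math.ceil(net_length_um / max_unbuffered_um)
    return max(segments - 1, 0)


def repeater_positions(net_length_um, max_unbuffered_um):
    """Evenly spaced repeater locations along the net, measured from the driver."""
    count = repeaters_needed(net_length_um, max_unbuffered_um)
    spacing = net_length_um / (count + 1)
    return [spacing * (i + 1) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical net length and limit, in microns.
    print(repeaters_needed(1000, 400), repeater_positions(900, 300))
```

A rule of this kind needs only net lengths, not timing analysis, which is one plausible reason a rule-based pass can run in minutes.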

However, when we tried "cookie-cutting" the power grid in FE to push it into the cluster pdef3 database, we found that if a power/ground strap straddles a cluster boundary, FE silently drops it without reporting an error. A simple visual inspection of the power grid near cluster boundaries is enough to catch this.
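The same check can also be scripted ahead of the visual inspection. The sketch below is an assumption-laden illustration, not part of the flow described above: it treats straps and clusters as axis-aligned rectangles given as (x1, y1, x2, y2), with made-up names and coordinates, and flags any strap that overlaps a cluster without lying entirely inside it.

```python
def overlaps(a, b):
    """True if rectangles a and b share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer, inner):
    """True if rectangle inner lies entirely within rectangle outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def straddling_straps(straps, clusters):
    """Return (strap, cluster) pairs where the strap crosses the cluster boundary."""
    flagged = []
    for strap_name, strap_box in straps.items():
        for cluster_name, cluster_box in clusters.items():
            if overlaps(strap_box, cluster_box) and not contains(cluster_box, strap_box):
                flagged.append((strap_name, cluster_name))
    return flagged


if __name__ == "__main__":
    # Hypothetical geometry, in microns.
    clusters = {"c0": (0, 0, 100, 100)}
    straps = {
        "vdd_in": (10, 0, 12, 100),     # fully inside the cluster
        "vss_edge": (99, 0, 101, 100),  # crosses the right-hand boundary
        "vdd_out": (200, 0, 202, 100),  # nowhere near the cluster
    }
    print(straddling_straps(straps, clusters))
```

Any pair the script reports is a place to look closely after the grid has been pushed into the clusters.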

What's more, FE does not perform thorough checks on incoming Verilog data syntax. Users would be well-advised to ensure the data is acceptable to Apollo before reading it into FE.

Congestion correlation between Physical Compiler (PC) and Apollo was good in most cases, but we did run into some minor differences. One issue that Synopsys R&D helped us discover was that Apollo and PC compute routing congestion differently. Our library had separate horizontal and vertical routing-track pitches. Apollo's routing-congestion calculations took the average of the two, whereas PC computed the congestion numbers using the two values separately. Synopsys R&D ran an experiment in which it changed the way PC computed congestion and demonstrated better correlation with Apollo.
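A small numeric example shows why the two approaches can disagree. The model below is a deliberate simplification and an assumption, not either tool's actual algorithm: congestion is taken as routing demand divided by the number of tracks that fit across a region, and all dimensions, pitches and demands are made up.

```python
def track_capacity(span_um, pitch_um):
    """Number of routing tracks that fit across a span at a given pitch."""
    return int(span_um // pitch_um)


def congestion_separate(demand_h, demand_v, width_um, height_um, pitch_h_um, pitch_v_um):
    """Congestion per direction, using each direction's own track pitch."""
    cap_h = track_capacity(height_um, pitch_h_um)  # horizontal tracks stack vertically
    cap_v = track_capacity(width_um, pitch_v_um)   # vertical tracks stack horizontally
    return demand_h / cap_h, demand_v / cap_v


def congestion_averaged(demand_h, demand_v, width_um, height_um, pitch_h_um, pitch_v_um):
    """Congestion per direction, using the average of the two pitches for both."""
    pitch = (pitch_h_um + pitch_v_um) / 2
    return congestion_separate(demand_h, demand_v, width_um, height_um, pitch, pitch)


if __name__ == "__main__":
    # Hypothetical 100 x 100 micron region with 100 wires wanted in each direction.
    args = (100, 100, 100, 100, 0.5, 1.0)
    print("separate pitches:", congestion_separate(*args))
    print("averaged pitch:  ", congestion_averaged(*args))
```

With these made-up numbers the separate-pitch model reports the vertical direction as full and the horizontal direction as half used, while the averaged model reports about 75 percent in both, so the two views would rank hot spots differently.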

We noticed that good correlation between the PC tool's estimates and the actual extracted RC parasitics helped us close timing faster. On the other hand, rectilinear or "skinny" floor plans and the PC tool did not seem to get along well. For long nets in a long, skinny floor plan, where the problem was essentially one of repeater insertion, PC's timing optimization did not give us good timing results until we constrained the design with a "max_capacitance" design-rule constraint. Users should recompute RC parameters for "unusual" floor plans: the PC tool tends to perform better in a rectangular, "open," standard-cell space.

Overall, for almost all clusters, we observed about a 5 percent growth in cell count caused by PC and another 5 percent growth caused by clock tree synthesis and high-fan-out net synthesis in Apollo.
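For area budgeting, the two growth steps compound. The short Python sketch below assumes the second 5 percent applies to the cell count after PC, which the article does not state explicitly. The starting cell count is hypothetical.

```python
def projected_cell_count(initial_cells, growth_steps=(0.05, 0.05)):
    """Apply each fractional growth step in turn and return the rounded total."""
    count = initial_cells
    for growth in growth_steps:
        count *= 1 + growth
    return round(count)


if __name__ == "__main__":
    # Hypothetical cluster of 100,000 cells before physical synthesis.
    print(projected_cell_count(100_000))
```

Under that assumption a cluster ends up a little over 10 percent larger than its pre-layout netlist, which is worth reserving in the floor plan up front.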

Because of the size and complexity of our design, we were consistently driving the EDA tools to their limits. EDA vendors were eager to work with us in solving problems, and in many cases they put us in direct contact with their R&D teams.

---
Hemanshu Bhatnagar, vice president of engineering, and CTO Satish Anand co-founded Corrent Corp. (Tempe, Ariz.). Bhatnagar holds MSEE and MBA degrees, and Anand and Neel Das, Corrent's director of silicon integration, have MSEE degrees, all from Arizona State University.
