Network processors are among the most difficult chips to verify, and the "multi-service processor" chips at Brecis Communications, with multiple clock domains and asynchronous interfaces, are the toughest of the tough. In a recent panel discussion at the Network Processors Conference, George Apostol, vice president of engineering at Brecis, described his company's recently taped-out MSP5000 chip as a "verification nightmare."
With the help of a home-grown FPGA emulation environment, however, Brecis was able to achieve first-pass silicon success. In this interview, Apostol describes his company's verification strategy, and discusses the shortcomings of existing verification tools. A slide presentation from the Network Processors Conference is available at the Brecis web site.
EEdesign: What does a multi-service processor do?
Apostol: What we're really looking at is devices that communicate at the end of the last mile or beginning of the first mile. Our target market is really small business. If you look at what small business needs from a communications standpoint, you can see that a number of interfaces need to come into the environment. For example, you've got a WAN interface coming into the building, so there's connectivity to the outside world. Voice is coming in, so there's some form of telephony. And then internally, you need a LAN, which may be connected to a PBX. So Brecis is focused on generating products for the converged voice and data space.
When you look at the WAN interfaces these devices want to talk to, those vary as well. DSL, T1, cable modem, wireless; all of those various flavors. Also, the type of communication that happens on those networks includes ATM, frame relay, Ethernet, and more.
EEdesign: You taped out your MSP5000 chip earlier this year. What did it include, and what were the design and verification challenges?
Apostol: I'd say it had about 28 million transistors and 3.3 Mbits of SRAM. Maximum clock speed is 200 MHz, but that varies depending on the various subsystems -- we have independently programmable clock units on the chip. Each of the blocks can run at frequencies that are independent of other blocks. And then, in the communications world, there is synchronization that has to happen, especially with voice. We implemented a number of those functions with digital PLLs.
There are a number of functional engines that we use to deliver the appropriate interface. At the heart of those are three processors. There's a MIPS 4K on board, for the CPU. There's a programmable WAN engine using LSI's ZSP core. For the telephony side we're also using a ZSP core. The difference between the two units is the peripherals that hang on them for some of the hardware functions.
The chip was done in 0.18 micron technology. Because we're a full COT [customer owned tooling] house, we worked with UMC.
When we looked at the architecture of the chip, the thing that stuck out the most was clocks and timing. We have 28 different clock domains on the chip. The first question is how do we deal with those, especially with multiple functional, asynchronous interfaces. Because of that, we had to really look at things up front, and decide how we wanted to design the chip and also verify it.
EEdesign: What's your approach to verification?
Apostol: We sort of did a bottom up verification approach. We started with RTL simulation at the block level, then at the subsystem level, and finally at the system level.
If you look at the functional block diagram, you'll see that a number of blocks in there are IP [intellectual property] cores we purchased from various vendors. That includes the processors, an Ethernet MAC, a serial communications controller, various blocks on the WAN interface, and a Utopia interface. The cores we purchased generally came with testbenches, so our first task was to run each testbench and ensure we got the same kinds of results our vendors did. In general those kinds of testbenches are generated more specifically for the core, and as such, have more detail associated with them.
After doing that, we put them into our environment at what we call a subsystem level. One unique thing about our architecture is the way each subsystem plugs into our system bus. We have a standard interface unit so all of the subsystems connect to the system bus in exactly the same manner. In those interface units we also have a number of queues, and through the queues we maintain the way in which data flows on that system bus.
One unique thing about our verification environment, as we got into the subsystem level, was that we had a standard method for generating vectors. We used the on-board processors to run those tests. Our vector generation methodology was to write assembly or C code that would run on the on-board processor. We had models for memory that were connected to our SDRAM interface. So we would write tests, compile them, put them into memory and have the processor execute them.
EEdesign: Why did you create your own FPGA emulation system, and how did you use it?
Apostol: Early on in the architectural process, one of the biggest things we had to deal with was asynchronous clocks at system level. When we looked at all the interfaces we wanted to talk to, we wondered how we could ensure that once we plug into real traffic, we'd hit all the corner case scenarios. It was obvious to us we needed to do some form of emulation. We really had two choices -- roll our own, or go out and use one of the industry solutions that are out there today.
As we looked at external solutions, one issue we ran into was how to drive boxes to do what we needed to do. As we looked at our architecture, it fell out very nicely with what we could do on an FPGA board. We ran the system bus on the board, and had all these interfaces that the bus systems would be connected to. So in parallel to the verification effort we started the FPGA development board. By the time we were up and running with system level RTL verification, we had the board back.
There were minor differences, in terms of RTL, when we went from an ASIC environment into an FPGA environment. For example, we had to come up with different ways of doing RAMs. But we didn't have to repartition any logic. There was no issue where FPGA boundaries would be. Test vector generation worked exactly the same with the FPGA platform.
EEdesign: What advantages did the emulation environment provide?
Apostol: The difference with the FPGA environment is that we got real world connectivity. We could run 100 Mbit Ethernet. We were getting data rates probably more in the 10 Mbit range, but we were able to hook up to a 100 Mbit interface. On the WAN interface we were able to connect to various Utopia devices. On the telephony interface, we could ring phones and have voice conversations.
Another thing that helped us in software development is that each of the onboard processors had the ability to support JTAG debuggers. On the FPGA platform, we were able to bring up those debuggers. It helped greatly with application software development.
EEdesign: It sounds like the emulator helped you with hardware/software co-development.
Apostol: Most definitely. We actually booted the VxWorks operating system and had applications running in real time. That in itself was a good test of whether or not the asynchronous interfaces were designed correctly. Another benefit is that the software team was actually part of the ASIC verification team. Because of the way we generated vectors, they were able to come in and start the low level diagnostics, and slowly build up to the applications.
EEdesign: Did it take a lot of time to build and maintain the emulator?
Apostol: I don't think there was any more time and work involved than there would be in implementing a larger box. We had one engineer dedicated full time to the hardware, and then for the board design there were three people, but they were doing other things as well. From start to actually getting things up took about 3½ months. When I've used large emulation boxes, it's taken two engineers three or four months to get the environment up. And this board was more specific to our architecture.
EEdesign: Which software verification tools did you use?
Apostol: Basically a Synopsys flow. We used VCS for Verilog simulation, MemPro for memory models, and Vera for the HDL testbench. For formal verification, we used Formality from Synopsys and Conformal from Verplex. We used the Debussy waveform viewer and simulation debugger from Novas, as well as nLint from Novas to check RTL code. We used CoverMeter from Synopsys for coverage analysis.
On the design side, we used Synopsys Physical Compiler for synthesis and placement, and the Avanti toolset for the back-end flow.
EEdesign: What issues did you run into with simulation?
Apostol: Speed. As you do block level simulation everything is fine, but as you put more and more functions in, the degradation takes it down considerably. Given today's ASIC densities and the amount of functionality put on chips, our feeling is that the speed of the simulators hasn't kept up. The difference between our FPGA platform and our RTL environment is probably a factor of 30 or 40 in terms of speed, and the number of vectors we can execute.
The other thing we encountered was clocking. Simulators inherently want to sample data on well-partitioned intervals. The way we got around it was through the FPGA environment. It's not easy to verify asynchronous boundaries in RTL simulation. You're going from the digital to analog world when you start looking at asynchronous boundaries, so some sort of mixed-signal simulation has to happen.
EEdesign: How did you make use of Vera?
Apostol: We used Vera to generate communications models that were needed externally for the chip. We also used Vera for some of the protocol generators, so we could monitor packets that went across our system bus and ensure they were correct.
We had a system-level random RTL environment, and we used Vera to do that. We had a lot of directed tests for each subsystem. For instance, on the telephony side we had an interface to generate TDM traffic, and on the WAN side we generated ATM traffic.
EEdesign: Why did you use two different formal equivalency checkers?
Apostol: Formality and Conformal each caught different kinds of structures that the other one didn't. So we found that using both gave us pretty good coverage. Also, speed varied. In some cases Formality was faster, in some Conformal was faster.
EEdesign: You said in the panel discussion that "IP blocks never work as advertised." What problems did you run into?
Apostol: The biggest issue is that IP blocks are designed for a particular system environment and are verified in that environment. When we get IP blocks, they're given to us with certain assumptions about how they're going to be used in the system, and what the bus interface is going to be. When we put them into our system environment, and use them a little bit differently, we get behavior that's different from what the vendor explains. Then we have to re-verify the IP in our environment, and that's the difficult part.
The testbenches we get are designed for block-level IP cores. It's difficult for us to adapt those testbenches in a way that works in our system environment. It seems to me there should be a way to do that, to have tests written in such a way that they're portable across different environments.
EEdesign: How did you make use of CoverMeter?
Apostol: We did statement coverage, which checks to see if lines of RTL can be executed. We also did some FSM [finite state machine] coverage, which tells you whether a particular state has been hit. FSM transition coverage tells you to what extent the arcs of a state machine have been executed, and in that, there is some improvement that has to be done.
To detect what I would call first-order arcs of transition is not that difficult. But as you get into second and third order arcs, that's where you're going to find most of your bugs. We had one bug where we had to follow six arcs of the state machine in the prescribed order at the right time for that transition to occur. We found it on the FPGA system, but when we brought it back into simulation to try to recreate it, it was difficult.
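[Editor's illustration] The distinction between state coverage, first-order arc coverage, and higher-order arc sequences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Brecis flow or the output of any coverage tool: the four-state bus FSM, its legal arcs, and the recorded trace are all hypothetical, and "order n" is taken here to mean n consecutive arcs exercised back to back.

```python
def legal_sequences(legal_arcs, order):
    """Enumerate every chain of `order` consecutive legal arcs (hypothetical definition)."""
    seqs = {(arc,) for arc in legal_arcs}
    for _ in range(order - 1):
        # Extend each chain with any arc that starts where the chain ends.
        seqs = {seq + (nxt,) for seq in seqs
                for nxt in legal_arcs if seq[-1][1] == nxt[0]}
    return seqs


def seen_sequences(trace, order):
    """Collect the chains of `order` consecutive arcs that a state trace exercised."""
    arcs = list(zip(trace, trace[1:]))
    return {tuple(arcs[i:i + order]) for i in range(len(arcs) - order + 1)}


def coverage_report(trace, states, legal_arcs, max_order=3):
    """Return state coverage plus arc-sequence coverage for orders 1..max_order."""
    report = {"state": len(set(trace) & states) / len(states)}
    for order in range(1, max_order + 1):
        legal = legal_sequences(legal_arcs, order)
        hit = seen_sequences(trace, order) & legal
        report[f"order_{order}"] = (len(hit), len(legal))
    return report


# Hypothetical bus-interface FSM: all names below are invented for this sketch.
STATES = {"IDLE", "REQ", "GRANT", "XFER"}
LEGAL_ARCS = {
    ("IDLE", "REQ"), ("REQ", "GRANT"), ("GRANT", "XFER"), ("XFER", "IDLE"),
    ("REQ", "IDLE"),   # request aborted
    ("XFER", "XFER"),  # burst continues
}
# A directed test that walks the main loop twice.
TRACE = ["IDLE", "REQ", "GRANT", "XFER", "IDLE", "REQ", "GRANT", "XFER", "IDLE"]

report = coverage_report(TRACE, STATES, LEGAL_ARCS)
for name, value in report.items():
    print(name, value)
```

In this toy case the trace visits every state, yet the set of legal arc chains grows with each order while the number the test exercises does not, which is one way to see why deeper sequences are where coverage tends to fall short.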
EEdesign: With coverage tools, do you get a good sense of knowing when you're done with verification?
Apostol: We get a better sense, but in the end it was still more of an art than a science. Around here it's always said that tapeout is a state of mind. We decide as a team, getting everyone involved, including the software designers. When we decided that we were getting the functionality we expected on the FPGA, and had everything running, then we were comfortable.
EEdesign: If you could pick a few top issues, how would you most like to see EDA vendors help with your verification challenges?
Apostol: Number one would be RTL simulation performance. To me, right now, that is one of the biggest bottlenecks in doing these large designs. Secondly I would like a way to generate truly asynchronous traffic in our environment. The six-order bug I described didn't show up in simulation because the clocks were beating at a common frequency that wouldn't show that bug. We had to tune things in a particular way in order to create that scenario. I'd also like a way to use the on-board processor with the verification tools. I think that's an enormous opportunity for EDA companies for those of us doing system-on-chip designs with processors.
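[Editor's illustration] The remark about clocks "beating at a common frequency" can be made concrete with a small Python sketch. It assumes two ideal, jitter-free simulation clocks that start aligned, with periods expressed as whole picoseconds; the 5000 ps period corresponds to the 200 MHz maximum mentioned earlier, while the other periods are hypothetical values chosen only for illustration. Under those assumptions, the edges of one clock can land on only a fixed set of offsets within the other clock's period, and that set is small when the two periods share a large common divisor.

```python
from math import gcd


def distinct_phase_offsets(period_a_ps, period_b_ps):
    """Count the offsets clock B's edges can take within clock A's period.

    Assumes ideal clocks that start aligned; edge k of B falls at
    (k * period_b_ps) % period_a_ps, which cycles through
    period_a_ps / gcd(period_a_ps, period_b_ps) distinct values.
    """
    return period_a_ps // gcd(period_a_ps, period_b_ps)


def observed_offsets(period_a_ps, period_b_ps, n_edges):
    """Brute-force check: collect the offsets seen over n_edges edges of B."""
    return {(k * period_b_ps) % period_a_ps for k in range(n_edges)}


# 200 MHz against a clock at exactly two-thirds of that rate (hypothetical).
related = distinct_phase_offsets(5000, 7500)
# 200 MHz against a slightly detuned period (hypothetical).
detuned = distinct_phase_offsets(5000, 7519)

print("related clocks, distinct offsets:", related)
print("detuned clocks, distinct offsets:", detuned)
print("brute force agrees:",
      len(observed_offsets(5000, 7500, 10000)) == related,
      len(observed_offsets(5000, 7519, 10000)) == detuned)
```

With harmonically related periods the simulation keeps revisiting the same few edge alignments, so a bug that needs a particular alignment may never be provoked; detuning one period is one possible way to widen the set, though real hardware adds drift and jitter that this sketch does not model.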
Verification has a lot to do with what you do up front. I think we've developed a proven methodology for being able to design this class of chip, and to get pretty high coverage in terms of functionality and verification.