IBM leverages SoC tech in supercomputer design
By Rick Merritt, EE Times
December 3, 2002 (10:16 a.m. EST)
SAN MATEO, Calif. - In a move that some see as setting a new trend for how the world's fastest supercomputers will be designed, the U.S. Department of Energy has tapped IBM Corp.'s system-on-chip (SoC) technology to build a 65,000-processor machine. If the so-called Blue Gene/Light project performs up to expectations, backers maintain, it could point the way for using on-chip multiprocessing to build petaflop computers.
The research platform, which can theoretically realize peak performance of 367 teraflops, is one of two systems the Energy Department is working on with IBM under a $240 million contract announced Tuesday (Nov. 19). The other system, a more conventional supercomputer architecture dubbed ASCI Purple, has a 100-teraflops theoretical peak performance. IBM gave EE Times a glimpse inside its next-generation Power5 processor, which will drive that supercomputer.
The systems represent government researchers' two-pronged approach to a pair of challenges: simulation testing for the nation's nuclear arms stockpile, and regaining U.S. supremacy in supercomputing. Japan usurped that position earlier this year, announcing that it had installed the NEC Vector SX6 Earth Simulator. A roughly 35-teraflops machine, it is currently ranked No. 1 among the world's top 500 supercomputers.
"We got a lot of criticism for not being innovative enough when Japan surpassed us after we spent so much money on the ASCI program, but we can't afford to bet the whole program on an untried technology. We have to go with what works," said Mark Seager, an assistant director for advanced technologies at Lawrence Livermore National Laboratory, who led the procurement effort for the two systems.
"We know [ASCI Purple] will work," said Seager, but the Blue Gene/L program "is a factor of eight larger than what we have ever contemplated. We think we'll have several applications that will scale to this level. But it's really a computational science research platform to design and build something that will scale to this level of parallelism," he said.
If it succeeds, "the next rev after Blue Gene/L will take us to the petaflops machine that has been a Holy Grail," he added.
A spokesman for Cray Inc. (Seattle) said a supercomputer with sustained petaflops performance, one capable of 1,000 trillion calculations a second, will not arrive until about 2010. The IBM systems, while impressive for their intended applications, lack the 20-terabyte/second memory bandwidth of either the Cray X1, formally launched Nov. 14, or the NEC Earth Simulator.
"There's a whole different class of problems you can address when you have that amount of system bandwidth. With complex and ambiguous software codes for automotive and weather applications, you need this kind of bandwidth," the Cray spokesman said.
Oak Ridge National Laboratory, also under the Energy Department, is evaluating the Cray X1 for possible use as part of the government's reaction to the NEC system, he said.
"You are seeing an exodus of U.S. computer scientists to use the Japanese machine," he said.
The more traditional ASCI Purple machine slated for Lawrence Livermore will handle the increasingly taxing job of modeling the safety and robustness of the nation's aging nuclear weapons.
"We certify these systems are safe and prepared without doing underground testing," said Dona Crawford, a director of computation at Lawrence Livermore. "It will be nice to be No. 1 [in supercomputing] but that's not why we are doing this. We need [the computational power] to do the work we do."
Discussing the task of modeling aging nuclear missiles in their silos, Crawford said, "Imagine parking a car in a garage and leaving it for several years. The gas could evaporate from the tank, hoses and metal could wear out or become brittle just from climatic changes. Lots of things are happening in a system that has roughly the same number of parts as a car," she said.
Specifically, ASCI Purple runs "very complex physics software in development since 1996 to simulate button-to-burn weapons functions," said Seager, a task estimated to require 100 teraflops of performance by 2005.
Blue Gene/L, by contrast, will dig into the tasks underlying the missile simulations (molecular modeling, quantum molecular dynamics, fluid turbulence dynamics and material modeling, for instance), and it will dig even deeper than ASCI Purple.
"We want to look at what is the onset of turbulence, how materials get brittle as they age, what happens to insensitive explosives as they age and the basic material properties of helium," he said.
The results of the Blue Gene/L applications could be fed back into existing or future ASCI program computers.
Besides plowing deeper into underlying issues of physics, the Blue Gene/L system is also expected to be much more compact than its gigantic forerunners. For example, Lawrence Livermore is building its Terascale Simulation facility to house ASCI Purple, which will consist of 196 interconnected computers.
"Our estimates are these machines will take 20,000 square feet of new space and 4 to 8 megawatts of power," said Seager. "We can do that for this machine but not for the next one. One megawatt per year costs a million dollars," he said. By contrast, Blue Gene/L needs just 5,000 square feet of floor space and 1.2 megawatts of power.
Jack Dongarra, a professor of computer science at the University of Tennessee who helps maintain the Top 500 supercomputer list, said both IBM systems were impressive though they won't be up and running until 2005.
"One of the achievements [of Blue Gene/L] is its packaging. Putting a high performance computer in a relatively small footprint with relatively small power consumption is impressive, but it's still quite expensive," Dongarra said.
Blue Gene will shrink the size and power budget by leveraging system-on-chip technology.
Inside Blue Gene/L
A single 5 million-transistor ASIC is at the heart of each of the 65,000 nodes in Blue Gene/L. Each ASIC contains two PowerPC 440-class microprocessors, four floating-point units, 8 Mbytes of embedded DRAM, a memory controller, support for gigabit Ethernet and three proprietary IBM interconnects.
The ASICs run a lightweight operating system kernel and achieve 8 flops per clock cycle. Neither Seager nor IBM would reveal the ASICs' clock rate. In most computations one processor acts as a computation engine while the other handles cache-coherent communications. The ASICs will be built in IBM's 8SF 130-nanometer CMOS process.
The three proprietary interconnects include a 3-D torus, a link to the six "nearest neighbor faces in a cell" that synchronizes local chips for a grid application. A broadcast tree synchronizes all nodes in the system, while a barrier network brings all nodes to a similar status before beginning a new computation. Both of the latter operations can be handled in about two microseconds.
"It's quite fast," said Seager.
A node comprises one ASIC, nine memory chips with a total of 256 Mbytes and connectors. Two nodes fit on a single 4 x 2-inch card.
The I/O subsystem is based on 1,024 PowerPC 440 I/O processors, one for every 64 nodes. Each runs a version of Linux, has access to 512 Mbytes of local memory and is connected via Gigabit Ethernet. Lawrence Livermore is designing a 400-terabyte remote file system for reading and writing data to Blue Gene/L.
The relatively small amount of memory per node and the vast number of nodes represent the two biggest challenges to programming Blue Gene/L. Seager said the secret to breaking through those issues boils down to "lots of hard work."
Applications are treated as grids, broken down via domain decomposition into cells to maximize parallelism. Programming is handled in standard Fortran, C and C++ with a message-passing interface between nodes. A production system works in a kind of batch mode: jobs are submitted to the system, then launched by a resource management system.
"Right now we are not quite certain what its applications range will be," said Crawford.
By next May, IBM is expected to build a 512-node prototype of the system. Based on measurements of that prototype, researchers will decide by September whether to build the full system, which could be installed at Lawrence Livermore in early 2005.
The follow-on petaflops system could use faster processor cores or more than two cores per ASIC. "With only 5 million transistors per ASIC today, we have lots of room to make the ASICs more complex," said Seager.
"This is absolutely the trend. We expect to see more processors per die and more switching fabrics implemented internally. This translates to higher performance and lower latency," said Dongarra of the University of Tennessee.
The Blue Gene/L "is more like a really focused design for specific algorithms in areas like life sciences," said Ravi Arimilli, an IBM Fellow involved with both supercomputers. "The ASICs are very focused on a few simple problems all working in parallel and the compiler does most of the work," he said.
By contrast, ASCI Purple, built up from 196 nodes, each with eight 8-way IBM Power5 processors, is "more in line with traditional, scalable parallel processors," he said.
The ASCI Purple and Blue Gene/L projects could put IBM's name on the top two supercomputers in the world. Currently its most powerful system ranks No. 4 behind NEC's Earth Simulator and two Hewlett-Packard Co. machines. HP has four of the top 10 systems. IBM has three.
Copyright © 2003 CMP Media, LLC