Cadence developed a digital twin of NVIDIA's SuperPOD in a bid to boost the power efficiency and cooling effectiveness of AI data centers.
www.electronicdesign.com, Sept. 16, 2025 –
Behind every leap in machine learning is a data center crammed floor to ceiling with power-hungry silicon and heat-gushing server racks. But to maximize the performance of these facilities, it's no longer enough to adopt the latest AI accelerators. Performance now depends on whether the underlying infrastructure can deliver sufficient power and efficiently reject all of the resulting heat under real-world operating conditions.
Cadence is trying to tackle the problem by giving companies the ability to simulate a data center down to each part, before spending billions of dollars to buy the components and physically build it out. The company is expanding its Reality Digital Twin solution to support NVIDIA’s most advanced AI supercomputer, the GB200-based DGX SuperPOD, which consumes around a megawatt of power.
Cadence said the digital twin, developed with NVIDIA, recreates the full stack of the SuperPOD: 72 Blackwell GPUs and 36 Grace CPUs linked together in each 120-kW rack, with eight racks connected using high-bandwidth networking switches. All of it is tightly integrated with power distribution and cooling units. The goal is to simulate the behavior of the multi-rack system: how much power it draws under different loads, how much heat it generates, and how it interacts with power, cooling, and other infrastructure.
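To put those figures in perspective, here is a back-of-envelope power budget in Python. The component counts per rack (72 GPUs, 36 CPUs, ~120 kW) and the eight-rack total come from the article; the per-component wattages and overhead figure are illustrative assumptions, not NVIDIA or Cadence specifications.

```python
# Rough power budget for the SuperPOD configuration described above.
# Component counts come from the article; wattages are assumptions.

GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
RACKS = 8

ASSUMED_GPU_W = 1200            # assumed per-GPU draw, including HBM (illustrative)
ASSUMED_CPU_W = 500             # assumed per-Grace-CPU draw (illustrative)
ASSUMED_RACK_OVERHEAD_W = 12_000  # assumed switches, NICs, pumps, fans (illustrative)

rack_w = (GPUS_PER_RACK * ASSUMED_GPU_W
          + CPUS_PER_RACK * ASSUMED_CPU_W
          + ASSUMED_RACK_OVERHEAD_W)
total_w = rack_w * RACKS

print(f"Estimated per-rack load:   {rack_w / 1e3:.0f} kW (article cites ~120 kW)")
print(f"Estimated eight-rack load: {total_w / 1e6:.2f} MW (article cites ~1 MW)")
```

Under these assumed wattages, the estimate lands near the article's figures of roughly 120 kW per rack and about a megawatt for the full system.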
Using the digital twin, engineers can test different system architectures, cooling strategies, and the like, and assess how the changes will ripple through the entire data center in terms of power demand, operating cost, and carbon footprint. By modeling everything from the power distribution to cooling systems at this scale, Cadence said it gives engineers the flexibility to fine-tune the design for their needs before physical implementation, which ultimately speeds up deployment.
Large-scale digital twins are already starting to play a major role in supporting AI data centers, which NVIDIA refers to as AI factories and which each consume hundreds of megawatts. But they're bound to become even more important as companies scale up computing, power, and cooling infrastructure into the gigawatt range, said Cadence. To support their AI ambitions, hyperscalers such as Amazon, Google, and Microsoft are all scrambling to secure renewable energy sources. They're even eyeing nuclear power as a potential solution.
“Power is the new currency in data centers,” said Rob Knoth, Senior Group Director of Strategy and New Ventures at Cadence. “The goal now is to maximize the computational output per unit of power. We can do that with this simulation platform, which covers the integration of this very complicated system from chip to chiller.”
Power Demands of AI Growing Rapidly
As large language models (LLMs) and other machine-learning models grow in scale and complexity, the data centers used to train them are being constrained by power and cooling limits.
At the heart of these data centers are thousands — sometimes tens of thousands — of high-performance AI chips. The latest generation of AI GPUs such as NVIDIA’s Blackwell B100 and B200 chips burn through more than 1,000 W each, which is approximately 3X more power than a traditional CPU. The Grace Blackwell GB200 superchip at the heart of NVIDIA’s SuperPOD pushes the power envelope even further by linking a pair of Blackwell GPUs with a Grace CPU over the company’s NVLink interconnect.
To train and run the largest AI models, these chips are being deployed in ultra-dense configurations that use low-latency, high-bandwidth networking to make the whole cluster behave as much like a single accelerator as possible. Consequently, power-per-rack specifications in data centers are rising from the 30- to 40-kW range to more than 100 kW.
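The arithmetic behind that jump is straightforward. The sketch below estimates how many accelerators fit under a given rack power budget; the per-GPU wattage and the share of rack power reserved for everything else are assumptions made purely for illustration.

```python
# Illustration of why rack power budgets cap accelerator density.
# The 30-40 kW and 100+ kW budgets come from the article; the rest is assumed.

ASSUMED_GPU_W = 1000              # assumed draw per modern AI GPU
ASSUMED_OVERHEAD_FRACTION = 0.30  # assumed share for CPUs, NICs, switches, fans

def max_gpus_per_rack(rack_budget_w: float) -> int:
    """GPUs that fit in a rack budget after reserving the assumed overhead share."""
    usable_for_gpus = rack_budget_w * (1.0 - ASSUMED_OVERHEAD_FRACTION)
    return int(usable_for_gpus // ASSUMED_GPU_W)

for budget_kw in (30, 40, 100, 120):
    print(f"{budget_kw:>4}-kW rack budget -> ~{max_gpus_per_rack(budget_kw * 1000)} GPUs")
```

Under these assumptions, a 30- to 40-kW rack tops out in the twenties of GPUs, while a 100-kW-plus rack can hold several dozen, which is why dense designs like the SuperPOD demand the higher budgets.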
On top of that, traditional fan-cooled systems can no longer handle these power densities. To prevent thermal failures, companies are increasingly adopting direct-to-chip and other types of liquid cooling.
These escalating demands make it crucial to precisely model power consumption at the chip level and then evaluate the implications at the package, circuit board, server, and rack levels. However, given the huge amount of money that hyperscalers are throwing at data centers and the huge amount of power required for them, it's also important to model the entire facility to make sure the power and cooling infrastructure can keep up.
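A minimal sketch of that kind of hierarchical roll-up is shown below: chip-level estimates aggregated up through board, server, rack, and facility. The node names and wattages are hypothetical, chosen only to show the structure; a real flow such as Cadence's would use measured or physics-based per-component models.

```python
# Hypothetical power roll-up from chip to facility. All values are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    own_watts: float = 0.0                 # power drawn by this level itself
    children: list["Node"] = field(default_factory=list)

    def total_watts(self) -> float:
        # Roll up: this node's own draw plus everything beneath it.
        return self.own_watts + sum(c.total_watts() for c in self.children)

gpu = Node("GPU die + HBM", own_watts=1000.0)
board = Node("superchip board", own_watts=150.0, children=[gpu, gpu, Node("CPU", 300.0)])
server = Node("server tray", own_watts=400.0, children=[board, board])
rack = Node("rack (power shelves, switches)", own_watts=8000.0, children=[server] * 18)
facility = Node("facility (cooling, distribution losses)", own_watts=250_000.0,
                children=[rack] * 8)

print(f"Rack IT load:    {rack.total_watts() / 1e3:8.1f} kW")
print(f"Facility demand: {facility.total_watts() / 1e6:8.2f} MW")
```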
Digital twins are ideal for large-scale simulation, stated Sherman Ikemoto, senior director at Cadence. They enable engineers to recreate all of a data center’s components to predict power demand and other aspects of the design. “When NVIDIA is designing a physical box, they will do a lot of virtual prototyping because of the scale of the physical box,” he said. “But they will also build physical prototypes to make sure that what they are eventually going to ship is going to work under all the operating conditions that are specified for the box.”
“But, at the data center scale, you can't build a physical prototype to test things out,” added Ikemoto. “It's too costly, and it takes too much time, and there is not enough power in the world to build a physical prototype for every data center you want to run. You need a virtual replica of a data center to do testing and verification, not only of the initial design but also the design changes that are going to occur over the lifespan of the data center.”
“It's virtual prototyping at a data-center scale,” he told Electronic Design.
Using Data Centers to Build Bigger Data Centers
Still, as the largest data centers scale up into the gigawatt range, the complexity of where to put everything from power distribution units and cooling systems to the server racks themselves is increasing drastically.
“The question is how close you can put these things together while getting as much performance as possible out of the design,” said Ikemoto, before comparing the challenge to the complexities of chip design. If you put power-hungry transistors too close together on the chip, they can negatively impact each other, causing hot spots that can degrade performance unless cooled. “You need to work out the problem of package density and how all these parts interact with each other to maximize the value of these data centers.”
In the worst case, companies find out too late that they've over-designed their data center, unable to get access to enough electricity to power the hardware in its entirety. Or they under-design the data center, unable to use all of the available capacity in the electric grid. Cadence wants to help companies avoid those pitfalls by using its Reality Digital Twin technology to see how their design will play out ahead of time.
The company said users can “drag and drop” computing, networking, storage, and other components as well as power distribution and cooling systems to test out different configurations and then optimize them.
The platform supports a library of more than 14,000 parts provided by more than 750 vendors. According to Cadence, customers can use the models to build data centers and then run computational fluid dynamics and other physics-based simulations on the digital twin to predict its power, space, and cooling needs, as well as its cost. Cadence says the models behave like their physical counterparts, so engineers can take a closer look at what happens during different failure scenarios or when upgrading different components.
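The "pick parts, then total up the requirements" idea can be sketched in a few lines of Python. The part names, wattages, rack-unit sizes, and costs below are entirely made up for illustration; Cadence's actual library is far larger and its simulations are physics-based rather than simple sums.

```python
# Toy parts library: name -> (power_kw, rack_units, cost_usd). All values hypothetical.
PARTS_LIBRARY = {
    "gpu_compute_tray":  (13.5, 2, 400_000),
    "cpu_head_node":     (1.2,  1,  25_000),
    "nvlink_switch":     (2.5,  1,  80_000),
    "storage_shelf":     (0.8,  2,  30_000),
    "cdu_liquid_cooler": (3.0,  4,  60_000),
}

def summarize(config: dict[str, int]) -> dict[str, float]:
    """Total power, space, and cost for a chosen mix of parts."""
    power = space = cost = 0.0
    for name, count in config.items():
        p, u, c = PARTS_LIBRARY[name]
        power += p * count
        space += u * count
        cost += c * count
    return {"power_kw": power, "rack_units": space, "cost_usd": cost}

# Example configuration: one dense compute rack plus supporting gear.
print(summarize({
    "gpu_compute_tray": 8,
    "cpu_head_node": 2,
    "nvlink_switch": 9,
    "storage_shelf": 2,
    "cdu_liquid_cooler": 1,
}))
```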
Cadence started working with NVIDIA to connect its digital-twin platform to the chip giant's Omniverse technology in March. But it took a couple of months to develop a digital twin of NVIDIA's most advanced SuperPOD.
"Our platform is the only one that can model the entire AI factory — including liquid cooling and air cooling as well as the dynamic behavior of the IT systems — and simulate the whole thing as a massive system,” said Knoth.
Once the data center is up and running, the same digital twin can be used to monitor the physical system, using data from sensors and real-time monitoring systems to maintain optimal performance throughout its lifecycle.
“It's a simulation tool, so it can be used to project what is going to happen in the future, too, so you plan things out so that you can get the highest level of capacity utilization and the highest level of power usage effectiveness,” added Ikemoto.
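The two metrics Ikemoto cites are simple ratios. Power usage effectiveness (PUE) is total facility power divided by IT power, and capacity utilization here is taken as delivered IT load over provisioned IT capacity. The numbers below are hypothetical, purely to show the arithmetic.

```python
# Hypothetical facility figures; only the formulas matter here.
it_load_kw = 920.0            # assumed IT load (servers, switches, storage)
cooling_kw = 180.0            # assumed cooling plant draw
distribution_loss_kw = 45.0   # assumed UPS/PDU conversion losses
provisioned_it_kw = 1000.0    # assumed design capacity for IT equipment

total_facility_kw = it_load_kw + cooling_kw + distribution_loss_kw

pue = total_facility_kw / it_load_kw          # lower is better; 1.0 is the ideal
utilization = it_load_kw / provisioned_it_kw  # how much of the built-out capacity is used

print(f"PUE:                  {pue:.2f}")
print(f"Capacity utilization: {utilization:.0%}")
```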