Die-to-Die Interfaces for Data Centers: Bandwidth & Latency

Manuel Mota

Jun 02, 2021

High-performance computing (HPC) is a hot topic these days, and for good reason. Consider the can containing your favorite soda – countless hours of simulation and engineering work using HPC systems have gone into designing streamlined cans that minimize aluminum waste. Indeed, the benefits of HPC are far-reaching, from its use in mining cryptocurrencies to drug testing, genome sequencing, creating lighter planes, researching space, running artificial intelligence (AI) workloads, and modeling climate change.

Processing massive amounts of data is driving demands for larger and more complex chips at advanced process nodes. What’s more, hyperscalers are transforming the way that data centers are designed and how computing resources are organized. To support their business imperatives of delivering ready access to multimedia resources, fast e-commerce transactions, quick search engine results, and the like, these companies are innovating with new data center architectures—namely, chiplets or multi-die SoC architectures.

In this blog post, I’ll discuss how splitting SoCs into smaller dies for advanced packaging, and connecting them with die-to-die interfaces that provide high-bandwidth, low-latency, and low-power links, can benefit hyperscale data centers. I’ll also cover what’s needed to optimize these applications.


Die-to-Die Interfaces for Hyperscale Data Centers

Traditional data centers are built with CPUs or, in some cases, a mix of CPUs, GPUs, and specialized ASICs. For hyperscale data centers, which are designed to scale up quickly and massively to support HPC applications, these types of chips are not sufficient. The high volumes of data these data centers handle demand the processing power found in larger chips. Cerebras, for example, is noteworthy for developing the Wafer-Scale Engine (WSE), the biggest chip ever built. Designed for deep-learning systems, the Cerebras WSE provides the compute, memory, and communication bandwidth to support dramatically faster and more scalable AI research compared with traditional architectures.

However, going big with monolithic dies, and at advanced nodes no less, is an expensive endeavor and may not result in optimal yield. To alleviate the cost and yield issues that arise as chip sizes approach full reticle size, designers are choosing multi-die chip architectures. In a multi-die chip architecture, SoCs are partitioned into smaller modules, also called chiplets, that are assembled in advanced multi-chip packaging. Compared to a monolithic design, where all of the functionality is on a single piece of silicon, this disaggregated approach provides economic benefits from a yield standpoint. It also provides the product modularity and flexibility to mix and match functional blocks in separate chiplets to address different market segments.
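
The yield argument can be illustrated with a rough sketch. The code below uses the simple Poisson yield model (yield = exp(-area × defect density)), which is a common first-order approximation and not a model taken from this post. The defect density, total area, and die counts are assumed values chosen only for illustration, and the sketch ignores the area added by die-to-die interfaces as well as packaging and assembly yield.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)

def silicon_per_good_product(total_area_mm2: float, num_dies: int,
                             defects_per_mm2: float) -> float:
    """Wafer area consumed per good product, in mm^2.

    Assumes dies are tested before assembly, so only good dies are packaged,
    and that the total logic area is split evenly across the dies.
    """
    die_area = total_area_mm2 / num_dies
    return num_dies * die_area / poisson_yield(die_area, defects_per_mm2)

# Assumed, illustrative inputs (not from the article).
DEFECT_DENSITY = 0.001   # defects per mm^2, i.e. 0.1 per cm^2
TOTAL_AREA = 800.0       # mm^2, close to a full reticle

monolithic = silicon_per_good_product(TOTAL_AREA, 1, DEFECT_DENSITY)
four_chiplets = silicon_per_good_product(TOTAL_AREA, 4, DEFECT_DENSITY)

print(f"monolithic:    {monolithic:.0f} mm^2 of silicon per good product")
print(f"four chiplets: {four_chiplets:.0f} mm^2 of silicon per good product")
```

Under these assumptions the four-die split consumes noticeably less silicon per good product, because a single defect scraps a 200 mm^2 die rather than the full 800 mm^2 die. Real cost comparisons would also need to account for interface area, packaging cost, and test cost.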

Connected by die-to-die interfaces, the chiplets can be placed side by side, which is the prevalent and lower cost approach. Or, the blocks can be assembled in a 2.5D or 3D package to allow even greater density. High-bandwidth memory (HBM) designs, which consist of large 3D stacked DRAM integrated on the SoC, are one of the increasingly popular applications driving the move to 3DICs.

Choosing the Right Die-to-Die Interfaces

Choosing the right die-to-die interfaces is an important factor that influences chiplet performance. Die-to-die interfaces are functional blocks that provide the data interface between two silicon dies assembled in the same package. For power efficiency and high bandwidth, they take advantage of the characteristics of the very short channels that connect the dies. These interfaces typically consist of a PHY and a controller block that provide a seamless connection between the internal interconnect fabric on the two dies.

While interoperability of devices from different vendors is typically governed by standards, there are not yet any such industry standards in the die-to-die space. Synopsys, however, is involved in defining these standards and driving them to completion through organizations including the Open Compute Project.

When it comes to inter-die connectivity, what’s essential are interfaces that can support:

  • High power efficiency
  • Short-reach, low-loss channels without any significant discontinuities
  • Low latency
  • High bandwidth efficiency
  • Robust, error-free performance
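
Two of these requirements, power efficiency and bandwidth efficiency, are easy to put into numbers. The sketch below is a back-of-the-envelope calculation, not a description of any particular interface: link power is the data rate multiplied by the energy spent per bit, and effective bandwidth is the raw rate scaled by the share of transmitted bits that carry payload. The energy-per-bit figure and the framing overhead are hypothetical values.

```python
import math

def link_power_watts(bandwidth_gbps: float, energy_pj_per_bit: float) -> float:
    """Power drawn by a link moving bandwidth_gbps at energy_pj_per_bit."""
    bits_per_second = bandwidth_gbps * 1e9
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit

def effective_bandwidth_gbps(raw_gbps: float, payload_bits: int,
                             overhead_bits: int) -> float:
    """Payload bandwidth after framing or error-protection overhead."""
    return raw_gbps * payload_bits / (payload_bits + overhead_bits)

# Hypothetical example: a 2 Tbps die-to-die link at 1 pJ/bit.
power = link_power_watts(2000.0, 1.0)

# Hypothetical framing: 240 payload bits for every 16 overhead bits.
payload_rate = effective_bandwidth_gbps(100.0, 240, 16)

print(f"link power: {power:.2f} W")
print(f"effective bandwidth: {payload_rate:.2f} Gbps of 100 Gbps raw")
```

Because power scales linearly with both bandwidth and energy per bit, even small differences in pJ/bit become significant at the terabit-per-second rates that multi-die SoCs may require.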

For applications like HPC, hyperscale data centers, AI, and networking, there are four major use cases for die-to-die interfaces:

  • Scale SoC increases compute power and creates multiple SKUs for server and AI accelerators by connecting dies through virtual connections for tightly coupled performance across dies.
  • Split SoC fosters very large SoCs while also improving yield, lowering cost, and extending Moore’s law by dividing the large monolithic SoC into smaller dies that are assembled together.
  • Aggregate implements multiple disparate functions in different dies to take advantage of the optimal process node for each function. This approach also helps reduce power and improve form factor in applications such as FPGAs, automotive, and 5G base stations.
  • Disaggregate separates the central digital chip from the I/O chip for easy migration of the central chip to advanced processes. This approach keeps the I/O chips in more conservative nodes to lower the risks and cost of product evolution, support reuse, and improve time to market.

Comprehensive Die-to-Die IP Solution

Synopsys offers a complete die-to-die IP solution that provides the high bandwidth and low latency that SoCs for applications like HPC, AI, and networking demand. Having a complete solution provides an infrastructure that eliminates the need to write pieces of code or develop bridges. The IP solution consists of:

  • DesignWare® Die-to-Die Controller IP, which is integrated with the DesignWare USR/XSR PHY IP to provide the industry’s lowest latency for an end-to-end die-to-die link, featuring error recovery mechanisms for higher data integrity and link reliability. The controller IP supports the AMBA® CXS and AXI protocols for coherent and non-coherent data communication. It also integrates with Arm® Neoverse™ Coherent Mesh Network for enhanced performance of multi-chip, memory expansion, and accelerator solutions.
  • DesignWare Die-to-Die PHY IP, which includes two options: USR/XSR PHY IP, which uses high-speed SerDes PHY technology at up to 112 Gbps per lane for ultra- and extra-short reach links, and High-Bandwidth Interconnect (HBI) PHY IP, which delivers 8 Gbps per pin of die-to-die connectivity with low latency for high-density 2.5D packaged SoCs.
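
To relate the two per-lane and per-pin rates quoted above, the following sketch computes raw aggregate bandwidth for a serial and a parallel link. Only the 112 Gbps and 8 Gbps figures come from the text; the lane count is an assumption, and the calculation counts data lanes or pins only, ignoring clock and control signals as well as any encoding overhead. It is not a sizing guide for the actual IP.

```python
import math

SERDES_GBPS_PER_LANE = 112.0  # USR/XSR rate quoted in the text
HBI_GBPS_PER_PIN = 8.0        # HBI rate quoted in the text

def raw_bandwidth_gbps(count: int, rate_gbps: float) -> float:
    """Raw aggregate bandwidth of `count` lanes or pins at `rate_gbps` each."""
    return count * rate_gbps

def pins_needed(target_gbps: float, rate_gbps_per_pin: float) -> int:
    """Smallest number of data pins whose raw rate reaches target_gbps."""
    return math.ceil(target_gbps / rate_gbps_per_pin)

# Assumed configuration: 16 SerDes lanes in one direction.
serdes_total = raw_bandwidth_gbps(16, SERDES_GBPS_PER_LANE)
hbi_pins = pins_needed(serdes_total, HBI_GBPS_PER_PIN)

print(f"16 lanes at 112 Gbps: {serdes_total:.0f} Gbps raw")
print(f"data pins at 8 Gbps for the same raw rate: {hbi_pins}")
```

The comparison shows the basic trade-off between the two styles: a serial link reaches a given bandwidth with few, fast lanes, while a parallel link uses many slower pins, which is practical where the package offers dense die-to-die routing, as in the 2.5D packaging mentioned above.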

The Die-to-Die Controller and PHY IP are part of a multi-die solution from Synopsys that also includes DesignWare HBM IP for HBM requirements of HPC SoCs and the 3DIC Compiler unified platform for advanced multi-die system design and integration. This multi-die solution helps accelerate SoC designs that require advanced packaging.

The move from monolithic silicon chips to die-to-die architectures will only continue to grow given the rise of compute-intensive, workload-heavy HPC applications. High-bandwidth, low-latency IP that is developed and designed based on ongoing standards specifications is essential to ensuring that hyperscale data centers and the like can deliver on their promise of massive volumes of fast computations for applications as varied as video streaming, vaccine discovery, and storm tracking.
