Panmnesia, the Korean CXL specialist, has proposed a datacentre architecture which combines fast accelerator links (collectively named X-links, a category that includes UALink, NVLink and others) with the resource-sharing advantages of CXL. Panmnesia calls the approach CXL-over-Xlink.
www.electronicsweekly.com, Jul. 16, 2025 –
Panmnesia reports up to 5.3x faster AI training and a 6x reduction in inference latency compared with existing PCIe- and RDMA-based designs.
The architecture enables several enhancements for scalable AI datacentres:
1. Independent Scaling of Compute and Memory: GPUs and CPUs gain access to large, shared pools of external memory via the CXL fabric, eliminating the memory bottlenecks of traditional architectures, especially for memory-bound AI workloads. Instead of being limited by the fixed memory within each GPU, workloads can draw on terabytes or even petabytes of memory as needed.
2. Composable Infrastructure: Resources, whether compute, memory, or accelerators, can be dynamically allocated, pooled, and shared across disaggregated systems. This flexibility enables operators to adapt quickly to changing AI workload demands without costly overprovisioning or hardware upgrades (see the first sketch after this list).
3. Reduced Communication Overhead: By using accelerator-optimized links for carrying CXL traffic, Panmnesia’s architecture minimizes the “communication tax” that plagues GPU-centric clusters, reducing data movement between distant nodes and keeping memory access coherent and high-throughput. This leads to substantially lower latency (with CXL IP delivering sub-100ns latency) and increased effective bandwidth.
4. Hierarchical Memory Model: AI workloads benefit from a new memory hierarchy that combines local high-bandwidth memory (like HBM) with pooled CXL memory, allowing efficient training and inference of large models without constant swapping or bottlenecks (see the second sketch after this list).
5. Scalable, Low-Latency Switching Fabric: Panmnesia's CXL 3.1 switches support cascading and multi-level connectivity, so hundreds of devices across many servers can access memory pools and accelerators efficiently, avoiding single-switch bottlenecks and enabling true scale-out AI fabrics.
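To make the composability idea in item 2 concrete, the sketch below models a disaggregated pool from which compute and memory are carved out independently and returned for reuse. It is a purely illustrative Python sketch; the names (ResourcePool, compose_node, release) are hypothetical and do not correspond to any Panmnesia or CXL API.

```python
# Illustrative sketch only: composable allocation from a disaggregated pool.
# All names and capacities are hypothetical, not a real Panmnesia interface.
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A disaggregated pool of GPUs and CXL-attached memory (GiB)."""
    free_gpus: int
    free_mem_gib: int
    allocations: dict = field(default_factory=dict)

    def compose_node(self, name: str, gpus: int, mem_gib: int) -> None:
        """Carve out a logical node; compute and memory scale independently."""
        if gpus > self.free_gpus or mem_gib > self.free_mem_gib:
            raise RuntimeError(f"pool exhausted for {name}")
        self.free_gpus -= gpus
        self.free_mem_gib -= mem_gib
        self.allocations[name] = (gpus, mem_gib)

    def release(self, name: str) -> None:
        """Return a node's resources to the pool for other workloads."""
        gpus, mem_gib = self.allocations.pop(name)
        self.free_gpus += gpus
        self.free_mem_gib += mem_gib


pool = ResourcePool(free_gpus=64, free_mem_gib=32768)        # 32 TiB pooled memory
pool.compose_node("training-job", gpus=32, mem_gib=8192)     # GPU-heavy allocation
pool.compose_node("inference-job", gpus=4, mem_gib=16384)    # memory-heavy, few GPUs
pool.release("inference-job")                                # resources return to the pool
```

The point of the sketch is that neither allocation is constrained to the fixed GPU-to-memory ratio of a conventional server: each logical node draws only what its workload needs.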
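The hierarchical memory model in item 4 can be pictured as a simple placement policy: keep hot data in local HBM while it has room, and spill everything else to the much larger pooled CXL tier. The Python sketch below is a conceptual illustration under assumed capacities and names (HBM_CAPACITY_GIB, place_tensor); it does not reflect Panmnesia's actual allocator.

```python
# Conceptual sketch of a two-tier placement policy (local HBM vs. pooled CXL
# memory). Capacities and names are assumptions for illustration only.
HBM_CAPACITY_GIB = 80          # e.g. one GPU's local high-bandwidth memory
CXL_POOL_CAPACITY_GIB = 4096   # far larger, shared CXL memory pool

hbm_used = 0.0
cxl_used = 0.0


def place_tensor(name: str, size_gib: float, hot: bool) -> str:
    """Place hot tensors in HBM while it has room; everything else goes to the CXL pool."""
    global hbm_used, cxl_used
    if hot and hbm_used + size_gib <= HBM_CAPACITY_GIB:
        hbm_used += size_gib
        return f"{name}: HBM (local tier)"
    if cxl_used + size_gib <= CXL_POOL_CAPACITY_GIB:
        cxl_used += size_gib
        return f"{name}: CXL pool (shared tier)"
    raise MemoryError("both tiers exhausted")


print(place_tensor("activations", 40, hot=True))         # fits in HBM
print(place_tensor("kv_cache", 120, hot=True))           # too big for remaining HBM, spills to CXL
print(place_tensor("optimizer_state", 500, hot=False))   # cold data goes straight to CXL
```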