GPU Memory Cluster and Memory Fabric

You can use GPU memory clusters to group, monitor, and manage high-performance computing (HPC), GPU, or optimized instances together, and to run high-performance clusters with more flexibility. Each GPU memory cluster is built on a single GPU memory fabric, the infrastructure that enables communication between GPUs. You use GPU memory clusters in conjunction with, not instead of, compute clusters.

Important

You must be a Dedicated Capacity customer to use GPU memory clusters and GPU memory fabric. To switch your host capacity, contact Oracle Support by opening a Support Request (SR).

With GPU memory clusters, you can:
  • Create a memory cluster from a set of GPUs.

    For example, NVIDIA NVLink 72 (NVL72) fabrics support up to 18 Compute hosts each.

  • Combine many memory clusters into one large cluster that spans a large network. GPU memory clusters are designed to scale at the rack level and let you scale up, whereas compute clusters let you scale out.
    • GPU memory clusters facilitate host-to-host/GPU-to-GPU communication.
    • Compute clusters facilitate communication, via RoCE or InfiniBand, between hosts/GPUs on different GPU memory fabrics.
  • View all GPU memory clusters and see how they're connected.

    See ListComputeGpuMemoryClusters and Exploring Your GPU Memory Clusters and Memory Fabric.
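    Like other OCI list operations, ListComputeGpuMemoryClusters returns results one page at a time, with an `opc-next-page` token indicating whether more pages remain. The loop below is a minimal, stdlib-only sketch of that pagination pattern; `fetch_page`, its page tokens, and the cluster IDs are hypothetical stand-ins, not the real API surface.

    ```python
    # Sketch of paging through an OCI-style list operation such as
    # ListComputeGpuMemoryClusters. Each call returns at most one page of
    # results plus an opc-next-page token when more pages remain.
    # fetch_page and the data below are hypothetical stand-ins.

    PAGES = {
        None: (["gmc-ocid-1", "gmc-ocid-2"], "page-2"),   # first request: no token yet
        "page-2": (["gmc-ocid-3"], None),                 # last page: no next token
    }

    def fetch_page(page_token=None):
        """Hypothetical stand-in for one ListComputeGpuMemoryClusters call.

        Returns (items, opc_next_page); opc_next_page is None on the last page.
        """
        return PAGES[page_token]

    def list_all_memory_clusters():
        """Collect every item by following opc-next-page tokens until exhausted."""
        items, token = [], None
        while True:
            page_items, token = fetch_page(token)
            items.extend(page_items)
            if token is None:        # no further pages to fetch
                return items

    print(list_all_memory_clusters())  # cluster IDs from both pages
    ```

    In practice the OCI SDKs provide pagination helpers (for example, `oci.pagination.list_call_get_all_results` in the Python SDK) that follow these tokens for you.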

  • Track performance metrics for each memory cluster.
  • Add or remove GPUs as needed.

Supported Compute shapes are BM.GPU.GB200.4 and BM.GPU.GB300.4.