
Use GPUs with Clustered VMs through Discrete Device Assignment

In the rapidly evolving landscape of artificial intelligence (AI), the demand for more powerful and efficient computing resources is ever-increasing. Microsoft is at the forefront of this technological revolution, empowering customers to harness the full potential of their AI workloads with their GPUs. GPU virtualization makes it possible to process massive amounts of data quickly and efficiently. Using GPUs with clustered VMs through Discrete Device Assignment (DDA) is particularly significant in failover clusters, because it gives VMs direct access to physical GPUs.


Using GPUs with clustered VMs through DDA allows you to assign one or more entire physical GPUs to a single virtual machine (VM). Because the VM gets direct access to the physical GPU, latency is reduced and the GPU’s capabilities are fully utilized, which is crucial for compute-intensive tasks.
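
To make this concrete, here is a minimal PowerShell sketch of assigning a whole GPU to a VM with DDA on a single Hyper-V host. The VM name and PCI location path are placeholders, and depending on the GPU you may also need MMIO space configuration as described in the DDA documentation linked below.

# Minimal DDA sketch; the VM name and PCI location path below are placeholders.
$vmName       = "GpuVM01"
$locationPath = "PCIROOT(0)#PCI(0300)#PCI(0000)"   # find yours via Device Manager or Get-PnpDeviceProperty

# DDA requires the VM to turn off (rather than save state) when the host shuts down.
Set-VM -Name $vmName -AutomaticStopAction TurnOff

# Dismount the GPU from the host so it can be assigned to a VM.
# -Force is needed when the GPU vendor does not supply a partitioning driver.
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force

# Hand the entire physical GPU to the VM.
Add-VMAssignableDevice -LocationPath $locationPath -VMName $vmName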


Figure 1: This diagram shows users using GPUs with clustered VMs via DDA, where full physical GPUs are assigned to VMs.


Using GPUs with clustered VMs enables these high-compute workloads to be executed within a failover cluster. A failover cluster is a group of independent nodes that work together to increase the availability of clustered roles: if one or more of the cluster nodes fail, the other nodes begin to provide service, which is how failover clusters deliver high availability. By integrating GPUs with clustered VMs, these clusters can now support high-compute workloads on VMs. Failover clusters use GPU pools, which are managed by the cluster. An administrator names these GPU pools and declares each VM’s GPU needs, and a pool with the same name is created on each node. Once GPUs and VMs are added to the pools, the cluster manages VM placement and GPU assignment. Although live migration is not supported, in the event of a server failure, workloads can automatically restart on another node, minimizing downtime and ensuring continuity.
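
For orientation, the following rough PowerShell sketch mirrors that pooling workflow as described in the "Use GPUs with clustered VMs on Hyper-V" documentation linked below; the pool name, instance path, and VM name are placeholders, and the pool-related cmdlet parameters should be confirmed against that article for your build.

# Rough sketch of the clustered GPU pool workflow; names are placeholders and the
# pool-related parameters should be verified against the linked documentation.

# On EACH node: dismount the GPU from the host and add it to a pool with the same name.
$instancePath = "<GPU instance path>"
Dismount-VMHostAssignableDevice -InstancePath $instancePath -Force
Add-VMHostAssignableDevice -InstancePath $instancePath -ResourcePoolName "GpuPoolA"

# Declare the VM's GPU needs by assigning it a GPU from the named pool (VM must be off).
Add-VMAssignableDevice -VMName "GpuVM01" -ResourcePoolName "GpuPoolA"

# Make the VM highly available so the cluster can place it and restart it on another node.
Add-ClusterVirtualMachineRole -VMName "GpuVM01"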


Using GPUs with clustered VMs through DDA will be available in Windows Server 2025 Datacenter and was initially enabled in Azure Stack HCI 22H2.


To use GPUs with clustered VMs, you need a failover cluster that runs Windows Server 2025 Datacenter edition, with the cluster functional level at Windows Server 2025. Each node in the cluster must have the same setup and the same GPUs in order to enable GPU with clustered VMs for failover cluster functionality. DDA does not currently support live migration, and DDA is not supported by every GPU; to verify whether your GPU works with DDA, contact your GPU manufacturer. Ensure you adhere to the setup guidelines provided by the GPU manufacturer, which include installing the manufacturer-specific GPU drivers on each server of the cluster and obtaining manufacturer-specific GPU licensing where applicable.
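
As a quick sanity check before setup, commands along these lines can confirm the cluster functional level and enumerate the GPUs on a node; the instance ID is a placeholder.

# Confirm the cluster functional level (should report the Windows Server 2025 level).
Get-Cluster | Select-Object Name, ClusterFunctionalLevel

# List display-class PCI devices on this node; every node should show the same GPUs.
Get-PnpDevice -Class Display | Select-Object FriendlyName, Status, InstanceId

# Look up a GPU's PCI location path for later use with DDA (placeholder instance ID).
Get-PnpDeviceProperty -InstanceId "<GPU instance ID>" -KeyName "DEVPKEY_Device_LocationPaths"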


For more information on using GPUs with clustered VMs, please review our documentation below:

Use GPUs with clustered VMs on Hyper-V | Microsoft Learn

Deploy graphics devices by using Discrete Device Assignment | Microsoft Learn


Introducing GPU Innovations with Windows Server 2025

Afia Boakye and Rebecca Wambua


AI empowers businesses to innovate, streamline operations, and deliver exceptional value. With the upcoming Windows Server 2025 Datacenter and Azure Stack HCI 24H2 releases, Microsoft is empowering customers to lead their businesses through the AI revolution.


Here is what Hari Pulapaka, GM of Windows Server at Microsoft, says about how Windows Server empowers customers with AI: “Windows Server 2025 is well positioned to help our customers be part of the AI revolution with its advanced GPU capabilities, allowing our customers to do training, learning, or inferencing using powerful NVIDIA GPUs.”


GPUs are essential for AI due to their parallel processing capabilities and highly scalable architecture. Using the upcoming OS releases, Microsoft’s customers can provide an entire GPU to a VM, which can run either Linux or Windows Server, in a failover cluster using discrete device assignment (DDA). This means that mission-critical AI workloads can easily run in a clustered VM and, upon an unexpected fault or a planned move, the VM will restart on another node in the cluster, using a GPU on that node.


GPU Partitioning (GPU-P) is a powerful new capability we are adding with Windows Server 2025. GPU-P empowers customers to partition a supported GPU and assign those partitions to different VMs in a failover cluster. This means that multiple VMs can share a single physical GPU, giving each VM an isolated fraction of the physical GPU's capabilities.
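
For readers who prefer the command line, the equivalent Hyper-V cmdlets look roughly like the sketch below (WAC drives the same operations through its UI); the GPU device path, partition count, and VM name are placeholders.

# List GPUs that support partitioning and the partition counts they allow.
Get-VMHostPartitionableGpu | Select-Object Name, ValidPartitionCounts, PartitionCount

# Split a supported GPU into a chosen number of partitions (must be one of the valid counts).
Set-VMHostPartitionableGpu -Name "<GPU device path>" -PartitionCount 4

# Assign one partition of that GPU to a VM (the VM must be turned off), then confirm.
Add-VMGpuPartitionAdapter -VMName "GpuPVM01"
Get-VMGpuPartitionAdapter -VMName "GpuPVM01"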


Further, after a planned or unplanned move, the VMs restart on different nodes in the cluster, using GPU partitions on those nodes. Besides enabling clustered VMs to use GPU-P, the upcoming OS releases are bringing live migration to VMs using GPU-P. Live migration for GPU-P enables customers to balance mission-critical workloads across their fleet and to conduct hardware maintenance and software upgrades without stopping their VMs.
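
As a minimal sketch, live-migrating a clustered VM that uses a GPU partition is the familiar cluster move operation; the VM, node, and host names below are placeholders, and the target node is assumed to have a matching GPU and driver.

# Live-migrate a clustered VM that uses a GPU partition to another node in the cluster.
Move-ClusterVirtualMachineRole -Name "GpuPVM01" -Node "Node02" -MigrationType Live

# On a standalone (non-clustered) host, the equivalent operation is Move-VM.
Move-VM -Name "GpuPVM01" -DestinationHost "Host02"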


Windows Admin Center (WAC) empowers customers to configure, use, and manage VMs using virtualized GPUs. WAC enables administrators to manage GPU virtualization for both standalone and failover clusters from a single, centralized location, thereby reducing management complexity.


The screenshots below highlight GPU-P management in WAC, demonstrating how users can seamlessly view, configure, and assign GPU partitions to VMs.


In this first image, customers can view a comprehensive list of their partitioned GPUs.


Figure 1: The GPU partitions inventory page


Customers can partition eligible GPUs into their desired number of partitions.

Figure 2: The partition count configuration page


Finally, customers can assign GPU partitions to different VMs.

Figure 3: The GPU partition assignment tool


These high-value GPU innovations are a result of Microsoft's and NVIDIA's continual close collaboration.


Here is what Bob Pette, Vice President of Enterprise Platforms at NVIDIA, has to say: “GPU virtualization requires advanced security, maximum cost efficiency, and accurate horsepower. With GPU-P now available on NVIDIA GPUs in Windows Server Datacenter, customers can meet these requirements and run their key AI workloads to achieve next-level efficiencies.”


Windows Server 2025 is now available for customers to try out. Click here to download preview media and use these powerful new capabilities.
