13 June 2024, Reverse Engineering

Hyper-V live migration network selection in Windows Server 2025

Microsoft continues to bring innovation and improvements to our Hyper-V platform. Live migration has been around for a while and is a key component of managing virtual machines (VMs). With Windows Server 2025 you will see improvements that make Hyper-V more reliable, increase scale, and improve performance. This article covers an improvement to live migration, and you can expect to see more articles soon covering other innovations in Windows Server 2025.


NEW! Live migration network selection for Windows Server 2025

The live migration network selection logic in failover clusters has been improved for Windows Server 2025 to accommodate both directly connected cluster interconnects and multi-site clusters that do not use a stretched subnet (common cluster network).


Directly connected cluster interconnects

In most failover clusters, network traffic between nodes either flows through switches (see diagram 1 below) or uses direct connections between each pair of nodes (see diagram 2 below).

The most common reason to use a direct connection topology is Storage Spaces Direct (S2D). S2D requires high-bandwidth, low-latency, and reliable network interconnects between each node, and enabling RDMA is recommended. These requirements can be satisfied with either the switched or the switchless topology. Switched allows for easier scale-out and fewer network interfaces per node. Switchless removes the cost of one or more high-bandwidth switches and the complexity of configuring a switch for RDMA. Reliability can be better with a switchless configuration, since it removes the potential for network interruptions due to switch resets, switch maintenance, and misconfiguration. Both topologies are valid, fully supported, and have their own advantages.



Diagram 1: Switched interconnect topology


Diagram 2: Direct Connected Topology
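The trade-off in interfaces per node can be made concrete with a little arithmetic. The sketch below assumes a full mesh in which every pair of nodes is joined by exactly one direct link; real deployments often use two links per pair for redundancy, which doubles both numbers. The function names are illustrative only.

```python
def direct_links(nodes: int) -> int:
    """Links needed for a full mesh: one per unordered pair of nodes."""
    return nodes * (nodes - 1) // 2


def interconnect_ports_per_node(nodes: int) -> int:
    """Each node needs one port for every other node in the cluster."""
    return nodes - 1


# Assumption: one direct link per pair of nodes (no redundant links).
for n in (2, 3, 4):
    print(f"{n} nodes: {direct_links(n)} links, "
          f"{interconnect_ports_per_node(n)} interconnect ports per node")
```

The port count per node grows with every node added, which is why the switched topology scales out more easily.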


Optimizing live migration in directly connected clusters

Live migration moves a VM between servers and, in the case of a failover cluster, between nodes of the same cluster. It's a critical component of the system, allowing the VM to stay running during host maintenance or while the cluster is load-balanced.

The state of the VM is moved from the source node to the destination node of the cluster through a network. Since most clusters have multiple networks, there is logic implemented to allow identifying and selecting preferred and possible live migration networks. In the switched topology most, if not all, networks are capable of connecting between the nodes. In the switchless topology, most networks only allow connection between pairs of nodes.
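The difference between the two topologies can be illustrated with a small model. This is only a sketch: `ClusterNetwork`, `networks_between`, and the network and node names are hypothetical and do not correspond to any cluster API. Each network is described by the set of nodes that have an interface on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterNetwork:
    name: str
    members: frozenset  # nodes that have an interface on this network


def networks_between(networks, source, destination):
    """Return the names of networks that connect both nodes, in list order."""
    return [n.name for n in networks
            if source in n.members and destination in n.members]


all_nodes = frozenset({"N1", "N2", "N3"})

# Switched: every network reaches every node.
switched = [
    ClusterNetwork("Mgmt", all_nodes),
    ClusterNetwork("Storage", all_nodes),
]

# Switchless: the storage links each join exactly one pair of nodes.
switchless = [
    ClusterNetwork("Mgmt", all_nodes),
    ClusterNetwork("Link-N1-N2", frozenset({"N1", "N2"})),
    ClusterNetwork("Link-N1-N3", frozenset({"N1", "N3"})),
    ClusterNetwork("Link-N2-N3", frozenset({"N2", "N3"})),
]

print(networks_between(switched, "N1", "N3"))    # ['Mgmt', 'Storage']
print(networks_between(switchless, "N1", "N3"))  # ['Mgmt', 'Link-N1-N3']
```

In the switchless model, two of the three direct links are unusable for any given source and destination pair, so the selection logic has to know which one applies.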

Windows Server 2025 has improved logic to more quickly identify the optimal network for a live migration between a specific source node and destination node. It gets from the cluster the list of networks that can carry traffic between the source and destination, then uses the most preferred of them. Only interfaces on cluster networks that are enabled for live migration are considered. In previous versions, the logic could take more time: the first preferred network was tried, with a wait of approximately 20 seconds for the connection to succeed. If it did not succeed, the next network was tried, and so on until one succeeded. As a result, live migration initiation in Windows Server 2025 is faster and more consistent.
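The two behaviors described above can be compared with a simplified model. This is not the Hyper-V implementation: the function names, the network names, and the idea of counting waited seconds are assumptions made for illustration, and the only figure taken from the article is the approximate 20-second wait per failed attempt.

```python
CONNECT_TIMEOUT_SECONDS = 20  # approximate wait per failed attempt


def sequential_attempts(preferred, reachable):
    """Earlier behavior as described: try preferred networks in order,
    waiting out the timeout on each one that cannot connect."""
    waited = 0
    for name in preferred:
        if name in reachable:
            return name, waited
        waited += CONNECT_TIMEOUT_SECONDS
    return None, waited


def filtered_selection(preferred, reachable, enabled_for_live_migration):
    """Windows Server 2025 behavior as described: start from the networks
    known to connect the two nodes, then take the most preferred one that
    is enabled for live migration. Modeled here as involving no waiting."""
    for name in preferred:
        if name in reachable and name in enabled_for_live_migration:
            return name, 0
    return None, 0


# Hypothetical three-node switchless cluster, migrating from N1 to N3.
preferred = ["Link-N1-N2", "Link-N2-N3", "Link-N1-N3", "Mgmt"]
reachable = {"Link-N1-N3", "Mgmt"}  # networks that join N1 and N3
enabled = {"Link-N1-N2", "Link-N2-N3", "Link-N1-N3"}

print(sequential_attempts(preferred, reachable))          # ('Link-N1-N3', 40)
print(filtered_selection(preferred, reachable, enabled))  # ('Link-N1-N3', 0)
```

Both approaches end up on the same network in this example; the difference is the time spent on attempts that could never have succeeded.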


Optimizing live migration in multi-site clusters

Multi-site clusters (also known as stretched clusters) are commonly deployed for disaster recovery scenarios. VMs can run at either site. If a site goes down, VMs are automatically recovered (restarted) at the other site. While common host maintenance activities like patch/update involve live migration of VMs, it is usually to other nodes in the same site. Live migration of VMs between nodes in different sites is usually used for load balancing or maintenance of systems involving the entire site.

Windows Server 2025 improves the logic that identifies which NICs on the source node of a live migration have a routed path to the destination node. In the examples above (diagrams 1 and 2), there are one or more NICs on the same subnet (cluster network) between every possible pair of nodes. With the multi-site cluster configuration (diagram 3 below), it's typical that no subnet is common to nodes in different sites. Previously, routed paths between nodes were not discovered, which could cause issues for live migration. Windows Server 2025 now accommodates this configuration: when the cluster provides the list of networks through which the source and destination can connect, it includes routed paths.


Diagram 3: Multi-site cluster showing a routed network path between source and destination servers in different sites
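The "no common subnet" situation in a multi-site cluster can be checked with Python's standard `ipaddress` module. The addresses and the `shared_subnets` helper below are invented for illustration; the sketch only shows why a same-subnet test finds nothing between sites, so a routed path is the only option.

```python
import ipaddress


def shared_subnets(source_addresses, destination_addresses):
    """Return the subnets on which both nodes have an interface."""
    source = {ipaddress.ip_interface(a).network for a in source_addresses}
    destination = {ipaddress.ip_interface(a).network
                   for a in destination_addresses}
    return source & destination


# Hypothetical addressing: site A uses 10.1.x.x, site B uses 10.2.x.x.
site_a_node = ["10.1.0.11/24", "10.1.1.11/24"]
site_a_peer = ["10.1.0.12/24", "10.1.1.12/24"]
site_b_node = ["10.2.0.21/24", "10.2.1.21/24"]

# Same site: both subnets are shared, so a direct cluster network exists.
print(sorted(str(n) for n in shared_subnets(site_a_node, site_a_peer)))

# Different sites: nothing in common, so traffic must be routed.
print(shared_subnets(site_a_node, site_b_node))  # set()
```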


Summary

Hyper-V is a core technology that continues to bring innovation to our on-premises server platforms by bringing new features and functionality that enhance reliability, improve performance, and light up new value. These live migration optimizations are part of the ongoing platform improvement and accrue to both Windows Server 2025 and Azure Stack HCI 24H2.


