Objective 2.3 – Troubleshoot virtual switch solutions

Understand the NIC Teaming failover types and related physical network settings / Determine and apply Failover settings

You can change NIC teaming by editing the vSS settings -> Teaming and Failover -> change your desired settings.

NIC teaming policies allow you to determine how network traffic is distributed between physical adapters and how it is rerouted in the event of an adapter failure. NIC teaming policies include load balancing and failover settings. Default NIC teaming policies are set for the entire standard switch; however, you can override these defaults at the port group level.

Load Balancing Policies

Route Based on Originating Virtual Port: The virtual switch selects uplinks based on the virtual machine port IDs on the vSphere Standard Switch or vSphere Distributed Switch.

Each virtual machine running on an ESXi host has an associated virtual port ID on the virtual switch. To calculate an uplink for a virtual machine, the virtual switch uses the virtual machine port ID and the number of uplinks in the NIC team. After the virtual switch selects an uplink for a virtual machine, it always forwards traffic through the same uplink for this virtual machine as long as the machine runs on the same port. The virtual switch calculates uplinks for virtual machines only once, unless uplinks are added or removed from the NIC team.

The port ID of a virtual machine is fixed while the virtual machine runs on the same host. If you migrate, power off, or delete the virtual machine, its port ID on the virtual switch becomes free. The virtual switch stops sending traffic to this port, which reduces the overall traffic for its associated uplink. If a virtual machine is powered on or migrated, it might appear on a different port and use the uplink that is associated with the new port.
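
As a rough illustration (VMware does not document the exact function, so the modulo here is an assumption about how the selection behaves): a virtual machine attached to virtual port 26 on a team of two uplinks would map to uplink 26 mod 2 = 0, i.e. the first uplink in the team, and would keep using it until the VM changes port or uplinks are added to or removed from the team.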

Route Based on IP Hash: The virtual switch selects uplinks for virtual machines based on the source and destination IP address of each packet.

To calculate an uplink for a virtual machine, the virtual switch takes the last octet of both the source and destination IP addresses in the packet, puts them through an XOR operation, and then runs the result through another calculation based on the number of uplinks in the NIC team. The result is a number between 0 and the number of uplinks in the team minus one. For example, if a NIC team has four uplinks, the result is a number between 0 and 3, and each number is associated with a NIC in the team. For non-IP packets, the virtual switch takes two 32-bit binary values from the position in the frame or packet where the IP addresses would be located.
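
A worked example of the documented calculation (the addresses are made up, and the final modulo is an assumption about what VMware calls "another calculation based on the number of uplinks"): for a packet from 192.168.2.53 to 192.168.21.77 on a four-uplink team, the last octets are 53 and 77; 53 XOR 77 = 120, and 120 mod 4 = 0, so the packet is sent via the first uplink in the team.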

Any virtual machine can use any uplink in the NIC team depending on the source and destination IP address. In this way, each virtual machine can use the bandwidth of any uplink in the team. If a virtual machine runs in an environment with a large number of independent virtual machines, the IP hash algorithm can provide an even spread of the traffic between the NICs in the team. When a virtual machine communicates with multiple destination IP addresses, the virtual switch can generate a different hash for each destination IP. In this way, packets can use different uplinks on the virtual switch that results in higher potential throughput.

However, if your environment has a small number of IP addresses, the virtual switch might consistently pass the traffic through one uplink in the team. For example, if you have a database server that is accessed by one application server, the virtual switch always calculates the same uplink, because only one source-destination pair exists.

Route Based on Source MAC Hash: The virtual switch selects an uplink for a virtual machine based on the virtual machine MAC address. To calculate an uplink for a virtual machine, the virtual switch uses the virtual machine MAC address and the number of uplinks in the NIC team.

Use Explicit Failover Order: No actual load balancing is available with this policy. The virtual switch always uses the uplink that stands first in the list of Active adapters from the failover order and that passes failover detection criteria. If no uplinks in the Active list are available, the virtual switch uses the uplinks from the Standby list.
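
When configuring the explicit failover order from the CLI, the active and standby lists can be set with the -a/--active-uplinks and -s/--standby-uplinks options of the failover policy command shown in the examples further down. A sketch, assuming uplinks vmnic0 and vmnic1:

# esxcli network vswitch standard policy failover set -v vSwitch2 -l explicit -a vmnic0 -s vmnic1

where -a = comma-separated list of active uplinks and -s = comma-separated list of standby uplinks.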

The above descriptions are lifted from the VMware documentation.

Network failure detection: determines how the host detects the link state of a physical adapter. If the physical switch fails, or someone unplugs the cable from the NIC or from the physical switch, the failure is detected and a failover is initiated.

  • Link status only: Relies solely on the link status that the network adapter provides. This option detects failures such as cable pulls and physical switch power failures, but not configuration errors (for example, a physical switch port that is blocked by spanning tree or misconfigured to the wrong VLAN), nor cable pulls on the far side of a physical switch. See the command after this list for a quick way to check the link state from the host.
  • Beacon probing: Sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many of the failures mentioned above that are not detected by link status alone.
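
To see the link state that the host currently reports for each physical adapter, you can run:

# esxcli network nic list

which lists every vmnic together with its link status, speed, duplex, and driver.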

Notify Switch: If you select Yes, whenever a virtual NIC is connected to a vSS, or whenever that virtual NIC’s traffic would be routed over a different physical NIC in the team because of a failover event, a notification is sent out over the network to update the lookup tables on the physical switches. In almost all cases this is desirable, as it gives the lowest latency for failover occurrences and migrations with vMotion. The notable exception is when the virtual machines on the port group use Microsoft Network Load Balancing in unicast mode, in which case this option should be set to No.

Failback: This option determines how a physical adapter is returned to active duty after recovering from a failure. With Failback set to Yes, the recovered adapter immediately resumes its active role, displacing the standby adapter that replaced it; set to No, the recovered adapter stays inactive until another currently active adapter fails.

These configurations can all be made from the CLI:

Example 1: Change the load balancing policy of vSwitch2 to IP hash

# esxcli network vswitch standard policy failover set -v vSwitch2 -l iphash

where -l = load balancing policy (portid, iphash, mac, explicit) and -v = vSwitch name.

Example 2: Change the network failover detection method of vSwitch2 to beacon probing

# esxcli network vswitch standard policy failover set -v vSwitch2 -f beacon

where -f = failure detection method (link, beacon).

Example 3: Set the Notify Switch option of vSwitch2 to No

# esxcli network vswitch standard policy failover set -v vSwitch2 -n false

where -n = notify switches (true, false).

Example 4: Set the failback of vSwitch2 to No

# esxcli network vswitch standard policy failover set -v vSwitch2 -b false

where -b = failback (true, false).
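
To verify the resulting configuration, the current teaming and failover policy of a vSwitch can be displayed with:

# esxcli network vswitch standard policy failover get -v vSwitch2

The output (fields may vary slightly between ESXi versions) shows the load balancing policy, the network failure detection method, the Notify Switches and Failback values, and the active, standby, and unused adapter lists.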

Configure explicit failover to conform with VMware best practices / Configure port groups to properly isolate network traffic

vSphere Networking, Chapter 7 “Networking Best Practices”, page 75

Summary: The vSphere Networking Guide contains a small section on Networking Best Practices.

Concerning this objective, the idea is to separate network services from one another, provide adequate bandwidth, and provide failover in case of a failure.

The Management Network port group uses vmnic0 as an active uplink and vmnic1 as a standby adapter. The second port group, vMotion, is configured exactly the other way around (a CLI equivalent is sketched after the two configurations below).

Management Network
VLAN 2
Management Traffic is Enabled
vmk0: 192.168.2.53
vmnic0 Active / vmnic1 Standby
Load balancing: Use explicit failover order
Failback: No

vMotion
VLAN 21
vMotion is Enabled
vmk1: 192.168.21.53
vmnic1 Active / vmnic0 Standby
Load balancing: Use explicit failover order
Failback: No
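
These port group level overrides can also be applied from the CLI. A sketch matching the vMotion port group above (assuming the port group is named vMotion):

# esxcli network vswitch standard portgroup policy failover set -p vMotion -l explicit -a vmnic1 -s vmnic0 -b false

where -p = port group name; the remaining options are the same as for the vSwitch-level failover command shown earlier.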

Given a set of network requirements, identify the appropriate distributed switch technology to use

Along with the dvSwitch, you can also make use of third-party virtual switches (which sit on top of the dvSwitch but allow additional functionality). Currently the only “supported”/available vendor-supplied virtual switch is the Cisco Nexus 1000v. The intention of this post isn’t to go into the installation or configuration of the Nexus 1000v, but to mention some of the similarities and differences when compared to the built-in dvSwitch.

  • Both the vDS and the Cisco Nexus 1000v require Enterprise Plus Licensing
  • The Nexus 1000v requires additional licensing from Cisco.
  • vDS is managed using the vSphere client, whereas the Cisco Nexus 1000v is managed like a physical Cisco switch (Cisco IOS and network management tools).
  • The Nexus 1000v uses a Virtual Supervisor Module (VSM) and a Virtual Ethernet Module (VEM). The supervisor modules are installed as virtual appliances, while the Ethernet module runs on each ESXi host.

The main advantage to using the Nexus 1000v is that the virtual switches can be managed as if they are part of the physical switch infrastructure, with the same management and monitoring tools. This allows an organisation’s networking team to take ownership of the networking infrastructure of the virtual machines, and allows for features present in Cisco IOS that aren’t available on VMware’s dvSwitch.

Configure and administer vSphere Network I/O Control

Network I/O Control (NIOC) is a feature of a dvSwitch that allows the use of network resource pools to determine the bandwidth that different network traffic types are granted. When enabled, NIOC divides traffic into the following resource pools:

  • Fault Tolerance traffic
  • iSCSI traffic
  • vMotion traffic
  • Management traffic
  • vSphere Replication traffic
  • NFS traffic
  • Virtual Machine traffic

These can be seen under the Resource Allocation tab of the dvSwitch:

[Screenshot: the default network resource pools under the dvSwitch Resource Allocation tab]

In addition to the default network resource pools, you can create user-defined network resource pools and set a QoS (Quality of Service) priority tag (an 802.1p tag) so that the physical network switches understand the priority given to the various traffic types.

NIOC can only be configured on a vDS, and therefore requires Enterprise Plus licensing.

Enabling Network I/O Control

The first step to configuring NIOC is to enable the feature on the dvSwitch. To do so, browse to the Resource Allocation tab for the dvSwitch, then click properties:

[Screenshot: the dvSwitch Resource Allocation tab with the Properties link]

In the dialog box, tick the box to enable NIOC:

[Screenshot: the properties dialog with the ‘Enable Network I/O Control’ box ticked]

You should now see that NIOC has been enabled:

[Screenshot: the Resource Allocation tab showing Network I/O Control as Enabled]

Creating User-Defined Network Resource Pools

To create a new Network Resource Pool, click on ‘New Network Resource Pool’:

[Screenshot: the New Network Resource Pool dialog]

If necessary you can also edit the existing resource pools:

[Screenshot: editing an existing network resource pool]

Assigning Portgroups to Network Resource Pools

Once the resource pool has been created, the next step is to assign portgroups to the resource pool. To do so, click ‘Manage Port Groups’:

[Screenshot: the Manage Port Groups dialog, assigning network resource pools to port groups]

In the dialog box (shown above), you can assign network resource pools to port groups. The benefit of doing the assignments here is that you can select multiple port groups to assign a resource pool to. You can also select a network resource pool for a port group in the port group’s own settings.

Use command line tools to troubleshoot and identify configuration items from an existing vDS

Whilst distributed switches are mostly configured through vCenter, there are a number of commands that can be used via an ESXi host’s CLI to display and alter vDS configuration (although the alterations are limited to adding and removing uplinks).

To view information on all the vSwitches on a given host you can use:

# esxcfg-vswitch -l

[Screenshot: esxcfg-vswitch -l output showing the dvSwitch with uplinks vmnic8 and vmnic9]
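
On ESXi 5.x and later, the host’s view of the distributed switch can also be listed with esxcli:

# esxcli network vswitch dvs vmware list

which shows, among other things, the vDS name, VDS ID, MTU, and the uplinks in use on this host. For deeper digging there is also the unsupported net-dvs command, which dumps the host’s local vDS configuration database.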

The esxcfg-vswitch options relating directly to operations on a dvSwitch are:

  -P|--add-dvp-uplink=uplink  Add an uplink to a DVPort on a DVSwitch.
                              Must specify DVPort Id.
  -Q|--del-dvp-uplink=uplink  Delete an uplink from a DVPort on a DVSwitch.
                              Must specify DVPort Id.
  -V|--dvp=dvport             Specify a DVPort Id for the operation.

The output above shows that vmnic8 and vmnic9 are being used as uplinks on the dvSwitch. If we wanted to remove vmnic9 from the switch, we could run:

# esxcfg-vswitch -Q vmnic9 -V 5 dvSwitch

Listing the host’s vSwitches now shows that vmnic9 isn’t being used as an uplink on the dvSwitch:

# esxcfg-vswitch -l
DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks
dvSwitch         256         2           256               1500    vmnic8

To add vmnic9 back as an uplink, run:

# esxcfg-vswitch -P vmnic9 -V 5 dvSwitch
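
Listing the switches once more should confirm that vmnic9 appears in the Uplinks column of the dvSwitch again:

# esxcfg-vswitch -l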