Objective 3.3 – Troubleshoot vSphere clusters

Configure EVC using appropriate baseline

EVC (Enhanced vMotion Compatibility) increases vMotion compatibility by masking CPU features that aren't consistent across all hosts in the cluster. It's enabled at the cluster level and is disabled by default.

NOTE: EVC does NOT stop VMs from using faster CPU speeds or hardware virtualisation features that might be available on some hosts in the cluster.

NOTE: EVC is required for FT to work with DRS.

Requirements

  • All hosts in the cluster must have CPUs from the same vendor (Intel or AMD)
  • All hosts must have vMotion enabled (if not, who cares about CPU compatibility?)
  • Hardware virtualisation must be enabled in the BIOS (if the CPU supports it), because EVC checks that the processor exposes the features it expects to be present for that model of CPU.

NOTE: This includes having the ‘No Execute’ (NX/XD) bit enabled.

Configuring a new cluster for EVC

  1. Determine which baseline to use based on the CPUs in your hosts (chapter 18 of the Datacenter Administration Guide has a compatibility table if you don’t know; the more in-depth VMware KB 1003212 won’t be available during the exam).
  2. Configure EVC on the cluster before adding any hosts (see the PowerCLI sketch after this list).
  3. If a host has CPU features newer than your EVC baseline, power off all VMs on that host.
  4. Add the hosts to the cluster.
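A minimal PowerCLI sketch of steps 1 and 2, assuming a cluster called ‘Cluster01’ in a datacenter called ‘DC01’ (both are example names) and a recent PowerCLI release that exposes the MaxEVCMode property and the -EVCMode parameter:

  # Check each host's CPU type and the highest EVC baseline it supports
  Get-VMHost | Select-Object Name, ProcessorType, MaxEVCMode

  # Create the cluster with the EVC baseline set before any hosts are added
  # ('intel-westmere' is just an example baseline key)
  New-Cluster -Name 'Cluster01' -Location (Get-Datacenter -Name 'DC01') -EVCMode 'intel-westmere'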

Changing the EVC level on an existing cluster

  • You can raise EVC to a higher baseline with no impact on running VMs, but VMs will not benefit from the newly exposed CPU features until each VM has been power cycled (a guest OS reboot isn’t sufficient).
  • When downgrading the EVC baseline you need to power off (or vMotion out of the cluster) all running VMs (see the sketch after this list).
  • An alternative approach is to create a new cluster, enable the correct EVC mode, and then move the hosts from the old cluster to the new one, one at a time.
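A hedged PowerCLI sketch for this, again assuming an example cluster name and a PowerCLI version that supports -EVCMode on Set-Cluster:

  # Running VMs that must be powered off or vMotioned out of the cluster
  # before the EVC baseline can be lowered
  Get-Cluster -Name 'Cluster01' | Get-VM |
      Where-Object { $_.PowerState -eq 'PoweredOn' } |
      Select-Object Name, VMHost

  # Raising the baseline is non-disruptive; lowering it is only allowed once
  # no running VMs remain in the cluster ('intel-nehalem' is an example key)
  Set-Cluster -Cluster 'Cluster01' -EVCMode 'intel-nehalem' -Confirm:$false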

Create and manage DRS and DPM alarms

vCenter allows you to create alarms to monitor DRS-related events. On the Alarms tab for a host or cluster, under ‘Definitions’, right-click and select ‘New Alarm’. In the dialog box for the new alarm, choose ‘cluster’ as the alarm type:

drs01

You can then create the required triggers to monitor the DRS cluster:

drs02
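As an aside, the alarm definitions already present on the cluster object can be reviewed from PowerCLI (a sketch; the cluster name is an example):

  # Alarm definitions defined directly on the cluster object
  Get-AlarmDefinition -Entity (Get-Cluster -Name 'Cluster01') |
      Select-Object Name, Enabled, Description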

Alarms can be created for DPM by monitoring host events. When setting up the alarm, select ‘Hosts’ rather than ‘Cluster’ as the alarm type and ensure that ‘Monitor for specific events’ is selected. Then, under the ‘Triggers’ tab, you can select the relevant DPM-related events:

drs03

These include (the names are a little misleading, but DPM is part of DRS):

  • DRS cannot exit the host out of standby mode
  • DRS entered standby mode
  • DRS entering standby mode
  • DRS exited standby mode
  • DRS exiting standby mode
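PowerCLI has no cmdlet for creating alarm definitions (that requires the AlarmManager API via Get-View), but it can list existing standby-related alarms and attach actions to them. A hedged sketch; the alarm name and email address are examples and may not match your vCenter’s built-in alarms:

  # Find alarm definitions related to standby/DPM events
  Get-AlarmDefinition | Where-Object { $_.Name -match 'standby' } |
      Select-Object Name, Enabled

  # Attach an email action to one of them
  Get-AlarmDefinition -Name 'Exit standby error' |
      New-AlarmAction -Email -To 'vi-admins@example.com' -Subject 'DPM could not bring a host out of standby'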

Properly size virtual machines and clusters for optimal DRS efficiency

When designing and configuring a DRS cluster it is important to size the virtual machines within it correctly, and likewise to size the cluster for the virtual machine workloads you intend to run. Virtual machines with smaller memory allocations and fewer virtual CPUs give DRS more options for migrating them to improve cluster performance, so give virtual machines only the memory and vCPU resources they actually need. Virtual machines with over-allocated resources make DRS less efficient than it would otherwise be, and this can lead to a cluster becoming overcommitted or invalid.
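One rough way to spot oversized VMs from PowerCLI (the cluster name and the thresholds are arbitrary examples; older PowerCLI versions expose MemoryMB rather than MemoryGB):

  # VMs with large CPU/memory allocations give DRS fewer placement options
  Get-Cluster -Name 'Cluster01' | Get-VM |
      Where-Object { $_.NumCpu -ge 4 -or $_.MemoryGB -ge 16 } |
      Sort-Object NumCpu, MemoryGB -Descending |
      Select-Object Name, NumCpu, MemoryGB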

vCenter will alert you to these conditions by presenting a yellow warning when the cluster is overcommitted and a red warning when the cluster is invalid (a quick PowerCLI check follows the list below). There are a number of possible causes for a cluster becoming overcommitted or invalid, including:

  • In the event of a host failure, a cluster can become overcommitted because of the reduced physical resources available.
  • A cluster will become invalid in vCenter if a virtual machine is powered on outside the control of vCenter (e.g. by connecting the vSphere client or PowerCLI directly to a host, bypassing vCenter while it is unavailable).
  • Likewise, if changes are made to a virtual machine outside of vCenter, the cluster may be invalid once the host regains connectivity to vCenter.
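A rough way to check for these conditions from PowerCLI (the cluster name is an example; the status values are a general health indicator rather than an exact mapping to overcommitted/invalid):

  # OverallStatus turns yellow/red when vCenter flags a problem with the cluster,
  # and ConfigIssue lists the specific issues it has detected
  $cluster = Get-Cluster -Name 'Cluster01'
  $cluster.ExtensionData.OverallStatus
  $cluster.ExtensionData.ConfigIssue | Select-Object FullFormattedMessage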

It’s important to set the resource settings for virtual machines carefully. This includes reservations, limits and shares; setting reservations too high can leave too few spare resources in the cluster, limiting the effectiveness of DRS.
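Reservations, limits and shares can be reviewed in bulk with Get-VMResourceConfiguration (a sketch; the cluster name is an example):

  # Review per-VM reservations, limits and shares across the cluster
  Get-Cluster -Name 'Cluster01' | Get-VM | Get-VMResourceConfiguration |
      Select-Object VM, CpuReservationMhz, CpuLimitMhz, MemReservationMB, MemLimitMB,
                    CpuSharesLevel, MemSharesLevel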

Along with over-allocated virtual machines, affinity rules can also reduce DRS efficiency, because they constrain the placement choices available to DRS.

Properly apply virtual machine automation levels based upon application requirements

When you create a DRS enabled cluster you set the cluster’s default automation level, shown below:

drs04

The DRS automation level set here can be overridden on a per-virtual-machine basis. This is configured under ‘Virtual Machine Options’ within the cluster’s DRS settings. There are a number of options available here:

  • Default
  • Disabled
  • Fully Automated (automatic initial placement and migrations)
  • Partially Automated (automatic initial placement only)
  • Manual

Along with configuring these settings using the vSphere client, you can also set per-virtual-machine DRS automation levels using PowerCLI, which can be useful when making changes in bulk. As an example, to change the automation level for a virtual machine called ‘Temp’ you could run:

drs05
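A command along these lines would do it (sketched here rather than copied from the screenshot above; parameter support may vary by PowerCLI version):

  # Override the cluster default and disable DRS for this one VM
  Get-VM -Name 'Temp' | Set-VM -DrsAutomationLevel Disabled -Confirm:$false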

Looking at the settings in the vSphere client, Temp has been changed, with its automation level now set to ‘Disabled’:

drs06