HA works differently on a VSAN cluster than on a non-VSAN cluster.
- When HA is turned on in the cluster, FDM agent (HA) traffic uses the VSAN network and not the Management Network. However, when a potential isolation is detected, HA pings the default gateway (or the specified isolation address) using the Management Network.
- When enabling VSAN, ensure vSphere HA is disabled; you cannot enable VSAN while HA is already configured. Either configure VSAN during creation of the cluster or temporarily disable vSphere HA while configuring VSAN.
- When only VSAN datastores are available within a cluster, Datastore Heartbeating is disabled. HA will never use a VSAN datastore for heartbeating: since the VSAN network is already used for network heartbeating, using the same datastores for heartbeating would add nothing.
- When changes are made to the VSAN network, vSphere HA must be re-configured.
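Because the isolation check runs over the Management Network by default, a common adjustment in VSAN clusters is to point it at an address reachable on the VSAN network instead. A minimal sketch using the vSphere HA advanced options (the IP address is a placeholder for an address pingable over the VSAN network):

```
das.usedefaultisolationaddress = false
das.isolationaddress0 = 172.16.10.1
```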
ESXi Isolation – VM with no underlying storage
A four-node cluster with a single VM (with a single VMDK of less than 255GB), deployed with a Storage Policy of “Failures to Tolerate = 1” and “Disks to Stripe = 1”, would place two mirrored copies of the VMDK on two hosts plus a witness component on a third host.
vCenter Server with Embedded PSC and Database
The embedded PSC is meant for standalone sites where vCenter Server will be the only SSO integrated solution. In this case replication to another PSC should not be required and is not possible.
- The embedded PSC is a single point of failure.
- Supports Windows and VCSA (vCenter Server Appliance) based deployments
- Replication between PSCs not required
- Multiple standalone instances supported
- Sufficient for small scale deployments
- Not suitable for use with other VMware products (vRA, NSX etc.)
- Easy to deploy and maintain
Recently I added an ESXi host to a VSAN cluster (an HP DL380 G9 with 2 x 800GB SSDs and 6 x 4TB magnetic disks); however, when creating the disk group only 2 disks were available for use.
After some investigation it turned out that the other disks already had partitions. This was confirmed by running the following command, which failed with an error:

partedUtil getptbl /vmfs/devices/disks/naa.5000xxxxxxxxx
Error: Can’t have a partition outside the disk!
To resolve this (and allow the disks to be added to a VSAN disk group) I ran the following:
partedUtil setptbl /vmfs/devices/disks/naa.5000xxxxxxxxxxx msdos
Then reboot the host.
***This will destroy any data already on the disks***
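The same cleanup can be applied across several disks at once. A minimal sketch (the naa IDs below are placeholders, not the ones from this host, and the loop only echoes the commands as a dry run; remove the echoes to execute for real on the ESXi shell):

```shell
# Dry-run sketch: wipe stale partition tables on disks destined for a VSAN
# disk group. The naa IDs are placeholders; substitute the devices reported
# on your own host. WARNING: actually running setptbl destroys any data on
# the disk, so this loop only prints the commands it would run.
DEVICES="naa.5000aaaaaaaaaaaa naa.5000bbbbbbbbbbbb"
cmds=$(for dev in $DEVICES; do
  echo "partedUtil getptbl /vmfs/devices/disks/$dev"        # inspect first
  echo "partedUtil setptbl /vmfs/devices/disks/$dev msdos"  # write empty label
done)
echo "$cmds"
```

Remember to reboot the host afterwards so the cleared disks are rescanned.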
VSAN Misconfiguration Detected
Always validate the network configuration. The “VSAN Misconfiguration detected” error is by far the most common error seen when configuring VSAN. Normally it means that either the port group has not been correctly configured for Virtual SAN or multicast has not been set up properly.
On Cisco switches, unless an IGMP snooping querier has been configured, or IGMP snooping has been explicitly disabled on the VLAN/ports used for Virtual SAN, configuration will generally fail. In the default configuration IGMP snooping is enabled but no querier is present, so even if the network admin says multicast is configured properly it may not be; double-check it to avoid any pain.
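On the switch side, either of the following approaches avoids the problem. This is a sketch in IOS-style syntax, and VLAN 40 is a placeholder for the VSAN VLAN:

```
! Option 1: disable IGMP snooping on the VSAN VLAN only
no ip igmp snooping vlan 40

! Option 2: keep snooping enabled but provide an IGMP snooping querier
ip igmp snooping querier
```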
IOAnalyzer will boot and attempt to acquire a DHCP-allocated address; if this process fails you will need to configure a static IP. To do this, complete the following steps:
Log onto the appliance (via the vSphere Remote Console) with the username: “root” and password “vmware”:
Wait until the terminal displays the following error:
Host memory is a limited resource. VMware vSphere incorporates sophisticated mechanisms that maximize the use of available memory through page sharing, resource-allocation controls, and other memory management techniques. However, several of vSphere's memory over-commitment techniques only kick in when the host is under memory pressure.
Active Guest Memory
The amount of memory that is actively used, as estimated by the VMkernel based on recently touched memory pages; in other words, what the VMkernel believes the VM is currently actively using.
The following is a description of some common ESXi and VM CPU Performance Issues:
High Ready Time
Ready Time above 10% could indicate CPU contention and might impact the performance of CPU-intensive applications. However, some less CPU-sensitive applications and virtual machines can have much higher ready time values and still perform satisfactorily.
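The 10% guideline is an esxtop-style percentage, while vCenter's real-time charts report CPU Ready as a summation in milliseconds over a 20-second sample. A quick sketch of the conversion (the ready_ms value is illustrative, not from the text):

```shell
# Convert a vCenter real-time "CPU Ready" summation (ms per 20 s sample)
# into the percentage used by the 10%-per-vCPU rule of thumb.
ready_ms=4000        # illustrative value, not from the text
interval_ms=20000    # vCenter real-time sample interval (20 s)
pct=$(awk -v r="$ready_ms" -v i="$interval_ms" 'BEGIN { printf "%.1f", r / i * 100 }')
echo "${pct}%"   # 20.0%
```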
High CoStop (CSTP) Time
CoStop time indicates that the VM has more vCPUs than it needs, and that the excess vCPUs create scheduling overhead that drags down the VM's performance; the VM will likely run better with fewer vCPUs. A vCPU with high CoStop time is being kept from running while the other, more idle vCPUs catch up with the busy one.
vCPUs are always in one of four states (vCPU States):
WAIT – This can occur when the virtual machine’s guest OS is idle (waiting for work), or when the virtual machine is waiting on vSphere tasks, for example waiting for I/O to complete or for ESXi-level swapping to complete. These non-idle vSphere system waits are reported as VMWAIT.
ESXi 5.5 and above stores coredumps on a datastore attached to the host; it can also create a vsantraces directory. Both of these can lock the datastore and prevent it from being deleted.
To check for and remove the coredump file do the following:
esxcli system coredump file list

Path                                    Active  Configured  Size
--------------------------------------  ------  ----------  ---------
/vmfs/volumes/xx/vmkdump/xxx.dumpfile   false   false       702545920
/vmfs/volumes/xxx/vmkdump/xxx.dumpfile  true    true        702545920
/vmfs/volumes/xxx/vmkdump/xxx.dumpfile  false   false       702545920
The output shows that there are 3 dump files blocking the datastore. Only the owning ESXi host can deactivate and delete them, so you first have to find out which ESXi host is responsible for each file:
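Once the owning host is found, the files can be removed with esxcli. A sketch that pulls the dump-file paths out of list-style output and prints the matching removal commands as a dry run; the sample text and paths are illustrative, and on the host you would feed the real `esxcli system coredump file list` output in instead. An active/configured file must first be deactivated with `esxcli system coredump file set --unconfigure`:

```shell
# Sketch: extract dump-file paths from 'esxcli system coredump file list'
# style output and print the removal commands (dry run only).
# The sample lines and datastore paths below are illustrative.
sample='/vmfs/volumes/ds1/vmkdump/aaa.dumpfile  false  false  702545920
/vmfs/volumes/ds2/vmkdump/bbb.dumpfile  true   true   702545920'
files=$(printf '%s\n' "$sample" | awk '{print $1}')
for f in $files; do
  echo "esxcli system coredump file remove --file $f --force"
done
```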