According to the VSAN maximums, there is a limit of 100 VMs per host in a VSAN 5.5 cluster and 200 in a VSAN 6 cluster. This seems to be a soft limit, as I was recently able to deploy 999 VMs into a 4-node VSAN 5.5 cluster (with one host acting as a dedicated HA node, so not running any compute). I got to ~333 VMs per host before I reached the 3000 component limit (which is a hard limit) on each host. Below is a screen grab of vsan.check_limits from RVC:
After upgrading VSAN 5.5 to VSAN 6.0 I thought it would be a good idea to run the same set of tests that I ran previously (VSAN 5.5 Performance Testing) to see how much of a performance increase we could expect.
The test was run using the same IOAnalyzer VMs and test configuration, on the same hardware. The only difference was the vCenter/ESXi/VSAN version.
IOPs (Sum)    Read Only     Write Only    Real World
VSAN 6        113,169.88    28,552.85     38,658.47
VSAN 5         61,964.81     5,666.06     24,228.98
The detailed test results for VSAN 6, using a “Real World” test pattern (70% Random / 80% Read), are below:
Great increase in IOPs!
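To put a number on that increase, here is a quick sketch that divides the VSAN 6 totals by the VSAN 5 totals quoted above. The dictionary and function names are just for illustration; the figures are the summed IOPs from the two test runs.

```python
# Summed IOPs from the two test runs: (VSAN 5, VSAN 6)
RESULTS = {
    "Read Only": (61_964.81, 113_169.88),
    "Write Only": (5_666.06, 28_552.85),
    "Real World": (24_228.98, 38_658.47),
}


def speedup(before: float, after: float) -> float:
    """Return how many times higher `after` is than `before`."""
    return after / before


for pattern, (vsan5, vsan6) in RESULTS.items():
    print(f"{pattern:<11} {speedup(vsan5, vsan6):.2f}x")
```

This works out to roughly 1.8x for reads, 5x for writes and 1.6x for the “Real World” pattern, so the biggest gain by far is on the write side.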
When running through some VSAN operational readiness tests I stumbled across an issue when simulating host failures. When there are more VSAN Components than physical disks and a host fails, the components will not be rebuilt on remaining hosts.
Firstly here is some background information about the test cluster:
- 4 x Dell R730XD Servers
- 1 Disk Group per server with one 800GB SSD fronting six 4TB Magnetic Disks
- 1 Test VM with a single 1.98TB VMDK
- Disks to Stripe set to 1 on the storage policy applied to the VM
- Failure to Tolerate set to 1 on the storage policy applied to the VM
- ESXi 5.5 and VSAN 5.
- All drivers/firmware on the VSAN HCL
The VMDK Object is split into 24 components: 8 x “Primary” components (each 250GB), 8 x “Copy” components (each 250GB) and 8 x “Witness” components.
Note: VSAN does not really have “Primary” and “Copy” components but for the sake of the following diagrams and ease of explanation I’ll group the components this way.
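The component count can be reproduced with a rough sketch. It assumes the 255GB maximum size of a single VSAN component, reads 1.98TB as 1.98 x 1024 GB, and only covers a stripe width of 1. The witness count is taken straight from the layout observed on this cluster rather than calculated, and the function name is my own.

```python
import math

MAX_COMPONENT_GB = 255  # assumed maximum size of a single VSAN component


def data_components_per_replica(vmdk_gb: float) -> int:
    """Number of pieces one copy of the VMDK is split into (stripe width 1)."""
    return math.ceil(vmdk_gb / MAX_COMPONENT_GB)


vmdk_gb = 1.98 * 1024        # the 1.98TB test VMDK
failures_to_tolerate = 1     # from the storage policy
observed_witnesses = 8       # seen on the test cluster, not computed here

per_replica = data_components_per_replica(vmdk_gb)
replicas = failures_to_tolerate + 1
total = per_replica * replicas + observed_witnesses

print(f"{per_replica} components per replica, {replicas} replicas, "
      f"{observed_witnesses} witnesses = {total} components")
```

With six magnetic disks per host, 24 components across four hosts is how this VM ends up with more components than there are physical disks to spread them over.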
More VSAN Components than Physical Disks – Failure Scenario
Today I installed the VSAN Health Plugin – VSAN 6 Health Check Plugin. Unfortunately I did not RTFM (Read the… frigging… manual).
When I logged into the web client after restarting the vCenter services this is all I could see:
Turns out I didn’t install the MSI using the “run as admin” option… really should have read that manual.
Cormac Hogan to the rescue VMware Blogs
After searching around and not finding anything that covers the entire string, I figured I’d throw in what information I have. I like to label my datastores with my source information, which makes it easy to search and isolate when SAN work has to be performed. The labels make it easy, but I’m relying on information gathered to create these labels. So, this is a way, if nothing else, to validate that the information is being applied to the correct identifier.
I recently carried out some VSAN performance testing using 3 Dell R730xd servers:
- Intel Xeon E5-2680
- 530GB RAM
- 2 x 10GbE NICs
- ESXi 5.5, build 2068190
- 800GB SSD (12Gb/s Transfer Rate)
- 3 x 4TB (7200RPM SAS disks)
On each of these hosts I built an IOAnalyzer Appliance (https://labs.vmware.com/flings/io-analyzer): one with its disks placed on the same host as the VM and the other two with “remote” disks. Something similar to this:
HA works differently on a VSAN cluster than on a non-VSAN cluster.
- When HA is turned on in the cluster, FDM agent (HA) traffic uses the VSAN network and not the Management Network. However, when a potential isolation is detected HA will ping the default gateway (or specified isolation address) using the Management Network.
- When enabling VSAN ensure vSphere HA is disabled. You cannot enable VSAN when HA is already configured. Either configure VSAN during the creation of the cluster or disable vSphere HA temporarily when configuring VSAN.
- When there are only VSAN datastores available within a cluster, Datastore Heartbeating is disabled. HA will never use a VSAN datastore for heartbeating: the VSAN network is already used for network heartbeating, so using the datastore for heartbeating as well would not add anything.
- When changes are made to the VSAN network it is required to re-configure vSphere HA.
ESXi Isolation – VM with no underlying storage
A four node cluster with a single VM (with a single VMDK less than 255GB), deployed with a Storage Policy of “Failures to Tolerate = 1” and “Disks to Stripe = 1” would look like this:
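In code form, a minimal sketch of that layout: with Failures to Tolerate = 1, VSAN keeps two replicas of the VMDK plus one witness, each on a different host, so three of the four hosts hold a component and one holds none. The host names are made up, and which host ends up without a component is decided by VSAN, not by this sketch.

```python
def min_hosts(failures_to_tolerate: int) -> int:
    """Hosts needed to place the replicas and witnesses: 2n + 1."""
    return 2 * failures_to_tolerate + 1


# Hypothetical host names for the four node cluster.
hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]

# A VMDK under 255GB with FTT=1 and stripe width 1: two replicas and a witness.
components = ["replica 1", "replica 2", "witness"]

# Illustrative placement only: one component per host, in list order.
placement = dict(zip(hosts, components))
hosts_without_storage = [h for h in hosts if h not in placement]

print(f"hosts required: {min_hosts(1)}")
for host in hosts:
    print(f"{host}: {placement.get(host, 'no component for this VM')}")
```

The last host matters for the isolation scenario: a VM can be running on a host that holds none of its own storage components.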
vCenter Server with Embedded PSC and Database
The embedded PSC is meant for standalone sites where vCenter Server will be the only SSO integrated solution. In this case replication to another PSC should not be required and is not possible.
- Device is a single point of failure.
- Supports Windows and VCSA (vCenter Server Appliance) based deployments
- Replication between PSCs not required
- Multiple standalone instances supported
- Sufficient for small scale deployments
- Not suitable for use with other VMware products (vRA, NSX etc.)
- Easy to deploy and maintain
Recently I added an ESXi host to a VSAN cluster (HP DL380 G9 with 2 x 800GB SSD and 6 x 4TB Magnetic); however, when creating the disk group only 2 disks were available for use.
After some investigation it turned out the other disks already had partitions. This was confirmed by running the following command:
Running the partedUtil getptbl /vmfs/devices/disks/naa.5000xxxxxxxxx command failed with the error: “Error: Can’t have a partition outside the disk!”
To resolve this (and allow the disks to be added to a VSAN disk group) I ran the following:
partedUtil setptbl /vmfs/devices/disks/naa.5000xxxxxxxxxxx msdos
Then reboot the host.
***This will destroy any data already on the disks***