CMSSO-UTIL – Failed Service Registration

All credit for this post belongs to one of my colleagues called Imran Mughal, a very talented vSphere engineer!

We recently came across an issue during some site migration work on our PSCs. What drove the migration was that all of our vCenter Servers were authenticating users via a single load-balanced pair of PSCs, regardless of physical location, instead of via the PSCs in the datacentre local to each vCenter. This was because we used a single site name that covered 4 separate datacentre locations: all of our PSCs, regardless of physical location, formed part of this single site.

As we already had live services running on our vCenters, we decided to redirect the vCenters to other PSCs and rebuild the PSCs in pairs (as there is currently no way to manually change the site name or create new sites over an existing deployment).

This was done using a combination of these two KB articles:

KB2131191 & KB2113917

The initial repoint of the vCenters was successful and we saw no issues, as we didn’t need to move any service registrations. Only after we had re-installed pairs of PSCs in their new sites and done the second vCenter repoint did we start to see issues (this was also a site repoint). This repoint was completed using the updated cmsso-util from VMware (listed in KB2131191). This version of cmsso-util has an extra command line option, “move-services”, which doesn’t exist out of the box. The services in question are service registrations from the vCenter install, a bit like what you see in the managed object browser extension manager when you connect to the VC (https://vc-name/mob/).

MOB_Image_2

When you connect to the PSC via an LDAP browser like Jxplorer (unsupported by VMware!!!), you can see a similar set of registrations under the site's service registrations branch. William Lam’s blog post on Jxplorer and connecting to the PSC was very useful here.

JExplorer_Image_2

 

So during a vCenter site migration using the cmsso-util tool, we found that some registrations failed to migrate and, rather than being left over on our old site, they had actually gone.

The vCenters were repointed and services migrated using these two commands.

"%VMWARE_PYTHON_BIN%" cmsso-util repoint --repoint-psc FQDN_of_PSC_New_Site

"%VMWARE_PYTHON_BIN%" cmsso-util move-services

It was the second command that failed, leaving us with the error “unable to move services across sites”.

Failure_1

At first we didn’t actually see any problems on the vCenter that was moved, but then we noticed a couple of issues with vMotion and storage policies (PBM), as below:

Failure_2

From this we found that the Profile-Driven storage service was in an unknown state but running:

Failure_3

Failure_4

This is where we raised a case with VMware GSS whilst we internally investigated.

VMware found the relevant logs:

2016-08-23T14:40:32.212+01:00 [WrapperSimpleAppMain] ERROR opId= com.vmware.vim.storage.common.kv.KvClientManager - Failed to lookup KV store from Component Manager.
 java.lang.NullPointerException
 at com.vmware.vim.storage.common.util.ComponentManagerService.getServiceEndpointData(ComponentManagerService.java:455)
 at com.vmware.vim.storage.common.util.ComponentManagerService.lookupKvStore(ComponentManagerService.java:320)

This basically translates to “the KV Store endpoint is missing”.

After thorough searching of the PSC via Jxplorer and comparisons with other sites we found the missing service registration.

Site 1

Site_1b

 

Site 2

Site_2

Site 3

Site_3

Site 4 (The site with the issue!)

Site_4
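The comparison we did by eye in Jxplorer can be sketched in a few lines of Python. This is only an illustration: it assumes you have already collected the service IDs for each site by hand, and it assumes that the part of a service ID after the first underscore (the "kv" in an ID ending "_kv") identifies the type of service. The function names and the sample site names are hypothetical.

```python
def service_suffix(service_id):
    """Return the part after the first underscore, e.g. '..._kv' -> 'kv'.

    Assumption: that part names the service type. An ID without an
    underscore is returned unchanged.
    """
    return service_id.split("_", 1)[-1]


def missing_by_site(registrations):
    """Map each site to the service types seen at other sites but not there.

    `registrations` maps a site name to the service IDs found under it.
    Sites that are missing nothing are left out of the result.
    """
    suffixes = {
        site: {service_suffix(sid) for sid in ids}
        for site, ids in registrations.items()
    }
    seen_anywhere = set().union(*suffixes.values())
    return {
        site: sorted(seen_anywhere - found)
        for site, found in suffixes.items()
        if seen_anywhere - found
    }


if __name__ == "__main__":
    # Hypothetical, shortened IDs purely for illustration.
    sample = {
        "site1": ["aaaa_kv", "aaaa_sca"],
        "site4": ["bbbb_sca"],
    }
    print(missing_by_site(sample))
```

A site that shows up in the result is the one worth a closer look in the LDAP browser.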

VMware agreed that this was the issue and worked on a fix with their Engineering team. We did some testing on a test vCenter stack and found we could recreate the entries manually by exporting branches of the LDAP tree, modifying them and re-importing them, but that was completely unsupported by VMware and had never been tested. Testing this method could have taken weeks and we didn’t have that long!

Thankfully VMware came back with a solution from engineering that used existing tools that come with vCenter 6.0. The only problem was that it depended on some temp files being left over in my %temp% directory from when I ran the move-services command on the VC.

Luckily we had them (they are always named cmsso_svcspec_…….). I’m not sure how the last few characters of the file name are generated!

Recovery_1

These files can be used to manually register the service on the Platform Services Controller. (Snapshot everything beforehand… seriously!)

To manually register the service, you must log on to the OS of the Platform Services Controller, then copy the cmsso_svcspec file(s) to the folder C:\Program Files\VMware\vCenter Server\VMware Identity Services\lstool\scripts

You can identify which file is used for each service by searching for “serviceId” within the file (open it in Notepad). For example, in the file cmsso_svcspec_3v2mps we find the entry:

serviceId=88d941cd-ae85-46f2-aeaa-9c30d2897137_kv
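If there are several leftover files, opening each one in Notepad gets tedious. The following is a minimal sketch that lists the serviceId found in each file. It assumes the entry appears as plain `serviceId=<value>` text, as in the example above, and that the files match the name pattern `cmsso_svcspec_*`. The function name is hypothetical and this is not a VMware tool.

```python
import glob
import io
import os
import re

# Assumption: the spec file contains a plain-text "serviceId=<value>" entry.
SERVICE_ID_RE = re.compile(r"serviceId\s*=\s*(\S+)")


def service_ids_by_file(directory, pattern="cmsso_svcspec_*"):
    """Return {file name: serviceId} for matching files in `directory`.

    Files without a serviceId entry are skipped.
    """
    found = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # io.open behaves the same on Python 2.7 and Python 3.
        with io.open(path, "r", encoding="utf-8", errors="replace") as handle:
            match = SERVICE_ID_RE.search(handle.read())
        if match:
            found[os.path.basename(path)] = match.group(1)
    return found


if __name__ == "__main__":
    # Look in %TEMP% by default, falling back to the current directory.
    temp_dir = os.environ.get("TEMP", ".")
    for name, service_id in service_ids_by_file(temp_dir).items():
        print("{0}  {1}".format(name, service_id))
```

The script only reads the files, so it is safe to run before deciding which registrations need to be recreated.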

So to register this service we run the following command from the folder C:\Program Files\VMware\vCenter Server\VMware Identity Services\lstool\scripts (On the VC):

"%VMWARE_PYTHON_BIN%" lstool.py register --url https://localhost/lookupservice/sdk --user administrator@vsphere.local --password "password" --spec cmsso_svcspec_3v2mps --id 88d941cd-ae85-46f2-aeaa-9c30d2897137_kv --no-check-cert

--id is the serviceId entry in the cmsso_svcspec file.
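With more than one registration to recreate, it is easy to mistype the long command. As a rough sketch, the argument list can be built in Python from the spec file name and its serviceId. It uses only the options shown in the command above; the function name and its defaults are hypothetical, and nothing is executed here.

```python
def build_register_command(spec_file, service_id, password,
                           user="administrator@vsphere.local",
                           url="https://localhost/lookupservice/sdk",
                           python_bin="%VMWARE_PYTHON_BIN%"):
    """Return the lstool.py register command as a list of arguments.

    Only the options from the command given by VMware are used.
    The list is returned, not run.
    """
    return [
        python_bin, "lstool.py", "register",
        "--url", url,
        "--user", user,
        "--password", password,
        "--spec", spec_file,
        "--id", service_id,
        "--no-check-cert",
    ]


if __name__ == "__main__":
    command = build_register_command(
        "cmsso_svcspec_3v2mps",
        "88d941cd-ae85-46f2-aeaa-9c30d2897137_kv",
        "password",
    )
    # Print for review; quote arguments as needed before pasting into cmd.
    print(" ".join(command))
```

Printing the command rather than running it keeps a human in the loop, which seems sensible for a procedure that modifies the lookup service.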

Recovery_2

After this, we must restart the vCenter (or all vCenter services). A PSC restart is not required.

In the LDAP directory via Jxplorer on the PSC you should now see the missing registration.

Recovery_3b

 

Recovery_4

Once the VC has been restarted we can check the health via the GUI again:

Recovery_5

Recovery_6

vMotion was working again and we could check the storage policies as normal!!
