Flapping Alerts in vRealize Operations 8.x


I have just discovered this bug courtesy of the tens of thousands of flapping alerts I’ve received in the last month.

While checking my federated vROps cluster to compile a report on the number of alerts generated over December, I was greeted with a significantly higher number than I was expecting, especially considering the Christmas change freeze, which would have stopped any non-urgent tasks. Further investigation showed that this was due to a few dozen alert definitions appearing thousands of times each (for example, more than 6,000 alerts for one alert type on one cluster).

This appears to affect any alert based on receiving a fault symptom, for example all the default vSAN Management Pack alerts.

This manifests itself as an alert going active, cancelling soon after, and then reactivating, i.e. flapping. See below for an example from one cluster where the HCL DB wasn’t up to date.

The cause of this bug can be seen in the Symptoms view on the object: a new symptom is created every time instead of the existing fault symptom being updated.

If you look at the “cancelled on” value, they were all showing as active at the same time, and were all cancelled when the vSAN HCL DB was updated at around 3:30pm on the 23rd December. The 50-minute regularity seems to tie in with the vSAN Health Check interval on the vCenter.
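To check for the same pattern in your own environment, a small sketch like the one below can group alert activations and measure the gap between them. It assumes you have already exported the alert list into (alert definition, object, start time) tuples; that record layout and the alert name used in the example are hypothetical, not a vROps API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def flapping_summary(alerts, min_count=3):
    """Group alert activations by (alert definition, object) and return, for each
    group seen at least min_count times, the number of activations and the median
    gap between consecutive activations."""
    starts = defaultdict(list)
    for definition, obj, started in alerts:
        starts[(definition, obj)].append(started)

    summary = {}
    for key, times in starts.items():
        if len(times) < min_count:
            continue  # too few activations to call it flapping
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        summary[key] = (len(times), median(gaps))
    return summary


# Hypothetical data: one alert re-triggering every 50 minutes on one cluster.
base = datetime(2020, 12, 23, 9, 0)
alerts = [("hcl-db-alert", "cluster01", base + timedelta(minutes=50 * i)) for i in range(5)]
for (definition, obj), (count, gap) in flapping_summary(alerts).items():
    print(f"{definition} on {obj}: {count} activations, median gap {gap}")
    # prints "hcl-db-alert on cluster01: 5 activations, median gap 0:50:00"
```

A median gap that lines up with a known polling interval (such as the vSAN Health Check interval mentioned above) is a good hint that the alert is flapping rather than reflecting genuinely new faults.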

I am running vROps 8.1.1 (16522874). I am not sure whether this impacts all versions of vROps 8.x, so if you see this on any other version, let me know.

Luckily there is a fix: HF4, which takes vROps to version 8.1.1 (17258327).

As this PAK file is 2.2GB in size, I am unable to host it on my blog for easy download, so I suggest you speak to your VMware TAM or Account Manager, or open a case with Global Support Services and reference this hotfix.

If all else fails, I might be able to share it with you using OneDrive; however, I cannot promise a quick turnaround for that.

UPDATE: I have had it confirmed that this bug affects 8.0 and 8.2 as well, and there are hotfixes for those versions too. The next full release will have the fix built in.

If you are currently on 8.0.x or 8.1.x, I would suggest either applying the HF and then upgrading straight to 8.3 when it is released, or upgrading to 8.2 first and then applying the HF.

VMware vCenter Security Log Events

I had a requirement from a customer to identify log events in order to create alerts for several threat scenarios. This post is intended to provide a high-level description of the results for each scenario, for future reference or in case anyone else finds it useful. Please see the earlier post on enabling additional vCenter and PSC logging: http://www.caenotech.co.uk/vmware/configuration-of-rsyslog-on-vcsa-and-psc/

Access to vCenter Administrator role

The objective of the following is to ensure that nobody other than certain colleagues has access to the Cryptography operations within vCenter, and that all work carried out on crypto operations is done under suitable change control.

As can be seen, the default syslog records the Administrator user logging in as VSPHERE.LOCAL\Administrator and the IP address the login originated from:

<datetime> <vCenterHostname> vcenter-server: User <Domain>\<Username>@<IPAddress> logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.UserLoginSessionEvent] [info] [<Domain>\<Username>] [] [LineID] [User <Domain>\<Username>@<IPAddress> logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>]

<datetime> <vCenterHostname> vcenter-server: User <Domain>\<Username>@<IPAddress> logged out (login time: <datetime>, number of API invocations: <x>, user agent: JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>)

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.UserLogoutSessionEvent] [info] [<Domain>\<Username>] [] [LineID] [User <Domain>\<Username>@<IPAddress> logged out (login time: <datetime>, number of API invocations: <x>, user agent: JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>)]

The text strings “vim.event.UserLoginSessionEvent” and “vim.event.UserLogoutSessionEvent” can be used to alert on people logging in to and out of vCenter.
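If you want more than a plain string match, for example to record which account logged in and from where, the vpxd event line can be parsed. This is a minimal Python sketch whose regular expression is built only from the sample lines above; the timestamp, hostname and IP address in the example are made up.

```python
import re

# Pattern built from the vpxd "Event [...]" sample lines above: the event type,
# then later the "[User <Domain>\<Username>@<IPAddress> logged in/out" message.
SESSION_RE = re.compile(
    r"\[vim\.event\.(?P<event>UserLoginSessionEvent|UserLogoutSessionEvent)\]"
    r".*?\[User (?P<user>[^@\]]+)@(?P<ip>[^ \]]+) logged (?P<action>in|out)\b"
)


def parse_session_event(line):
    """Return (event type, user, source IP, 'in' or 'out'), or None if the
    line is not a vCenter login/logout event."""
    match = SESSION_RE.search(line)
    if match is None:
        return None
    return match["event"], match["user"], match["ip"], match["action"]


# Hypothetical example line in the format shown above.
sample = (r"2020-12-23T10:00:00Z vc01 vpxd 1234 - - Event [5678] [1-1] "
          r"[2020-12-23T10:00:00Z] [vim.event.UserLoginSessionEvent] [info] "
          r"[VSPHERE.LOCAL\Administrator] [] [5678] [User VSPHERE.LOCAL\Administrator"
          r"@10.0.0.5 logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#abc]")
print(parse_session_event(sample))
```

The vcenter-server variant of each message carries the same text without the event type, so matching on the vpxd line avoids counting each login twice.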


Alteration of vCenter Roles

Creation of a new vCenter role “newCryptoRole”

The default log shows that the new role was created, but not who created it or which permissions it was given:

<datetime> <vCenterHostname> vcenter-server: New role <roleName> created

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleAddedEvent] [info] [] [] [LineID] [New role <roleName> created]

This is where the additional vpxd-svcs log is required, for details of who completed the action and which permissions were assigned to the role:

[tomcat-exec-176  INFO  AuthorizationService.AuditLog  opId=] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Add role Id=-922973018,Name=newCryptoRole,Description=,Tenant=Privileges=[System.Anonymous, System.Read, System.View, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, Cryptographer.EncryptNew]
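To alert on this entry, it helps to break it into its parts. The following Python sketch is built only from the sample line above (the regular expressions assume exactly that layout) and picks out the acting principal, the role, and any Cryptographer.* privileges granted; the function names are illustrative, not part of any VMware tooling.

```python
import re

# Patterns built from the vpxd-svcs AuthorizationService.AuditLog samples above.
AUDIT_RE = re.compile(
    r"AuthorizationService\.AuditLog.*?Action performed by principal\("
    r"name=(?P<principal>[^,]+),isGroup=(?P<is_group>true|false)\):(?P<action>.*)"
)
ROLE_RE = re.compile(
    r"(?P<verb>Add|Update) role Id=(?P<role_id>-?\d+),Name=(?P<name>[^,]*),"
    r".*?Privileges=\[(?P<privs>[^\]]*)\]"
)


def parse_audit(line):
    """Return a dict describing who performed which authorization action,
    with role details for Add/Update role entries, or None."""
    match = AUDIT_RE.search(line)
    if match is None:
        return None
    entry = {
        "principal": match["principal"],
        "is_group": match["is_group"] == "true",
        "action": match["action"].strip(),
    }
    role = ROLE_RE.search(entry["action"])
    if role is not None:
        entry.update(
            verb=role["verb"],
            role_id=int(role["role_id"]),
            role=role["name"],
            privileges=[p.strip() for p in role["privs"].split(",") if p.strip()],
        )
    return entry


def crypto_privileges(entry):
    """Return the Cryptographer.* privileges granted in an Add/Update role entry."""
    return [p for p in entry.get("privileges", []) if p.startswith("Cryptographer.")]
```

An alert rule could then fire whenever crypto_privileges() returns a non-empty list, with the principal included in the alert text for the change control check.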

Modification of permissions to any vCenter role

<datetime> <vCenterHostname> vcenter-server: Role modified 
Previous name: <roleName>, new name <newRoleName>
Added privileges: <privilegesAdded>
Removed privileges: <privilegesRemoved>

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleUpdatedEvent] [info] [] [] [LineID] [Role modified 
Previous name: <roleName>, new name <newRoleName>
Added privileges: <privilegesAdded>
Removed privileges: <privilegesRemoved>]

The default log shows that the role was modified and which permissions were added or removed, but not who made the change. This is where the additional vpxd-svcs log is required, for details of who completed the action:

[tomcat-exec-17  INFO  AuthorizationService.AuditLog  opId=a794037d-a725-4b89-ab96-d3a23a58648c] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Update role Id=-922973018,Name=newCryptoRole,Description=,Tenant=Privileges=[System.Anonymous, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, System.View, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, System.Read, Cryptographer.EncryptNew, Network.Assign, Network.Config, Network.Move, Network.Delete, Task.Create, Task.Update]
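Note that the vpxd-svcs entry lists the role's complete privilege set after the change, rather than the change itself. A minimal sketch, assuming you keep the previous Add role or Update role line for the same role Id, is to diff the two lists. For the samples in this post, comparing the Add role and Update role entries for Id -922973018 gives Network.Assign, Network.Config, Network.Delete, Network.Move, Task.Create and Task.Update as added, with nothing removed.

```python
import re

# The "Privileges=[...]" list in an Add/Update role audit entry is the role's full
# privilege set after the action, so diffing two entries gives what changed.
PRIVS_RE = re.compile(r"Privileges=\[([^\]]*)\]")


def privilege_set(line):
    """Return the set of privileges listed in an audit entry (empty if none)."""
    match = PRIVS_RE.search(line)
    if match is None:
        return set()
    return {p.strip() for p in match.group(1).split(",") if p.strip()}


def privilege_diff(previous_line, current_line):
    """Return (added, removed) privileges between two entries for the same role Id."""
    before, after = privilege_set(previous_line), privilege_set(current_line)
    return sorted(after - before), sorted(before - after)
```

This gives the same added/removed detail as the default RoleUpdatedEvent, but tied to the principal recorded in the vpxd-svcs entry.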

Deletion of a vCenter role

<datetime> <vCenterHostname> vcenter-server: Role <roleName> removed

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleRemovedEvent] [info] [] [] [LineID] [Role <roleName> removed]

The default log shows that the role was removed, but not who removed it. This is where the additional vpxd-svcs log is required, for details of who completed the action:

 
[tomcat-exec-2  INFO  AuthorizationService.AuditLog  opId=c0100be8-9114-4e60-9520-4cf1b6015793] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Delete role -922973018  
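The vpxd-svcs deletion entry records only the numeric role Id (-922973018 here), not the role name. One way to recover the name, sketched below under the assumption that the earlier Add role or Update role entry is still within the log history you are scanning, is to keep an Id-to-name map while reading the log.

```python
import re

# "Delete role" audit entries carry only the numeric role Id, so remember the
# Id-to-name mapping from earlier Add/Update role entries (assumed to be present
# in the same log stream) and resolve deletions against it.
NAMED_RE = re.compile(r"(?:Add|Update) role Id=(-?\d+),Name=([^,]*),")
DELETE_RE = re.compile(r"Delete role (-?\d+)")


def resolve_deletions(lines):
    """Yield (role_id, role_name) for each deletion; role_name is None when the
    role was created before the log history being scanned."""
    names = {}
    for line in lines:
        if (match := NAMED_RE.search(line)) is not None:
            names[int(match.group(1))] = match.group(2)
        elif (match := DELETE_RE.search(line)) is not None:
            role_id = int(match.group(1))
            yield role_id, names.get(role_id)
```

Including Update role entries in the map means a renamed role is reported under its latest name.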

Assignment of User to a Role

Assigning a user to a role is not recorded in the default logs; this requires the additional vpxd-svcs log:

[tomcat-exec-232  INFO  AuthorizationService.AuditLog  opId=] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Added access control [ Principal=Name=VSPHERE.LOCAL\newCryptoUser,isGroup=false,roles=[-922973018],propogating=true ] to document urn:acl:global:permissions
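The assignment entry can be parsed in the same way. This sketch assumes the exact layout of the sample line above, including the log's own spelling “propogating”; the role Ids it returns are numeric, so they still need resolving to names from earlier Add role entries.

```python
import re

# Built from the "Added access control" sample above. Note the log itself spells
# the flag "propogating", so the pattern must match that spelling.
ACL_RE = re.compile(
    r"principal\(name=(?P<actor>[^,]+),isGroup=(?:true|false)\):Added access control \[ "
    r"Principal=Name=(?P<grantee>[^,]+),isGroup=(?P<group>true|false),"
    r"roles=\[(?P<roles>[^\]]*)\],propogating=(?P<propagating>true|false) \] "
    r"to document (?P<document>\S+)"
)


def parse_role_assignment(line):
    """Return who granted which role Ids to whom, or None for other lines."""
    match = ACL_RE.search(line)
    if match is None:
        return None
    return {
        "actor": match["actor"],
        "grantee": match["grantee"],
        "grantee_is_group": match["group"] == "true",
        "role_ids": [int(r) for r in match["roles"].split(",") if r.strip()],
        "propagating": match["propagating"] == "true",
        "document": match["document"],
    }
```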

If you attempt to assign a user to a role with higher permissions than your current user has, you will receive an error message in the vCenter Web UI.

Additionally, the following event is recorded in vpxd-svcs.log:

[tomcat-exec-293  WARN  com.vmware.cis.authorization.impl.AclPrivilegeValidator  opId=] User VSPHERE.LOCAL\newUser does not have privileges [System.Anonymous, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, System.View, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, Authorization.ModifyPermissions, System.Read, Cryptographer.EncryptNew] on object urn%3Aacl%3Aglobal%3Apermissions
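Denied attempts like this may be worth alerting on too. A sketch for this warning, again based only on the sample line above: it extracts the user, the privileges they lacked, and the URL-encoded object, decoded with Python's urllib.parse.unquote (so urn%3Aacl%3Aglobal%3Apermissions becomes urn:acl:global:permissions).

```python
import re
from urllib.parse import unquote

# Built from the AclPrivilegeValidator WARN sample above.
DENIED_RE = re.compile(
    r"AclPrivilegeValidator.*?User (?P<user>\S+) does not have privileges "
    r"\[(?P<privs>[^\]]*)\] on object (?P<obj>\S+)"
)


def parse_denied(line):
    """Return (user, missing privileges, decoded object) or None."""
    match = DENIED_RE.search(line)
    if match is None:
        return None
    privileges = [p.strip() for p in match["privs"].split(",") if p.strip()]
    return match["user"], privileges, unquote(match["obj"])
```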

Adding a User to Platform Services Controller SSO Groups

In order to capture logs showing a user being added to the “SystemConfiguration.BashShellAdministrators” group, we require the additional logs ssoAdminServer.log and vmdird-syslog.log:

./sso/ssoAdminServer.log:

pool-4-thread-1 opId=73c87e6b-746c-46f2-9b59-a5da95f5a1c1 INFO  com.vmware.identity.admin.vlsi.PrincipalManagementServiceImpl] [User {Name: Administrator, Domain: vsphere.local} with role 'Administrator'] Adding users to local group 'SystemConfiguration.BashShellAdministrators'

./vmdird/vmdird-syslog.log:

info vmdird  t@139993972463360: MOD 1,add,member: (CN=Administrator,CN=Users,DC=vsphere,DC=local)

info vmdird  t@139993972463360: Modify Entry (CN=SystemConfiguration.BashShellAdministrators,DC=vsphere,DC=local)(from 127.0.0.1)(by <PSCName>@vsphere.local)(via Ext)(USN 4974)
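To turn the vmdird MOD entry and the Modify Entry entry into a single “member added to group” event, they can be correlated on the thread Id (t@...). The sketch below assumes, based only on the sample above, that the MOD entry for a change precedes its Modify Entry entry on the same thread; it works whether the two entries arrive as separate lines or joined on one. The PSC name in the example is hypothetical.

```python
import re

# Built from the vmdird samples above; correlation by thread Id is an assumption.
MOD_RE = re.compile(r"t@(?P<tid>\d+): MOD \d+,add,member: \((?P<member>[^)]*)\)")
ENTRY_RE = re.compile(
    r"t@(?P<tid>\d+): Modify Entry \((?P<entry>[^)]*)\)"
    r"\(from (?P<src>[^)]*)\)\(by (?P<by>[^)]*)\)"
)


def group_additions(lines, group_cn="SystemConfiguration.BashShellAdministrators"):
    """Yield (member DN, group DN, modified by, source address) for members
    added to the given SSO group."""
    pending = {}  # thread Id -> members added, awaiting the Modify Entry record
    for line in lines:
        for match in MOD_RE.finditer(line):
            pending.setdefault(match["tid"], []).append(match["member"])
        for match in ENTRY_RE.finditer(line):
            members = pending.pop(match["tid"], [])
            if match["entry"].startswith(f"CN={group_cn},"):
                for member in members:
                    yield member, match["entry"], match["by"], match["src"]
```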


Cryptographic Components

The objective of these alerts is to ensure that vSAN encryption is not disabled (where enabled) or enabled (where it’s not). Equally, any tampering with KMS (required for encryption) should be correlated back to change control / incident management.

A user with the “Administrator – No Cryptography” role who tries to disable encryption on vSAN is not given the option, due to a lack of privileges.

Disable vSAN Encryption

In this test, vSAN encryption was disabled.  This is considered a reconfiguration of vSAN and logged accordingly.

Default vCenter logs show that vSAN is being reconfigured:

<datetime> <vCenterHostname> vcenter-server: Task: Reconfigure vSAN cluster

However, this is of limited help, as it only indicates that a change has been made, not what the change was.

The ESXi host logs, on lines containing the string [VsanSystemImpl::Update], show vSAN being reconfigured with encryption set to ‘enabled=false’.

The result was a vSAN with no encryption.

Enabling vSAN encryption

In this test, vSAN encryption was enabled.  This is considered a reconfiguration of vSAN and logged accordingly.

Default vCenter logs show that vSAN is being reconfigured:

<datetime> <vCenterHostname> vcenter-server: Task: Reconfigure vSAN cluster

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.TaskEvent] [info] [<domain>\<username>] [<clusterName>] [LineID] [Task: Reconfigure vSAN cluster]

The ESXi host logs, on lines containing the string [VsanSystemImpl::Update], show vSAN being reconfigured with encryption set to ‘enabled=true’.
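Since the vCenter TaskEvent records who reconfigured which cluster and the ESXi host log records what actually changed, an alert on encryption being switched needs both. The sketch below handles only the ESXi side. The full layout of the [VsanSystemImpl::Update] lines is not reproduced in this post, so the pattern is an assumption: it looks for the tag and an enabled=true/false value following the word “encryption”, and should be checked against your own host logs before it is relied on.

```python
import re

# Assumption: relevant ESXi log lines contain the literal tag
# "[VsanSystemImpl::Update]" and an encryption setting rendered as
# "enabled=true"/"enabled=false" somewhere after the word "encryption".
ENCRYPTION_RE = re.compile(r"encryption.*?enabled\s*=\s*(true|false)", re.IGNORECASE)


def vsan_encryption_change(line):
    """Return True/False for the encryption state set by a vSAN reconfigure,
    or None if the line is not a matching [VsanSystemImpl::Update] entry."""
    if "[VsanSystemImpl::Update]" not in line:
        return None
    match = ENCRYPTION_RE.search(line)
    if match is None:
        return None
    return match.group(1).lower() == "true"
```

A False result on a cluster that should be encrypted (or True on one that should not) can then be matched by time against the “Reconfigure vSAN cluster” TaskEvent to find the user.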

Adding a KMS Server

The event of adding an additional KMS is logged, specifying the KMS alias name and the KMS Cluster into which it is added.

<datetime> <vCenterHostname> vpxd <eventID> - - <date> info vpxd[<Rand>] [Originator@xxxx sub=CryptoManager opID-KmipServerPageMediator-add-xxxxx-ngc:<rand>] A new Kmip Server <KMSName> is registered in cluster <KMSCluster>

The string “A new Kmip Server” can be used to alert on a new KMS server being added to the KMS Cluster.

Deleting a KMS Server

In this test, the KMS server was unregistered from vCenter.

The following event describes the removal:

<datetime> <vCenterHostname> vpxd <eventID> - - <date> info vpxd[<Rand>] [Originator@xxxx sub=CryptoManager opID-KmipServerActionResolver-remove-xxxxx-ngc:<rand>] Kmip Server <KMSName> is removed from cluster <KMSCluster>
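Both KMS messages have fixed wording, so a simple match is enough to turn them into structured events. This sketch assumes KMS and KMS cluster names contain no spaces (the \S+ groups stop at the first one); the names in the example are hypothetical.

```python
import re

# Message text taken from the CryptoManager samples above.
ADDED_RE = re.compile(r"A new Kmip Server (\S+) is registered in cluster (\S+)")
REMOVED_RE = re.compile(r"Kmip Server (\S+) is removed from cluster (\S+)")


def kms_change(line):
    """Return ('added' or 'removed', KMS name, KMS cluster), or None."""
    if (match := ADDED_RE.search(line)) is not None:
        return "added", match.group(1), match.group(2)
    if (match := REMOVED_RE.search(line)) is not None:
        return "removed", match.group(1), match.group(2)
    return None
```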

vMotion

vMotion a VM from vSAN Datastore to Local Storage

The test virtual machine (permbound1) was migrated from the vSAN datastore ‘vSANDatastore’ to local storage named ‘ds-local-ESXiHostnameLocalDS’.

The following events were recorded by the default vCenter logs.

vcenter-server: Migrating <VMname> from <ESXiHostname>, <datastoreName> to <ESXiHostname>, <datastoreName> in <vCenterDatacenter>

The event is in the format above and records the time, who carried out the migration (in the field “vc_username”), what was migrated, and the source and destination hosts and datastores.
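To alert specifically on VMs leaving vSAN storage, the migration message can be parsed and the source and destination datastores compared. This is a sketch that assumes VM, host and datastore names contain no commas and none of the words “ from ”, “ to ” or “ in ”; the host and datacenter names in the example are hypothetical, and the set of vSAN datastore names (here just ‘vSANDatastore’) would need to match your environment.

```python
import re

# Built from the vcenter-server "Migrating ..." sample above.
MIGRATE_RE = re.compile(
    r"Migrating (?P<vm>.+?) from (?P<src_host>[^,]+), (?P<src_ds>.+?) "
    r"to (?P<dst_host>[^,]+), (?P<dst_ds>.+?) in (?P<datacenter>\S+)$"
)


def parse_migration(line):
    """Return the fields of a vcenter-server 'Migrating ...' message, or None."""
    match = MIGRATE_RE.search(line.strip())
    return None if match is None else match.groupdict()


def moved_off_vsan(event, vsan_datastores):
    """True when a VM moves from a vSAN datastore to one that is not vSAN."""
    return event["src_ds"] in vsan_datastores and event["dst_ds"] not in vsan_datastores


line = ("vcenter-server: Migrating permbound1 from esx01.lab.local, vSANDatastore "
        "to esx01.lab.local, ds-local-esx01LocalDS in DC01")
event = parse_migration(line)
print(event["vm"], moved_off_vsan(event, {"vSANDatastore"}))  # permbound1 True
```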