vSAN Health Alarm Check Script (Using PowerCLI)


In the third and final part of this series I have taken my basic skeleton from the previous two blogs in order to solve the issue of bringing all of the vSAN Skyline Health checks into one central location using a vSAN Health Alarm Check Script.

My two previous blogs can be found here:
NSX Backup Check Script (Using the NSX Web API)
NSX Alarm Check Script (Using the NSX REST API)

Unfortunately this time, despite best efforts I was unable to get a suitable result using the vCenter REST API. Documentation is lacking and I was not able to get full results for the Skyline Health Checks. From asking around it seems that PowerCLI holds the answer for me, so it gave me an excuse to adapt the script again and get it to work with PowerCLI.

Again you might be asking ‘why not just use the vSAN Management Pack for vROps?’ but alas it does not keep pace with the vSAN Skyline Health and it is missing some alarms.

PowerCLI

For those not aware of everything PowerCLI can do you can find the full reference of the vSphere and vSAN cmdlets here:

https://developer.vmware.com/docs/powercli/latest/products/vmwarevsphereandvsan/

We are going to be using the Get-VSANView cmdlet in order to pull out the information from the vCenter.

We can get the health information from the “VsanVcClusterHealthSystem-vsan-cluster-health-system” Managed Object. Details of this can be found here:

https://vdc-download.vmware.com/vmwb-repository/dcr-public/3325c370-b58c-4799-99ff-58ae3baac1bd/45789cc5-aba1-48bc-a320-5e35142b50af/doc/vim.cluster.VsanVcClusterHealthSystem.html

The Code Changes

The try/catch has been changed to connect to the vCenter first and then call a function to get the vSAN Health Summary for each cluster.

try{
    # -ErrorAction Stop ensures a failed connection reaches the catch block
    Connect-VIServer -Server $vCenter -Credential $encodedlogin -ErrorAction Stop
    $Clusters = Get-Cluster

    foreach ($Cluster in $Clusters) {
        Get-VsanHealthSummary -Cluster $Cluster
    }
}
catch {catchFailure}
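
One thing worth knowing about this pattern is that try/catch only reacts to terminating errors, while many cmdlets report failures as non-terminating errors by default. The sketch below illustrates the difference, using Write-Error as a stand-in for a failing cmdlet. The function name Invoke-FlakyStep is hypothetical and is not part of the script.

```powershell
# Hypothetical stand-in for a cmdlet that reports failure as a non-terminating error
function Invoke-FlakyStep {
    [CmdletBinding()]
    param()
    Write-Error "Could not connect"
}

$caughtWithoutStop = $false
try {
    # Non-terminating error: execution carries on and the catch block is skipped
    Invoke-FlakyStep -ErrorAction SilentlyContinue
}
catch { $caughtWithoutStop = $true }

$caughtWithStop = $false
try {
    # -ErrorAction Stop promotes the error to a terminating one, so catch runs
    Invoke-FlakyStep -ErrorAction Stop
}
catch { $caughtWithStop = $true }

"Caught without Stop: $caughtWithoutStop"
"Caught with Stop:    $caughtWithStop"
```

If a cmdlet inside the try block does not seem to trigger the catch, adding -ErrorAction Stop to that cmdlet is usually the first thing to try.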

So let’s have a look at the function itself.

The Get vSAN Cluster Health function

I have written a function to take in a cluster name as a parameter, find the Managed Object Reference (MORef) for the cluster, and then query the vCenter for the vSAN cluster health for that MORef and output any tests which are Yellow (Warning) or Red (Critical).

Function Get-VsanHealthSummary {

    param(
        [Parameter(Mandatory=$true)][String]$Cluster
    )
    
    $vchs = Get-VSANView -Id "VsanVcClusterHealthSystem-vsan-cluster-health-system"
    $cluster_view = (Get-Cluster -Name $Cluster).ExtensionData.MoRef
    $results = $vchs.VsanQueryVcClusterHealthSummary($cluster_view,$null,$null,$true,$null,$null,'defaultView')
    $healthCheckGroups = $results.groups
    $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")

    foreach($healthCheckGroup in $healthCheckGroups) {

        $Health = @("Yellow","Red")
        # A group can hold several failing tests, so handle each one separately
        $failedTests = $healthCheckGroup.grouptests | where TestHealth -in $Health

        foreach($failedTest in $failedTests) {
            $healthCheckTestHealth = $failedTest.TestHealth
            $healthCheckTestName = $failedTest.TestName
            $healthCheckTestShortDescription = $failedTest.TestShortDescription

            if ($healthCheckTestHealth -eq "yellow") {
                $healthCheckTestHealthAlt = "Warning"
            }
            if ($healthCheckTestHealth -eq "red") {
                $healthCheckTestHealthAlt = "Critical"
            }

            Add-Content -Path $exportpath -Value "$timestamp [$healthCheckTestHealthAlt] $vCenter - vSAN Clustername $Cluster vSAN Alarm Name $healthCheckTestName Alarm Description $healthCheckTestShortDescription"
            Start-Sleep -Seconds 1
        }
    }
}
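
To try the filtering and severity mapping without a vCenter to hand, something like the sketch below can help. It uses made-up objects that only mimic the GroupName, GroupTests, TestHealth, TestName and TestShortDescription properties used above. The helper name Get-FailedHealthTest and all the mock values are assumptions for illustration.

```powershell
# Mock data shaped like the properties the function reads (all values are made up)
$mockGroups = @(
    [pscustomobject]@{
        GroupName  = "Mock group A"
        GroupTests = @(
            [pscustomobject]@{ TestName = "Mock test 1"; TestHealth = "green";  TestShortDescription = "Healthy check" }
            [pscustomobject]@{ TestName = "Mock test 2"; TestHealth = "red";    TestShortDescription = "Failing check" }
        )
    }
    [pscustomobject]@{
        GroupName  = "Mock group B"
        GroupTests = @(
            [pscustomobject]@{ TestName = "Mock test 3"; TestHealth = "yellow"; TestShortDescription = "First warning" }
            [pscustomobject]@{ TestName = "Mock test 4"; TestHealth = "yellow"; TestShortDescription = "Second warning" }
        )
    }
)

# Hypothetical helper: returns one object per failing test across all groups
function Get-FailedHealthTest {
    param([Parameter(Mandatory=$true)][object[]]$Groups)

    $severity = @{ yellow = "Warning"; red = "Critical" }

    foreach ($group in $Groups) {
        foreach ($test in $group.GroupTests) {
            $key = "$($test.TestHealth)".ToLower()
            if ($severity.ContainsKey($key)) {
                [pscustomobject]@{
                    Severity    = $severity[$key]
                    Group       = $group.GroupName
                    TestName    = $test.TestName
                    Description = $test.TestShortDescription
                }
            }
        }
    }
}

$failed = @(Get-FailedHealthTest -Groups $mockGroups)
$failed | ForEach-Object { "[{0}] {1} - {2}" -f $_.Severity, $_.Group, $_.TestName }
```

Returning one object per failing test keeps each alarm on its own log line, which tends to be easier to query once it reaches the syslog server.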

Saving Credentials

This time, as we are using PowerCLI and Connect-VIServer, we cannot use the encoded credentials we used last time for the Web and REST APIs, so we will use the Export-Clixml cmdlet, which creates an XML-based representation of an object and stores it in a file.

Further details of this utility can be found here:

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-clixml?view=powershell-7.3

We will use Get-Credential to bring in the username and password to store, and then export the credential to the path defined in the variables at the top of the script.

if (-Not(Test-Path -Path  $credPath)) {
    $credential = Get-Credential
    $credential | Export-Clixml -Path $credPath

}

$encodedlogin = Import-Clixml -Path $credPath
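
As a quick way to check what ends up in the file, the sketch below round-trips a dummy credential through Export-Clixml and Import-Clixml. The username, password and temporary file name are made up for illustration. On Windows, Export-Clixml protects the password with the Data Protection API, so the file can only be read back by the same user on the same computer. This means the credential file should be created by the account that will run the scheduled script.

```powershell
# Assumed values for illustration only; the real script prompts with Get-Credential
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) "demo-vcenter.cred"
$securePass = ConvertTo-SecureString "NotARealPassword" -AsPlainText -Force
$demoCred = New-Object System.Management.Automation.PSCredential("svc-vsan@vsphere.local", $securePass)

# Store the credential, then read it back as the scheduled script would
$demoCred | Export-Clixml -Path $demoPath
$restored = Import-Clixml -Path $demoPath

# The restored object is a PSCredential, which is what -Credential expects
$restored.GetType().Name
$restored.UserName

# Tidy up the demo file
Remove-Item -Path $demoPath
```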

Handling the Outputs.

As per my previous scripts the outputs are formatted to be ingested into a syslog server (vRealize Log Insight in this case) which would then send emails to the appropriate places and allow for a nice dashboard for quick whole estate checks.

The Final vSAN Health Alarm Check Script

I have put all the variables at the top and the script is designed to be run in a folder and to have another separate folder with the logs. This was done in order to manage multiple scripts logging to the same location, e.g.:
c:\scripts\NSXBackupCheck\NSXBackupCheck.ps1
c:\scripts\Logs\NSXBackupCheck.log

param ($vCenter)

$curDir = &{$MyInvocation.PSScriptRoot}
$exportpath = "$curDir\..\Logs\vSANAlarmCheck.log"
$credPath = "$curDir\$vCenter.cred"
$scriptName = &{$MyInvocation.ScriptName}

add-type @"
   using System.Net;
   using System.Security.Cryptography.X509Certificates;
   public class TrustAllCertsPolicy : ICertificatePolicy {
      public bool CheckValidationResult(
      ServicePoint srvPoint, X509Certificate certificate,
      WebRequest request, int certificateProblem) {
      return true;
   }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

Function Get-VsanHealthSummary {

    param(
        [Parameter(Mandatory=$true)][String]$Cluster
    )
    
    $vchs = Get-VSANView -Id "VsanVcClusterHealthSystem-vsan-cluster-health-system"
    $cluster_view = (Get-Cluster -Name $Cluster).ExtensionData.MoRef
    $results = $vchs.VsanQueryVcClusterHealthSummary($cluster_view,$null,$null,$true,$null,$null,'defaultView')
    $healthCheckGroups = $results.groups
    $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")

    foreach($healthCheckGroup in $healthCheckGroups) {

        $Health = @("Yellow","Red")
        # A group can hold several failing tests, so handle each one separately
        $failedTests = $healthCheckGroup.grouptests | where TestHealth -in $Health

        foreach($failedTest in $failedTests) {
            $healthCheckTestHealth = $failedTest.TestHealth
            $healthCheckTestName = $failedTest.TestName
            $healthCheckTestShortDescription = $failedTest.TestShortDescription

            if ($healthCheckTestHealth -eq "yellow") {
                $healthCheckTestHealthAlt = "Warning"
            }
            if ($healthCheckTestHealth -eq "red") {
                $healthCheckTestHealthAlt = "Critical"
            }

            Add-Content -Path $exportpath -Value "$timestamp [$healthCheckTestHealthAlt] $vCenter - vSAN Clustername $Cluster vSAN Alarm Name $healthCheckTestName Alarm Description $healthCheckTestShortDescription"
            Start-Sleep -Seconds 1
        }
    }

}

function catchFailure {
    $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    if (Test-Connection -BufferSize 32 -Count 1 -ComputerName $vCenter -Quiet) {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $vCenter - $_"
    }
    else {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $vCenter - Host Not Found"
    }
exit
}

if (!$vCenter) {
    Write-Host "please provide parameter 'vCenter' in the format '$scriptName -vCenter [FQDN of vCenter Server]'"
    exit
    }

if (-Not(Test-Path -Path  $credPath)) {
    $credential = Get-Credential
    $credential | Export-Clixml -Path $credPath

}

$encodedlogin = Import-Clixml -Path $credPath


try{
    # -ErrorAction Stop ensures a failed connection reaches the catch block
    Connect-VIServer -Server $vCenter -Credential $encodedlogin -ErrorAction Stop
    $Clusters = Get-Cluster

    foreach ($Cluster in $Clusters) {
        Get-VsanHealthSummary -Cluster $Cluster
    }
}
catch {catchFailure}

Disconnect-VIServer $vCenter -Confirm:$false

Overview

The final script above can be altered to be used as a skeleton for any other PowerShell or PowerCLI commands, as well as being adapted for REST APIs and Web APIs as per the previous blogs. It is important to note that these will use a different credential store function.

The two previous blogs can be found here:
NSX Backup Check Script (Using the NSX Web API)
NSX Alarm Check Script (Using the NSX REST API)

NSX Alarm Check Script (Using the NSX REST API)


In my previous blog I created a script to get the last backup status from NSX Manager in order to quickly check multiple NSX Managers. Today I had a need to bring the alarms raised in all of these NSX Managers into one single location, which necessitated creating an NSX Alarm Check Script.
‘But surely the NSX Management Pack would allow you to do this?’ you may ask. Unfortunately it is missing some of the alarms which get raised on the NSX Managers, such as passwords expiring. That one is an annoyance if you do not notice until after the password has expired and you are having LDAP issues.

Now luckily this time, we CAN use the NSX REST API to get these details, and I had a script lying around which could provide a skeleton for this. You can find that script here: NSX Backup Check Script

In order to adapt this script to use REST we need to change the Invoke-WebRequest to Invoke-RestMethod.

Interrogating NSX REST API

I used the documentation from VMware {code} to find this API and how to handle the results. Luckily this is a lot more detailed than the web API. You can find the NSX API details here:

https://developer.vmware.com/apis/547/nsx-t

So we want to request /api/v1/alarms in order to return a list of all alarms on the NSX Managers.

$result = Invoke-RestMethod -Uri https://$nsxmgr/api/v1/alarms -Headers $Header -Method 'GET' -UseBasicParsing

Handling the Outputs.

Running this command will give a response similar to this:

{
  "result_count": 4,
  "results": [
      {
        "id": "xxxx",
        "status": "OPEN",
        "feature_name": "manager_health",
        "event_type": "manager_cpu_usage_high",
        "feature_display_name": "Manager Health",
        "event_type_display_name": "CPU Usage High",
        "node_id": "xxxx",
        "last_reported_time": 1551994806,
        "description": |
          "The CPU usage for the manager node identified by 
           appears to be\nrising.",
        "recommended_action": |
          "Use the top command to check which processes have the most CPU
           usages, and\nthen check \/var\/log\/syslog and these processes'
           local logs to see if there\nare any outstanding errors to be
           resolved.",
        "node_resource_type": "ClusterNodeConfig",
        "severity": "WARNING",
        "entity_resource_type": "ClusterNodeConfig",
      },
      ...
  ]
}

From this output I wanted to pull out the severity, status, alarm description and the node which was impacted, so I pulled these into an array and added the items to variables.

$nsxAlarms = $result.results 
    foreach ($nsxAlarm in $nsxalarms) {
        $nsxAlarmCreated = (get-date 01.01.1970).AddSeconds([int]($nsxAlarm._create_time/1000)).ToString("yyyy/MM/dd HH:mm:ss")
        $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
        $nsxAlarmSeverity = $nsxAlarm.severity
        $nsxAlarmStatus = $nsxAlarm.status
        $nsxAlarmNode_display_name = $nsxAlarm.node_display_name
        $nsxAlarmDescription = $nsxAlarm.description

From here I wanted to only include any alarms which had not been marked acknowledged or resolved to avoid constantly reporting a condition which was known about.

if($nsxAlarm.status -ne "ACKNOWLEDGED" -and $nsxAlarm.status -ne "RESOLVED"){ 
    Add-Content -Path $exportpath -Value "$timestamp [$nsxAlarmSeverity] $NSXMGR - Alarm Created $nsxAlarmCreated Status $nsxAlarmStatus Affected Node $nsxAlarmNode_display_name Description  $nsxAlarmDescription"
}

It is also possible to have the API do this filtering by requesting only open alarms with the following call; however, I wanted to pull in all alarms for my specific use case.

GET /api/v1/alarms?status=OPEN
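
The difference between the two approaches can be seen with a few made-up alarm objects. In the sketch below the ids, severities and the SUPPRESSED status are assumptions for illustration. Filtering out ACKNOWLEDGED and RESOLVED on the client side keeps every other status, whereas asking the API for status=OPEN would return only the open alarms.

```powershell
# Made-up alarm objects; only the 'status' property matters for this sketch
$mockAlarms = @(
    [pscustomobject]@{ id = "a1"; status = "OPEN";         severity = "WARNING" }
    [pscustomobject]@{ id = "a2"; status = "ACKNOWLEDGED"; severity = "WARNING" }
    [pscustomobject]@{ id = "a3"; status = "RESOLVED";     severity = "WARNING" }
    [pscustomobject]@{ id = "a4"; status = "SUPPRESSED";   severity = "WARNING" }
)

# Same logic as the -ne / -and test above, written with -notin
$ignoredStatuses = @("ACKNOWLEDGED", "RESOLVED")
$toReport = @($mockAlarms | Where-Object { $_.status -notin $ignoredStatuses })

# What a server-side status=OPEN filter would be expected to leave
$openOnly = @($mockAlarms | Where-Object { $_.status -eq "OPEN" })

"Client-side filter keeps: $($toReport.id -join ', ')"
"OPEN only keeps:          $($openOnly.id -join ', ')"
```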

As per the previous script, this was wrapped in a try catch and the catch failure tested if the host was up. A full explanation can be found on the blog about this script here: NSX Backup Check Script.

The Final NSX Alarm Check Script

param ($nsxmgr)

$curDir = &{$MyInvocation.PSScriptRoot}
$exportpath = "$curDir\..\Logs\NSXAlarmCheck.log"
$credPath = "$curDir\$nsxmgr.cred"
$scriptName = &{$MyInvocation.ScriptName}

add-type @"
   using System.Net;
   using System.Security.Cryptography.X509Certificates;
   public class TrustAllCertsPolicy : ICertificatePolicy {
      public bool CheckValidationResult(
      ServicePoint srvPoint, X509Certificate certificate,
      WebRequest request, int certificateProblem) {
      return true;
   }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

function catchFailure {
    $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    if (Test-Connection -BufferSize 32 -Count 1 -ComputerName $nsxmgr -Quiet) {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - $_"
    }
    else {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - Host Not Found"
    }
exit
}

if (!$nsxmgr) {
    Write-Host "please provide parameter 'nsxmgr' in the format '$scriptName -nsxmgr [FQDN of NSX Manager]'"
    exit
    }

if (-Not(Test-Path -Path  $credPath)) {
    $username = Read-Host "Enter username for NSX Manager" 
    $pass = Read-Host "Enter password" -AsSecureString 
    $password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($pass))
    $userpass  = $username + ":" + $password

    $bytes= [System.Text.Encoding]::UTF8.GetBytes($userpass)
    $encodedlogin=[Convert]::ToBase64String($bytes)
    
    Set-Content -Path $credPath -Value $encodedlogin
}

$encodedlogin = Get-Content -Path $credPath

$authheader = "Basic " + $encodedlogin
$header = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$header.Add("Authorization",$authheader)

try{
    $result = Invoke-RestMethod -Uri https://$nsxmgr/api/v1/alarms -Headers $Header -Method 'GET' -UseBasicParsing

        $nsxAlarms = $result.results 
        foreach ($nsxAlarm in $nsxalarms) {
            
            $nsxAlarmCreated = (get-date 01.01.1970).AddSeconds([int]($nsxAlarm._create_time/1000)).ToString("yyyy/MM/dd HH:mm:ss")
            $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
            $nsxAlarmSeverity = $nsxAlarm.severity
            $nsxAlarmStatus = $nsxAlarm.status
            $nsxAlarmNode_display_name = $nsxAlarm.node_display_name
            $nsxAlarmDescription = $nsxAlarm.description

            if($nsxAlarm.status -ne "ACKNOWLEDGED" -and $nsxAlarm.status -ne "RESOLVED"){ 
                Add-Content -Path $exportpath -Value "$timestamp [$nsxAlarmSeverity] $NSXMGR - Alarm Created $nsxAlarmCreated Status $nsxAlarmStatus Affected Node $nsxAlarmNode_display_name Description  $nsxAlarmDescription"
            }
        
    }
 }
catch {catchFailure}

Overview

The final script above can be altered to be used as a skeleton for any other Invoke-RestMethod APIs, as well as simply being adapted for the Web API. I will be following up this post with further updates to adapt the script in order to use PowerCLI, which required a different credential store.


NSX Backup Check Script (Using the NSX Web API)


I was recently asked for a way to have a simple check and report on the last backup status for a global company with multiple VMware NSX managers.

For some reason their NSX Managers were not reporting the backup status via syslog to VMware vRealize Log Insight (vRLI), and even if they were, they only have one vRLI cluster per site and wanted one simple place to do their daily checks.

So let’s make an NSX backup check script, PowerShell and REST API to the rescue! … right?

So … no, you cannot get the backup status via REST API.

Great.

But you can via the WebAPI!

Hurrah! Let’s throw in some Invoke-WebRequest and get the data we need.

After some basic checks, I got the info I wanted – now I need to schedule it and have it run a short period after the backup window.

This part resulted in a path of trying to figure out a way to hold account passwords in a usable manner without them being written in clear-text anywhere because that’s just no good. There are a few different ways to do this, but they either tie it to one user profile and computer, or don’t work with the basic auth needed to run against NSX to get the data via webrequest. I will go into how I achieved that further down, but first, the web API to get backup status.

Interrogating NSX Web API

After some looking around, I discovered the following URL called via Invoke-WebRequest would give us the backup results:

Invoke-WebRequest -Uri https://[nsxmgr]/api/v1/cluster/backups/overview -Headers $Header 

Now the big problem with Invoke-WebRequest is that you might assume it would hand back any response status, such as 403 Forbidden, for you to inspect. Nope! A non-success status code is thrown as an error instead.

You don’t get any helpful error catching; it either works or bombs out. Not much good for an unattended script that you want to tell you about any issues.

So the best fix I came up with was using a try and catch.

try { 
    $result = Invoke-WebRequest -Uri https://...
    }
catch {catchFailure}

I then created a function to run in the event of a failure, which pings the host to see if it’s online. If it is, the function outputs the error; if it isn’t, it outputs that the host is unreachable.

if (Test-Connection -BufferSize 32 -Count 1 -ComputerName $nsxmgr -Quiet) {
        <error output>
    } else { <host offline output> } exit

Job jobbed, no more bombing out with red text.

Dealing with certificates

When you run Invoke-WebRequest against an NSX Manager with self-signed certificates you get the error "The Underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel".

The fix for this is to add this code near the top of your script. Be aware that it turns off certificate validation for web requests made in that PowerShell session.

add-type @"
   using System.Net;
   using System.Security.Cryptography.X509Certificates;
   public class TrustAllCertsPolicy : ICertificatePolicy {
      public bool CheckValidationResult(
      ServicePoint srvPoint, X509Certificate certificate,
      WebRequest request, int certificateProblem) {
      return true;
   }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
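
The approach above is aimed at Windows PowerShell. As far as I know, PowerShell 6 and later do not honour the ServicePointManager setting for Invoke-WebRequest and provide a -SkipCertificateCheck switch instead. A minimal sketch of building the request parameters so that the same script can run on either version might look like this. The host name is made up and the splatting layout is my own assumption rather than part of the original script.

```powershell
# Hypothetical host name for illustration
$nsxmgr = "nsx01.example.com"

# Parameters common to both PowerShell versions
$requestParams = @{
    Uri             = "https://$nsxmgr/api/v1/cluster/backups/overview"
    UseBasicParsing = $true
}

# -SkipCertificateCheck only exists on PowerShell 6 and later
if ($PSVersionTable.PSVersion.Major -ge 6) {
    $requestParams["SkipCertificateCheck"] = $true
}

# The request itself would then be: Invoke-WebRequest @requestParams -Headers $header
$requestParams.Keys | Sort-Object
```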

Password Management

Great stuff, I now have a working script, but ideally I want it to be scheduled and unattended.

This is where I spun around for a while trying different ways to store credentials in a secure format, because passwords in plain text is uncool.

I initially was trying to use the encoded credentials modules but was having little luck getting them passed as a header value, so I bugged a colleague (@pauldavey_79) for some help and ideas from his many years of experience prodding APIs.

What we came up with was to take the username and password as requested input via Read-Host, encode them in the Base64 format required to pass via the header in Invoke-WebRequest, and store that in a text file.

$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($pass))
$userpass  = $username + ":" + $password

$bytes= [System.Text.Encoding]::UTF8.GetBytes($userpass)
$encodedlogin=[Convert]::ToBase64String($bytes)

Set-Content -Path $credPath -Value $encodedlogin
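
To see what actually lands in the file and in the header, here is a small sketch with dummy values. The username and password are made up. It also decodes the value again, which shows that Base64 is an encoding rather than encryption, so it is worth restricting who can read the .cred file.

```powershell
# Dummy values for illustration only
$username = "audit"
$password = "NotARealPassword"

# Encode "user:password" as Base64 and build the Basic authentication header
$bytes = [System.Text.Encoding]::UTF8.GetBytes("${username}:${password}")
$encodedlogin = [Convert]::ToBase64String($bytes)
$header = @{ Authorization = "Basic $encodedlogin" }

# Decoding shows the stored value can be reversed by anyone who can read it
$decoded = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($encodedlogin))

$header.Authorization
$decoded
```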

This worked a charm.

Handling the Outputs.

With this script I wanted to feed the output into a syslog server (vRealize Log Insight in this case) which would then send emails to the appropriate places and allow for a nice dashboard for quick whole estate checks.

In order to achieve this, I used the Add-Content command to append the data to a .log file which was monitored by the Log Insight Agent and sent off to the Log Insight Server.

if($LatestBackup.success -eq $true){ 
  Add-Content -Path $exportpath -Value "$timestamp [INFO] $NSXMGR - Last backup successful. Start time $start End time $end"
} else{ 
  Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - Last backup failed $start $end"
}

This gives us a nice syslog formatted output which can be easily manipulated within Log Insight. Hurrah.

One thing to note is that the NSX WebAPI returned the start and end times as Unix epoch timestamps in milliseconds, so I needed to convert them to a more human-readable date. That was done with the line:

 $var = (get-date 01.01.1970).AddSeconds([int]($LatestBackup.end_time/1000))
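
An alternative that avoids the manual division is the .NET DateTimeOffset type, which can convert a millisecond timestamp directly and makes it explicit that the result is UTC. This is a sketch rather than what the script uses. The timestamp value is made up, and FromUnixTimeMilliseconds needs .NET Framework 4.6 or later.

```powershell
# Example value in milliseconds since 1970-01-01 UTC (made up for illustration)
$endTimeMs = 1551994806000

# Convert straight from milliseconds; the result is explicitly UTC
$endUtc = [DateTimeOffset]::FromUnixTimeMilliseconds($endTimeMs).UtcDateTime

# Convert to the local time zone of the machine running the script if preferred
$endLocal = $endUtc.ToLocalTime()

# Format in the same style as the log timestamps, independent of regional settings
$endUtc.ToString("yyyy/MM/dd HH:mm:ss", [System.Globalization.CultureInfo]::InvariantCulture)
```

Whichever method is used, it is worth deciding whether the log should carry UTC or local times so that they line up with the timestamps written by Get-Date.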

I also needed to get my try-catch error collector to output the error messages in the same format so that was done as so:

Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - $_"

Pulling all of that together we get the final script which can be used as a skeleton for any future work required. A few of them will be posted at a later date.

The Final NSX Backup Check Script

I have put all the variables at the top and the script is designed to be run in a folder and to have another separate folder with the logs. This was done in order to manage multiple scripts logging to the same location, e.g.:
c:\scripts\NSXBackupCheck\NSXBackupCheck.ps1
c:\scripts\Logs\NSXBackupCheck.log

param ($nsxmgr)

$curDir = &{$MyInvocation.PSScriptRoot}
$exportpath = "$curDir\..\Logs\NSXBackupCheck.log"
$credPath = "$curDir\$nsxmgr.cred"
$scriptName = &{$MyInvocation.ScriptName}

add-type @"
   using System.Net;
   using System.Security.Cryptography.X509Certificates;
   public class TrustAllCertsPolicy : ICertificatePolicy {
      public bool CheckValidationResult(
      ServicePoint srvPoint, X509Certificate certificate,
      WebRequest request, int certificateProblem) {
      return true;
   }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

function catchFailure {
    $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    if (Test-Connection -BufferSize 32 -Count 1 -ComputerName $nsxmgr -Quiet) {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - $_"
    }
    else {
        Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - Host Not Found"
    }
exit
}

if (!$nsxmgr) {
    Write-Host "please provide parameter 'nsxmgr' in the format '$scriptName -nsxmgr [FQDN of NSX Manager]'"
    exit
    }

if (-Not(Test-Path -Path  $credPath)) {
    $username = Read-Host "Enter username for NSX Manager" 
    $pass = Read-Host "Enter password" -AsSecureString 
    $password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($pass))
    $userpass  = $username + ":" + $password

    $bytes= [System.Text.Encoding]::UTF8.GetBytes($userpass)
    $encodedlogin=[Convert]::ToBase64String($bytes)
    
    Set-Content -Path $credPath -Value $encodedlogin
}

$encodedlogin = Get-Content -Path $credPath

$authheader = "Basic " + $encodedlogin
$header = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$header.Add("Authorization",$authheader)

try{
    $result = Invoke-WebRequest -Uri https://$nsxmgr/api/v1/cluster/backups/overview -Headers $Header -UseBasicParsing
    if($result.StatusCode -eq 200) {
        $nsxbackups = $result.Content | ConvertFrom-Json
        $LatestBackup = $nsxbackups.backup_operation_history.cluster_backup_statuses
        $start = (get-date 01.01.1970).AddSeconds([int]($LatestBackup.start_time/1000))
        $end = (get-date 01.01.1970).AddSeconds([int]($LatestBackup.end_time/1000))
        $timestamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
        if($LatestBackup.success -eq $true){ 
            Add-Content -Path $exportpath -Value "$timestamp [INFO] $NSXMGR - Last backup successful. Start time $start End time $end"
        } else{ 
            Add-Content -Path $exportpath -Value "$timestamp [ERROR] $NSXMGR - Last backup failed $start $end"
        }
    }
 }
catch {catchFailure}

Overview

The final script above can be altered to be used as a skeleton for any other Invoke-WebRequest APIs, as well as simply being adapted for REST APIs. I will be following up this post with further updates to this script using the REST API, and also an adaptation to use PowerCLI, which required a different credential store.

The REST API Script can be found here: NSX Alarm Check Script

Registration failed: Log Insight Adapter Object Missing

I recently came across a problem at a client’s with integrating Log Insight (vRLI) with vROps. The connection tests successfully and alert integration works; however, launch in context returns the error “Registration failed: Log Insight Adapter Object Missing”.

After a discussion with GSS it was discovered this is actually a known issue due to the vROps cluster being behind a load balancer, and the following errors are shown in the Log Insight log /storage/var/loginsight/vcenter_operations.log:

[2018-05-15 09:51:02.621+0000] ["https-jsse-nio-443-exec-3"/10.205.73.139 INFO] [com.vmware.loginsight.vcopssuite.VcopsSuiteApiRequest] [Open connection to URL https://vrops.domain.com/suite-api/api/versions/current]
[2018-05-15 09:51:02.621+0000] ["https-jsse-nio-443-exec-3"/10.205.73.139 INFO] [com.vmware.loginsight.vcopssuite.VcopsSuiteApiRequest] [http connection, setting request method 'GET' and content type 'application/json; charset=utf-8']
[2018-05-15 09:51:02.621+0000] ["https-jsse-nio-443-exec-3"/10.205.73.139 INFO] [com.vmware.loginsight.vcopssuite.VcopsSuiteApiRequest] [reading server response]
[2018-05-15 09:51:02.626+0000] ["https-jsse-nio-443-exec-3"/10.205.73.139 ERROR] [com.vmware.loginsight.vcopssuite.VcopsSuiteApiRequest] [failed to post resource to vRealize Operations Manager]
javax.net.ssl.SSLProtocolException: handshake alert:  unrecognized_name

This is caused by some security updates to the Apache Struts, JRE, kernel-default, and other libraries from vRealize Log Insight 4.5.1. These updated libraries affect the SSL Handshake that takes place when testing the vRealize Operations Manager integration.

To resolve this issue we needed to add the FQDN of the vROps load balancer as an alias to the apache2 config. This can be done by following these steps.

  1. Log into the vRealize Operations Manager Master node as root via SSH or Console.
  2. Open /usr/lib/vmware-vcopssuite/utilities/conf/vcops-apache.conf in a text editor.
  3. Find the ServerName ${VCOPS_APACHE_SERVER_NAME} line and insert a new line after it.
  4. On the new line enter the following:

ServerAlias vrops.domain.com

     Note: Replace vrops.domain.com with the FQDN of vRealize Operations Manager’s load balancer.
  5. Save and close the file.
  6. Restart the apache2 service:

service apache2 restart

  7. Repeat steps 1-6 on all nodes in the vRealize Operations Manager cluster.

VMware vCenter Security Log Events

I had a requirement from a customer to identify log events in order to create alerts for several threat scenarios. This post is intended to provide a high-level description of the results for the scenarios for future reference or in case anyone finds a use. Please see the earlier post on enabling additional vCenter and PSC logging. http://www.caenotech.co.uk/vmware/configuration-of-rsyslog-on-vcsa-and-psc/

Access to vCenter Administrator role

The objective of the following is to ensure nobody other than certain colleagues has access to the Cryptography operations within vCenter and that all work carried out on crypto operations is done under suitable change control.

As can be seen, the default syslog details the Administrator user logging in as VSPHERE.LOCAL\Administrator and the IP address it originated from.

<datetime> <vCenterHostname> vcenter-server: User <Domain>\<Username>@<IPAddress> logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.UserLoginSessionEvent] [info] [<Domain>\<Username>] [] [LineID] [User <Domain>\<Username>@<IPAddress> logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>]

<datetime> <vCenterHostname> vcenter-server: User <Domain>\<Username>@<IPAddress> logged out (login time: <datetime>, number of API invocations: <x>, user agent: JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>)

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.UserLogoutSessionEvent] [info] [<Domain>\<Username>] [] [LineID] [User <Domain>\<Username>@<IPAddress> logged out (login time: <datetime>, number of API invocations: <x>, user agent: JAX-WS RI 2.2.9-b130926.1035 svn-revisions#<UID>)]

The text strings “vim.event.UserLoginSessionEvent” and “vim.event.UserLogoutSessionEvent” can be used to alert on people logging in to and out of the vCenter.
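
As a rough illustration of matching on those strings, the sketch below runs Select-String over a few sample lines. The lines follow the formats shown above, but the dates, host name, IDs and addresses are made up. In practice the alerting would be built in the syslog server rather than in PowerShell.

```powershell
# Sample lines modelled on the formats above; all field values are made up
$sampleLines = @(
    '2023-01-10T09:15:02 vc01 vpxd 1234 - - Event [100] [1-1] [2023-01-10T09:15:02] [vim.event.UserLoginSessionEvent] [info] [VSPHERE.LOCAL\Administrator] [] [100] [User VSPHERE.LOCAL\Administrator@10.0.0.5 logged in as JAX-WS RI 2.2.9-b130926.1035 svn-revisions#abc]'
    '2023-01-10T09:16:40 vc01 vpxd 1234 - - Event [101] [1-1] [2023-01-10T09:16:40] [vim.event.RoleAddedEvent] [info] [] [] [101] [New role demoRole created]'
    '2023-01-10T09:20:11 vc01 vpxd 1234 - - Event [102] [1-1] [2023-01-10T09:20:11] [vim.event.UserLogoutSessionEvent] [info] [VSPHERE.LOCAL\Administrator] [] [102] [User VSPHERE.LOCAL\Administrator@10.0.0.5 logged out]'
)

# Match either the login or the logout session event type
$sessionEvents = @($sampleLines | Select-String -Pattern 'vim\.event\.User(Login|Logout)SessionEvent')

# Show which event type was found on each matching line
$sessionEvents | ForEach-Object { $_.Matches[0].Value }
```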


Alteration of vCenter Roles

Creation of a new vCenter role “newCryptoRole”

From the default log we can see that the new role is created; however, it does not show by whom or which permissions it was given.

<datetime> <vCenterHostname> vcenter-server: New role <roleName> created

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleAddedEvent] [info] [] [] [LineID] [New role <roleName> created]

This is where the additional vpxd-svcs log is required for details of who completed the action and what permissions were assigned to the role.

[tomcat-exec-176  INFO  AuthorizationService.AuditLog  opId=] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Add role Id=-922973018,Name=newCryptoRole,Description=,Tenant=Privileges=[System.Anonymous, System.Read, System.View, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, Cryptographer.EncryptNew]

Modification of permissions to any vCenter role

<datetime> <vCenterHostname> vcenter-server: Role modified 
Previous name: <roleName>, new name <newRoleName>
Added privileges: <privilegesAdded>
Removed privileges: <privilegesRemoved>

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleUpdatedEvent] [info] [] [] [LineID] [Role modified 
Previous name: <roleName>, new name <newRoleName>
Added privileges: <privilegesAdded>
Removed privileges: <privilegesRemoved>]

From the default log we can see that the role is modified and which permissions have been added; however, it does not show by whom. This is where the additional vpxd-svcs log is required for details of who completed the action.

[tomcat-exec-17  INFO  AuthorizationService.AuditLog  opId=a794037d-a725-4b89-ab96-d3a23a58648c] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Update role Id=-922973018,Name=newCryptoRole,Description=,Tenant=Privileges=[System.Anonymous, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, System.View, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, System.Read, Cryptographer.EncryptNew, Network.Assign, Network.Config, Network.Move, Network.Delete, Task.Create, Task.Update]

Deletion of a vCenter role

<datetime> <vCenterHostname> vcenter-server: New role <roleName> removed

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.RoleRemovedEvent] [info] [] [] [LineID] [Role <roleName> removed]

The default log shows that the role was removed, but not who removed it. For that, the additional vpxd-svcs log is required:

 
[tomcat-exec-2  INFO  AuthorizationService.AuditLog  opId=c0100be8-9114-4e60-9520-4cf1b6015793] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Delete role -922973018  

Assignment of User to a Role

Assigning a user to a role is not recorded in the default logs; it requires the additional vpxd-svcs log:

 [tomcat-exec-232  INFO  AuthorizationService.AuditLog  opId=] Action performed by principal(name=VSPHERE.LOCAL\Administrator,isGroup=false):Added access control [ Principal=Name=VSPHERE.LOCAL\newCryptoUser,isGroup=false,roles=[-922973018],propogating=true ] to document urn:acl:global:permissions

If you attempt to assign a user to a role with higher privileges than your current user holds, you will receive an error message in the vCenter Web UI.

Additionally, the following event is recorded in vpxd-svcs.log:

[tomcat-exec-293  WARN  com.vmware.cis.authorization.impl.AclPrivilegeValidator  opId=] User VSPHERE.LOCAL\newUser does not have privileges [System.Anonymous, Cryptographer.Clone, Cryptographer.Encrypt, Cryptographer.Migrate, Cryptographer.RegisterVM, Cryptographer.ManageKeyServers, Cryptographer.Decrypt, Cryptographer.AddDisk, Cryptographer.ManageKeys, Cryptographer.ManageEncryptionPolicy, System.View, Cryptographer.Access, Cryptographer.Recrypt, Cryptographer.RegisterHost, Authorization.ModifyPermissions, System.Read, Cryptographer.EncryptNew] on object urn%3Aacl%3Aglobal%3Apermissions
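A failed attempt like this is worth alerting on in its own right. The sketch below shows one way the warning could be picked apart; the regular expression is an assumption inferred from the single sample line above, and `parse_denied` is a hypothetical helper name.

```python
import re
from urllib.parse import unquote

# Pattern inferred from the AclPrivilegeValidator warning shown above (an assumption).
DENIED_RE = re.compile(
    r"AclPrivilegeValidator\s+opId=[^\]]*\]\s+"
    r"User (?P<user>\S+) does not have privileges \[(?P<privs>[^\]]*)\] "
    r"on object (?P<object>\S+)"
)


def parse_denied(line):
    """Return the user, the privileges they lacked and the target object, or None."""
    match = DENIED_RE.search(line)
    if match is None:
        return None
    return {
        "user": match.group("user"),
        "privileges": [p.strip() for p in match.group("privs").split(",") if p.strip()],
        # The object is URL-encoded in the log (urn%3Aacl%3A...), so decode it.
        "object": unquote(match.group("object")),
    }
```

Decoding the object gives `urn:acl:global:permissions`, the same document name that appears in the successful "Added access control" entry, which makes the two easy to correlate.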

Adding user to Platform Services Controller SSO Groups

To capture logs showing a user being added to the “SystemConfiguration.BashShellAdministrators” group, we require the additional logs ssoAdminServer.log and vmdird-syslog.log.

./sso/ssoAdminServer.log:

pool-4-thread-1 opId=73c87e6b-746c-46f2-9b59-a5da95f5a1c1 INFO  com.vmware.identity.admin.vlsi.PrincipalManagementServiceImpl] [User {Name: Administrator, Domain: vsphere.local} with role 'Administrator'] Adding users to local group 'SystemConfiguration.BashShellAdministrators'

./vmdird/vmdird-syslog.log:

info vmdird  t@139993972463360: MOD 1,add,member: (CN=Administrator,CN=Users,DC=vsphere,DC=local) info vmdird  t@139993972463360: Modify Entry (CN=SystemConfiguration.BashShellAdministrators,DC=vsphere,DC=local)(from 127.0.0.1)(by <PSCName>@vsphere.local)(via Ext)(USN 4974) 
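Neither log tells the whole story alone: ssoAdminServer.log names the user who made the change and the group, while vmdird-syslog.log names the member that was added. A rough sketch of extracting both pieces follows. The patterns are assumptions based only on the two sample lines above, and `WATCHED_GROUPS`, `group_add_actor` and `added_member_dn` are illustrative names of my own.

```python
import re

# ssoAdminServer.log: who added users to which local group (pattern is an assumption).
ACTOR_RE = re.compile(
    r"\[User \{Name: (?P<name>[^,]+), Domain: (?P<domain>[^}]+)\} "
    r"with role '(?P<role>[^']*)'\] "
    r"Adding users to local group '(?P<group>[^']+)'"
)
# vmdird-syslog.log: the distinguished name of the member that was added.
MEMBER_RE = re.compile(r"MOD \d+,add,member: \((?P<member_dn>[^)]+)\)")

# Groups worth alerting on; extend to suit your environment.
WATCHED_GROUPS = {"SystemConfiguration.BashShellAdministrators"}


def group_add_actor(line):
    """Return the acting user and group for additions to a watched group, else None."""
    match = ACTOR_RE.search(line)
    if match is None or match.group("group") not in WATCHED_GROUPS:
        return None
    return {
        "actor": f"{match.group('name')}@{match.group('domain')}",
        "group": match.group("group"),
    }


def added_member_dn(line):
    """Return the DN of the added member from a vmdird line, or None."""
    match = MEMBER_RE.search(line)
    return match.group("member_dn") if match else None
```

Correlating the two results still has to be done on timestamp, as the sample lines share no common identifier.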


Cryptographic Components

The objective of these alerts is to ensure that vSAN encryption is not disabled where it should be enabled, or enabled where it should not be. Equally, any tampering with the KMS (required for encryption) should be correlated back to change control / incident management.

As a user with “Administrator – No Cryptography”, if you try to disable encryption on vSAN you are not given the option, due to a lack of privileges.

Disable vSAN Encryption

In this test, vSAN encryption was disabled.  This is considered a reconfiguration of vSAN and logged accordingly.

Default vCenter logs show that vSAN is being reconfigured:

<datetime> <vCenterHostname> vcenter-server: Task: Reconfigure vSAN cluster

However, this is not much help: it indicates only that a change has been made, with no details of what changed.

The ESXi host logs are more useful: lines containing the string [VsanSystemImpl::Update] show the vSAN being reconfigured with encryption set to ‘enabled=false’.

The result was a vSAN with no encryption.

Enabling vSAN encryption

In this test, vSAN encryption was enabled.  This is considered a reconfiguration of vSAN and logged accordingly.

Default vCenter logs show that vSAN is being reconfigured:

<datetime> <vCenterHostname> vcenter-server: Task: Reconfigure vSAN cluster

<datetime> <vCenterHostname> vpxd <eventID> - - Event [<LineID>] [1-1] [<datetime>] [vim.event.TaskEvent] [info] [<domain>\<username>] [<clusterName>] [LineID] [Task: Reconfigure vSAN cluster]

In the ESXi host logs, lines containing the string [VsanSystemImpl::Update] show the vSAN being reconfigured with encryption set to ‘enabled=true’.

Adding a KMS Server

The event of adding an additional KMS is logged, specifying the KMS alias name and the KMS Cluster into which it is added.

<datetime> <vCenterHostname> vpxd <eventID> - - <date> info vpxd[<Rand>] [Originator@xxxx sub=CryptoManager opID-KmipServerPageMediator-add-xxxxx-ngc:<rand>] A new Kmip Server <KMSName> is registered in cluster <KMSCluster>

The string “A new Kmip Server” can be used to alert on a new KMS server being added to the KMS Cluster.

Delete a KMS Server

The KMS Server was unregistered from the VMware vCenter.

The following event describes the removal:

<datetime> <vCenterHostname> vpxd <eventID> - - <date> info vpxd[<Rand>] [Originator@xxxx sub=CryptoManager opID-KmipServerActionResolver-remove-xxxxx-ngc:<rand>] Kmip Server <KMSName> is removed from cluster <KMSCluster>
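Both KMS events can be caught with simple string matching. The sketch below classifies a vpxd line as a KMS addition or removal using the two message formats shown above. It assumes the KMS alias and cluster names contain no spaces, and `classify_kms_event` is a hypothetical helper name.

```python
import re

# Message formats taken from the two vpxd events shown above.
# Assumption: KMS alias and cluster names do not contain whitespace.
REGISTERED_RE = re.compile(
    r"A new Kmip Server (?P<server>\S+) is registered in cluster (?P<cluster>\S+)"
)
REMOVED_RE = re.compile(
    r"Kmip Server (?P<server>\S+) is removed from cluster (?P<cluster>\S+)"
)


def classify_kms_event(line):
    """Return (event, server, cluster) for KMS add/remove lines, else None."""
    for event, pattern in (("added", REGISTERED_RE), ("removed", REMOVED_RE)):
        match = pattern.search(line)
        if match:
            return (event, match.group("server"), match.group("cluster"))
    return None
```

Either result should have a matching change record; an event without one is the case worth raising with incident management.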

vMotion

vMotion a VM from vSAN Datastore to Local Storage

The test virtual machine (permbound1) was migrated from the vSAN datastore ‘vSANDatastore’ to local storage named ‘ds-local-ESXiHostnameLocalDS’.

The following events were recorded by the default vCenter logs.

vcenter-server: Migrating <VMname> from <ESXiHostname>, <datastoreName> to <ESXiHostname>, <datastoreName> in <vCenterDatacenter>

The event is in the format above and records the time, who carried out the migration (in the “vc_username” field), what was migrated, and the source and destination hosts and datastores.
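To alert specifically on VMs leaving encrypted vSAN storage, the migration message can be split into its parts and the datastores compared. This is only a sketch: the pattern is inferred from the message template above, it assumes host names contain no commas, and `parse_migration` and `left_vsan` are names I made up for illustration.

```python
import re

# Template: Migrating <VM> from <host>, <datastore> to <host>, <datastore> in <datacenter>
# Inferred from the event shown above; assumes host names contain no commas.
MIGRATION_RE = re.compile(
    r"Migrating (?P<vm>.+?) from (?P<src_host>[^,]+), (?P<src_ds>.+?) "
    r"to (?P<dst_host>[^,]+), (?P<dst_ds>.+?) in (?P<datacenter>.+)$"
)


def parse_migration(line):
    """Return the VM, hosts, datastores and datacenter as a dict, or None."""
    match = MIGRATION_RE.search(line.strip())
    return match.groupdict() if match else None


def left_vsan(event, vsan_datastores):
    """True when a VM moved from a vSAN datastore to a non-vSAN datastore."""
    return event["src_ds"] in vsan_datastores and event["dst_ds"] not in vsan_datastores
```

For the test above, calling `left_vsan` with `{"vSANDatastore"}` as the set of vSAN datastores would flag the move to ‘ds-local-ESXiHostnameLocalDS’.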

Configuration of rSyslog on VMware vCenter Appliance VCSA and PSC for Logging Authentication and Authorisation Activities

Introduction

As part of a client’s environment, there was a requirement from the end customer to forward additional logging information beyond the default logs forwarded by vCenter Server and Platform Services Controller (PSC).

To provide these additional logs, rSyslog must be configured to monitor and forward the relevant files.

This post is intended to provide steps to implement these changes.

Additional logging available from non-default vCenter logs

Single Sign-On Activities

  • Successful SSO Login
  • Successful SSO Logout
  • Successful SSO Active Directory Login
  • Successful SSO Active Directory Logout
  • Failed SSO Login
  • Failed SSO Login (User not found)
  • Failed SSO Active Directory Login
  • Failed SSO Active Directory Login (User not found)
  • SSO User Creation
  • SSO User Password Change
  • SSO User Deletion
  • SSO Group Creation
  • SSO Group Assignment
  • SSO Group Deletion
  • SSO Password policy update

vCenter Server Activities

  • Successful vCenter Server Login
  • Successful vCenter Server Logout
  • vSphere Permission Created
  • vSphere Permission Updated
  • vSphere Permission Deleted
  • vSphere Role Creation
  • vSphere Role Update
  • vSphere Role Deletion

In order to capture the above activities, you will need to forward the following log files:

  • /var/log/vmware/sso/vmware-sts-idmd.log
  • /var/log/vmware/sso/ssoAdminServer.log
  • /var/log/vmware/vpxd-svcs/vpxd-svcs.log
  • /var/log/vmware/vpx/vpxd.log

NOTE: I am not including the vpxd.log in my implementation below as it is an extremely verbose log and we did not require it for the security events we wished to capture. Additionally I don’t want someone blindly copying the config below without understanding it and accidentally upsetting their environment.

Implementation Steps

VMware Appliance Management Interface (VAMI)

Step 1 – Connect to the VAMI interface for all vCenters and PSCs on HTTPS with port 5480

https://<appliancename>:5480

Step 2 – Configure Syslog with the following settings.

  • Common Log Level
    • Info
  • Remote Syslog Host
    • <vRLI-LoadBalancer-VIP>
  • Remote Syslog Port
    • 6514
  • Remote Syslog Protocol
    • TLS

vCenter Server Appliance

Step 1 – SSH to the VCSA and open the following file /etc/rsyslog.conf for editing.

vi /etc/rsyslog.conf

Step 2 – Press [Insert] to put vi into insert mode and add the following entry towards the top of the file, at the bottom of the ###### Module declarations ###### section.

$ModLoad imfile

Step 3 – Add the following right below the “###### Rule declarations ######” section of the rsyslog configuration file

$InputFileName /var/log/vmware/vpxd-svcs/vpxd-svcs.log
$InputFileTag vpxd-svcs
$InputFileStateFile vpxd-svcs
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFilePollInterval 20

$InputFileName specifies the log file that we want to forward.

$InputFileTag is the app name that will show up when the log is forwarded to your remote syslog server.

$InputFileStateFile names the state file that rsyslog uses to keep track of how far it has read into the log; it must be unique for each monitored file.

$InputFilePollInterval is set to 20 seconds; the default is 10 if you leave it out.
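If you need to forward more than one of the files listed earlier, each file gets its own block of directives ending in $InputRunFileMonitor. As a sketch of how those blocks could be generated consistently rather than typed by hand, the snippet below builds them from a list of paths and tags. The tags "sts-idmd" and "ssoAdminServer" are my own assumptions (only "vpxd-svcs" comes from the config above), and the tag is reused as the state file name as in the example.

```python
def imfile_stanza(path, tag, severity="info", facility="local7"):
    """Build one legacy-format imfile block for a single monitored file."""
    return "\n".join([
        f"$InputFileName {path}",
        f"$InputFileTag {tag}",
        f"$InputFileStateFile {tag}",  # state file name must be unique per file
        f"$InputFileSeverity {severity}",
        f"$InputFileFacility {facility}",
        "$InputRunFileMonitor",        # activates monitoring for this file
    ])


def build_config(files, poll_interval=20):
    """Join the blocks for all files and set the poll interval once at the end."""
    stanzas = [imfile_stanza(path, tag) for path, tag in files]
    stanzas.append(f"$InputFilePollInterval {poll_interval}")
    return "\n\n".join(stanzas) + "\n"


# Paths are from the list of log files above; the last two tags are assumptions.
FILES = [
    ("/var/log/vmware/vpxd-svcs/vpxd-svcs.log", "vpxd-svcs"),
    ("/var/log/vmware/sso/vmware-sts-idmd.log", "sts-idmd"),
    ("/var/log/vmware/sso/ssoAdminServer.log", "ssoAdminServer"),
]

if __name__ == "__main__":
    print(build_config(FILES))
```

The output is meant to be reviewed and pasted into /etc/rsyslog.conf manually, not written to the appliance by the script.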

Step 4 – Save your changes by pressing [Esc], then typing the following and pressing Enter:

 :wq

Step 5 – Restart the rsyslog service in the VCSA for the changes to go into effect by running the following command:

systemctl restart rsyslog

Platform Services Controller Appliance

Step 1 – SSH to the PSC and open the following file /etc/rsyslog.conf for editing.

 vi /etc/rsyslog.conf

Step 2 – Press [Insert] to put vi into insert mode and add the following entry towards the top of the file, at the bottom of the ###### Module declarations ###### section.

$ModLoad imfile

Step 3 – Add the following right below the “###### Rule declarations ######” section of the rsyslog configuration file

$InputFileName /var/log/vmware/vpxd-svcs/vpxd-svcs.log
$InputFileTag vpxd-svcs
$InputFileStateFile vpxd-svcs
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFilePollInterval 20
  • $InputFileName specifies the log file that we want to forward.
  • $InputFileTag is the app name that will show up when the log is forwarded to your remote syslog server.
  • $InputFileStateFile names the state file that rsyslog uses to keep track of how far it has read into the log; it must be unique for each monitored file.
  • $InputFilePollInterval is set to 20 seconds; the default is 10 if you leave it out.

Step 4 – Save your changes by pressing [Esc], then typing the following and pressing Enter:

 :wq

Step 5 – Restart the rsyslog service in the PSC for the changes to go into effect by running the following command:

systemctl restart rsyslog