Estimate the equivalent number of VMs able to be reclaimed by rightsizing using vRealize Operations Supermetrics

When planning rightsizing events on a customer’s estate I am usually asked to estimate the number of new VMs that could be placed into the estate using the resources freed up by rightsizing.

This can be calculated relatively easily by hand, but who wants to do that when you can have something else do it for you, and even utilise it on a dashboard as a KPI?

My customer in this example has a guideline they use for an average machine on their estate, which is 4 vCPU and 32GB RAM.

So in the first example I will show the code with a fixed VM size.

This calculation uses the min function to take the lowest of an array of numbers, and the floor function to round the result down. More details here:

Estimate remaining VM Overhead using vROps – Advanced Super Metrics

The calculations used here are the oversized (excess) vCPU metric divided by the 4 vCPU of our guideline VM, and the oversized memory metric converted from kB to GB and divided by our guideline 32GB of RAM.

Remember the depth setting, which allows this supermetric to run at a higher grouping level such as a vCenter or a Custom Group.

floor(min([((sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|vcpus, depth=5}))/4),(((sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|memory, depth=5}))/1048576)/32)]))

Now this can be further expanded: instead of using a fixed VM size, we can take the average VM size of the grouping we are running this supermetric against.

To do this we replace the “4” and “32” with a calculation for the average size.

For vCPU this would be

avg(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=config|hardware|num_Cpu, depth=5}) 

For RAM this would be

avg((${adaptertype=VMWARE, objecttype=VirtualMachine, metric=config|hardware|memoryKB, depth=5})/1048576)

So our full calculation for estimating how many VMs of the average size could be placed on the resources reclaimed by rightsizing would be:

floor(min([((sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|vcpus, depth=5}))/avg(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=config|hardware|num_Cpu, depth=5})),(((sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|memory, depth=5}))/1048576)/avg((${adaptertype=VMWARE, objecttype=VirtualMachine, metric=config|hardware|memoryKB, depth=5})/1048576))]))

Sizing your migration using vRealize Operations and Supermetrics

Today I’m going to talk about using vRealize Operations and Supermetrics to size your requirements for migrating from one estate to another.

I have a customer with a large sprawling legacy vSphere estate and they are planning their migration to a new VCF deployment using HCX.

They could simply keep everything the same size and purchase the appropriate number of nodes; however, in this case that could become very expensive very quickly.

Luckily we have been monitoring the legacy estate with vROps 7.0 and 8.1 for the last year.

With this in mind I created a supermetric which calculates the total number of hosts required if all the VMs were conservatively rightsized (reducing their resource allocation by up to 50%, based on the vROps analytics calculations for recommended size) and any idle VMs which are no longer required were removed.

This supermetric works to a depth of 5, which means we can get the required number of hosts at the cluster level as well as for a whole vCenter, or even a custom group of multiple vCenters.

In my example the new hosts have 40 cores, which we are allowing to over-allocate by up to 4:1, giving a maximum of 160 vCPU per host, along with 1.5TB of RAM, which is not going to be over-allocated.

Step One – Memory

(ceil(((sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=mem|memory_allocated_on_all_vms, depth=5}))-sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=reclaimable|idle_vms|mem, depth=5})-sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|memory, depth=5}))/1574400000)+1)

This first calculation takes the total memory allocated on a cluster, removes the memory reclaimable by deleting idle VMs, and removes the total memory able to be reclaimed by rightsizing the VMs.

This number is then divided by the amount of memory available in each host, in kB.
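
As a rough check on where that divisor comes from: 1.5TB is roughly 1,500GB × 1,048,576 kB/GB ≈ 1,572,864,000 kB, which is close to the 1,574,400,000 used above; the exact value will depend on how your hosts report their usable memory.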

This number is then rounded up by using the CEIL function. More details on that here:

Estimate remaining VM Overhead using vROps – Advanced Super Metrics

Finally an additional host is added to this number to allow for N+1 High Availability. This can be set to your requirements.

Step Two – CPU

(ceil(((sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=cpu|vcpus_allocated_on_all_vms, depth=5}))-sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource,  metric=reclaimable|idle_vms|cpu, depth=5})-sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|vcpus, depth=5}))/(4*(40)))+1)

Similar to the memory calculation above, this takes the total number of vCPUs allocated on a cluster, removes the vCPUs able to be reclaimed from deleting idle VMs, and removes the total number of vCPUs able to be reclaimed by rightsizing the VMs.

This number is then divided by the number of cores available in each host multiplied by our maximum over-allocation ratio of 4:1.
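
As a quick sanity check: 40 cores × 4 (our over-allocation ratio) = 160 vCPU per host, which is the (4*(40)) term in the formula.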

Again this is rounded up using a CEIL function and then an additional host added for HA.

Step Three – Wrapping it up with a MAX function

This is the final super metric formula, which takes the two calculations above and puts them into an array, with the max function used to take the highest value to ensure we get the correct number of hosts.

This function has the following format:

max( [ calc1 , calc2 , … calcN ] )

You may spot that I have added a “3” as the third number; this is to ensure that the super metric never recommends a cluster size of less than three hosts.
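
For example, if the memory calculation came out at 2 hosts and the CPU calculation at 1, max([2,1,3]) evaluates to 3, so the recommendation never drops below a three host cluster.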

max([(ceil(((sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=mem|memory_allocated_on_all_vms, depth=5}))-sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=reclaimable|idle_vms|mem, depth=5})-sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|memory, depth=5}))/1574400000)+1),(ceil(((sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, metric=cpu|vcpus_allocated_on_all_vms, depth=5}))-sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource,  metric=reclaimable|idle_vms|cpu, depth=5})-sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=summary|oversized|vcpus, depth=5}))/(4*(40)))+1),3])

IF Function in vROps Super Metrics aka Ternary Expressions


Have you ever just wanted an IF Function when creating Super Metrics? Good news, there is one!

Leading on from the last post I did on determining the number of VMs which will fit into a cluster, I have decided to further expand it with an IF function to take the admission control host failures to tolerate level into account as well.

Previously we used a flat 20% overhead as that was the company policy; however, that reserved far too many resources on larger clusters, and setting it to a flat two host failures reserved too much on smaller clusters.

We wanted any Cluster Compute Resource with fewer than 10 hosts to allow for only a single host failure, while clusters of 10 hosts and above should allow for two host failures.

In vROps terms this requires a Ternary Expression, or as most people know them, an IF Function.

You can use the ternary operator in an expression to run conditional expressions in the same way you would an IF Function.

This is done in the format:

expression_condition ? expression_if_true : expression_if_false.

So for our example we want to take the metric summary|total_number_hosts and check if the number of hosts is less than 10.

This means our expression condition is:

${this, metric=summary|total_number_hosts}<10

As we want to return a “1” for one host failure if this is true, and a “2” for two host failures if it is 10 or more, our full expression is:

(${this, metric=summary|total_number_hosts}<10?1:2)
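
For example, on an 8 host cluster this evaluates to 1, and on a 12 host cluster it evaluates to 2.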

This means our full code is:

floor(min([(((((${this, metric=cpu|corecount_provisioned})-(((${this, metric=cpu|corecount_provisioned})/${this, metric=summary|total_number_hosts}))*(${this, metric=summary|total_number_hosts}<10?1:2))*4)-(${this, metric=cpu|vcpus_allocated_on_all_vms}))/8),(((((${this, metric=mem|host_provisioned})-((${this, metric=mem|host_provisioned}/${this, metric=summary|total_number_hosts})*(${this, metric=summary|total_number_hosts}<10?1:2)))-(${this, metric=mem|memory_allocated_on_all_vms, depth=1}))/1048576)/32),((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_provisioned, depth=1}))/1.33)/(500+32))]))

Estimate remaining VM Overhead using vROps – Advanced Super Metrics


I have a client using vROps 7 quite extensively; however, they were still running a manual API query to create a report on how many VMs of a certain size they could fit into their estate based on Allocation, which of course has been removed in 6.7 and 7.0. Running API queries across their whole estate is a slow process, so they are interested in using vROps to estimate remaining VM overhead on a cluster.

Luckily this can be solved with a Super Metric.

First we need to calculate how many vCPUs are available in total in the cluster, which is the total number of cores multiplied by the over-allocation ratio (4:1 here), with a buffer removed; in this case we are using 20% (80% remaining), but this can be set to the core count of one host if you prefer.

Then we remove the number of vCPUs that have been allocated to all the VMs.

Finally we divide by the number of vCPUs our template VM has. Two in this case.

(((((${this, metric=cpu|corecount_provisioned})*0.8)*4)-(${this, metric=cpu|vcpus_allocated_on_all_vms}))/2)

Next we need to determine the total RAM available in the cluster, which is the total RAM minus a buffer; again this can be set to the equivalent of one host if preferred.

We then need to remove the RAM allocated to all the VMs.

We then need to divide this value by 1048576 to convert from kB to GB.
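
For example, 33,554,432 kB / 1,048,576 = 32 GB.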

And then we divide by the number of GB of RAM our VM has. We are using 4GB here.

((((${this, metric=mem|host_provisioned})*0.8-(${this, metric=mem|memory_allocated_on_all_vms, depth=1}))/1048576)/4)

For our last calculation, we need to determine the Storage by taking the total storage capacity, removing our buffer and removing the total usage. You could also use the total allocated if you don’t want to over provision storage. If you are using vSAN you can add in the vSAN replica storage as well. 2x for RAID1, 1.33x for Erasure Coding FTT=1 (AKA RAID5) and 1.5x for Erasure Coding FTT=2 (AKA RAID6). We are using RAID5 in this example.
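
As a rough illustration of the 1.33 factor: with RAID5 erasure coding (FTT=1), every 100GB of VM data consumes roughly 133GB of raw capacity, so dividing the remaining raw capacity by 1.33 gives the space actually usable by VMs.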

We then divide this by either the size of the VMDK HDD or an average utilisation depending on your policy. We are using 80GB here for calculation purposes.

((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_usage, depth=1}))/1.33)/80)

Now we have our three calculations, we need to use some advanced Super Metric functions to choose the calculation with the lowest number, as that will be the driving factor on what will fit in the cluster.

This is done with the “MIN” function, feeding in an array:

min([FormulaA,FormulaB,FormulaC])

Now we have the minimum number of VMs which will fit, we need to round down that number, because nobody cares that 67.432 VMs could fit in the cluster, they want to know that 67 VMs will fit. Luckily there is another function for that – “FLOOR”. This is similar to ROUNDDOWN in that it gives you the whole number value.

floor(formula) 

FYI “CEIL” is equivalent to ROUNDUP if you want the value to be rounded up.

Now we tie these all together to get our full calculation.

floor(min([(((((${this, metric=cpu|corecount_provisioned})*0.8)*4)-(${this, metric=cpu|vcpus_allocated_on_all_vms}))/2),((((${this, metric=mem|host_provisioned})*0.8-(${this, metric=mem|memory_allocated_on_all_vms, depth=1}))/1048576)/4),((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_usage, depth=1}))/1.33)/80)]))

Now clone this to estimate remaining VM Overhead for each T-Shirt size you offer.

Update March 2020

I have further updated this super metric to use total provisioned for the storage when used with vSAN or other thin-provisioned datastores, as well as taking swap size into account, and changed the overhead from a flat 20% to the equivalent of two hosts.

This section takes the total core count, removes the total core count divided by the number of hosts and multiplied by the number of host failures to allow for in the cluster (2 in this case), and then multiplies by the vCPU to core over-allocation ratio (4:1 in this case).

(((${this, metric=cpu|corecount_provisioned})-(((${this, metric=cpu|corecount_provisioned})/${this, metric=summary|total_number_hosts}))*2)*4)

As before we then remove the total number of vCPUs allocated on all VMs and divide by the number of vCPUs in your VM.

I have done the same calculation for RAM as well

((${this, metric=mem|host_provisioned})-((${this, metric=mem|host_provisioned}/${this, metric=summary|total_number_hosts})*2))

For storage I have changed to using the metric “diskspace|total_provisioned” instead of “diskspace|total_usage”, and added the memory size on top of the HDD size (500GB HDD plus 32GB swap).

((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_provisioned, depth=1}))/1.33)/(500+32))

This is the final super metric for all compute metrics.

floor(min([(((((${this, metric=cpu|corecount_provisioned})-(((${this, metric=cpu|corecount_provisioned})/${this, metric=summary|total_number_hosts}))*2)*4)-(${this, metric=cpu|vcpus_allocated_on_all_vms}))/8),(((((${this, metric=mem|host_provisioned})-((${this, metric=mem|host_provisioned}/${this, metric=summary|total_number_hosts})*2))-(${this, metric=mem|memory_allocated_on_all_vms, depth=1}))/1048576)/32),((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_provisioned, depth=1}))/1.33)/(500+32))]))

This code has also been submitted to the VMware {code} Sample Exchange:

https://code.vmware.com/samples/6996/estimate-remaining-vm-overhead-using-vrealize-operations#

Update Part Deux

I have further refined this Super Metric to account for different cluster sizes. Details are here:

IF Function in vROps Super Metrics aka Ternary Expressions

It now allows a host failures to tolerate setting of one host in clusters of under 10 hosts, and two hosts in clusters of 10 or more.

The updated code is:

floor(min([(((((${this, metric=cpu|corecount_provisioned})-(((${this, metric=cpu|corecount_provisioned})/${this, metric=summary|total_number_hosts}))*(${this, metric=summary|total_number_hosts}<10?1:2))*4)-(${this, metric=cpu|vcpus_allocated_on_all_vms}))/8),(((((${this, metric=mem|host_provisioned})-((${this, metric=mem|host_provisioned}/${this, metric=summary|total_number_hosts})*(${this, metric=summary|total_number_hosts}<10?1:2)))-(${this, metric=mem|memory_allocated_on_all_vms, depth=1}))/1048576)/32),((((${this, metric=diskspace|total_capacity})*0.7-(${this, metric=diskspace|total_provisioned, depth=1}))/1.33)/(500+32))]))