Using the Nutanix v4 API Python SDK to extract performance metrics

In this post, I’ll show how to fetch stats for entities from the Nutanix v4 API using the Python SDK and explain how to put together a script with arguments that can produce dynamic graphs and csv exports of the retrieved data.

Extracting metrics from the Nutanix v4 API has changed quite a bit. In previous API versions, metrics were often part of the entity payload itself and included the last recorded values.

If you wanted to extract metrics for a specific time range, entity types had a stats endpoint you could use with parameters to do that.

In v4, the process to extract metrics for entities is the following:

  1. You initialize an API client for the ntnx_aiops_py_client SDK module,
  2. You fetch the available source uuids from StatsApi using the get_sources_v4 function,
  3. You fetch entity type names and uuids from StatsApi using the get_entity_types_v4 function and the source uuid you want,
  4. You fetch available metrics for each entity type from StatsApi using the get_entity_descriptors_v4 function and the source uuid you want.

At this stage, you know which metrics are available for which entity types.
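
Put together, the discovery part boils down to a handful of calls on the aiops StatsApi. Here is a minimal sketch (it assumes get_entity_types_v4 takes the source uuid the same way get_entity_descriptors_v4 does, and that entity types expose name and ext_id attributes; the full, working code follows later in this post):

Python
# Minimal discovery sketch; entity_api is an ntnx_aiops_py_client.StatsApi instance (built further down in this post).
sources = entity_api.get_sources_v4()
source_ext_id = next(source.ext_id for source in sources.data if source.source_name == 'nutanix')
# Assumption: get_entity_types_v4 accepts the source uuid like get_entity_descriptors_v4 does.
entity_types = entity_api.get_entity_types_v4(sourceExtId=source_ext_id)
for entity_type in entity_types.data:
    # Assumption: each entity type exposes name and ext_id attributes.
    print(entity_type.name, entity_type.ext_id)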

Next, you’ll need to initialize an API client for the module that contains actual stats for the entity you are interested in.

This table shows which module and which stats endpoints are available in the v4 API at the time of writing:

Module | Namespace | Endpoint | Description | Method
--- | --- | --- | --- | ---
ntnx_vmm_py_client | vmm | vmm/v4.0/ahv/stats/vms | List VM stats for all AHV VMs. Supports the $select, $startTime, $endTime, $samplingInterval, $statType, $orderby, $page, $limit and $filter query parameters. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/ahv/stats/vms/:extId | Get VM stats for a given AHV VM. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/ahv/stats/vms/:vmExtId/disks/:extId | Fetch the stats for the specified AHV VM disk. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/ahv/stats/vms/:vmExtId/nics/:extId | Fetch the stats for the specified AHV VM NIC. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/esxi/stats/vms | List VM stats for all ESXi VMs. Supports $select, $startTime, $endTime, $samplingInterval, $statType, $orderby, $page, $limit and $filter. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/esxi/stats/vms/:extId | Get VM stats for a given ESXi VM. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/esxi/stats/vms/:vmExtId/nics/:extId | Fetch the stats for the specified ESXi VM NIC. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_vmm_py_client | vmm | vmm/v4.0/esxi/stats/vms/:vmExtId/disks/:extId | Fetch the stats for the specified ESXi VM disk. Supports $select, $startTime, $endTime, $samplingInterval and $statType. | GET
ntnx_networking_py_client | networking | networking/v4.0/stats/layer2-stretches/:extId | Get Layer2Stretch statistics. | GET
ntnx_networking_py_client | networking | networking/v4.0/stats/load-balancer-sessions/:extId | Get load balancer session listener and target statistics. | GET
ntnx_networking_py_client | networking | networking/v4.0/stats/routing-policies/$actions/clear | Clear the packet and byte counters of all routing policies in the chosen VPC, or of a particular routing policy in the chosen VPC. | POST
ntnx_networking_py_client | networking | networking/v4.0/stats/traffic-mirrors/:extId | Get traffic mirror session statistics. | GET
ntnx_networking_py_client | networking | networking/v4.0/stats/vpc/:vpcExtId/external-subnets/:extId | Get VPC North-South statistics. | GET
ntnx_networking_py_client | networking | networking/v4.0/stats/vpn-connections/:extId | Get VPN connection statistics. | GET
ntnx_aiops_py_client | aiops | aiops/v4.0/stats/sources/:sourceExtId/entities/:extId | Returns a list of attributes and metrics (time series data) that are available for a given entity type. | GET
ntnx_aiops_py_client | aiops | aiops/v4.0/stats/scenarios/:extId | Get the statistics data of the WhatIf scenario identified by the provided extId. | GET
ntnx_clustermgmt_py_client | clustermgmt | clustermgmt/v4.0/stats/clusters/:extId | Get the statistics data of the cluster identified by {clusterExtId}. | GET
ntnx_clustermgmt_py_client | clustermgmt | clustermgmt/v4.0/stats/clusters/:clusterExtId/hosts/:extId | Get the statistics data of the host identified by {hostExtId} belonging to the cluster identified by {clusterExtId}. | GET
ntnx_clustermgmt_py_client | clustermgmt | clustermgmt/v4.0/stats/disks/:extId | Fetch the stats information of the disk identified by its external identifier. | GET
ntnx_clustermgmt_py_client | clustermgmt | clustermgmt/v4.0/stats/storage-containers/:extId | Fetch the statistical information for the storage container identified by its external identifier. | GET
ntnx_volumes_py_client | volumes | volumes/v4.0/stats/volume-groups/:extId | Query the volume group stats identified by {extId}. | GET
ntnx_volumes_py_client | volumes | volumes/v4.0/stats/volume-groups/:volumeGroupExtId/disks/:extId | Query the volume disk stats identified by {diskExtId}. | GET

For the VM stats list endpoints, ‘$select’ takes comma-separated attributes prefixed with ‘stats/’ (e.g. ‘stats/controllerNumIo,stats/hypervisorNumIo’) and ‘$filter’ takes an OData expression on those attributes (e.g. ‘stats/hypervisorCpuUsagePpm gt 100000 and stats/guestMemoryUsagePpm lt 2000000’).

Regardless of which module you have to query for stats, every stats endpoint with a GET method will require the following inputs:

  1. start time and end time: this is a date and time in ISO-8601 format. What’s that you say? To get the correct format in Python, use something like this: start_time = (datetime.datetime.now(datetime.timezone.utc)).isoformat(). This assumes of course that you have imported the datetime module with import datetime.
  2. select: this can be * if you want all available metrics or a list of metrics you want to retrieve (you got the list of available metrics from the aiops module above, remember?)
  3. stat type: this is one of the following:
    • AVG: Aggregation indicating mean or average of all values.
    • MIN: Aggregation containing lowest of all values.
    • MAX: Aggregation containing highest of all values.
    • LAST: Aggregation containing only the last recorded value.
    • SUM: Aggregation with sum of all values.
    • COUNT: Aggregation containing total count of values.
  4. sampling interval: this is an integer indicating in seconds the sampling interval (5 for 5 seconds, 30 for 30 seconds, etc…)

Some endpoints will also let you specify an OData filter (such as a vm name). An example of a query filter using an entity uuid would be:

query_filter = "extId eq 'b42889c2-1d60-4fde-b192-37c52263a086'"

Now that we have established the ground rules, let’s walk thru a code example.

We’ll start by querying the API to see which metrics are available for each entity type. Most of the concepts we’ll use in this example have been explained in detail in parts 1 and 2 covering the basics of how to use the Nutanix v4 API with the Python SDK.

Print statements in the code samples use a Python class to display output in different colors and rely on a few modules to display timestamps, so here are all the modules we need to import for now as well as the code for that Python class:

Python
from concurrent.futures import ThreadPoolExecutor, as_completed

import math
import time
import datetime
import argparse
import getpass

from humanfriendly import format_timespan

import urllib3
import pandas as pd
import keyring
import tqdm

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import ntnx_aiops_py_client
import ntnx_vmm_py_client

class PrintColors:
    """Used for colored output formatting.
    """
    OK = '\033[92m' #GREEN
    SUCCESS = '\033[96m' #CYAN
    DATA = '\033[097m' #WHITE
    WARNING = '\033[93m' #YELLOW
    FAIL = '\033[91m' #RED
    STEP = '\033[95m' #PURPLE
    RESET = '\033[0m' #RESET COLOR

Note that we’ll use all those modules eventually, including plotly to generate dynamic graphs of the metrics we’ll collect. The overall goal of the script will be to generate graphs for a number of specified virtual machines for a specified period of time.

First, we need to find out which sources are available from the API:

Python
import ntnx_aiops_py_client

#* initialize variable for API client configuration
api_client_configuration = ntnx_aiops_py_client.Configuration()
api_client_configuration.host = api_server
api_client_configuration.username = username
api_client_configuration.password = secret

if secure is False:
    #! suppress warnings about insecure connections
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    #! suppress ssl certs verification
    api_client_configuration.verify_ssl = False

#* getting list of sources
client = ntnx_aiops_py_client.ApiClient(configuration=api_client_configuration)
entity_api = ntnx_aiops_py_client.StatsApi(api_client=client)
print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Fetching available sources...{PrintColors.RESET}")
response = entity_api.get_sources_v4() 
source_ext_id = next(iter([source.ext_id for source in response.data if source.source_name == 'nutanix']))

Note here that what we are ultimately after is the extId (or uuid) of the nutanix source. We’re also using variables like api_server, username or secret which are assumed to be arguments of the script. The final version of the script will have all of this baked in, so bear with me.

Once we have that information, we’ll want to fetch descriptors from the API which will tell us what metrics are available for each entity type.

We’ll want that process to be multi-threaded in case there are a lot of pages of data to retrieve from the API, so step one will be to come up with a function that we’ll be able to leverage with the concurrent module.

Python
def fetch_entity_descriptors(client,source_ext_id,page,limit=50):
    '''fetch_entity_descriptors function.
        Args:
            client: a v4 Python SDK client object.
            source_ext_id: uuid of a valid source.
            page: page number to fetch.
            limit: number of entities to fetch.
        Returns:
            The API response for the requested page of entity descriptors.
    '''
    entity_api = ntnx_aiops_py_client.StatsApi(api_client=client)
    response = entity_api.get_entity_descriptors_v4(sourceExtId=source_ext_id,_page=page,_limit=limit)
    return response

We then proceed to use this function with the concurrent module (combined here with tqdm so that we have a nice progress bar):

Python
#* getting entities and metrics descriptor for nutanix source
print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Fetching entities and descriptors for source nutanix...{PrintColors.RESET}")
entity_list=[]
response = entity_api.get_entity_descriptors_v4(sourceExtId=source_ext_id,_page=0,_limit=1)
total_available_results=response.metadata.total_available_results
page_count = math.ceil(total_available_results/limit)
with tqdm.tqdm(total=page_count, desc="Fetching pages") as progress_bar:
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(
                fetch_entity_descriptors,
                client=client,
                source_ext_id=source_ext_id,
                page=page_number,
                limit=limit
            ) for page_number in range(0, page_count, 1)]
        for future in as_completed(futures):
            try:
                entities = future.result()
                entity_list.extend(entities.data)
            except Exception as e:
                print(f"{PrintColors.WARNING}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}{PrintColors.RESET}")
            finally:
                progress_bar.update(1)
entity_descriptors_list = entity_list

Note how we first retrieve the total number of available results to work out the page count, then fetch each page in parallel to eventually build entity_list, which we then assign to entity_descriptors_list.

We now need to display this information:

Python
descriptors={}
for item in entity_descriptors_list:
    entity_type = item.entity_type
    descriptors[entity_type] = {}
    for metric in item.metrics:
        metric_name = metric.name
        descriptors[entity_type][metric_name] = {}
        descriptors[entity_type][metric_name]['name'] = metric.name
        descriptors[entity_type][metric_name]['value_type'] = metric.value_type
        if metric.additional_properties is not None:
            descriptors[entity_type][metric_name]['description'] = next(iter([metric_property.value for metric_property in metric.additional_properties if metric_property.name == 'description']),None)
        else:
            descriptors[entity_type][metric_name]['description'] = None
for entity_type in descriptors.keys():
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Available metrics for {entity_type} are:{PrintColors.RESET}")
    for metric in sorted(descriptors[entity_type]):
        print(f"    {descriptors[entity_type][metric]['name']},{descriptors[entity_type][metric]['value_type']},{descriptors[entity_type][metric]['description']}")

Note how for each entity type, we display the internal metric name, its data type, as well as the description if there is one available.

Using this code, we can get a list similar to this (here for entity type vm):

checkScore,INT,None
cluster,STRING,None
controllerAvgIoLatencyMicros,INT,I/O latency in milliseconds from the Storage Controller.
controllerAvgReadIoLatencyMicros,INT,Storage Controller read latency in milliseconds.
controllerAvgReadIoSizeKb,INT,None
controllerAvgWriteIoLatencyMicros,INT,Storage Controller write latency in milliseconds.
controllerAvgWriteIoSizeKb,INT,None
controllerIoBandwidthKbps,INT,Data transferred in KB/second from the Storage Controller.
controllerNumIo,INT,None
controllerNumIops,INT,Input/Output operations per second from the Storage Controller.
controllerNumRandomIo,INT,None
controllerNumReadIo,INT,None
controllerNumReadIops,INT,Input/Output read operations per second from the Storage Controller
controllerNumSeqIo,INT,None
controllerNumWriteIo,INT,None
controllerNumWriteIops,INT,Input/Output write operations per second from the Storage Controller
controllerOplogDrainDestHddBytes,INT,None
controllerOplogDrainDestSsdBytes,INT,None
controllerRandomIoPpm,INT,None
controllerReadIoBandwidthKbps,INT,Read data transferred in KB/second from the Storage Controller.
controllerReadIoPpm,INT,Percent of Storage Controller IOPS that are reads.
controllerReadSourceEstoreHddLocalBytes,INT,None
controllerReadSourceEstoreHddRemoteBytes,INT,None
controllerReadSourceEstoreSsdLocalBytes,INT,None
controllerReadSourceEstoreSsdRemoteBytes,INT,None
controllerReadSourceOplogBytes,INT,None
controllerSeqIoPpm,INT,None
controllerSharedUsageBytes,INT,Shared Data usage
controllerSnapshotUsageBytes,INT,Snapshot usage Bytes
controllerStorageTierSsdUsageBytes,INT,None
controllerTimespanMicros,INT,None
controllerTotalIoSizeKb,INT,None
controllerTotalIoTimeMicros,INT,None
controllerTotalReadIoSizeKb,INT,None
controllerTotalReadIoTimeMicros,INT,None
controllerTotalTransformedUsageBytes,INT,None
controllerUserBytes,INT,Disk Usage Bytes
controllerWriteDestEstoreHddBytes,INT,None
controllerWriteDestEstoreSsdBytes,INT,None
controllerWriteIoBandwidthKbps,INT,Write data transferred in KB/second from the Storage Controller.
controllerWriteIoPpm,INT,Percent of Storage Controller IOPS that are writes.
controllerWss120SecondReadMb,INT,None
controllerWss120SecondUnionMb,INT,None
controllerWss120SecondWriteMb,INT,None
controllerWss3600SecondReadMb,INT,Read I/O working set size
controllerWss3600SecondUnionMb,INT,I/O working set size
controllerWss3600SecondWriteMb,INT,Write I/O working set size
diskCapacityBytes,INT,None
diskUsagePpm,INT,Disk Usage in percentage
frameBufferUsagePpm,INT,Usage of the GPU's framebuffer
gpuUsagePpm,INT,Usage of the GPU
guestMemoryUsagePpm,INT,None
hypervisorAvgIoLatencyMicros,INT,None
hypervisorCpuReadyTimePpm,INT,Hypervisor CPU ready time
hypervisorCpuUsagePpm,INT,Percent of CPU used by the hypervisor.
hypervisorIoBandwidthKbps,INT,None
hypervisorMemoryBalloonReclaimTargetBytes,INT,Memory Swap Out Rate
hypervisorMemoryBalloonReclaimedBytes,INT,Memory Balloon Bytes
hypervisorMemoryUsagePpm,INT,Hypervisor Memory Usage percentage
hypervisorNumIo,INT,None
hypervisorNumIops,INT,None
hypervisorNumReadIo,INT,None
hypervisorNumReadIops,INT,None
hypervisorNumReceivePacketsDropped,INT,Network Receive Packets Dropped
hypervisorNumReceivedBytes,INT,Write data transferred in KB/second from the Storage Controller.
hypervisorNumTransmitPacketsDropped,INT,Network Transmit Packets Dropped
hypervisorNumTransmittedBytes,INT,Write data transferred per second in KB/second.
hypervisorNumWriteIo,INT,None
hypervisorNumWriteIops,INT,None
hypervisorReadIoBandwidthKbps,INT,None
hypervisorSwapInRateKbps,INT,Memory Swap In Rate
hypervisorSwapOutRateKbps,INT,Memory Swap Out Rate
hypervisorTimespanMicros,INT,None
hypervisorTotalIoSizeKb,INT,None
hypervisorTotalIoTimeMicros,INT,None
hypervisorTotalReadIoSizeKb,INT,None
hypervisorType,STRING,None
hypervisorVmRunningTimeUsecs,INT,None
hypervisorWriteIoBandwidthKbps,INT,None
memoryReservedBytes,INT,None
memoryUsageBytes,INT,None
memoryUsagePpm,INT,Percent of memory used by the VM.
numVcpusUsedPpm,INT,None

Now that we know which metrics are available, we can focus on actually retrieving these metrics for one or more vm entities and do something with them (such as build graphs).

As usual, we’ll need to initialize an API client for the vmm module:

Python
#* initialize variable for API client configuration
api_client_configuration = ntnx_vmm_py_client.Configuration()
api_client_configuration.host = api_server
api_client_configuration.username = username
api_client_configuration.password = secret

if secure is False:
    #! suppress warnings about insecure connections
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    #! suppress ssl certs verification
    api_client_configuration.verify_ssl = False

client = ntnx_vmm_py_client.ApiClient(configuration=api_client_configuration)

Now that we have our client, we can get our vm entity from the API:

Python
#* fetch vm object to figure out extId
entity_api = ntnx_vmm_py_client.VmApi(api_client=client)
query_filter = f"name eq '{vm}'"
response = entity_api.list_vms(_filter=query_filter)
vm_uuid = response.data[0].ext_id

Note how we use a filter here to grab only the vm we’re interested in. What we’re really after is its extId/uuid.
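
One thing to keep in mind: if the VM name doesn’t match anything, response.data will be empty and the response.data[0] access will blow up. A minimal, hypothetical guard (not part of the final script) could look like this:

Python
# Hypothetical guard: stop early when the filter matches no VM,
# since response.data[0].ext_id would otherwise raise an exception.
if not response.data:
    raise Exception(f"VM {vm} was not found on {api_server}")

With the VM’s extId in hand, we can then grab its metrics: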

Python
#* fetch metrics for vm
entity_api = ntnx_vmm_py_client.StatsApi(api_client=client)
start_time = (datetime.datetime.now(datetime.timezone.utc)-datetime.timedelta(minutes=minutes_ago)).isoformat()
end_time = (datetime.datetime.now(datetime.timezone.utc)).isoformat()
response = entity_api.get_vm_stats_by_id(vm_uuid, _startTime=start_time, _endTime=end_time, _samplingInterval=sampling_interval, _statType=stat_type, _select='*')
vm_stats = [stat for stat in response.data.stats if stat.cluster is None]

We’re figuring out the start and end times here using the number of minutes we want to look back (the minutes_ago variable), and we’re grabbing all available metrics for that time period with _select='*'.

We’re then removing any data points that contain cluster information with the last list comprehension, since those only hold the cluster uuid and would prevent us from creating graphs later.

Next, we’ll build a pandas dataframe from that data so that we can easily create graphs with plotly later:

Python
#* building pandas dataframe from the retrieved data
data_points = []
for data_point in vm_stats:
    data_points.append(data_point.to_dict())
df = pd.DataFrame(data_points)
df = df.set_index('timestamp')
df.drop('_reserved', axis=1, inplace=True)
df.drop('_object_type', axis=1, inplace=True)
df.drop('_unknown_fields', axis=1, inplace=True)
df.drop('cluster', axis=1, inplace=True)
df.drop('hypervisor_type', axis=1, inplace=True)

Note that we’re converting each retrieved data point to a Python dict with the .to_dict() method and that we are dropping a number of columns we won’t use in our graphs anyway.
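
As a side note, those five drop calls could be collapsed into a single one; the errors='ignore' flag is an optional safeguard in case one of the columns is missing from the retrieved data:

Python
# Optional variation: drop all unused columns in one call; errors='ignore' skips any column that isn't present.
df.drop(columns=['_reserved', '_object_type', '_unknown_fields', 'cluster', 'hypervisor_type'],
        inplace=True, errors='ignore')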

Now we’re building graphs (multiple on one page) with plotly with that dataframe:

Python
#* building graphs
df = df.dropna(subset=['disk_usage_ppm'])
df['disk_usage'] = (df['disk_usage_ppm'] / 10000).round(2)
df = df.dropna(subset=['memory_usage_ppm'])
df['memory_usage'] = (df['memory_usage_ppm'] / 10000).round(2)
df = df.dropna(subset=['hypervisor_cpu_usage_ppm'])
df['hypervisor_cpu_usage'] = (df['hypervisor_cpu_usage_ppm'] / 10000).round(2)
df = df.dropna(subset=['hypervisor_cpu_ready_time_ppm'])
df['hypervisor_cpu_ready_time'] = (df['hypervisor_cpu_ready_time_ppm'] / 10000).round(2)

fig = make_subplots(rows=2, cols=2,
        subplot_titles=(f"{vm} Overview", f"{vm} Storage IOPS", f"{vm} Storage Bandwidth", f"{vm} Storage Latency"),
        x_title="Time")  # Shared x-axis title
# Subplot 1: Overview
y_cols1 = ["hypervisor_cpu_usage", "hypervisor_cpu_ready_time", "memory_usage", "disk_usage"]
for y_col in y_cols1:
    fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%%{y}", name=y_col, mode='lines', legendgroup='group1'), row=1, col=1)
fig.update_yaxes(title_text="% Utilized", range=[0, 100], row=1, col=1)
# Subplot 2: Storage IOPS
y_cols2 = ["controller_num_iops", "controller_num_read_iops", "controller_num_write_iops"]
for y_col in y_cols2:
    fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} iops", name=y_col, mode='lines', legendgroup='group2'), row=1, col=2)
fig.update_yaxes(title_text="IOPS", row=1, col=2)
# Subplot 3: Storage Bandwidth
y_cols3 = ["controller_io_bandwidth_kbps", "controller_read_io_bandwidth_kbps", "controller_write_io_bandwidth_kbps"]
for y_col in y_cols3:
    fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} kbps", name=y_col, mode='lines', legendgroup='group3'), row=2, col=1)
fig.update_yaxes(title_text="Kbps", row=2, col=1)
# Subplot 4: Storage Latency
y_cols4 = ["controller_avg_io_latency_micros", "controller_avg_read_io_latency_micros", "controller_avg_write_io_latency_micros"]
for y_col in y_cols4:
    fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} usec", name=y_col, mode='lines', legendgroup='group4'), row=2, col=2)
fig.update_yaxes(title_text="Microseconds", row=2, col=2)
fig.update_layout(height=800, legend_title_text="Metric") # Shared legend title
fig.show()

First, we’re massaging the data a bit at the top of the block by removing null values and doing some math on the ppm metrics so that we can display percentages.

We’re then creating subplots (different graphs on the same page) with plotly’s make_subplots and add_trace calls.

Finally, fig.show() opens that page in your default browser and displays the graphs.

What if we wanted to do this for multiple vms, you ask? We would pull all of this into a function:

Python
def get_vm_metrics(client,vm,minutes_ago,sampling_interval,stat_type):
    '''get_vm_metrics function.
       Fetches metrics for a specified vm and generates graphs for that entity.
        Args:
            client: a v4 Python SDK client object.
            vm: a virtual machine name
            minutes_ago: integer indicating the number of minutes to get metrics for (exp: 60 would mean get the metrics for the last hour).
            sampling_interval: integer used to specify in seconds the sampling interval.
            stat_type: The operator to use while performing down-sampling on stats data. Allowed values are SUM, MIN, MAX, AVG, COUNT and LAST.
        Returns:
            Nothing; displays the generated graphs in the default browser.
    '''
    #* fetch vm object to figure out extId
    entity_api = ntnx_vmm_py_client.VmApi(api_client=client)
    query_filter = f"name eq '{vm}'"
    response = entity_api.list_vms(_filter=query_filter)
    vm_uuid = response.data[0].ext_id
    
    #* fetch metrics for vm
    entity_api = ntnx_vmm_py_client.StatsApi(api_client=client)
    start_time = (datetime.datetime.now(datetime.timezone.utc)-datetime.timedelta(minutes=minutes_ago)).isoformat()
    end_time = (datetime.datetime.now(datetime.timezone.utc)).isoformat()
    response = entity_api.get_vm_stats_by_id(vm_uuid, _startTime=start_time, _endTime=end_time, _samplingInterval=sampling_interval, _statType=stat_type, _select='*')
    vm_stats = [stat for stat in response.data.stats if stat.cluster is None]
        
    #* building pandas dataframe from the retrieved data
    data_points = []
    for data_point in vm_stats:
        data_points.append(data_point.to_dict())
    df = pd.DataFrame(data_points)
    df = df.set_index('timestamp')
    df.drop('_reserved', axis=1, inplace=True)
    df.drop('_object_type', axis=1, inplace=True)
    df.drop('_unknown_fields', axis=1, inplace=True)
    df.drop('cluster', axis=1, inplace=True)
    df.drop('hypervisor_type', axis=1, inplace=True)

    #* building graphs
    df = df.dropna(subset=['disk_usage_ppm'])
    df['disk_usage'] = (df['disk_usage_ppm'] / 10000).round(2)
    df = df.dropna(subset=['memory_usage_ppm'])
    df['memory_usage'] = (df['memory_usage_ppm'] / 10000).round(2)
    df = df.dropna(subset=['hypervisor_cpu_usage_ppm'])
    df['hypervisor_cpu_usage'] = (df['hypervisor_cpu_usage_ppm'] / 10000).round(2)
    df = df.dropna(subset=['hypervisor_cpu_ready_time_ppm'])
    df['hypervisor_cpu_ready_time'] = (df['hypervisor_cpu_ready_time_ppm'] / 10000).round(2)

    fig = make_subplots(rows=2, cols=2,
            subplot_titles=(f"{vm} Overview", f"{vm} Storage IOPS", f"{vm} Storage Bandwidth", f"{vm} Storage Latency"),
            x_title="Time")  # Shared x-axis title
    # Subplot 1: Overview
    y_cols1 = ["hypervisor_cpu_usage", "hypervisor_cpu_ready_time", "memory_usage", "disk_usage"]
    for y_col in y_cols1:
        fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%%{y}", name=y_col, mode='lines', legendgroup='group1'), row=1, col=1)
    fig.update_yaxes(title_text="% Utilized", range=[0, 100], row=1, col=1)
    # Subplot 2: Storage IOPS
    y_cols2 = ["controller_num_iops", "controller_num_read_iops", "controller_num_write_iops"]
    for y_col in y_cols2:
        fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} iops", name=y_col, mode='lines', legendgroup='group2'), row=1, col=2)
    fig.update_yaxes(title_text="IOPS", row=1, col=2)
    # Subplot 3: Storage Bandwidth
    y_cols3 = ["controller_io_bandwidth_kbps", "controller_read_io_bandwidth_kbps", "controller_write_io_bandwidth_kbps"]
    for y_col in y_cols3:
        fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} kbps", name=y_col, mode='lines', legendgroup='group3'), row=2, col=1)
    fig.update_yaxes(title_text="Kbps", row=2, col=1)
    # Subplot 4: Storage Latency
    y_cols4 = ["controller_avg_io_latency_micros", "controller_avg_read_io_latency_micros", "controller_avg_write_io_latency_micros"]
    for y_col in y_cols4:
        fig.add_trace(go.Scatter(x=df.index, y=df[y_col], hovertemplate="%{x}<br>%{y} usec", name=y_col, mode='lines', legendgroup='group4'), row=2, col=2)
    fig.update_yaxes(title_text="Microseconds", row=2, col=2)
    fig.update_layout(height=800, legend_title_text="Metric") # Shared legend title
    fig.show()

…and then we would use concurrent to multi-thread the processing like so:

Python
#* initialize variable for API client configuration
api_client_configuration = ntnx_vmm_py_client.Configuration()
api_client_configuration.host = api_server
api_client_configuration.username = username
api_client_configuration.password = secret

if secure is False:
    #! suppress warnings about insecure connections
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    #! suppress ssl certs verification
    api_client_configuration.verify_ssl = False

client = ntnx_vmm_py_client.ApiClient(configuration=api_client_configuration)

with tqdm.tqdm(total=len(vms), desc="Processing VMs") as progress_bar:
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(
                get_vm_metrics,
                client=client,
                vm=vm,
                minutes_ago=minutes_ago,
                sampling_interval=sampling_interval,
                stat_type=stat_type
            ) for vm in vms]
        for future in as_completed(futures):
            try:
                entities = future.result()
            except Exception as e:
                print(f"{PrintColors.WARNING}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}{PrintColors.RESET}")
            finally:
                progress_bar.update(1)

In addition to creating graphs, it may be interesting to export the metrics data to csv, so we’ll add the following code to the get_vm_metrics function:

Python
for column in df.columns:
    df[column].to_csv(f"{vm}_{column}.csv", index=True)

That will create a csv file for each metric for each vm being processed.
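
If you’d rather get a single file per vm with one column per metric, a hypothetical variation would be to export the whole dataframe at once:

Python
# Hypothetical variation: export all metrics for the vm into a single csv file (one column per metric).
df.to_csv(f"{vm}.csv", index=True)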

Time to pull it all together with arguments, including credentials:

Python
if __name__ == '__main__':
    # * parsing script arguments
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-p", "--prism", help="prism server.")
    parser.add_argument("-u", "--username", default='admin', help="username for prism server.")
    parser.add_argument("-s", "--secure", default=False, action=argparse.BooleanOptionalAction, help="Control SSL certs verification.")
    parser.add_argument("-sh", "--show", action=argparse.BooleanOptionalAction, help="Show available entity types and metrics.")
    parser.add_argument("-g", "--graph", action=argparse.BooleanOptionalAction, help="Indicate you want graphs to be generated. Defaults to True.")
    parser.add_argument("-e", "--export", action=argparse.BooleanOptionalAction, help="Indicate you want csv exports to be generated (1 csv file per metric for each vm). Defaults to False.")
    parser.add_argument("-v", "--vm", type=str, help="Comma separated list of VM names you want to process.")
    parser.add_argument("-c", "--csv", type=str, help="Path and name of csv file with vm names (header: vm_name and then one vm name per line).")
    parser.add_argument("-t", "--time", type=int, default=5, help="Integer used to specify how many minutes ago you want to collect metrics for (defaults to 5 minutes ago).")
    parser.add_argument("-i", "--interval", type=int, default=30, help="Integer used to specify in seconds the sampling interval (defaults to 30 seconds).")
    parser.add_argument("-st", "--stat_type", default="AVG", choices=["AVG","MIN","MAX","LAST","SUM","COUNT"], help="The operator to use while performing down-sampling on stats data. Allowed values are SUM, MIN, MAX, AVG, COUNT and LAST. Defaults to AVG")
    args = parser.parse_args()

    # * check for password (we use keyring python module to access the workstation operating system password store in an "ntnx" section)
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Trying to retrieve secret for user {args.username} from the password store.{PrintColors.RESET}")
    pwd = keyring.get_password("ntnx",args.username)
    if not pwd:
        try:
            pwd = getpass.getpass()
            keyring.set_password("ntnx",args.username,pwd)
        except Exception as error:
            print(f"{PrintColors.FAIL}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [ERROR] {error}.{PrintColors.RESET}")
            exit(1)

    if args.show is True:
        target_vms = None
    elif args.csv:
        data=pd.read_csv(args.csv)
        target_vms = data['vm_name'].tolist()
    elif args.vm:
        target_vms = args.vm.split(',')

    main(api_server=args.prism,username=args.username,secret=pwd,secure=args.secure,show=args.show,vms=target_vms,minutes_ago=args.time,sampling_interval=args.interval,stat_type=args.stat_type,graph=args.graph,csv_export=args.export)

Note that we can now control whether graphs and/or csv exports are produced.
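
How those two flags get wired into get_vm_metrics is up to you; here is a purely hypothetical sketch (build_vm_dataframe_and_figure is an imaginary helper wrapping the code we walked through earlier, and the published script may structure this differently):

Python
# Hypothetical wiring of the graph and csv_export flags into get_vm_metrics.
# build_vm_dataframe_and_figure is an imaginary helper returning the pandas
# dataframe and the plotly figure built with the code shown earlier.
def get_vm_metrics(client, vm, minutes_ago, sampling_interval, stat_type, graph=True, csv_export=False):
    df, fig = build_vm_dataframe_and_figure(client, vm, minutes_ago, sampling_interval, stat_type)
    if csv_export:
        for column in df.columns:
            df[column].to_csv(f"{vm}_{column}.csv", index=True)
    if graph:
        fig.show()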

The rest of the code has to be modified as well to now work with all those arguments. The final result is available here.

Let’s now have a look at the script executing and the result:

In a future post, I’ll show how to apply all this knowledge to build a custom prometheus node exporter for Nutanix resources.

Using the Nutanix v4 API Python SDK (part 2)

In part 1 of this series, we covered the basics of using the Nutanix v4 API Python SDK, including:

  1. How to decide which module you will require
  2. How to initialize the API client and use functions on objects
  3. How to use pagination with multithreading for maximum efficiency

In part 2, we will cover:

  1. How to start building a reporting script that you will be able to pass arguments to
  2. How to deal securely with credentials in this script using built-in operating systems credentials vaults
  3. How to turn API entities data into dynamic HTML reports and Excel spreadsheets

To illustrate this, we’ll walk you thru building a new script using all those concepts from scratch.

If all you’re interested in is the script, not the knowledge that goes along with it, then so be it. The script is available here.

Building a Python script template that you can pass arguments to

While I don’t pretend to be a Python developer, I’ve now scripted APIs using Python for a number of years and I pretty much always use the same code structure:

  • Python scripts should start with a docstring that documents what they do and how to use them.
  • They should then contain import statements that list modules and functions within modules that the script will require.
  • You then declare classes that the script will use if any. I usually have one for doing pretty colored outputs to stdout.
  • You then declare functions that the script uses and which aren’t available in any of the modules you already imported. This includes the main function of the script.
  • Finally, you have code so that the script can be used either as a standalone script or as a module. In this section, I usually also deal with arguments and how I pass them to the main function.
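
Boiled down to a bare-bones skeleton (not the actual script, just its shape), that structure looks like this:

Python
"""Docstring: what the script does, its arguments and its outputs."""

#region #*IMPORT
import argparse
#endregion #*IMPORT

#region #*CLASS
class PrintColors:
    """Used for colored output formatting."""
    OK = '\033[92m' #GREEN
    RESET = '\033[0m' #RESET COLOR
#endregion #*CLASS

#region #*FUNCTIONS
def main(api_server, username, secret, secure=False):
    '''main function: this is where all the work happens.'''
    print(f"{PrintColors.OK}Processing {api_server}...{PrintColors.RESET}")
#endregion #*FUNCTIONS

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--prism", help="prism server.")
    args = parser.parse_args()
    #! credentials handling omitted in this skeleton
    main(api_server=args.prism, username='admin', secret='')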

Docstring

This is what our script docstring will look like:

Python
""" gets misc entities list from Prism Central using v4 API and python SDK

    Args:
        prism: The IP or FQDN of Prism.
        username: The Prism user name.
        secure: True or False to control SSL certs verification.

    Returns:
        html and excel report files.
"""

Nothing fancy. We document what the script can be used for, what input it requires, and what output it produces.

Import

This is what our import section will look like:

Python
#region #*IMPORT
from concurrent.futures import ThreadPoolExecutor, as_completed

import math
import time
import datetime
import argparse
import getpass

from humanfriendly import format_timespan

import urllib3
import pandas as pd
import datapane
import keyring
import tqdm

import ntnx_vmm_py_client
import ntnx_clustermgmt_py_client
import ntnx_networking_py_client
import ntnx_prism_py_client
import ntnx_iam_py_client
#endregion #*IMPORT

Note that I use #region and #endregion tags to enable easy expanding and collapsing of sections in the script when using an IDE like Visual Studio Code (which is what I use). I also use the Better Comments extension in vscode with the #* tags to highlight the region name in green, which makes it even easier to navigate code within longer scripts.

At the top of the import region, I put built-in (not requiring any specific installation) Python modules from which I import only specific functions. I then have a block of all the built-in Python modules that I import whole. I then list installed modules from which I import only specific functions, installed modules that I import whole, and finally, the last block lists the Nutanix v4 API SDK modules I import.

Here is a detailed explanation of the full list. We’ll use:

  • concurrent.futures for multi-threaded processing (when we want to retrieve multiple pages of entities from the API at once)
  • math, time and datetime to manipulate numbers and date/time formats for output
  • argparse to deal with passing arguments to the script
  • getpass to capture and encode credentials
  • humanfriendly to make sense of API object timestamps in a human readable format (nobody, besides the Terminator, knows how many microseconds elapsed since January 1st 1970)
  • urllib3 to disable warnings that pollute our stdout output when insecure calls are made to the API (not everybody cares to replace SSL certs, even though they should)
  • pandas to easily structure API entities data into objects that can be exported to various formats (like html or excel)
  • datapane to produce the final dynamic (and pretty sexy looking) html report with built-in capabilities such as multi-pages, SQL queries and csv export.
  • keyring to securely store credentials inside the operating system’s built-in vault (such as Keychain on Mac OSX), because nobody likes to type their 20 character password every time they run a script (unless you’re preparing for the keyboard typing olympic games)
  • tqdm to display progress bars (staring at a blinking cursor wondering what is happening ain’t no fun)
  • ntnx_vmm_py_client to retrieve information about virtual machines from the Nutanix API
  • ntnx_clustermgmt_py_client to retrieve information about clusters from the Nutanix API
  • ntnx_networking_py_client to retrieve information about subnets from the Nutanix API
  • ntnx_prism_py_client to retrieve information about categories from the Nutanix API
  • ntnx_iam_py_client to retrieve information about users from the Nutanix API

Now you know. Let’s move on.

Classes

We’ll only use a single custom class whose unique purpose will be to color output to stdout (because the world is grey enough as it is):

Python
#region #*CLASS
class PrintColors:
    """Used for colored output formatting.
    """
    OK = '\033[92m' #GREEN
    SUCCESS = '\033[96m' #CYAN
    DATA = '\033[097m' #WHITE
    WARNING = '\033[93m' #YELLOW
    FAIL = '\033[91m' #RED
    STEP = '\033[95m' #PURPLE
    RESET = '\033[0m' #RESET COLOR
#endregion #*CLASS

Functions

Our script will have two functions:

  1. fetch_entities which we’ll use to get entities from the Nutanix API. This function will be generic, meaning that we’ll be able to use it regardless of the module, entity type and list function we need.
  2. main which will be baking apple pies (just kidding; what do you honestly think main is used for?)

This is what our fetch_entities function looks like:

Python
#region #*FUNCTIONS


def fetch_entities(client,module,entity_api,function,page,limit=50):
    '''fetch_entities function.
        Args:
            client: a v4 Python SDK client object.
            module: name of the v4 Python SDK module to use.
            entity_api: name of the entity API to use.
            function: name of the function to use.
            page: page number to fetch.
            limit: number of entities to fetch.
        Returns:
            The API response for the requested page of entities.
    '''
    entity_api_module = getattr(module, entity_api)
    entity_api = entity_api_module(api_client=client)
    list_function = getattr(entity_api, function)
    response = list_function(_page=page,_limit=limit)
    return response


#more on main later


#endregion #*FUNCTIONS

Note that our function has a docstring (let’s not be lazy). We use getattr to make it dynamic based on what parameters it is passed. This prevents us from having a specific fetch entities function per API module and entity type. Schwing!
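
For instance, a hypothetical call listing the first page of VMs through the vmm module (vmm_client being an ntnx_vmm_py_client.ApiClient built the same way as the clustermgmt one further down) would look like this:

Python
# Hypothetical usage of fetch_entities against the vmm module:
# grab the first page of VMs, 100 entities at a time.
response = fetch_entities(client=vmm_client,
                          module=ntnx_vmm_py_client,
                          entity_api='VmApi',
                          function='list_vms',
                          page=0,
                          limit=100)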

We’ll cover the main function later since that is the bulk of the script. Just be patient. I know, it’s hard.

The unnamed section

This is what the unnamed section (whose name shall not be pronounced) looks like:

Python
if __name__ == '__main__':
    # * parsing script arguments
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-p", "--prism", help="prism server.")
    parser.add_argument("-u", "--username", default='admin', help="username for prism server.")
    parser.add_argument("-s", "--secure", default=False, help="True of False to control SSL certs verification.")
    args = parser.parse_args()

    # * check for password (we use keyring python module to access the workstation operating system password store in an "ntnx" section)
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Trying to retrieve secret for user {args.username} from the password store.{PrintColors.RESET}")
    pwd = keyring.get_password("ntnx",args.username)
    if not pwd:
        try:
            pwd = getpass.getpass()
            keyring.set_password("ntnx",args.username,pwd)
        except Exception as error:
            print(f"{PrintColors.FAIL}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [ERROR] {error}.{PrintColors.RESET}")
            exit(1)
    main(api_server=args.prism,username=args.username,secret=pwd,secure=args.secure)

This enables you to use the script as is or as a module (not that you would particularly want to).

In here we deal with script arguments in the “parsing script arguments” section, with credentials in the “check for password” section and then we call the main function in the main section. Let’s go over each section.

arguments

This section starts by creating an argument parser object called parser:

Python
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)

We then add arguments to that object:

Python
parser.add_argument("-p", "--prism", help="prism server.")
parser.add_argument("-u", "--username", default='admin', help="username for prism server.")
parser.add_argument("-s", "--secure", default=False, help="True of False to control SSL certs verification.")

Note that with each argument we add, we specify a short tag (e.g. -p for prism), a long tag (e.g. --prism) and a help message. We could also specify a type, a default value (which makes the argument optional), a list of valid choices… Note that username, for example, will default to admin if nothing is specified and that secure defaults to False. We’re otherwise just keeping it minimal here. If you are interested in all those options, knock yourself out by reading the argparse doc.
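
For instance, an optional integer argument with a default value and an argument restricted to a list of valid choices would look something like this (taken from the stats script earlier on this page):

Python
parser.add_argument("-t", "--time", type=int, default=5, help="Integer used to specify how many minutes ago you want to collect metrics for (defaults to 5 minutes ago).")
parser.add_argument("-st", "--stat_type", default="AVG", choices=["AVG","MIN","MAX","LAST","SUM","COUNT"], help="The operator to use while performing down-sampling on stats data.")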

Finally, we ask the argparse module to do its job and parse the script command line to extract those arguments and store everything in a variable called args (and no, it’s not agonizing):

Python
args = parser.parse_args()

This makes parsed arguments directly available such as args.prism or args.secure. Schwing!

credentials

This section starts by trying to retrieve the username’s secret from the operating system’s built-in password vault, in a section called ntnx, and to store it in a variable called pwd:

Python
pwd = keyring.get_password("ntnx",args.username)

What happens if that password does not exist? Thanks for asking:

Python
if not pwd:
	try:
		pwd = getpass.getpass()
		keyring.set_password("ntnx",args.username,pwd)
	except Exception as error:
		print(f"{PrintColors.FAIL}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [ERROR] {error}.{PrintColors.RESET}")
		exit(1)

If the password could not be retrieved (as would happen the first time you run the script), we prompt the user for it using getpass with:

Python
pwd = getpass.getpass()

We then store that secret into the vault for future uses using keyring:

Python
keyring.set_password("ntnx",args.username,pwd)

The rest is boring and dealing with errors.

So, if you’re sharp, you’re thinking: “Wait a minute, what if I ran this script a month ago and my user password has changed since? Then what?”

Good thinking Batman. You would have to delete your secret from the vault using (from your OS command line):

Shell
keyring del ntnx <username>

You may also be thinking: “Is this really a secure way to deal with credentials?”

I’ll let you read the keyring documentation on that topic. All I can say is that retrieving those secrets is tied to the OS user and only works for that user on that machine. In my book, that sure beats storing secrets in clear text in configuration files, script code or environment variables. Thank you very much.

The main course

Our main function, where all the magic happens, starts just like any other function: with a docstring:

Python
def main(api_server,username,secret,secure=False):
    '''main function.
        Args:
            api_server: IP or FQDN of the REST API server.
            username: Username to use for authentication.
            secret: Secret for the username.
            secure: indicates if certs should be verified.
        Returns:
            html and excel report files.
    '''

    start_time = time.time()
    limit=100

Note that we define the parameters that are passed to this main function, which match the way we call it in the unnamed section of our script.

We also capture the function start time (so that we can report at the end how long processing took), and define a limit of 100 which we’ll use when calling the fetch_entities function later so that we retrieve 100 objects at a time instead of the API default of 50.
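
That start time gets used at the very end of main; a minimal sketch of how the elapsed time could be reported with format_timespan (which is why we imported it from humanfriendly) might look like this:

Python
# Hypothetical end-of-main report: compute the elapsed time in seconds and print it in a human readable form.
end_processing_time = time.time()
print(f"{PrintColors.STEP}Process completed in {format_timespan(end_processing_time - start_time)}{PrintColors.RESET}")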

The rest of the main function will be organized in regions. We’ll have:

  1. one region per entity type we need to retrieve and include in our report,
  2. one region for producing the html output
  3. one region for producing the excel output
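
We’ll focus on the entity regions here. For the record, the html and excel regions essentially hand the pandas dataframes built from each entity list over to datapane and to pandas’ Excel writer; a purely hypothetical sketch of the excel side (file name and sheet name made up, and an Excel engine such as openpyxl is required) could be:

Python
# Hypothetical sketch of the excel output region: write each entity list to its own worksheet.
with pd.ExcelWriter("prism_report.xlsx") as writer:
    pd.DataFrame(cluster_list_output).to_excel(writer, sheet_name="clusters", index=False)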

getting and processing entities

Let’s look at an entity region:

Python
#region #?clusters
    #* initialize variable for API client configuration
    api_client_configuration = ntnx_clustermgmt_py_client.Configuration()
    api_client_configuration.host = api_server
    api_client_configuration.username = username
    api_client_configuration.password = secret

    if secure is False:
        #! suppress warnings about insecure connections
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        #! suppress ssl certs verification
        api_client_configuration.verify_ssl = False

    #* getting list of clusters
    client = ntnx_clustermgmt_py_client.ApiClient(configuration=api_client_configuration)
    entity_api = ntnx_clustermgmt_py_client.ClustersApi(api_client=client)
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Fetching Clusters...{PrintColors.RESET}")
    entity_list=[]
    response = entity_api.list_clusters(_page=0,_limit=1)
    total_available_results=response.metadata.total_available_results
    page_count = math.ceil(total_available_results/limit)
    with tqdm.tqdm(total=page_count, desc="Fetching entity pages") as progress_bar:
        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = [executor.submit(
                    fetch_entities,
                    module=ntnx_clustermgmt_py_client,
                    entity_api='ClustersApi',
                    client=client,
                    function='list_clusters',
                    page=page_number,
                    limit=limit
                ) for page_number in range(0, page_count, 1)]
            for future in as_completed(futures):
                try:
                    entities = future.result()
                    entity_list.extend(entities.data)
                except Exception as e:
                    print(f"{PrintColors.WARNING}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}{PrintColors.RESET}")
                finally:
                    progress_bar.update(1)
    cluster_list = entity_list

    #* format output
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Processing {len(entity_list)} entities...{PrintColors.RESET}")
    cluster_list_output = []
    for entity in cluster_list:
        if 'PRISM_CENTRAL' in entity.config.cluster_function:
            continue
        entity_output = {
            'name': entity.name,
            'ext_id': entity.ext_id,
            'incarnation_id': entity.config.incarnation_id,
            'is_available': entity.config.is_available,
            'operation_mode': entity.config.operation_mode,
            'redundancy_factor': entity.config.redundancy_factor,
            'domain_awareness_level': entity.config.fault_tolerance_state.domain_awareness_level,
            'current_max_fault_tolerance': entity.config.fault_tolerance_state.current_max_fault_tolerance,
            'desired_max_fault_tolerance': entity.config.fault_tolerance_state.desired_max_fault_tolerance,
            'upgrade_status': entity.upgrade_status,
            'vm_count': entity.vm_count,
            'inefficient_vm_count': entity.inefficient_vm_count,
            'cluster_arch': entity.config.cluster_arch,
            'cluster_function': entity.config.cluster_function,
            'hypervisor_types': entity.config.hypervisor_types,
            'is_password_remote_login_enabled': entity.config.is_password_remote_login_enabled,
            'is_remote_support_enabled': entity.config.is_remote_support_enabled,
            'pulse_enabled': entity.config.pulse_status.is_enabled,
            'timezone': entity.config.timezone,
            'ncc_version': next(iter({ software.version for software in entity.config.cluster_software_map if software.software_type == "NCC" })),
            'aos_full_version': entity.config.build_info.full_version,
            'aos_commit_id': entity.config.build_info.short_commit_id,
            'aos_version': entity.config.build_info.version,
            'is_segmentation_enabled': entity.network.backplane.is_segmentation_enabled,
            'external_address_ipv4': entity.network.external_address.ipv4.value,
            'external_data_service_ipv4': entity.network.external_data_service_ip.ipv4.value,
            'external_subnet': entity.network.external_subnet,
            'name_server_ipv4_list': list({ name_server.ipv4.value for name_server in entity.network.name_server_ip_list}),
            'ntp_server_list': "",
            'number_of_nodes': entity.nodes.number_of_nodes,
        }
        if "fqdn" in entity.network.ntp_server_ip_list:
            entity_output['ntp_server_list'] = list({ ntp_server.fqdn.value for ntp_server in entity.network.ntp_server_ip_list})
        elif "ipv4" in entity.network.ntp_server_ip_list:
            entity_output['ntp_server_list'] = list({ ntp_server.ipv4.value for ntp_server in entity.network.ntp_server_ip_list})

        cluster_list_output.append(entity_output)
#endregion #?clusters

An entity type region does:

  1. set up the SDK API client
  2. use multithreading with the fetch_entities function to get all entities
  3. build a variable keeping only the object properties we’re interested in including in our final output

1 and 2 were already pretty much covered in part 1 of this series, but let’s go over it again.

To create API client objects, we need to specify a configuration. This is achieved with:

Python
api_client_configuration = ntnx_clustermgmt_py_client.Configuration()
api_client_configuration.host = api_server
api_client_configuration.username = username
api_client_configuration.password = secret

Note how we use parameters passed to the main function which were themselves captured from the parsed script command line to populate values in the API client configuration. api_server is in fact args.prism, username is args.username and secret is pwd, which we retrieved from our credential vault. We decided all of this in our unnamed section with that line we used to call our main function:

Python
main(api_server=args.prism,username=args.username,secret=pwd,secure=args.secure)

We then make sure to suppress annoying stdout warnings when we’re not validating SSL certs with:

Python
if secure is False:
	#! suppress warnings about insecure connections
	urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
	#! suppress ssl certs verification
	api_client_configuration.verify_ssl = False

To fetch entities, the next code section starts by creating an API client with this configuration and then creates an API entity object:

Python
client = ntnx_clustermgmt_py_client.ApiClient(configuration=api_client_configuration)
entity_api = ntnx_clustermgmt_py_client.ClustersApi(api_client=client)

We then initialize a list variable we’ll use to populate results:

Python
entity_list=[]

We then need to retrieve the total number of entities available by making a quick call with a single object:

Python
response = entity_api.list_clusters(_page=0,_limit=1)
total_available_results=response.metadata.total_available_results
page_count = math.ceil(total_available_results/limit)

We now know how many pages of results exist in the API with the limit that we are using and we can use this to start a multi-threaded retrieval that includes a nice progress bar:

Python
with tqdm.tqdm(total=page_count, desc="Fetching entity pages") as progress_bar:
	with ThreadPoolExecutor(max_workers=10) as executor:
		futures = [executor.submit(
				fetch_entities,
				module=ntnx_clustermgmt_py_client,
				entity_api='ClustersApi',
				client=client,
				function='list_clusters',
				page=page_number,
				limit=limit
			) for page_number in range(0, page_count, 1)]
		for future in as_completed(futures):
			try:
				entities = future.result()
				entity_list.extend(entities.data)
			except Exception as e:
				print(f"{PrintColors.WARNING}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}{PrintColors.RESET}")
			finally:
				progress_bar.update(1)

Note that when each call succeeds, we append results to our list variable with:

Python
entity_list.extend(entities.data)

…and we update our progress bar with:

Python
progress_bar.update(1)

After this is all done, we save our results in a variable:

Python
cluster_list = entity_list

Now we need to process this output and keep only what we want to report on. To do this, we start by initializing a list variable with:

Python
cluster_list_output = []

We then have a loop to look at each individual result from our multi-threaded retrieval and extract only the information we need:

Python
for entity in cluster_list:
	if 'PRISM_CENTRAL' in entity.config.cluster_function:
		continue
	entity_output = {
		'name': entity.name,
		'ext_id': entity.ext_id,
		'incarnation_id': entity.config.incarnation_id,
		'is_available': entity.config.is_available,
		'operation_mode': entity.config.operation_mode,
		'redundancy_factor': entity.config.redundancy_factor,
		'domain_awareness_level': entity.config.fault_tolerance_state.domain_awareness_level,
		'current_max_fault_tolerance': entity.config.fault_tolerance_state.current_max_fault_tolerance,
		'desired_max_fault_tolerance': entity.config.fault_tolerance_state.desired_max_fault_tolerance,
		'upgrade_status': entity.upgrade_status,
		'vm_count': entity.vm_count,
		'inefficient_vm_count': entity.inefficient_vm_count,
		'cluster_arch': entity.config.cluster_arch,
		'cluster_function': entity.config.cluster_function,
		'hypervisor_types': entity.config.hypervisor_types,
		'is_password_remote_login_enabled': entity.config.is_password_remote_login_enabled,
		'is_remote_support_enabled': entity.config.is_remote_support_enabled,
		'pulse_enabled': entity.config.pulse_status.is_enabled,
		'timezone': entity.config.timezone,
		'ncc_version': next(iter({ software.version for software in entity.config.cluster_software_map if software.software_type == "NCC" })),
		'aos_full_version': entity.config.build_info.full_version,
		'aos_commit_id': entity.config.build_info.short_commit_id,
		'aos_version': entity.config.build_info.version,
		'is_segmentation_enabled': entity.network.backplane.is_segmentation_enabled,
		'external_address_ipv4': entity.network.external_address.ipv4.value,
		'external_data_service_ipv4': entity.network.external_data_service_ip.ipv4.value,
		'external_subnet': entity.network.external_subnet,
		'name_server_ipv4_list': list({ name_server.ipv4.value for name_server in entity.network.name_server_ip_list}),
		'ntp_server_list': "",
		'number_of_nodes': entity.nodes.number_of_nodes,
	}
	if "fqdn" in entity.network.ntp_server_ip_list:
		entity_output['ntp_server_list'] = list({ ntp_server.fqdn.value for ntp_server in entity.network.ntp_server_ip_list})
	elif "ipv4" in entity.network.ntp_server_ip_list:
		entity_output['ntp_server_list'] = list({ ntp_server.ipv4.value for ntp_server in entity.network.ntp_server_ip_list})

	cluster_list_output.append(entity_output)

Note that:

  1. we skip the result if it points at Prism Central itself (which is just an oddity of the Nutanix cluster API).
  2. to figure out property names on the API-returned entity, use the documentation,
  3. some properties are lists which can come in various shapes or forms and will require some logic to handle, such as ntp_server_ip_list in the above example (which can contain either IPv4 addresses or fully qualified domain names).

In the end, everything is stored in the cluster_list_output variable which we will use later.

Whether we’re looking at clusters, vms, users, or categories, the logic and flow remain the same. Have a look at the entire script to see this, or at the condensed sketch below showing the same pattern applied to another entity type.
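
For example, here is what a storage containers region could look like. This is a condensed sketch, not the actual script: the StorageContainersApi submodule name comes from the module table in part 1, but the list_storage_containers function name and the name/ext_id properties kept at the end are assumptions for illustration, and the real script keeps more properties.

Python
#region #?storage_containers
    #* same fetch-and-filter pattern, applied to another entity type (sketch)
    client = ntnx_clustermgmt_py_client.ApiClient(configuration=api_client_configuration)
    entity_api = ntnx_clustermgmt_py_client.StorageContainersApi(api_client=client)
    entity_list=[]

    #* quick call with a single object to figure out how many pages exist
    response = entity_api.list_storage_containers(_page=0,_limit=1)
    total_available_results=response.metadata.total_available_results
    page_count = math.ceil(total_available_results/limit)

    #* multi-threaded retrieval of all pages with a progress bar
    with tqdm.tqdm(total=page_count, desc="Fetching entity pages") as progress_bar:
        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = [executor.submit(
                    fetch_entities,
                    module=ntnx_clustermgmt_py_client,
                    entity_api='StorageContainersApi',
                    client=client,
                    function='list_storage_containers',
                    page=page_number,
                    limit=limit
                ) for page_number in range(0, page_count, 1)]
            for future in as_completed(futures):
                try:
                    entities = future.result()
                    entity_list.extend(entities.data)
                except Exception as e:
                    print(f"{PrintColors.WARNING}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}{PrintColors.RESET}")
                finally:
                    progress_bar.update(1)

    #* keep only the properties we want to report on (property names would come from the documentation)
    storage_container_list_output = [{'name': entity.name, 'ext_id': entity.ext_id} for entity in entity_list]
#endregion #?storage_containers

Only the submodule/endpoint class, the list function and the properties you keep change from one region to the next.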

html report

Here is the region that produces the html report:

Python
#region #?html report
    #* exporting to html
    html_file_name = f"{api_server}_get_pc_report.html"
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Exporting results to file {html_file_name}.{PrintColors.RESET}")

    vm_df = pd.DataFrame(vm_list_output)
    cluster_df = pd.DataFrame(cluster_list_output)
    host_df = pd.DataFrame(host_list_output)
    storage_container_df = pd.DataFrame(storage_container_list_output)
    subnet_df = pd.DataFrame(subnet_list_output)
    category_df = pd.DataFrame(category_list_output)
    user_df = pd.DataFrame(user_list_output)


    datapane_app = datapane.App(
        datapane.Select(
        datapane.DataTable(vm_df,label="vms"),
        datapane.DataTable(cluster_df,label="clusters"),
        datapane.DataTable(host_df,label="hosts"),
        datapane.DataTable(storage_container_df,label="storage_containers"),
        datapane.DataTable(subnet_df,label="subnets"),
        datapane.DataTable(category_df,label="categories"),
        datapane.DataTable(user_df,label="users"),
        )
    )
    datapane_app.save(html_file_name)
#endregion #?html report

Here we:

  1. decide what we’re going to call our final html report file,
  2. create pandas dataframes with our output variables (because this is what datapane takes as input to create dynamic tables in html format),
  3. use those dataframes to feed a datapane app with multiple pages (one per entity type), then save that app to the html file we defined in step 1.

excel spreadsheet

Here is the region that creates our Excel spreadsheet output:

Python
#region #?excel spreadsheet
    excel_file_name = f"{api_server}_get_pc_report.xlsx"
    print(f"{PrintColors.OK}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [INFO] Exporting results to file {excel_file_name}.{PrintColors.RESET}")
    data = {'vms': vm_list_output, 'clusters': cluster_list_output, 'hosts': host_list_output, 'storage_containers': storage_container_list_output, 'subnets': subnet_list_output, 'categories': category_list_output, 'users': user_list_output}

    with pd.ExcelWriter(excel_file_name, engine='xlsxwriter') as writer:
        for sheet_name, df_data in data.items():
            df = pd.DataFrame(df_data)  # Create a DataFrame for each dictionary
            if sheet_name == 'users':
                df['created_time'] = df['created_time'].dt.tz_localize(None)
                df['last_updated_time'] = df['last_updated_time'].dt.tz_localize(None)
                df['last_login_time'] = df['last_login_time'].dt.tz_localize(None)
            df.to_excel(writer, sheet_name=sheet_name, index=False)  # index=False to avoid row numbers
#endregion #?excel spreadsheet

Here we:

  1. define the name of the final Excel spreadsheet file,
  2. define the structure of our spreadsheet (which worksheets our workbook will have, and what they’ll contain),
  3. write the content of the spreadsheet, making sure we strip timezone information from the timestamps in the users worksheet (Excel cannot store timezone-aware datetimes).

And voilà!

Well, almost: we finish our script by showing the total processing time with:

Python
end_time = time.time()
elapsed_time = end_time - start_time
print(f"{PrintColors.STEP}{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [SUM] Process completed in {format_timespan(elapsed_time)}{PrintColors.RESET}")

Wrapping up

This is what the execution of the script looks like:

This is what the html report looks like:

get_pc_report html report

This is what the Excel report looks like:

get_pc_report excel report

The entire script can be downloaded here.

Thanks for reading. That’s all folks!

Using the Nutanix v4 API Python SDK (part 1)

Starting with AOS v7, API v4 is GA (generally available) for a multitude of endpoints (888 in total so far, which is almost as many as all previous API versions combined).

One of the new features of this API is that it comes with SDKs (software development kits) for several programming languages, including Python.

Using SDKs means that you do not need to make direct HTTP requests to API endpoints. Instead, you can call functions available in the API module directly, which takes care of things like authentication and exposes the specific methods available for each API endpoint.

I’ll admit it, I’m lazy, which is why I work in IT.

I don’t like to read thru hundreds of pages of documentation and any shortcut I can take, I will. Just in case you are not like me and for reference, the Nutanix v4 API SDK is documented here: https://developers.nutanix.com/

All API things are documented here: https://nutanix.dev

To get started, the best read available at the moment is this one: https://www.nutanix.dev/nutanix-api-user-guide/

This page also contains important information about the v4 API in general: https://www.nutanix.dev/api-reference-v4/

Now that this is out of the way, let’s review what this specific article series will cover, keeping in mind that its intent is to go straight to the point and get you going with the Python SDK in record time:

  1. the basics of using the v4 API Python SDK (how to install and import modules, how to initiate the first connection, how to use the online documentation)
  2. how to deal with pagination in the API (we might as well get this out of the way quickly as this will be a recurring issue otherwise)
  3. a suggested method for dealing with credentials securely (not directly related to this API but generally useful for working with any API that uses basic authentication in Python)
  4. an example of a Prism Central entities reporting script

The intent of all code examples used in this article is to show code that is production ready and can be used in a live customer environment (as opposed to leaving the burden of figuring that out to you).

The basics of using the v4 API Python SDK

Picking and installing the right modules

The first thing you will need to do (assuming you already have a functional Python setup) is to install the Python modules for the Nutanix v4 API. Note that this is “modules” (plural), not “module” (singular) because each domain has its own module.

At the time of writing, the available domains are the following (you can click the domain to access the specific documentation page):

domain | module | install command | import command
AIOps | ntnx_aiops_py_client | pip install ntnx-aiops-py-client | import ntnx_aiops_py_client
Cluster Management | ntnx_clustermgmt_py_client | pip install ntnx-clustermgmt-py-client | import ntnx_clustermgmt_py_client
Data Policies | ntnx_datapolicies_py_client | pip install ntnx-datapolicies-py-client | import ntnx_datapolicies_py_client
Data Protection | ntnx_dataprotection_py_client | pip install ntnx-dataprotection-py-client | import ntnx_dataprotection_py_client
Files | ntnx_files_py_client | pip install ntnx-files-py-client | import ntnx_files_py_client
Identity and Access Management | ntnx_iam_py_client | pip install ntnx-iam-py-client | import ntnx_iam_py_client
Licensing | ntnx_licensing_py_client | pip install ntnx-licensing-py-client | import ntnx_licensing_py_client
Life Cycle Management | ntnx_lifecycle_py_client | pip install ntnx-lifecycle-py-client | import ntnx_lifecycle_py_client
Flow Management | ntnx_microseg_py_client | pip install ntnx-microseg-py-client | import ntnx_microseg_py_client
Monitoring | ntnx_monitoring_py_client | pip install ntnx-monitoring-py-client | import ntnx_monitoring_py_client
Networking | ntnx_networking_py_client | pip install ntnx-networking-py-client | import ntnx_networking_py_client
Object Storage Management | ntnx_objects_py_client | pip install ntnx-objects-py-client | import ntnx_objects_py_client
NCM Operation Base Platform | ntnx_opsmgmt_py_client | pip install ntnx-opsmgmt-py-client | import ntnx_opsmgmt_py_client
Prism | ntnx_prism_py_client | pip install ntnx-prism-py-client | import ntnx_prism_py_client
Security1 | ntnx_security_py_client | pip install ntnx-security-py-client | import ntnx_security_py_client
Virtual Machine Management | ntnx_vmm_py_client | pip install ntnx-vmm-py-client | import ntnx_vmm_py_client
Volumes | ntnx_volumes_py_client | pip install ntnx-volumes-py-client | import ntnx_volumes_py_client

So at this point you’re probably asking yourself a number of questions:

  1. This is fine and dandy, but how the hell am I supposed to know what those domains mean and which I will need for my specific use case?
  2. Why the hell aren’t all those domains regrouped in a single module?
  3. And why the hell would you have a different naming convention for the module name between the pip install and import commands? (Notice how one uses dashes while the other uses underscores.)

I’m guessing the answer to that last one has to do with Python semantics (dashes are allowed in package names on PyPI, but they are not valid in Python identifiers, so import statements have to use underscores), but I honestly haven’t dug around much further.

About using different modules for each domain, I agree this is a bit of a pain but at the same time I figure that this will enable different domains to update and evolve their APIs separately and will benefit us grumpy API users in the long term.

As for the first question, I have a couple of things to help you figure that out. The first one is this table that shows you the description for each domain (keeping only the essential parts of those descriptions):

domain | description
AIOps | “… features such as Analysis, Reporting, Capacity Planning, What if Analysis, VM Rightsizing, Troubleshooting, App Discovery, Broad Observability, and Ops Automation through Playbooks.” Note that this is also where you get your stats/metrics for entities. I also crossed out Reporting because the reports APIs are actually in another domain (NCM Operations Base Platform).
Cluster Management | “… manage Hosts, Clusters and other Infrastructure.”
Data Policies | “… manage Policies for Disaster Recovery and Storage.”
Data Protection | “… business Continuity with full spectrum of Disaster Recovery and Backup solution. Spanning across Single PC, Cross AZ, MultiSite. Configuration of Recovery points, Protection policies, Recovery Plans. Execution and monitoring of back up and recovery orchestrations on OnPrem as well as Cloud.”
Files | “… manage virtual file servers, create and configure shares for client access, protect them using DR and sync policies, provision storage space and administer security controls.”
Identity and Access Management | “… for managing users, user-groups, directory services, identity providers, roles and authorization policies.” Yes, this is where RBAC is.
Licensing | No useful description in the documentation, but this helps you manage licenses (shocking).
Life Cycle Management | “… manage Infrastructure, Software and Firmware Upgrades.” While that description is not tremendously useful, for those of you familiar with the Nutanix platform, this is the LCM API (LCM being the framework used to do firmware and software updates on the Nutanix platform).
Flow Management | “… manage Network Security Policy configuration of Nutanix clusters.” Translation: this is for Flow Network Security microsegmentation policies.
Monitoring | “… manage Alerts, Alert policies, Events and Audits.”
Networking | “… manage networking configuration on Nutanix clusters, including AHV and advanced networking.” Note that “advanced networking” means Flow Virtual Networking (VPCs and such).
Object Storage Management | “… manage Petabytes of Unstructured and Machine-generated data using a software-defined Object Store Service.” Translation: this is the Nutanix Objects API.
NCM Operation Base Platform | “… provide functionalities that are common to APIs in namespaces aiops, devops, secops, finops.” This has the reports APIs (while aiops does not, despite its description).
Prism | “… manage Tasks, Category Associations and Submit Batch Operations.” Note that this also contains the categories APIs.
Security1 | “… manage security features, such as encryption, certificates, or platform hardening.” No, this isn’t where RBAC is. You’ll need Identity and Access Management for RBAC (and use authorization policies).
Virtual Machine Management | “… manage the life-cycle of virtual machines hosted on Nutanix.” Translation: where all VM things are (almost).
Volumes | “… configure volumes.” Translation: the Nutanix Volumes APIs (which are used to create and manage volume groups, which in turn provide access to Nutanix storage over the iSCSI protocol).

The second is the following spreadsheet, which is a complete inventory of all public Nutanix v4 API endpoints. You can search it to hopefully identify which domain/namespace has the endpoint you need for your use case:

You’re welcome.

So now that you know which domains and modules you need, that you know how to install them and how to import them in your script, how the heck do they actually work?

Understanding how modules work and establishing a first connection

The steps to use a Python SDK module for the Nutanix v4 API are:

  1. Import the module
  2. Create an API client configuration object and set key attributes
  3. Use the API client configuration to create an API client object
  4. Create an instance of the API client by connecting it to a specific submodule
  5. Use the API client instance to make a call to a specific API endpoint and get or post data

The step that was most confusing for me was step 4: why do I have to create yet another object since I already have a connected API client? This is because each SDK module has submodules (endpoints).

For example, ntnx_clustermgmt_py_client has six submodules/endpoints: cluster_profiles_api, clusters_api, disks_api, pcie_devices_api, storage_containers_api, and vcenter_extensions_api.

Each of those submodules/endpoints has specific methods. You can view all the submodules/endpoints in the spreadsheet where “namespace” is the domain/module and endpoint_name is the submodule/endpoint.
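
To make step 4 concrete, here is a minimal sketch showing that a single connected ApiClient can feed several submodule API instances (the class names come from the clustermgmt submodule list above; the host and credentials are placeholders, and the full five-step walkthrough follows after the summary table below):

Python
import ntnx_clustermgmt_py_client

# steps 2 and 3: configuration and client (placeholder host and credentials)
api_client_configuration = ntnx_clustermgmt_py_client.Configuration()
api_client_configuration.host = "10.10.10.10"   # hypothetical Prism Central IP or FQDN
api_client_configuration.username = "admin"     # hypothetical credentials
api_client_configuration.password = "changeme"
api_client = ntnx_clustermgmt_py_client.ApiClient(configuration=api_client_configuration)

# step 4: one API instance per submodule/endpoint you need, all sharing the same client
clusters_api = ntnx_clustermgmt_py_client.ClustersApi(api_client=api_client)
storage_containers_api = ntnx_clustermgmt_py_client.StorageContainersApi(api_client=api_client)
disks_api = ntnx_clustermgmt_py_client.DisksApi(api_client=api_client)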

For convenience, here is a summary table of all modules and their submodules/endpoints (with the name you would use invoking them in Python):

module | submodules/endpoints
aiops | ScenariosApi, StatsApi
clustermgmt | ClusterProfilesApi, ClustersApi, DisksApi, PcieDevicesApi, StorageContainersApi, VcenterExtensionsApi
datapolicies | ProtectionPoliciesApi
dataprotection | ProtectedResourcesApi, RecoveryPointsApi
files | AnalyticsApi, AntivirusServerApi, DnsApi, FileServersApi, InfectedFilesApi, MountTargetsApi, NotificationPoliciesApi, PartnerServersApi, QuotaPoliciesApi, RansomwareConfigsApi, RecommendationsApi, ReplicationJobsApi, ReplicationPoliciesApi, SnapshotChangedContentsApi, SnapshotSchedulesApi, SnapshotsApi, TierApi, UnifiedNamespacesApi, UserMappingsApi, VirusScanPoliciesApi
iam | AuthorizationPoliciesApi, CertificateAuthenticationProvidersApi, ClientsApi, DirectoryServicesApi, EntitiesApi, OperationsApi, RolesApi, SAMLIdentityProvidersApi, UserGroupsApi, UsersApi
licensing | EndUserLicenseAgreementApi, LicenseKeysApi, LicensesApi
lifecycle | BundlesApi, ConfigApi, EntitiesApi, ImagesApi, InventoryApi, LcmSummariesApi, NotificationsApi, PrechecksApi, RecommendationsApi, StatusApi, UpgradesApi
microseg | AddressGroupsApi, DirectoryServerConfigsApi, NetworkSecurityPoliciesApi, ServiceGroupsApi
monitoring | AlertEmailConfigurationApi, AlertsApi, AuditsApi, ClusterLogsApi, EventsApi, ManageAlertsApi, SystemDefinedPoliciesApi, UserDefinedPoliciesApi
networking | AwsSubnetsApi, AwsVpcsApi, BgpRoutesApi, BgpSessionsApi, BridgesApi, ClusterCapabilitiesApi, FloatingIpsApi, GatewaysApi, IPFIXExportersApi, Layer2StretchStatsApi, Layer2StretchesApi, LoadBalancerSessionStatsApi, LoadBalancerSessionsApi, MacAddressesApi, NetworkControllersApi, RemoteEntitiesApi, RouteTablesApi, RoutesApi, RoutingPoliciesApi, RoutingPolicyStatsApi, SubnetIPReservationApi, SubnetMigrationsApi, SubnetsApi, TrafficMirrorStatsApi, TrafficMirrorsApi, UplinkBondsApi, VirtualSwitchNodesInfoApi, VirtualSwitchesApi, VpcNsStatsApi, VpcVirtualSwitchMappingsApi, VpcsApi, VpnConnectionStatsApi, VpnConnectionsApi
objects | ObjectStoresApi
opsmgmt | GlobalReportSettingApi, ReportArtifactsApi, ReportConfigApi, ReportsApi
prism | BatchesApi, CategoriesApi, DomainManagerApi, DomainManagerBackupsApi, TasksApi
security | ApprovalPoliciesApi, STIGsApi
vmm | EsxiStatsApi, EsxiVmApi, ImagePlacementPoliciesApi, ImageRateLimitPoliciesApi, ImagesApi, StatsApi, TemplatesApi, VmApi
volumes | IscsiClientsApi, VolumeGroupsApi

So, back to our 5 steps, this time with code examples to bring it all together (this example will get the first page of virtual machines):

Python
# step 1 of 5: import the module
import ntnx_vmm_py_client

# step 2 of 5: create an API client configuration object and set key attributes
api_client_configuration = ntnx_vmm_py_client.Configuration()
api_client_configuration.host = api_server 
api_client_configuration.username = username
api_client_configuration.password = secret
api_client_configuration.verify_ssl = False

# step 3 of 5: use the API client configuration to create an API client object
api_client = ntnx_vmm_py_client.ApiClient(configuration=api_client_configuration)

# step 4 of 5: create an instance of the API client by connecting it to a specific submodule
api_instance_vm = ntnx_vmm_py_client.api.VmApi(api_client=api_client)

# step 5 of 5: use the API client instance to make a call to a specific API endpoint and get or post data
vm_list = api_instance_vm.list_vms()

And now we have a response object (vm_list) whose data section contains the list of vm entities available on the first page, with all their attributes.
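
If you want to quickly eyeball what came back before diving into the documentation, you can print a few attributes from the data and metadata sections. A minimal sketch, assuming the vm objects expose ext_id and name attributes (as other v4 entities such as clusters do):

Python
# quick sanity check on the first page of results (sketch; vm_list comes from the example above)
print(f"total VMs available: {vm_list.metadata.total_available_results}")
for vm in vm_list.data:
    print(vm.ext_id, vm.name)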

So how do you know what the response looks like? How do you know which functions (other than list_vms) are available for that submodule/endpoint? That’s what the documentation is for, and here is how you use that documentation:

  1. First, you connect to the documentation page: https://developers.nutanix.com/
  2. You select the domain/module under “SDK Reference” at the top of the page and the language (exp: Virtual Machine Management > Python)
  3. You then select “SDK Reference” and “api_package”
  4. You see all the functions available for each endpoint. Select the function you want (exp: VmApi.list_vms()) to use to see which parameters you can use with each function (for list_vms, you’ll see that we can specify the page number, filter, etc…).
  5. Under the “Returns:” section, click the class name (exp: ListVmsApiResponse). Pretty much all functions will have a data and a metadata section in their response.
  6. Under “data” click on the object type (exp: Vm) and you will see a list of all the properties which are part of that object. Note that sometimes, other objects are in the list of properties (exp: cluster in the vm object) and you will need to drill down to see what properties are part of that other object.

If you need to understand what other attributes can be configured in the API client configuration, click on the “Configuration” section for that module. The “Examples” section can also help you figure out how to use some of those functions.

Speaking of which, and back to our list vms example, I’m sure you noticed that I mentioned this is only returning the first page of entities, so you may be wondering how to get all entities. To do this, we have to deal with pagination which is our next topic.

Dealing with pagination

By default, the v4 API will return 50 entities in its data section. If you want to get more, you’ll have to either increase the limit of your initial request, or you’ll have to request the next page until the last page.

Increasing the limit of the initial request has its limitations (no pun intended) which may vary from endpoint to endpoint, so it is not a reliable way to make sure you get all the entities you need. If you’re trying to reduce the amount of data and you already know the name or some other attributes that can be filtered on, then filtering is the way to go, which is well explained in this blog.

In case you need to increase the limit though, this is simply done in the request URL like so:

HTML
https://{{pc_ip}}:9440/api/vmm/v4.0/ahv/config/vms?$page=0&$limit=100
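
If you are going through the SDK rather than building the URL yourself, the same query parameters are exposed as keyword arguments on the list functions. A minimal sketch reusing the api_instance_vm object from the earlier example (the filter value is only an illustration of the OData syntax):

Python
# same request expressed through the SDK (sketch)
vm_page = api_instance_vm.list_vms(_page=0, _limit=100)

# an OData filter can be passed the same way (illustrative value)
filtered_vms = api_instance_vm.list_vms(_page=0, _limit=100, _filter="startswith(name,'prod')")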

Note that we also specify the page number (which starts at 0) in the request URL. So at this point you’re probably wondering: “how do I know there is another page, and how do I know which page is the last one?”

The metadata section of a response contains an attribute called “totalAvailableResults” that helps you figure out how many entities are available in total. In addition, that metadata section also contains a “links” section, which is a list containing 3 objects: the href of the first page, the href of the current page, and the href of the last page of entities.

Working thru pagination is therefore a matter of starting at page 0, then moving on to the next page until the href of the current page is the same as the href of the last page.
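
For completeness, here is what that naive sequential loop could look like. This is a minimal sketch that stops once the number of collected entities reaches totalAvailableResults rather than comparing hrefs, and it reuses the api_instance_vm object from the earlier example:

Python
# sequential pagination (sketch); 50 is the default page size mentioned above
all_vms = []
page = 0
while True:
    response = api_instance_vm.list_vms(_page=page, _limit=50)
    if response.data:
        all_vms.extend(response.data)
    # stop when everything advertised in the metadata section has been collected
    if not response.data or len(all_vms) >= response.metadata.total_available_results:
        break
    page += 1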

Of course, doing it sequentially would be very inefficient, especially if you have hundreds of pages of data.

Using Python, we can use the concurrent.futures module to multi-thread the page retrieval process and cut fetching times down from minutes to seconds. In addition, we’ll use the tqdm module to display a progress bar, because if you are impatient like me you’ll want to know what the hell your script is doing.

The first step will be to figure out how many pages of entities there are. To do this, we start by setting up the api client and its instance, then make a call limited to a single entity (so it’s fast) and then we retrieve the total number of available entities from the metadata section of the response:

Python
import math

client = ntnx_vmm_py_client.ApiClient(configuration=api_client_configuration)
entity_api = ntnx_vmm_py_client.VmApi(api_client=client)
LIMIT = 100

response = entity_api.list_vms(_page=0,_limit=1)

total_available_results=response.metadata.total_available_results
page_count = math.ceil(total_available_results/LIMIT)

Note that we are figuring out the page count by using the math.ceil function on the total available results count divided by the limit.

Now that we know how many pages we have, we’ll use tqdm and concurrent to call a function to retrieve all those pages in parallel (with a controlled number of workers):

Python
from concurrent.futures import ThreadPoolExecutor, as_completed
import datetime
import tqdm

def fetch_entities(client,module,entity_api,function,page,limit):
    entity_api_module = getattr(module, entity_api)
    entity_api = entity_api_module(api_client=client)
    list_function = getattr(entity_api, function)
    response = list_function(_page=page,_limit=limit)
    return response

entity_list=[]
LIMIT = 100

with tqdm.tqdm(total=page_count, desc="Fetching entity pages") as progress_bar:    
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(
                fetch_entities,
                module=ntnx_vmm_py_client,
                entity_api='VmApi',
                client=client,
                function='list_vms',
                page=page_number,
                limit=LIMIT
            ) for page_number in range(0, page_count, 1)]
        for future in as_completed(futures):
            try:
                entities = future.result()
                entity_list.extend(entities.data)
            except Exception as e:
                print(f"{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}")
            finally:
                progress_bar.update(1)

So, here we have a generic function to retrieve entities to which you can pass a module name, API instance, API client and function (so it works here for vms but could work for any type of entity).

We then have a loop that uses the page count to start multiple threads (up to 10, as defined in max_workers) to retrieve different pages. The result of each query is retrieved from the corresponding future object by calling its result() method.

With each page retrieved, we extend a list variable called entity_list with the retrieved data. We also update the progress bar for those impatient users (myself included).

Easy peazy and super speedy.

Remember that for an API instance object, you can figure out the retrieve function name from the documentation (and if you do not remember this, read the previous section of this blog again).

With this process, if we wanted to retrieve the list of all cluster entities we would use:

Python
client = ntnx_clustermgmt_py_client.ApiClient(configuration=api_client_configuration)

entity_api = ntnx_clustermgmt_py_client.ClustersApi(api_client=client)
entity_list=[]
LIMIT = 100

response = entity_api.list_clusters(_page=0,_limit=1)

total_available_results=response.metadata.total_available_results
page_count = math.ceil(total_available_results/LIMIT)

with tqdm.tqdm(total=page_count, desc="Fetching entity pages") as progress_bar:
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(
                fetch_entities,
                module=ntnx_clustermgmt_py_client,
                entity_api='ClustersApi',
                client=client,
                function='list_clusters',
                page=page_number,
                limit=LIMIT
            ) for page_number in range(0, page_count, 1)]
        for future in as_completed(futures):
            try:
                entities = future.result()
                entity_list.extend(entities.data)
            except Exception as e:
                print(f"{(datetime.datetime.now()).strftime('%Y-%m-%d %H:%M:%S')} [WARNING] Task failed: {e}")
            finally:
                progress_bar.update(1)

Note that this is not using the same API instance object and that the function name has changed from list_vms to list_clusters.

In part 2 of this series, we will cover how to deal with arguments and credentials securely and examine a complete reporting script that uses the v4 API to produce an HTML inventory of a number of entity types available in Prism Central.

As a spoiler, here is a demo of the entire script running inside a terminal:

  1. Not GA yet, still in beta ↩︎
  2. Not GA yet, still in beta ↩︎