Monitoring System Capacity

Track the key metrics that determine how much capacity Private Cloud Appliance has to host your compute instances and the storage they use. Administrators have direct access to the current consumption of CPU, memory, and storage space.

Detailed data for compute node load and storage usage is available in the Grafana dashboards. This topic explains how to access the most critical metrics directly from the Service Enclave.

Viewing CPU and Memory Usage By Fault Domain

These procedures display the number of compute nodes, the amount of total memory and free memory, and the number of total and free virtual CPUs for each fault domain.

The UNASSIGNED row refers to compute nodes that are not currently assigned to a fault domain. Because these compute nodes do not belong to a fault domain, their memory and CPU values are reported as zero.

To display this information and more for an individual compute node, select PCA Config > Rack Units from the navigation menu, or select the Rack Units tile on the Dashboard, and then click the name of a compute node in the list.

Using the Service Web UI

  1. In the navigation menu, select PCA Config > Fault Domains.

  2. To see the information for only one fault domain, click the name of that fault domain.

Using the Service CLI

Enter the getFaultDomainInfo command.

PCA-ADMIN> getFaultDomainInfo
Data:
  id           totalCNs   totalMemory   freeMemory   totalvCPUs   freevCPUs
  --           --------   -----------   ----------   ----------   ---------
  UNASSIGNED   1          0.0           0.0          0            0
  FD1          2          1072.0        976.0        176          164
  FD2          1          984.0         984.0        120          120
  FD3          1          984.0         984.0        120          120

The Notes column is omitted from the preceding example.
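
To turn the free and total values into a usage share per fault domain, the table can be post-processed outside the Service CLI. The following Python sketch is illustrative only: it assumes the output was captured as text in the column layout shown above, and the helper names (parse_fault_domain_table, used_fraction, summarize) are hypothetical, not part of the product.

```python
# Sample copied from the getFaultDomainInfo example above (Notes column omitted).
SAMPLE_OUTPUT = """\
  id           totalCNs   totalMemory   freeMemory   totalvCPUs   freevCPUs
  --           --------   -----------   ----------   ----------   ---------
  UNASSIGNED   1          0.0           0.0          0            0
  FD1          2          1072.0        976.0        176          164
  FD2          1          984.0         984.0        120          120
  FD3          1          984.0         984.0        120          120
"""


def parse_fault_domain_table(text):
    """Hypothetical helper: map each fault domain id to its numeric columns."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, data_rows = rows[0], rows[2:]  # rows[1] is the dashed separator
    table = {}
    for row in data_rows:
        # zip stops at the shorter sequence, so trailing extra fields are ignored.
        record = dict(zip(header, row))
        table[record["id"]] = {
            "totalCNs": int(record["totalCNs"]),
            "totalMemory": float(record["totalMemory"]),
            "freeMemory": float(record["freeMemory"]),
            "totalvCPUs": int(record["totalvCPUs"]),
            "freevCPUs": int(record["freevCPUs"]),
        }
    return table


def used_fraction(total, free):
    """Return the consumed share of a resource, or 0.0 when the total is zero."""
    return 0.0 if total == 0 else (total - free) / total


def summarize(table):
    """Compute memory and vCPU usage fractions for every fault domain."""
    return {
        fd: {
            "memory_used": used_fraction(v["totalMemory"], v["freeMemory"]),
            "vcpus_used": used_fraction(v["totalvCPUs"], v["freevCPUs"]),
        }
        for fd, v in table.items()
    }


if __name__ == "__main__":
    for fd, usage in summarize(parse_fault_domain_table(SAMPLE_OUTPUT)).items():
        print(f"{fd}: memory {usage['memory_used']:.1%}, vCPUs {usage['vcpus_used']:.1%}")
```

With the sample data, FD1 shows roughly 9 percent of memory and 7 percent of vCPUs in use, while FD2 and FD3 are unused. The zero-total guard keeps the UNASSIGNED row from causing a division by zero.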

Viewing Disk Space Usage on the ZFS Storage Appliance

The Service Enclave runs a storage monitoring tool called ZFS pool manager, which polls the ZFS Storage Appliance every 60 seconds. Using the Service CLI, you can display current information about the usage of available disk space in each ZFS pool. You can also set the usage threshold; a fault is triggered when pool usage exceeds that threshold.

Checking the Storage Status of ZFS Pools

List the ZFS pools.

PCA-ADMIN> list ZfsPool
Data:
  id                                     name
  --                                     ----
  e898b147-7cf0-4bd0-8b54-e32ec83d04cb   PCA_POOL
  c2f67943-df81-47a5-9713-06768318b623   PCA_POOL_HIGH

In a standard storage configuration, you have only one pool. If your system includes high-performance disk trays, you can view usage information for each pool separately. To display the usage information for a pool, use the show command with the pool ID.

PCA-ADMIN> show ZfsPool id=e898b147-7cf0-4bd0-8b54-e32ec83d04cb
Data:
  Id = e898b147-7cf0-4bd0-8b54-e32ec83d04cb
  Type = ZfsPool
  Pool Status = Online
  Free Pool = 44879343128576
  Total Pool = 70506183131136
  Pool Usage Percent = 0.3634693989163486
  Name = PCA_POOL
  Work State = Normal

Configuring the Fault Threshold of the ZFS Pool Manager

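By default, the fault threshold is 80 percent full, which the ZFS pool manager expresses as a usage value of 0.8. To display the current setting, use the show ZfsPoolManager command.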

PCA-ADMIN> show ZfsPoolManager
Data:
  Id = a6ca861b-f83a-4032-91c5-bc506394d0de
  Type = ZfsPoolManager
  LastRunTime = 2022-10-09 12:17:52,964 UTC
  Poll Interval (sec) = 60
  The minimum Zfs pool usage percentage to trigger a major fault = 0.8
  Manager's run state = Running
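
The relationship between the pool values and the threshold can be checked with simple arithmetic: usage is (Total Pool - Free Pool) / Total Pool. The following Python sketch parses captured show ZfsPool output and compares the result with the threshold. It is illustrative only. The helper names are hypothetical, and treating a usage value equal to the threshold as a fault is an assumption based on the "minimum ... to trigger a major fault" label.

```python
# Sample copied from the show ZfsPool example in this topic.
POOL_OUTPUT = """\
Data:
  Id = e898b147-7cf0-4bd0-8b54-e32ec83d04cb
  Type = ZfsPool
  Pool Status = Online
  Free Pool = 44879343128576
  Total Pool = 70506183131136
  Pool Usage Percent = 0.3634693989163486
  Name = PCA_POOL
  Work State = Normal
"""


def parse_show_output(text):
    """Hypothetical helper: collect 'Key = Value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (such as 'Data:') are skipped
            fields[key.strip()] = value.strip()
    return fields


def pool_usage(fields):
    """Recompute the usage fraction from the free and total pool values."""
    free = int(fields["Free Pool"])
    total = int(fields["Total Pool"])
    return (total - free) / total


def reaches_threshold(usage, threshold=0.8):
    """Assumption: a usage value at or above the threshold counts as a fault."""
    return usage >= threshold


if __name__ == "__main__":
    fields = parse_show_output(POOL_OUTPUT)
    usage = pool_usage(fields)
    print(f"{fields['Name']}: usage {usage:.4f}, "
          f"at or above default threshold: {reaches_threshold(usage)}")
```

For the sample pool, the recomputed value agrees with the reported Pool Usage Percent of about 0.363, which is well below the default threshold of 0.8.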

The following example sets the fault threshold to 75 percent full: usageMajorFaultPercent=0.75.

PCA-ADMIN> edit ZfsPoolManager usageMajorFaultPercent=0.75
JobId: 67cfe180-f2a2-4d59-a676-01b3d73cffae