Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Manage VM disk utilization using Stack Monitoring
Introduction
One of the major responsibilities of a cloud administrator is to manage and monitor the resources deployed in the organization's environments. For all kinds of workloads, it is necessary to monitor resource health and changes regularly and to ensure that all workloads are in green status: healthy and running.
In compute instances, storage plays a vital role in application stability, because it holds the running application, its data, user data in database files, and so on. It is important for an administrator to monitor and manage the storage of each volume or disk attached to the compute instance to mitigate application crashes, downtime, and data loss for users and organizations.
Stack Monitoring is one of the services for monitoring compute instances in Oracle Cloud Infrastructure (OCI), offered under Observability and Management for the infrastructure deployed in an OCI tenancy. Stack Monitoring lets you monitor not only the storage of each disk, but also the filesystems created on each volume attached to the compute instances. In addition to storage, Stack Monitoring provides metrics for Availability, CPU, Memory, and Disk Activity and Paging of the compute instances.
Stack Monitoring features
Stack Monitoring provides the tools to quickly alert on and identify problems such as a filesystem running out of disk space or high CPU utilization. It gives greater visibility into the health and availability status of compute instances, and it provides a consolidated dashboard view of all the resources being monitored. With Stack Monitoring, you can:
- Quickly identify performance issues such as memory, CPU, or disk filesystem over-utilization using the graphical charts and tables.
- Easily review filesystem utilization across all mount points using tables, and get a detailed view of any open alarms for the resources being monitored.
- Get quick access to critical information such as the OS version, a summary of open alarms by severity, and details on the date and time of the last status change for the host.
- View all dimensions and values of a given metric from Stack Monitoring tables. This information is helpful for reviewing the filesystem usage in GB for all filesystems on a host without any need to log on to the host.
- Use disk activity as an indicator of how busy a host is. The Disk Activity Summary reports the number of read, write, and total operations per second for all disks on a host.
Objective
Learn how to manage VM disk utilization using Stack Monitoring.
Prerequisites
-
Create or designate a compartment to use:
You can create a new compartment or use an existing one to install and configure the Stack Monitoring service. For information about compartments, see Managing Compartments.
When designating a monitoring compartment, make sure you use the same compartment for your agents and for native OCI resources (for example, Databases, Compute Instances). If the resources are in different compartments, move them to the same compartment as the Stack Monitoring agents.
-
Create a Dynamic Group of all Management Agents:
To interact with the OCI service endpoints, you must explicitly create a dynamic group to allow Management Agents to communicate with the Management Agent service (MACS). To create a dynamic group, follow these steps:
-
Under Identity & Security, go to Identity and click Dynamic Groups.
-
Click Create Dynamic Group.
-
In the Create Dynamic Group dialog box, enter a name for the dynamic group, a description and the matching rules, and then click Create Dynamic Group.
For example, create a dynamic group named “Demo_DynamicGroup_For_MonitoringAgent” with the following details under RULE 1:
ALL {resource.type='managementagent', resource.compartment.id='ocid1.compartment.oc1.examplecompartmentid'}
-
Create policy on the Dynamic Group:
You need to create the following policies to allow the Management Agents to interact with the Management Agent service and to allow the Management Agents to upload data.
-
Policy 1:
ALLOW DYNAMIC-GROUP <Demo_DynamicGroup_For_MonitoringAgent> TO USE METRICS IN COMPARTMENT <compartment_name> where target.metrics.namespace = 'oracle_appmgmt'
Description: Allow the agent to upload metrics to Telemetry in the ‘oracle_appmgmt’ namespace. Here, Demo_DynamicGroup_For_MonitoringAgent is a dynamic group of management agents in a compartment.
-
Policy 2:
ALLOW DYNAMIC-GROUP <Demo_DynamicGroup_For_MonitoringAgent> TO {STACK_MONITORING_DISCOVERY_JOB_RESULT_SUBMIT} IN COMPARTMENT <compartment_name>
Description: Allow the agent to upload data to the discovery service. Here, Demo_DynamicGroup_For_MonitoringAgent is a dynamic group of management agents in a compartment.
Note: You can skip Prerequisites 4 and 5 if you have administrator privileges on the tenancy. Follow those steps only if the administrator wants to create users who manage Stack Monitoring specifically.
-
Create users and groups:
As a best practice, create separate users and groups to manage Stack Monitoring operations. The administrator group and its users already have the required privileges by default. Stack Monitoring users and groups are created using the Oracle Cloud Infrastructure Identity and Access Management (IAM) service. For information about creating and managing users and groups, see Managing Users and Managing Groups. Create the following user group, which is needed for Stack Monitoring.
Group | Description
StackMonitoringAdminGrp | Group for users that perform admin/operator-related operations.
-
Create required policies:
Stack Monitoring policies are created using the Identity and Access Management (IAM) policies. This tutorial provides specific examples to configure your tenancy to leverage Stack Monitoring. For general information regarding OCI policies, see Getting Started with Policies.
Create Policies for Administrative Operations
The following policies allow users who perform administration operations, that is, users who belong to the StackMonitoringAdminGrp group.
-
Policy 1:
ALLOW GROUP StackMonitoringAdminGrp TO MANAGE stack-monitoring-family IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to perform admin operations in a compartment.
-
Policy 2:
ALLOW GROUP StackMonitoringAdminGrp TO {MGMT_AGENT_DEPLOY_PLUGIN_CREATE, MGMT_AGENT_INSPECT, MGMT_AGENT_READ} IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to list and read agents, and to deploy the Stack Monitoring Management Agent plug-in during resource discovery when the Management Agent does not yet have the plug-in, within the scope of the compartment.
-
Policy 3:
ALLOW GROUP StackMonitoringAdminGrp TO READ metrics IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to read metrics in a compartment.
-
Policy 4:
ALLOW GROUP StackMonitoringAdminGrp TO READ instances IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to read instances in a compartment.
-
Policy 5:
ALLOW GROUP StackMonitoringAdminGrp TO MANAGE external-database-family IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to manage external databases in a compartment.
-
Policy 6:
ALLOW GROUP StackMonitoringAdminGrp TO MANAGE alarms IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to manage alarms in a compartment.
-
Policy 7:
ALLOW GROUP StackMonitoringAdminGrp TO USE ons-topics IN COMPARTMENT <compartment_name>
Description: Allow users in the StackMonitoringAdminGrp group to list, create, update, delete, and move subscriptions for topics in the tenancy.
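The matching rule and policy statements in the prerequisites differ only in the compartment and group names, so they can be rendered from a small template before pasting them into the console. The following is a minimal sketch, not an Oracle tool: the function names are hypothetical, the compartment name `demo-compartment` is a placeholder, and the code only builds the statement text shown above without calling any OCI API.

```python
# Hypothetical helpers: they only format the statements listed in the
# prerequisites. Nothing here talks to OCI.

def matching_rule(compartment_ocid):
    """Dynamic group rule selecting every management agent in one compartment."""
    return (
        "ALL {resource.type='managementagent', "
        f"resource.compartment.id='{compartment_ocid}'}}"
    )


def agent_policies(dynamic_group, compartment):
    """The two policies that let management agents upload metrics and discovery data."""
    return [
        f"ALLOW DYNAMIC-GROUP {dynamic_group} TO USE METRICS "
        f"IN COMPARTMENT {compartment} "
        "where target.metrics.namespace = 'oracle_appmgmt'",
        f"ALLOW DYNAMIC-GROUP {dynamic_group} TO "
        "{STACK_MONITORING_DISCOVERY_JOB_RESULT_SUBMIT} "
        f"IN COMPARTMENT {compartment}",
    ]


def admin_policies(group, compartment):
    """The seven administrative policies, in the order listed in the tutorial."""
    grants = [
        "MANAGE stack-monitoring-family",
        "{MGMT_AGENT_DEPLOY_PLUGIN_CREATE, MGMT_AGENT_INSPECT, MGMT_AGENT_READ}",
        "READ metrics",
        "READ instances",
        "MANAGE external-database-family",
        "MANAGE alarms",
        "USE ons-topics",
    ]
    return [
        f"ALLOW GROUP {group} TO {grant} IN COMPARTMENT {compartment}"
        for grant in grants
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your own tenancy.
    print(matching_rule("ocid1.compartment.oc1.examplecompartmentid"))
    statements = agent_policies(
        "Demo_DynamicGroup_For_MonitoringAgent", "demo-compartment"
    ) + admin_policies("StackMonitoringAdminGrp", "demo-compartment")
    for statement in statements:
        print(statement)
```

Generating the statements from one place keeps the group and compartment names consistent across all nine policies, which is where copy-and-paste mistakes tend to creep in.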
Task 1: Install Management Agents
You must enable the Management Agent plugin, which is required to use the Stack Monitoring service.
-
Log in to the Oracle Cloud Console. From the navigation menu, click Compute, and then click Instances.
-
Click the instance that you’re interested in.
-
Click the Oracle Cloud Agent tab. The list of plugins is displayed.
-
Toggle the Enabled switch for the Management Agent plugin.
After the agent is enabled, it takes around 10-15 minutes to change to the Running status.
Task 2: Verify the Monitoring Agent ID
-
Log in to the Oracle Cloud Console. From the navigation menu, click Observability & Management.
-
Select Management Agent Service.
-
Select the compartment from the Compartment drop-down list. The list shows the agents that are enabled for their respective compute instances.
-
Select the agent associated with the compute instance for which you want to enable Stack Monitoring, and note its “Agent Id” for future use.
Task 3: Run the Stack Monitoring Discovery job
The final step is to run the discovery job to promote the OCI compute instance. After promotion, you can monitor the compute instance more thoroughly and get insights into the resources it uses. This step requires a Cloud Shell session in the Oracle Cloud Console.
-
Open OCI Cloud Shell from the home page; the Cloud Shell window is displayed. Alternatively, you can perform this step on any device that has the OCI CLI installed.
-
Verify the OCI CLI version using the command
oci --version
Stack Monitoring commands run on OCI CLI 3.XX or higher.
-
Create a JSON file named parameters.json in any directory in Cloud Shell. Copy the JSON provided below and replace “<Compartment-ID>”, “<Agent-ID>”, and “<HostName or IP Address>” with your respective parameters. The parameters to change are described in the table “JSON Input Parameters”:
{
  "discoveryType": "ADD",
  "discoveryClient": "host-discovery",
  "compartmentId": "<Compartment-ID>",
  "discoveryDetails": {
    "agentId": "<Agent-ID>",
    "resourceType": "HOST",
    "resourceName": "<HostName or IP Address>",
    "properties": {
      "propertiesMap": {}
    }
  }
}
JSON Input Parameters
Input Field | Description
compartmentId | Compartment OCID where the compute instance resides.
agentId | The OCID of the management agent monitoring the resource.
resourceName | The fully qualified domain name (FQDN) of the host within Stack Monitoring.
-
After saving the JSON file, run the following command, replacing the <path_to_JSON_file> parameter with the actual path of the JSON file created in the previous step and "your-compartment-id" with your compartment OCID. You can use the pwd command to get the directory that contains the JSON file.
oci stack-monitoring discovery-job create --compartment-id "your-compartment-id" --from-json file://<path_to_JSON_file>
-
The process may take 5-10 minutes to complete. Once you refresh the Oracle Cloud Console, you will see Stack Monitoring enabled and a complete dashboard on the Stack Monitoring page. After promotion, the resource type of the compute instance is Host.
-
Check the status of the Promotion Job under Resource Discovery to verify the success of discovering the resources.
-
On the Stack Monitoring Dashboard, select the Resource block to see the list of compute instances and hosts for which you enabled monitoring.
-
Select the desired host from the list and you will be presented with the detailed view of the metrics and tables displayed for that particular host.
Host information and metrics are displayed as charts and tables on the resource details page.
You can choose Filesystem Used (GBs) and Filesystem Utilization (%) to get more specific information about the storage present on the host. The table view presents all the metrics in tabular format.
Once you select a metric for an instance, it is displayed both as a percentage and as storage in GB. Each filesystem present on the host is listed with its mount point in the table.
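The discovery input used in this task can also be generated programmatically rather than edited by hand. The following is a minimal sketch, assuming Python 3 is available where you prepare the file; the function names and the example OCIDs and host name are hypothetical placeholders, and the script only writes parameters.json and prints the CLI command from the steps above without running it.

```python
import json
import shlex
import tempfile
from pathlib import Path


def discovery_payload(compartment_id, agent_id, resource_name):
    """Build the discovery job input with the same fields as parameters.json."""
    return {
        "discoveryType": "ADD",
        "discoveryClient": "host-discovery",
        "compartmentId": compartment_id,
        "discoveryDetails": {
            "agentId": agent_id,
            "resourceType": "HOST",
            "resourceName": resource_name,
            "properties": {"propertiesMap": {}},
        },
    }


def write_parameters(path, payload):
    """Write the payload as JSON and return the absolute path of the file."""
    path = Path(path)
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path.resolve()


def discovery_command(compartment_id, json_path):
    """Compose (but do not run) the discovery-job command from this task."""
    return " ".join([
        "oci", "stack-monitoring", "discovery-job", "create",
        "--compartment-id", shlex.quote(compartment_id),
        "--from-json", shlex.quote(f"file://{json_path}"),
    ])


if __name__ == "__main__":
    # Placeholder identifiers; replace them with values from your tenancy.
    compartment = "ocid1.compartment.oc1..example"
    payload = discovery_payload(
        compartment,
        "ocid1.managementagent.oc1..example",
        "host01.example.com",
    )
    # A temporary directory keeps the demo from overwriting an existing file.
    target = Path(tempfile.mkdtemp()) / "parameters.json"
    json_path = write_parameters(target, payload)
    print(discovery_command(compartment, json_path))
```

Building the file from a dictionary guarantees valid JSON, and resolving the path up front avoids guessing the value for the file:// argument.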
Task 4: Enable Alarms
-
Once the metrics are displayed on the Resource Details page for the hosts, you can set up alerts for specific metrics. Navigate to Observability & Management, Monitoring, Alarm Status.
-
Click Create Alarm, then enter the alarm’s name and severity. Select the host compartment, the namespace “oracle_appmgmt”, and the resource group “host”, and then select the metric name from the drop-down list.
-
Based on the requirement, choose the Filesystem Used or Filesystem Utilization metric, interval period, and statistic.
-
After choosing the metrics, choose the metric dimension.
• To specify a specific host, choose “agentHostName” and then choose the dimension value for the host from the drop-down menu.
• Next, click Additional dimension, choose “fileSystemName,” and then choose the dimension value for the mount point you wish to trigger an alert for from the list.
-
In the Trigger rule section, set the condition that puts the alarm in the firing state. For example, for agentHostName “XXXX” and fileSystemName “/Dev” with the statistic “mean”, trigger a “storage full” alert if filesystem utilization exceeds 90%.
-
In the Define alarm notifications section, select Notifications as the destination service, and select the topic that contains the email address where you want to receive alerts for this alarm. For more information on the Notifications service and creating topics, see the OCI Notifications documentation.
-
You can also choose the Message Format to get better alert formatting. Select the Repeat notification check box if you want to receive continuous alerts for the same metrics.
-
Save the alarm.
-
Once the metric matches the defined rule, the alarm is triggered. [Screenshot: example of a metric alarm]
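The alarm settings chosen in this task (metric, interval, dimensions, statistic, and trigger rule) correspond to a single Monitoring Query Language (MQL) expression, which the console displays in its advanced mode. The following sketch shows how the example values combine, under stated assumptions: the metric name `FilesystemUtilization` and the `5m` interval are illustrative guesses (use the metric name shown in the drop-down list), the helper names are hypothetical, and `is_firing` is only a local approximation of a mean-based threshold check, not the service's evaluation logic.

```python
def alarm_query(metric, interval, dimensions, statistic, operator, threshold):
    """Compose an MQL expression: metric[interval]{dims}.statistic() op threshold."""
    dims = ", ".join(f'{name} = "{value}"' for name, value in dimensions.items())
    return f"{metric}[{interval}]{{{dims}}}.{statistic}() {operator} {threshold}"


def is_firing(samples, threshold):
    """Local illustration: True when the mean of the samples exceeds the threshold."""
    if not samples:
        return False
    return sum(samples) / len(samples) > threshold


if __name__ == "__main__":
    # Dimension values are the placeholders used in the tutorial's example.
    query = alarm_query(
        "FilesystemUtilization",  # assumed metric name
        "5m",                     # assumed interval
        {"agentHostName": "XXXX", "fileSystemName": "/Dev"},
        "mean",
        ">",
        90,
    )
    print(query)
    # Three utilization readings (percent) within one interval; mean is 91.
    print(is_firing([88.0, 91.0, 94.0], 90))
```

Writing the rule as one expression makes it easy to review exactly which host and mount point an alarm covers before saving it.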
You can use the advanced metrics provided by OCI Stack Monitoring, together with Alarms, to monitor your compute infrastructure, particularly storage. You can also explore more Stack Monitoring features for other services.
Acknowledgements
- Authors - Akarsha I K (Cloud Architect), Maninder Flora (Cloud Architect)
F75078-01
December 2022
Copyright © 2022, Oracle and/or its affiliates.