Automate OCI Stack Monitoring deployment for OCI Compute Instances

Introduction

Stack Monitoring is an Oracle Cloud Infrastructure (OCI) Observability and Management service for monitoring the infrastructure deployed in an OCI tenancy, including compute instances. It monitors not only the storage of each disk but also the filesystems created on each volume attached to a compute instance. In addition to storage, Stack Monitoring provides metrics for availability, CPU, memory, disk activity, and paging of the compute instances.

In our previous tutorial, Manage VM disk utilization using Stack Monitoring, we monitored the filesystems of compute instances with the OCI Stack Monitoring service by manually enabling the Monitoring agent on each instance and then running a discovery job for a single instance from a JSON file.

However, a real production environment can contain thousands of machines, which makes the manual approach impractical. This tutorial shows how to automate OCI Stack Monitoring deployment for OCI Compute Instances.

Objective

Automate the complete process of Stack Monitoring installation and deployment using custom scripts.

Prerequisites

  1. If you have already implemented the solution in the previous tutorial, you can skip steps 2 to 5.
  2. The user must have the IAM permissions described in that tutorial.
  3. The dynamic group for the management agents, with the required policies, must be created as described in that tutorial.
  4. The user must have OCI CLI version 3.16 or later installed on the machine where the scripts will be deployed.
  5. The solution consists of shell scripts, so it must be hosted on a machine with a Linux/Unix-compatible shell.
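
The CLI version requirement in step 4 can be checked before anything is deployed. The following is a minimal sketch and not part of the tutorial's scripts: the function names version_ge and check_oci_cli are illustrative, and the sketch assumes that "oci --version" prints a plain dotted version number such as 3.16.0.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; the names are illustrative only.

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Missing fields count as 0, so 3.16 and 3.16.0 compare as equal.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_oci_cli MIN: verifies that the oci command is on PATH and that the
# version it reports is at least MIN.
check_oci_cli() {
    min_version=$1
    if ! command -v oci >/dev/null 2>&1; then
        echo "[ERROR] : OCI CLI not found in PATH"
        return 1
    fi
    # Assumption: the first output line is the bare version number.
    installed=$(oci --version 2>/dev/null | head -n 1)
    if version_ge "$installed" "$min_version"; then
        echo "OCI CLI $installed satisfies the minimum version $min_version"
    else
        echo "[ERROR] : OCI CLI $installed is older than $min_version"
        return 1
    fi
}

# Usage: check_oci_cli 3.16
```

Comparing the fields numerically avoids the string-comparison trap in which 3.9 would appear newer than 3.16.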

Task 1: Deploy the solution

Create a separate directory on your Linux-based machine to hold the solution. Follow these steps to deploy the solution in OCI:

  1. Deploy the script for installing the Monitoring agent.

    1. Create a new file named agent-enable.sh in the directory using the command:

      vi agent-enable.sh
      
    2. Copy the following code snippet and paste it in the agent-enable.sh file. Save the changes made in the file and exit.

      #!/bin/bash
      #************************************************************************
      #
      #   Licensed under the Apache License, Version 2.0 (the "License");
      #   you may not use this file except in compliance with the License.
      #   You may obtain a copy of the License at
      #
      #       http://www.apache.org/licenses/LICENSE-2.0
      #
      #   Unless required by applicable law or agreed to in writing, software
      #   distributed under the License is distributed on an "AS IS" BASIS,
      #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      #   See the License for the specific language governing permissions and
      #   limitations under the License.
      #
      #************************************************************************
      # Available at: https://github.com/dbarj/oci-scripts
      # Created on: Oct/2022 by Maninder Singh Flora and Akarsha Itigi
      # Version 2.0
      #************************************************************************
      
      DATE=$(date +'%m-%d-%Y-%T')
      
      Help()
      {
         # Display Help
         echo "This script will enable the Monitoring agent on the OCI Compute Instances by taking input of a Compartment-id or an Instance-id."
         echo
         echo "Syntax: sh <script-name> [-h|i|c]"
         echo "options:"
         echo "h     Print this Help."
         echo "i     Takes Instance-id as input and enables agent for that particular compute instance only."
         echo "c     Takes Compartment-id as input and enables agent for all the compute instances in that compartment."
         echo
      }
      
      Allinstance()
      {
         LOG_FILE=$INPUT_ID-$DATE
         STR=$INPUT_ID
         # The second dot-separated field of an OCID names its resource type
         MATCH1=$(echo "$INPUT_ID" | awk -F"." '{print $2}' | tr -d " ")
         MATCH2="compartment"
         INPUT_LENGTH=${#STR}
         LENGTH=83
         if [ "$MATCH1" = "$MATCH2" ] && [ "$INPUT_LENGTH" = "$LENGTH" ]
         then
            touch "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
            # Enable the agent on every instance found in the compartment
            for INSTANCE_ID in $(oci compute instance list --compartment-id "$INPUT_ID" | grep ocid1.instance | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            do
               echo " "
               INSTANCE_NAME=$(oci compute instance get --instance-id "$INSTANCE_ID" | grep -i display-name | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
               echo "$DATE Enabling Monitoring Agent for $INSTANCE_NAME ..."
               echo "$DATE Enabling Monitoring Agent for $INSTANCE_NAME ..." >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
               oci compute instance update --instance-id "$INSTANCE_ID" --from-json file:///tmp/monitoring_agent/monitoring_agent_templates/agent_enable.json --force >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
               echo "$DATE Enabled Monitoring Agent for $INSTANCE_NAME" >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
               echo "$DATE Enabled Monitoring Agent for $INSTANCE_NAME"
            done
         else
            echo "[ERROR] : Invalid Compartment-id"
         fi
      }
      
      Instance()
      {
         LOG_FILE=$INPUT_ID-$DATE
         STR=$INPUT_ID
         # The second dot-separated field of an OCID names its resource type
         MATCH1=$(echo "$INPUT_ID" | awk -F"." '{print $2}' | tr -d " ")
         MATCH2="instance"
         INPUT_LENGTH=${#STR}
         LENGTH=83
         if [ "$MATCH1" = "$MATCH2" ] && [ "$INPUT_LENGTH" = "$LENGTH" ]
         then
            touch "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
            echo " "
            INSTANCE_NAME=$(oci compute instance get --instance-id "$INPUT_ID" | grep -i display-name | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            echo "$DATE Enabling Monitoring Agent for $INSTANCE_NAME ..."
            echo "$DATE Enabling Monitoring Agent for $INSTANCE_NAME ..." >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
            oci compute instance update --instance-id "$INPUT_ID" --from-json file:///tmp/monitoring_agent/monitoring_agent_templates/agent_enable.json --force >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
            echo "$DATE Enabled Monitoring Agent for $INSTANCE_NAME" >> "/tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs/$LOG_FILE.log"
            echo "$DATE Enabled Monitoring Agent for $INSTANCE_NAME"
         else
            echo "[ERROR] : Invalid Instance-id"
         fi
      }
      
      # Get the options
      while getopts "hi:c:" option; do
         case $option in
            h) # display Help
               Help
               exit;;
            i) # Enter Instance-id
               INPUT_ID=$OPTARG
               Instance
               exit;;
            c) # Enter Compartment-id
               INPUT_ID=$OPTARG
               Allinstance
               exit;;
           \?) # Invalid option
               echo "[Error] : Invalid option"
               echo "Please check the valid options using ./<script-name> -h"
               exit 1;;
         esac
      done
      
    3. Give execute permission to the file using the following command:

      chmod +x agent-enable.sh
      
    4. Create another file named agent_enable.json in the same directory using the following command:

      vi agent_enable.json
      
    5. Copy the code snippet from below and paste it in the agent_enable.json file. Save the changes made in the file and exit.

      {
          "agent-config": {
            "are-all-plugins-disabled": false,
            "is-management-disabled": false,
            "is-monitoring-disabled": false,
            "plugins-config": [
              {
                "desired-state": "ENABLED",
                "name": "Management Agent"
              }        
            ]
          }
      }
      
  2. Deploy the script for running OCI Stack Monitoring Discovery Jobs.

    1. Create a new file named discovery-job.sh in the directory using the command:

      vi discovery-job.sh
      
    2. Copy the following code snippet and paste it in the discovery-job.sh file. Save the changes made in the file and exit.

      #!/bin/bash
      #************************************************************************
      #
      #   Licensed under the Apache License, Version 2.0 (the "License");
      #   you may not use this file except in compliance with the License.
      #   You may obtain a copy of the License at
      #
      #       http://www.apache.org/licenses/LICENSE-2.0
      #
      #   Unless required by applicable law or agreed to in writing, software
      #   distributed under the License is distributed on an "AS IS" BASIS,
      #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      #   See the License for the specific language governing permissions and
      #   limitations under the License.
      #
      #************************************************************************
      # Available at: https://github.com/dbarj/oci-scripts
      # Created on: Oct/2022 by Maninder Singh Flora and Akarsha Itigi
      # Version 1
      #************************************************************************
      
      DATE=$(date +'%m-%d-%Y-%T')
      
      Help()
      {
         # Display Help
         echo "This script will run Stack-Monitoring Discovery-job on the OCI Compute Instances by taking input of a Compartment-id or an Instance-id."
         echo
         echo "Syntax: sh <script-name> [-h|i|c]"
         echo "options:"
         echo "h     Print this Help."
         echo "i     Takes Instance-id as input and runs the discovery job for that particular compute instance only."
         echo "c     Takes Compartment-id as input and runs the discovery job for all the compute instances in that compartment."
         echo
      }
      
      Allinstance()
      {
         LOG_FILE=$INPUT_ID-$DATE
         STR=$INPUT_ID
         # The second dot-separated field of an OCID names its resource type
         MATCH1=$(echo "$INPUT_ID" | awk -F"." '{print $2}' | tr -d " ")
         MATCH2="compartment"
         INPUT_LENGTH=${#STR}
         LENGTH=83
         if [ "$MATCH1" = "$MATCH2" ] && [ "$INPUT_LENGTH" = "$LENGTH" ]
         then
            touch "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
            # Create one discovery job per management agent found in the compartment
            for AGENT_ID in $(oci management-agent agent list --compartment-id "$INPUT_ID" | grep "CreatedBy" | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            do
               echo " "
               INSTANCE_ID=$(oci management-agent agent get --agent-id "$AGENT_ID" | grep -i ocid1.instance | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
               INSTANCE_IP=$(oci compute instance list-vnics --instance-id "$INSTANCE_ID" | grep -i private-ip | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " " | head -n 1)
               INSTANCE_NAME=$(oci compute instance get --instance-id "$INSTANCE_ID" | grep -i display-name | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
               # Start from a fresh copy of the template, then fill in the placeholders
               cp /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template.json /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
               sed -i "s/COMPARTMENT_ID/$INPUT_ID/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
               sed -i "s/AGENT_ID/$AGENT_ID/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
               sed -i "s/INSTANCE_IP/$INSTANCE_IP/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
               echo "$DATE Running Stack-Monitoring Discovery-job for $INSTANCE_NAME ($INSTANCE_IP)..."
               echo "$DATE Running Stack-Monitoring Discovery-job for $INSTANCE_NAME ($INSTANCE_IP)..." >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
               oci stack-monitoring discovery-job create --compartment-id "$INPUT_ID" --from-json file:///tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
               echo "$DATE Stack-Monitoring Discovery-job created for $INSTANCE_NAME ($INSTANCE_IP)" >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
               echo "$DATE Stack-Monitoring Discovery-job created for $INSTANCE_NAME ($INSTANCE_IP)"
            done
         else
            echo "[ERROR] : Invalid Compartment-id"
         fi
      }
      
      Instance()
      {
         LOG_FILE=$INPUT_ID-$DATE
         STR=$INPUT_ID
         # The second dot-separated field of an OCID names its resource type
         MATCH1=$(echo "$INPUT_ID" | awk -F"." '{print $2}' | tr -d " ")
         MATCH2="instance"
         INPUT_LENGTH=${#STR}
         LENGTH=83
         if [ "$MATCH1" = "$MATCH2" ] && [ "$INPUT_LENGTH" = "$LENGTH" ]
         then
            touch "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
            echo " "
            INSTANCE_NAME=$(oci compute instance get --instance-id "$INPUT_ID" | grep -i display-name | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            INSTANCE_IP=$(oci compute instance list-vnics --instance-id "$INPUT_ID" | grep -i private-ip | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " " | head -n 1)
            COMPARTMENT_ID=$(oci compute instance get --instance-id "$INPUT_ID" | grep -i compartment | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            AGENT_ID=$(oci management-agent agent list --compartment-id "$COMPARTMENT_ID" --host-id "$INPUT_ID" | grep "CreatedBy" | awk -F":" '{print $2}' | awk -F"," '{print $1}' | tr -d "\"" | tr -d " ")
            # Start from a fresh copy of the template, then fill in the placeholders
            cp /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template.json /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
            sed -i "s/COMPARTMENT_ID/$COMPARTMENT_ID/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
            sed -i "s/AGENT_ID/$AGENT_ID/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
            sed -i "s/INSTANCE_IP/$INSTANCE_IP/" /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
            echo "$DATE Running Stack-Monitoring Discovery-job for $INSTANCE_NAME ($INSTANCE_IP)..."
            echo "$DATE Running Stack-Monitoring Discovery-job for $INSTANCE_NAME ($INSTANCE_IP)..." >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
            oci stack-monitoring discovery-job create --compartment-id "$COMPARTMENT_ID" --from-json file:///tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
            echo "$DATE Stack-Monitoring Discovery-job created for $INSTANCE_NAME ($INSTANCE_IP)" >> "/tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs/$LOG_FILE.log"
            echo "$DATE Stack-Monitoring Discovery-job created for $INSTANCE_NAME ($INSTANCE_IP)"
         else
            echo "[ERROR] : Invalid Instance-id"
         fi
      }
      
      # Get the options
      while getopts "hi:c:" option; do
         case $option in
            h) # display Help
               Help
               exit;;
            i) # Enter Instance-id
               INPUT_ID=$OPTARG
               Instance
               exit;;
            c) # Enter Compartment-id
               INPUT_ID=$OPTARG
               Allinstance
               exit;;
           \?) # Invalid option
               echo "[Error] : Invalid option"
               echo "Please check the valid options using ./<script-name> -h"
               exit 1;;
         esac
      done
      
    3. Give execute permission to the file using the following command:

      chmod +x discovery-job.sh
      
    4. Create another file named discovery_job_template.json in the same directory using the following command:

      vi discovery_job_template.json
      
    5. Copy the code snippet from below and paste it in the discovery_job_template.json file. Save the changes made in the file and exit.

      {
          "discoveryType": "ADD",
          "discoveryClient": "host-discovery",
          "compartmentId": "COMPARTMENT_ID",
          "discoveryDetails": {
            "agentId": "AGENT_ID",
            "resourceType": "HOST",
            "resourceName": "INSTANCE_IP",
            "properties": {
              "propertiesMap": {}
            }
          }
      }
      
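    6. Create another file named discovery_job_template1.json in the same directory using the following command:

      vi discovery_job_template1.json
      
    7. Copy the following code snippet and paste it in the discovery_job_template1.json file. Save the changes made in the file and exit. The OCIDs and IP address shown are sample values; discovery-job.sh overwrites this file with a fresh copy of discovery_job_template.json on every run.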

      {
          "discoveryType": "ADD",
          "discoveryClient": "host-discovery",
          "compartmentId": "ocid1.compartment.oc1..aaaaaaaapi7c22bjp3zcuiywo2nabufke2lktkfhbf2ehqfclomnjc3vqh3q",
          "discoveryDetails": {
            "agentId": "ocid1.managementagent.oc1.iad.amaaaaaawe6j4fqa4f5q5pj3izevwffdpe63dvykmqzub2xakevch3a7xgtq",
            "resourceType": "HOST",
            "resourceName": "10.0.0.53",
            "properties": {
              "propertiesMap": {}
            }
          }
      }
      
  3. Deploy the installation script to complete the setup.

    1. Create a new file named installation.sh in the directory using the following command:

      vi installation.sh
      
    2. Copy the following code snippet and paste it in the installation.sh file. Save the changes made in the file and exit.

      #!/bin/bash
      
      # -p creates missing parent directories and does not fail on a rerun
      mkdir -p Monitoring-Agent-Scripts
      mkdir -p /tmp/monitoring_agent/monitoring_agent_templates
      mkdir -p /tmp/monitoring_agent/monitoring_agent_logs/agent_enable_logs
      mkdir -p /tmp/monitoring_agent/monitoring_agent_logs/stack_monitoring_discovery_job_logs
      
      cp agent_enable.json /tmp/monitoring_agent/monitoring_agent_templates/agent_enable.json
      cp discovery_job_template.json /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template.json
      cp discovery_job_template1.json /tmp/monitoring_agent/monitoring_agent_templates/discovery_job_template1.json
      
      cp discovery-job.sh Monitoring-Agent-Scripts/discovery-job.sh
      cp agent-enable.sh Monitoring-Agent-Scripts/agent-enable.sh
      
      chmod +x Monitoring-Agent-Scripts/discovery-job.sh
      chmod +x Monitoring-Agent-Scripts/agent-enable.sh
      
      rm -f agent_enable.json
      rm -f discovery_job_template.json
      rm -f discovery_job_template1.json
      
      rm -f discovery-job.sh
      rm -f agent-enable.sh
      rm -f installation.sh
      
    3. Give execute permission to the file using the following command:

      chmod +x installation.sh
      
  4. Complete the deployment. All the required files are now in place with the necessary permissions.

    1. To complete the setup, run the following command in the same directory:

      sh installation.sh
      
    2. The script copies the JSON templates to the right locations and creates the directories that store the logs of agent enablement and Stack Monitoring discovery job runs.

    3. The JSON templates are copied to /tmp/monitoring_agent/monitoring_agent_templates, and the logs are written under /tmp/monitoring_agent/monitoring_agent_logs.

    Note: The current directory now contains a directory named Monitoring-Agent-Scripts, which holds the two scripts that enable the agent and run the discovery job, respectively.
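
After installation.sh has run, the resulting layout can be checked before moving on. The following is a minimal sketch and not part of the tutorial's scripts: the function name verify_layout and its optional base-directory argument are illustrative. It only looks for the directories and templates that installation.sh creates under /tmp/monitoring_agent.

```shell
#!/bin/sh
# Hypothetical post-install check; the function name is illustrative only.

# verify_layout [BASE]: reports every directory or template that is missing
# under BASE (default /tmp/monitoring_agent) and fails if any is missing.
verify_layout() {
    base=${1:-/tmp/monitoring_agent}
    missing=0
    for dir in \
        "$base/monitoring_agent_templates" \
        "$base/monitoring_agent_logs/agent_enable_logs" \
        "$base/monitoring_agent_logs/stack_monitoring_discovery_job_logs"
    do
        if [ ! -d "$dir" ]; then
            echo "[ERROR] : Missing directory $dir"
            missing=$((missing + 1))
        fi
    done
    for file in agent_enable.json discovery_job_template.json; do
        if [ ! -f "$base/monitoring_agent_templates/$file" ]; then
            echo "[ERROR] : Missing template $file"
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "Layout under $base looks complete"
        return 0
    fi
    return 1
}

# Usage: verify_layout
```

Because /tmp is often cleared on reboot, a check like this is also a quick way to find out whether the templates need to be redeployed.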

Task 2: Execute the solution

After the scripts have been deployed on your machine, you can run them to enable the agents and run the discovery jobs for Stack Monitoring. Go to the Monitoring-Agent-Scripts directory and follow these steps.
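
Both scripts accept an ID only if the second dot-separated field of the OCID matches the expected resource type (instance for -i, compartment for -c) and the ID is exactly 83 characters long; otherwise they print an Invalid Instance-id or Invalid Compartment-id error. The resource-type part of that check can be sketched as follows, for example to validate IDs before a run. The function names are illustrative, the OCIDs in the usage comment are made-up placeholders, and the length test is deliberately left out of the sketch.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the resource-type check in the scripts.

# ocid_type OCID: prints the resource-type field, which is the second
# dot-separated field of the OCID.
ocid_type() {
    printf '%s\n' "$1" | awk -F'.' '{print $2}'
}

# is_ocid_of_type OCID TYPE: succeeds when OCID starts with "ocid1." and its
# resource type equals TYPE (for example "instance" or "compartment").
is_ocid_of_type() {
    case $1 in
        ocid1.*) [ "$(ocid_type "$1")" = "$2" ] ;;
        *) return 1 ;;
    esac
}

# Usage (placeholder OCIDs, not real resources):
#   is_ocid_of_type ocid1.instance.oc1.iad.exampleuniqueid instance && echo ok
#   is_ocid_of_type ocid1.compartment.oc1..exampleuniqueid compartment && echo ok
```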

  1. Enable the Monitoring Agent on Compute Instances.

    1. To enable the Monitoring Agent on a single Compute Instance, run the script with the ‘-i’ option.

      sh agent-enable.sh -i <oci_compute_instance_ocid>
      
    2. To enable the Monitoring Agent on all the Compute Instances in a compartment, run the script with the ‘-c’ option.

      sh agent-enable.sh -c <compartment_ocid>
      
    3. You can check the syntax and the option details using the ‘-h’ option with the script as follows.

      sh agent-enable.sh -h
      
    4. After the script in step 1 or 2 has run successfully, you can check the agent's status on the compute instance's details page, under the Oracle Cloud Agent tab.

    Note: Proceed to the next steps only after the Management Agent is shown in the running state. If the Management Agent is not running on the instance, the next steps return errors.

  2. Run the OCI Stack Monitoring Discovery Jobs.

    1. Once the Management Agent is running on the Compute Instance, run the following script with the ‘-i’ option to create the OCI Stack Monitoring Discovery Job.

      sh discovery-job.sh -i <oci_compute_instance_ocid>
      
    2. To run the OCI Stack Monitoring Discovery Job for all the Compute Instances in a given compartment, run the script with the ‘-c’ option.

      sh discovery-job.sh -c <compartment_ocid>
      
    3. You can check the syntax and the above option details using the ‘-h’ option with the script as follows.

      sh discovery-job.sh -h
      
    4. The script in step 1 or 2 may take 5 to 10 minutes to complete. Log in to the OCI Console, open the navigation menu, and go to Observability and Management, then Stack Monitoring. The Stack Monitoring page shows that Stack Monitoring is enabled, along with a complete dashboard. After promotion, the resource type of the compute instance is Host.

    5. Check the status of the Promotion Job under Resource Discovery to verify that the resources were discovered successfully.

    6. On the Stack Monitoring dashboard, select the Resource block to see the list of compute instances and hosts for which you enabled monitoring.

    7. Select a host from the list to see a detailed view of the metrics and tables for that host.

      • Host information and metrics are displayed as charts and tables on the Resource Details page.

      • You can choose Filesystem Used (GBs) and Filesystem Utilization (%) to get more specific information about the storage on the host. The table view presents all the metrics in table format.

      • Once you select an instance, its storage is displayed both as a percentage and in GB. Each filesystem on the host is listed with its mount point in table format.

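
Each run of the scripts in Task 2 takes a single instance or compartment OCID, so an environment that spans several compartments needs one run per compartment. A minimal sketch of such a loop follows. It is not part of the tutorial's scripts: the function name run_for_each and the list file compartments.txt (one compartment OCID per line, with blank lines and lines starting with # ignored) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical batch wrapper; the function name and list file are illustrative.

# run_for_each LIST_FILE COMMAND [ARGS...]: runs "COMMAND ARGS... -c <ocid>"
# once for every non-empty, non-comment line of LIST_FILE.
run_for_each() {
    list_file=$1
    shift
    # The "|| [ -n ... ]" part also processes a last line without a newline.
    while IFS= read -r ocid || [ -n "$ocid" ]; do
        case $ocid in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the command cannot consume the rest of the list.
        "$@" -c "$ocid" </dev/null
    done < "$list_file"
}

# Usage, from the Monitoring-Agent-Scripts directory:
#   run_for_each compartments.txt sh agent-enable.sh
#   run_for_each compartments.txt sh discovery-job.sh
```

As in the manual flow, the discovery run should start only after the Management Agents report a running state.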

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.