13 Monitoring BRM REST Services Manager Cloud Native

Learn how to use external applications, such as Prometheus, Grafana, and Helidon MP, to monitor BRM REST Services Manager in a cloud native environment.

Topics in this document:

  • About Monitoring BRM REST Services Manager Cloud Native

  • Setting Up Monitoring for BRM REST Services Manager

  • Creating Grafana Dashboards for BRM REST Services Manager

  • Modifying Prometheus and Grafana Alert Rules After Deployment

  • About REST Endpoints for Monitoring BRM REST Services Manager

About Monitoring BRM REST Services Manager Cloud Native

You set up monitoring for BRM REST Services Manager by using the following applications:

  • Helidon MP: Use this Eclipse MicroProfile implementation to run health checks and collect metrics. Helidon MP is configured and ready to use in the BRM REST Services Manager deployment package.

    For information about using the health check and metrics endpoints, see "About REST Endpoints for Monitoring BRM REST Services Manager". For more information about Helidon MP, see "Helidon MP Introduction" in the Helidon MP documentation.

  • Prometheus: Use this open-source toolkit to scrape metric data and then store it in a time-series database. Use Prometheus Operator for BRM REST Services Manager.

    See "prometheus-operator" on GitHub.

  • Grafana: Use this open-source tool to view all BRM REST Services Manager metric data stored in Prometheus on graphical dashboards.

    See "Grafana Support for Prometheus" in the Prometheus documentation for information about using Grafana and Prometheus together.

Setting Up Monitoring for BRM REST Services Manager

To set up monitoring for BRM REST Services Manager cloud native:

  1. Install Prometheus Operator:

    1. Ensure that the BRM cloud native prerequisite software, such as the Kubernetes cluster and Helm, is running, and that Git is installed on the node that runs the Helm chart.

    2. Create a namespace for monitoring. For example:

      kubectl create namespace monitoring
    3. Set the HTTP_PROXY and HTTPS_PROXY environment variables on all cluster nodes with the following commands:

      export HTTP_PROXY="proxy_host"
      export HTTPS_PROXY=$HTTP_PROXY

      where proxy_host is the hostname or IP address of your proxy server.

    4. Download the Prometheus Operator Helm chart with the following commands:

      helm repo add stable https://charts.helm.sh/stable
      helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
      helm repo update
      helm fetch prometheus-community/kube-prometheus-stack
    5. Unset the HTTP_PROXY and HTTPS_PROXY environment variables with the following commands:

      unset HTTP_PROXY
      unset HTTPS_PROXY
    6. Create an override-values.yaml file for Prometheus Operator and configure optional values to:

      • Add alert rules, such as the two rules in the sample below.

      • Make Prometheus, Alert Manager, and Grafana accessible outside the cluster and host machine by changing the service type to LoadBalancer.

      • Enable Grafana to send email alerts.

      The following sample override-values.yaml shows alert rules and configuration options.

      additionalPrometheusRulesMap:
        BRM-RSM-rule:
          groups:
          - name: brm-rsm-alert-rules
            rules:
            - alert: CPU_UsageWarning
              annotations:
                message: CPU has reached 80% utilization
              expr: avg without(cpu) (rate(node_cpu_seconds_total{job="node-exporter", instance="instance", mode!="idle"}[5m])) > 0.8
              for: 5m
              labels:
                severity: critical
            - alert: Memory_UsageWarning
              annotations:
                message: Memory has reached 80% utilization
              expr: node_memory_MemTotal_bytes{job="node-exporter", instance="instance"} - node_memory_MemFree_bytes{job="node-exporter", instance="instance"} - node_memory_Cached_bytes{job="node-exporter",instance="instance"} - node_memory_Buffers_bytes{job="node-exporter", instance="instance"} > 22322927872
              for: 5m
              labels:
                severity: critical
      alertmanager:
        service:
          type: LoadBalancer
      grafana:
        service:
          type: LoadBalancer
        grafana.ini:
          smtp:
            enabled: true
            host: email_host
            user: "email_address"
            password: "password"
            skip_verify: true
      prometheus:
        service:
          type: LoadBalancer

      For details about the default Prometheus Operator values to base your override-values.yaml on, see "prometheus-operator/values.yaml" on the GitHub website.

    7. Save and close the override-values.yaml file.

    8. Install Prometheus Operator with the following command:

      helm install prometheus kube-prometheus-stack --values override-values.yaml -n monitoringNamespace

      where monitoringNamespace is the namespace you created for monitoring.

    9. Verify the installation with the following command:

      kubectl get all -n monitoringNamespace

      Pods and services for the following components should be listed:

      • Alert Manager

      • Grafana

      • Prometheus Operator

      • Prometheus

      • Node Exporter

      • kube-state-metrics

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  2. Configure the BRM REST Services Manager ServiceMonitor, which specifies how to monitor groups of services. Prometheus Operator automatically generates the scrape configuration from this definition.

    1. Ensure that BRM REST Services Manager is running.

    2. Create an rsm-sm.yaml file with the following content:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        annotations:
          meta.helm.sh/release-name: releaseName
          meta.helm.sh/release-namespace: rsm_namespace
        labels:
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: brm-rest-services-manager
          app.kubernetes.io/version: rsm_version
          chart: brmrestservicesmanager-12.0.0
          heritage: Helm
          release: prometheus
        name: brm-rest-services-manager-monitoring
        namespace: rsm_namespace
      spec:
        endpoints:
        - path: /metrics
          port: api-http-prt
        namespaceSelector:
          matchNames:
          - rsm_namespace
        selector:
          matchLabels:
            app.kubernetes.io/name: brm-rest-services-manager
      where:
      • releaseName is the name given to the BRM REST Services Manager deployment during Helm installation

      • rsm_namespace is the namespace where BRM REST Services Manager is deployed

      • rsm_version is the version of BRM REST Services Manager, for example, 12.0.0.4.0

    3. Save and close the file.

    4. Apply the changes with the following command:

      kubectl apply -f rsm-sm.yaml -n rsm_namespace
    5. Verify the configuration in the Prometheus user interface. From the Status menu, select Targets, and confirm that the /metrics endpoint appears. (If Prometheus is not exposed outside the cluster, see the port-forwarding sketch after this procedure.)

  3. Configure Grafana for displaying BRM REST Services Manager metric data. See "Creating Grafana Dashboards for BRM REST Services Manager".

  4. Access the health and metrics REST endpoints. See "About REST Endpoints for Monitoring BRM REST Services Manager".
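
If you did not change the Prometheus and Grafana service types to LoadBalancer, you can still reach their user interfaces and verify the setup by forwarding local ports. The following commands are a minimal sketch: the service names are the defaults that kube-prometheus-stack typically creates for a Helm release named prometheus, so confirm the actual names and ports in your cluster with kubectl get svc -n monitoringNamespace before using them.

# Confirm that the BRM REST Services Manager ServiceMonitor exists.
kubectl get servicemonitor -n rsm_namespace

# Forward local ports to the Prometheus and Grafana services (default names assumed).
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoringNamespace
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoringNamespace

You can then open Prometheus at http://localhost:9090 and Grafana at http://localhost:3000.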

Creating Grafana Dashboards for BRM REST Services Manager

Create a dashboard in Grafana for displaying your BRM REST Services Manager metric data. You can alternatively use the sample dashboard JSON model that is included in the oc-cn-docker-files-12.0.0.x.0.tgz package.

To use the sample dashboard:

  1. Open the oc-cn-docker-files/samples/monitoring/ocrsm-rsm-dashboard.json file in a text editor.

  2. Search for instance=\" and, in all occurrences, replace the default host and port with the host where your instance of Prometheus Operator is running and your prometheus-node-exporter port. (A sed sketch for replacing all occurrences in one pass follows this procedure.)

    For example, for the node_memory_MemFree_bytes expression, replace Prometheus_Operator_host and Prometheus_Node_Exporter_Port:

    {
       "exemplar": true,
       "expr": "node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"Prometheus_Operator_host:Prometheus_Node_Exporter_Port\"}",
       "hide": false,
       "interval": "",
       "legendFormat": "Free",
       "refId": "D"
    }
  3. Save and close the file.

  4. In Grafana, import the edited oc-cn-docker-files/samples/monitoring/ocrsm-rsm-dashboard.json dashboard file. See "Export and Import" in the Grafana Dashboards documentation for more information.
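
Rather than editing each occurrence by hand, you can replace the host and port in one pass. The following sed command is a sketch only: it assumes GNU sed and that your copy of the file contains the literal placeholder shown in the example above. If the file instead contains a concrete default host and port, use that string as the search pattern, and replace myhost.example.com:9100 (a hypothetical host with the default node-exporter port) with your own values.

# Replace every occurrence of the placeholder host:port in the dashboard JSON.
sed -i 's/Prometheus_Operator_host:Prometheus_Node_Exporter_Port/myhost.example.com:9100/g' \
    oc-cn-docker-files/samples/monitoring/ocrsm-rsm-dashboard.json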

Modifying Prometheus and Grafana Alert Rules After Deployment

After deploying Prometheus Operator, you can add or edit alert rules in Prometheus, or make changes in the Grafana user interface.

You have the following options for editing or adding Prometheus alert rules:

  • Edit the override-values.yaml file and upgrade the Helm release, as shown in the sketch after this list.

  • If you added rules in override-values.yaml before installing Prometheus Operator, use the following command to edit the rules file:

    kubectl edit prometheusrule kube-prometheus-stack-0 -n monitoringNamespace
  • If you didn't add any rules in override-values.yaml, use the following command to edit the rules file:

    kubectl edit prometheusrule prometheus-kube-prometheus-alertmanager -n monitoringNamespace
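
For the first option, the upgrade is a standard Helm operation. The following commands are a minimal sketch, using the same release name, chart reference, and namespace as in the installation step; adjust them if your values differ.

# Re-apply the edited override-values.yaml to the existing release.
helm upgrade prometheus kube-prometheus-stack --values override-values.yaml -n monitoringNamespace

# Confirm that the new or changed rules are present.
kubectl get prometheusrules -n monitoringNamespace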

You can also configure alert rules and add or remove email recipients in the Grafana user interface. See "Legacy Grafana Alerts" in the Grafana documentation for more information.

About REST Endpoints for Monitoring BRM REST Services Manager

You can use REST endpoints to monitor metrics and run a health check on BRM REST Services Manager.

Use a browser to send HTTP or HTTPS requests to the endpoints listed in Table 13-1, where hostname and port are the host name and port of your BRM REST Services Manager server.

Table 13-1 BRM REST Services Manager Monitoring Endpoints

  • Health
    Description: Returns details for both the health/live and health/ready endpoints.
    Endpoint: https://hostname:port/health

  • Liveness
    Description: Confirms that the application can run in the environment. Checks disk space, heap memory, and deadlocks.
    Endpoint: https://hostname:port/health/live

  • Readiness
    Description: Confirms that the application is ready to perform work.
    Endpoint: https://hostname:port/health/ready

  • Metrics
    Description: Returns standard Helidon MP monitoring metrics for BRM REST Services Manager.
    Endpoint: https://hostname:port/metrics
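
For example, you can exercise these endpoints from the command line. The following curl commands are a sketch: replace hostname and port with your server's values, and note that the -k option skips TLS certificate verification, which may not be appropriate outside a test environment.

# Combined health check (liveness and readiness)
curl -k https://hostname:port/health

# Liveness and readiness checks individually
curl -k https://hostname:port/health/live
curl -k https://hostname:port/health/ready

# Helidon MP metrics in Prometheus text format
curl -k https://hostname:port/metrics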

Sample Response for the Health Endpoint

The following example shows a response for the health endpoint, which includes both liveness and readiness details:

{
    "outcome": "UP",
    "status": "UP",
    "checks": [
        {
            "name": "deadlock",
            "state": "UP",
            "status": "UP"
        },
        {
            "name": "diskSpace",
            "state": "UP",
            "status": "UP",
            "data": {
                "free": "144.85 GB",
                "freeBytes": 155532308480,
                "percentFree": "62.71%",
                "total": "231.00 GB",
                "totalBytes": 248031531008
            }
        },
        {
            "name": "heapMemory",
            "state": "UP",
            "status": "UP",
            "data": {
                "free": "225.08 MB",
                "freeBytes": 236014824,
                "max": "3.48 GB",
                "maxBytes": 3739746304,
                "percentFree": "97.37%",
                "total": "319.00 MB",
                "totalBytes": 334495744
            }
        }
    ]
}
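
Because the liveness and readiness endpoints follow standard Helidon MP health conventions, they can also back Kubernetes probes. The following snippet is a hypothetical sketch only: the port, scheme, and timing values are placeholders and are not taken from the BRM REST Services Manager Helm chart, so adapt them to your deployment.

# Hypothetical probe configuration for a BRM REST Services Manager container.
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080        # placeholder; use your container's actual port
    scheme: HTTP      # or HTTPS, depending on your configuration
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080        # placeholder; use your container's actual port
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10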

Sample Response for the Metrics Endpoint

The response for the metrics endpoint contains the standard Helidon application and vendor metrics. The following example shows some of the metrics in the response:

# TYPE base_classloader_loadedClasses_count gauge
# HELP base_classloader_loadedClasses_count Displays the number of classes that are currently loaded in the Java virtual machine.
base_classloader_loadedClasses_count 9095
# TYPE base_classloader_loadedClasses_total counter
# HELP base_classloader_loadedClasses_total Displays the total number of classes that have been loaded since the Java virtual machine has started execution.
base_classloader_loadedClasses_total 9097
...
# TYPE base_memory_usedHeap_bytes gauge
# HELP base_memory_usedHeap_bytes Displays the amount of used heap memory in bytes.
base_memory_usedHeap_bytes 138109824
# TYPE base_thread_count gauge
# HELP base_thread_count Displays the current number of live threads including both daemon and nondaemon threads
base_thread_count 20
...
# TYPE vendor_requests_count_total counter
# HELP vendor_requests_count_total Each request (regardless of HTTP method) will increase this counter
vendor_requests_count_total 4
# TYPE vendor_requests_meter_total counter
# HELP vendor_requests_meter_total Each request will mark the meter to see overall throughput
vendor_requests_meter_total 4
# TYPE vendor_requests_meter_rate_per_second gauge
vendor_requests_meter_rate_per_second 0.008296727017772145
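
After Prometheus scrapes the /metrics endpoint, you can query these metrics directly or use them in Grafana panels. The following command is a sketch that queries request throughput through the Prometheus HTTP API; prometheus_host is a placeholder for the host (or port-forwarded address) of your Prometheus service.

# Approximate requests per second over the last five minutes.
curl 'http://prometheus_host:9090/api/v1/query' \
    --data-urlencode 'query=rate(vendor_requests_count_total[5m])'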

For details about all of the metrics, and for more information about Helidon monitoring, see the Helidon MP documentation.