11 Monitoring and Autoscaling Business Operations Center Cloud Native

Learn how to use external applications, such as Prometheus and Grafana, to monitor and autoscale Oracle Communications Business Operations Center in a cloud native environment.

About Monitoring and Autoscaling in Business Operations Center Cloud Native

You set up the monitoring of Business Operations Center and the autoscaling of its managed-server pods by using the following external applications:

  • WebLogic Monitoring Exporter: Use this Oracle web application to scrape runtime information from Business Operations Center cloud native and then expose the metric data in Prometheus format. It exposes WebLogic MBean metrics, such as memory usage and session count, that are required for monitoring and maintaining the Business Operations Center application.

  • Prometheus: Use this open-source toolkit to scrape Business Operations Center metric data from WebLogic Monitoring Exporter and then store it in a time-series database. It can also be used to scale up or scale down your Business Operations Center pods based on memory and CPU usage.

    You can use a standalone version of Prometheus or Prometheus Operator.

  • Grafana: Use this open-source tool to view, on a graphical dashboard, all Business Operations Center metric data that is stored in Prometheus.

Setting Up Monitoring and Autoscaling in Business Operations Center

To set up monitoring and autoscaling in Business Operations Center cloud native:

  1. Deploy Prometheus in one of the following ways:

    • Deploy a standalone version of Prometheus in your cloud native environment. See "Installation" in the Prometheus documentation.

    • Deploy Prometheus Operator. See "prometheus-operator" on the GitHub website.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  2. Install Grafana. See "Install Grafana" in the Grafana documentation.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  3. Configure WebLogic Monitoring Exporter to scrape metric data from Business Operations Center in your cloud native environment. See "Configuring WebLogic Monitoring Exporter to Scrape Metric Data".

  4. Configure the Prometheus webhook to autoscale the Business Operations Center pods in your cloud native environment. See "Configuring webhook to Enable Autoscaling".

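  5. Configure one of the following to collect metric data and send alerts:

    • A standalone version of Prometheus. See "Configuring Standalone Prometheus for Business Operations Center".

    • Prometheus Operator. See "Configuring Prometheus Operator for Business Operations Center".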

  6. Configure Grafana for displaying Business Operations Center metric data. See "Creating Grafana Dashboards for Business Operations Center".

Configuring WebLogic Monitoring Exporter to Scrape Metric Data

You configure WebLogic Monitoring Exporter to scrape metric data for Business Operations Center by enabling monitoring of the application and by specifying whether to use it with Prometheus or Prometheus Operator.

When monitoring is enabled, WebLogic Monitoring Exporter scrapes WebLogic Server MBean metrics such as server status, web application session metrics, servlet metrics, and JVM runtime metrics. See "WebLogic-Based Application Metrics" for the full list of metrics that are scraped. You can also configure WebLogic Monitoring Exporter to scrape additional WebLogic Server MBeans to meet your business requirements.

To configure WebLogic Monitoring Exporter to scrape metric data for Business Operations Center cloud native:

  1. Open your override-values.yaml file for oc-cn-helm-chart.

  2. Set the ocboc.boc.monitoring.isEnabled key to true.

  3. Set the ocboc.boc.monitoring.operator.isEnabled key to one of the following:

    • true if you are using Prometheus Operator.

    • false if you are using a standalone version of Prometheus. This is the default.

  4. Optionally, configure WebLogic Monitoring Exporter to scrape additional metrics for Business Operations Center. To do so, set the ocboc.boc.monitoring.queries key to the full array of WebLogic Server MBeans to monitor, in YAML structure. For the list of possible MBeans, see MBean Reference for Oracle WebLogic Server in the Oracle WebLogic Server documentation.

    Note:

    Set the queries key to the full list of MBeans to scrape, including the default MBeans. That is, if you want to add one new metric, you must copy the default list from the domain's YAML file, add the new metric to that list, and then copy the full list to the queries key.

  5. Set the other optional keys under ocboc.boc.monitoring as needed.

    For information about the other keys under ocboc.boc.monitoring, read the descriptions in the oc-cn-helm-charts/values.yaml file.

  6. Save and close the file.

  7. Run the helm upgrade command to update the BRM Helm release:

    helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

    where:

    • BrmReleaseName is the release name for oc-cn-helm-chart and is used to track this installation instance.

    • OverrideValuesFile is the file name and path to your override-values.yaml file.

    • BrmNameSpace is the namespace in which to create BRM Kubernetes objects for the BRM Helm chart.

    WebLogic Monitoring Exporter is started within the Business Operations Center WebLogic Server pod and begins scraping metric data for Business Operations Center.

    If you enabled Prometheus Operator, a ServiceMonitor is also deployed. The ServiceMonitor specifies how to monitor groups of services. Prometheus Operator automatically generates the scrape configuration based on this definition.
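
As an illustration, the monitoring keys described in the steps above might look as follows in override-values.yaml. This is a minimal sketch rather than a complete configuration. It assumes a standalone version of Prometheus, and the commented-out queries entry is a hypothetical addition written in the WebLogic Monitoring Exporter query format. It is not the default list shipped with the chart, and whether the chart expects exactly this structure under queries is an assumption.

```yaml
# Sketch of override-values.yaml entries for oc-cn-helm-chart (assumed values).
ocboc:
  boc:
    monitoring:
      # Start WebLogic Monitoring Exporter for Business Operations Center.
      isEnabled: true
      operator:
        # false = standalone Prometheus (default); true = Prometheus Operator.
        isEnabled: false
      # Optional. Setting queries replaces the default list, so first copy the
      # default MBean entries from the domain's YAML file and then append your
      # additions. The entry below is illustrative only.
      # queries:
      #   - threadPoolRuntime:
      #       prefix: wls_threadpool_
      #       key: name
```

The queries block is left commented out on purpose: enabling it with only the new entry would drop the default metrics.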

Configuring webhook to Enable Autoscaling

You can configure the webhook application to autoscale your Business Operations Center pods. When configured to do so, the webhook application waits for alerts from Prometheus Alertmanager. When it receives a specific alert status, the webhook application calls a script that performs the scaling action.

You can optionally configure the webhook application to monitor for additional alert statuses that trigger calls to your custom scripts.

To configure webhook to autoscale your Business Operations Center pods:

  1. Open your override-values.yaml file for oc-cn-helm-chart.

  2. Set the following keys to enable autoscaling:

    • webhook.isEnabled: Set this to true.

    • webhook.logPath: Set this to the path in which to write log files for the webhook application.

    • webhook.scripts.mountPath: Set this to the directory in which you will store any custom scripts to be run by the webhook application. The default is /u01/script.

    • webhook.wop.namespace: Set this to the namespace for WebLogic Kubernetes Operator. See "Installing WebLogic Kubernetes Operator" in BRM Cloud Native Deployment Guide.

    • webhook.wop.sa: Set this to the service account for the WebLogic Kubernetes Operator. The default is default.

    • webhook.wop.internalOperatorCert: Set this to the WebLogic Kubernetes Operator certificate. To retrieve the certificate for this key, run the following command:

      kubectl -n operator describe configmap

      where operator is the namespace for WebLogic Kubernetes Operator.

    For information about the other optional keys under the webhook section, read the descriptions in the oc-cn-helm-charts/values.yaml file.

  3. If you want the webhook application to monitor for additional alert statuses and call your custom scripts, do the following:

    1. Copy your custom scripts to the oc-cn-helm-chart/webhook_scripts directory.

    2. In your override-values.yaml file for oc-cn-helm-chart, set the webhook.jsonConfig key to include the additional alerts to monitor and the scripts that are triggered when they occur. Use the following format:

      jsonConfig: {"alertName":"value", "alertStatus":["value"], "args":["arg1","arg2"], "script":"path/customScript", "workDirectory":"path"}

      Table 11-1 lists the possible values for each parameter.

      Table 11-1 Webhook Alerts

      Alert Parameter Description

      alertName

      Set this to the name of the alert to monitor, such as clusterScaleUp.

      alertStatus

      Set this to the alert's status that triggers a call to your custom script. For example: firing.

      args

      Set this to the list of arguments to pass to your custom script. The arguments must be listed in the order in which they will appear in the script's command line.

      There are three types of arguments:

      • static: These arguments can be directly mapped while calling your script. For example: "operator" or "operator-sa".
      • custom labels: Use the format @@LABEL:key-name@@, where key-name is an alert label passed in the alert notification. For example, to include the "--domain_uid=boc-domain" argument, you would enter "--domain_uid=@@LABEL:domain_uid@@".
      • environment variables: Use the format @@ENV:env-name@@, where env-name is the environment variable that is looked up. For example, to include the "--wls_domain_namespace=oc-cn-brm" argument, you would enter "--wls_domain_namespace=@@ENV:NAMESPACE@@".

      script

      The name of the script to run along with its fully qualified path. For example: /u01/script/scalingAction.sh.

      workDirectory

      The script's current working directory. For example: /u01/oracle/app.

  4. Save and close your override-values.yaml file.

  5. Run the helm upgrade command to update your BRM Helm release:

    helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

The webhook application starts waiting for alerts from Prometheus Alertmanager.

Example: Configuring webhook to Autoscale Business Operations Center Pods

The following shows sample override-values.yaml entries for setting up the webhook application to perform autoscaling on your Business Operations Center pods:

webhook:
    isEnabled: true
    logPath: /u01/logs
    logLevel: INFO
    deployment:
        imageName: webhook
        imageTag: $BRM_VERSION
        imagePullPolicy: IfNotPresent
    scripts:
        mountPath: /u01/script
    wop:
        namespace: WME_Namespace
        sa: default
        internalOperatorCert: certificate
    jsonConfig: {"alertName":"clusterAlert", "alertStatus":["firing"], "args":["arg1","arg2"], "script":"/u01/script/customAction.sh", "workDirectory":"/u01/oracle/app"}
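
The three argument types described in Table 11-1 can be combined in a single jsonConfig entry. The following sketch is illustrative only: the alert name clusterAlert and the script customAction.sh come from the sample above, but the arguments that the script accepts, and their order, are assumptions.

```yaml
webhook:
    # Illustrative only: assumes customAction.sh accepts these arguments.
    #   "operator"                                  static argument, passed as is
    #   "--domain_uid=@@LABEL:domain_uid@@"         filled from the alert label domain_uid
    #   "--wls_domain_namespace=@@ENV:NAMESPACE@@"  filled from the NAMESPACE environment variable
    jsonConfig: {"alertName":"clusterAlert", "alertStatus":["firing"], "args":["operator", "--domain_uid=@@LABEL:domain_uid@@", "--wls_domain_namespace=@@ENV:NAMESPACE@@"], "script":"/u01/script/customAction.sh", "workDirectory":"/u01/oracle/app"}
```

For an alert that carries the label domain_uid=boc-domain, and with the NAMESPACE environment variable set to oc-cn-brm, the last two arguments would resolve to --domain_uid=boc-domain and --wls_domain_namespace=oc-cn-brm.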

Configuring Standalone Prometheus for Business Operations Center

To configure a standalone version of Prometheus for Business Operations Center cloud native:

  1. Open your override-values.yaml file for Prometheus.

  2. Configure Prometheus to collect your Business Operations Center metrics exposed by WebLogic Monitoring Exporter.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system:

    extraScrapeConfigs: |
        - job_name: 'wls-domain1'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
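          # The values of the two source labels are joined with ";" before the
          # regex is applied. Replace cluster-1 with your cluster name.
          - source_labels: [__meta_kubernetes_pod_label_weblogic_domainUID, __meta_kubernetes_pod_label_weblogic_clusterName]
            action: keep
            regex: boc-domain;cluster-1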
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod_name
          basic_auth:
            username: WebLogic_UserName
            password: WebLogic_Password

  3. Configure the alert rules in Prometheus.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    The clusterScaleUp rule scales up the number of Business Operations Center managed server pods when the number of servers stays below two for two minutes. The clusterScaleDown rule scales down the number of Business Operations Center managed server pods when the number of servers stays above three for two minutes. For examples of other expressions you can use, see "Sample Prometheus Alert Rules for Business Operations Center".

    serverFiles:
      alerts:
        groups:
          - name: node_rules
            rules:
              - alert: clusterScaleUp
                for: 2m
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="boc-domain"}) < 2
                labels:
                  domain_uid: boc-domain
                  severity: critical
                annotations:
                  description: 'Server count is less than 2'
                  summary: 'Some wls cluster is in warning state.'
              - alert: clusterScaleDown
                for: 2m
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="boc-domain"}) > 3
                labels:
                  domain_uid: boc-domain
                  severity: critical
                annotations:
                  description: 'Server count is greater than 3'
                  summary: 'Some wls cluster is in warning state.'

  4. Configure Alertmanager to send alerts to the webhook application.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    For the url key, use the following syntax: http://webhook.WLS_NameSpace.svc.cluster.local:8080/action, where WLS_NameSpace is the namespace for your WebLogic Server domain.

    alertmanagerFiles:
      alertmanager.yml:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          receiver: 'null'
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 5m
          routes:
          - match:
              alertname: clusterScaleUp
            receiver: 'web.hook'
          - match:
              alertname: clusterScaleDown
            receiver: 'web.hook'
        receivers:
        - name: 'web.hook'
          webhook_configs:
          - send_resolved: false
            url: 'http://webhook.oc-cn-brm.svc.cluster.local:8080/action'
        - name: 'null'

  5. Save and close your override-values.yaml file for Prometheus.

  6. Run the helm upgrade command to update your Prometheus Helm chart.

Configuring Prometheus Operator for Business Operations Center

To configure Prometheus Operator for Business Operations Center cloud native:

  1. Open your override-values.yaml file for Prometheus Operator.

  2. Configure the alert rules for Prometheus Operator.

    To do so, copy and paste the following additionalPrometheusRulesMap section into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    The clusterScaleUp rule scales up the number of Business Operations Center managed server pods when the number of servers stays below two for two minutes. The clusterScaleDown rule scales down the number of Business Operations Center managed server pods when the number of servers stays above three for two minutes. For examples of other expressions you can use, see "Sample Prometheus Alert Rules for Business Operations Center".

    ## Provide custom recording or alerting rules to be deployed into the cluster.
    ##
    additionalPrometheusRulesMap:
      - rule-name: Custom-rule
        groups:
          - name: custom-alert.rules
            rules:
              - alert: clusterScaleUp
                annotations:
                  message: WLS cluster has fewer than 2 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="boc-domain"}) < 2
                for: 2m
                labels:
                  domain_uid: boc-domain
                  severity: critical
              - alert: clusterScaleDown
                annotations:
                  message: WLS cluster has more than 3 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="boc-domain"}) > 3
                for: 2m
                labels:
                  domain_uid: boc-domain
                  severity: critical

  3. Configure Prometheus Operator to send alerts to the webhook application in WebLogic Monitoring Exporter.

    To do so, copy and paste the following alertmanager section into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    For the url key, use the following syntax: http://webhook.BrmNameSpace.svc.cluster.local:8080/action, where BrmNameSpace is the namespace for your BRM Kubernetes objects.

    alertmanager:
      config:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 5m
          receiver: 'null'
          routes:
          - match:
              alertname: clusterScaleUp
            receiver: 'web.hook'
          - match:
              alertname: clusterScaleDown
            receiver: 'web.hook'
        receivers:
        - name: 'null'
        - name: 'web.hook'
          webhook_configs:
          - send_resolved: false
            url: 'http://webhook.oc-cn-brm.svc.cluster.local:8080/action'

  4. Save and close your override-values.yaml file for Prometheus Operator.

  5. Run the helm upgrade command to update your Prometheus Operator Helm chart.

Creating Grafana Dashboards for Business Operations Center

Create a dashboard in Grafana for displaying your Business Operations Center metric data. You can alternatively use the sample dashboard JSON model that is included in the oc-cn-docker-files-12.0.0.x.0.tgz package.

Note:

For the sample dashboard to work properly, the datasource name for the WebLogic Domain must be Prometheus.

To use the sample dashboard, import the oc-cn-docker-files/samples/monitoring/ocboc-boc-dashboard.json dashboard file into Grafana. See "Export and Import" in the Grafana Dashboards documentation for more information.
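
If you provision Grafana data sources from a file, the naming requirement in the note above can be met with an entry like the following. This is a minimal sketch of a Grafana data source provisioning file. The url value is a hypothetical in-cluster address and must be replaced with the address of the Prometheus server in your environment.

```yaml
# Grafana data source provisioning file (sketch).
apiVersion: 1
datasources:
  # The sample dashboard expects this exact data source name.
  - name: Prometheus
    type: prometheus
    access: proxy
    # Hypothetical address; replace with your Prometheus server URL.
    url: http://prometheus-server.monitoring.svc.cluster.local
```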

Sample Prometheus Alert Rules for Business Operations Center

You can use custom expressions for your Prometheus alert rules when setting up autoscaling in Business Operations Center.

Sample Cluster Scale Up Expressions

To raise an alert when the average CPU usage across managed servers is greater than 70% for more than two minutes:

avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m]))*100 > 70

To raise an alert when the average memory usage across managed servers is greater than 70% for more than two minutes:

100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_domainUID="boc-domain",weblogic_clusterName=~".+",weblogic_serverName=~".+"}[2m])) > 70

To raise an alert when the CPU usage is greater than 70% and memory usage is greater than 70%:

avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) * 100 > 70 and on() 100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) > 70
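
To use one of these expressions, substitute it for the expr value of the clusterScaleUp rule. The following sketch assumes a standalone version of Prometheus and the serverFiles layout described in "Configuring Standalone Prometheus for Business Operations Center", and uses the CPU usage expression. The description text is illustrative.

```yaml
serverFiles:
  alerts:
    groups:
      - name: node_rules
        rules:
          # Keep the alert name unchanged; only the expression is swapped.
          - alert: clusterScaleUp
            for: 2m
            expr: avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) * 100 > 70
            labels:
              domain_uid: boc-domain
              severity: critical
            annotations:
              # Illustrative text, not taken from the product samples.
              description: 'Average CPU usage is greater than 70%'
              summary: 'Some wls cluster is in warning state.'
```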

Sample Cluster Scale Down Expressions

To raise an alert when the CPU usage is less than 40%, memory usage is less than 40%, and the number of managed servers is equal to 5:

avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) * 100 < 40 and on() 100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) < 40 and on() sum by(weblogic_domainUID)(up{weblogic_clusterName="cluster-1",weblogic_domainUID="boc-domain"}) == 5
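
A long expression such as this one can be easier to maintain when it is split across lines. The following sketch assumes a standalone version of Prometheus and the serverFiles layout described in "Configuring Standalone Prometheus for Business Operations Center". It places the expression in a YAML folded block scalar (>-), so the line breaks are folded into spaces when the file is parsed. The description text is illustrative.

```yaml
serverFiles:
  alerts:
    groups:
      - name: node_rules
        rules:
          # Keep the alert name unchanged; only the expression is swapped.
          - alert: clusterScaleDown
            for: 2m
            # Folded scalar: the lines below are joined with single spaces.
            expr: >-
              avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) * 100 < 40
              and on()
              100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="boc-domain",weblogic_serverName=~".+"}[2m])) < 40
              and on()
              sum by(weblogic_domainUID)(up{weblogic_clusterName="cluster-1",weblogic_domainUID="boc-domain"}) == 5
            labels:
              domain_uid: boc-domain
              severity: critical
            annotations:
              # Illustrative text, not taken from the product samples.
              description: 'CPU and memory usage are below 40% with 5 servers running'
              summary: 'Some wls cluster is in warning state.'
```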