12 Monitoring and Autoscaling Billing Care Cloud Native

Learn how to use external applications, such as Prometheus and Grafana, to monitor and autoscale Oracle Communications Billing Care in a cloud native environment.

About Monitoring and Autoscaling in Billing Care Cloud Native

You set up the monitoring of Billing Care and the Billing Care REST API and the autoscaling of their managed-server pods by using the following external applications:

  • WebLogic Monitoring Exporter: Use this Oracle web application to scrape runtime information from Billing Care and the Billing Care REST API and expose the metric data in Prometheus format. It exposes WebLogic MBean metrics, such as memory usage and session count, that you need to monitor and maintain the Billing Care and Billing Care REST API applications.

  • Prometheus: Use this open-source toolkit to scrape metric data from WebLogic Monitoring Exporter and store it in a time-series database. Its alert rules can also trigger the scaling up or scaling down of your Billing Care and Billing Care REST API managed server pods based on metrics such as memory and CPU usage.

    You can use a standalone version of Prometheus or Prometheus Operator.

  • Grafana: Use this open-source tool to view all Billing Care and Billing Care REST API metric data stored in Prometheus on a graphical dashboard.

Setting Up Monitoring and Autoscaling in Billing Care and Billing Care REST API

To set up the monitoring and autoscaling of Billing Care and the Billing Care REST API in a cloud native environment:

  1. Deploy Prometheus in one of the following ways:

    • Deploy a standalone version of Prometheus in your cloud native environment. See "Installation" in the Prometheus documentation.

    • Deploy Prometheus Operator. See "prometheus-operator" on the GitHub website.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  2. Install Grafana. See "Install Grafana" in the Grafana documentation.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  3. Configure WebLogic Monitoring Exporter to scrape metric data from Billing Care and the Billing Care REST API in your cloud native environment. See "Configuring WebLogic Monitoring Exporter to Scrape Metric Data".

  4. Configure the webhook application to enable the autoscaling of Billing Care and Billing Care REST API pods in your cloud native environment. See "Configuring Webhook to Enable Autoscaling".

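  5. Configure one of the following to collect metric data and send alerts:

    • Standalone Prometheus. See "Configuring Standalone Prometheus for Billing Care".

    • Prometheus Operator. See "Configuring Prometheus Operator for Billing Care".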

  6. Configure Grafana to display Billing Care and Billing Care REST API metric data. See "Creating Grafana Dashboards for Billing Care and Billing Care REST API".

Configuring WebLogic Monitoring Exporter to Scrape Metric Data

You configure WebLogic Monitoring Exporter to scrape metric data for Billing Care and the Billing Care REST API by enabling monitoring in each application and by specifying whether to use each application with Prometheus or Prometheus Operator.

When monitoring is enabled, WebLogic Monitoring Exporter scrapes WebLogic Server MBean metrics such as server status, web application session metrics, servlet metrics, and JVM runtime metrics. See "WebLogic-Based Application Metrics" for the full list of metrics that are scraped. You can also configure WebLogic Monitoring Exporter to scrape additional WebLogic Server MBeans to meet your business requirements.

To configure WebLogic Monitoring Exporter to scrape metric data for Billing Care and the Billing Care REST API in a cloud native environment:

  1. Open your override-values.yaml file for oc-cn-helm-chart.

  2. Configure monitoring for Billing Care cloud native:

    • Set the ocbc.bc.monitoring.isEnabled key to true.

    • Set the ocbc.bc.monitoring.operator.isEnabled key to true if you are using Prometheus Operator, or false if you are using a standalone version of Prometheus. The default is false.

  3. Configure monitoring for the Billing Care REST API:

    • Set the ocbc.bcws.monitoring.isEnabled key to true.

    • Set the ocbc.bcws.monitoring.operator.isEnabled key to true if you are using Prometheus Operator, or false if you are using a standalone version of Prometheus. The default is false.

  4. Optionally, configure WebLogic Monitoring Exporter to scrape additional metrics. To do so, set the following keys to the full array of WebLogic Server MBeans to monitor, in YAML format. For the list of possible MBeans, see "MBean Reference for Oracle WebLogic Server" in the Oracle WebLogic Server documentation.

    • For Billing Care: ocbc.bc.monitoring.queries

    • For the Billing Care REST API: ocbc.bcws.monitoring.queries

    Note:

    Set the queries key to the full list of MBeans to scrape, including the default MBeans. That is, if you want to add one new metric, you must copy the default list from the domain's YAML file, add the new metric to that list, and then copy the full list to the queries key.

  5. Set the other optional monitoring keys as needed.

    For information about the other keys, read the descriptions in the oc-cn-helm-chart/values.yaml file.

  6. Save and close the file.

  7. Run the helm upgrade command to update the BRM Helm release:

    helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

    where:

    • BrmReleaseName is the release name for oc-cn-helm-chart and is used to track this installation instance.

    • OverrideValuesFile is the file name and path to your override-values.yaml file.

    • BrmNameSpace is the namespace in which to create BRM Kubernetes objects for the BRM Helm chart.

    WebLogic Monitoring Exporter is started within the Billing Care and Billing Care REST API WebLogic Server pods and begins scraping metric data for Billing Care and the Billing Care REST API.

    If you enabled Prometheus Operator, a ServiceMonitor is also deployed. The ServiceMonitor specifies how to monitor groups of services. Prometheus Operator automatically generates the scrape configuration based on this definition.
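The dotted key names in the steps above correspond to nested YAML mappings in override-values.yaml. As a minimal sketch (the helper name expand_dotted_keys is hypothetical, and the values assume that you use Prometheus Operator), the following Python expands the dotted keys into the nested structure and prints it as JSON, which YAML parsers also accept:

```python
import json


def expand_dotted_keys(flat: dict) -> dict:
    """Turn {"a.b.c": value} into {"a": {"b": {"c": value}}}."""
    tree: dict = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = tree
        for part in parents:
            # Reuse an existing branch or create a new, empty one.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# Assumption: this deployment uses Prometheus Operator.
overrides = expand_dotted_keys({
    "ocbc.bc.monitoring.isEnabled": True,
    "ocbc.bc.monitoring.operator.isEnabled": True,
    "ocbc.bcws.monitoring.isEnabled": True,
    "ocbc.bcws.monitoring.operator.isEnabled": True,
})

print(json.dumps(overrides, indent=2))
```

If your file already has an ocbc section, merge these keys into it rather than adding a second ocbc key.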

Configuring Webhook to Enable Autoscaling

You can configure the webhook application to autoscale your Billing Care and Billing Care REST API pods. When configured to do so, the webhook application waits for alerts from Prometheus Alertmanager. When it receives a specific alert status, the webhook application calls a script that performs the scaling action.

You can optionally configure the webhook application to monitor for additional alert statuses that trigger calls to your custom scripts.

To configure the webhook application to autoscale your Billing Care and Billing Care REST API pods:

  1. Open your override-values.yaml file for oc-cn-helm-chart.

  2. Set the following keys to enable autoscaling:

    • webhook.isEnabled: Set this to true.

    • webhook.logPath: Set this to the path in which to write log files for the webhook application.

    • webhook.scripts.mountPath: Set this to the directory in which you will store any custom scripts to be run by the webhook application. The default is /u01/script.

    • webhook.wop.namespace: Set this to the namespace for WebLogic Kubernetes Operator. See "Installing WebLogic Kubernetes Operator" in BRM Cloud Native Deployment Guide.

    • webhook.wop.sa: Set this to the service account for the WebLogic Kubernetes Operator. The default is default.

    • webhook.wop.internalOperatorCert: Set this to the WebLogic Kubernetes Operator certificate. To retrieve the certificate for this key, run the following command:

      kubectl -n operator describe configmap

      where operator is the namespace for WebLogic Kubernetes Operator.

    For information about the other optional keys under the webhook section, read the descriptions in the oc-cn-helm-chart/values.yaml file.

  3. If you want the webhook application to monitor for additional alert statuses and call your custom scripts, do the following:

    1. Copy your custom scripts to the oc-cn-helm-chart/webhook_scripts directory.

      Note:

      You can configure the mount path for your custom scripts by using the webhook.scripts.mountPath key.

    2. In your override-values.yaml file for oc-cn-helm-chart, set the webhook.jsonConfig key to include the additional alerts to monitor and the scripts that are triggered when they occur. Use the following format:

      jsonConfig: {"alertName":"value", "alertStatus":["value"], "args":["arg1","arg2"], "script":"path/customScript", "workDirectory":"path"}

      Table 12-1 lists the possible values for each parameter.

      Table 12-1 Webhook Alerts

      Alert Parameter   Description

      alertName         Set this to the name of the alert to monitor, such as clusterScaleUp.

      alertStatus       Set this to the alert status that triggers a call to your custom script. For example: firing.

      args              Set this to the list of arguments to pass to your custom script. List the arguments in the order in which they appear on the script's command line.

                        There are three types of arguments:

                        • static: These arguments are passed to your script as is. For example: "operator" or "operator-sa".
                        • custom labels: Use the format @@LABEL:key-name@@, where key-name is an alert label passed in the alert notification. For example, to pass the "--domain_uid=bc-domain" argument, enter "--domain_uid=@@LABEL:domain_uid@@".
                        • environment variables: Use the format @@ENV:env-name@@, where env-name is the environment variable that is looked up. For example, to pass the "--wls_domain_namespace=oc-cn-brm" argument, enter "--wls_domain_namespace=@@ENV:NAMESPACE@@".

      script            Set this to the name of the script to run, including its fully qualified path. For example: /u01/script/scalingAction.sh.

      workDirectory     Set this to the script's working directory. For example: /u01/oracle/app.

  4. Save and close your override-values.yaml file.

  5. Run the helm upgrade command to update your BRM Helm release:

    helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

The webhook application starts waiting for alerts from Prometheus Alertmanager.
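To illustrate how the three argument types in Table 12-1 are expanded, the following Python sketch substitutes @@LABEL:...@@ placeholders from the alert's labels and @@ENV:...@@ placeholders from environment variables. It is a hypothetical model, not the webhook application's code; the function name resolve_args and the choice to replace an unknown placeholder with an empty string are assumptions.

```python
import os
import re

# Matches @@LABEL:key-name@@ and @@ENV:env-name@@ placeholders.
PLACEHOLDER = re.compile(r"@@(LABEL|ENV):([^@]+)@@")


def resolve_args(args, alert_labels, env=None):
    """Hypothetical helper: expand placeholders in webhook script arguments."""
    env = os.environ if env is None else env

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        source = alert_labels if kind == "LABEL" else env
        # Assumption: an unknown placeholder becomes an empty string.
        return source.get(name, "")

    # Static arguments contain no placeholders and pass through unchanged;
    # the argument order is preserved.
    return [PLACEHOLDER.sub(substitute, arg) for arg in args]


args = ["operator",
        "--domain_uid=@@LABEL:domain_uid@@",
        "--wls_domain_namespace=@@ENV:NAMESPACE@@"]
print(resolve_args(args, {"domain_uid": "bc-domain"}, {"NAMESPACE": "oc-cn-brm"}))
# ['operator', '--domain_uid=bc-domain', '--wls_domain_namespace=oc-cn-brm']
```

Because the resolved list keeps the configured order, the order in which you list args is the order in which your script receives them.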

Example: Configuring Webhook to Autoscale Billing Care Pods

The following shows sample override-values.yaml entries for setting up the webhook application to perform autoscaling on your Billing Care and Billing Care REST API pods:

webhook:
    isEnabled: true
    logPath: /u01/logs
    logLevel: INFO
    deployment:
        imageName: webhook
        imageTag: $BRM_VERSION
        imagePullPolicy: IfNotPresent
    scripts:
        mountPath: /u01/script
    wop:
        namespace: WebLogicKubernetesOperator_Namespace
        sa: default
        internalOperatorCert: certificate
    jsonConfig: {"alertName":"clusterAlert", "alertStatus":["firing"], "args":["arg1","arg2"], "script":"/u01/script/customAction.sh", "workDirectory":"/u01/oracle/app"}
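A malformed jsonConfig value is easy to miss, so you may want to check it before running helm upgrade. The following Python sketch parses the value from the example above and checks the parameters listed in Table 12-1. The helper name check_json_config and the decision to treat all five parameters as required are assumptions made for illustration.

```python
import json
import posixpath

# Parameter names from Table 12-1 and the JSON type each one should have.
# Treating every parameter as required is an assumption of this sketch.
EXPECTED_TYPES = {
    "alertName": str,
    "alertStatus": list,
    "args": list,
    "script": str,
    "workDirectory": str,
}


def check_json_config(text: str) -> list:
    """Return a list of problems found in a jsonConfig value (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["jsonConfig must be a JSON object"]
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in config:
            problems.append(f"missing {field}")
        elif not isinstance(config[field], expected):
            problems.append(f"{field} should be a {expected.__name__}")
    # Table 12-1 asks for the script's fully qualified path.
    script = config.get("script")
    if isinstance(script, str) and not posixpath.isabs(script):
        problems.append("script should be an absolute path")
    return problems


sample = ('{"alertName":"clusterAlert", "alertStatus":["firing"], '
          '"args":["arg1","arg2"], "script":"/u01/script/customAction.sh", '
          '"workDirectory":"/u01/oracle/app"}')
print(check_json_config(sample))  # []
```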

Configuring Standalone Prometheus for Billing Care

To configure a standalone version of Prometheus for Billing Care and the Billing Care REST API:

  1. Open your override-values.yaml file for Prometheus.

  2. Configure Prometheus to scrape the required metrics exposed by WebLogic Monitoring Exporter.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system:

    extraScrapeConfigs: |
      - job_name: 'wls-domain1'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # The source label values are joined with ";" and the regex must match
          # the whole joined value: keep only clustered billingcare-domain pods.
          - source_labels: [__meta_kubernetes_pod_label_weblogic_domainUID, __meta_kubernetes_pod_label_weblogic_clusterName]
            action: keep
            regex: billingcare-domain;.+
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod_name
        # basic_auth belongs to the job, not to a relabel rule.
        basic_auth:
          username: username
          password: password

      - job_name: 'wls-domain2'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Keep only clustered bcws-domain pods (see the note in wls-domain1).
          - source_labels: [__meta_kubernetes_pod_label_weblogic_domainUID, __meta_kubernetes_pod_label_weblogic_clusterName]
            action: keep
            regex: bcws-domain;.+
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod_name
        basic_auth:
          username: username
          password: password

    where username and password are your WebLogic Server user name and password.

  3. Configure the alert rules in Prometheus.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    The clusterScaleUp rule raises an alert, which triggers a scale-up of the Billing Care and Billing Care REST API managed server pods, when fewer than two servers are running for two minutes. The clusterScaleDown rule raises an alert, which triggers a scale-down of those pods, when more than three servers are running for two minutes. For examples of other expressions you can use, see "Sample Prometheus Alert Rules for Billing Care and Billing Care REST API".

    serverFiles:  
      alerts:    
        groups:      
          - name: node_rules        
            rules:          
              - alert: clusterScaleUp            
                for: 2m            
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="billingcare-domain"}) < 2            
                labels:              
                  domain_uid: billingcare-domain              
                  severity: critical            
                annotations:              
                  description: 'Server count is less than 2'              
                  summary: 'Some wls cluster is in warning state.'          
              - alert: clusterScaleDown            
                for: 2m            
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="billingcare-domain"}) > 3            
                labels:              
                  domain_uid: billingcare-domain              
                  severity: critical            
                annotations:              
                  description: 'Server count is greater than 3'              
                  summary: 'Some wls cluster is in warning state.'    
           
              - alert: clusterScaleUp            
                for: 2m            
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="bcws-domain"}) < 2            
                labels:              
                  domain_uid: bcws-domain              
                  severity: critical            
                annotations:              
                  description: 'Server count is less than 2'              
                  summary: 'Some wls cluster is in warning state.'          
              - alert: clusterScaleDown            
                for: 2m            
                expr: sum by(weblogic_domainUID, weblogic_clusterName) (up{weblogic_domainUID="bcws-domain"}) > 3            
                labels:              
                  domain_uid: bcws-domain              
                  severity: critical            
                annotations:              
                  description: 'Server count is greater than 3'              
                  summary: 'Some wls cluster is in warning state.'

  4. Configure Prometheus Alertmanager to send alerts to the webhook application.

    To do so, copy and paste the following into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    For the url key, use the following syntax: http://webhook.BrmNameSpace.svc.cluster.local:8080/action, where BrmNameSpace is the namespace for your BRM Kubernetes objects.

    alertmanagerFiles:
      alertmanager.yml:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          receiver: 'null'
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 5m
          routes:
            - match:
                alertname: clusterScaleUp
              receiver: 'web.hook'
            - match:
                alertname: clusterScaleDown
              receiver: 'web.hook'
        receivers:
          - name: 'web.hook'
            webhook_configs:
              - send_resolved: false
                url: 'http://webhook.oc-cn-brm.svc.cluster.local:8080/action'
          - name: 'null'

  5. Save and close your override-values.yaml file for Prometheus.

  6. Run the helm upgrade command to update your Prometheus Helm chart.
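The __address__ relabel rule in the scrape configuration rewrites each scrape target so that Prometheus uses the port from the pod's prometheus.io/port annotation. The following Python sketch mimics that single rule under two documented Prometheus behaviors: source label values are joined with ";" by default, and the regex must match the whole joined string. The function name relabel_address and the sample addresses are hypothetical.

```python
import re

# Regex and replacement copied from the __address__ relabel rule.
ADDRESS_REGEX = r"([^:]+)(?::\d+)?;(\d+)"
REPLACEMENT = r"\1:\2"  # Prometheus writes this as $1:$2


def relabel_address(address: str, port_annotation: str, separator: str = ";") -> str:
    """Simplified model of a Prometheus 'replace' relabel step on __address__."""
    # Prometheus joins the source label values with the separator ...
    joined = separator.join([address, port_annotation])
    # ... and anchors the regex at both ends, which re.fullmatch reproduces.
    match = re.fullmatch(ADDRESS_REGEX, joined)
    if match is None:
        # No match: the target label keeps its current value.
        return address
    return match.expand(REPLACEMENT)


print(relabel_address("10.1.2.3:8001", "7001"))  # 10.1.2.3:7001
print(relabel_address("10.1.2.3", "7001"))       # 10.1.2.3:7001
print(relabel_address("10.1.2.3:8001", ""))      # 10.1.2.3:8001
```

The last call shows that a pod without a port annotation keeps its discovered address.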

Configuring Prometheus Operator for Billing Care

To configure Prometheus Operator for Billing Care cloud native:

  1. Open your override-values.yaml file for Prometheus Operator.

  2. Configure the alert rules for Prometheus Operator.

    To do so, copy and paste the following additionalPrometheusRulesMap section into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    The clusterScaleUp rule raises an alert, which triggers a scale-up of the Billing Care or Billing Care REST API managed server pods, when fewer than two servers are running for two minutes. The clusterScaleDown rule raises an alert, which triggers a scale-down of those pods, when more than three servers are running for two minutes. For examples of other expressions you can use, see "Sample Prometheus Alert Rules for Billing Care and Billing Care REST API".

    ## Provide custom recording or alerting rules to be deployed into the cluster.
    ##
    additionalPrometheusRulesMap:
      custom-rule:
        groups:
          - name: custom-alert.rules
            rules:
              - alert: clusterScaleUp
                annotations:
                  message: WLS cluster has fewer than 2 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="billingcare-domain"}) < 2
                for: 2m
                labels:
                  domain_uid: billingcare-domain
                  severity: critical
              - alert: clusterScaleDown
                annotations:
                  message: WLS cluster has more than 3 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="billingcare-domain"}) > 3
                for: 2m
                labels:
                  domain_uid: billingcare-domain
                  severity: critical
              - alert: clusterScaleUp
                annotations:
                  message: WLS cluster has fewer than 2 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="bcws-domain"}) < 2
                for: 2m
                labels:
                  domain_uid: bcws-domain
                  severity: critical
              - alert: clusterScaleDown
                annotations:
                  message: WLS cluster has more than 3 running servers for more than 2 minutes.
                expr: sum by(weblogic_domainUID) (up{serviceType="SERVER",weblogic_clusterName="cluster-1",weblogic_domainUID="bcws-domain"}) > 3
                for: 2m
                labels:
                  domain_uid: bcws-domain
                  severity: critical

  3. Configure Prometheus Alertmanager to send alerts to the webhook application.

    To do so, copy and paste the following alertmanager section into your file, replacing the variables with the appropriate values for your system. However, do not change the alert names clusterScaleUp and clusterScaleDown.

    For the url key, use the following syntax: http://webhook.BrmNameSpace.svc.cluster.local:8080/action, where BrmNameSpace is the namespace for your BRM Kubernetes objects.

    alertmanager:  
      config:    
        global:      
          resolve_timeout: 5m    
        route:     
          group_by: ['alertname']      
          group_wait: 10s      
          group_interval: 10s      
          repeat_interval: 5m      
          receiver: 'null'      
          routes:      
          - match:          
              alertname: clusterScaleUp        
            receiver: 'web.hook'      
          - match:          
              alertname: clusterScaleDown        
            receiver: 'web.hook'    
        receivers:    
        - name: 'null'    
        - name: 'web.hook'      
          webhook_configs:      
          - send_resolved: false        
            url: 'http://webhook.oc-cn-brm.svc.cluster.local:8080/action'

  4. Save and close your override-values.yaml file for Prometheus Operator.

  5. Run the helm upgrade command to update your Prometheus Operator Helm chart.
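The alert rules and the Alertmanager configuration work together: a rule fires only after its condition has held for the whole `for` duration, and Alertmanager routes clusterScaleUp and clusterScaleDown alerts to the web.hook receiver while all other alerts go to the null receiver. The following Python sketch is a simplified model of that flow using the sample thresholds (fewer than 2 or more than 3 running servers). It is not how Prometheus or Alertmanager are implemented, and the function names are hypothetical.

```python
SCALE_ALERTS = {"clusterScaleUp", "clusterScaleDown"}
WEBHOOK_URL = "http://webhook.{namespace}.svc.cluster.local:8080/action"


def scaling_alert(running_servers):
    """Sample thresholds: fewer than 2 servers -> up, more than 3 -> down."""
    if running_servers < 2:
        return "clusterScaleUp"
    if running_servers > 3:
        return "clusterScaleDown"
    return None


def is_firing(samples, condition, for_seconds=120):
    """True once condition has held continuously for for_seconds ("for: 2m").

    samples is a list of (timestamp_in_seconds, value) pairs in time order.
    """
    pending_since = None
    for timestamp, value in samples:
        if condition(value):
            if pending_since is None:
                pending_since = timestamp  # the alert becomes pending
            if timestamp - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # condition broke, so the timer restarts
    return False


def route(alertname):
    """Child routes match on alertname; everything else uses 'null'."""
    return "web.hook" if alertname in SCALE_ALERTS else "null"


# One running server, sampled every 30 seconds for 2.5 minutes.
samples = [(t, 1) for t in range(0, 151, 30)]
alert = scaling_alert(1)
print(alert, is_firing(samples, lambda n: n < 2), route(alert))
# clusterScaleUp True web.hook
print(WEBHOOK_URL.format(namespace="oc-cn-brm"))
# http://webhook.oc-cn-brm.svc.cluster.local:8080/action
```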

Creating Grafana Dashboards for Billing Care and Billing Care REST API

You can create a dashboard in Grafana for displaying your Billing Care and Billing Care REST API metric data.

Alternatively, you can use the sample dashboards that are included in the oc-cn-docker-files-12.0.0.x.0.tgz package. To use the sample dashboards, import the following dashboard files into Grafana. See "Export and Import" in the Grafana Dashboards documentation for more information.

  • Billing Care: oc-cn-docker-files/samples/monitoring/ocbc-billingcare-dashboard.json

  • Billing Care REST API: oc-cn-docker-files/samples/monitoring/ocbc-billingcare-rest-api-dashboard.json

Note:

For the sample dashboards to work properly, the Grafana data source for the WebLogic domain metrics must be named Prometheus.

Sample Prometheus Alert Rules for Billing Care and Billing Care REST API

You can use custom expressions for your Prometheus alert rules when setting up autoscaling in Billing Care and the Billing Care REST API.

Sample Scale Up Expressions

To raise an alert when the CPU usage, averaged over 2 minutes across managed servers, is greater than 70%:

  • For a Billing Care REST API domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="bcws-domain",weblogic_serverName=~".+"}[2m]))*100> 70
  • For a Billing Care domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="billingcare-domain",weblogic_serverName=~".+"}[2m]))*100 > 70

To raise an alert when the average memory usage over 2 minutes across managed servers is greater than 70%:

  • For a Billing Care REST API domain:

    100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_domainUID="bcws-domain",weblogic_clusterName=~".+",weblogic_serverName=~".+"}[2m])) > 70
  • For a Billing Care domain:

    100 - avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_domainUID="billingcare-domain",weblogic_clusterName=~".+",weblogic_serverName=~".+"}[2m])) > 70

To raise an alert when the CPU usage is greater than 70% and the memory usage is greater than 70%:

  • For a Billing Care REST API domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="bcws-domain",weblogic_serverName=~".+"}[2m]))* 100 > 70 and on() 100 -avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="bcws-domain",weblogic_serverName=~".+"}[2m]))> 70
  • For a Billing Care domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="billingcare-domain",weblogic_serverName=~".+"}[2m]))* 100 > 70 and on() 100 -avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="billingcare-domain",weblogic_serverName=~".+"}[2m]))> 70
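To see what the sample scale-up expressions compute, the following Python sketch reproduces their arithmetic for hypothetical samples: avg_over_time averages each server's samples over the 2-minute window, the outer avg averages those per-server values, the CPU load (a fraction from 0 to 1) is multiplied by 100, and memory usage is 100 minus the average free-heap percentage. The and on() operator keeps a result only when both sides return a value, so for these single-series results it acts as a logical AND. The server names, sample values, and function names are made up for illustration.

```python
from statistics import mean


def cpu_percent(cpu_load_by_server):
    """avg(avg_over_time(wls_jvm_process_cpu_load[2m])) * 100"""
    # Inner avg_over_time: one mean per server over its 2-minute window.
    per_server = [mean(samples) for samples in cpu_load_by_server.values()]
    # Outer avg across servers, then convert the 0-1 load to a percentage.
    return mean(per_server) * 100


def memory_percent(heap_free_by_server):
    """100 - avg(avg_over_time(wls_jvm_heap_free_percent[2m]))"""
    per_server = [mean(samples) for samples in heap_free_by_server.values()]
    return 100 - mean(per_server)


def should_scale_up(cpu_load_by_server, heap_free_by_server, threshold=70):
    """Both conditions must hold, like the 'and on()' expression."""
    return (cpu_percent(cpu_load_by_server) > threshold
            and memory_percent(heap_free_by_server) > threshold)


# Hypothetical samples within one 2-minute window.
cpu = {"managed-server-1": [0.75, 0.75], "managed-server-2": [0.5, 1.0]}
heap_free = {"managed-server-1": [20.0, 30.0], "managed-server-2": [25.0, 25.0]}
print(cpu_percent(cpu), memory_percent(heap_free), should_scale_up(cpu, heap_free))
# 75.0 75.0 True
```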

Sample Scale Down Expressions

To raise an alert when the CPU usage is less than 40%, memory usage is less than 40%, and the number of managed servers is equal to 5 for two minutes:

  • For a Billing Care REST API domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="bcws-domain",weblogic_serverName=~".+"}[2m]))* 100 < 40 and on() 100 -avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="bcws-domain",weblogic_serverName=~".+"}[2m]))< 40 and on() sum by(weblogic_domainUID)(up{weblogic_clusterName="cluster-1",weblogic_domainUID="bcws-domain"}) ==5
  • For a Billing Care domain:

    avg(avg_over_time(wls_jvm_process_cpu_load{weblogic_clusterName=~".+",weblogic_domainUID="billingcare-domain",weblogic_serverName=~".+"}[2m]))* 100 < 40 and on() 100 -avg(avg_over_time(wls_jvm_heap_free_percent{weblogic_clusterName=~".+",weblogic_domainUID="billingcare-domain",web