41 Monitoring BRM Components

Learn how to use external applications, such as Pushgateway, Prometheus, and Grafana, to monitor components in Oracle Communications Billing and Revenue Management (BRM).

Topics in this document:

  • About Monitoring Your BRM Components

  • Setting Up Monitoring for BRM Components

  • Customizing Pushgateway for BRM

  • Enabling Monitoring of Your CM

  • Enabling Monitoring of Your Oracle DM

  • Enabling Monitoring of dm_ifw_sync and dm_aq

  • Enabling Monitoring of BRM Java Applications

  • Enabling Monitoring of Web Services Manager

  • Configuring Prometheus for BRM Components

  • Creating Grafana Dashboards for BRM Components

  • BRM Opcode Metrics

About Monitoring Your BRM Components

You set up the monitoring of your BRM components by using the following external applications:

  • Pushgateway: Pushgateway caches your BRM metrics and exposes them to Prometheus.

  • Prometheus: Prometheus scrapes your BRM metrics from Pushgateway and then stores them in a time-series database.

  • Grafana: Use this open-source tool to view on a graphical dashboard all BRM metric data stored in Prometheus.
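
For example, a monitoring script can push a sample for the brm_status_tracker metric to Pushgateway over its standard HTTP API, and Prometheus then scrapes everything that Pushgateway has cached. The following sketch assumes the default localhost:9091 endpoint; the job name and the service label are illustrative only.

# Push one sample for the brm_status_tracker metric to Pushgateway
# (the job name "brm" and the "service" label are illustrative)
echo 'brm_status_tracker{service="cm"} 1' | curl --data-binary @- http://localhost:9091/metrics/job/brm

# List everything that Pushgateway currently caches; this is the
# endpoint that Prometheus scrapes
curl http://localhost:9091/metrics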

Setting Up Monitoring for BRM Components

To set up monitoring for your BRM components:

  1. Install the following external software on your system:

    • Pushgateway

    • Prometheus

    • Grafana

    For the list of compatible software versions, see "BRM Software Compatibility" in BRM Compatibility Matrix.

  2. Optionally, customize Pushgateway. See "Customizing Pushgateway for BRM".

  3. Enable Perflib-based monitoring of your CM. See "Enabling Monitoring of Your CM".

  4. Enable Perflib-based monitoring of your Oracle DM. See "Enabling Monitoring of Your Oracle DM".

  5. Enable Perflib-based monitoring of the Account Synchronization DM and Synchronization Queue DM. See "Enabling Monitoring of dm_ifw_sync and dm_aq".

  6. Expose metrics for BRM Java applications such as Batch Controller and REL Daemon. See "Enabling Monitoring of BRM Java Applications".

  7. Expose metrics for Web Services Manager through WebLogic Monitoring Exporter. See "Enabling Monitoring of Web Services Manager".

  8. Configure Prometheus to scrape BRM metric data from the Pushgateway endpoint. See "Configuring Prometheus for BRM Components".

  9. Configure Grafana to display metric data for your BRM components. See "Creating Grafana Dashboards for BRM Components".

Customizing Pushgateway for BRM

By default, Pushgateway exposes BRM metrics at http://localhost:9091/metrics, and Prometheus scrapes the brm_status_tracker and brm_memory_usage metrics from Pushgateway every 5 seconds. However, you can customize where Pushgateway exposes the BRM metrics and how Prometheus scrapes them.

To customize Pushgateway for BRM:

  1. Open the BRM_home/bin/configurations.yaml file in a text editor.

  2. To customize where Pushgateway exposes BRM metrics, edit these keys:

    • pushgateway.host: The hostname of the machine on which to deploy Pushgateway. The default is localhost.

    • pushgateway.port: The port number for Pushgateway. The default is 9091.

    • pushgateway.protocol: The protocol type, such as http or https. The default is http.

  3. To customize the metric names and scrape intervals for Prometheus, modify these keys:

    • prometheus.service_monitoring.brm_status_metric_name: The name of the metric for monitoring the current status of BRM services.

    • prometheus.service_monitoring.scrape_interval: How frequently Prometheus scrapes the BRM service status metric, in seconds.

    • prometheus.memory_monitoring.brm_status_metric_name: The name of the metric for monitoring the CPU and memory used by BRM services.

    • prometheus.memory_monitoring.scrape_interval: How frequently Prometheus scrapes the CPU and memory usage metric, in seconds.

  4. Save and close the file.

Example 41-1 Customizing the Pushgateway Protocol and Prometheus Scrape Interval

The configurations.yaml entries in this example would change the Pushgateway protocol to https and the Prometheus scrape interval for both metrics to 1 second.

pushgateway:
   host: localhost
   port: 9091
   protocol: "https"

prometheus:
   service_monitoring: 
      brm_status_metric_name: "brm_status_tracker"
      scrape_interval: 1
   memory_monitoring:
      brm_status_metric_name: "brm_memory_usage"
      scrape_interval: 1
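
After you restart Pushgateway with these settings, you can spot-check the customized endpoint. The following is a minimal sketch that assumes the https protocol and port from Example 41-1; the -k option is needed only if the certificate is self-signed.

# List the BRM metrics cached on the customized Pushgateway endpoint
curl -sk https://localhost:9091/metrics | grep "^brm_"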

Enabling Monitoring of Your CM

To enable monitoring of your CM:

  1. In your CM configuration file (BRM_home/sys/cm/pin.conf), edit the following Perflib-related entries:

    • In the fm_perflib_config entry, enter the full path to your fm_perflib.so file:

      - cm fm_module Path/fm_perflib.so fm_perflib_config - pin
    • In the perflib_monitor_file entry, enter the full path to your Perflib data file:

      - perflib_monitor_file perflib_data Path/perflib_data.dat
    • In the perflib_pin_shlib entry, enter the full path to the shared library in which core BRM functionality for the CMs and applications is defined:

      - perflib perflib_pin_shlib Path/libcmpin.so
  2. In the PERFLIB_home/perf.env file, set the Perflib timing-related entries in Table 41-1.

    Table 41-1 Perflib Timing Entries

    Entry Name Description

    PERFLIB_VAR_TIME

    Whether Perflib timing is activated immediately:

    • 0: Timing is disabled at startup.
    • 1: Real-time timing is enabled at startup. This is the default.
    • 2: File-based timing is enabled at startup.
    • 3: File-based and real-time timing are enabled at startup.

    PERFLIB_VAR_FLIST

    Whether Perflib flist tracing is activated immediately:

    • 0: Flist logging is disabled. This is the default.
    • 1: Summary logging is enabled at startup.
    • 2: Full flist logging is enabled at startup.

    PERFLIB_VAR_ALARM

    Whether Perflib alarm functionality is activated immediately:

    • 0: Alarms are disabled at startup.
    • 1: Alarms are enabled at startup. This is the default.
  3. Start the CM process by running the following command. Ignore the LD_PRELOAD errors.

    ./start_it start_cm
  4. Use testnap to run a few opcodes. For example:

    xop PCM_OP_SEARCH 0x10 1

    For more information about testnap, see "Using the testnap Utility to Test BRM" in BRM Developer's Guide.

  5. Run pstatus to verify that opcode metrics are getting captured in the Perflib data file:

    pstatus -s10 perflib_data.dat

    If successful, you should see something similar to this:

    Timestamp            Program     Opcode  Opcode Name            Calls    Errors    Records     Elapsed       OpAvg      SysAvg   SysRate/s     PID
    ---------            -------     ------  -----------            -----    ------    -------     -------       -----      ------   ---------     ---
    14/07/2021 04:07:34  testnap        155  PCM_OP_ACT_LOGIN           2         0          0    0.288688    0.144344         inf        0.00    9250
    14/07/2021 04:07:34  testnap          3  PCM_OP_READ_OBJ            2         0          2    0.052501    0.026251         inf        0.00    9250
    14/07/2021 04:07:34  testnap        156  PCM_OP_ACT_LOGOUT          2         0          0    0.043381    0.021691         inf        0.00    9250
  6. To expose Prometheus metrics for the CM, run the start_monitor script:

    ./start_monitor -P 9090 perflib_data.dat,cm
  7. Run the following cURL command to view the exposed metrics:

    curl http://localhost:9090/metrics

    If successful, you should see something similar to this:

    brm_opcode_calls_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 6
    brm_opcode_errors_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 0
    brm_opcode_records_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 0
    brm_opcode_exec_time_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 1.371886758
    brm_opcode_user_cpu_time_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 0.019318000
    brm_opcode_system_cpu_time_total{application="cm", opcode="20", opflags="0", program_name="cm", object_type="", opcode_name="PCM_OP_GET_DD"} 0.006629000
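
After Prometheus begins scraping these counters, you can derive per-opcode averages with PromQL. The following sketch queries the Prometheus HTTP API and assumes that Prometheus is reachable at localhost:9090; it uses the metric and label names shown in the output above.

# Average execution time per opcode call over the last 5 minutes
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(brm_opcode_exec_time_total{application="cm"}[5m]) / rate(brm_opcode_calls_total{application="cm"}[5m])'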

Enabling Monitoring of Your Oracle DM

To enable monitoring of your Oracle DM:

  1. In your Oracle DM configuration file (BRM_home/sys/dm_oracle/pin.conf), set the perflib_enabled entry to 1:

    - dm perflib_enabled 1
  2. In the PERFLIB_home/perf_dm_oracle.env file, set the Perflib timing-related entries in Table 41-1.

  3. Start the dm_oracle process by running the following command. Ignore the LD_PRELOAD errors.

    ./start_it start_dm_oracle
  4. Start testnap and then run a few opcodes. For example:

    xop PCM_OP_READ_OBJ 0x0040 1

    For more information about testnap, see "Using the testnap Utility to Test BRM" in BRM Developer's Guide.

  5. Run pstatus to verify that opcode metrics are getting captured in the Perflib data file:

    pstatus -s10 perflib_dm_oracle.dat

After you start the monitoring process, the Oracle DM metrics will be exposed at the following endpoint:

http://localhost:9091/metrics
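
As a quick check that the Oracle DM metrics are reaching Pushgateway, you can filter the exposed metrics for the brm_opcode and brm_dmo prefixes listed in Table 41-3. This sketch assumes the default endpoint shown above.

# Confirm that Oracle DM metrics are exposed through Pushgateway
curl -s http://localhost:9091/metrics | grep -E "^brm_(opcode|dmo)_"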

Enabling Monitoring of dm_ifw_sync and dm_aq

To enable monitoring of your Account Synchronization DM (dm_ifw_sync) and Synchronization Queue DM (dm_aq):

  1. In your dm_ifw_sync configuration file (BRM_home/sys/dm_ifw_sync/pin.conf), set the perflib_enabled entry to 1:

    - dm perflib_enabled 1
  2. In your dm_aq configuration file (BRM_home/sys/dm_aq/pin.conf), set the perflib_enabled entry to 1:

    - dm perflib_enabled 1

Alternatively, you can enable monitoring of dm_oracle, dm_ifw_sync, and dm_aq by setting the following entry to 1 in your PERFLIB_home/perf_dm.env file:

DM_PERFLIB_ENABLED=1

To temporarily disable the monitoring of dm_oracle, dm_ifw_sync, or dm_aq, set its environment variable to 0 in its respective start script. For example, to temporarily disable the monitoring of dm_ifw_sync, you would set DM_PERFLIB_ENABLED to 0 in the start_dm_ifw_sync file, as shown in the last export line below:

if [ -f ${DMPERFLIBENV} ]; then
    source ${DMPERFLIBENV}
    export PERFLIB_DATA_FILE=$PERFLIB_HOME/perflib_dm_ifw_sync_data.dat
    export DM_PERFLIB_ENABLED=0
fi

After you start the monitoring process, the dm_ifw_sync and dm_aq metrics will be exposed at the following endpoint:

http://localhost:3000/metrics
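
To confirm which DMs currently have Perflib monitoring switched on, and that their metrics are being exposed, you can run checks similar to the following sketch. The file paths are illustrative and follow the examples in this section; adjust them to match your installation.

# Check the Perflib switches for the DMs (paths are illustrative)
grep perflib_enabled BRM_home/sys/dm_ifw_sync/pin.conf BRM_home/sys/dm_aq/pin.conf
grep DM_PERFLIB_ENABLED PERFLIB_home/perf_dm.env

# Confirm that the metrics are exposed at the endpoint shown above
curl -s http://localhost:3000/metrics | grep "^brm_"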

Enabling Monitoring of BRM Java Applications

You use the JMX framework to expose metrics for BRM Java applications, such as Batch Controller, REL daemon, and EAI Java Server (JS), in Prometheus format.

To enable the monitoring of BRM Java applications:

  1. Download the latest JMX Exporter .jar file from https://github.com/prometheus/jmx_exporter.

    For a list of compatible versions, see "Additional BRM Software Requirements" in BRM Compatibility Matrix.

  2. Create a configuration file named config.yaml in your BRM_home/bin directory.

  3. Configure the JMX Exporter by adding the parameters defined in https://github.com/prometheus/jmx_exporter#configuration. A minimal example configuration is shown after this procedure.

  4. To configure BRM Java applications to expose JMX metrics in Prometheus format, edit the component's start script to include the JMX_EXPORTER_OPTS environment variable.

    For example, you would expose JMX metrics for EAI JS by adding the following lines to the start_eai_js script:

    JMX_EXPORTER_OPTS="-javaagent:/scratch/jmx_prometheus_javaagent-version.jar=12347:/scratch/config.yaml" 
    $JAVA -Deai_js=1 ${JMX_EXPORTER_OPTS} -mx256m -ms64m -ss1m com.portal.js.JSMain >>${JSLOG} 2>&1 &

    where version is the JMX Exporter version number.

    The JMX metrics in Prometheus format will be available at the /metrics endpoint using the port configured in the component's start script. In this example, the endpoint for EAI JS metrics would be http://localhost:12347/metrics.
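
For step 3, a minimal config.yaml might look like the following sketch, which simply exposes all MBean attributes; the single catch-all rule is illustrative, and you would normally restrict it as described in the JMX Exporter configuration reference.

# Create a minimal JMX Exporter configuration (illustrative only;
# tighten the rules for production use)
cat > BRM_home/bin/config.yaml <<'EOF'
lowercaseOutputName: true
rules:
  - pattern: ".*"
EOF

# After restarting the application, verify the exposed metrics
# (port 12347 matches the EAI JS example above)
curl -s http://localhost:12347/metrics | head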

Enabling Monitoring of Web Services Manager

You use the WebLogic Monitoring Exporter to expose metrics for Web Services Manager in Prometheus format.

To enable monitoring of Web Services Manager:

  1. Download the latest supported version of WebLogic Monitoring Exporter (getN.N.sh) from https://github.com/oracle/weblogic-monitoring-exporter/releases, where N.N is the version number.

    For a list of compatible versions, see "Additional BRM Software Requirements" in BRM Compatibility Matrix.

  2. Edit the wls-exporter-config.yaml file to include the metrics to scrape from BRM. For the list of supported metrics, see "WebLogic-Based Application Metrics".

    For example:

    metricsNameSnakeCase: true
    domainQualifier: true
    #restPort: 7001
    queries:
    - key: name
      keyName: server
      applicationRuntimes:
        key: name
        keyName: app
        componentRuntimes:
          type: WebAppComponentRuntime
          prefix: webapp_config_
          key: name
          values: [deploymentState, contextRoot, sourceInfo, openSessionsHighCount, openSessionsCurrentCount, sessionsOpenedTotalCount, sessionCookieMaxAgeSecs, sessionInvalidationIntervalSecs, sessionTimeoutSecs, singleThreadedServletPoolSize, sessionIDLength, servletReloadCheckSecs, jSPPageCheckSecs]
          servlets:
            prefix: wls_servlet_
            key: servletName
            values: invocationTotalCount
     
    - JVMRuntime:
        prefix: wls_jvm_
        key: name
     
    - executeQueueRuntimes:
        prefix: wls_socketmuxer_
        key: name
        values: [pendingRequestCurrentCount]
     
    - workManagerRuntimes:
        prefix: wls_workmanager_
        key: name
        values: [stuckThreadCount, pendingRequests, completedRequests]
     
    - threadPoolRuntime:
        prefix: wls_threadpool_
        key: name
        values: [executeThreadTotalCount, queueLength, stuckThreadCount, hoggingThreadCount]
  3. Run the following command:

    bash getN.N.sh wls-exporter-config.yaml

    The wls-exporter.war is downloaded to your current directory.

  4. Deploy the wls-exporter.war in the same WebLogic domain as BrmWebServices.war or infranetwebsvc.war.

WebLogic Monitoring Exporter exposes Web Services Manager metrics in Prometheus format to the http://localhost:8080/wls-exporter/metrics endpoint on the WebLogic Server.
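
You can verify the deployment by requesting the exporter endpoint directly. The following sketch uses the endpoint named above and filters for the prefixes defined in wls-exporter-config.yaml; the WebLogic credentials are placeholders, and you can omit them if your domain does not require authentication for the exporter page.

# Check that Web Services Manager metrics are exposed in Prometheus format
curl -s -u weblogic_user:weblogic_password \
  http://localhost:8080/wls-exporter/metrics | grep -E "^(webapp_config_|wls_)"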

Configuring Prometheus for BRM Components

To configure Prometheus to scrape metric data from your Pushgateway endpoint:

  1. Edit your prometheus.yaml file to include:
    • The targets to scrape: the components that use Perflib-based monitoring (CM, Oracle DM, dm_aq, and dm_ifw_sync), the REL daemon, Batch Controller, and BRM REST Services Manager, and, optionally, node exporter.
    • Prometheus Alertmanager configuration.

    For example:

    global:
      scrape_interval: 1s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - static_configs:
          - targets:
            - alertmanager_host:9093

    rule_files:
      - "first_rules.yaml"
      - "second_rules.yaml"

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['localhost:9090','localhost:9091','localhost:3000']

      - job_name: 'perflib'
        static_configs:
        - targets: ['localhost:12345']

      - job_name: 'rel_daemon'
        static_configs:
        - targets: ['localhost:12346']

      - job_name: 'batch_controller'
        static_configs:
        - targets: ['localhost:12347']

      - job_name: 'brm-rest-services-manager'
        static_configs:
        - targets: ['localhost:12348']

      - job_name: 'node-exporter'
        static_configs:
        - targets: ['localhost:12349']

    For information about editing this file, see "Prometheus Configuration" in the Prometheus documentation.

  2. Configure the alert rules in Prometheus.

    To do so, add alert rules similar to the ones shown below to the rule files referenced in prometheus.yaml.

    groups:
      - name: brm-rsm-alert-rules
        rules:
          - alert: CPU_UsageWarning
            annotations:
              message: CPU has reached 80% utilization
            expr: avg without(cpu) (rate(node_cpu_seconds_total{job="node-exporter", instance="node_exporter_host:node_exporter_port", mode!="idle"}[5m])) > 0.8
            for: 5m
            labels:
              severity: critical
          - alert: Memory_UsageWarning
            annotations:
              message: Memory has reached 80% utilization
            expr: node_memory_MemTotal_bytes{job="node-exporter", instance="node_exporter_host:node_exporter_port"}
                  - node_memory_MemFree_bytes{job="node-exporter", instance="node_exporter_host:node_exporter_port"}
                  - node_memory_Cached_bytes{job="node-exporter", instance="node_exporter_host:node_exporter_port"}
                  - node_memory_Buffers_bytes{job="node-exporter", instance="node_exporter_host:node_exporter_port"} > 22322927872
            for: 5m
            labels:
              severity: critical

    For more information about defining alert rules, see "Alerting Rules" in the Prometheus documentation.

    Note:

    You can also configure alert rules and add or remove email recipients in the Grafana user interface. See "Legacy alerting" in the Grafana documentation for more information.
  3. Restart Prometheus by running this command:
    ./prometheus --config.file=prometheus.yaml
  4. Start monitoring BRM services for their current status by running this command:
    start_service_monitor
  5. Start monitoring the CPU and memory used by BRM services by running this command:
    start_memory_monitor

To ensure that you configured Prometheus correctly and that it has started scraping BRM metric data, go to the following URL: http://localhost:9090/graph.
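
You can also validate the configuration from the command line. The following sketch uses promtool, which is shipped with Prometheus, and the Prometheus HTTP API; adjust the host and file names to match your environment.

# Validate the configuration and rule files before restarting Prometheus
./promtool check config prometheus.yaml
./promtool check rules first_rules.yaml second_rules.yaml

# Confirm that the BRM scrape targets are up
curl -s http://localhost:9090/api/v1/targets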

Note:

If you need to stop the service and CPU monitors for any reason, run the following commands:

stop_service_monitor
stop_memory_monitor

Creating Grafana Dashboards for BRM Components

You can create a dashboard in Grafana for displaying the metric data for your BRM components.

Alternatively, you can use the sample dashboards that are included with the BRM SDK package. To use the sample dashboards, import the JSON files from the BRM_home/PortalDevKit/source/samples/dashboards directory into Grafana. Table 41-2 describes each sample dashboard.

Table 41-2 Sample Grafana Dashboards

File Name Description
ocbrm-batch-controller-dashboard.json Allows you to view JVM-related metrics for the Batch Controller.
ocbrm-cm-dashboard.json Allows you to view CPU and opcode-level metrics for the CM.
ocbrm-dm-oracle-dashboard.json Allows you to view CPU and opcode-level metrics for the Oracle DM.
ocbrm-dm-oracle-shm-dashboard.json Allows you to view shared memory, front-end, and back-end metrics for the Oracle DM.
ocbrm-eai-js-dashboard.json Allows you to view JVM and opcode-related metrics for the EAI JS.
ocbrm-services-dashboard.json Allows you to view metrics regarding the status and memory usage of BRM services.
ocrsm-rsm-dashboard.json Allows you to view standard Helidon MP monitoring metrics for BRM REST Services Manager.

For information about importing dashboards into Grafana, see "Export and Import" in the Grafana Dashboards documentation.
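
If you prefer to script the import instead of using the Grafana UI, you can also post the sample dashboard JSON files to the Grafana HTTP API. The following is a sketch only: the Grafana host, port, and API token are assumptions, and jq is used to wrap the exported JSON in the payload that the API expects.

# Import a sample dashboard through the Grafana HTTP API
# (grafana_host, the port, and GRAFANA_API_TOKEN are placeholders)
jq '{dashboard: (.id = null), overwrite: true}' \
  BRM_home/PortalDevKit/source/samples/dashboards/ocbrm-cm-dashboard.json | \
  curl -s -X POST http://grafana_host:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
    --data-binary @-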

BRM Opcode Metrics

Table 41-3 describes the metrics for retrieving runtime information about BRM opcodes and related BRM processes.

Table 41-3 BRM Opcode Metrics

Metric Name  Metric Type  Description  BRM Service

brm_opcode_calls_total  Counter  The total number of calls to a BRM opcode.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_opcode_errors_total  Counter  The total number of errors when executing a BRM opcode.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_opcode_exec_time_total  Counter  The total time taken to execute a BRM opcode.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_opcode_user_cpu_time_total  Counter  The total CPU time taken to execute the BRM opcode in user space.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_opcode_system_cpu_time_total  Counter  The total CPU time taken to execute the BRM opcode in OS kernel space.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_opcode_records_total  Counter  The total number of records returned by the BRM opcode execution.  cm, dm_oracle, dm_ifw_sync, dm_aq

brm_dmo_shared_memory_used_current  Gauge  The total number of shared memory blocks currently used by dm_oracle.  dm_oracle

brm_dmo_shared_memory_used_max  Counter  The maximum number of shared memory blocks used by dm_oracle.  dm_oracle

brm_dmo_shared_memory_free_current  Gauge  The total number of free shared memory blocks available to dm_oracle.  dm_oracle

brm_dmo_shared_memory_hwm  Gauge  The shared memory high-water mark for dm_oracle.  dm_oracle

brm_dmo_shared_memory_bigsize_used_max  Counter  The maximum big size shared memory used by dm_oracle, in bytes.  dm_oracle

brm_dmo_shared_memory_bigsize_used_current  Gauge  The total big size shared memory used by dm_oracle, in bytes.  dm_oracle

brm_dmo_shared_memory_bigsize_hwm  Gauge  The big size shared memory high-water mark for dm_oracle, in bytes.  dm_oracle

brm_dmo_front_end_connections_total  Gauge  The total number of connections for a dm_oracle front-end process.  dm_oracle

brm_dmo_front_end_max_connections_total  Counter  The maximum number of connections for a dm_oracle front-end process.  dm_oracle

brm_dmo_front_end_trans_done_total  Counter  The total number of transactions handled by the dm_oracle front-end process.  dm_oracle

brm_dmo_front_end_ops_done_total  Counter  The total number of operations handled by the dm_oracle front-end process.  dm_oracle

brm_dmo_back_end_ops_done_total  Counter  The total number of operations done by the dm_oracle back-end process.  dm_oracle

brm_dmo_back_end_ops_error_total  Counter  The total number of errors encountered by the dm_oracle back-end process.  dm_oracle

brm_dmo_back_end_trans_done_total  Counter  The total number of transactions handled by the dm_oracle back-end process.  dm_oracle

brm_dmo_back_end_trans_error_total  Counter  The total number of transaction errors in the dm_oracle back-end process.  dm_oracle

com_portal_js_JSMetrics_CurrentConnectionCount  Counter  The current number of concurrent connections to the Java Server (JS) from the CM.  eai_js

com_portal_js_JSMetrics_MaxConnectionCount  Counter  The maximum number of concurrent connections to the Java Server from the CM.  eai_js

com_portal_js_JSMetrics_SuccessfulOpcodeCount  Counter  The number of opcodes called from the CM that executed successfully in JS.  eai_js

com_portal_js_JSMetrics_FailedOpcodeCount  Counter  The number of opcodes called from the CM whose execution failed in JS.  eai_js

com_portal_js_JSMetrics_TotalOpcodeCount  Counter  The total number of opcodes called from the CM.  eai_js

com_portal_js_JSMetrics_TotalOpcodeExecutionTime  Counter  The total time taken, in milliseconds, across all opcodes.  eai_js
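
As an example of how these counters are typically used together, the following sketch computes the per-opcode error ratio over the last five minutes. It assumes that the Prometheus HTTP API is reachable at localhost:9090 and uses the metric names shown in Table 41-3.

# Fraction of opcode calls that returned errors, per opcode, over 5 minutes
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(brm_opcode_errors_total[5m]) / rate(brm_opcode_calls_total[5m])'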