10 Monitoring BRM Cloud Native Services

Learn how to monitor your Oracle Communications Billing and Revenue Management (BRM) cloud native services by using Prometheus and Grafana.

About Monitoring BRM Cloud Native Services

You can set up monitoring for the following BRM cloud native services:

  • CM

  • Oracle DM

  • Oracle DM shared memory, front-end processes, and back-end processes

  • Account Synchronization DM

  • Synchronization Queue DM

  • BRM Java Applications: RE Loader Daemon, Batch Controller, and EAI Java Server (JS)

  • Web Services Manager

  • BRM database

The metrics for the database are generated by OracleDB_exporter, and the metrics for all other BRM services are generated directly by BRM cloud native. You use Prometheus to scrape and store the metric data, and then use Grafana to display the data in a graphical dashboard.

Setting Up Monitoring for BRM Cloud Native Services

To set up monitoring for BRM cloud native services:

  1. Deploy Prometheus in your Kubernetes cluster in one of the following ways:

    • Deploy a standalone version of Prometheus in your cloud native environment. See "Installation" in the Prometheus documentation.

    • Deploy Prometheus Operator. See "prometheus-operator" on the GitHub website.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.
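
    For example, Prometheus Operator can be deployed with the community kube-prometheus-stack Helm chart. This is a minimal sketch; the release name and namespace are illustrative:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace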

  2. Install Grafana. See "Install Grafana" in the Grafana documentation.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.
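
    For example, a minimal sketch of installing Grafana with its community Helm chart (the release name and namespace are illustrative):

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    helm install grafana grafana/grafana -n monitoring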

  3. Configure BRM cloud native to collect metrics for its components and export them to Prometheus. See "Configuring BRM Cloud Native to Collect Metrics".

  4. Configure how Perflib generates metric data for BRM opcodes. See "Configuring Perflib for BRM Opcode Monitoring".

  5. Configure OracleDB_exporter to scrape metrics from your Oracle database and export them to Prometheus. See "Configuring OracleDB_Exporter to Scrape Database Metrics".

  6. Create Grafana dashboards for viewing your metric data. See "Configuring Grafana for BRM Cloud Native".

Configuring BRM Cloud Native to Collect Metrics

To configure BRM cloud native to collect metrics for its components and then expose them in Prometheus format:

  1. In your override-values.yaml file for oc-cn-helm-chart, set the monitoring.prometheus.operator.enable key to one of the following:

    • true if you are using Prometheus Operator.

    • false if you are using a standalone version of Prometheus. This is the default.
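
    For example, the corresponding override-values.yaml entry for a standalone Prometheus deployment is:

    monitoring:
        prometheus:
            operator:
                enable: false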

  2. To collect metrics for the CM, do the following:

    1. In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.cm.deployment.perflib_enabled key to true.

    2. In the oms-cm-perflib-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".

    3. In the oms-cm-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".
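
    For example, you can open each ConfigMap for editing with kubectl (the namespace brm is illustrative):

    kubectl edit configmap oms-cm-perflib-config -n brm
    kubectl edit configmap oms-cm-config -n brm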

  3. To collect metrics for Oracle DM shared memory, front-end processes, and back-end processes, do the following:

    In the oms-cm-perflib-config ConfigMap, set the data.ENABLE_STATUS_DM_METRICS key to true.
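
    For example, a minimal sketch of setting this key with kubectl patch (the namespace brm is illustrative):

    kubectl patch configmap oms-cm-perflib-config -n brm --type merge -p '{"data":{"ENABLE_STATUS_DM_METRICS":"true"}}'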

  4. To collect metrics for the dm-oracle pod, do the following:

    • In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.dm_oracle.deployment.perflib_enabled key to true.

    • In the oms-dm-oracle-perflib-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".

    • In the oms-dm-oracle-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".

  5. To collect metrics for dm_ifw_sync and dm_aq, do the following:

    In your override-values.yaml file for oc-cn-helm-chart, set these keys:

    • ocbrm.dm_ifw_sync.deployment.perflib_enabled: Set this key to true.

    • ocbrm.dm_aq.deployment.perflib_enabled: Set this key to true.

  6. To collect metrics for the BRM Java applications (REL Daemon, Batch Controller, and EAI Java Server), do the following:

    In your override-values.yaml file for oc-cn-helm-chart, set the monitoring.prometheus.jmx_exporter.enable key to true.

  7. To collect metrics for Web Services Manager, do the following:

    In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.wsm.deployment.monitoring.isEnabled key to true.

  8. To persist the Perflib timing files in your BRM database, do the following:

    1. In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.perflib.deployment.persistPerflibLogs key to true.

    2. Check the values of these Perflib timing-related environment variables in your oms-cm-perflib-config and oms-dm-oracle-perflib-config ConfigMaps: PERFLIB_VAR_TIME, PERFLIB_VAR_FLIST, and PERFLIB_VAR_ALARM. See Table 10-1 for more information.
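
    For example, the timing-related entries in a ConfigMap's data section would look like this (the values shown are the defaults listed in Table 10-1):

    data:
        PERFLIB_VAR_TIME: "1"
        PERFLIB_VAR_FLIST: "0"
        PERFLIB_VAR_ALARM: "1"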

  9. Run the helm upgrade command to update the BRM Helm release:

    helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

    where:

    • BrmReleaseName is the release name for oc-cn-helm-chart and is used to track this installation instance.

    • OverrideValuesFile is the file name and path to your override-values.yaml file.

    • BrmNameSpace is the namespace in which to create BRM Kubernetes objects for the BRM Helm chart.
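
    For example, with a release named brm, an override file in the current directory, and a namespace named brm (all illustrative):

    helm upgrade brm oc-cn-helm-chart --values override-values.yaml -n brm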

After you update the Helm release, metrics are exposed to Prometheus at the /metrics endpoint of the BRM pods on the following ports:

  • CM: Port 11961

  • Oracle DM shared memory, back-end processes, and front-end processes: Port 11961 or Port 31961

  • Oracle DM: Port 12951

  • dm_ifw_sync and dm_aq: Port 12951
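
To verify that metrics are being exposed, you can forward a pod's metrics port and query the endpoint, where CmPodName is the name of your CM pod:

kubectl port-forward CmPodName 11961:11961 -n BrmNameSpace
curl http://localhost:11961/metrics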

Example: Enabling Monitoring for All BRM Components

The following sample override-values.yaml entries enable the collection of Prometheus metrics for these components:

  • CM

  • Oracle DM

  • Oracle DM shared memory, front-end processes, and back-end processes

  • Account Synchronization DM (dm_ifw_sync)

  • Synchronization Queue DM (dm_aq)

  • Web Services Manager

  • BRM Java applications: REL Daemon, Batch Controller, and EAI Java Server

It also configures BRM to persist the Perflib timing files in your BRM database.

monitoring:
    prometheus:
        operator:
            enable: false
        jmx_exporter:
            enable: true
ocbrm:
    cm:
        deployment:
            perflib_enabled: true
    dm_oracle:
        deployment:
            perflib_enabled: true
    perflib:
        deployment:
            persistPerflibLogs: true
    wsm:
        deployment:
            monitoring:
                isEnabled: true
    dm_ifw_sync:
        deployment:
            perflib_enabled: true
    dm_aq:
        deployment:
            perflib_enabled: true

Configuring Perflib for BRM Opcode Monitoring

The BRM cloud native deployment package includes the BRM Performance Profiling Toolkit (Perflib), which the Connection Manager (CM), Oracle Data Manager (DM), Synchronization Queue DM, and Account Synchronization DM depend on for generating and exposing BRM opcode metrics.

You configure how Perflib generates the metric data by setting environment variables in the following:

  • For the CM: The oms-cm-perflib-config ConfigMap

  • For the DMs: The oms-dm-oracle-perflib-config ConfigMap

Table 10-1 describes the environment variables you can use to configure Perflib for the CM and DMs.
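
Because these variables are held in ConfigMap data, a sketch of the relevant portion of the oms-cm-perflib-config ConfigMap might look like this (the values shown are the defaults from Table 10-1):

apiVersion: v1
kind: ConfigMap
metadata:
    name: oms-cm-perflib-config
data:
    PERFLIB_ENABLED: "1"
    PERFLIB_DEBUG: "0"
    PERFLIB_AGGREGATION_PERIOD: "1h"
    PERFLIB_FLUSH_FREQUENCY: "3600"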

Table 10-1 Perflib Environment Variables

Environment Variable Description

PERFLIB_ENABLED

Whether to enable opcode monitoring with Perflib.

  • 0: Disables Perflib.
  • 1: Enables Perflib. This is the default.

PERFLIB_HOME

The location of the Perflib Toolkit.

PERFLIB_DEBUG

The debug log level for Perflib.

  • 0: Turn off debugging. This is the default.
  • 1: Log summary information to stderr.
  • 2: Log detailed opcode execution information to stderr.
  • 4: Log trace information to stderr.

PERFLIB_MAX_LOG_SIZE

The maximum number of opcodes that can be logged in any one log file. You can use this to prevent excessively large log files if detailed tracing is used for long periods.

  • 0: A single file is created with no limits. This is the default.
  • Number: Defines the maximum number of opcodes that can be tracked before a new file is opened.

PERFLIB_AGGREGATION_PERIOD

The amount of time that data is recorded into a bucket, in minutes or hours. When the amount of time expires, Perflib creates a new bucket. For example, each bucket could record an hour's worth of data, 2 hours of data, or 5 minutes of data.

The allowed values for hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h, or 24h.

The allowed values for minutes: 1m, 2m, 3m, 4m, 5m, 6m, 10m, 12m, 15m, 30m, or 60m.

The default is 1h.

PERFLIB_FLUSH_FREQUENCY

How frequently, in seconds, to flush in-memory aggregation data to trace files on disk.

The default is 3600 (1 hour).

PERFLIB_LOG_SINGLE_FILE

The prefix to use for tracing filenames, such as cm_batch, cm_aia, or cm_rt. This allows you to distinguish the trace files for each type of application.

The default is perf_log.

PERFLIB_PIN_SHLIB

The full path of the shared library that contains the BRM opcode functions being interposed.

This environment variable is used for the CM only.

The default is /oms/lib/libcmpin.so.

PERFLIB_DATA_FILE

The full path name of the memory-mapped data file that is used by Perflib to store control variables and real-time trace data.

The following special formatting characters can be used as part of the data file name and are substituted by Perflib when the data file is created:

  • %p: Substituted with the process ID of the process using Perflib.
  • %t: Substituted with the current time stamp (as a UNIX time number).
  • %T: Substituted with the current timestamp (as a YYYYMMDDHHMMSS string).

The default is /oms_logs/perflib_data.dat.

PERFLIB_LOG_DIR

The directory where trace output is written.

The default is /oms_logs.

PERFLIB_DATA_FILE_RESET

Whether real-time tracing data and variable settings are maintained between application executions. Retaining the data enables statistics to continue to accumulate across an application restart.

  • Y: All variables and trace data are destroyed when the application starts. This is the default.
  • N: The existing data is retained.

PERFLIB_VAR_TIME

Whether Perflib timing is activated immediately at startup.

  • 0: Timing is disabled at startup.

  • 1: Real-time timing is enabled at startup. This is the default.

  • 2: File-based timing is enabled at startup.

  • 3: File-based and real-time timing are enabled at startup.

PERFLIB_VAR_FLIST

Whether Perflib flist tracing is activated immediately at startup.

  • 0: Flist logging is disabled. This is the default.
  • 1: Summary logging is enabled at startup.
  • 2: Full flist logging is enabled at startup.

PERFLIB_VAR_ALARM

Whether the Perflib alarm functionality is activated immediately.

  • 0: Alarms are disabled at startup.
  • 1: Alarms are enabled at startup. This is the default.

PERFLIB_AUTO_FLUSH

Whether the CM flushes data regularly (with the frequency set by PERFLIB_FLUSH_FREQUENCY).

  • 0: Disables flushing. In this case, if a CM does not receive any opcode requests, flushing is not performed until the CM terminates or an opcode arrives. This is the default.
  • 1: Enables inter-opcode flushing. That is, flushing occurs between different top-level opcodes.
  • 2: Enables full flushing. That is, flushing occurs within an opcode without waiting for it to complete. This can be useful when tracing very long-running opcodes.

This environment variable is used for the CM only.

PERFLIB_COLLECT_CPU_USAGE

Whether user and system CPU usage is tracked at the opcode level, allowing CPU hogs to be identified more easily.

  • 0: Collection is disabled.
  • Positive value: CPU data is collected for opcodes down to that level. For example, setting it to 1 would collect CPU data for the top-level opcodes, while setting it to 2 would collect data for both the top-level opcodes and all their children.

PERFLIB_LOCK_METHOD

The method used to lock between processes.

  • 0: Use POSIX shared-memory mutexes. This is the default.
  • 1: Use file-based advisory locks.

PERFLIB_ASYNC_FLUSHING

Whether flushing to the trace file from memory is done within the opcode execution, or asynchronously in a separate thread.

  • 0: Flush data to the trace file within the opcode execution.
  • 1: Flush data to the trace file in a separate processing thread. This is the default.

PERFLIB_TRACE_OBJECT_TYPE

Whether Perflib records the BRM object type associated with different database operations, such as PCM_OP_SEARCH, PCM_OP_READ_FLDS, PCM_OP_WRITE_FLDS, and so on. This can help you understand which objects are being read or written most frequently and how much time is being spent for different objects.

For PCM_OP_EXEC_SPROC, the latest versions of Perflib will record the stored procedure name that was run.

  • 0: Do not collect object type data.
  • 1: Collect object type data and record it in realtime or batch trace files. This is the default.

PERFLIB_GROUP_TRANSACTIONS

Whether Perflib tracks BRM transactions as a single unit. The opcodes run as part of a transaction are grouped under a virtual opcode, TRANSACTION_GROUP.

  • 0: Do not group transactions. This is the default.
  • 1: Group transactions.

PERFLIB_LOG_MAX_SINGLE_FILE_SIZE

The threshold file size at which a new single log file is created (it only works with the PERFLIB_LOG_SINGLE_FILE parameter). Whenever a flush of aggregate timing data causes the configured size to be exceeded, the log file is renamed and a new file is created for any subsequent data.

The size is expressed in bytes. For example, 5242880 is equivalent to 5 MB. If the parameter is not defined or is set to 0, the file size defaults to 1 GB.

PERFLIB_ALARM_CONFIG_FILE

The configuration file that defines how Perflib handles alarms.

Perflib provides an example alarm file, alarm_config.lst, which shows how operation-specific alarm configurations can be defined.

PERFLIB_ALARM

The general alarm threshold. It triggers the logging of information about any opcode call whose elapsed time exceeds the specified value.

ENABLE_STATUS_DM_METRICS

Whether Prometheus generates metrics for the Oracle DM shared memory, front-end processes, and back-end processes.

  • true: Enables DM shared memory, front-end, and back-end metrics in Prometheus format.
  • false: Disables DM shared memory, front-end, and back-end metrics. This is the default.

PERFLIB_LOG_CORRELATION_IN_CALL_STACK

Whether Perflib adds the BRM correlation ID to call-stack logs.

  • 0: Do not add correlation IDs to call-stack traces.
  • 1: Add correlation IDs to call-stack traces. This is the default.

PERFLIB_FLIST_LOG_TO_STDOUT

Whether Perflib writes flist logs to standard output.

  • 0: Writes opcode flists and stack trace logs to files. This is the default.
  • 1: Writes opcode flists and stack trace logs to STDOUT.

Configuring OracleDB_Exporter to Scrape Database Metrics

You use OracleDB_Exporter to scrape metrics from your BRM database and export them to Prometheus. Prometheus can then read the metrics and display them in a graphical format in Grafana.

To configure OracleDB_Exporter to scrape and export metrics from your BRM database:

  1. Download and install OracleDB_Exporter and any other required external applications.

    For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.

  2. Specify the BRM database metrics to scrape and export in the Exporter_home/default-metrics.toml file, where Exporter_home is the directory in which you deployed OracleDB_Exporter.

    For more information, see https://github.com/iamseth/oracledb_exporter/blob/master/README.md on the GitHub website.
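
    For example, a metric definition in default-metrics.toml follows the format described in that README. This is a sketch; the query and metric names are illustrative:

    [[metric]]
    context = "sessions"
    labels = ["status"]
    metricsdesc = { value = "Number of database sessions, by status." }
    request = "SELECT status, COUNT(*) AS value FROM v$session GROUP BY status"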

  3. Open your override-values.yaml file for Prometheus.

  4. Configure Prometheus to fetch performance data from OracleDB_exporter.

    To do so, copy and paste the following into your override-values.yaml file, replacing hostname with the host name of the machine on which OracleDB_exporter is deployed:

    - job_name: 'oracledbexporter'
      static_configs:
      - targets: ['hostname:9161']
  5. Save and close your file.

  6. Run the helm upgrade command to update your Prometheus Helm chart release.

The metrics for your BRM database are available at http://hostname:9161/metrics.
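
You can confirm that the exporter is serving data before building dashboards. OracleDB_Exporter prefixes its metric names with oracledb_, so a quick check is:

curl http://hostname:9161/metrics | grep oracledb_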

Configuring Grafana for BRM Cloud Native

You can create a dashboard in Grafana for displaying the metric data for your BRM cloud native services.

Alternatively, you can use the sample dashboards that are included in the oc-cn-docker-files-12.0.0.x.0.tgz package. To use the sample dashboards, import the dashboard files from the oc-cn-docker-files/samples/monitoring/ directory into Grafana. See "Export and Import" in the Grafana Dashboards documentation for more information.
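
As an alternative to importing through the Grafana UI, you can load a sample dashboard through the Grafana HTTP API. This is a sketch; the host, credentials, and use of jq to wrap the file in the API's expected payload are illustrative:

curl -s -X POST http://admin:Password@GrafanaHost:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -d "$(jq '{dashboard: ., overwrite: true}' oc-cn-docker-files/samples/monitoring/ocbrm-cm-dashboard.json)"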

Table 10-2 describes each sample dashboard.

Table 10-2 Sample Grafana Dashboards

File Name Description

oc-cn-applications-dashboard.json

Provides a high-level view of all BRM components that have been installed, grouped by whether they are running or have failed.

ocbrm-batch-controller-dashboard.json

Allows you to view JVM-related metrics for the Batch Controller.

ocbrm-cm-dashboard.json

Allows you to view CPU and opcode-level metrics for the CM.

ocbrm-dm-ifw-dashboard.json

Allows you to view opcode-level, CPU usage, and memory usage metrics for the Account Synchronization DM.

ocbrm-dm-oracle-dashboard.json

Allows you to view opcode-level, CPU usage, and memory usage metrics for the Oracle DM.

ocbrm-dm-oracle-shm-dashboard.json

Allows you to view shared memory, front-end process, and back-end process metrics for the Oracle DM.

ocbrm-eai-js-dashboard.json

Allows you to view JVM and opcode-related metrics for the EAI JS.

ocbrm-overview-dashboard.json

Allows you to view metrics for BRM services at the pod, container, network, and input-output level.

ocbrm-rel-dashboard.json

Allows you to view JVM-related metrics for Rated Event (RE) Loader.

ocbrm-wsm-weblogic-server-dashboard.json

Allows you to view metrics for Web Services Manager.

Note:

For the sample dashboard to work properly, the datasource name for the WebLogic Domain must be Prometheus.

You can also configure Grafana to send alerts to your dashboard, an email address, or Slack when a problem occurs. For example, you could configure Grafana to send an alert when an opcode exceeds a specified number of errors. For information about setting up alerts, see "Grafana Alerts" in the Grafana documentation.
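
For example, the opcode-error alert described above could be driven by a PromQL expression over the brm_opcode_errors_total metric; the five-minute window and threshold are illustrative:

rate(brm_opcode_errors_total[5m]) > 1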

BRM Opcode Metric Group

Use the BRM opcode metric group to retrieve runtime information for BRM opcodes. Table 10-3 lists the metrics in this group.

Table 10-3 BRM Opcode Metrics

Each entry lists the metric name, its type, a description, and the pods that expose it.

brm_opcode_calls_total (Counter)

The total number of calls for a BRM opcode.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_opcode_errors_total (Counter)

The total number of errors when executing a BRM opcode.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_opcode_exec_time_total (Counter)

The total time taken to run a BRM opcode.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_opcode_user_cpu_time_total (Counter)

The total CPU time taken to run the BRM opcode in user space.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_opcode_system_cpu_time_total (Counter)

The total CPU time taken to run the BRM opcode in OS kernel space.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_opcode_records_total (Counter)

The total number of records returned by the BRM opcode execution.

Pods: cm, dm-oracle, dm-ifw-sync, dm-aq

brm_dmo_shared_memory_used_current (Gauge)

The total number of shared memory blocks currently used by dm_oracle.

Pods: cm

brm_dmo_shared_memory_used_max (Counter)

The maximum number of shared memory blocks used by dm_oracle.

Pods: cm

brm_dmo_shared_memory_free_current (Gauge)

The total number of free shared memory blocks available to dm_oracle.

Pods: cm

brm_dmo_shared_memory_hwm (Gauge)

The shared memory high-water mark for dm_oracle.

Pods: cm

brm_dmo_shared_memory_bigsize_used_max (Counter)

The maximum bigsize shared memory used by dm_oracle, in bytes.

Pods: cm

brm_dmo_shared_memory_bigsize_used_current (Gauge)

The total bigsize shared memory used by dm_oracle, in bytes.

Pods: cm

brm_dmo_shared_memory_bigsize_hwm (Gauge)

The bigsize shared memory high-water mark for dm_oracle, in bytes.

Pods: cm

brm_dmo_front_end_connections_total (Gauge)

The total number of connections for a dm_oracle front-end process.

Pods: cm

brm_dmo_front_end_max_connections_total (Counter)

The maximum number of connections for a dm_oracle front-end process.

Pods: cm

brm_dmo_front_end_trans_done_total (Counter)

The total number of transactions handled by the dm_oracle front-end process.

Pods: cm

brm_dmo_front_end_ops_done_total (Counter)

The total number of operations handled by the dm_oracle front-end process.

Pods: cm

brm_dmo_back_end_ops_done_total (Counter)

The total number of operations done by the dm_oracle back-end process.

Pods: cm

brm_dmo_back_end_ops_error_total (Counter)

The total number of errors encountered by the dm_oracle back-end process.

Pods: cm

brm_dmo_back_end_trans_done_total (Counter)

The total number of transactions handled by the dm_oracle back-end process.

Pods: cm

brm_dmo_back_end_trans_error_total (Counter)

The total number of transaction errors in the dm_oracle back-end process.

Pods: cm

com_portal_js_JSMetrics_CurrentConnectionCount (Counter)

The current number of concurrent connections to the Java Server from the CM.

Pods: cm (eai-java-server)

com_portal_js_JSMetrics_MaxConnectionCount (Counter)

The maximum number of concurrent connections to the Java Server from the CM.

Pods: cm (eai-java-server)

com_portal_js_JSMetrics_SuccessfulOpcodeCount (Counter)

The number of opcodes called from the CM whose execution succeeded in the Java Server.

Pods: cm (eai-java-server)

com_portal_js_JSMetrics_FailedOpcodeCount (Counter)

The number of opcodes called from the CM whose execution failed in the Java Server.

Pods: cm (eai-java-server)

com_portal_js_JSMetrics_TotalOpcodeCount (Counter)

The total number of opcodes called from the CM.

Pods: cm (eai-java-server)

com_portal_js_JSMetrics_TotalOpcodeExecutionTime (Counter)

The total time taken, in milliseconds, across all opcodes.

Pods: cm (eai-java-server)
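
When you build dashboard panels or alerts from these counters, rate-based PromQL expressions are usually more useful than raw totals. For example (the five-minute windows are illustrative):

# Opcode calls per second
rate(brm_opcode_calls_total[5m])

# Average opcode execution time per call
rate(brm_opcode_exec_time_total[5m]) / rate(brm_opcode_calls_total[5m])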