9 Monitoring Network Bridge Processes

You can monitor system processes, such as memory and thread usage, in your Oracle Communications Network Bridge components in a cloud native environment.

About Monitoring Network Bridge Cloud Native

You can set up monitoring of your Network Bridge cloud native components using a Kubernetes Service Monitor, Prometheus Operator, and Grafana Dashboards. The Service Monitor points Prometheus at the single endpoint through which each component exposes JVM and application metric data in OpenMetrics/Prometheus exposition format. Prometheus then scrapes the metrics and stores them for analysis and monitoring through the Grafana Dashboards.

Network Bridge cloud native exposes metric data for the following components:

  • Mediation component

  • Egress component

  • HTTP to Diameter Adapter component

  • Diameter to HTTP Adapter component

  • Diameter Proxy component
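
For orientation, a Service Monitor is a Prometheus Operator custom resource that selects Kubernetes Services by label and tells Prometheus which port and path to scrape. The Network Bridge Helm chart creates the actual resource, so the following is only a generic sketch of what such a resource can look like. The resource name, namespaces, label, port name, path, and interval are hypothetical placeholders and are not taken from the chart.

```yaml
# Generic ServiceMonitor sketch (Prometheus Operator CRD).
# Every name and value below is an illustrative assumption.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nb-example-monitor        # hypothetical resource name
  namespace: monitoring           # hypothetical namespace for the Service Monitor
spec:
  namespaceSelector:
    matchNames:
      - nb-namespace              # hypothetical namespace running Network Bridge
  selector:
    matchLabels:
      app: nb-mediation           # hypothetical label on the component's Service
  endpoints:
    - port: metrics               # hypothetical named port on the Service
      path: /metrics              # assumed metrics path
      interval: 30s               # how often Prometheus scrapes the endpoint
```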

Setting Up Monitoring of Network Bridge Components

Setting up monitoring of Network Bridge cloud native components involves the following high-level tasks:

  1. Deploying the prerequisite software, Prometheus Operator and Grafana, on your Network Bridge cloud native environment.

    For the list of compatible versions, see "Network Bridge Cloud Native Software Compatibility" in CCS Compatibility Matrix.

  2. Enabling the Network Bridge Service Monitor and Grafana Dashboards.

Enabling the Network Bridge Service Monitor

By default, the Network Bridge Service Monitor and Grafana Dashboards are disabled. You must enable both if you want to monitor your system's processes using Prometheus and Grafana.

To enable the Service Monitor and Grafana Dashboards:

  1. Create an override-values.yaml file for the oc-ccs-helm-chart-version Helm chart.

  2. Enable the Network Bridge Service Monitor and Grafana Dashboards by setting these keys:

    • serviceMonitorEnabled: Set this to true.

    • grafanaDashboardsEnabled: Set this to true.

    • grafanaNamespace: Set this to the namespace in which to deploy the Grafana Dashboards.

    • serviceMonitor.namespace: Set this to the namespace in which to deploy the Service Monitor CRD.

  3. Optionally, modify the default settings for Grafana:

    • granfanaDashboards.labels.grafana_dashboard: Specify the labels to add to the Grafana CRDs. This helps Grafana discover the dashboards.

    • granfanaDashboards.labels.release: Set this to the release name in which Grafana is deployed.

    • granfanaDashboards.annotations.k8s-sidecar-target-directory: Set this to the directory for the Grafana sidecar.

  4. Save and close your override-values.yaml file.

  5. Run the helm upgrade command to update your Network Bridge Helm release:

    helm upgrade NbReleaseName oc-ccs-helm-chart-version --namespace NbNameSpace --values override-values.yaml

    where NbReleaseName is the release name for Network Bridge, and NbNameSpace is the namespace in which to create Kubernetes objects for the Network Bridge Helm chart.
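
The required keys from step 2 might be written in override-values.yaml along the following lines. This is a minimal sketch: the namespace value is a hypothetical placeholder, the dotted key serviceMonitor.namespace is expanded into a nested map following the usual Helm values convention, and placing the keys at the top level of the file is an assumption. Check the chart's default values.yaml file for the exact position of each key.

```yaml
# override-values.yaml (sketch; key placement is an assumption)
serviceMonitorEnabled: true        # enable the Network Bridge Service Monitor
grafanaDashboardsEnabled: true     # enable the Grafana Dashboards
grafanaNamespace: monitoring       # hypothetical namespace for the dashboards
serviceMonitor:
  namespace: monitoring            # hypothetical namespace for the Service Monitor
```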

Network Bridge Cloud Native Metrics

Network Bridge cloud native collects metrics in the following groups to produce data for monitoring your components:

Mediation and REST Proxy Metrics

The Mediation and REST Proxy group contains standard metrics about the central processing unit (CPU) and memory utilization of JVMs, which are members of the Network Bridge grid. It also contains metrics for tracking the processing performance of requests to and responses from the Mediation, Egress, and REST Proxy components. Table 9-1 lists the metrics in this group.

Table 9-1 Mediation and REST Proxy Metrics

Metric Name Type Description

jvm_buffer_count_buffers

Gauge

Contains the estimated number of buffers in the JVM memory pool.

jvm_buffer_memory_used_bytes

Gauge

Contains an estimated amount of memory the JVM is using for the buffer pool.

jvm_buffer_total_capacity_bytes

Gauge

Contains the estimated total capacity of the buffers in this pool.

jvm_gc_live_data_size_bytes

Gauge

Contains the size, in bytes, of the long-lived heap memory pool after reclamation.

jvm_gc_max_data_size_bytes

Gauge

Contains the maximum size of the long-lived heap memory pool.

jvm_gc_memory_allocated_bytes_total

Counter

Tracks the total size of incremental increases to the young heap memory after one GC to before the next one.

jvm_gc_memory_promoted_bytes_total

Counter

Tracks the total size of incremental increases to the old generation memory pool from before GC to after GC.

jvm_gc_pause_seconds

Summary

Contains information about the time spent in GC pause.

jvm_gc_pause_seconds_max

Gauge

Contains the maximum amount of time spent in GC pause.

jvm_memory_committed_bytes

Gauge

Contains the amount of memory, in bytes, that is committed for the JVM to use.

jvm_memory_max_bytes

Gauge

Contains the maximum amount of memory, in bytes, that can be used for memory management.

jvm_memory_used_bytes

Gauge

Contains the amount of memory used, in bytes.

jvm_threads_daemon_threads

Gauge

Contains the current number of live daemon threads.

jvm_threads_live_threads

Gauge

Contains the current number of live threads, including both daemon and non-daemon threads.

jvm_threads_peak_threads

Gauge

Contains the peak live thread count since the JVM started or the peak was reset.

jvm_threads_states_threads

Gauge

Contains the current number of threads in the NEW state.

log4j2_events_total

Counter

Tracks the total number of fatal-level log events.

nb_service_mutation_execution_seconds

Histogram

Contains the time taken to perform a mutation.

nb_service_mutation_execution_seconds_max

Gauge

Contains the maximum time taken to perform a mutation.

nb_service_processor_circuit_breaker_state

Gauge

Contains the current state of the circuit breaker: CLOSED (0.0), HALF_OPEN (1.0), or OPEN (2.0).

ccs_service_processor_failed_total

Counter

Tracks the total number of failed events.

nb_service_processor_retries_total

Counter

Tracks the total number of retries.

ccs_service_processor_seconds

Histogram

Contains information about the time taken to process the request.

ccs_service_processor_seconds_max

Gauge

Contains the maximum time taken to process the request.

nb_service_request_failed_total

Counter

Tracks the total number of failed mediation requests.

nb_service_request_seconds

Histogram

Provides information about the total amount of time to process the request.

nb_service_request_seconds_max

Gauge

Contains the maximum total amount of time to process the request.

nb_service_rule_matches_total

Counter

Tracks the number of times a rule has been matched.

process_cpu_usage

Gauge

Contains the recent CPU usage for the JVM process.

process_files_max_files

Gauge

Contains the maximum number of file descriptors.

process_files_open_files

Gauge

Contains the number of open file descriptors.

process_start_time_seconds

Gauge

Contains the start time of the process, in seconds since the UNIX epoch.

process_uptime_seconds

Gauge

Contains the JVM's total amount of uptime.

system_cpu_count

Gauge

Contains the number of processors available to the JVM.

system_cpu_usage

Gauge

Contains the recent CPU usage for the entire system.

system_load_average_1m

Gauge

Contains the total number of runnable entities queued to available processors, and the number of runnable entities running on the available processors averaged over a period of time.
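
As an illustration of how the metrics in Table 9-1 could be used, the following sketch defines three alerts in a Prometheus Operator PrometheusRule resource. It assumes that the Histogram metrics expose the conventional _bucket series with an le label, and it uses the OPEN value (2.0) of the circuit breaker gauge described in the table. The resource name, namespace, alert names, thresholds, and durations are hypothetical, and Prometheus loads the rules only if the resource's labels match the rule selector configured for your Prometheus instance.

```yaml
# PrometheusRule sketch; names and thresholds are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nb-mediation-example-rules   # hypothetical
  namespace: monitoring              # hypothetical
spec:
  groups:
    - name: nb-mediation.example
      rules:
        # Fires while the circuit breaker gauge reports OPEN (2.0).
        - alert: NbCircuitBreakerOpen
          expr: nb_service_processor_circuit_breaker_state == 2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Circuit breaker has been OPEN for 5 minutes."
        # Fires when failed mediation requests are being counted.
        - alert: NbRequestFailures
          expr: rate(nb_service_request_failed_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Mediation requests are failing."
        # Fires when the 95th percentile request time exceeds 1 second.
        - alert: NbRequestLatencyHigh
          expr: histogram_quantile(0.95, sum by (le) (rate(nb_service_request_seconds_bucket[5m]))) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "95th percentile request time is above 1 second."
```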

Diameter Adapter Metrics

The Diameter Adapter Metrics group contains standard metrics about the CPU and memory utilization of JVMs and the processing performance of requests to and responses from the HTTP-to-Diameter Adapter and Diameter-to-HTTP Adapter components. Table 9-2 lists the metrics in this group.

Table 9-2 Diameter Adapter Metrics

Metric Type Description

jvm_buffer_count_buffers

Gauge

Contains the estimated number of buffers in the JVM memory pool.

jvm_buffer_memory_used_bytes

Gauge

Contains an estimated amount of memory the JVM is using for the buffer pool.

jvm_buffer_total_capacity_bytes

Gauge

Contains the estimated total capacity of the buffers in this pool.

jvm_gc_live_data_size_bytes

Gauge

Contains the size, in bytes, of the long-lived heap memory pool after reclamation.

jvm_gc_max_data_size_bytes

Gauge

Contains the maximum size of the long-lived heap memory pool.

jvm_gc_memory_allocated_bytes_total

Counter

Tracks the total size of incremental increases to the young heap memory after one GC to before the next one.

jvm_gc_memory_promoted_bytes_total

Counter

Tracks the total size of incremental increases to the old generation memory pool from before GC to after GC.

jvm_gc_pause_seconds

Summary

Contains information about the time spent in GC pause.

jvm_gc_pause_seconds_max

Gauge

Contains the maximum amount of time spent in GC pause.

jvm_memory_committed_bytes

Gauge

Contains the amount of memory, in bytes, that is committed for the JVM to use.

jvm_memory_max_bytes

Gauge

Contains the maximum amount of memory, in bytes, that can be used for memory management.

jvm_memory_used_bytes

Gauge

Contains the amount of memory used, in bytes.

jvm_threads_daemon_threads

Gauge

Contains the current number of live daemon threads.

jvm_threads_live_threads

Gauge

Contains the current number of live threads, including both daemon and non-daemon threads.

jvm_threads_peak_threads

Gauge

Contains the peak live thread count since the JVM started or the peak was reset.

jvm_threads_states_threads

Gauge

Contains the current number of threads in the NEW state.

log4j2_events_total

Counter

Tracks the total number of fatal-level log events.

nb_service_processor_failed_total

Counter

Tracks the total number of failed events.

nb_service_processor_retries_total

Counter

Tracks the total number of processor retries.

nb_service_processor_seconds

Histogram

Provides information about the time taken to process the request.

nb_service_processor_seconds_max

Gauge

Contains the maximum time taken to process the request.

nb_service_request_failed_total

Counter

Tracks the total number of failed events.

nb_service_request_seconds

Histogram

Tracks the delta between the Usage UsageDate and the time the service received the event.

nb_service_request_seconds_max

Gauge

Tracks the maximum delta between the Usage UsageDate and the time the service received the event.

process_cpu_usage

Gauge

Contains the recent CPU usage for the JVM process.

process_files_max_files

Gauge

Contains the maximum number of file descriptors.

process_files_open_files

Gauge

Contains the number of open file descriptors.

process_start_time_seconds

Gauge

Contains the start time of the process, in seconds since the UNIX epoch.

process_uptime_seconds

Gauge

Contains the JVM's total amount of uptime.

system_cpu_count

Gauge

Contains the number of processors available to the JVM.

system_cpu_usage

Gauge

Contains the recent CPU usage for the entire system.

system_load_average_1m

Gauge

Contains the total number of runnable entities queued to available processors, and the number of runnable entities running on the available processors averaged over a period of time.
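
The JVM and process metrics in this group lend themselves to derived series. The following sketch defines three recording rules in a PrometheusRule resource: the share of file descriptors in use, the average GC pause, and the promotion rate to the old generation. It assumes that the Summary metric jvm_gc_pause_seconds exposes the conventional _sum and _count series. The resource name, namespace, and recorded series names are hypothetical.

```yaml
# Recording rule sketch; resource and series names are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nb-jvm-example-rules         # hypothetical
  namespace: monitoring              # hypothetical
spec:
  groups:
    - name: nb-jvm.example
      rules:
        # Fraction of the file descriptor limit currently in use (0 to 1).
        - record: nb:process_files_open:ratio
          expr: process_files_open_files / process_files_max_files
        # Average GC pause, in seconds, over the last 5 minutes.
        - record: nb:jvm_gc_pause_seconds:avg5m
          expr: rate(jvm_gc_pause_seconds_sum[5m]) / rate(jvm_gc_pause_seconds_count[5m])
        # Bytes promoted to the old generation per second.
        - record: nb:jvm_gc_memory_promoted_bytes:rate5m
          expr: rate(jvm_gc_memory_promoted_bytes_total[5m])
```

A Grafana panel or an alert can then query the recorded series directly instead of repeating the expression.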

Diameter Proxy Metrics

The Diameter Proxy group contains standard metrics about the CPU and memory utilization of JVMs and the processing performance of requests to and responses from the Diameter Proxy component. Table 9-3 lists the metrics in this group.

Table 9-3 Diameter Proxy Metrics

Metric Type Description

jvm_buffer_count_buffers

Gauge

Contains the estimated number of buffers in the JVM memory pool.

jvm_buffer_memory_used_bytes

Gauge

Contains an estimated amount of memory the JVM is using for the buffer pool.

jvm_buffer_total_capacity_bytes

Gauge

Contains the estimated total capacity of the buffers in this pool.

jvm_gc_live_data_size_bytes

Gauge

Contains the size, in bytes, of the long-lived heap memory pool after reclamation.

jvm_gc_max_data_size_bytes

Gauge

Contains the maximum size of the long-lived heap memory pool.

jvm_gc_memory_allocated_bytes_total

Counter

Tracks the total size of incremental increases to the young heap memory after one GC to before the next one.

jvm_gc_memory_promoted_bytes_total

Counter

Tracks the total size of incremental increases to the old generation memory pool from before GC to after GC.

jvm_gc_pause_seconds

Summary

Contains information about the time spent in GC pause.

jvm_gc_pause_seconds_max

Gauge

Contains the maximum amount of time spent in GC pause.

jvm_memory_committed_bytes

Gauge

Contains the amount of memory, in bytes, that is committed for the JVM to use.

jvm_memory_max_bytes

Gauge

Contains the maximum amount of memory, in bytes, that can be used for memory management.

jvm_memory_used_bytes

Gauge

Contains the amount of memory used, in bytes.

jvm_threads_daemon_threads

Gauge

Contains the current number of live daemon threads.

jvm_threads_live_threads

Gauge

Contains the current number of live threads, including both daemon and non-daemon threads.

jvm_threads_peak_threads

Gauge

Contains the peak live thread count since the JVM started or the peak was reset.

jvm_threads_states_threads

Gauge

Contains the current number of threads in the NEW state.

log4j2_events_total

Counter

Tracks the total number of fatal-level log events.

nb_service_message_seconds

Histogram

Contains timer information for messages to and from the Diameter Proxy.

nb_service_message_seconds_max

Gauge

Contains the maximum time recorded for messages to and from the Diameter Proxy.

nb_service_open_connections

Gauge

Tracks the number of open Diameter connections.

nb_service_peer_event_total

Counter

Tracks the total number of Diameter Peer events.

nb_service_processor_failed_total

Counter

Tracks the total number of failed events.

nb_service_processor_retries_total

Counter

Tracks the total number of processor retries.

nb_service_processor_seconds

Histogram

Contains the time taken to process the request.

nb_service_processor_seconds_max

Gauge

Contains the maximum time taken to process the request.

nb_service_request_failed_total

Counter

Tracks the total number of failed Diameter Proxy requests.

nb_service_request_seconds

Histogram

Contains information about the total amount of time to process a request.

nb_service_request_seconds_max

Gauge

Contains the maximum total amount of time to process the request.

process_cpu_usage

Gauge

Contains the recent CPU usage for the JVM process.

process_files_max_files

Gauge

Contains the maximum number of file descriptors.

process_files_open_files

Gauge

Contains the number of open file descriptors.

process_start_time_seconds

Gauge

Contains the start time of the process, in seconds since the UNIX epoch.

process_uptime_seconds

Gauge

Contains the JVM's total amount of uptime.

system_cpu_count

Gauge

Contains the number of processors available to the JVM.

system_cpu_usage

Gauge

Contains the recent CPU usage for the entire system.

system_load_average_1m

Gauge

Contains the total number of runnable entities queued to available processors, and the number of runnable entities running on the available processors averaged over a period of time.
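
For the Diameter Proxy metrics specific to Table 9-3, a PrometheusRule sketch might watch connection count and message timing. It assumes that the Histogram metric nb_service_message_seconds exposes the conventional _bucket series with an le label. The resource name, namespace, alert names, thresholds, and durations are hypothetical.

```yaml
# PrometheusRule sketch; names and thresholds are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nb-diameter-proxy-example-rules   # hypothetical
  namespace: monitoring                   # hypothetical
spec:
  groups:
    - name: nb-diameter-proxy.example
      rules:
        # Fires when the gauge is reported and no Diameter connections are open.
        # It does not fire if the metric is absent altogether.
        - alert: NbDiameterNoOpenConnections
          expr: sum(nb_service_open_connections) == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "No open Diameter connections for 5 minutes."
        # Fires when the 99th percentile message time exceeds 0.5 seconds.
        - alert: NbDiameterMessageLatencyHigh
          expr: histogram_quantile(0.99, sum by (le) (rate(nb_service_message_seconds_bucket[5m]))) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "99th percentile Diameter message time is above 0.5 seconds."
```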