9 Monitoring Network Bridge Processes
You can monitor system process data, such as memory and thread usage, for your Oracle Communications Network Bridge components in a cloud native environment.
About Monitoring Network Bridge Cloud Native
You can set up monitoring of your Network Bridge cloud native components using a Kubernetes Service Monitor, Prometheus Operator, and Grafana Dashboards. Each Network Bridge component exposes JVM and application metric data through a single endpoint in the OpenMetrics/Prometheus exposition format, and the Service Monitor tells Prometheus Operator which endpoints to scrape. Prometheus then scrapes the metrics and stores them for analysis and monitoring through the Grafana Dashboards.
Network Bridge cloud native exposes metric data for the following components:
- Mediation component
- Egress component
- HTTP to Diameter Adapter component
- Diameter to HTTP Adapter component
- Diameter Proxy component
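In the exposition format described above, Prometheus reads each metrics endpoint as plain text in which every sample line holds a metric name, an optional set of labels, and a value; lines starting with # carry HELP and TYPE metadata. The following minimal Python sketch parses that text to show its structure. It is illustrative only and not part of Network Bridge: the sample payload is invented rather than real Network Bridge output, the function name is hypothetical, and the parser assumes that label values contain no escaped quotes or braces. In practice, Prometheus does this parsing for you.

```python
import re
from typing import Dict, List, Tuple

# One parsed sample: metric name, label set, and value.
Sample = Tuple[str, Dict[str, str], float]

# Matches sample lines such as:  name{label="value",other="x"} 1.0
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# Matches one label pair; assumes no escaped quotes inside values.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse a simplified Prometheus text exposition payload.

    Skips blank lines and # HELP / # TYPE comment lines, and ignores
    any optional timestamp that follows the value.
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Invented example payload; real output contains many more samples.
SAMPLE = """\
# HELP jvm_threads_live_threads The current number of live threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads 42.0
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.048576E7
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```

Running the script prints each sample's name, labels, and value, which is the same information Prometheus stores for every scrape.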
Setting Up Monitoring of Network Bridge Components
Setting up monitoring of Network Bridge cloud native components involves the following high-level tasks:
- Deploying the following prerequisite software on your Network Bridge cloud native environment:
  - Prometheus Operator. See "Installing Prometheus Operator".
  - Grafana. See "Installing Grafana".
  For the list of compatible versions, see "Network Bridge Cloud Native Software Compatibility" in CCS Compatibility Matrix.
- Enabling the Network Bridge Service Monitor and Grafana Dashboards. See "Enabling the Network Bridge Service Monitor".
Enabling the Network Bridge Service Monitor
By default, the Network Bridge Service Monitor and Grafana Dashboards are disabled. You must enable both if you want to monitor your system's processes using Prometheus and Grafana.
To enable the Service Monitor and Grafana Dashboards:
- Create an override-values.yaml file for the oc-ccs-helm-chart-version Helm chart.
- Enable the Network Bridge Service Monitor and Grafana Dashboards by setting these keys:
  - serviceMonitorEnabled: Set this to true.
  - grafanaDashboardsEnabled: Set this to true.
  - grafanaNamespace: Set this to the namespace in which to deploy the Grafana Dashboards.
  - serviceMonitor.namespace: Set this to the namespace in which to deploy the Service Monitor custom resource.
- Optionally, modify the default settings for Grafana:
  - grafanaDashboards.labels.grafana_dashboard: Specify the value of the grafana_dashboard label to add to the Grafana dashboard resources. Grafana uses this label to discover the dashboards.
  - grafanaDashboards.labels.release: Set this to the name of the Helm release in which Grafana is deployed.
  - grafanaDashboards.annotations.k8s-sidecar-target-directory: Set this to the directory in which the Grafana sidecar places the dashboard files.
- Save and close your override-values.yaml file.
- Run the helm upgrade command to update your Network Bridge Helm release:
  helm upgrade NbReleaseName oc-ccs-helm-chart-version --namespace NbNameSpace --values override-values.yaml
  where NbReleaseName is the release name for Network Bridge, and NbNameSpace is the namespace in which to create Kubernetes objects for the Network Bridge Helm chart.
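To illustrate the required keys from the steps above, the following Python sketch prints a minimal override-values.yaml. It is a sketch under stated assumptions, not an Oracle-provided tool: it assumes that dotted key names such as serviceMonitor.namespace denote nested YAML mappings and that the keys sit at the top level of the values file (confirm against the values.yaml file in your Helm chart), and the namespace value monitoring is a hypothetical placeholder. The optional Grafana settings are omitted.

```python
from typing import Any, Dict, List

# Hypothetical example values: replace "monitoring" with your own namespaces.
# Assumes dotted keys (serviceMonitor.namespace) map to nested YAML mappings
# at the top level of override-values.yaml.
OVERRIDES: Dict[str, Any] = {
    "serviceMonitorEnabled": True,
    "grafanaDashboardsEnabled": True,
    "grafanaNamespace": "monitoring",
    "serviceMonitor": {"namespace": "monitoring"},
}


def to_yaml(values: Dict[str, Any], indent: int = 0) -> str:
    """Render a nested dict of strings and booleans as block-style YAML.

    Strings are written unquoted, so keep values to simple names such as
    Kubernetes namespaces.
    """
    lines: List[str] = []
    pad = "  " * indent
    for key, value in values.items():
        if isinstance(value, dict):
            # Nested mapping: write the key, then its children indented.
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, bool):
            # YAML booleans are written in lowercase.
            lines.append(f"{pad}{key}: {str(value).lower()}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Redirect the output to override-values.yaml.
    print(to_yaml(OVERRIDES))
```

The resulting file sets serviceMonitorEnabled and grafanaDashboardsEnabled to true, sets grafanaNamespace, and nests namespace under serviceMonitor. You then pass the file to helm upgrade with --values override-values.yaml.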
Network Bridge Cloud Native Metrics
Network Bridge cloud native collects metrics in the following groups to produce data for monitoring your components:
Mediation and REST Proxy Metrics
The Mediation and REST Proxy group contains standard metrics about the central processing unit (CPU) and memory utilization of JVMs, which are members of the Network Bridge grid. It also contains metrics for tracking the processing performance of requests to and responses from the Mediation, Egress, and REST Proxy components. Table 9-1 lists the metrics in this group.
Table 9-1 Mediation and REST Proxy Metrics
Metric Name | Type | Description |
---|---|---|
jvm_buffer_count_buffers | Gauge | Contains the estimated number of buffers in the JVM buffer pool. |
jvm_buffer_memory_used_bytes | Gauge | Contains an estimate of the memory, in bytes, that the JVM is using for the buffer pool. |
jvm_buffer_total_capacity_bytes | Gauge | Contains the estimated total capacity, in bytes, of the buffers in this pool. |
jvm_gc_live_data_size_bytes | Gauge | Contains the size, in bytes, of the long-lived heap memory pool after reclamation. |
jvm_gc_max_data_size_bytes | Gauge | Contains the maximum size, in bytes, of the long-lived heap memory pool. |
jvm_gc_memory_allocated_bytes_total | Counter | Tracks the total increase, in bytes, in the size of the young generation heap memory pool from after one GC to before the next one. |
jvm_gc_memory_promoted_bytes_total | Counter | Tracks the total size, in bytes, of the incremental increases to the old generation memory pool from before GC to after GC. |
jvm_gc_pause_seconds | Summary | Contains information about the time spent in GC pauses. |
jvm_gc_pause_seconds_max | Gauge | Contains the maximum amount of time spent in a GC pause. |
jvm_memory_committed_bytes | Gauge | Contains the amount of memory, in bytes, that is committed for the JVM to use. |
jvm_memory_max_bytes | Gauge | Contains the maximum amount of memory, in bytes, that can be used for memory management. |
jvm_memory_used_bytes | Gauge | Contains the amount of memory used, in bytes. |
jvm_threads_daemon_threads | Gauge | Contains the current number of live daemon threads. |
jvm_threads_live_threads | Gauge | Contains the current number of live threads, including both daemon and non-daemon threads. |
jvm_threads_peak_threads | Gauge | Contains the peak live thread count since the JVM started or the peak was reset. |
jvm_threads_states_threads | Gauge | Contains the current number of threads in each thread state (for example, new, runnable, or waiting), reported through the state label. |
log4j2_events_total | Counter | Tracks the total number of log events at each log level (for example, fatal, error, or warn), reported through the level label. |
nb_service_mutation_execution_seconds | Histogram | Contains information about the time taken to perform a mutation. |
nb_service_mutation_execution_seconds_max | Gauge | Contains the maximum time taken to perform a mutation. |
nb_service_processor_circuit_breaker_state | Gauge | Contains the current state of the circuit breaker: CLOSED (0.0), HALF_OPEN (1.0), or OPEN (2.0). |
ccs_service_processor_failed_total | Counter | Tracks the total number of failed events. |
nb_service_processor_retries_total | Counter | Tracks the total number of retries. |
ccs_service_processor_seconds | Histogram | Contains information about the time taken to process a request. |
ccs_service_processor_seconds_max | Gauge | Contains the maximum time taken to process a request. |
nb_service_request_failed_total | Counter | Tracks the total number of failed mediation requests. |
nb_service_request_seconds | Histogram | Contains information about the total time taken to process a request. |
nb_service_request_seconds_max | Gauge | Contains the maximum total time taken to process a request. |
nb_service_rule_matches_total | Counter | Tracks the number of times a rule has been matched. |
process_cpu_usage | Gauge | Contains the recent CPU usage for the JVM process. |
process_files_max_files | Gauge | Contains the maximum number of file descriptors. |
process_files_open_files | Gauge | Contains the number of open file descriptors. |
process_start_time_seconds | Gauge | Contains the start time of the process, in seconds since the UNIX epoch. |
process_uptime_seconds | Gauge | Contains the uptime of the JVM, in seconds. |
system_cpu_count | Gauge | Contains the number of processors available to the JVM. |
system_cpu_usage | Gauge | Contains the recent CPU usage for the entire system. |
system_load_average_1m | Gauge | Contains the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on them, averaged over one minute. |
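Several metrics in Table 9-1 are easier to interpret after a small calculation. The following Python sketch is illustrative and not part of Network Bridge; its function names are hypothetical. It decodes the nb_service_processor_circuit_breaker_state gauge using the values listed in the table, computes memory utilization from jvm_memory_used_bytes and jvm_memory_max_bytes, and computes a mean duration from the _sum and _count series that Prometheus histograms and summaries expose (for example, nb_service_request_seconds_sum and nb_service_request_seconds_count). Because the jvm_memory_* values are typically reported per memory pool, pass either one pool's values or values summed across the pools you are interested in.

```python
from typing import Dict, Optional

# Gauge values documented for nb_service_processor_circuit_breaker_state.
CIRCUIT_BREAKER_STATES: Dict[float, str] = {
    0.0: "CLOSED",
    1.0: "HALF_OPEN",
    2.0: "OPEN",
}


def circuit_breaker_state(value: float) -> str:
    """Translate the circuit breaker gauge value into its state name."""
    return CIRCUIT_BREAKER_STATES.get(value, "UNKNOWN")


def memory_utilization(used_bytes: float, max_bytes: float) -> Optional[float]:
    """Return used / max memory as a fraction between 0 and 1.

    Returns None when the maximum is not positive, for example -1 when the
    JVM reports no defined maximum for a memory pool.
    """
    if max_bytes <= 0:
        return None
    return used_bytes / max_bytes


def mean_seconds(sum_seconds: float, count: float) -> Optional[float]:
    """Average duration from a histogram's _sum and _count values.

    Both series are cumulative since the process started; pass the
    differences between two scrapes to get the mean over that interval.
    """
    if count == 0:
        return None
    return sum_seconds / count


if __name__ == "__main__":
    print(circuit_breaker_state(2.0))
    print(memory_utilization(512.0, 1024.0))
    print(mean_seconds(3.0, 4.0))
```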
Diameter Adapter Metrics
The Diameter Adapter Metrics group contains standard metrics about the CPU and memory utilization of JVMs and the processing performance of requests to and responses from the HTTP-to-Diameter Adapter and Diameter-to-HTTP Adapter components. Table 9-2 lists the metrics in this group.
Table 9-2 Diameter Adapter Metrics
Metric Name | Type | Description |
---|---|---|
jvm_buffer_count_buffers | Gauge | Contains the estimated number of buffers in the JVM buffer pool. |
jvm_buffer_memory_used_bytes | Gauge | Contains an estimate of the memory, in bytes, that the JVM is using for the buffer pool. |
jvm_buffer_total_capacity_bytes | Gauge | Contains the estimated total capacity, in bytes, of the buffers in this pool. |
jvm_gc_live_data_size_bytes | Gauge | Contains the size, in bytes, of the long-lived heap memory pool after reclamation. |
jvm_gc_max_data_size_bytes | Gauge | Contains the maximum size, in bytes, of the long-lived heap memory pool. |
jvm_gc_memory_allocated_bytes_total | Counter | Tracks the total increase, in bytes, in the size of the young generation heap memory pool from after one GC to before the next one. |
jvm_gc_memory_promoted_bytes_total | Counter | Tracks the total size, in bytes, of the incremental increases to the old generation memory pool from before GC to after GC. |
jvm_gc_pause_seconds | Summary | Contains information about the time spent in GC pauses. |
jvm_gc_pause_seconds_max | Gauge | Contains the maximum amount of time spent in a GC pause. |
jvm_memory_committed_bytes | Gauge | Contains the amount of memory, in bytes, that is committed for the JVM to use. |
jvm_memory_max_bytes | Gauge | Contains the maximum amount of memory, in bytes, that can be used for memory management. |
jvm_memory_used_bytes | Gauge | Contains the amount of memory used, in bytes. |
jvm_threads_daemon_threads | Gauge | Contains the current number of live daemon threads. |
jvm_threads_live_threads | Gauge | Contains the current number of live threads, including both daemon and non-daemon threads. |
jvm_threads_peak_threads | Gauge | Contains the peak live thread count since the JVM started or the peak was reset. |
jvm_threads_states_threads | Gauge | Contains the current number of threads in each thread state (for example, new, runnable, or waiting), reported through the state label. |
log4j2_events_total | Counter | Tracks the total number of log events at each log level (for example, fatal, error, or warn), reported through the level label. |
nb_service_processor_failed_total | Counter | Tracks the total number of failed events. |
nb_service_processor_retries_total | Counter | Tracks the total number of processor retries. |
nb_service_processor_seconds | Histogram | Contains information about the time taken to process a request. |
nb_service_processor_seconds_max | Gauge | Contains the maximum time taken to process a request. |
nb_service_request_failed_total | Counter | Tracks the total number of failed events. |
nb_service_request_seconds | Histogram | Tracks the delta between the Usage UsageDate and the time the service received the event. |
nb_service_request_seconds_max | Gauge | Contains the maximum delta between the Usage UsageDate and the time the service received the event. |
process_cpu_usage | Gauge | Contains the recent CPU usage for the JVM process. |
process_files_max_files | Gauge | Contains the maximum number of file descriptors. |
process_files_open_files | Gauge | Contains the number of open file descriptors. |
process_start_time_seconds | Gauge | Contains the start time of the process, in seconds since the UNIX epoch. |
process_uptime_seconds | Gauge | Contains the uptime of the JVM, in seconds. |
system_cpu_count | Gauge | Contains the number of processors available to the JVM. |
system_cpu_usage | Gauge | Contains the recent CPU usage for the entire system. |
system_load_average_1m | Gauge | Contains the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on them, averaged over one minute. |
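Counters such as nb_service_request_failed_total and nb_service_processor_retries_total only increase, except that they start again from zero when a process restarts. The following Python sketch, with hypothetical function names, turns successive scraped values of a counter into a total increase and an average per-second rate, treating any drop in value as a reset. It follows the idea behind the Prometheus increase() and rate() functions but is simplified: it assumes evenly spaced scrapes and, unlike Prometheus, does not extrapolate to the edges of the time range.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a counter across consecutive scraped values.

    A drop in value is treated as a counter reset (for example, after a
    pod restart), so the post-reset value counts entirely as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter restarted from zero
    return total


def per_second_rate(samples: Sequence[float], interval_seconds: float) -> float:
    """Average per-second rate over scrapes taken interval_seconds apart."""
    if len(samples) < 2 or interval_seconds <= 0:
        return 0.0
    elapsed = interval_seconds * (len(samples) - 1)
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    scraped = [10.0, 12.0, 15.0, 2.0, 4.0]  # hypothetical scraped values
    print(counter_increase(scraped), per_second_rate(scraped, 30.0))
```

For example, failed-request counts of 10, 12, 15, 2, and 4 scraped 30 seconds apart give an increase of 9 (the drop to 2 is counted as a reset) and an average rate of 0.075 failures per second.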
Diameter Proxy Metrics
The Diameter Proxy group contains standard metrics about the CPU and memory utilization of JVMs and the processing performance of requests to and responses from the Diameter Proxy component. Table 9-3 lists the metrics in this group.
Table 9-3 Diameter Proxy Metrics
Metric Name | Type | Description |
---|---|---|
jvm_buffer_count_buffers | Gauge | Contains the estimated number of buffers in the JVM buffer pool. |
jvm_buffer_memory_used_bytes | Gauge | Contains an estimate of the memory, in bytes, that the JVM is using for the buffer pool. |
jvm_buffer_total_capacity_bytes | Gauge | Contains the estimated total capacity, in bytes, of the buffers in this pool. |
jvm_gc_live_data_size_bytes | Gauge | Contains the size, in bytes, of the long-lived heap memory pool after reclamation. |
jvm_gc_max_data_size_bytes | Gauge | Contains the maximum size, in bytes, of the long-lived heap memory pool. |
jvm_gc_memory_allocated_bytes_total | Counter | Tracks the total increase, in bytes, in the size of the young generation heap memory pool from after one GC to before the next one. |
jvm_gc_memory_promoted_bytes_total | Counter | Tracks the total size, in bytes, of the incremental increases to the old generation memory pool from before GC to after GC. |
jvm_gc_pause_seconds | Summary | Contains information about the time spent in GC pauses. |
jvm_gc_pause_seconds_max | Gauge | Contains the maximum amount of time spent in a GC pause. |
jvm_memory_committed_bytes | Gauge | Contains the amount of memory, in bytes, that is committed for the JVM to use. |
jvm_memory_max_bytes | Gauge | Contains the maximum amount of memory, in bytes, that can be used for memory management. |
jvm_memory_used_bytes | Gauge | Contains the amount of memory used, in bytes. |
jvm_threads_daemon_threads | Gauge | Contains the current number of live daemon threads. |
jvm_threads_live_threads | Gauge | Contains the current number of live threads, including both daemon and non-daemon threads. |
jvm_threads_peak_threads | Gauge | Contains the peak live thread count since the JVM started or the peak was reset. |
jvm_threads_states_threads | Gauge | Contains the current number of threads in each thread state (for example, new, runnable, or waiting), reported through the state label. |
log4j2_events_total | Counter | Tracks the total number of log events at each log level (for example, fatal, error, or warn), reported through the level label. |
nb_service_message_seconds | Histogram | Contains timer information for messages sent to and received from the Diameter Proxy. |
nb_service_message_seconds_max | Gauge | Contains the maximum time recorded for a message sent to or received from the Diameter Proxy. |
nb_service_open_connections | Gauge | Tracks the number of open Diameter connections. |
nb_service_peer_event_total | Counter | Tracks the total number of Diameter peer events. |
nb_service_processor_failed_total | Counter | Tracks the total number of failed events. |
nb_service_processor_retries_total | Counter | Tracks the total number of processor retries. |
nb_service_processor_seconds | Histogram | Contains information about the time taken to process a request. |
nb_service_processor_seconds_max | Gauge | Contains the maximum time taken to process a request. |
nb_service_request_failed_total | Counter | Tracks the total number of failed Diameter Proxy requests. |
nb_service_request_seconds | Histogram | Contains information about the total time taken to process a request. |
nb_service_request_seconds_max | Gauge | Contains the maximum total time taken to process a request. |
process_cpu_usage | Gauge | Contains the recent CPU usage for the JVM process. |
process_files_max_files | Gauge | Contains the maximum number of file descriptors. |
process_files_open_files | Gauge | Contains the number of open file descriptors. |
process_start_time_seconds | Gauge | Contains the start time of the process, in seconds since the UNIX epoch. |
process_uptime_seconds | Gauge | Contains the uptime of the JVM, in seconds. |
system_cpu_count | Gauge | Contains the number of processors available to the JVM. |
system_cpu_usage | Gauge | Contains the recent CPU usage for the entire system. |
system_load_average_1m | Gauge | Contains the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on them, averaged over one minute. |
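Histogram metrics such as nb_service_message_seconds can be used to estimate latency percentiles. The following Python sketch estimates a quantile from cumulative bucket counts by linear interpolation within the bucket that contains the target rank, a simplified version of the approach used by the Prometheus histogram_quantile() function. It assumes that the metric's _bucket series, with their le upper-bound labels, are exposed; depending on how the application configures its histograms, only the _count and _sum series might be available. The function name and bucket values are hypothetical.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, as in the
    _bucket series of a Prometheus histogram. The lowest bucket is assumed
    to start at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf; return the highest finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            # Linear interpolation between the bucket's lower and upper bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    example = [(0.005, 10.0), (0.01, 40.0), (0.05, 90.0), (math.inf, 100.0)]
    print(histogram_quantile(0.5, example))
```

With the example buckets (0.005, 10), (0.01, 40), (0.05, 90), and (+Inf, 100), the estimated median is 0.018 seconds. The 0.95 quantile falls in the +Inf bucket, so the sketch returns the highest finite bound, 0.05 seconds.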