4 Monitoring Offline Mediation Controller

Learn how to monitor your components in Oracle Communications Offline Mediation Controller.

Monitoring Node Performance in Administration Client

You can monitor node performance in Administration Client in the following ways:

For a current view of node performance, use the Node Performance View. This view displays a list of nodes running on the selected host. You can monitor node up time, current NARs, current rate, average rate, and total NARs.
To collect node performance statistics, configure statistics reporting. The statistics cover input records, output records, duplicate records, aggregated records, and discarded records. You can enable statistics reporting for individual nodes, for all nodes, or for no nodes.

Monitoring Node Performance with Prometheus and Grafana

You can monitor node performance in Offline Mediation Controller by using Prometheus and Grafana. For the list of compatible Prometheus and Grafana software versions, see "Additional Offline Mediation Controller Software Requirements" in Offline Mediation Controller Compatibility Matrix.

By default, Offline Mediation Controller tracks and exposes the following Node Manager-level statistics through a single endpoint in Prometheus format:

The total network accounting records (NARs) processed
The current NARs processed
The current processing rate
The average processing rate

The metric data for all Node Manager components are exposed at http://localhost:8082/metrics, but you can change the port number where the metrics are exposed or disable the metric endpoint altogether. See "Configuring the Node Manager Metric Endpoint".

You can then configure Prometheus to scrape the Node Manager metrics from the endpoint and store them for analysis and monitoring. You can also set up a visualization tool, such as Grafana, to display your metric data in a graphical format.
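
As a rough illustration of what a scraper receives, the following Python sketch parses a payload in the Prometheus text format. The metric names in SAMPLE are placeholders invented for this example, not the names that Offline Mediation Controller actually exposes, and the parser assumes that samples carry no trailing timestamp.

```python
def parse_prometheus_text(text):
    """Parse simple Prometheus text-format lines into {sample_name: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value follows the last space.
        # Any labels stay attached to the sample name.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical payload: these metric names are placeholders, not the
# names that Offline Mediation Controller actually exposes.
SAMPLE = """\
# HELP example_total_nars Total NARs processed (placeholder name)
# TYPE example_total_nars counter
example_total_nars{node="node-1"} 1200
example_current_rate{node="node-1"} 35.5
"""

metrics = parse_prometheus_text(SAMPLE)
for name, value in metrics.items():
    print(name, value)
```

To try it against a running Node Manager, you could replace SAMPLE with the response body read from http://localhost:8082/metrics, or from whichever port you configured.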

You can use the following sample Grafana Dashboard templates to visualize the Offline Mediation Controller metrics:

OCOMC_JVM_Dashboard.json: Allows you to view JVM-related metrics for Offline Mediation Controller.
OCOMC_Node_Manager_Summary.json: Allows you to view NAR processing metrics for the Node Manager.
OCOMC_Node_Summary.json: Allows you to view NAR processing metrics for all nodes.
OCOMC_Summary_Dashboard.json: Allows you to view NAR-related metrics for all Offline Mediation Controller components.

To use the sample dashboards, import the JSON files from the OMC_home/sampleData/dashboards directory into Grafana. For information about importing dashboards, see "Manage Dashboards" in the Grafana Dashboards documentation.

Configuring the Node Manager Metric Endpoint

You can configure Offline Mediation Controller to expose Node Manager metric data on a port other than 8082, and you can enable or disable the metric endpoint.

To configure the Node Manager metric endpoint:

Open your OMC_home/bin/UDCEnvironment file in a text editor.
Set the IS_METRIC_ENABLED environment variable to one of the following:
- true: Enables the metric endpoint. This is the default.
- false: Disables the metric endpoint.
Set the METRIC_PORT environment variable to the port number at which you want to expose the Node Manager metrics. The default is 8082.
Save and close the UDCEnvironment file.
Restart Node Manager. See "Starting and Stopping Node Manager".
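
The effect of these two settings can be sketched in Python as follows. This is only an illustration: it assumes that the settings appear as plain KEY=VALUE assignments (optionally prefixed with export), which may not match the actual layout of your UDCEnvironment file, and the function names are hypothetical.

```python
def parse_assignments(text):
    """Collect KEY=VALUE pairs, ignoring comments and an optional 'export'."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def metric_endpoint(settings, host="localhost"):
    """Return the metrics URL, or None when the endpoint is disabled."""
    # Documented defaults: the endpoint is enabled and listens on port 8082.
    enabled = settings.get("IS_METRIC_ENABLED", "true").strip().lower() == "true"
    if not enabled:
        return None
    port = int(settings.get("METRIC_PORT", "8082"))
    return f"http://{host}:{port}/metrics"


print(metric_endpoint(parse_assignments("METRIC_PORT=9100\n")))
```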

Monitoring System Processes

You can monitor Offline Mediation Controller system processes, such as disk status, memory usage, and CPU usage, by setting thresholds in the OMC_home/config/nodemgr/nodemgr.cfg file. Use this file to set a warning threshold, an error threshold, and the action that is triggered when a threshold is crossed.

Note:

If you do not modify the nodemgr.cfg file, Offline Mediation Controller uses the default threshold values.

To monitor disk errors, see "Using the Disk Status Monitor".
To monitor memory errors, see "Using the Memory Monitor".
To monitor CPU usage levels, see "Using the CPU Usage Monitor".

By default, Offline Mediation Controller generates a single alarm for each error condition, even if the error condition occurs multiple times. To generate an alarm or trap for every error occurrence, open the nodemgr.cfg file and change the SUPPRESS_MULTIPLE_ALARMS parameter value to No.

Using the Disk Status Monitor

You use the disk status monitor to alert you to potential disk issues, so you can take action to avoid unrecoverable errors.

Note:

The disk status monitor runs only on Solaris workstations that have the Sun Solstice DiskSuite metastat command installed.

Table 4-1 lists the parameters you can add or modify in the nodemgr.cfg file.

Table 4-1 Disk Status Monitor Parameters

Parameter	Description
DISK_STATUS_CMD	The full path to the metastat command. The default is /usr/sbin/metastat.
DISK_STATUS_POLLTIME	Amount of time to wait between polling intervals.

Using the Memory Monitor

You use the memory monitor to alert you when memory usage exceeds a specified threshold. In addition to the threshold, you can configure the memory monitor to log memory usage statistics.

Table 4-2 lists the parameters you can add or modify in the nodemgr.cfg file.

Table 4-2 Memory Monitor Parameters

Parameter	Description
LOG_MEMORY_USAGE	Set to Y to log memory usage statistics. The default is N.
MEMORY_MAJOR_THRESHOLD	The level at which a major alarm is raised, as a percentage. The default is 85.
MEMORY_WARNING_THRESHOLD	The level at which a warning alarm is raised, as a percentage. The default is 70.
MEMORY_SAMPLE_TIME	A time interval, in seconds, during which the memory usage must be above a specific threshold level before an alarm is raised. The default is 60.
MEMORY_SAMPLE_FREQ	The number of polls that are taken during each sample period. The default is 4.

For example, using the default values for MEMORY_SAMPLE_TIME (60 seconds) and MEMORY_SAMPLE_FREQ (4), the memory usage polls would occur every 15 seconds (60 seconds divided by 4). In this case, an alarm would be generated if the memory usage level was above the specified threshold for 4 consecutive polls.
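
The arithmetic above can be sketched in Python. The function names are hypothetical, and the rule "every poll in the sample period is above the threshold" is this example's reading of the description, not a statement of how Node Manager is implemented.

```python
def poll_interval(sample_time, sample_freq):
    """Seconds between polls: the sample period divided by the poll count."""
    return sample_time / sample_freq


def should_raise_alarm(readings, threshold, sample_freq):
    """True when the last `sample_freq` readings are all above `threshold`."""
    if len(readings) < sample_freq:
        return False  # not enough polls yet to cover a full sample period
    return all(reading > threshold for reading in readings[-sample_freq:])


# Defaults from Table 4-2: MEMORY_SAMPLE_TIME=60, MEMORY_SAMPLE_FREQ=4.
print(poll_interval(60, 4))
# Four consecutive readings above the default warning threshold of 70.
print(should_raise_alarm([72, 75, 71, 90], threshold=70, sample_freq=4))
```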

Using the CPU Usage Monitor

The CPU usage monitor generates a critical or major alarm if the CPU usage level reaches a specified value.

Table 4-3 lists the parameters you can add or modify in the nodemgr.cfg file.

Table 4-3 CPU Usage Monitor Parameters

Parameter	Description
CPU_REDTHRESHOLD	The percentage of CPU in use that will generate a critical alarm. The default is 90.
CPU_YELLOWTHRESHOLD	The percentage of CPU in use that will generate a major alarm. The default is 80.
CPU_SAMPLETIME	The period, in seconds, in which to poll a fixed number of times. The default is 60.
CPU_SAMPLEFREQ	How often to poll during the fixed period. The default is 3.

For example, using the default values for CPU_SAMPLETIME (60 seconds) and CPU_SAMPLEFREQ (3), a poll will take place every 20 seconds (60 seconds divided by 3).
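
A minimal Python sketch of the threshold logic follows. It assumes that an alarm applies as soon as a usage value reaches a threshold, and it classifies a single usage value only; how Node Manager combines the polls within a sample period is not modeled here. The function name is hypothetical.

```python
def cpu_alarm_level(usage_percent, red_threshold=90, yellow_threshold=80):
    """Classify a CPU usage percentage against the two thresholds."""
    if usage_percent >= red_threshold:
        return "critical"  # CPU_REDTHRESHOLD reached
    if usage_percent >= yellow_threshold:
        return "major"     # CPU_YELLOWTHRESHOLD reached
    return None            # no alarm


# Defaults from Table 4-3: CPU_SAMPLETIME=60 and CPU_SAMPLEFREQ=3.
interval_seconds = 60 / 3
print(interval_seconds)
print(cpu_alarm_level(85))
```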

Monitoring How Offline Mediation Controller Components are Running

You can use the ProcessControl script to verify that Offline Mediation Controller components are still running and to restart the server. The script periodically monitors the status of the Offline Mediation Controller components that are defined in the offline_mediation.conf file. See "Configuring the ProcessControl Script to Run Components" for information about editing the offline_mediation.conf file.

To monitor how the Offline Mediation Controller server and components are running:

Stop all Offline Mediation Controller components.
Open the /etc/inittab file in a text editor.

Add the following entry:

NT:3:respawn:/etc/init.d/ProcessControl monitor

Save and close the file.
Run the following command, which periodically monitors the status of the Offline Mediation Controller components that are defined in the offline_mediation.conf file:
```
./ProcessControl monitor
```

Using Server Monitoring

The Offline Mediation Controller server monitoring feature continuously logs hardware performance data and divides that data into statistical categories.

Statistical Categories

Each statistical category has an entry in the nodemgr.cfg file to indicate if performance logging is desired. The default values are pre-set in this file and you can change them where necessary. The nodemgr.cfg file is located at OMC_home/config/nodemgr. The statistical categories are listed in the following sections.

Disk Utilization

This function monitors the percentage of total disk space currently used on the Offline Mediation Controller partition. The corresponding entry in the nodemgr.cfg file is: SERVERMONITOR_DISK_UTILIZATION.

The disk utilization log file is located in OMC_home/serverMonitoring/IP_Port/diskUtilization.

The log file values are as follows:

partition = Offline Mediation Controller installation location, which determines the partition being monitored
kbytes = Total disk space in the partition, measured in kbytes
used = Disk space in use in the partition, measured in kbytes
avail = Disk space not in use, measured in kbytes
capacity = Percentage of disk space in use

Here is an example of the disk utilization log file:

<poll date="2005/09/27" time="14:44:18" partition="/opt/nm500" kbytes="5886725" used="5139094" avail="688764" capacity="89%" />
<poll date="2005/09/27" time="14:49:19" partition="/opt/nm500" kbytes="5886725" used="5139129" avail="688729" capacity="89%" /> 
<poll date="2005/09/27" time="14:54:19" partition="/opt/nm500" kbytes="5886725" used="5139137" avail="688721" capacity="89%" /> 
<poll date="2005/09/27" time="14:59:20" partition="/opt/nm500" kbytes="5886725" used="5139144" avail="688714" capacity="89%" /> 
<poll date="2005/09/27" time="15:04:20" partition="/opt/nm500" kbytes="5886725" used="5139150" avail="688708" capacity="89%" />
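
Because each entry is a self-contained XML element, a short script can read the values back. The following Python sketch parses two of the sample lines above; the function names and the 85 percent limit are hypothetical choices for this example.

```python
import xml.etree.ElementTree as ET

LINES = [
    '<poll date="2005/09/27" time="14:44:18" partition="/opt/nm500" '
    'kbytes="5886725" used="5139094" avail="688764" capacity="89%" />',
    '<poll date="2005/09/27" time="14:49:19" partition="/opt/nm500" '
    'kbytes="5886725" used="5139129" avail="688729" capacity="89%" />',
]


def parse_poll(line):
    """Return the attributes of one <poll/> entry as a dictionary."""
    return dict(ET.fromstring(line.strip()).attrib)


def capacity_percent(poll):
    """Convert a capacity value such as '89%' to the integer 89."""
    return int(poll["capacity"].rstrip("%"))


polls = [parse_poll(line) for line in LINES]
# Hypothetical limit of 85 percent, chosen only for this example.
over_limit = [p["time"] for p in polls if capacity_percent(p) > 85]
print(over_limit)
```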

Disk Status

This function uses the metastat command to monitor the health of the disk that contains Offline Mediation Controller. The metastat command must already be installed on the system for this feature to work. The corresponding entry in the nodemgr.cfg file is: SERVERMONITOR_DISK_STATUS.

The disk status log file is located in OMC_home/serverMonitoring/IP_Port/diskStatus.

Here is an example of the disk status log file:

<poll date="2005/09/27" time="16:26:15" diskHealth="healthy" /> 
<poll date="2005/09/27" time="16:36:15" diskHealth="healthy" /> 
<poll date="2005/09/27" time="16:46:15" diskHealth="healthy" />

CPU Utilization

This function monitors the percentage of the processor(s) currently in use in the system. The corresponding entry in the nodemgr.cfg file is: SERVERMONITOR_CPU_UTILIZATION.

The CPU utilization log file is located in OMC_home/serverMonitoring/IP_Port/cpuUtilization.

The log file values are as follows:

cpuActive = Percentage of CPU taken up with user processes
cpuSystem = Percentage of CPU taken up with system processes
cpuIdle = Percentage of CPU not being used

Here is an example of the CPU utilization log file:

<poll date="2005/09/27" time="14:39:46" cpuActive="34" cpuSystem="4" cpuIdle="62" /> 
<poll date="2005/09/27" time="14:40:06" cpuActive="62" cpuSystem="3" cpuIdle="35" /> 
<poll date="2005/09/27" time="14:40:26" cpuActive="38" cpuSystem="4" cpuIdle="58" /> 
<poll date="2005/09/27" time="14:40:46" cpuActive="16" cpuSystem="3" cpuIdle="81" />

Memory Utilization

This function monitors the percentage of the memory currently in use in the system. The corresponding entry in the nodemgr.cfg file is: SERVERMONITOR_MEMORY_UTILIZATION.

The memory utilization log file is located in OMC_home/serverMonitoring/IP_Port/memoryUtilization.

The log file values are as follows:

freeMemory = Amount of memory not used in the heap, measured in bytes
maxMemory = Memory limit available for process to grow into (-Xmx option), measured in bytes
usedMemory = The following calculation: maxMemory - freeMemory, measured in bytes
memoryUtilization = The following calculation: (currently allocated process limit - freeMemory) / maxMemory, expressed as a percentage

Here is an example of the memory utilization log file:

<poll date="2005/09/27" time="14:36:07" memoryUtilization="2.8679903" usedMemory="5.4105304E7" freeMemory="1.23482E7" maxMemory="6.6453504E7" />
<poll date="2005/09/27" time="14:36:22" memoryUtilization="2.4237578" usedMemory="5.6333232E7" freeMemory="1.0120272E7" maxMemory="6.6453504E7" />
<poll date="2005/09/27" time="14:36:37" memoryUtilization="2.577369" usedMemory="5.6435312E7" freeMemory="1.0018192E7" maxMemory="6.6453504E7" /> 
<poll date="2005/09/27" time="14:36:52" memoryUtilization="2.6964898" usedMemory="5.6514472E7" freeMemory="9939032.0" maxMemory="6.6453504E7" />
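
As a quick consistency check, the following Python sketch recomputes usedMemory for the first two sample entries from the documented calculation maxMemory - freeMemory. The function name is hypothetical.

```python
import math
import xml.etree.ElementTree as ET

LINES = [
    '<poll date="2005/09/27" time="14:36:07" memoryUtilization="2.8679903" '
    'usedMemory="5.4105304E7" freeMemory="1.23482E7" maxMemory="6.6453504E7" />',
    '<poll date="2005/09/27" time="14:36:22" memoryUtilization="2.4237578" '
    'usedMemory="5.6333232E7" freeMemory="1.0120272E7" maxMemory="6.6453504E7" />',
]


def used_memory(poll):
    """Recompute usedMemory as maxMemory - freeMemory, in bytes."""
    return float(poll["maxMemory"]) - float(poll["freeMemory"])


for line in LINES:
    poll = ET.fromstring(line).attrib
    logged = float(poll["usedMemory"])
    # The recomputed value should match the logged usedMemory attribute.
    print(poll["time"], used_memory(poll), math.isclose(used_memory(poll), logged))
```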

Open Files - System File Monitoring

This function tracks the number of files open on the operating system. To enable system file monitoring in Offline Mediation Controller, the open source lsof package must be installed in a directory that is included in the PATH environment variable. The corresponding entry in the nodemgr.cfg file is: SERVERMONITOR_OPEN_FILES.

The log file is located in OMC_home/serverMonitoring/IP_Port/systemFiles.

The log file value "openFiles" is the number of open files in the entire system.

Here is an example of the open files log file:

<poll date="2005/09/27" time="16:19:55" openFiles="2203" /> 
<poll date="2005/09/27" time="16:20:58" openFiles="2222" /> 
<poll date="2005/09/27" time="16:22:02" openFiles="2298" /> 
<poll date="2005/09/27" time="16:23:05" openFiles="2298" /> 
<poll date="2005/09/27" time="16:24:09" openFiles="2201" /> 
<poll date="2005/09/27" time="16:25:12" openFiles="2247" /> 
<poll date="2005/09/27" time="16:26:15" openFiles="2201" />

Log File Information

Log File Format

A server monitor log is an XML file that tracks the performance values gathered at each poll instance. Each log entry contains a date and a time stamp, followed by the statistical values gathered during that period. Each statistical category has its own performance log file. For example:

<poll date="04/27/2005" time="13:49:07" cpuActive= "4" cpuIdle= "96" /> 
<poll date="04/27/2005" time="13:50:07" cpuActive= "6" cpuIdle= "94" /> 
<poll date="04/27/2005" time="13:51:07" cpuActive= "5" cpuIdle= "95" /> 
<poll date="04/27/2005" time="13:52:07" cpuActive= "5" cpuIdle= "95" />

Log File Duration

A server monitor log file contains performance data spanning a day or month, depending on which value you select in the nodemgr.cfg file. The default value is daily. For example:

SERVERMONITOR_LOG_GRANULARITY 'monthly/daily'

Log File Retention

You can specify the number of performance logs that the Node Manager retains. With daily logs, the default value of 180 allows for about half a year of data retention. For example:

SERVERMONITOR_LOG_RETENTION '###'
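
To illustrate what a retention limit implies, the following Python sketch selects the oldest files once a category holds more logs than the limit. It is a hypothetical model, not a description of how the Node Manager prunes its files, and it relies on the YearMonthDay datestamp sorting chronologically as text.

```python
def logs_to_delete(file_names, retention=180):
    """Return the oldest file names that exceed the retention count."""
    # YearMonthDay datestamps sort chronologically as plain strings.
    ordered = sorted(file_names)
    excess = len(ordered) - retention
    return ordered[:excess] if excess > 0 else []


names = [
    "cpu_utilization_20050103.xml",
    "cpu_utilization_20050101.xml",
    "cpu_utilization_20050102.xml",
]
print(logs_to_delete(names, retention=2))
```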

Log File Rollover

As a new day or month begins, the Node Manager automatically opens a file for the new period. At that time, the Node Manager also performs post-processing on the XML file from the previous day or month. The post-processing involves adding opening and closing tags to the XML file to ensure the data is well-formed.
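
The reason for the post-processing can be shown with a small Python sketch: a file that holds only a sequence of poll entries has no single root element, so an XML parser rejects it until enclosing tags are added. The root tag name "polls" is a placeholder chosen for this example; the tag that Node Manager writes is not documented here.

```python
import xml.etree.ElementTree as ET

RAW = (
    '<poll date="04/27/2005" time="13:49:07" cpuActive="4" cpuIdle="96" />\n'
    '<poll date="04/27/2005" time="13:50:07" cpuActive="6" cpuIdle="94" />\n'
)


def wrap_polls(raw, root_tag="polls"):
    """Enclose the poll entries in a single root element."""
    return f"<{root_tag}>\n{raw.strip()}\n</{root_tag}>\n"


def is_well_formed(text):
    """Report whether the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(RAW))              # entries without a root element
print(is_well_formed(wrap_polls(RAW)))  # entries after wrapping
```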

CSV File Creation

For each performance log XML file, the Node Manager creates a corresponding CSV file. This comma-delimited file mirrors the information in the performance log XML file and is suitable for importing into Microsoft Excel. For example:

Date,time,cpuActive,cpuIdle
04/27/2005,13:49:07,4,96
04/27/2005,13:50:07,6,94
04/27/2005,13:51:07,5,95
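
If you need the same conversion for your own tooling, it can be approximated in Python as shown below. This sketch is not the Node Manager's implementation. The function name is hypothetical, and the header row simply reuses the XML attribute names.

```python
import csv
import io
import xml.etree.ElementTree as ET

LINES = [
    '<poll date="04/27/2005" time="13:49:07" cpuActive="4" cpuIdle="96" />',
    '<poll date="04/27/2005" time="13:50:07" cpuActive="6" cpuIdle="94" />',
]
COLUMNS = ["date", "time", "cpuActive", "cpuIdle"]


def polls_to_csv(lines, columns):
    """Convert <poll/> entries to comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for line in lines:
        attributes = ET.fromstring(line.strip()).attrib
        writer.writerow([attributes[column] for column in columns])
    return buffer.getvalue()


print(polls_to_csv(LINES, COLUMNS))
```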

Log File Naming

The performance log files (XML and CSV) are named according to the statistical category and the time period to which they pertain. The datestamp is in the format YearMonthDay. For example:

cpu_utilization_20050427.xml (daily)
cpu_utilization_20050427.csv
cpu_utilization_200504.xml (monthly)
cpu_utilization_200504.csv
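
The naming pattern can be expressed as a small Python helper. The function name and parameters are hypothetical; only the category_datestamp.extension pattern comes from the examples above.

```python
from datetime import date


def log_file_name(category, day, granularity="daily", extension="xml"):
    """Build a performance log file name from its category and period."""
    if granularity == "daily":
        stamp = day.strftime("%Y%m%d")  # YearMonthDay
    elif granularity == "monthly":
        stamp = day.strftime("%Y%m")    # YearMonth
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{category}_{stamp}.{extension}"


print(log_file_name("cpu_utilization", date(2005, 4, 27)))
print(log_file_name("cpu_utilization", date(2005, 4, 27), "monthly", "csv"))
```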

Using Logs to Monitor Offline Mediation Controller Components

Offline Mediation Controller records system activity in log files. One log file is generated for each Offline Mediation Controller component and for each node on the mediation host. Review the log files daily to monitor your system and detect and diagnose system problems.

Types of Log Files

Offline Mediation Controller generates log files for Offline Mediation Controller components and for the nodes you create.

Log Files for Offline Mediation Controller Components

For Offline Mediation Controller components such as Administration Server, Node Manager, and Administration Client, log files are named component.log; for example, nodemgr.log, adminserver.log, and GUI.log. The closed log files are saved using the Offline Mediation Controller component or cartridge node name, and an incrementing number; for example, nodemgr.log.1, nodemgr.log.2.

Log Files for Offline Mediation Controller Nodes on the Mediation Host

For each node on the mediation host, the log file is named nodeID.log, where nodeID is the unique ID that is assigned by Administration Server when you create a node on a mediation host; for example, if the unique ID of the node is 2ys4tt-16it-hslskvi1, the log file name is 2ys4tt-16it-hslskvi1.log.

Location of Log Files

The following are the minimum Offline Mediation Controller log files:

nodemgr.log
adminserver.log
GUI.log

Depending on the number of nodes you create, your installation will have one or more node log files.

Default Log File Locations

The log files for Offline Mediation Controller components are stored in the OMC_home/log/component directory; for example, the Node Manager log file is in the OMC_home/log/nodemgr directory.

The log files for the nodes are stored in the OMC_home/log/nodeID directory; for example, if the node ID of the node is 2ys4tt-16it-hslskvi1, the node log file is in the OMC_home/log/2ys4tt-16it-hslskvi1 directory.

Note:

Oracle recommends that you not change the default location of the log files. If you change the location of the log files, you cannot access the log information from Administration Client.

Table 4-4 lists the components and their corresponding log file locations.

Table 4-4 Offline Mediation Controller Components and Log File Locations

Component	Log File Locations
Node Manager	OMC_home/log/nodemgr/nodemgr.log
Administration Server	OMC_home/log/adminserver/adminserver.log
Administration Client	OMC_home/log/gui/GUI.log
Node	OMC_home/log/nodeID/nodeID.log
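
For scripts that collect logs, the layout in Table 4-4 can be captured in a short Python sketch. The function names are hypothetical, and /opt/omc is a placeholder for your actual OMC_home directory.

```python
from pathlib import PurePosixPath

# Directory and file names taken from Table 4-4.
COMPONENT_LOGS = {
    "Node Manager": ("nodemgr", "nodemgr.log"),
    "Administration Server": ("adminserver", "adminserver.log"),
    "Administration Client": ("gui", "GUI.log"),
}


def component_log_path(omc_home, component):
    """Return the default log file path for a named component."""
    directory, file_name = COMPONENT_LOGS[component]
    return PurePosixPath(omc_home) / "log" / directory / file_name


def node_log_path(omc_home, node_id):
    """Return the default log file path for a node, given its unique ID."""
    return PurePosixPath(omc_home) / "log" / node_id / f"{node_id}.log"


# "/opt/omc" is a placeholder for OMC_home.
print(component_log_path("/opt/omc", "Node Manager"))
print(node_log_path("/opt/omc", "2ys4tt-16it-hslskvi1"))
```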

About the Logger Properties Files

Each Offline Mediation Controller component or node has its own logger properties file. When an Offline Mediation Controller component or a node is started for the first time, the logger properties file is dynamically created in the OMC_home/config/component directory. By default, the logger properties file is set to the default logging level.

Table 4-5 lists the components and their corresponding logger properties file locations.

Table 4-5 Offline Mediation Controller Components and Logger Properties File Locations

Component	Logger Properties File Locations
Node Manager	OMC_home/config/nodemgr/nodemgrLogger.properties
Administration Server	OMC_home/config/adminserver/adminserverLogger.properties
Administration Client	OMC_home/config/GUI/GUILogger.properties
Node	OMC_home/config/nodeID/nodeIDLogger.properties

Setting the Reporting Level for Logging Messages

By default, Offline Mediation Controller components report information messages. You can change the reporting level for each component. The following levels of reporting are supported:

NO = No logging
WARN = Log only warning messages
INFO = Log information messages (default)
DEBUG = Log debug messages
ALL = Log warning, information, and debug messages

Note:

To avoid performance degradation, use INFO level logging for debugging.

To change the severity level for logging:

Open the logger properties file for the component in a text editor. See "About the Logger Properties Files".
Search for the following entry:
```
log4j.logger.componentName.component=severity,componentAppender
```
where:
- componentName is the name of the Offline Mediation Controller component.
- component is the Offline Mediation Controller component or the node ID.
- severity is the current severity level for the logging.
For example:
```
log4j.logger.NodeManager.nodemgr=WARN,nodemgrAppender
```
Change the entry to the desired severity level for logging.

For example, to change the log level from WARN to INFO for Node Manager:
```
log4j.logger.NodeManager.nodemgr=INFO,nodemgrAppender
```
Save and close the file.

Note:

You do not need to restart the running process to enable the changes in the logging level. A predefined delay of two minutes is set before the changed logger configuration takes effect.
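
If you change logging levels often, the edit described in the procedure above can be scripted. The following Python sketch rewrites the severity in the text of a logger properties file. The function name is hypothetical, and the pattern assumes that the entry has the log4j.logger.componentName.component=severity,componentAppender form shown in the procedure.

```python
import re

# Reporting levels listed in "Setting the Reporting Level for Logging Messages".
LEVELS = {"NO", "WARN", "INFO", "DEBUG", "ALL"}

# Groups: (key and equals sign)(current severity)(comma and appender name).
ENTRY = re.compile(
    r"^(log4j\.logger\.[^=\s]+\s*=\s*)([A-Za-z]+)(\s*,.*)$", re.MULTILINE
)


def set_log_level(properties_text, new_level):
    """Return the properties text with the logger severity replaced."""
    if new_level not in LEVELS:
        raise ValueError(f"unsupported level: {new_level}")
    return ENTRY.sub(lambda m: m.group(1) + new_level + m.group(3), properties_text)


before = "log4j.logger.NodeManager.nodemgr=WARN,nodemgrAppender\n"
print(set_log_level(before, "INFO"))
```

Lines that do not match the pattern are returned unchanged, so the rest of the properties file is preserved.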