23 Monitoring Your System with System Manager

This chapter describes how to use System Manager to monitor your Oracle Communications Billing and Revenue Management (BRM) system.

For information about System Manager opcodes, see "System Manager FM Standard Opcodes" in BRM Developer's Reference.

About System Manager

System Manager, in conjunction with Node Managers, manages and monitors BRM servers. The System Manager lets you:

  • Start, stop, and monitor Connection Managers (CMs), Connection Manager Master Processes (CMMPs), and Data Managers (DMs).

  • Detect when a server stops or fails.

  • Centralize log management.

System Manager also provides a framework that you can use to create enhanced management and control functions.

You can use System Manager's command-line utilities and the testnap testing utility to send opcodes to System Manager. For information on using testnap, see "Testing Your Applications and Custom Modules" and "testnap".

For an example of how to use System Manager with testnap, see "Getting the Status of the Servers on All Nodes".

Terminology

To understand System Manager, you need to know these terms:

  • server: An instance of a server process, such as a Connection Manager (CM) or a Data Manager (DM).

  • node: A computer running one or more servers.

Figure 23-1 shows the relationships among these components:

Figure 23-1 System Manager Component Relationships


BRM Server States

You can obtain the status of servers by using System Manager or a Node Manager. Each BRM server always has one of these states:

  • Starting state

    The initial state of the server. A BRM server can be started by a Node Manager or by another method.

  • Running state

    After the BRM servers start, each server gets the address of the Node Manager from the input parameters, connects to that address, and sends a message to the Node Manager that it has started. When the Node Manager gets a message from a BRM server, it changes the state of that server to the running state.

  • Stopping state

    When the Node Manager gets a message to stop a server, it updates the state of the server to stopping. A server can be shut down in one of two ways: immediately or after completing its current transactions.

    • Soft Shutdown: After receiving a Soft Shutdown message, servers stop accepting new requests and complete their current transactions. When the current transactions are finished, the servers terminate their own processes. This is the default mode.

    • Immediate Shutdown: After receiving an Immediate Shutdown message, servers abort all transactions and terminate their own processes immediately. To activate Immediate Shutdown mode, you set a flag in the input flist. This can be useful if, for example, your system has stopped responding.

  • Down state

    If the Node Manager cannot start a server in the starting state after several tries, the Node Manager stops trying and changes the state of the server to the down state.

Understanding the Node Manager

Each computer node has one Node Manager, which controls and manages the BRM server processes running on that node. The Node Manager needs the following information about each BRM server:

  • Server name

  • Server executable

  • Server working directory

When a Node Manager is started, it reads its configuration file. If the start_servers entry in that file is set to 1, the Node Manager starts the servers listed there. Once a server has been started by the Node Manager, the server notifies the Node Manager that it has started.
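For example, a Node Manager configuration file might contain entries like the following. The start_servers entry format here is an assumption; the server_info format is documented in "Editing the Node Manager Configuration File".

# Start the listed servers when this Node Manager starts (format assumed).
nmgr start_servers 1

# One server_info entry per managed server; see "Editing the Node Manager
# Configuration File" for the meanings of the fields.
server_info  cm_master  /BRM_Home/bin/cm  /BRM_Home/sys/cm  frisco  21331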

As long as the Node Manager configuration file includes the correct server_info entries for the CM and DM, you can manage these servers, even if they are started by another method. See "Editing the Node Manager Configuration File" for information about these entries.

For example, you can use the command pin_ctl start all to start the CM and DM processes before the Node Manager starts. Even though these processes weren't initially started by the Node Manager, you will still be able to use the System Manager to stop and start the CM and DM.

Client Connection to Node Manager

Node Manager is a multithreaded program that uses the PCM protocol for communication. A client first connects to the Node Manager by using PCM_CONNECT or the pin_nmgr_connect routine. The Node Manager creates one thread to monitor servers and one thread for each client connection.
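For example, the following minimal C sketch connects through PCM and requests server status by calling PCM_OP_INFMGR_GET_STATUS, one of the Node Manager opcodes listed in the next section. This is a sketch under stated assumptions, not a definitive implementation: error handling is abbreviated, the "ops/infmgr.h" header name and the PCM_OP flag value of 0 are assumptions, and the input flist mirrors the "nodes" file used with testnap in "Getting the Status of the Servers on All Nodes".

/*
 * Minimal PCM client sketch. PCM_CONNECT reads the cm_ptr entry in the
 * local pin.conf, which should point at the Node Manager's (or System
 * Manager's) well-known port.
 */
#include <stdio.h>
#include "pcm.h"
#include "ops/infmgr.h"   /* assumed header for PCM_OP_INFMGR_* opcodes */

int
main(void)
{
    pcm_context_t *ctxp     = NULL;
    int64          database = 0;
    pin_flist_t   *i_flistp = NULL;
    pin_flist_t   *o_flistp = NULL;
    poid_t        *poidp    = NULL;
    int32          scope    = 2;   /* search all nodes, as in the testnap example */
    pin_errbuf_t   ebuf;

    PIN_ERR_CLEAR_ERR(&ebuf);
    PCM_CONNECT(&ctxp, &database, &ebuf);

    /* Build the same input flist as the "nodes" file used with testnap. */
    PIN_FLIST_CREATE(&i_flistp, &ebuf);
    poidp = PIN_POID_CREATE(database, "/service", 1, &ebuf);
    PIN_FLIST_FLD_PUT(i_flistp, PIN_FLD_POID, (void *)poidp, &ebuf);
    PIN_FLIST_FLD_SET(i_flistp, PIN_FLD_TYPE, (void *)&scope, &ebuf);

    /* Ask for the status of the configured servers. */
    PCM_OP(ctxp, PCM_OP_INFMGR_GET_STATUS, 0, i_flistp, &o_flistp, &ebuf);
    if (PIN_ERR_IS_ERR(&ebuf))
        fprintf(stderr, "PCM_OP_INFMGR_GET_STATUS failed\n");

    PIN_FLIST_DESTROY_EX(&i_flistp, NULL);
    PIN_FLIST_DESTROY_EX(&o_flistp, NULL);
    PCM_CONTEXT_CLOSE(ctxp, 0, &ebuf);
    return 0;
}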

Node Manager Opcodes

Node Manager uses these opcodes:

  • PCM_OP_INFMGR_GET_INFO: Gets information about servers that are configured on a node.

  • PCM_OP_INFMGR_GET_STATUS: Gets status of servers on a node.

  • PCM_OP_INFMGR_START_SERVER: Starts servers on a node.

  • PCM_OP_INFMGR_STOP_SERVER: Stops servers running on a node.

  • PCM_OP_INFMGR_MODIFY_MONITOR_INTERVAL: Modifies the monitoring interval. The default monitoring interval is two minutes. After each interval, System Manager gets the latest status of BRM servers.

Understanding System Manager

System Manager lets you check or manage BRM servers. When it starts, it reads all Node Manager addresses from its configuration file and connects to the Node Managers. After making the connection, System Manager gets configuration information from each node.

System Manager includes a log manager that monitors a well-known address and collects all log messages sent by Node Managers. Its monitoring component checks the status of servers at each monitoring interval. The default monitoring interval is two minutes, but you can change it while System Manager is running.

You can use this facility through the System Manager opcodes or the "System Manager Command-Line Interface".
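As an example of using the opcodes directly, the following C sketch changes the monitoring interval at run time by calling PCM_OP_INFMGR_MODIFY_MONITOR_INTERVAL. The PIN_FLD_INTERVAL field name and the unit of the value are hypothetical; consult the opcode's flist specification in BRM Developer's Reference for the actual input flist.

#include "pcm.h"
#include "ops/infmgr.h"   /* assumed header for PCM_OP_INFMGR_* opcodes */

/*
 * Sketch: ask System Manager to change its monitoring interval. Assumes
 * ctxp comes from an earlier PCM_CONNECT, as in the sketch in "Client
 * Connection to Node Manager". PIN_FLD_INTERVAL is a hypothetical field
 * name; check the documented flist specification for the real one.
 */
void
set_monitor_interval(pcm_context_t *ctxp, int32 interval, pin_errbuf_t *ebufp)
{
    pin_flist_t *in_flistp  = NULL;
    pin_flist_t *out_flistp = NULL;

    PIN_FLIST_CREATE(&in_flistp, ebufp);
    PIN_FLIST_FLD_SET(in_flistp, PIN_FLD_INTERVAL, (void *)&interval, ebufp);
    PCM_OP(ctxp, PCM_OP_INFMGR_MODIFY_MONITOR_INTERVAL, 0, in_flistp,
           &out_flistp, ebufp);

    PIN_FLIST_DESTROY_EX(&in_flistp, NULL);
    PIN_FLIST_DESTROY_EX(&out_flistp, NULL);
}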

System Manager Command-Line Interface

Using the command-line interface, infmgr_cli, to issue commands to System Manager is simpler than using testnap or passing an opcode through PCM_OP(). (For an example of using testnap to issue commands to System Manager, see "Getting the Status of the Servers on All Nodes".)

To use the command-line interface for controlling System Manager:

  1. Ensure that System Manager and any Node Managers are running.

  2. If necessary, modify the configuration file of infmgr_cli to specify the host name and port number where System Manager is running.

  3. Run infmgr_cli.

  4. Enter any of the supported commands at the prompt.

Table 23-1 lists the supported commands:

Table 23-1 System Manager Command-Line Interface

Command      Description

gi           Gets information for a node or its servers.
gs           Gets status for a node or server.
sdt          Schedules downtime for a server.
cdt          Cancels scheduled downtime for a server.
sfw          Tells a satellite CM to start or resume forwarding opcodes to the main CM.
startserv    Starts a server.
stopserv     Stops a server.
efw          Tells a satellite CM to stop forwarding opcodes to the main CM.
h            Displays help messages.
?            Displays help messages.
q            Quits the command-line interface.


The h, ?, and q commands take no parameters. The syntax for the other commands is:

  • gi (get information)

    gi [-u] [-c|-n|-s name]
      
    

    where

    • -u directs System Manager to get the information from the Node Managers and update its local cache. If -u is not specified, System Manager returns the information from its local cache.

    • -c name specifies a cell (not yet supported).

    • -n name specifies a node.

    • -s name specifies a server.

    The node or server name is the name specified in the Node Manager's configuration file. This command accepts only one node or server name.

    This command calls the PCM_OP_INFMGR_GET_INFO opcode.

  • gs (get status)

    gs [-u] [-c|-n|-s name]
      
    

    where

    • -u directs System Manager to get the information from the Node Managers and update its local cache. If -u is not specified, System Manager returns the information from its local cache.

    • -c name specifies a cell (not yet supported).

    • -n name specifies a node.

    • -s name specifies a server.

    The node or server name is the name specified in the Node Manager's configuration file. This command accepts only one node or server name.

    This command calls the PCM_OP_INFMGR_GET_STATUS opcode.

  • sdt (schedule downtime)

    sdt [server_name] [start_time] [end_time]
      
    

    where

    • server_name specifies the server.

    • start_time specifies when the downtime is to begin.

    • end_time specifies when the downtime is to finish.

    Use this format for start_time and end_time:

    month/day/year hour:minute
      
    

    where

    • month uses two digits to specify the month.

    • day uses two digits to specify the day of the month.

    • year uses four digits to specify the year.

    • hour uses two digits to specify the hour, based on the 24-hour clock.

    • minute uses two digits to specify the minute.

    This command calls the PCM_OP_INFMGR_SCHEDULE_DOWNTIME opcode.
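
    For example, this hypothetical command schedules downtime for a server named dm1 from 01:30 to 03:00 on July 4, 2027:

    sdt dm1 07/04/2027 01:30 07/04/2027 03:00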

  • cdt (cancel scheduled downtime)

    cdt [server_name]
      
    

    where server_name specifies the server.

    This command calls the PCM_OP_INFMGR_CANCEL_DOWNTIME opcode.

  • sfw (start forwarding)

    sfw [cm_ptr]
      
    

    where cm_ptr specifies the satellite CM that is to start or resume forwarding opcodes to the main CM. The CM name must match the name given in the System Manager's configuration file.

    This command calls the PCM_OP_INFMGR_SATELLITE_CM_START_FORWARDING opcode.

  • efw (end forwarding)

    efw [cm_ptr]
      
    

    where cm_ptr specifies the satellite CM that is to stop forwarding opcodes to the main CM. The CM name must match the name given in the System Manager's configuration file.

    This command calls the PCM_OP_INFMGR_SATELLITE_CM_STOP_FORWARDING opcode.

  • startserv (start server)

    startserv [server_name]

    where server_name is the name given to the server in the Node Manager's configuration file.

  • stopserv (stop server)

    stopserv [server_name]

    where server_name is the name given to the server in the Node Manager's configuration file.

    Important:

    A server can be stopped by System Manager only if it was started by the Node Manager.
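
The following hypothetical session shows several of these commands in use. The prompt, the output, and the node and server names (pinpc43_node, dm1) are illustrative assumptions:

% infmgr_cli
> gs -u -n pinpc43_node
...status of the servers on pinpc43_node...
> stopserv dm1
> startserv dm1
> q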

Centralized Log Management

System Manager provides centralized log management when the log manager options are enabled for System Manager and Node Manager.

The PINLOG module writes messages (by default, error messages) for each individual server into the Node Manager log file. When a Node Manager starts a server, all error messages generated by that server are stored in the Node Manager log file.

You can configure the Node Manager to monitor its own log file or to route log file messages to a destination that you specify in the logmgr_ptr entry of the Node Manager configuration file. The destination can be the System Manager log file or your own log collector.

If the logmgr_ptr entry is used in a Node Manager configuration file, the Node Manager sends messages from its log file to the host and port specified in that entry. If the logmgr_port entry is used in the System Manager configuration file, System Manager starts a log collector thread and collects all log messages sent by Node Managers. Figure 23-2 illustrates the log-management options, and an example of the corresponding configuration entries follows the figure:

Figure 23-2 Log Management Options

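For example, the configuration entries might look like the following. The entry formats here are assumptions modeled on the other configuration entries shown in this chapter; see the comments in your own configuration files for the authoritative syntax.

# In the Node Manager configuration file (format assumed): route this
# Node Manager's log messages to the log collector on host creator,
# port 11990.
nmgr logmgr_ptr creator 11990

# In the System Manager configuration file (format assumed): start a
# log collector thread listening on port 11990.
infmgr logmgr_port 11990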

Installing System Manager and Node Manager

Both System Manager and Node Manager are installed along with BRM. Startup information for the managers is included in the init-d.pin.nmgr and init-d.pin.infmgr files. This information is read during the installation of BRM.

You start and stop the managers by using the pin_ctl command, as shown in Table 23-2:

Table 23-2 pin_ctl Commands for System and Node Managers

Manager           Start                   Stop                   Stop and Restart

System Manager    pin_ctl start infmgr    pin_ctl stop infmgr    pin_ctl bounce infmgr
Node Manager      pin_ctl start nmgr      pin_ctl stop nmgr      pin_ctl bounce nmgr


Configuring System Manager and Node Manager

To configure System Manager and Node Manager, edit the configuration files in the BRM_Home/sys/infmgr and BRM_Home/sys/nmgr directories, where BRM_Home is the directory in which you installed BRM components.

Editing the Node Manager Configuration File

Node Manager monitors servers (CMs and DMs). Each server is specified in the Node Manager configuration file by a server_info entry in this format:

server_info   process_name   program_path   working_path  host_name  port
  

where

  • process_name is the name of the server. This name must be unique among all servers in the BRM network.

    • For a CM, process_name must contain the substring cm and must not contain the substring dm.

    • For a DM, process_name must contain the substring dm and must not contain the substring cm.

  • program_path is the path to the executable program for the process.

  • working_path is the path to the working directory for the process, where the configuration file for that process can be found.

  • host_name is the component's host name.

  • port is the component's port number.

For example:

server_info  dm1   /BRM_Home/bin/dm  /BRM_Home/sys/dm_oracle frisco 11961
server_info  cm_master  /BRM_Home/bin/cm  /BRM_Home/sys/cm joe 21331
  

For information about other entries in the configuration file, refer to the comments preceding each entry in the file.

Editing the System Manager Configuration File

System Manager monitors one or more Node Managers, which can be running on different computers. You specify each Node Manager by a node_ptr entry in the configuration file of System Manager, in this format:

infmgr node_ptr   node_name   host_name   port_number
  

where

  • node_name is the name of the Node Manager. This name must be unique among all Node Managers running in the BRM network.

  • host_name is the host name or IP address of the computer running this node.

  • port_number is the port number of the computer running this node.
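
For example (the node names, host names, and port numbers here are illustrative only):

infmgr node_ptr  pinpc43_node  pinpc43  12950
infmgr node_ptr  creator_node  creator  12951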

For information about other entries in the configuration file, refer to the comments preceding each entry in the file.

Getting the Status of the Servers on All Nodes

You can use the testnap utility or the command-line interface to connect to System Manager and get information. To use the testnap utility, you first create a file to be used for the input flist. When testnap reads this file, it generates an flist and sends it to System Manager. In the example below, you create a file called nodes with two entries:

  • The first entry is the POID.

  • The second entry tells System Manager that you want to search all nodes.

To get the status of all the servers on all nodes, perform these steps:

  1. Edit your testnap configuration file to connect to System Manager:

    #
    - nap cm_ptr 11980                          #"well known port" for System Manager
    - nap cm_name creator                       # where System Manager runs
    - -   userid  0.0.0.1 /service/pcm_client 1 # userid
    #
    - nap login_type      1                     # type 1 is with password
    - nap login_name      root.0.0.0.1
    - nap login_pw        password
    #
      
    
  2. Create a file with these entries and save it:

    0 PIN_FLD_POID           POID [0] 0.0.0.1 /service 1 0
    0 PIN_FLD_TYPE           ENUM [0] 2
      
    
  3. Read the file you created with testnap:

    r filename 1
      
    Example:
    r nodes 1
      
    
  4. Use opcode 802 to get the status of all servers:

    xop 802 - 1
      
    

The returned result is:

0 PIN_FLD_POID           POID [0] 0.0.0.1 /service 1 0
0 PIN_FLD_RESULT         ENUM [0] 0
0 PIN_FLD_NODES         ARRAY [0] allocated 4, used 4
1     PIN_FLD_NODE_NAME       STR [0] "pinpc43_node"
1     PIN_FLD_RESULTS       ARRAY [0] allocated 2, used 2
2         PIN_FLD_SERVER_NAME     STR [0] "dm1"
2         PIN_FLD_STATUS         ENUM [0] 2
1     PIN_FLD_RESULTS       ARRAY [1] allocated 2, used 2
2         PIN_FLD_SERVER_NAME     STR [0] "cm1"
2         PIN_FLD_STATUS         ENUM [0] 2
1     PIN_FLD_RESULT         ENUM [0] 0
0 PIN_FLD_NODES         ARRAY [2] allocated 4, used 4
1     PIN_FLD_NODE_NAME       STR [0] "creator_node"
1     PIN_FLD_RESULTS       ARRAY [0] allocated 2, used 2
2         PIN_FLD_SERVER_NAME     STR [0] "dm2"
2         PIN_FLD_STATUS         ENUM [0] 2
1     PIN_FLD_RESULTS       ARRAY [1] allocated 2, used 2
2         PIN_FLD_SERVER_NAME     STR [0] "cm2"
2         PIN_FLD_STATUS         ENUM [0] 2
1     PIN_FLD_RESULT         ENUM [0] 0
  

The results show that:

  • BRM servers are configured under two nodes named pinpc43_node and creator_node.

  • Two servers named cm1 and dm1 are configured under node pinpc43_node.

  • Two servers named cm2 and dm2 are configured under node creator_node.

  • All servers are in the running state.