Diagnosing Some Common Problems with BRM

38 Diagnosing Some Common Problems with BRM

Learn about some common problems that you might find when troubleshooting Oracle Communications Billing and Revenue Management (BRM).

Problems Starting BRM Components

Problem: Bad Bind, Error 13

One of the log files (log or pinlog) for the DM or CM has a reference to "bad bind" and "errno 13".

Cause

The port number specified in the configuration file (dm_port or cm_ports entry) is incorrect.

Another possibility is that the port number is below 1024, and the CM, CMMP, or DM was not started as root. Processes that use port numbers below 1024 must be started as root. If you use a port number of 1024 or greater, you do not have to start the process as root.

Solution

Edit the configuration file for the component to specify an unassigned port number above 1023, such as 1950.
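
Before editing the configuration file, it can help to confirm which range a candidate port falls into. The following is a minimal sketch only; the function name classify_port is hypothetical and is not part of BRM.

```shell
#!/bin/sh
# classify_port PORT
# Hypothetical helper (not part of BRM): prints "privileged" for ports
# 1-1023, "unprivileged" for ports 1024-65535, and "invalid" otherwise.
classify_port() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac
    if [ "$1" -ge 1 ] && [ "$1" -le 1023 ]; then
        echo "privileged"
    elif [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]; then
        echo "unprivileged"
    else
        echo "invalid"
        return 2
    fi
}
```

For example, classify_port 1950 prints unprivileged, so a process bound to that port does not need to be started as root.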

Problem: Bad Bind, Error 125

The log file for the DM or CM has a reference to "bad bind" and "errno 125".

Cause

Duplicate port number. Some other process is already using the port.

Solution

Edit the configuration file for the component to specify an unassigned port number above 1023, such as 1950.
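
One way to check whether a port is already taken is to scan the listening sockets reported by netstat. The sketch below assumes netstat -an style output in which the local address ends in :PORT (Linux) or .PORT (Solaris); the function name port_is_listening is hypothetical.

```shell
#!/bin/sh
# port_is_listening PORT
# Hypothetical helper: reads "netstat -an" style output on standard input
# and succeeds (exit status 0) if a LISTEN line has PORT as its local port.
# Assumes the local address ends in ":PORT" (Linux) or ".PORT" (Solaris).
port_is_listening() {
    grep -E "[.:]$1[[:space:]].*LISTEN" >/dev/null
}
```

A possible use is to run netstat -an, pipe the output to port_is_listening 1950, and choose a different port if the function succeeds.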

Problem: Cannot Connect to Oracle Database

When you look at the processes running, you see the Oracle DM master and front ends running but no back end running.

Causes

The database name configuration entry (sm_database) for the DM points to the wrong database name. (The error message shows which database name the DM is trying to connect to.)
The Oracle password configuration entry (sm_pw) is missing.
The Oracle tnsnames file is missing or incorrect.
The oracle_sid or oracle_home environment variable is set incorrectly.
The Oracle DM is spawning too many back-end processes simultaneously for the IPC or BEQ protocol to handle.

Solutions

Enter the correct database name and Oracle user name and password for the BRM database in the configuration file for the Oracle DM and restart the DM.
Create a valid Oracle tnsnames file and check the environment variables.
If you are using the IPC or BEQ protocol, configure the Oracle DM to wait a specified amount of time before spawning or respawning a new back-end process. To do this, add the following entry to the Oracle DM configuration file (BRM_home/sys/dm_oracle/pin.conf), where BRM_home is the directory in which BRM is installed:
```
- dm dm_restart_delay DelayTime
```
where DelayTime is the amount of time, in microseconds, the Oracle DM should wait before spawning a new back-end process. Set DelayTime to the smallest possible time that fixes your connection problems. As a guideline, start with 1000000 microseconds (1 second) and then decrease the time until you find the optimal setting for your system.

Note:

Adding a delay increases the Oracle DM startup time.
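
To see which delay is currently configured, you can read the entry back from the configuration file. The following sketch assumes the entry layout shown above (a dash, the program name, the entry name, and the value, separated by whitespace); the helper names pin_conf_value and usec_to_seconds are hypothetical.

```shell
#!/bin/sh
# pin_conf_value FILE PROGRAM ENTRY
# Hypothetical helper: prints the value of the first uncommented
# "- PROGRAM ENTRY value" line in a pin.conf-style file.
pin_conf_value() {
    awk -v prog="$2" -v entry="$3" '
        $1 == "-" && $2 == prog && $3 == entry { print $4; exit }
    ' "$1"
}

# usec_to_seconds MICROSECONDS
# Converts a microsecond value to seconds with three decimal places.
usec_to_seconds() {
    awk -v us="$1" 'BEGIN { printf "%.3f\n", us / 1000000 }'
}
```

For example, passing the Oracle DM pin.conf file with the arguments dm and dm_restart_delay to pin_conf_value prints the configured DelayTime, and usec_to_seconds converts that value to seconds for a quick sanity check.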

Problem: Lost TCP Connections

Cause

BRM recognizes when an application closes a TCP connection. If the computer running the client application fails, however, the application might not close the TCP socket.

Solutions

In the pin.conf files for the CM and the Connection Manager Master Process (CMMP), the keepalive entry specifies whether to monitor the TCP connection. See "Enabling Java PCM Clients to Use Operating System TCP/IP Keepalive Parameters".

Note:

This entry should be set to avoid sockets not being closed properly due to network problems or hardware crashes.

The CM monitors the TCP connections by using the standard TCP keepalive feature. This lets you detect lost connections and clean up the CM and DM.

With the keepalive feature turned on, BRM uses the system's keepalive APIs to detect a lost connection and to try to reconnect, before closing the socket.

For more information on TCP keepalive options, see the TCP and keepalive documentation for your operating system.

Problem: Hung or Looping Processes

A hung process does not respond in a normal fashion.

A looping process uses CPU cycles without doing any useful work.

Cause

If the CM does not respond to a login attempt, one of the processes in the system might be hung.

If the CPU time for a process is increasing and is out of proportion to the rest of the processes, this might be a looping process.

Solutions

Check the status of the CM. See "Monitoring Connection Manager Activity". The CM should show a new connection. If the CM report shows that the CM is "waiting on DM," the DM might be hung. You can check the database by verifying that it responds to manual SQL commands.

To check the CPU time used by a process, enter the following command twice, separated by a 10- to 30-second interval (or as much as several minutes on a lightly loaded system):

ps -ef | grep process
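
Comparing the two samples is easier when the TIME column is converted to seconds. The sketch below assumes TIME values in the [[dd-]hh:]mm:ss form that ps commonly prints; the function names cpu_seconds and cpu_delta are hypothetical.

```shell
#!/bin/sh
# cpu_seconds TIME
# Hypothetical helper: converts a ps TIME value ([[dd-]hh:]mm:ss) to seconds.
cpu_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; t = $1
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# cpu_delta BEFORE AFTER
# Prints the CPU seconds consumed between two TIME samples.
cpu_delta() {
    echo $(( $(cpu_seconds "$2") - $(cpu_seconds "$1") ))
}
```

For example, if the first sample shows 00:01:05 and a second sample taken 30 seconds later shows 00:01:35, cpu_delta reports 30, which means the process used a full CPU for the whole interval and might be looping.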

Stopping a Hung or Looping Process

Note:

Before you stop a hung or looping DM or CM process, check its status at least twice at 30-second intervals (or up to several minutes on a lightly loaded system). For more information, see "Monitoring Data Manager Activity" or "Monitoring Connection Manager Activity".

Enter the following command to stop a hung or looping process:

kill -ABRT process_id

BRM stops the process and writes a core image file of the process. If you contact Oracle Technical Support about this problem, send the core file along with the relevant log files. (See "Contacting Technical Support".)

Problem: ORA-01502: Index 'PINPAP.I_EVENT_ITEM_OBJ__ID' or Partition of Such Index Is in Unusable State

While loading the CDRs using the direct path load option, an error stating that the index is in an unusable state occurs.

Cause

While IREL processes the CDRs using the direct path loading option, it updates the indexes. If another application, such as pin_monitor_balance, accesses the same index partition while the index is being updated, this error occurs.

Solution

Configure the dm_sql_retry entry in the pin.conf file. This is specified as an integer value that indicates the number of times an SQL statement is to be retried if this error occurs.

Note:

This parameter is optional. By default, the SQL statement is not retried if the error occurs.

Problems Stopping BRM Components

Problem: No Permission to Stop the Component

You run the stop script, but the script fails. You find a reference to "permission denied" in the log file for the component.

Cause

You do not have permission to stop the BRM system.

Solution

Problem: No pid File

You run the stop script, but the script fails. You find a reference to "no pid file" in the log file for the component.

Cause

BRM cannot find the .pid file.

Solution

Identify the process ID for the component you want to stop, and then stop the process manually. See "Starting and Stopping the BRM System".

Problems Connecting to BRM

Problem: Cannot Connect to the Database

When you try to start a client application, you get an error message advising you of "problems connecting to the database."

Cause

The CM might not be set to handle the number of current client sessions.

Solution

Set the cm_max_connects entry in the configuration file for the CM to a number larger than the number of client sessions you anticipate. Then restart the CM.

Problem: AMM Takes Longer Time to Connect to the Database

When you try to enable the Account Migration Manager (AMM) jobs by running the pin_amt utility, the Authentication lapse error is logged in the pin_amt.log file.

Cause

When enabling the migration jobs, AMM takes much longer than expected to connect to the Oracle database.

Solution

Do the following:

Open the BRM_home/bin/pin_amt file in a text editor.

Search for the following entries in the file:

$JAVA -DPIN_HOME=$PIN_DIR 
-DJAVA_OPTS="-Xmx2048m -Xms256m" com.portal.amt.AmtUI $*

Replace the entries with the following:

$JAVA -Djava.security.egd=file:/dev/./urandom 
-Dsecurerandom.source=file:/dev/./urandom 
-DPIN_HOME=$PIN_DIR -DJAVA_OPTS="-Xmx2048m -Xms256m" com.portal.amt.AmtUI $*

Save and close the file.

Problem: Cannot Connect to the CM

An application cannot connect to BRM, and the log file for the application (which might be default.pinlog in the current directory) shows the error "PIN_ERR_NAP_CONNECT_FAILED(27)."

Causes

The configuration file (pin.conf) for the application might be pointing to the wrong CM.
The CM is not running.
The CM is not set to handle this many connections.
No TCP sockets are available on the client or CM machine, perhaps because you used many sockets recently and the sockets have not been released from their two-minute wait period after the connections were closed.

Solutions

Open the configuration file for the application and check the entries that specify the CM.
Check for CM processes. See "Checking the Number and ID of a BRM Process".
Set the cm_max_connects entry in the configuration file for the CM to a number larger than the number of application sessions you anticipate. Then restart the CM.
Wait a few minutes to see if the sockets are freed up.

On Solaris: To see how many sockets are in use:
```
netstat -n -f inet -p tcp | wc -l
```
On Linux: To see how many sockets are in use:
```
netstat -n -A inet -t | wc -l
```
If the resulting number is close to 65535, there are too many socket connections for a single IP address on this machine.
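
A raw line count does not show how many of those sockets are still in the post-close wait period. As a rough sketch, the following function groups TCP sockets by state. It assumes Linux-style netstat output in which each socket line starts with tcp and ends with the state name; the function name count_tcp_states is hypothetical.

```shell
#!/bin/sh
# count_tcp_states
# Hypothetical helper: reads Linux-style "netstat -n -A inet -t" output on
# standard input and prints one "STATE count" line per TCP state, sorted
# by state name. Header lines are skipped because they do not start
# with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Piping the netstat command shown above into count_tcp_states gives a per-state breakdown. A large TIME_WAIT count suggests that recently closed sockets have not yet been released.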

Problem: CM Cannot Connect to a DM

You might find a message similar to the following:

DMfe #3: dropped connect from 111.122.123.1:45826, too full
W Thu Aug 06 13:58:05 2001 portalhost dm:17446 dm_front.c(1.47):1498

Cause

There are not enough connections allowed for the DM.

Solutions

Use the dm_max_per_fe parameter in the DM configuration file to increase the number of CM connections allowed.
Install and configure an additional DM.

Problem: Rated Event Loader Cannot Connect to the CM

If SSL is enabled in BRM, sometimes Rated Event Loader fails to connect to the CM with the following error message:

An error occurred while attempting to connect to the Infranet CM. 
Please validate the infranet.connection property value and ensure the CM is 
running.

Cause

There might be a heavy load on Rated Event Loader.

Solution

Do the following:

Create a copy of the BRM_home/wallet/client directory by running the following command:
```
cp -r BRM_home/wallet/client BRM_home/wallet/client_ssl
```
Open the BRM_home/apps/pin_rel/Infranet.properties file in a text editor.
Modify the infranet.pcp.ssl.wallet.location entry to point to the absolute path of the directory that you created in step 1:
```
infranet.pcp.ssl.wallet.location = BRM_home/wallet/client_ssl 
```

Problems with Deadlocking

Problem: BRM "Hangs" or Oracle Deadlocks

Your BRM system stops responding, or Oracle reports deadlocking messages.

Cause

The DM might have too few back ends for the type of BRM activity.

Solution

Configure the DM with more back ends. For example, provide at least two DM back ends for each customer service representative. For more guidelines on setting the number of back ends, see "Improving Data Manager and Queue Manager Performance".

Problem: dm_oracle Cannot Connect to the Oracle Database

The Oracle DM (dm_oracle) waits indefinitely for a response from the Oracle database.

Cause

If there is a problem with the Oracle database, dm_oracle might stop responding when it attempts to connect to the database.

Solution

Set the database_request_timeout_duration parameter in the dm_oracle configuration file (BRM_home/sys/dm_oracle/pin.conf):

- dm database_request_timeout_duration milliseconds

where milliseconds is the number of milliseconds the DM waits for a response. For example:

- dm database_request_timeout_duration 10000

If the database does not respond during the wait period and you are using Oracle RAC, the DM times out and then makes one attempt to connect to another Oracle database instance.

If this pin.conf parameter is not specified or is set to 0, the connection attempt does not time out.

Note:

If you are using a single database or a multischema system without Oracle RAC, the DM attempts to connect to the same database schema again. In this case, the timeout setting is useful only if you are experiencing temporary network problems.
If you are using Oracle RAC, the tnsnames.ora file must be configured correctly for the reconnection to work.

Problems with Memory Management

Problem: Out of Memory

The DM will not start, and the error log file for the DM refers to "bad shmget" or "bad shmat" and "errno 12." Or, when the system is running, the CM or an application shows the error "PIN_ERR_NO_MEM" in its log file.

Causes

The DM or another queuing-based daemon did not have enough shared memory to complete the operation. This is caused by one or more of the following conditions:

Other processes are using all of the shared memory.
There are too many CM processes.
There are memory leaks in the CM or its FMs.
On Solaris: The shared memory segment allocated by one of the DM processes has not been cleaned up properly, leaving a sizeable chunk of memory allocated but unused. This condition, rare in normal operation, can be caused by the following activities:
- Repeated starting and stopping of the system.
- Stopping the DMs manually, especially by using kill -9.
On Solaris: The shared memory configuration for the system is less than the shared memory set for BRM.

Solutions

To check for memory leaks, use ps with the vsz flag at two or more intervals to see changes in shared memory.

On Solaris:

ps -eo pid,vsz,f,s,osz,pmem,comm | egrep 'cm|dm|application'

On Linux:

ps -eo pid,vsz,f,s,sz,pmem,comm | egrep 'cm|dm|application'

where application is the optional name of an additional process to match.

For a CM, vsz should grow only until an operation has passed through the CM and then stay constant. For example, if vsz is growing during billing, there is a memory leak.
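
Once several vsz samples have been collected, a crude check for steady growth can be automated. The sketch below is a heuristic only: it reads one vsz value per line and reports growing only if every sample is larger than the one before it. The function name vsz_trend is hypothetical.

```shell
#!/bin/sh
# vsz_trend
# Hypothetical helper: reads vsz samples (one number per line, oldest
# first) on standard input. Prints "growing" if each sample is strictly
# larger than the previous one, "not growing" otherwise, and
# "insufficient data" if fewer than two samples are supplied.
vsz_trend() {
    awk '
        NR > 1 && $1 + 0 <= prev { not_growing = 1 }
        { prev = $1 + 0 }
        END {
            if (NR < 2) print "insufficient data"
            else if (not_growing) print "not growing"
            else print "growing"
        }'
}
```

Samples can be gathered by running ps -o vsz= -p with the process ID at regular intervals and appending each result to a file, which is then passed to vsz_trend on standard input. A result of growing over a long billing run is a reason to investigate further, not proof of a leak.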

To check for and clean up unused memory on Solaris:

Stop all DM processes. See "Starting and Stopping the BRM System".
Confirm that there are no DM processes running. See "Checking the Number and ID of a BRM Process".
Run df -k to check swap space usage. Confirm that the available space is very low.
Run ipcs -ma to show the shared memory segments that have been allocated but not used recently. A shared memory segment is probably abandoned when you see the following conditions:
- Number of attaches (NATTCH) is 0
- KEY is 0 (and not using a special dm_shmkey)
- Creator process ID (CPID) is gone
- Last detach time (DTIME) has a value
Run ipcrm -m segment_id on each of the unused segments to free up the space.
Run df -k again to confirm that the available swap space has been cleared.
Stop and restart the DM processes.

To increase the system shared memory on Solaris, open the /etc/system file and set the shminfo_shmmax configuration parameter to a value greater than the value of dm_shmsize in the DM configuration file (pin.conf). Stop and restart the computer.

Example /etc/system file for a 64 MB system:

set shmsys:shminfo_shmmax=37748736
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmns=200
set semsys:seminfo_semmni=70

In this example, the shared memory segment has been set to 36 MB (1048576 times 36).
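
The arithmetic behind the shminfo_shmmax value can be reproduced in the shell. This is a small sketch; the function names mb_to_bytes and shmmax_exceeds are hypothetical.

```shell
#!/bin/sh
# mb_to_bytes MEGABYTES
# Converts megabytes to bytes (1 MB = 1048576 bytes).
mb_to_bytes() {
    echo $(( $1 * 1048576 ))
}

# shmmax_exceeds SHMMAX DM_SHMSIZE
# Succeeds if the system shared memory maximum is greater than the
# dm_shmsize value taken from the DM configuration file.
shmmax_exceeds() {
    [ "$1" -gt "$2" ]
}
```

For example, mb_to_bytes 36 prints 37748736, which matches the shminfo_shmmax value in the example file.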

To check for and clean up unused memory on Linux:

Stop all DM processes. See "Starting and Stopping the BRM System".
Confirm that there are no DM processes running. See "Checking the Number and ID of a BRM Process".
Run df -k to check swap space usage. Confirm that the available space is very low.
Run ipcs -ma to show the shared memory segments that have been allocated but not used recently.
Run ipcs -mac to show the shared memory segments that have been allocated along with the corresponding user information.
Run ipcs -mat to show the shared memory segments that have been allocated along with the detach timing information.
Note:

In steps 4, 5, and 6, a shared memory segment is probably abandoned when you see the following conditions:
- Number of attaches (NATTCH) is 0
- KEY is 0 (and not using a special dm_shmkey)
- Creator process ID (CPID) is gone
- Last detach time (DTIME) has a value
Run ipcrm -m segment_id on each of the unused segments to free up the space.
Run df -k again to confirm that the available swap space has been cleared.
Stop and restart the DM processes.
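
The first two conditions in the note (no attaches and a key of 0) can be filtered mechanically. The following sketch assumes the default Linux ipcs -m column layout (key, shmid, owner, perms, bytes, nattch, status); the function name abandoned_shm_candidates is hypothetical. It only lists candidate segment IDs and removes nothing.

```shell
#!/bin/sh
# abandoned_shm_candidates
# Hypothetical helper: reads Linux "ipcs -m" output on standard input and
# prints the shmid of every segment whose key is 0x00000000 and whose
# attach count (nattch, column 6) is 0. Assumes the default column layout:
# key shmid owner perms bytes nattch status
abandoned_shm_candidates() {
    awk '$1 == "0x00000000" && $6 == 0 { print $2 }'
}
```

Review each listed segment against the remaining conditions (creator process gone, detach time set) before running ipcrm -m on it.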

To increase the system shared memory on Linux, open the /etc/sysctl.conf file and set the kernel.shmmax parameter to a value greater than the value of dm_shmsize in the DM configuration file (pin.conf). Stop and restart the computer.

Problem: Java Out of Memory Error

When using GUI applications such as Developer Center or batch applications such as Invoice formatter, you might sometimes receive "java.lang.OutOfMemoryError: Java heap space" error messages.

Cause

The Java application does not have enough memory to complete the operation.

Solution

Increase the maximum heap size used by the Java Virtual Machine (JVM). The exact amount varies greatly with your needs and system resources.

The heap size is controlled by the -Xmx size entry in the Java application startup script. By default, the -Xmx size entry is not present in the startup line. To increase the maximum heap size, add this entry and a number (in megabytes) to the application startup line. The following example adds a 1024 MB maximum heap size to the class:

java -Xmx1024m class

Note:

Increasing the heap size may degrade the performance of other processes if insufficient resources are available. You must adjust the heap size based on your application needs and within your system's limits.

Problem: Memory Problems with the Oracle DM

The error log file for the DM for your Oracle database refers to "No memory for...", such as "No memory for list in pini_flist_grow." You suspect memory problems, but your system has sufficient memory for the environment.

Cause

The DM is not configured to use sufficient shared memory.

Solution

Open the DM configuration file (BRM_home/sys/dm_oracle/pin.conf).
Increase the size of the dm_bigsize and dm_shmsize parameters. Follow the guidelines in the configuration file for editing these entries.
Save the configuration file.
Stop and restart the DM.

Problems Running Billing

Problem: Column Size Exceeded for EVENT_POID_LIST

When BRM has events, like payments or adjustments, that transfer items, you may receive an error indicating that the maximum size of the EVENT_POID_LIST column has been exceeded in the ITEM_T, BILLINFO_T, or AU_BILLINFO_T table. For example, you may see this error when running PCM_OP_PAYMENT_COLLECT.

Cause

The standard length of the EVENT_POID_LIST column in the ITEM_T, BILLINFO_T, and AU_BILLINFO_T tables is 4000 characters. Some pricing configurations create values that exceed this limit. The most likely table to generate this error is ITEM_T.

Solution

You can solve this problem by increasing the size of the column in the relevant table or tables.

You must run the following SQL commands before you run any of the table-specific commands below. You only need to run this set of commands once per schema. These commands must be run as a database user with sysdba privileges.

SHUTDOWN IMMEDIATE;
STARTUP UPGRADE;
ALTER SYSTEM SET max_string_size=extended;
@?/rdbms/admin/utl32k.sql
SHUTDOWN IMMEDIATE;
STARTUP;

In addition to the commands above, you must run the following commands once per schema. These commands should be run as the schema user, for example pin01:

update dd_objects_fields_t set length=32000 where FIELD_NAME='PIN_FLD_EVENT_POID_LIST';
commit;
update search_t set flags=0 where POID_ID0=701;
commit;

Run the following sets of commands depending on the table or tables that are causing the error. All of the commands below should be run as the schema user, for example pin01:

For errors from ITEM_T:

alter table item_t add epl_copy varchar2(32000);
update item_t set epl_copy = EVENT_POID_LIST;
update item_t set EVENT_POID_LIST = null;
commit;
alter table item_t modify EVENT_POID_LIST long;
alter table item_t modify EVENT_POID_LIST clob;
update item_t set EVENT_POID_LIST = epl_copy;
alter table item_t drop column epl_copy;
commit;

For errors from BILLINFO_T:

alter table billinfo_t add epl_copy varchar2(32000);
update billinfo_t set epl_copy = EVENT_POID_LIST;
update billinfo_t set EVENT_POID_LIST = null;
commit;
alter table billinfo_t modify EVENT_POID_LIST long;
alter table billinfo_t modify EVENT_POID_LIST clob;
update billinfo_t set EVENT_POID_LIST = epl_copy;
alter table billinfo_t drop column epl_copy;
commit;

For errors from AU_BILLINFO_T:

alter table au_billinfo_t add epl_copy varchar2(32000);
update au_billinfo_t set epl_copy = EVENT_POID_LIST;
update au_billinfo_t set EVENT_POID_LIST = null;
commit;
alter table au_billinfo_t modify EVENT_POID_LIST long;
alter table au_billinfo_t modify EVENT_POID_LIST clob;
update au_billinfo_t set EVENT_POID_LIST = epl_copy;
alter table au_billinfo_t drop column epl_copy;
commit;
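
Because the three command sets differ only in the table name, they can be generated from a single template. The sketch below prints the statements for review and does not connect to the database; the function name epl_to_clob_sql is hypothetical.

```shell
#!/bin/sh
# epl_to_clob_sql TABLE
# Hypothetical helper: prints the statements shown above for one of the
# three affected tables. It only writes SQL text to standard output.
epl_to_clob_sql() {
    case "$1" in
        item_t|billinfo_t|au_billinfo_t) ;;
        *) echo "unsupported table: $1" >&2; return 1 ;;
    esac
    cat <<EOF
alter table $1 add epl_copy varchar2(32000);
update $1 set epl_copy = EVENT_POID_LIST;
update $1 set EVENT_POID_LIST = null;
commit;
alter table $1 modify EVENT_POID_LIST long;
alter table $1 modify EVENT_POID_LIST clob;
update $1 set EVENT_POID_LIST = epl_copy;
alter table $1 drop column epl_copy;
commit;
EOF
}
```

For example, redirecting the output of epl_to_clob_sql item_t to a file produces a script that can be reviewed and then run as the schema user.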

Problem: Billing Daemons Are Running, but Nothing Happens

Even though the billing processes are running, BRM is not producing billing data.

Cause

There are too few back ends for the DM. Because billing daemons run in parallel, you must have at least one DM back end for each billing program thread, plus one back end for the main thread searches.

Solution

Edit the dm_n_be entry in the DM configuration file (pin.conf) to add more back ends to the DM, and then stop and restart the DM. See "Configuring DM Front Ends and Back Ends".
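
The sizing rule in the cause (one back end per billing thread, plus one for the main thread searches) can be expressed as a quick check. This is a sketch only; the function names min_backends and backends_sufficient are hypothetical, and the thread count is whatever you have configured for the billing program.

```shell
#!/bin/sh
# min_backends THREADS
# Prints the minimum number of DM back ends for THREADS billing threads:
# one per thread plus one for the main thread searches.
min_backends() {
    echo $(( $1 + 1 ))
}

# backends_sufficient DM_N_BE THREADS
# Succeeds if the configured dm_n_be value meets that minimum.
backends_sufficient() {
    [ "$1" -ge "$(min_backends "$2")" ]
}
```

For example, with 8 billing threads, min_backends 8 prints 9, so a dm_n_be value of 8 would be too low.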

Problem: High CPU Usage for the Number of Accounts Processed

Running the billing scripts puts an inordinately heavy load on the computer, and processing the accounts takes a long time.

Cause

An index is missing or unbalanced; or in Oracle, an index is in the CHOOSE Optimizer mode and statistics are out of date.

Solution

Rebuild the BRM indexes before you run the billing scripts. See "Rebuilding Indexes" in BRM Installation Guide.

Problems Creating Accounts

Problem: fm_delivery_mail_sendmsgs Error Reported in the CM Log File

Cause

BRM is trying to send a welcome email message, but the Email DM (dm_email) is not running.

Solution

Start the Email DM, or disable the welcome email message.

To start the Email DM, see "Sending Email to Customers Automatically" in BRM Managing Customers.
To disable the welcome message, see "Setting up Welcome Messages to Customers" in BRM Managing Customers.

Problems Loading Configuration Objects

Problem: Failed to create XML context in isXsltExists, error [266]

Cause

The load_config utility tried to load the contents of XML configuration files into configuration (/config/*) objects in the BRM database, but the contents are not loaded.

Solution

Set the ORACLE_HOME environment variable to the BRM database client library path; for example, /tools/CGBU/contrib/Linux/x86_64/packages/oracle/db/12.2.0.1.0.

Problems During BRM Upgrade

Problem: The BRM root passwords stored in the wallet and /service/pcm object are not matching

A validation error appears during the BRM installation for upgrade because the BRM root password is incorrect.

Cause

You have provided an incorrect BRM root password during the BRM installation for upgrade.

Solution

Change the incorrect BRM root password in the BRM client wallet by using the pin_config_editor utility. See "Changing Incorrect BRM Root Password" for more information.

Note:

You can also change the BRM root password in the BRM client wallet by using the PCM_OP_CUST_UPDATE_SERVICES opcode. For more information on the opcode, see BRM Opcode Guide.

Changing Incorrect BRM Root Password

To change the incorrect BRM root password:

Go to the BRM_home/bin directory.
Run the following command:
```
pin_config_editor -setconf -wallet clientWalletLocation -parameter -.login_pw -pwd
```
where clientWalletLocation is the path to the BRM client wallet.
At the command prompt, enter the existing BRM root password.

Note:

Ensure that you provide only the existing BRM root password. You can retrieve the existing BRM root password by using the pin_crypt_app utility. See "Retrieving Configuration Entries from Client Wallet for Java PCM Applications" for more information.

Enter the wallet password.

Run the following command:

pin_config_editor -setconf -wallet clientWalletLocation -parameter infranet.connection

At the command prompt, enter values listed in Table 38-1.

Table 38-1 BRM Connection Information

Field	Description
User Name	The user name for connecting to BRM.
Password	The BRM user's password.
Host Name	The IP address or the host name of the machine on which the primary BRM Connection Manager (CM) or CM Master Process (CMMP) is running.
Port Number	The TCP port number of the CM or CMMP on the host computer. The default value is 11960.
Service Type	The BRM service type. The default value is /service/admin_client.
Service POID Id	The POID of the BRM service. The default value is 1.
Wallet Password	The password for the BRM wallet.