38 Diagnosing Some Common Problems with BRM

Learn about some common problems that you might find when troubleshooting Oracle Communications Billing and Revenue Management (BRM).

Problems Starting BRM Components

Problem: Bad Bind, Error 13

One of the log files (log or pinlog) for the DM or CM has a reference to "bad bind" and "errno 13".

Cause

The port number specified in the configuration file (dm_port or cm_ports entry) is incorrect.

Another possibility is that the port number is below 1024, and the CM, CMMP, or DM was not started as root. Processes that bind to port numbers below 1024 must be started as root. If you use a port number of 1024 or greater, you do not have to start the process as root.

Solution

Edit the configuration file for the component to specify an unassigned port number above 1023, such as 1950.

Problem: Bad Bind, Error 125

The log file for the DM or CM has a reference to "bad bind" and "errno 125".

Cause

Duplicate port number. Some other process is already using the port.

Solution

Edit the configuration file for the component to specify an unassigned port number above 1023, such as 1950.
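
Before you edit the configuration file, it can help to confirm which of the two bind failures a candidate port would produce. The following Python sketch is not part of BRM; the function name check_bind is hypothetical. It tries to bind a TCP socket to the port and classifies the result by symbolic error name, because the numeric value of the "address in use" error differs between operating systems (for example, it is 98 on Linux).

```python
import errno
import socket


def check_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to (host, port) and classify the result."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EACCES:
            # Privileged port and the caller is not root (errno 13).
            return "permission_denied"
        if exc.errno == errno.EADDRINUSE:
            # Another process already holds the port.
            return "in_use"
        raise
    finally:
        sock.close()
    return "free"
```

For example, check_bind(1950) returns "free" when the example port from the text is unassigned on the local machine.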

Problem: Cannot Connect to Oracle Database

When you look at the processes running, you see the Oracle DM master and front ends running but no back end running.

Causes

  • The database name configuration entry (sm_database) for the DM points to the wrong database name. (The error message shows which database name the DM is trying to connect to.)

  • The Oracle password configuration entry (sm_pw) is missing.

  • The Oracle tnsnames file is missing or incorrect.

  • The ORACLE_SID or ORACLE_HOME environment variable is set incorrectly.

  • The Oracle DM is spawning too many back-end processes simultaneously for the IPC or BEQ protocol to handle.

Solutions

  • Enter the correct database name and Oracle user name and password for the BRM database in the configuration file for the Oracle DM and restart the DM.

  • Create a valid Oracle tnsnames file and check the environment variables.

  • If you are using the IPC or BEQ protocol, configure the Oracle DM to wait a specified amount of time before spawning or respawning a new back-end process. To do this, add the following entry to the Oracle DM configuration file (BRM_home/sys/dm_oracle/pin.conf), where BRM_home is the directory in which BRM is installed:

    - dm dm_restart_delay DelayTime 

    where DelayTime is the amount of time, in microseconds, the Oracle DM should wait before spawning a new back-end process. Set DelayTime to the smallest possible time that fixes your connection problems. As a guideline, start with 1000000 microseconds (1 second) and then decrease the time until you find the optimal setting for your system.

    Note:

    Adding a delay increases the Oracle DM startup time.

Problem: Lost TCP Connections

Cause

BRM recognizes when an application closes a TCP connection. If the computer running the client application fails, however, the application might not close the TCP socket.

Solutions

In the pin.conf files for the CM and the Connection Manager Master Process (CMMP), the keepalive entry specifies whether to monitor the TCP connection. See "Enabling Java PCM Clients to Use Operating System TCP/IP Keepalive Parameters".

Note:

Set this entry so that sockets left open by network problems or hardware crashes are detected and closed.

The CM monitors the TCP connections by using the standard TCP keepalive feature. This lets you detect lost connections and clean up the CM and DM.

With the keepalive feature turned on, BRM uses the system's keepalive APIs to detect a lost connection and to try to reconnect, before closing the socket.

For more information on TCP keepalive options, see the TCP and keepalive documentation for your operating system.
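
To illustrate what the standard TCP keepalive feature looks like at the operating-system level, the following Python sketch enables the SO_KEEPALIVE socket option on an ordinary TCP socket. It is a generic socket example under the assumption that the keepalive entry maps to this OS feature; it does not show how the CM implements it, and the function name is hypothetical.

```python
import socket


def open_keepalive_socket():
    """Create a TCP socket with the OS keepalive feature switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to probe an idle connection and report a dead peer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = open_keepalive_socket()
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", enabled)
sock.close()
```

How long the OS waits before probing, and how many probes it sends, are system-wide settings described in the operating system documentation.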

Problem: Hung or Looping Processes

A hung process does not respond in a normal fashion.

A looping process uses CPU cycles without doing any useful work.

Cause

If the CM does not respond to a login attempt, one of the processes in the system might be hung.

If the CPU time for a process is increasing and is out of proportion to the rest of the processes, this might be a looping process.

Solutions

Check the status of the CM. See "Monitoring Connection Manager Activity". The CM should show a new connection. If the CM report shows that the CM is "waiting on DM," the DM might be hung. You can check the database by verifying that it responds to manual SQL commands.

To check the CPU time used by a process, enter the following command twice, separated by a 10- to 30-second interval (or as much as several minutes on a lightly loaded system):

ps -ef | grep process 
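
Comparing the two samples means converting the TIME column of the ps output to seconds and dividing the difference by the interval. The following Python sketch shows that arithmetic; the function names are hypothetical, and the accepted formats (D-HH:MM:SS, HH:MM:SS, and M:SS) are an assumption about what your version of ps prints.

```python
def cpu_seconds(time_field):
    """Convert a ps TIME value such as '1-02:03:04', '02:03:04', or '3:04' to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in time_field.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_share(first, second, interval_seconds):
    """Fraction of one CPU used between two samples taken interval_seconds apart."""
    return (cpu_seconds(second) - cpu_seconds(first)) / interval_seconds
```

A share close to 1.0 for a process that should be idle suggests a looping process; a share of 0 for a process that should be working suggests a hung one.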

Stopping a Hung or Looping Process

Note:

Before you stop a hung or looping DM or CM process, check its status at least twice at 30-second intervals (or up to several minutes on a lightly loaded system). For more information, see "Monitoring Data Manager Activity" or "Monitoring Connection Manager Activity".

Enter the following command to stop a hung or looping process:

kill -ABRT process_id 

BRM stops the process and writes a core image file of the process. If you contact Oracle Technical Support about this problem, send the core file along with the relevant log files. (See "Contacting Technical Support".)

Problem: ORA-01502: Index 'PINPAP.I_EVENT_ITEM_OBJ__ID' or Partition of Such Index Is in Unusable State

While you load CDRs by using the direct path load option, an error occurs stating that the index is in an unusable state.

Cause

While IREL processes the CDRs using the direct path loading option, it updates the indexes. However, while an index is being updated, another application, such as pin_monitor_balance, might access the same index partition.

Solution

Configure the dm_sql_retry entry in the pin.conf file. This is specified as an integer value that indicates the number of times an SQL statement is to be retried if this error occurs.

Note:

This parameter is optional. By default, the SQL statement is not retried if the error occurs.
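
The meaning of a retry count can be sketched as follows. This Python example only illustrates the semantics described above (a count of 0 means the statement is attempted once and never retried); it is not the DM implementation, and IndexUnusableError and run_with_retry are hypothetical names.

```python
class IndexUnusableError(Exception):
    """Stand-in for an ORA-01502 failure; not a real database driver exception."""


def run_with_retry(statement, retries):
    """Run statement(); on IndexUnusableError, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return statement()
        except IndexUnusableError:
            if attempt >= retries:
                raise  # retry budget exhausted, report the error
            attempt += 1
```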

Problems Stopping BRM Components

Problem: No Permission to Stop the Component

You run the stop script, but the script fails. You find a reference to "permission denied" in the log file for the component.

Cause

You do not have permission to stop the BRM system.

Solution

Log in as root or as the user who started the BRM system.

Problem: No pid File

You run the stop script, but the script fails. You find a reference to "no pid file" in the log file for the component.

Cause

BRM cannot find the .pid file.

Solution

Identify the process ID for the component you want to stop, and then stop the process manually. See "Starting and Stopping the BRM System".

Problems Connecting to BRM

Problem: Cannot Connect to the Database

When you try to start a client application, you get an error message advising you of "problems connecting to the database."

Cause

The CM might not be set to handle the number of current client sessions.

Solution

Set the cm_max_connects entry in the configuration file for the CM to a number larger than the number of client sessions you anticipate. Then restart the CM.

Problem: AMM Takes Longer Time to Connect to the Database

When you try to enable the Account Migration Manager (AMM) jobs by running the pin_amt utility, the Authentication lapse error is logged in the pin_amt.log file.

Cause

When enabling the migration jobs, AMM takes much longer than expected to connect to the Oracle database.

Solution

Do the following:

  1. Open the BRM_home/bin/pin_amt file in a text editor.

  2. Search for the following entries in the file:

    $JAVA -DPIN_HOME=$PIN_DIR 
    -DJAVA_OPTS="-Xmx2048m -Xms256m" com.portal.amt.AmtUI $*
    
  3. Replace the entries with the following:

    $JAVA -Djava.security.egd=file:/dev/./urandom 
    -Dsecurerandom.source=file:/dev/./urandom 
    -DPIN_HOME=$PIN_DIR -DJAVA_OPTS="-Xmx2048m -Xms256m" com.portal.amt.AmtUI $*
    
  4. Save and close the file.

Problem: Cannot Connect to the CM

An application cannot connect to BRM, and the log file for the application (which might be default.pinlog in the current directory) shows the error "PIN_ERR_NAP_CONNECT_FAILED(27)."

Causes

  • The configuration file (pin.conf) for the application might be pointing to the wrong CM.

  • The CM is not running.

  • The CM is not set to handle this many connections.

  • No TCP sockets are available on the client or CM machine, perhaps because you used many sockets recently and the sockets have not been released from their two-minute wait period after the connections were closed.

Solutions

  • Open the configuration file for the application and check the entries that specify the CM.

  • Check for CM processes. See "Checking the Number and ID of a BRM Process".

  • Set the cm_max_connects entry in the configuration file for the CM to a number larger than the number of application sessions you anticipate. Then restart the CM.

  • Wait a few minutes to see if the sockets are freed up.

    On Solaris: To see how many sockets are in use:

    netstat -n -f inet -p tcp | wc -l

    On Linux: To see how many sockets are in use:

    netstat -n -A inet -t | wc -l
    

    If the resulting number is close to 65535, there are too many socket connections for a single IP address on this machine.
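
Because wc -l also counts the header lines that netstat prints, a slightly more careful count can be sketched in Python. The example assumes the Linux output format, in which each socket line starts with "tcp"; the function names and the 90 percent margin are hypothetical choices, not BRM settings.

```python
PORT_LIMIT = 65535  # upper bound mentioned in the text


def count_tcp_sockets(netstat_output):
    """Count lines that describe a TCP socket, skipping netstat's header lines."""
    return sum(1 for line in netstat_output.splitlines()
               if line.lstrip().lower().startswith("tcp"))


def near_limit(count, margin=0.9):
    """True when the socket count is within the given fraction of the limit."""
    return count >= PORT_LIMIT * margin


sample = """Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:11960          10.0.0.9:45826          ESTABLISHED
tcp        0      0 10.0.0.5:11960          10.0.0.9:45830          TIME_WAIT
"""
print(count_tcp_sockets(sample), near_limit(count_tcp_sockets(sample)))
```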

Problem: CM Cannot Connect to a DM

You might find a message similar to the following:

DMfe #3: dropped connect from 111.122.123.1:45826, too full
W Thu Aug 06 13:58:05 2001 portalhost dm:17446 dm_front.c(1.47):1498

Cause

There are not enough connections allowed for the DM.

Solution

  • Use the dm_max_per_fe parameter in the DM configuration file to increase the number of CM connections allowed.

  • Install and configure an additional DM.

Problem: Rated Event Loader Cannot Connect to the CM

If SSL is enabled in BRM, sometimes Rated Event Loader fails to connect to the CM with the following error message:

An error occurred while attempting to connect to the Infranet CM. 
Please validate the infranet.connection property value and ensure the CM is 
running.

Cause

There might be a heavy load on Rated Event Loader.

Solution

Do the following:

  1. Create a copy of the BRM_home/wallet/client directory by running the following command:

    cp -r BRM_home/wallet/client BRM_home/wallet/client_ssl
    
  2. Open the BRM_home/apps/pin_rel/Infranet.properties file in a text editor.

  3. Modify the infranet.pcp.ssl.wallet.location entry to point to the absolute path of the directory that you created in step 1:

    infranet.pcp.ssl.wallet.location = BRM_home/wallet/client_ssl 
    

Problems with Deadlocking

Problem: BRM "Hangs" or Oracle Deadlocks

Your BRM system stops responding, or Oracle reports deadlocking messages.

Cause

The DM might have too few back ends for the type of BRM activity.

Solution

Configure the DM with more back ends. For example, provide at least two DM back ends for each customer service representative. For more guidelines on setting the number of back ends, see "Improving Data Manager and Queue Manager Performance".

Problem: dm_oracle Cannot Connect to the Oracle Database

The Oracle DM (dm_oracle) waits indefinitely for a response from the Oracle database.

Cause

If there is a problem with the Oracle database, dm_oracle might stop responding when it attempts to connect to the database.

Solution

Set the database_request_timeout_duration parameter in the dm_oracle configuration file (BRM_home/sys/dm_oracle/pin.conf):

- dm database_request_timeout_duration milliseconds

where milliseconds is the number of milliseconds the DM waits for a response. For example:

- dm database_request_timeout_duration 10000

If the database does not respond during the wait period and you are using Oracle RAC, the DM times out and then makes one attempt to connect to another Oracle database instance.

If this pin.conf parameter is not specified or is set to 0, the connection attempt does not time out.

Note:

  • If you are using a single database or a multischema system without Oracle RAC, the DM attempts to connect to the same database schema again. In this case, the timeout setting is useful only if you are experiencing temporary network problems.

  • If you are using Oracle RAC, the tnsnames.ora file must be configured correctly for the reconnection to work.
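
The timeout rules above (milliseconds, with a missing entry or 0 meaning no timeout) can be checked against a pin.conf file with a small script. The following Python sketch assumes only the "- dm name value" entry layout shown in this section; the function names are hypothetical and the parser ignores everything else in the file.

```python
def read_dm_entry(pin_conf_text, name):
    """Return the value of a '- dm <name> <value>' entry, or None if absent."""
    for line in pin_conf_text.splitlines():
        fields = line.split()
        # Commented-out entries start with '#', so they fail the first check.
        if (len(fields) >= 4 and fields[0] == "-"
                and fields[1] == "dm" and fields[2] == name):
            return fields[3]
    return None


def request_timeout_seconds(pin_conf_text):
    """Timeout in seconds, or None when the connection attempt never times out."""
    raw = read_dm_entry(pin_conf_text, "database_request_timeout_duration")
    if raw is None or int(raw) == 0:
        return None
    return int(raw) / 1000.0
```

With the example entry from this section, request_timeout_seconds returns 10.0.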

Problems with Memory Management

Problem: Out of Memory

The DM will not start, and the error log file for the DM refers to "bad shmget" or "bad shmat" and "errno 12." Or, when the system is running, the CM or an application shows the error "PIN_ERR_NO_MEM" in its log file.

Causes

The DM or another queuing-based daemon did not have enough shared memory to complete the operation. This is caused by one or more of the following conditions:

  • Other processes are using all of the shared memory.

  • There are too many CM processes.

  • There are memory leaks in the CM or its FMs.

  • On Solaris: The shared memory segment allocated by one of the DM processes has not been cleaned up properly, leaving a sizeable chunk of memory allocated but unused. This condition, rare in normal operation, can be caused by the following activities:

    • Repeated starting and stopping of the system.

    • Stopping the DMs manually, especially by using kill -9.

  • On Solaris: The shared memory configuration for the system is less than the shared memory set for BRM.

Solutions

To check for memory leaks, use ps with the vsz flag at two or more intervals to see changes in the virtual memory size of each process.

On Solaris:

ps -eo pid,vsz,f,s,osz,pmem,comm | egrep 'cm|dm|application'

On Linux:

ps -eo pid,vsz,f,s,sz,pmem,comm | egrep 'cm|dm|application'

where application is the name of the application process that you want to check.

For a CM, vsz should grow only until an operation has passed through the CM and then stay constant. For example, if vsz is growing during billing, there is a memory leak.
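
The rule for reading the samples can be written down as a small check. In this Python sketch, the function name, the minimum of three samples, and the tolerance parameter are hypothetical choices made for illustration; the values are vsz readings for one process, in the order they were taken.

```python
def vsz_keeps_growing(samples_kb, tolerance_kb=0):
    """True if every sample exceeds the previous one by more than tolerance_kb."""
    if len(samples_kb) < 3:
        return False  # too few samples to call it a trend
    return all(later - earlier > tolerance_kb
               for earlier, later in zip(samples_kb, samples_kb[1:]))


# Growth that levels off is the expected pattern for a CM.
print(vsz_keeps_growing([1000, 1400, 1400, 1400]))
# Growth at every sample is the pattern that suggests a leak.
print(vsz_keeps_growing([1000, 1200, 1400, 1600]))
```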

To check for and clean up unused memory on Solaris:

  1. Stop all DM processes. See "Starting and Stopping the BRM System".

  2. Confirm that there are no DM processes running. See "Checking the Number and ID of a BRM Process".

  3. Run df -k to check swap space usage. Confirm that the available space is very low.

  4. Run ipcs -ma to show the shared memory segments that have been allocated but not used recently. A shared memory segment is probably abandoned when you see the following conditions:

    • Number of attaches (NATTCH) is 0

    • KEY is 0 (and not using a special dm_shmkey)

    • Creator process ID (CPID) is gone

    • Last detach time (DTIME) has a value

  5. Run ipcrm -m segment_id on each of the unused segments to free up the space.

  6. Run df -k again to confirm that the available swap space has been cleared.

  7. Stop and restart the DM processes.

To increase the system shared memory on Solaris, open the /etc/system file and set the shminfo_shmmax configuration parameter to a value greater than the value of dm_shmsize in the DM configuration file (pin.conf). Stop and restart the computer.

Example /etc/system file for a 64 MB system:

set shmsys:shminfo_shmmax=37748736
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmns=200
set semsys:seminfo_semmni=70

In this example, the shared memory segment has been set to 36 MB (1048576 times 36).
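
The arithmetic behind the example value, and the requirement that it exceed dm_shmsize, can be verified with a few lines of Python. The function names are hypothetical, and the 32 MB dm_shmsize used in the usage lines is an assumed value for illustration, not a BRM default.

```python
MB = 1048576  # bytes in one megabyte


def shmmax_for(megabytes):
    """Value to use for shminfo_shmmax for a segment of the given size."""
    return megabytes * MB


def shmmax_is_sufficient(shmmax_bytes, dm_shmsize_bytes):
    """The text asks for shminfo_shmmax to be greater than dm_shmsize."""
    return shmmax_bytes > dm_shmsize_bytes


print(shmmax_for(36))                                   # 37748736
print(shmmax_is_sufficient(shmmax_for(36), 32 * MB))    # True
```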

To check for and clean up unused memory on Linux:

  1. Stop all DM processes. See "Starting and Stopping the BRM System".

  2. Confirm that there are no DM processes running. See "Checking the Number and ID of a BRM Process".

  3. Run df -k to check swap space usage. Confirm that the available space is very low.

  4. Run ipcs -ma to show the shared memory segments that have been allocated but not used recently.

  5. Run ipcs -mac to show the shared memory segments that have been allocated along with the corresponding user information.

  6. Run ipcs -mat to show the shared memory segments that have been allocated, along with detach timing information.

    Note:

    In steps 4, 5, and 6, a shared memory segment is probably abandoned when you see the following conditions:

    • Number of attaches (NATTCH) is 0

    • KEY is 0 (and not using a special dm_shmkey)

    • Creator process ID (CPID) is gone

    • Last detach time (DTIME) has a value

  7. Run ipcrm -m segment_id on each of the unused segments to free up the space.

  8. Run df -k again to confirm that the available swap space has been cleared.

  9. Stop and restart the DM processes.
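
The four conditions for a probably abandoned segment can be combined into one test. The following Python sketch assumes you have already read the values from the ipcs output and checked whether the creator process still exists; the Segment type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    shmid: int
    key: int
    nattch: int              # number of attaches (NATTCH)
    cpid_alive: bool         # whether the creator process (CPID) still exists
    dtime: Optional[str]     # last detach time (DTIME), or None if never detached


def probably_abandoned(seg, uses_dm_shmkey=False):
    """Apply all four conditions from the text; every one must hold."""
    return (seg.nattch == 0
            and seg.key == 0 and not uses_dm_shmkey
            and not seg.cpid_alive
            and seg.dtime is not None)
```

Only segments for which this returns True are candidates for ipcrm -m; review each one before removing it.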

To increase the system shared memory on Linux, open the /etc/sysctl.conf file and set the kernel.shmmax parameter to a value greater than the value of dm_shmsize in the DM configuration file (pin.conf). Stop and restart the computer.

Problem: Java Out of Memory Error

When using GUI applications such as Developer Center or batch applications such as Invoice formatter, you might sometimes receive "java.lang.OutOfMemoryError: Java heap space" error messages.

Cause

The Java application does not have enough memory to complete the operation.

Solution

Increase the maximum heap size used by the Java Virtual Machine (JVM). The exact amount varies greatly with your needs and system resources.

The heap size is controlled by the -Xmx size entry in the Java application startup script. By default, the -Xmx size entry is not present in the startup line. To increase the maximum heap size, add this entry and a number (in megabytes) to the application startup line. The following example adds a 1024 MB maximum heap size to the class:

java -Xmx1024m class

Note:

Increasing the heap size may degrade the performance of other processes if insufficient resources are available. You must adjust the heap size based on your application needs and within your system's limits.

Problem: Memory Problems with the Oracle DM

The error log file for the DM for your Oracle database refers to "No memory for...", such as "No memory for list in pini_flist_grow." You suspect memory problems, but your system has sufficient memory for the environment.

Cause

The DM is not configured to use sufficient shared memory.

Solution

  1. Open the DM configuration file (BRM_home/sys/dm_oracle/pin.conf).

  2. Increase the size of the dm_bigsize and dm_shmsize parameters. Follow the guidelines in the configuration file for editing these entries.

  3. Save the configuration file.

  4. Stop and restart the DM.

Problems Running Billing

Problem: Billing Daemons Are Running, but Nothing Happens

Even though the billing processes are running, BRM is not producing billing data.

Cause

There are too few back ends for the DM. Because billing daemons run in parallel, you must have at least one DM back end for each billing program thread, plus one back end for the main thread searches.

Solution

Edit the dm_n_be entry in the DM configuration file (pin.conf) to add more back ends to the DM, and then stop and restart the DM. See "Configuring DM Front Ends and Back Ends".
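
The sizing rule from the cause (one back end per billing thread, plus one for the main thread searches) can be expressed as a quick check. This Python sketch uses hypothetical function names and covers billing only; other activity on the same DM needs back ends of its own.

```python
def min_back_ends(billing_threads):
    """One DM back end per billing thread, plus one for the main thread searches."""
    return billing_threads + 1


def dm_n_be_is_enough(dm_n_be, billing_threads):
    """True when the configured back-end count meets the minimum for billing."""
    return dm_n_be >= min_back_ends(billing_threads)
```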

Problem: High CPU Usage for the Number of Accounts Processed

Running the billing scripts puts an inordinately heavy load on the computer, and processing the accounts takes a long time.

Cause

An index is missing or unbalanced; or in Oracle, an index is in the CHOOSE Optimizer mode and statistics are out of date.

Solution

Rebuild the BRM indexes before you run the billing scripts. See "Rebuilding Indexes" in BRM Installation Guide.

Problems Creating Accounts

Problem: fm_delivery_mail_sendmsgs Error Reported in the CM Log File

Cause

BRM is trying to send a welcome email message, but the Email DM (dm_email) is not running.

Solution

Start the Email DM, or disable the welcome email message.

Problems Loading Configuration Objects

Problem: Failed to create XML context in isXsltExists, error [266]

Cause

The load_config utility tried to load the contents of XML configuration files into configuration (/config/*) objects in the BRM database, but the contents were not loaded.

Solution

Set the ORACLE_HOME environment variable to the BRM database client library path; for example, /tools/CGBU/contrib/Linux/x86_64/packages/oracle/db/12.2.0.1.0.

Problems During BRM Upgrade

Problem: The BRM root passwords stored in the wallet and /service/pcm object are not matching

A validation error appears during the BRM installation for an upgrade because the BRM root password is incorrect.

Cause

You provided an incorrect BRM root password during the BRM installation for the upgrade.

Solution

Change the incorrect BRM root password in the BRM client wallet by using the pin_config_editor utility. See "Changing Incorrect BRM Root Password" for more information.

Note:

You can also change the BRM root password in the BRM client wallet by using the PCM_OP_CUST_UPDATE_SERVICES opcode. For more information on the opcode, see BRM Opcode Guide.

Changing Incorrect BRM Root Password

To change the incorrect BRM root password:

  1. Go to the BRM_home/bin directory.

  2. Run the following command:

    pin_config_editor -setconf -wallet clientWalletLocation -parameter -.login_pw -pwd

    where clientWalletLocation is the path to the BRM client wallet.

  3. At the command prompt, enter the existing BRM root password.

    Note:

    Ensure that you provide only the existing BRM root password. You can retrieve the existing BRM root password by using the pin_crypt_app utility. See "Retrieving Configuration Entries from Client Wallet for Java PCM Applications" for more information.

  4. Enter the wallet password.

  5. Run the following command:

    pin_config_editor -setconf -wallet clientWalletLocation -parameter infranet.connection 

  6. At the command prompt, enter the values listed in Table 38-1.

    Table 38-1 BRM Connection Information

    Field            Description
    ---------------  ------------------------------------------------------------
    User Name        The user name for connecting to BRM.
    Password         The BRM user's password.
    Host Name        The IP address or the host name of the machine on which the
                     primary BRM Connection Manager (CM) or CM Master Process
                     (CMMP) is running.
    Port Number      The TCP port number of the CM or CMMP on the host computer.
                     The default value is 11960.
    Service Type     The BRM service type. The default value is
                     /service/admin_client.
    Service POID Id  The POID of the BRM service. The default value is 1.
    Wallet Password  The password for the BRM wallet.