39 Using Error Logs to Troubleshoot BRM

Learn how to read error messages when troubleshooting Oracle Communications Billing and Revenue Management (BRM).

Using Error Logs to Troubleshoot BRM

BRM error log files provide detailed information about system problems. If you are having a problem with BRM, look in the log files.

Log files include errors that require action as well as errors that do not need immediate attention (for example, invalid logins). To manage log files, make a list of the errors that are important for your system so that you can distinguish them from errors that do not need immediate attention.

Understanding Error-Message Syntax

BRM error messages use this syntax:

[severity] [date_&_time] [host_name] [program]:[pid] [file]:[line] [correlation_id] [message] 

where:

  • severity is the severity of the error message. It can be one of the following values:

    • E: Error. An error indicates that a component of your BRM system is not operating correctly. This is the most severe type of problem.

    • W: Warning. Warnings indicate that the system cannot perform a task because the database contains incorrect or inconsistent data. The system continues to operate, but you should investigate and resolve problems with the data immediately.

    • D: Debug. Debugging messages, which indicate problems with an application, are typically used by application developers to diagnose errors in custom applications. You see these messages only if you set error reporting to the highest level.

  • date_&_time is the date and time the message was logged.

  • host_name is the name of the computer generating the message. If several machines are sharing a log file using Network File System (NFS), or if all log files are stored in a central location, use this information to pinpoint the machine with the problem.

  • program is the name of the program (or process) generating the log message. This information helps resolve problems in billing utilities, for example, because all billing utilities use the same log file.

  • pid is the process ID for the process generating the log message. This information helps resolve problems in components, such as Connection Managers (CMs) and Data Managers (DMs), that might have many processes running in parallel with the same name (program).

  • file is the name of the source file where the error was detected. Technical Support uses this information when diagnosing system problems.

  • line is the line number in the source file where the error was detected. Technical Support uses this information when diagnosing system problems.

  • correlation_id is the identifier for all error messages related to a single error occurrence. This information can be used to sort error messages and to identify the set of error messages generated from a single error occurrence.

  • message is a detailed description of the error condition. Part of the message often includes the error type, location, and code, which you can use to interpret the error. See "Reference Guide to BRM Error Codes".
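The syntax above can be turned into a small parser. The following Python sketch is illustrative only: it assumes the three-line layout seen in the sample entries in this chapter (header fields on the first line, the correlation ID on the second, the message after that), and the names HEADER and parse_entry are hypothetical, not part of BRM.

```python
import re

# Assumed layout, based on the sample entries in this chapter:
#   line 1:  severity, date and time, host_name, program:pid, file:line
#   line 2:  correlation_id
#   line 3+: message
HEADER = re.compile(
    r"^(?P<severity>[EWD])\s+"
    r"(?P<date_time>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"
    r"(?P<host_name>\S+)\s+"
    r"(?P<program>[^\s:]+):(?P<pid>\S+)\s+"
    r"(?P<file>\S+):(?P<line>\d+)\s*$"
)


def parse_entry(entry):
    """Split one log entry into named fields, or return None if it does not fit."""
    lines = [ln.strip() for ln in entry.strip().splitlines()]
    if len(lines) < 2:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["correlation_id"] = lines[1]
    fields["message"] = "\n".join(lines[2:])
    return fields
```

The pid is kept as a string because its exact form may vary between components. Entries whose first line does not contain all of the header fields return None rather than a partial result.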

Resolving Clusters of Error Messages

An error often produces a cluster of error messages in the log file. The Facilities Modules (FMs), especially, tend to generate cascading messages. To resolve the error, isolate the group of messages, as defined by their common correlation ID, date/time, and process ID, and look at the first one in the series. The error location for that message generally indicates the source of the problem. Then find the last message text in the first error, to identify the operation that was associated with the error. Always consider whether an error could have been caused by something happening in a downstream process.
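As a rough illustration of isolating a cluster, the following Python sketch groups already-parsed entries by correlation ID and process ID and returns the first entry of each group. It is a simplification: it ignores the date/time, assumes the entries are in the order they were logged, and uses a hypothetical dictionary shape with correlation_id, pid, and message keys.

```python
def group_clusters(entries):
    """Group entries that share a correlation ID and process ID, in log order."""
    clusters = {}
    for entry in entries:
        key = (entry["correlation_id"], entry["pid"])
        clusters.setdefault(key, []).append(entry)
    return clusters


def first_messages(entries):
    """Return the first entry of each cluster, which usually points to the source."""
    return [group[0] for group in group_clusters(entries).values()]
```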

Logging External User Information in Error Logs

When an external application connects to BRM and a request from that application fails, BRM logs the BRM user, the external user, and the external correlation ID in the correlation_id field of the log header so that you can identify the original request from the external application.

The additional information in the correlation_id uses the following syntax:

BRM_user::external_user:external_correlation_ID

where:

  • BRM_user: Specifies the user with which the BRM client connects to the CM.

  • external_user: Specifies the user from the external system connecting to BRM.

  • external_correlation_ID: Specifies the correlation ID that an external system passes to BRM in order to correlate between an operation within its system and the corresponding operations in BRM.

For example:

2:CT1255:Account_Manager:1948:1684:63:1063403309:14:root.0.0.0.1::user1:123456789

Note:

If the external application does not provide the external user and external correlation ID, the corresponding parts of the correlation_id are empty strings.

For example:

2:CT1255:Account_Manager:1948:1684:63:1063403309:14:root.0.0.0.1:::
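The following Python sketch shows one way to pull the three parts out of a correlation ID that follows the syntax above. It assumes that the first "::" in the string is the separator after BRM_user and that the rest of the correlation ID never contains "::"; the function name and the returned keys are hypothetical.

```python
def split_correlation_id(correlation_id):
    """Separate BRM_user, external_user, and external_correlation_ID.

    Returns None for the three parts when no "::" separator is present.
    """
    head, separator, tail = correlation_id.partition("::")
    if not separator:
        return {
            "base": correlation_id,
            "brm_user": None,
            "external_user": None,
            "external_correlation_id": None,
        }
    # BRM_user is the last colon-separated item before the "::" separator.
    base, _, brm_user = head.rpartition(":")
    # After "::" come external_user and external_correlation_ID, split by ":".
    external_user, _, external_correlation_id = tail.partition(":")
    return {
        "base": base,
        "brm_user": brm_user,
        "external_user": external_user,
        "external_correlation_id": external_correlation_id,
    }
```

Applied to the two examples above, the first yields user1 and 123456789 for the external parts, and the second yields empty strings.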

Interpreting Error Messages

The following examples show the typical process for evaluating and interpreting error messages to resolve problems with BRM.

Example 1: Failure of a Client Application

A BRM client application fails and displays an error message.

  • Look in the application's log file. The file shows the following error message:

    E Fri Sep 12 14:50:05 2003 db2.corp:12602 sample_app.c:173
    1:CT1255:Account_Manager:1948:1684:63:1063403309:14 
    op_cust_create_acct error [location=pin_errloc_dm class= errno= field num= recid=<0> reserved=<0>]
    

    The message shows that:

    • At 14:50:05 the system returned an error.

    • The host name is db2.corp.

    • The file name is sample_app.c.

    • The line of code is 173.

    • The correlation ID is

      1:CT1255:Account_Manager:1948:1684:63:1063403309:14
      
    • There was a problem creating an account.

    • The error was first found in the DM.

  • Check the Data Manager log file (dm_oracle.pinlog) for an error message that occurred at the same time and has the same correlation ID, in this case Fri Sep 12 14:50:05 and 1:CT1255:Account_Manager:1948:1684:63:1063403309:14:

    E Fri Sep 12 14:50:05 2003 db2.corp dm:12250 dm_subr.c(1.58):351 
    1:CT1255:Account_Manager:1948:1684:63:1063403309:14
    ORACLE error: do_sql_insert: oexec: code 1653, op 4, peo 1
    =ORA-01653: unable to extend table PIN.EVENT_T by 512 in
    tablespace PIN00"insert into event_t ( poid_db, poid_id1,
    poid_id0, poid_type, aobj_db, 
    

    The error message shows an Oracle error, with the Oracle code 1653.

  • Consult the Oracle documentation. Code 1653 indicates that there is a problem with growing an extent. Because the error message reported that BRM was unable to extend the PIN.EVENT_T table in tablespace PIN00, you can infer that the tablespace has no more room and you must increase its size, as explained in the Oracle documentation.
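The lookup step in this example, finding the entry in another log file that carries the same correlation ID, could be scripted along the following lines. This Python sketch assumes that each entry starts with a severity letter followed by the date and time, and that the correlation ID is on the second line of the entry, as in the samples above. The names ENTRY_START and entries_with_correlation_id are hypothetical.

```python
import re

# An entry is assumed to begin with E, W, or D followed by the date and time.
ENTRY_START = re.compile(
    r"^[EWD] \w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4} ", re.MULTILINE
)


def entries_with_correlation_id(log_text, correlation_id):
    """Return the log entries whose second line equals correlation_id."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    matches = []
    for begin, end in zip(starts, ends):
        entry = log_text[begin:end].strip()
        lines = entry.splitlines()
        if len(lines) > 1 and lines[1].strip() == correlation_id:
            matches.append(entry)
    return matches
```

To use it, read the contents of the Data Manager log into a string and pass the correlation ID copied from the application log.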

Example 2: Getting More Information from Error Numbers

You cannot start the DM.

  • Check the dm_oracle.pinlog file, which shows the following error message:

E THU Sep 11 00:30:49 2003 kauai dm:29349 dm_main.c(1.74):1723
1:CT1255:dm:28492:1:0:1063265316:0
DM master dm_die:"bad bind(2)", errno 125

This error message indicates that the DM cannot start. An errno followed by a number usually means that a system message is associated with the error. You can look up the number in the system error file, /usr/include/sys/errno.h, on the machine that generated the message, because errno values vary by platform. In this case, error 125 is listed as "EADDRINUSE: Address already in use". In other words, the DM process is trying to use a port that is already in use by another process.
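As an alternative to reading errno.h by hand, the Python standard library can translate an errno number into its symbolic name and system message. This is a minimal sketch with a hypothetical function name. Because errno numbers are platform-specific, run it on the same platform as the machine that wrote the log; otherwise the number may map to a different error.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic name, system message) for an errno on this platform."""
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# Example: look up the number reported in the log entry above.
name, text = describe_errno(125)
print(f"errno 125 on this platform: {name}: {text}")
```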

Example 3: Getting More Information about Oracle Errors

An error in the application log file indicates the error location is the DM.

  • Check the dm_oracle.pinlog file, which shows the following error message:

E WED Aug 18 01:40:07 2003 kauai dm:402.354 dm_subr.c(1.80):481
1:CT1255:dm:28509:1:0:1061195411:7
ORACLE error: do_sql_insert: obndrv: code 1036, op 28, peo 0
=ORA-01036: illegal variable name/number
was binding ":poid_DB" buf 0x195b180, bufl 5, ftype 5

The message shows an Oracle error, number 1036, which you can look up by using the Oracle oerr utility:

% oerr ora 1036
01036, 00000, "illegal variable name/number"
// *Cause: Unable to find bind context on user side
// *Action: Make sure that the variable being bound is in the sql statement.

The obndrv function is looking for the variable :poid_DB in the SQL statement, but the error says that it is not there.
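When a log contains many Oracle errors, the codes can be collected from the message text and turned into oerr commands. The following Python sketch assumes that the codes appear in the ORA-nnnnn form shown in the examples in this chapter; the function name is hypothetical, and the script only builds the command strings without running them.

```python
import re

ORA_CODE = re.compile(r"\bORA-(\d{5})\b")


def oerr_commands(message):
    """Build one 'oerr ora <code>' command per distinct ORA code in message."""
    codes = []
    for match in ORA_CODE.finditer(message):
        code = int(match.group(1))  # drop leading zeros, as in 'oerr ora 1036'
        if code not in codes:
            codes.append(code)
    return [f"oerr ora {code}" for code in codes]
```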