7 Troubleshooting ASAP

This chapter provides troubleshooting information for Oracle Communications ASAP.

Overview of Troubleshooting ASAP

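As an ASAP system administrator, you can use the information in this chapter to work through a troubleshooting checklist, use error logs to diagnose problems, resolve common ASAP problems, understand how ASAP behaves when a component fails, and get help from Oracle Global Support.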

Troubleshooting Checklist

When a problem occurs, it is best to do some troubleshooting before you contact Oracle Global Support, for the following reasons:

You know your installation better than Oracle Global Support does. You know if anything in the system has been changed, so you are more likely to know where to look first.
Troubleshooting skills are important. Relying on Global Support to research and solve all of your problems prevents you from being in full control of your system.

If you have a problem with your ASAP system, ask yourself these questions first, because Oracle Global Support will ask them of you:

What exactly is the problem? Can you isolate it? For example, if it is a problem with an application, does it occur on one instance of the application, or all instances?
Oracle Global Support needs a clear and concise description of the problem, including when it began to occur.
What do the log files say?
This is the first thing that Oracle Global Support asks for. Check the error log for the ASAP component you're having problems with.
Have you read the documentation?
Look through the list of common problems and their solutions in "Common ASAP Problems and Solutions".
Has anything changed in the system? Did you install any new hardware or new software? Did the network change in any way?
Have you read the Release Notes?
The Release Notes include information about known problems and workarounds.
Does the problem resemble another one you had previously?
Has your system usage recently jumped significantly?
Is the system otherwise operating normally?
Has response time or the level of system resources changed?
Are users complaining about additional or different problems?
Can you run clients successfully?
Are any other processes on the system hardware functioning normally?

If you still can't resolve the problem, contact Oracle Global Support as described in "Getting Help with ASAP Problems".

Using Error Logs to Troubleshoot ASAP

ASAP error log files provide detailed information about system problems. If you're having a problem with ASAP, look in the log files.

Log files include errors that need to be managed, as well as errors that do not need immediate attention (for example, invalid login attempts). To manage log files effectively, make a list of the errors that are important for your system so that you can distinguish them from errors that do not need immediate attention.
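Once you have such a list, you can scan log files for it automatically. The following Python sketch is one possible approach, not an ASAP utility: the watch list uses patterns taken from error messages quoted later in this chapter, and the labels and the scan_log function name are hypothetical. Adapt the list to the errors that matter on your system.

```python
import re
from collections import Counter

# Hypothetical watch list: label -> pattern. The patterns match error text
# quoted in this chapter; add or remove entries to suit your system.
IMPORTANT_ERRORS = {
    "ORA-00020 (processes)": re.compile(r"ORA-00020"),
    "ORA-00018 (sessions)": re.compile(r"ORA-00018"),
    "missing interfaces entry": re.compile(
        r"Could not find server name '[^']+' in interfaces file"),
    "MAX_CONNECTIONS exceeded": re.compile(
        r"Configuration of \d+ connections has been exceeded"),
    "thread limit": re.compile(r"Unable to Spawn Service Thread"),
    "message pool exhausted": re.compile(r"No free messages"),
}


def scan_log(lines):
    """Count how many lines match each important-error pattern.

    Lines that match nothing (for example, invalid login attempts) are ignored.
    """
    counts = Counter()
    for line in lines:
        for label, pattern in IMPORTANT_ERRORS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Error - ORA-00018: maximum number of sessions exceeded",
        "Open Server Error # [16013] Severity [20] State [0] Error Text "
        "[Could not find server name 'NEP_H720' in interfaces file]",
        "Invalid login attempt",  # noise that does not need immediate attention
    ]
    for label, count in scan_log(sample).items():
        print(f"{label}: {count}")
```

To scan a real diagnostic file, pass the open file object to scan_log, because a file iterates line by line.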

Understanding Error-Message Syntax

For more information about error message syntax, see "About Log Files".

Collecting Diagnostic Information

For more information about collecting diagnostic information, see "About Diagnostic Files".

Common ASAP Problems and Solutions

This section describes the following problems, and how to resolve them:

Problem: ASAP Servers Do Not Start - Database Tablespaces Used Up
Problem: ASAP Servers Do Not Start - Interfaces File Empty or Missing
Problem: ASAP Servers Do Not Start - No Information In Interfaces File
Problem: ASAP Servers Do Not Start - Wrong ASAP User Owner and Permissions
Problem: ASAP Servers Processes Do Not Start - Database Server Processes Used
Problem: ASAP Servers Processes Do Not Start - Database Server Sessions Used
Problem: ASAP Servers Do Not Start - Insufficient Server User Connections Defined
Problem: ASAP Servers Do Not Start - Insufficient Number of Threads
Problem: Control Server Crashes - No Free Messages
Problem: JNEP Server Does Not Start - Wrong Database Connection Information
Problem: JNEP Server Does Not Start - Invalid Server Port Numbers
Problem: NEP Server Does Not Start - Problem with JNEP Java Process Start Script
Problem: WebLogic Server Fails to Detect Passive RAC Database During Failover

Problem: ASAP Servers Do Not Start - Database Tablespaces Used Up

Some or all of the ASAP servers cannot start because the database tablespace used by the server is full and has no free space. Error messages like the following are generated to indicate that writing to the database failed:

Error: RPC 'CSP_system_event' Failed

To resolve this issue, increase the tablespace sizes. You can also enable autoextend for the tablespaces.

Problem: ASAP Servers Do Not Start - Interfaces File Empty or Missing

The ASAP servers are configured properly in the database tables, but the $SYBASE/interfaces file is empty or missing, so it contains no information for any server. If the Control server information is missing, no process starts when ASAP is restarted.

The following messages are generated:

At the UNIX console:

"Failed to start Master Control Server CTRLH720"

In the UTILITY.diag file:

'ct_connect(): directory service layer: internal directory control layer error: Requested server name not found.'

In the ASAP.Console file:

CTRLH720 Server: Fatal Server Error:
Open Server Error # [16013] Severity [20] State [0] Error Text [Could not find server name 'CTRLH720' in interfaces file]
CTRLH720: srv_run() Failed

Make sure that the $SYBASE/interfaces file exists and contains correct information for all configured ASAP servers.
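Before restarting ASAP, you can confirm that every configured server has an entry in the interfaces file. The following Python sketch assumes the classic Sybase interfaces layout (a server name in column 0, followed by indented service lines such as "master tcp ether host 4100", with '#' starting a comment). The function names and the server list are hypothetical; replace the list with the servers defined in your Control server database.

```python
import os


def parse_interfaces(lines):
    """Return the set of server names found in interfaces-file lines.

    Assumed layout: each entry starts with the server name in column 0 and
    is followed by indented service lines; '#' lines are comments.
    """
    names = set()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():
            # First token of an unindented line is the server name.
            names.add(stripped.split()[0])
    return names


def missing_servers(path, configured):
    """Return the configured server names that have no entry in the file at path."""
    if not os.path.isfile(path):
        return sorted(configured)  # missing file: no server entry can be found
    with open(path) as f:
        defined = parse_interfaces(f)
    return sorted(set(configured) - defined)


if __name__ == "__main__":
    interfaces = os.path.join(os.environ.get("SYBASE", "."), "interfaces")
    # Hypothetical server names; use the ones configured for your system.
    print(missing_servers(interfaces, ["CTRLH720", "SARMH720", "NEP_H720"]))
```

An empty list means every listed server has an entry; it does not verify that the host and port values in those entries are correct.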

Problem: ASAP Servers Do Not Start - No Information In Interfaces File

The $SYBASE/interfaces file contains information only for the Control server and none of the other servers, although all servers are defined in the server configuration tables in the Control server database (tbl_appl_proc, tbl_component, and so on).

The Control server starts successfully. For example:

Starting Master Control Server CTRLH720...
Successful Startup of Master Control CTRLH720

The following messages are generated:

At the UNIX console:

Error : ASAP server is already running, or can not start one of ASAP server

In the UTILITY.diag file, a message is logged for each server that has no entry in the interfaces file. For example:
```
"Local Startup Error: Can't Startup Server NEP_H720 ...'
```

In the ASAP.Console file:

Open Server Error # [16013] Severity [20] State [0] Error Text [Could not find server name 'NEP_H720' in interfaces file]
NEP_H720: srv_run() Failed
Failed to start Application [NEP_H720]
SARMH720 Server: Fatal Server Error:
Open Server Error # [16013] Severity [20] State [0] Error Text [Could not find server name 'SARMH720' in interfaces file]
SARMH720: srv_run() Failed
Failed to start Application [SARMH720]

Make sure that the $SYBASE/interfaces file exists and contains correct information for every configured ASAP server, not only the Control server.

Problem: ASAP Servers Do Not Start - Wrong ASAP User Owner and Permissions

The owner or permissions of ASAP files and folders might have been changed, preventing the ASAP servers from writing data to the file system. For example, a change of owner or of file permissions on the DATA folder and its subfolders can cause this issue.

The following error message is generated:

/oracle/user1/A720_I/scripts/start_control_sys: line 197: /oracle/user1/A720_I/DATA/logs/ASAP.Console: cannot create [Permission denied]

Make sure that all folders and files under the ASAP installation folder are owned by the ASAP user and that this user has all the required file permissions. For example, the ASAP user needs read and write permission on everything under the DATA folder; otherwise, the servers cannot create their diagnostic files.
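To find the entries that cause this problem, you can walk the DATA folder as the ASAP user and check each one. In this Python sketch, the ASAP_BASE environment variable is assumed to point at the ASAP installation folder (the folder that contains DATA and scripts in the error message above), and the function name is hypothetical. Run it as the ASAP user, not as root, because os.access reports that root can access almost everything.

```python
import os


def find_permission_problems(root, expected_uid=None):
    """Return (path, reason) pairs for root and everything below it.

    Flags entries that are missing, that the current user cannot read and
    write (plus search, for directories), or that are not owned by
    expected_uid when one is given.
    """
    problems = []

    def check(path, is_dir):
        if not os.path.lexists(path):
            problems.append((path, "missing"))
            return
        # Directories also need execute (search) permission, or the server
        # cannot create files such as ASAP.Console inside them.
        mode = os.R_OK | os.W_OK | (os.X_OK if is_dir else 0)
        if not os.access(path, mode):
            problems.append((path, "insufficient permissions"))
        if expected_uid is not None and os.lstat(path).st_uid != expected_uid:
            problems.append((path, "unexpected owner"))

    check(root, True)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            check(os.path.join(dirpath, name), True)
        for name in filenames:
            check(os.path.join(dirpath, name), False)
    return problems


if __name__ == "__main__":
    base = os.environ.get("ASAP_BASE")  # assumption: ASAP installation folder
    if base:
        data = os.path.join(base, "DATA")
        for path, reason in find_permission_problems(data, os.geteuid()):
            print(f"{reason}: {path}")
```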

Problem: ASAP Servers Processes Do Not Start - Database Server Processes Used

The following error message appears in the Control server diagnostic file or in the diagnostic files of other servers:

ORA-00020: maximum number of processes (%s) exceeded

Increase the value of the PROCESSES initialization parameter.

Problem: ASAP Servers Processes Do Not Start - Database Server Sessions Used

The following error message appears in the Control server diagnostic file or in the diagnostic files of other servers:

Error - ORA-00018: maximum number of sessions exceeded

Increase the value of the SESSIONS initialization parameter.

Problem: ASAP Servers Do Not Start - Insufficient Server User Connections Defined

Some ASAP servers do not start, and a message like the following appears in the Control server diagnostic file:

"Configuration of 70 connections has been exceeded, connection rejected."

Each ASAP server that starts establishes a defined number of connections to the Control server. Any Open Server application can use the Open Server configuration parameter MAX_CONNECTIONS to limit the maximum number of connections from client applications. This problem usually occurs when additional Network Element Processor (NEP) servers are configured, because each additional NEP server opens more connections to the Control server.

The solution is to increase the value of the parameter MAX_CONNECTIONS for the Control server in the ASAP_home/config/ASAP.cfg file. You can also set a value in the global section for all ASAP servers. Note that when adjusting this parameter, you may need to adjust additional parameters (see "Improving ASAP Performance").
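Because a value can come from either a server section or the global section, it helps to check which value a server actually uses. The following Python sketch rests on several assumptions that you should verify against your own ASAP.cfg: sections start with [NAME] headers, the global section is named GLOBAL, values are written as KEY = VALUE, '#' starts a comment, and a server-specific value overrides the global one. The function names are hypothetical.

```python
def parse_cfg(text):
    """Parse assumed ASAP.cfg-style text into {SECTION: {KEY: VALUE}}.

    Assumptions: "[NAME]" starts a section, "KEY = VALUE" sets a value,
    '#' starts a comment, and lines before the first header are global.
    """
    sections = {"GLOBAL": {}}
    current = sections["GLOBAL"]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip().upper(), {})
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().upper()] = value.strip()
    return sections


def effective_value(sections, server, key):
    """Return the server-section value if present, else the GLOBAL value, else None."""
    key = key.upper()
    for name in (server.upper(), "GLOBAL"):
        if key in sections.get(name, {}):
            return sections[name][key]
    return None
```

For example, after reading the file into a string, effective_value(parse_cfg(text), "CTRL", "MAX_CONNECTIONS") returns the value the Control server would use under these assumptions. The same lookup applies to MAX_THREADS and MAX_MSGPOOL.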

Problem: ASAP Servers Do Not Start - Insufficient Number of Threads

ASAP servers fail to start, and the error messages indicate that some remote procedure calls (RPCs) are not defined or that service threads could not be spawned. For example:

Error : RPC install_exit_handler Not Defined
Error: Unable to Spawn Service Thread TORONTO - Insufficient Resources
SARMH720 Server: Information: Error 16115 Severity 10 State 0
'Could not start thread
Error: Unable to Spawn Service Thread WO Mgr 2 - Insufficient Resources

The maximum number of threads for ASAP servers is configured in the ASAP_home/config/ASAP.cfg file. If this number is too small, the server process generates an error message and terminates when it tries to start a thread beyond the limit.

Adjust the value of the parameter MAX_THREADS in the ASAP.cfg file, either globally or for a specific server.

Note that when adjusting this parameter, you may need to adjust additional parameters (see "Improving ASAP Performance").

Problem: Control Server Crashes - No Free Messages

Control server crashes with the following messages in the Control server diagnostic file:

153957.532:48:PROG:Fatal Thread Error :1429:main.c
CTRLPRD1 Server: Fatal Thread Error:
Open Server Error # [16016] Severity [15] State [0] Error Text [No free messages.]
>> 153957.928:48:PROG:Fatal Thread Error :1436:main.c
Error: [CTRLPRD1] Fatal Thread Error - Terminating

Message resources were exhausted during high-volume work order (WO) provisioning because MAX_MSGPOOL is too small for the Control server.

Increase the value of MAX_MSGPOOL for the Control server in the ASAP_home/config/ASAP.cfg file. Set it in the CTRL section; otherwise, the change affects all servers.

The minimum value of MAX_MSGPOOL is MAX_MSGQUEUES*256. You may need to adjust additional parameters (see "Improving ASAP Performance").
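The minimum is simple arithmetic, so you can check it before raising the value. The numbers in this Python sketch are hypothetical; substitute the MAX_MSGQUEUES and MAX_MSGPOOL values from your CTRL section. Meeting the minimum does not guarantee that the pool is large enough for your peak WO volume.

```python
def check_msgpool(max_msgpool, max_msgqueues):
    """Check MAX_MSGPOOL against its documented minimum, MAX_MSGQUEUES*256.

    Returns (meets_minimum, minimum).
    """
    minimum = max_msgqueues * 256
    return max_msgpool >= minimum, minimum


if __name__ == "__main__":
    # Hypothetical values read from the CTRL section of ASAP.cfg.
    ok, minimum = check_msgpool(max_msgpool=8000, max_msgqueues=40)
    print(ok, minimum)  # False 10240
```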

Problem: JNEP Server Does Not Start - Wrong Database Connection Information

The database connection information (DB_CONNECT) in ASAP.properties is wrong, so JNEP cannot connect to the database when it starts.

While the NEP server Java process (JNEP) creates its pool of database connections, it fails with the following exception:

"The Network Adapter could not establish the connection"
java.sql.SQLException: Io exception: The Network Adapter could not establish the connection
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:162)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:274)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:319)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:344)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:148)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:545)
at oracle.jdbc.pool.OracleDataSource.get
>>

Correct the database connection information in the ASAP.properties file, as in the following example, and restart the NEP server:

DB_CONNECT=(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = xx.xx.xx.xx)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = orcl)))

Check the HOST, PORT, and SERVICE_NAME values in this property; any of them might be wrong.
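The exception above means that no TCP connection could be opened to the database listener, so a useful first check is whether HOST and PORT are reachable from the ASAP machine. This Python sketch extracts the three values from a DB_CONNECT descriptor and attempts a TCP connection; the function names are hypothetical. A successful connection only shows that something is accepting connections at that address; it does not verify SERVICE_NAME.

```python
import re
import socket


def parse_db_connect(descriptor):
    """Pull HOST, PORT and SERVICE_NAME out of a connect descriptor string.

    Only the first occurrence of each field is returned; a field that is
    not present maps to None.
    """
    fields = {}
    for key in ("HOST", "PORT", "SERVICE_NAME"):
        match = re.search(r"\(\s*" + key + r"\s*=\s*([^)\s]+)\s*\)",
                          descriptor, re.IGNORECASE)
        fields[key] = match.group(1) if match else None
    return fields


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    if not host or not port:
        return False
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):  # refused, timed out, unresolvable, bad port
        return False
```

For example, call parse_db_connect on the DB_CONNECT value from ASAP.properties, then pass the resulting HOST and PORT to can_reach.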

Problem: JNEP Server Does Not Start - Invalid Server Port Numbers

The NEP server does not start, and an error message appears in the Control server diagnostic file. The port number assigned to the NEP server in $SYBASE/interfaces might be outside the valid range or, if it is within the range, might already be bound by another process. Alternatively, the listener port assigned to the NEP Java process might be out of range or already bound by another process.

Correct the port number for the NEP server in the $SYBASE/interfaces file so that it is within the range. If the port number is already bound by another process, the NEP server cannot use it; assign a free, unused port number to the NEP server instead.

Make sure that the port number assigned to the NEP java process (table tbl_listeners in Control server database) is also within the range and not bound by any other process.
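Both conditions (valid range, not already bound) can be checked from the machine that runs the NEP server. This Python sketch tries to bind each port, a common way to detect a port that is already in use. The function name is hypothetical, and the range defaults to the full TCP range of 1 to 65535 because this chapter does not state a narrower one. Run it on the NEP host for both the port in $SYBASE/interfaces and the listener port in tbl_listeners; a port closed only moments ago can briefly be reported as in use.

```python
import errno
import socket


def check_port(port, host="", low=1, high=65535):
    """Classify a TCP port on this machine.

    Returns "out of range", "in use", "permission denied", or "free".
    """
    if not low <= port <= high:
        return "out of range"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "in use"
            if exc.errno in (errno.EACCES, errno.EPERM):
                return "permission denied"  # e.g. ports below 1024 for non-root users
            raise
    return "free"
```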

Problem: NEP Server Does Not Start - Problem with JNEP Java Process Start Script

The NEP server Java process is started by a script named $NEP_jinterpreter in the $ASAP_BASE/programs folder. This script might not have the proper execute permissions.

Give the script the proper execute permission for the ASAP user.
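The usual fix is a chmod on the script. If you prefer to script the check, this Python sketch adds the owner's execute bit only when it is missing and leaves the other permission bits unchanged. The function name is hypothetical; pass it the full path of the $NEP_jinterpreter script under $ASAP_BASE/programs, and run it as the ASAP user that owns the script.

```python
import os
import stat


def ensure_owner_executable(path):
    """Add execute permission for the file's owner if it is missing.

    Returns True if the mode was changed, False if it was already executable.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    # Keep the existing permission bits and add only the owner's execute bit.
    os.chmod(path, stat.S_IMODE(mode) | stat.S_IXUSR)
    return True
```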

Problem: WebLogic Server Fails to Detect Passive RAC Database During Failover

In a development environment, when testing Oracle RAC failover capabilities, it is possible to experience the following WebLogic error:

weblogic.jms.common.JMSException: weblogic.messaging.kernel.KernelException: Error persisting message

The starting point for this scenario involves the following configuration:

A running ASAP server environment and active WebLogic server instance (ASAP Environment).
A running active Oracle RAC database (RAC1).
A passive Oracle RAC database that is shutdown (RAC2).

The error occurs if you complete the following steps:

Manually start up RAC2.
Manually shut down RAC1 within four minutes of starting RAC2.

The ASAP WebLogic server instance requires up to four minutes to detect RAC2. Shutting down RAC1 within this period prevents ASAP from failing over to RAC2, because the ASAP WebLogic server has not had enough time to detect the presence of RAC2.

This error message is possible in a specific scenario that is unlikely to happen in a production environment.

Component Failure Scenarios

This section describes the behavior of ASAP in the event of component failure.

WebLogic Administration and Managed Server Failure and Recovery Scenarios

If the administration server shuts down while the managed server is running, the administration server regains control of the domain upon restart without having to restart the managed server.

If the administration server cannot be restarted following a failure, you must restore the administration server on another machine.

Restarting the WebLogic Managed Server

If the administration server is running, the managed server retrieves configuration data from the administration server. If the administration server is unavailable, the managed server retrieves its configuration and security information from the filesystem.

SRP Failure Scenario

If the Service Request Processor (SRP) fails, the Service Activation Request Manager (SARM) and NEP provision all orders that are saved in the database and are awaiting provisioning. The SARM saves the order notification RPCs in the SARM database. It then sends periodic “heartbeat” messages to the failed SRP to check for availability. When the SRP starts again, the SARM retrieves and processes all order notification RPCs saved on disk before processing new orders.

SARM Failure Scenario

If the SARM fails, the SRP can neither send orders to the SARM nor receive notifications from it, even though the SRP is still running and receiving orders from upstream systems. Therefore, the SRP cannot process orders and remains idle until the SARM is restarted.

The NEPs complete the processing of Atomic Service Description Layer (ASDL) requests currently in progress, and then become idle.

The SRP and NEPs send periodic “heartbeat” messages to determine when the failed SARM becomes available.

NEP Failure Scenario

In the event of NEP failure, the SRP continues to translate and send orders to the SARM, which continues to provision ASDLs scheduled to be provisioned by any operational NEP. All ASDLs destined for NEs managed by the failed NEP are added to the SARM Provisioning Pending Queue. The SARM sends periodic “heartbeat” messages to determine when the failed NEP becomes available.
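The following Python sketch is a toy model of this behavior, intended only to make the routing visible; it is not how the SARM is implemented. A deque stands in for the Provisioning Pending Queue, the caller supplies heartbeat results, and releasing a recovered NEP's pending ASDLs in arrival order is an assumption of the model. All class and method names are hypothetical.

```python
from collections import deque


class SarmModel:
    """Toy model: ASDLs for NEs behind an unavailable NEP wait in a pending
    queue and are released when a heartbeat finds that NEP available again."""

    def __init__(self, ne_to_nep):
        self.ne_to_nep = ne_to_nep            # NE name -> NEP name
        self.available = set(ne_to_nep.values())
        self.pending = deque()                # stands in for the pending queue
        self.sent = []                        # (nep, asdl) pairs handed to NEPs

    def submit(self, asdl, ne):
        """Send the ASDL to its NEP if that NEP is up; otherwise queue it."""
        nep = self.ne_to_nep[ne]
        if nep in self.available:
            self.sent.append((nep, asdl))
        else:
            self.pending.append((asdl, ne))

    def nep_failed(self, nep):
        self.available.discard(nep)

    def heartbeat(self, nep, alive):
        """Record a heartbeat result; on recovery, release that NEP's ASDLs in order."""
        if not alive:
            return
        self.available.add(nep)
        still_pending = deque()
        while self.pending:
            asdl, ne = self.pending.popleft()
            if self.ne_to_nep[ne] == nep:
                self.sent.append((nep, asdl))
            else:
                still_pending.append((asdl, ne))
        self.pending = still_pending
```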

Control Server Failure Scenario

If the Control server is shut down, all other ASAP applications controlled by it are also shut down.

Database Failure Scenario

If the SRP database fails, the SRP shuts down if it relies on that database (for example, the C SRP server). Similarly, the SARM and NEP servers shut down whenever their respective databases fail. If the Control database fails, the Control server shuts down, taking down all other ASAP applications controlled by it. In a distributed environment, if the primary Control server goes down, the secondary Control servers detect this and shut down.

If the SQL Server is down, all databases fail and the entire system is shut down.

NE Unavailability Scenario

If an NE becomes unavailable, all ASDLs to the unavailable NE queue up in the SARM. The SARM will not take new work orders (WOs).

If the asdl_timeout parameter is set for an ASDL, and the ASDL times out more often than the asdl_retry_number parameter allows, the WO to which the ASDL belongs fails and rolls back.

If the request_timeout parameter (an NE timeout parameter) is set for the WO to which an ASDL belongs, and the number of retries exceeds the request_retry_number parameter (also an NE timeout parameter), the WO fails and rolls back.

If neither the ASDL nor the NE timeout parameters are set and the WO timeout is exceeded, all orders with ASDLs going to the unavailable NE fail due to timeout.
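The three rules above can be restated as a small Python function, which may make the conditions easier to compare. This is an illustrative sketch, not ASAP logic. It assumes that "exceeds" means the observed number of timeouts or retries is greater than the configured retry number, and it evaluates each rule independently rather than claiming any precedence between the ASDL and NE parameters. Only the parameter names come from this chapter; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeoutSettings:
    asdl_timeout: Optional[int] = None      # None means the parameter is not set
    asdl_retry_number: int = 0
    request_timeout: Optional[int] = None   # NE timeout parameter
    request_retry_number: int = 0           # NE timeout parameter


def failure_reasons(settings: TimeoutSettings, asdl_timeouts: int,
                    request_retries: int, wo_timeout_exceeded: bool) -> List[str]:
    """Return the rules from this section that would fail the WO (empty list: none)."""
    reasons = []
    if settings.asdl_timeout is not None and asdl_timeouts > settings.asdl_retry_number:
        reasons.append("ASDL retries exhausted: WO fails and rolls back")
    if (settings.request_timeout is not None
            and request_retries > settings.request_retry_number):
        reasons.append("NE request retries exhausted: WO fails and rolls back")
    if (settings.asdl_timeout is None and settings.request_timeout is None
            and wo_timeout_exceeded):
        reasons.append("WO timeout exceeded: WO fails")
    return reasons
```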

SRP and SARM Failure Scenario

If both the SRP and SARM fail, each NEP completes its processing of ASDL requests currently in progress. The NEP then closes all connections to NEs and then remains idle.

SARM and NEP Failure Scenario

If the SARM and NEP fail, the SRP cannot send orders to, or receive notifications from, the SARM. No order provisioning is possible.

Getting Help with ASAP Problems

If you can't resolve your ASAP problem, contact Oracle Global Support.

Before You Contact Global Support

Problems can often be fixed by shutting down ASAP and restarting the computer that the ASAP system runs on. To shut down and restart ASAP, see "Starting and Stopping ASAP".

If that doesn't solve the problem, the first troubleshooting step is to look at the error log for the application or process that reported the problem. See "Using Error Logs to Troubleshoot ASAP". Be sure to observe the checklist for resolving problems with ASAP before reporting the problem to Oracle Global Support. See "Troubleshooting Checklist".

Reporting Problems

If the checklist for resolving problems with ASAP does not help you to resolve the problem, write down the pertinent information:

A clear and concise description of the problem, including when it began to occur.
Relevant portions of the relevant log files.
Relevant configuration files.
Recent changes in your system, even if you don't think they are relevant.
List of all ASAP components and patches installed on your system.
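If you collect the same files for every support request, a short script can package them. This Python sketch archives the DATA/logs and config folders under an ASAP installation folder into a timestamped .tar.gz file. The folder names follow paths mentioned in this chapter, but the function name and archive name are hypothetical. Review the archive and trim it to the relevant portions before you send it. You still need to write down the problem description, recent changes, and installed components and patches yourself.

```python
import os
import tarfile
import time


def build_support_bundle(asap_home, out_dir="."):
    """Archive ASAP log and configuration folders; return the archive path.

    Folders that do not exist are skipped.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    bundle = os.path.join(out_dir, f"asap-support-{stamp}.tar.gz")
    with tarfile.open(bundle, "w:gz") as tar:
        for rel in ("DATA/logs", "config"):
            path = os.path.join(asap_home, rel)
            if os.path.isdir(path):
                tar.add(path, arcname=rel)  # keep paths relative inside the archive
    return bundle
```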

When you are ready, report the problem to Oracle Global Support.