12 Troubleshooting OSM

This chapter provides guidelines to help you troubleshoot problems with your Oracle Communications Order and Service Management (OSM) system.

Information You Need for Troubleshooting

When you are diagnosing and resolving problems, you must be able to obtain the following information:

Database AWR report for a particular period of time.
Database ASH report for a particular period of time.
Oracle WebLogic Server administration server logs and output files.
WebLogic Server managed server logs and output files.
WebLogic Server node manager's logs and output files.
JVM garbage collector logs.
JVM heap dumps.
JVM thread dumps (several in succession).
OSM model and a single order extracted from the database schema. For more information, see "Exporting and Importing the OSM Model and a Single Order."
Java Flight Recorder (JFR) recordings.
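
Thread dumps are most useful when several are taken a few seconds apart, so that stuck threads can be told apart from busy ones. The following is a minimal sketch of one way to collect them, assuming the JDK's jcmd tool is on the PATH and the script runs as the same operating system user as the WebLogic Server JVM. The function name, file naming pattern, and default interval are illustrative and are not part of OSM.

```python
import subprocess
import time
from pathlib import Path


def capture_thread_dumps(pid, count=3, interval_s=10.0, out_dir=".",
                         run=subprocess.run, sleep=time.sleep):
    """Write `count` thread dumps for the JVM with process ID `pid`.

    Dumps are taken `interval_s` seconds apart and saved as
    threaddump_<pid>_<n>.txt in `out_dir`. The `run` and `sleep`
    parameters exist only so the function can be exercised without a JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        # "jcmd <pid> Thread.print" prints the stack of every thread in the JVM.
        result = run(["jcmd", str(pid), "Thread.print"],
                     capture_output=True, text=True, check=True)
        path = out / f"threaddump_{pid}_{i + 1}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            sleep(interval_s)
    return paths
```

For example, capture_thread_dumps(12345, count=5, interval_s=5) would write five dumps for a JVM whose process ID is 12345 (a made-up value).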

General Checklist for Resolving Problems

If you have a problem with your OSM system, go through the following checklist before you contact Oracle Technical Support:

What exactly is the problem? Can you isolate it? For example, if an order causes a problem on one computer, does it give the same result on another computer? Oracle Technical Support needs a clear and concise description of the problem, including when it began to occur.
What do the log files say? This is the first thing that Oracle Technical Support asks for. Check the error log for the OSM component you are having problems with.
Have you read the documentation? Look through the list of common problems and their solutions in "Diagnosing Some Common Problems with OSM" and in "Chapter 12 Debugging and Troubleshooting" in OSM Cloud Native Deployment Guide.
Has anything changed in the system? Did you install any new hardware or new software? Did the network change in any way? Does the problem resemble another one you had previously? Has your system usage recently jumped significantly?
Is the system otherwise operating normally? Has response time or the level of system resources changed? Are users complaining about additional or different problems?

Diagnosing Some Common Problems with OSM

This section describes common problems and their solutions.

Cannot Log in or Access Certain Functionality

If you cannot log in or access certain functionality, check the following possible causes:

Are you a valid user in the WebLogic Server security realm?
Is the OSM web application deployed?
Are all OSM Enterprise Java Beans (EJB) deployed?
Are the OSM database resources deployed?
Do you belong to the correct groups in the WebLogic Server security realm?
Do you belong to any OSM workgroup?

System Appears Slow

If the functionality of OSM appears to be present, but performance is slow, check the following possible causes:

The amount of memory being used (check the memory configuration in the WebLogic Server startup script on the workstation where you have deployed OSM).
The CPU and disk usage on the machine hosting the OSM database.
The database performance (for example, using AWR reports).
For slow worklist access, check the number of flexible headers on your worklist. The number of flexible headers has a direct negative effect on worklist performance.

Error: "java.lang.StackOverflowError" when Using Task Web Client

You may see the error "java.lang.StackOverflowError" in the log files. If this happens, you can address the problem by tuning the thread stack size parameter (the JVM -Xss option).

Note:

The procedure below sets the thread stack size to 2 MB. This is a suggested starting value; adjust it as necessary for your environment.

In your instance, project or shape specification file, add or append the following parameter and adjust the value as necessary:

shape: 
   user_mem_args: "-Xss2m"

Unexpected Logout from Web Client

If your system is running on a WebLogic Server cluster, and the following conditions apply:

a user is viewing an order in the Order Management web client or Task web client
that order is hosted on a managed server that fails or is shut down

the user will be logged out of the web client and will have to log back in. See the discussion of order affinity in OSM Installation Guide for more information about orders being hosted on a particular managed server.

Error: "Login failed. Please try again."

If the error "Login failed. Please try again" is displayed when trying to log in through the web client and you have entered the correct user name and password, you probably do not belong to the correct groups in the WebLogic Server security realm.

To resolve this issue, log in to the WebLogic Server Administration Console using the administrator account and make sure that your user has been added to the OMS_client group. Then try to log in again.

Automation Plug-ins Are Not Getting Called

If the custom automation plug-ins are not getting called, check the following possible causes:

Is the Automation configuration deployed properly?
Are the JMS resources deployed?
Are the JMS destinations, queues, and topics configured properly?

Delayed JMS Messages

Clock synchronization issues may cause JMS request and response messages to remain in a queue in a delayed state.

When a message is sent, it is stamped with the time on the sender's machine. If the clock on the recipient's machine is running several minutes behind, the timestamp appears to be in the future when the message arrives at the JMS destination, and WebLogic Server delays the message until that time arrives.

To prevent this problem, use network time protocol (NTP) servers to synchronize clocks across all machines in a cluster.
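
The arithmetic behind the delay can be illustrated with a short sketch. It is not WebLogic Server's implementation. It only shows how far in the future a sender's timestamp appears to a recipient whose clock is behind. The function name and the sample times are made up for the illustration, and transit time is ignored.

```python
from datetime import datetime, timedelta, timezone


def apparent_delay(sent_at: datetime, receiver_now: datetime) -> timedelta:
    """Return how far in the future the sender's timestamp looks to the receiver.

    A result of zero means the timestamp is not in the receiver's future.
    """
    return max(sent_at - receiver_now, timedelta(0))


# The sender stamps the message at 12:05:00. The receiver's clock runs
# 4 minutes behind, so it reads 12:01:00 when the message arrives.
sent = datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc)
receiver_clock = sent - timedelta(minutes=4)

print(apparent_delay(sent, receiver_clock))  # 0:04:00
```

With synchronized clocks the result is zero, which is why keeping the clocks of all cluster members in step avoids the delayed state.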

Error Message For Events From a JMS Topic in a Cluster

The following message can occur when attempting to process an event from a JMS topic:

<Message-Driven EJB: YourCartridgeName_1.0.0.0.0_YourPluginName_orderCompleteEventMDB's transaction was rolled back. The transaction details are: ....

OSM does not support JMS topics within an OSM clustered environment. For more information about OSM queue configuration, see the discussion of OSM integration with external systems in OSM Installation Guide.

Unexpected Values for JMS Properties

There are some situations in which OSM may set the JMS properties on a message to values that you do not expect.

Sometimes messages in the queue that were received from external systems have JMS properties that were not set by the external system.

This is because when OSM sends a request to an external system, any messages received in response must be correlated back to an appropriate automator. When a message is first received, before it is placed on the queue, OSM finds the handling task's context based on the correlation data in the message. OSM adds this context, and some additional data associated with the message-driven bean (MDB) that received the message, to the message using additional JMS message properties. You can see the context data by browsing the messages in the queue using the WebLogic Server Administration Console.

When OSM runs in a cluster, this message is sometimes redirected to a message queue hosted on a different managed server from the one that sent the request.

Sometimes messages have values that seem wrong. For example, a message might have JMS properties with pluginJndiName=X and cartridgeNamespace=Y, but plug-in X is not in a cartridge with namespace Y.

These property values are implementation details and will not necessarily match what you expect. For example, because different plug-ins can share the same MDB, the pluginJndiName property may not contain the name of the plug-in that actually handles the message.

Unable to Bring Up Managed Server After Database Failure

OSM fails to start if the Oracle database is unreachable. In addition, if the database is Oracle RAC, both database nodes specified by the data source URLs must be reachable at startup. After WebLogic Server has started, failure of a database node is handled gracefully (processing fails over to the other node).
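
Before restarting a managed server, it can help to confirm that every database node is at least reachable over the network. The following is a rough sketch that attempts a TCP connection to each listener. The host names and port in the usage note are placeholders, and a successful connection shows only that the listener port accepts connections, not that the database service itself is healthy.

```python
import socket


def is_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(nodes, timeout_s=3.0):
    """Return the (host, port) pairs from `nodes` that could not be reached."""
    return [(host, port) for host, port in nodes
            if not is_reachable(host, port, timeout_s)]
```

For example, unreachable_nodes([("racnode1.example.com", 1521), ("racnode2.example.com", 1521)]) returns an empty list only when both hypothetical nodes accept connections.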

JBoss Cache Timeouts

Long full garbage collections can cause JBoss timeout errors to appear in the log files.
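
To check whether long full garbage collections coincide with the timeout errors, you can scan the JVM garbage collector logs for long pauses. The sketch below assumes the JDK unified logging format (as produced by -Xlog:gc), in which a full collection is reported on a line containing "Pause Full" and ending with the pause time in milliseconds. Logs written in other formats need a different pattern. The function name and the 5000 ms threshold are arbitrary choices for the illustration.

```python
import re

# Assumed line shape (JDK unified logging):
# [12.345s][info][gc] GC(7) Pause Full (Allocation Failure) 900M->850M(1024M) 8423.118ms
FULL_GC = re.compile(r"Pause Full.*?(\d+(?:\.\d+)?)ms\s*$")


def long_full_gcs(lines, threshold_ms=5000.0):
    """Return (pause_ms, line) for each full GC pause at or above threshold_ms."""
    hits = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            pause_ms = float(match.group(1))
            if pause_ms >= threshold_ms:
                hits.append((pause_ms, line.rstrip()))
    return hits
```

Comparing the timestamps of the reported pauses with the timestamps of the timeout errors in the managed server logs indicates whether the two are related.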

OSM Fails to Process Orders Because of Metadata Errors

Metadata errors can occur in any cartridge with orchestration model entities and can cause order processing failures. Search for the string Metadata Errors in the Console view of the Cartridge Management editor in Design Studio. If you are not using Design Studio to deploy cartridges, look in the WebLogic Server logs for the same string.
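
When the search has to be done in the WebLogic Server logs, a small script can report where the string occurs. This is a minimal sketch. The function name is illustrative, and the log path in the usage note is a placeholder for the location of your own server log.

```python
from pathlib import Path


def find_in_log(log_path, needle="Metadata Errors"):
    """Return (line_number, line) pairs for every log line that contains `needle`."""
    hits = []
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, find_in_log("/path/to/managed-server.log") lists each matching line with its line number, which makes it easier to read the surrounding entries for the affected cartridge.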

DataDictionary Expansion Level

If you are having issues deploying cartridges, the cause may be related to the DataDictionary expansion level. In Oracle Communications Design Studio, under Windows preferences, increase the DataDictionary expansion level to 10. In some cases, you may need to increase the level to more than 10.

Quick Fix Button Active During Order Template Conflicts in Design Studio

Conflicts can occur when order templates are created in Design Studio. Presently, Quick Fix does not work for order template conflicts, even if the Quick Fix button is active. All order template conflicts must be resolved manually.

Cannot Create New Orders on a New Cartridge Version

Order creation can fail on a new version of an existing cartridge, even after you have updated all required entities, and built and deployed the cartridge.

When the createOrder request fails, you receive a response like the following example:

<env:Envelope xmlns:env="http://schemas/soap/envelope/"> 
   <env:Header/> 
   <env:Body> 
      <env:Fault 
xmlns:ord="http://URL/communications/ordermanagement"> 
         <faultcode>ord:fault</faultcode> 
         <faultstring>Failed to create and start the order due to 
java.lang.RuntimeException: OMSException: encountered error 
starting orchestration caused by:Cannot find task for notification 
id</faultstring> 
         <faultactor>unknown</faultactor> 
         <detail> 
            <InvalidOrderSpecificationFault 
xmlns="http://URL/communications/ordermanagement"> 
               <Description>Failed to create and start the order due to 
java.lang.RuntimeException: OMSException: encountered error 
starting orchestration caused by:Cannot find task for notification 
id</Description> 
            </InvalidOrderSpecificationFault> 
         </detail> 
      </env:Fault> 
   </env:Body> 
</env:Envelope>

To resolve this issue:

Open the solution cartridge.
Click the Dependency tab of the model project.
Remove all the dependencies that are displayed for the project.
Re-add all the dependencies.
Restart Design Studio.
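
Client code that submits createOrder requests can recognize this failure by reading the fault from the response. The following sketch extracts the fault code and fault string from a SOAP response such as the example shown earlier. It matches elements by local name, so it does not depend on the envelope namespace URI. The function name is illustrative and is not part of any OSM API.

```python
import xml.etree.ElementTree as ET


def extract_fault(response_xml: str):
    """Return (faultcode, faultstring) from a SOAP response, or None if there is no Fault."""
    root = ET.fromstring(response_xml)
    for element in root.iter():
        # Namespaced tags look like "{uri}Fault"; compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "Fault":
            code = element.findtext("faultcode", default="")
            text = element.findtext("faultstring", default="")
            # Collapse line wrapping inside the fault string into single spaces.
            return code.strip(), " ".join(text.split())
    return None
```

A caller could then check whether the returned fault string contains "Cannot find task for notification" before applying the dependency steps above.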

Error: "exact fetch returns more than requested number of rows"

You may see the error "exact fetch returns more than requested number of rows" in the log files if memory issues related to very large orders cause contention in orchestration XQuery calls when multiple orchestration plans are running at the same time. The default orchestration plan concurrency level is 3. You can reduce this value as described below.

To resolve this issue, decrease the orchestration plan concurrency level in your project specification file.

export JAVA_OPTIONS="${JAVA_OPTIONS} -Doracle.communications.ordermanagement.orchestration.generation.model.ConcurrencyLevel=2"

Error: "unique constraint violated"

You may see the "unique constraint violated" error in the log files if you retry purging order data after a previous purge attempt failed.

ORA-00001: unique constraint ...violated
ORA-06512: at "<database_schema>.OM_SQL_LOG_PKG", line 335
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 5012
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 5599
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 5778
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 6191
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 6360
ORA-06512: at "<database_schema>.OM_PART_MAINTAIN", line 6886
ORA-06512: at line 1

This error occurs because the failed purge operation left behind non-empty exchange tables.

To resolve this issue, you must purge the exchange tables manually before you retry purging.

Getting Help with OSM Problems

If you cannot resolve your problems with OSM, contact Oracle Technical Support.

Before You Contact Support

The first troubleshooting step is to look at the error log for the application or process that reported the problem. Consult "General Checklist for Resolving Problems" before reporting the problem to Oracle.

Reporting Problems

If "General Checklist for Resolving Problems" does not help you to resolve the problem, write down the pertinent information:

A clear and concise description of the problem, including when it began to occur.
Relevant portions of the log files.
Recent changes in your system, even if you do not think they are relevant.
List of all the OSM components and patches installed on your system.
All specification files (project, instance, and shape) used to create the OSM instance.

When you are ready, report the problem to Oracle.