13 Troubleshooting Your Services Gatekeeper Implementation

This chapter provides guidelines to help you troubleshoot problems with your Oracle Communications Services Gatekeeper implementation. You can find information about interpreting error messages, diagnosing common problems, and contacting Oracle customer support.

Before you read this chapter, you should be familiar with how Services Gatekeeper works. See Services Gatekeeper Concepts for information.

For information on problems related to Services Gatekeeper performance, see "Handling Performance Issues".

General Checklist for Resolving Problems with Services Gatekeeper

When any problems occur with your Services Gatekeeper system, it is best to do some troubleshooting before you contact Oracle:

  • You know your installation better than Oracle does. You know if anything in the system has been changed, so you are more likely to know where to look first.

  • Troubleshooting skills are important. Relying on Oracle to research and solve all of your problems prevents you from being in full control of your system.

Oracle needs a clear and concise description of the problem, including when it began to occur. If you have a problem with your Services Gatekeeper system, ask yourself these questions first, because Oracle will ask them of you:

  • What exactly is the problem? Can you isolate it? For example, if users cannot authenticate, does the failure affect all services or just one? Does it affect a specific network tier (NT) server?

  • Is this a known issue?

    Before you report an issue, check whether it is already a known issue. Known issues are listed in Services Gatekeeper Release Notes.

  • Do you have the log files?

    This is the first thing that Oracle will ask for. Check the error log for the Services Gatekeeper module with which you are having problems. Please keep the information handy when you contact Oracle. See "Using Error Logs to Troubleshoot Services Gatekeeper".

  • Is the problem related to external systems?

    Sometimes the problem may be related to the communication between Services Gatekeeper and an external system, such as a short message service center (SMSC) or a charging server.

    This information is very helpful to Oracle in resolving such an issue. Capture the network traffic between Services Gatekeeper and the external system and provide it to Oracle.

  • Have you read the documentation?

    Look through the list of common problems and their solutions in "Diagnosing Some Common Problems with Services Gatekeeper".

  • Has anything changed in the system? Did you install any new hardware or new software? Did the network change in any way? Does the problem resemble another one you had previously? Has your system usage recently jumped significantly?

  • Is the system otherwise operating normally? Has response time or the level of system resources changed? Are users complaining about additional or different problems?

  • If the system appears completely dead, check the basics: Can you access the system administration console for Services Gatekeeper? Are other processes on this hardware functioning normally?

  • Stay up-to-date (as much as possible) with the Services Gatekeeper patch set releases provided by Oracle.

    What is your current patch level? See "Finding the Current Patch Level of Your Services Gatekeeper System".

If the error message points to a configuration problem, check the configuration file for the associated module. If the solution requires reconfiguring the module, change the configuration and then verify that the problem is resolved.

If you still cannot resolve the problem, contact Oracle as described in "Getting Help for Problems with Services Gatekeeper".

Finding the Current Patch Level of Your Services Gatekeeper System

Before you contact Oracle about an issue, find the current patch level of your Services Gatekeeper system. Oracle will ask you for this information.

With the current patch level at hand, Oracle can tell you whether your issue has been addressed in a later patch release. If it has, you can resolve the issue by upgrading Services Gatekeeper to that later patch level.

Listing What Is Currently Installed on Your Services Gatekeeper System

To list what is currently installed on your Services Gatekeeper system, use the lsinventory command from the OPatch utility.

OPatch is an Oracle-supplied utility that assists you with the process of applying interim patches to Oracle software. The lsinventory command lists the inventory for a particular Oracle home, or displays all installations that can be found.

Running the OPatch lsinventory Command

To run the lsinventory command and obtain information on the patches that are applied currently on your Services Gatekeeper system:

Note:

The ORACLE_HOME variable must point to the installation on which opatch is to operate.

  1. Navigate to the directory in which Services Gatekeeper is installed.

  2. Set the WebLogic environment:

    source ./wlserver/server/bin/setWLSEnv.sh
    
  3. Set ORACLE_HOME to the current directory:

    export ORACLE_HOME=$(pwd)
    
  4. Go to the subdirectory in which the OPatch utility resides:

    cd OPatch
    
  5. Enter the following command:

    ./opatch lsinventory -detail
    

For more information on the lsinventory command for OUI-based Oracle homes and on OPatch, see Universal Installer and OPatch User's Guide on the Oracle Help Center website.
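The steps above can be collected into one helper, sketched here under some assumptions. The function name run_lsinventory is hypothetical (it is not an Oracle-supplied tool), and the sketch assumes that the directory you pass contains the wlserver and OPatch subdirectories used in the steps. It runs in a subshell so that your current directory and ORACLE_HOME are left unchanged.

```shell
#!/bin/sh
# Sketch only: run_lsinventory is a hypothetical helper name.
# Usage: run_lsinventory /path/to/services_gatekeeper_home
run_lsinventory() {
    gatekeeper_home=$1

    # Stop early if the OPatch utility is not where the steps expect it.
    if [ ! -x "$gatekeeper_home/OPatch/opatch" ]; then
        echo "OPatch not found under: $gatekeeper_home" >&2
        return 1
    fi

    (
        # Subshell: the directory change and ORACLE_HOME stay local.
        cd "$gatekeeper_home" || exit 1
        . ./wlserver/server/bin/setWLSEnv.sh   # step 2: WebLogic environment
        ORACLE_HOME=$(pwd)                      # step 3
        export ORACLE_HOME
        cd OPatch || exit 1                     # step 4
        ./opatch lsinventory -detail            # step 5
    )
}
```

The helper returns a nonzero status when OPatch cannot be found, so it can be used safely in a larger collection script.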

Example 13-1 shows sample output from the lsinventory command when it is run without the -detail parameter:

Example 13-1 Sample Output from lsinventory

Oracle Interim Patch Installer version 13.2.0.0.0
Copyright (c) 2014, Oracle Corporation.  All rights reserved.
 
 
Oracle Home       : /home/username/oracle/ocsg_6.0_build_361
Central Inventory : /home/username/prog/oui_11.2.0.2.0
   from           : /home/username/oracle/ocsg_6.0_build_361/oraInst.loc
OPatch version    : 13.2.0.0.0
OUI version       : 13.2.0.0.0
Log file location : /home/username/oracle/ocsg_6.0_build_361/cfgtoollogs/opatch/opatch2014-10-30_11-42-22AM_1.log
 
 
OPatch detects the Middleware Home as "/home/username/oracle/ocsg_6.0_build_361"
 
Oct 30, 2014 11:42:28 AM oracle.sysman.oii.oiii.OiiiInstallAreaControl initAreaControl
INFO: Install area Control created with access level  0
Lsinventory Output file location : /home/username/oracle/ocsg_6.0_build_361/cfgtoollogs/opatch/lsinv/lsinventory2014-10-30_11-42-22AM.txt
 
--------------------------------------------------------------------------------
 
Interim patches (1) :
 
Patch  19836145     : applied on Thu Oct 30 11:25:24 CET 2014
Unique Patch ID:  1414504829609
Patch description:  "[Patch Set v6.0.0.1.7] - Patch bug for patch set XYZ"
   Created on 28 Oct 2014, 15:00:35 hrs PST8PDT
   Bugs fixed:
     123413, 123412
 
 
 
--------------------------------------------------------------------------------
 
OPatch succeeded.

Note that the patch description in this output includes the patch set number (v6.0.0.1.7). This number indicates that you received the seventh update release for the first patch release of the 6.0 major release of Services Gatekeeper. The output also lists the numbers of the bugs that were fixed.
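If you save the lsinventory output, you can pull the patch set number out of it with a short filter. The following is a minimal sketch: the function name list_patch_sets is hypothetical, and it assumes that every patch description follows the "[Patch Set vN.N.N.N.N]" layout shown in Example 13-1, which may not hold for every patch.

```shell
#!/bin/sh
# Sketch only: list_patch_sets is a hypothetical helper name.
# Reads lsinventory output on standard input and prints the patch set
# version found in each "Patch description" line.
# Assumes the "[Patch Set vX.Y...]" layout shown in Example 13-1.
list_patch_sets() {
    sed -n 's/^Patch description:.*\[Patch Set \(v[0-9.]*\)].*/\1/p'
}
```

For example, from the OPatch directory you could run ./opatch lsinventory | list_patch_sets to print one version per installed patch set.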

Other Usages of the lsinventory Command

With the lsinventory command from OPatch, you can:

  • Group the inventory of all installed patches by the date they were installed in the Oracle home.

    ./opatch lsinventory -group_by_date
    
  • Redirect the output to a file, as with any other command.

    ./opatch lsinventory > out.log
    
  • Redirect standard error (stderr) to the same file as standard output (stdout).

    ./opatch lsinventory > out.log 2>&1
    

Handling Performance Issues

Maintaining Services Gatekeeper performance levels is a complex task. If you find that your Services Gatekeeper system is not performing in an optimal manner, you may need to tune the underlying components to the requirements of your environment. For example:

  • WebLogic Server

    Tune the underlying WebLogic Server (WLS) to the requirements of your environment. For example, select the appropriate startup mode for your installation.

    For information about the default tuning values for WebLogic Server development and production modes, see Oracle Fusion Middleware Performance and Tuning for Oracle WebLogic Server.

  • Java Virtual Machine (JVM)

    How you tune your JVM affects the performance of WebLogic Server and your applications. For more information, see the discussion on tuning Java Virtual Machines (JVMs) in Oracle Fusion Middleware Performance and Tuning for Oracle WebLogic Server on the Oracle Help Center website.

  • Persistence type for storage services

    Check the caching technique you have implemented. Compare the available techniques and configure the one that best suits your requirements for storing and accessing data.

    For example, the write-through caching technique has performance implications when compared to the write-behind technique. This is because, for write-through, the write to the cache and the write to the permanent storage location must both complete before a notification is sent to the host.

  • Latency

    Check the network latency and network performance between the application tier and the database tier. See Latency and Bandwidth Requirements for information on the requirements that Oracle recommends.

    The traffic between your application and your database could be a factor, especially in a multi-tiered environment.

As part of your investigation into Services Gatekeeper performance, be sure to look at the log files that Services Gatekeeper provides.

Diagnosing Problems from Alarms

If Services Gatekeeper encounters a problem that it recognizes, it sends an EDR alarm to help you diagnose the problem. See "Managing and Configuring EDRs, CDRs, and Alarms" for general information about alarms, and Alarms Handling Guide for details on the individual alarms, organized by alarm number.

Using Error Logs to Troubleshoot Services Gatekeeper

If you are having a problem with Services Gatekeeper, look in the log files. Log files include errors that need to be managed, as well as errors that do not need immediate attention (for example, invalid logins).

To manage log files, make a list of the errors that are important for your system so that you can separate them from errors that do not need immediate attention.

About Error Log Files

Services Gatekeeper maintains a default.log file that contains logs from the modules specific to it. The error log files provide detailed information about system problems.

Additionally, look at the entries in the WLS server log files.

Finding Error Log Files

The Services Gatekeeper-specific log file, default.log, is located at:

domain_root_dir/servers/server_name/trace

The log files for the servers are located at:

domain_root_dir/servers/server_name/logs

Here, domain_root_dir represents the directory in which the WebLogic Server domain is created, and server_name is the name of the server.

Resolving Clusters of Error Messages

A single error often produces a cluster of messages in the log file, because some errors generate cascading messages. To resolve the error, locate the first message in the series.
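One way to locate the first message in a cluster is to search the log for the first matching line. The following is a minimal sketch, not a documented Services Gatekeeper tool: the function name first_error is hypothetical, and the default marker ERROR is an assumption about how error entries appear in your log files. Pass a different marker if your entries are labeled differently.

```shell
#!/bin/sh
# Sketch only: first_error is a hypothetical helper name.
# Prints the first line (prefixed with its line number) that contains the
# marker. The default marker "ERROR" is an assumption about the log layout.
# Usage: first_error LOG_FILE [MARKER]
first_error() {
    log_file=$1
    marker=${2:-ERROR}
    grep -n -- "$marker" "$log_file" | head -n 1
}
```

For example, first_error domain_root_dir/servers/server_name/trace/default.log prints the earliest matching entry, which is usually the best starting point for diagnosis.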

Changing Log Levels in Services Gatekeeper

An easy and persistent way to change the logging level is to edit the log4j configuration file under Domain_Home/log4j/log4jconfig.xml.

To obtain a complete log, change the priority value:

  1. Go to the directory where the log4jconfig.xml configuration file is located.

    By default, it is in the Services Gatekeeper domain at Domain_Home/log4j.

  2. Open the log4jconfig.xml configuration file in an appropriate text editor.

  3. Locate the priority value= entry.

  4. Set priority value to all, as shown below:

    <root>
            <priority value="all"/>
    </root>
    
  5. Save the file.
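The same edit can be scripted, which is convenient when you must change several nodes. The following is a minimal sketch under stated assumptions: the function name set_root_priority is hypothetical, and it assumes that the <root> and </root> tags each appear on their own line, as in the fragment above. It keeps a backup copy with a .bak suffix and changes only the priority entry inside the <root> element.

```shell
#!/bin/sh
# Sketch only: set_root_priority is a hypothetical helper name.
# Usage: set_root_priority CONFIG_FILE LEVEL
# Assumes <root> and </root> are on their own lines in the file.
set_root_priority() {
    config_file=$1
    level=$2

    # Keep a backup so the original level can be restored later.
    cp "$config_file" "$config_file.bak" || return 1

    # Restrict the substitution to the lines between <root> and </root>.
    sed "/<root>/,/<\/root>/ s/<priority value=\"[^\"]*\"/<priority value=\"$level\"/" \
        "$config_file.bak" > "$config_file"
}
```

For example, set_root_priority Domain_Home/log4j/log4jconfig.xml all sets the root priority to all. Remember to restore the previous level after you collect the logs, because a complete log grows quickly.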

Collecting Log Data

Generally, server logs are important. Collect log information while the entries are fresh. If the log files are rotated, then eventually old logs will be overwritten by new ones.

Here is an example of how to collect Services Gatekeeper and WebLogic Server logs from a node. Copy and save the appropriate script to your Services Gatekeeper installation directory. Run the script from the same directory, repeating it for all nodes.

Use the script in Example 13-2 for Linux installations.

Example 13-2 Example of a Script to Collect Logs (Linux)

#!/bin/sh
# Linux version
 
# Collects all log, out, configuration (XML), and flight recording (JFR) files
# found under the current directory into one gzip-compressed tar archive.
ROOT=`pwd`
ARCHIVE_DIR=/tmp
MACHINE=`hostname`
echo $MACHINE
ARCHIVE_FILE=${ARCHIVE_DIR}/`date +%F_%H_%M_%S`
TMP_FILE_LIST=${ARCHIVE_DIR}/tarinput
 
# Build the list of files to archive (GNU tar reads it with -T).
find $ROOT | grep -e ".*\.log[\.,0-9]*$" > $TMP_FILE_LIST
find $ROOT | grep -e ".*\.jfr$" >> $TMP_FILE_LIST
find $ROOT | grep -e ".*\.xml$" >> $TMP_FILE_LIST
find $ROOT | grep -e ".*\.out$" >> $TMP_FILE_LIST
tar cvf ${ARCHIVE_FILE}_$MACHINE.tar -T $TMP_FILE_LIST
gzip ${ARCHIVE_FILE}_$MACHINE.tar
echo "Created archive ${ARCHIVE_FILE}_$MACHINE.tar.gz"
 

Use the script in Example 13-3 for Solaris installations.

Example 13-3 Example of a Script to Collect Logs (Solaris)

#!/bin/sh
# Solaris version
 
# Collects all log, out, configuration (XML), and flight recording (JFR) files
# found under the current directory into one gzip-compressed tar archive.
ROOT=`pwd`
ARCHIVE_DIR=/tmp
MACHINE=`hostname`
echo $MACHINE
ARCHIVE_FILE=${ARCHIVE_DIR}/`date +%F_%H_%M_%S`
TMP_FILE_LIST=${ARCHIVE_DIR}/tarinput
 
# Build the list of files to archive (Solaris tar reads it with -I).
find $ROOT | grep ".*\.log[\.,0-9]*$" > $TMP_FILE_LIST
find $ROOT | grep ".*\.jfr$" >> $TMP_FILE_LIST
find $ROOT | grep ".*\.xml$" >> $TMP_FILE_LIST
find $ROOT | grep ".*\.out$" >> $TMP_FILE_LIST
tar cvf ${ARCHIVE_FILE}_$MACHINE.tar -I $TMP_FILE_LIST
gzip ${ARCHIVE_FILE}_$MACHINE.tar
echo "Created archive ${ARCHIVE_FILE}_$MACHINE.tar.gz"

Diagnosing Some Common Problems with Services Gatekeeper

This section describes some of the common problems you may encounter in Services Gatekeeper. It shows you how to diagnose the error messages and resolve the issues.

Problem: The Server Will Not Start

The Services Gatekeeper server startup scripts work best with the Bash shell. One of the server startup scripts may fail with an error like this one:

./dbController.sh: 3: -/dbController.sh: Syntax Error: "(" unexpected

If this happens, edit the script, replacing the #!/bin/sh shebang with #!/bin/bash.
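If several startup scripts need this change, the edit can be scripted. The following is a minimal sketch: the function name use_bash_shebang is hypothetical, and it assumes that the first line of the script is exactly #!/bin/sh. A copy of the original script is kept with a .orig suffix, and scripts with a different first line are left unchanged.

```shell
#!/bin/sh
# Sketch only: use_bash_shebang is a hypothetical helper name.
# Usage: use_bash_shebang SCRIPT_FILE
use_bash_shebang() {
    script=$1

    # Keep a copy of the original script, including its permissions.
    cp -p "$script" "$script.orig" || return 1

    # Rewrite only line 1, and only if it is exactly "#!/bin/sh".
    # Writing back into the existing file keeps its execute permission.
    sed '1s|^#!/bin/sh[[:space:]]*$|#!/bin/bash|' "$script.orig" > "$script"
}
```

For example, use_bash_shebang ./dbController.sh updates the script named in the error message above.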

Problem: Reports Extension Installation Fails

If the Services Gatekeeper reports extension fails to install correctly, check the Gatekeeper_home/tmp/log_xmf directory for log files with names in this format:

InstalltimeStamp.log

For example:

Install2016-03-28_04-27-59PM.log

If the installation failed while executing the SQL scripts on the staging database, you will see an error message like this one:

oracle.as.install.engine.modules.util.installaction.InstallActionException:
Initial database failed

In this case, remove and reinstall the reports extension:

  1. Run the OBI_home/oui/bin/deinstall.sh script.

  2. Use the drop user username cascade command, which deletes all entries related to that OBI database user.

  3. Recursively remove the Gatekeeper_home/ocsg_analytics folder.

  4. Recreate the OBI database user you deleted.

  5. Rerun the ocsg extn jar command to reinstall the OBI extension.

See "Configuring Reports Data Source" in Services Gatekeeper Multi-tier Installation Guide for more information.
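To check quickly which installation logs recorded this failure, you can search the log directory for the exception name. The following is a minimal sketch: the function name find_failed_installs is hypothetical, and it assumes that the log files follow the Install*.log naming shown earlier in this section.

```shell
#!/bin/sh
# Sketch only: find_failed_installs is a hypothetical helper name.
# Prints the names of the Install*.log files in the given directory that
# mention InstallActionException.
# Usage: find_failed_installs LOG_DIRECTORY
find_failed_installs() {
    log_dir=$1
    grep -l 'InstallActionException' "$log_dir"/Install*.log 2>/dev/null
}
```

For example, find_failed_installs Gatekeeper_home/tmp/log_xmf lists the logs to review before you remove and reinstall the extension.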

Problem: The Server is Hanging

A server (or node) may hang for several reasons.

When you find that a server is hanging, regardless of the actual cause, it is always a good idea to capture a thread dump while the node is hanging.

Note:

To identify slow-moving threads, be sure to take two thread dumps thirty (30) seconds apart.

If you capture the thread dump before you restart the node, you may find it easier to understand why the node hung. To store the thread dump in a log file, either use Node Manager or start the nodes so that standard output and standard error are forwarded to a file:

  • Use Node Manager log file

    Node Manager is a WebLogic Server utility that enables you to start, shut down, and restart Administration Server and Managed Server instances from a remote location. Although Node Manager is optional, it is recommended if your WebLogic Server environment hosts applications with high availability requirements.

    For more information, see the discussion on log files in Oracle Fusion Middleware Node Manager Administrator's Guide for Oracle WebLogic Server.

  • Start the servers so that standard output and standard error are forwarded to a file. Run the start script in the following way:

    startScript.sh > out.log 2>&1
    

If you have more than one thread dump, you can correlate the results to see whether the states of the threads change. Example 13-4 shows how to find the server process with the ps command and then obtain two thread dumps.

Example 13-4 Obtaining Two Thread Dumps

ps -ef | grep -e".*ocsg.*weblogic\.Server$"
 
#Oracle HotSpot Virtual Machine to print threads using jcmd
jcmd <pid> Thread.print > thread-dump-1.log
#wait for 30 seconds and do another dump, written to a second file
#so that the first dump is not overwritten
jcmd <pid> Thread.print > thread-dump-2.log
 
#Any JVM (output ends up on stderr)
kill -QUIT <pid from ps output>
#wait for 30 seconds and do another dump
kill -QUIT <pid from ps output>
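The jcmd variant of this procedure can be wrapped in a small helper so that the two dumps are always taken the same way. The following is a minimal sketch: the function name take_thread_dumps, the JCMD variable, and the output file names are all hypothetical choices. It assumes that jcmd is on the PATH (or that JCMD holds its full path) and that you run it as the same user as the server process.

```shell
#!/bin/sh
# Sketch only: take_thread_dumps is a hypothetical helper name.
# Usage: take_thread_dumps PID [INTERVAL_SECONDS] [OUTPUT_DIR]
# Set JCMD to the full path of jcmd if it is not on the PATH.
take_thread_dumps() {
    pid=$1
    interval=${2:-30}
    out_dir=${3:-.}

    # First dump.
    "${JCMD:-jcmd}" "$pid" Thread.print > "$out_dir/thread-dump-$pid-1.log" || return 1

    # Wait so that slow-moving threads can be told apart from stuck ones.
    sleep "$interval"

    # Second dump, in a separate file so that the two can be compared.
    "${JCMD:-jcmd}" "$pid" Thread.print > "$out_dir/thread-dump-$pid-2.log"
}
```

For example, take_thread_dumps 12345 writes two dumps for process 12345 to the current directory, thirty seconds apart.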

Problem: Memory Issues

Garbage collection (GC) could result in long pauses that might affect performance.

To look for long GC pauses that might affect performance, add the -verbose:gc flag to your start script (setDomainEnv.sh for WebLogic-based servers). Example 13-5 shows the output seen when an example server is running with this flag.

Example 13-5 Example Garbage Collection Entries

[GC 307767K->235359K(375296K), 0.0803370 secs]
[GC 311327K->235207K(377024K), 0.0777140 secs]
[GC 313671K->216031K(344512K), 0.0520790 secs]
[GC 294495K->218928K(376448K), 0.0493060 secs]
[GC 295472K->218713K(341952K), 0.0441110 secs]

You can use the output to monitor the GC pauses while running traffic.
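To pick out only the long pauses from a large amount of output, you can filter on the pause time. The following is a minimal sketch: the function name long_gc_pauses is hypothetical, the default threshold of 0.05 seconds is an arbitrary example value, and the filter assumes the line layout shown in Example 13-5, where the pause time is the field immediately before "secs]". Other JVM versions and logging options print garbage collection entries differently.

```shell
#!/bin/sh
# Sketch only: long_gc_pauses is a hypothetical helper name.
# Reads -verbose:gc output on standard input and prints the entries whose
# pause time, in seconds, exceeds the threshold.
# Assumes the layout of Example 13-5: "... <pause> secs]".
# Usage: long_gc_pauses [THRESHOLD_SECONDS] < out.log
long_gc_pauses() {
    awk -v limit="${1:-0.05}" '$NF == "secs]" && ($(NF-1) + 0) > (limit + 0)'
}
```

For example, long_gc_pauses 0.5 < out.log prints only the collections that paused the server for more than half a second.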

Problem: Enabling SSL on Admin Server Fails If All Local Addresses Used

If you use the Domain Configuration wizard to configure your domain and you want to use SSL communication for Services Gatekeeper, do not combine the All Local Addresses menu item for the listen address with listen port 7001. The wizard accepts this combination, but later, when you attempt to enable port 7002 for SSL communication, the Administration Console hangs with an error message like this one:

Timed out waiting for completion: Activate State: STATE_DISTRIBUTED Target Servers States: AdminSever STATE_COMMITTED WLNG_NT1 STATE_COMMITTED WLNG_NT2 STATE_COMMITTED, WLGN_AT1 STATE_COMMITTED WLNG_AT2 STATE_DISTRIBUTED 

Problem: Receiving an Internal Server Error and Incident ID

The HTTP client sending a request message to Services Gatekeeper may receive an error like this one:

HTTP/1.1 500 INTERNAL SERVER ERROR
Internal Server Error. Incident ID: E-1415b46a99b24b848290dcfed1ffba85

This error means that one of the actions in the action chain threw a runtime exception that stopped action chain processing. The result of any successful action is contained in EDRs. See "Understanding Action Chain EDR Handling" for details.

This error can occur in either request or response action chain processing. See "Configuring Actions Chains to Manage API Traffic" in Services Gatekeeper API Management Guide and the API and Partner Portal Online Help for more information on action chains.

If you get one of these errors, you can:

  • Try removing or modifying your actions one at a time until you find the offending action.

  • Contact Oracle Support. They will be able to gather more information from the internal log files with the Incident ID. See "Getting Help for Problems with Services Gatekeeper" for information.
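If your client saves its responses, you can extract the Incident ID for the support request with a short filter. The following is a minimal sketch: the function name extract_incident_id is hypothetical, and it assumes that the ID has the form shown in the sample response above, that is, "E-" followed by lowercase hexadecimal characters.

```shell
#!/bin/sh
# Sketch only: extract_incident_id is a hypothetical helper name.
# Reads an HTTP response on standard input and prints each Incident ID.
# Assumes the "E-<hex digits>" form shown in the sample response.
extract_incident_id() {
    sed -n 's/.*Incident ID: \(E-[0-9a-f]*\).*/\1/p'
}
```

For example, extract_incident_id < response.txt prints the ID to include when you report the problem, where response.txt is a placeholder for your saved response.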

Getting Help for Problems with Services Gatekeeper

If you cannot resolve your problems with Services Gatekeeper, contact Oracle.

Before You Contact Oracle

Problems can often be fixed simply by shutting down Services Gatekeeper and restarting the computer that the Services Gatekeeper system runs on. See "Starting, Stopping, and Administering Servers" in Services Gatekeeper System Administrator's Guide.

Note:

Oracle will ask you for the relevant log files and thread dumps to troubleshoot an issue.

Therefore, before you shut down Services Gatekeeper, be sure to obtain the relevant log files and thread dumps associated with the issue.

If that does not solve the problem, the first troubleshooting step is to look at the error log for the application or process that reported the problem. See "Using Error Logs to Troubleshoot Services Gatekeeper". Be sure to review "General Checklist for Resolving Problems with Services Gatekeeper" before reporting the problem to Oracle.

Reporting Problems to Oracle

If "General Checklist for Resolving Problems with Services Gatekeeper" does not help you resolve the problem, record the pertinent information:

  • A clear and concise description of the problem, including when it began to occur.

  • Relevant configuration files.

  • Any Internal Server Error Incident IDs. The incident ID is an internal Oracle code that Oracle Support personnel can use to help diagnose your problem.

  • Recent changes in your system, even if you do not think they are relevant.

  • A list of all Services Gatekeeper components and patches installed on your system.

When you are ready, report the problem to Oracle.