14 Troubleshooting Your Services Gatekeeper Implementation

This chapter provides guidelines to help you troubleshoot problems with your Oracle Communications Services Gatekeeper implementation. You can find information about interpreting error messages, diagnosing common problems, and contacting Oracle customer support.

Before you read this chapter, you should be familiar with how Services Gatekeeper works. See Services Gatekeeper Concepts for information.

For information on problems related to Services Gatekeeper performance, see "Handling Performance Issues".

General Checklist for Resolving Problems with Services Gatekeeper

When any problems occur with your Services Gatekeeper system, it is best to do some troubleshooting before you contact Oracle:

  • You know your installation better than Oracle does. You know if anything in the system has been changed, so you are more likely to know where to look first.

  • Troubleshooting skills are important. Relying on Oracle to research and solve all of your problems prevents you from being in full control of your system.

Oracle needs a clear and concise description of the problem, including when it began to occur. If you have a problem with your Services Gatekeeper system, ask yourself these questions first, because Oracle will ask them of you:

  • What exactly is the problem? Can you isolate it? For example, if users cannot authenticate, is it all services or just one service? Does it affect a specific NT server?

  • Is this a known issue?

    Before calling to report an issue, it is a good idea to see if the problem you have encountered is a known issue already. Known issues are listed in Services Gatekeeper Release Notes.

  • Do you have the log files?

    This is the first thing that Oracle will ask for. Check the error log for the Services Gatekeeper module with which you are having problems. Please keep the information handy when you contact Oracle. See "Using Error Logs to Troubleshoot Services Gatekeeper".

  • Is the problem related to external systems?

    Sometimes the problem may be related to the communication between Services Gatekeeper and an external system, such as a short message service center (SMSC) or a charging server.

    This information is very helpful to Oracle in resolving such an issue. Capture the network traffic between Services Gatekeeper and the external system and provide it to Oracle.

  • Have you read the documentation?

    Look through the list of common problems and their solutions in "Diagnosing Some Common Problems with Services Gatekeeper".

  • Has anything changed in the system? Did you install any new hardware or new software? Did the network change in any way? Does the problem resemble another one you had previously? Has your system usage recently jumped significantly?

  • Is the system otherwise operating normally? Has response time or the level of system resources changed? Are users complaining about additional or different problems?

  • If the system appears completely dead, check the basics: Can you access the system administration console for Services Gatekeeper? Are other processes on this hardware functioning normally?

  • Stay up-to-date (as much as possible) with the Services Gatekeeper patch set releases provided by Oracle.

    What is your current patch level? See "Finding the Current Patch Level of Your Services Gatekeeper System".

If the error message points to a configuration problem, check the configuration file for the associated module. If the solution requires reconfiguring the module, change the configuration and verify that the problem is resolved.

If you still cannot resolve the problem, contact Oracle as described in "Getting Help for Problems with Services Gatekeeper".

Finding the Current Patch Level of Your Services Gatekeeper System

Before contacting Oracle about an issue, find the current patch level of your Services Gatekeeper system. Oracle will ask you for this information.

With the current patch level at hand, Oracle can tell you whether your issue has been addressed in a later patch release. If it has, you can resolve the issue by upgrading Services Gatekeeper from its current level to that later patch level.

Listing What Is Currently Installed on Your Services Gatekeeper System

To list what is currently installed on your Services Gatekeeper system, use the lsinventory command from the OPatch utility.

OPatch is an Oracle-supplied utility that assists you with the process of applying interim patches to Oracle software. The lsinventory command lists the inventory for a particular Oracle home, or displays all installations that can be found.

Running the OPatch lsinventory Command

To run the lsinventory command and obtain information on the patches that are applied currently on your Services Gatekeeper system:

Note:

ORACLE_HOME must point to the installation on which OPatch is to operate.

  1. Go to the directory in which Services Gatekeeper is installed.

    cd <install_dir>
    
  2. Set the WebLogic environment:

    source ./wlserver/server/bin/setWLSEnv.sh
    
  3. Set ORACLE_HOME to the current directory:

    export ORACLE_HOME=$(pwd)
    
  4. Go to the subdirectory in which the OPatch utility resides:

    cd OPatch
    
  5. Enter the following command:

    ./opatch lsinventory -detail
    

For more information on the lsinventory command for OUI-based Oracle homes and OPatch, see Universal Installer and OPatch User's Guide on the Oracle Help Center website.

Example 14-1 shows sample output from the lsinventory command when the command is used without the -detail parameter:

Example 14-1 Sample Output from lsinventory

Oracle Interim Patch Installer version 13.2.0.0.0
Copyright (c) 2014, Oracle Corporation.  All rights reserved.
 
 
Oracle Home       : /home/username/oracle/ocsg_6.0_build_361
Central Inventory : /home/username/prog/oui_11.2.0.2.0
   from           : /home/username/oracle/ocsg_6.0_build_361/oraInst.loc
OPatch version    : 13.2.0.0.0
OUI version       : 13.2.0.0.0
Log file location : /home/username/oracle/ocsg_6.0_build_361/cfgtoollogs/opatch/opatch2014-10-30_11-42-22AM_1.log
 
 
OPatch detects the Middleware Home as "/home/username/oracle/ocsg_6.0_build_361"
 
Oct 30, 2014 11:42:28 AM oracle.sysman.oii.oiii.OiiiInstallAreaControl initAreaControl
INFO: Install area Control created with access level  0
Lsinventory Output file location : /home/username/oracle/ocsg_6.0_build_361/cfgtoollogs/opatch/lsinv/lsinventory2014-10-30_11-42-22AM.txt
 
--------------------------------------------------------------------------------
 
Interim patches (1) :
 
Patch  19836145     : applied on Thu Oct 30 11:25:24 CET 2014
Unique Patch ID:  1414504829609
Patch description:  "[Patch Set v6.0.0.1.7] - Patch bug for patch set XYZ"
   Created on 28 Oct 2014, 15:00:35 hrs PST8PDT
   Bugs fixed:
     123413, 123412
 
 
 
--------------------------------------------------------------------------------
 
OPatch succeeded.

Note that the patch description in the above output provides the patch set number (v6.0.0.1.7), which indicates that you received the seventh update release for the first patch release of the 6.0 major release of Services Gatekeeper. The output also lists the numbers of the issues that were fixed.
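
As a minimal sketch, the patch set number can be extracted from lsinventory output with standard tools. The function name patch_set_versions is hypothetical, and the sketch assumes that each patch description follows the "[Patch Set vN.N.N.N.N]" pattern shown in Example 14-1; descriptions in another format are silently skipped.

```shell
# Hypothetical helper: print each patch set version found in lsinventory
# output that is read from standard input. Assumes descriptions look like
#   Patch description:  "[Patch Set v6.0.0.1.7] - ..."
patch_set_versions() {
    sed -n 's/^Patch description:.*\[Patch Set \(v[0-9][0-9.]*\)].*/\1/p'
}

# Usage (from the OPatch directory):
#   ./opatch lsinventory | patch_set_versions
```

Reading from standard input keeps the helper usable both on live command output and on a saved lsinventory output file.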

Other Usages of the lsinventory Command

With the lsinventory command from OPatch, you can:

  • Group the inventory of all installed patches by the date they were installed in the Oracle home.

    ./opatch lsinventory -group_by_date
    
  • Redirect the output to a file, like any other command.

    ./opatch lsinventory > out.log
    
  • Redirect standard error (stderr) to the same file as standard output (stdout).

    ./opatch lsinventory > out.log 2>&1
    

Handling Performance Issues

Maintaining Services Gatekeeper performance levels is a complex task. If you find that your Services Gatekeeper system is not performing in an optimal manner, you may need to tune the underlying components to the requirements of your environment. For example:

  • WebLogic Server

    Tune the underlying WebLogic Server (WLS) to the requirements of your environment. For example, select the appropriate startup mode for your installation.

    For information about the default tuning values for WebLogic Server development and production modes, see Oracle Fusion Middleware Performance and Tuning for Oracle WebLogic Server.

  • Java Virtual Machine (JVM)

    How you tune your JVM affects the performance of WebLogic Server and your applications. For more information, see the discussion on tuning Java Virtual Machines (JVMs) in Oracle Fusion Middleware Performance and Tuning for Oracle WebLogic Server on the Oracle Help Center website.

  • Persistence type for storage services

    Check the caching technique you have implemented. Compare the available techniques and configure the one that best suits your requirements for storing and accessing data.

    For example, the write-through caching technique has performance implications when compared to the write-behind technique. This is because, for write-through, the data input/output operations to the cache and to the permanent storage location must both complete before a notification is sent to the host.

  • Latency

    Check the network latency and network performance between the application tier and the database tier. See Latency and Bandwidth Requirements for information on the requirements that Oracle recommends.

    The traffic between your application and your database could be a factor, especially in a multi-tiered environment.

As part of your discovery process on Services Gatekeeper performance, be sure to look at the log files that Services Gatekeeper provides.

Diagnosing Problems from Alarms

If Services Gatekeeper encounters a problem that it recognizes, it sends an EDR alarm to help you diagnose the problem. See "Managing and Configuring EDRs, CDRs and Alarms" for general information about alarms, and Alarms Handling Guide for details on the individual alarms organized by alarm number.

Using Error Logs to Troubleshoot Services Gatekeeper

If you are having a problem with Services Gatekeeper, look in the log files. Log files include errors that need to be managed, as well as errors that do not need immediate attention (for example, invalid logins).

To manage log files, make a list of the errors that are important for your system, and keep them separate from the errors that do not need immediate attention.

About Error Log Files

Services Gatekeeper maintains a default.log file that contains logs from the modules specific to it. The error log files provide detailed information about system problems.

Additionally, look at the entries in the WLS server log files.

Finding Error Log Files

The Services Gatekeeper-specific log file, default.log, is located at:

domain_root_dir/servers/server_name/trace

The log files for the servers are located at:

domain_root_dir/servers/server_name/logs

By default, domain_root_dir represents the directory in which the WebLogic Server domain is created, and server_name is the name of the server.
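
As a minimal sketch, the most recent entries in default.log can be displayed by building the path from the two values described above. The function name tail_default_log and the default of 50 lines are assumptions made for illustration.

```shell
# Hypothetical helper: print the last lines of default.log for one server.
# Arguments: domain root directory, server name, number of lines (default 50).
tail_default_log() {
    domain_root_dir=$1
    server_name=$2
    lines=${3:-50}
    tail -n "$lines" "$domain_root_dir/servers/$server_name/trace/default.log"
}

# Usage:
#   tail_default_log domain_root_dir server_name 100
```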

Resolving Clusters of Error Messages

An error often produces a cluster of error messages in the log file, and some errors generate cascading messages. To resolve the error, try to locate the first message in the series.
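
One way to find the first message in a cluster is to print the earliest line that matches an error marker, together with its line number. The following is a minimal sketch. The function name first_error and the default pattern ERROR are assumptions, so adjust the pattern to match the message format in your own logs.

```shell
# Hypothetical helper: print the first line in a log file that matches a
# pattern, prefixed with its line number. The default pattern is an
# assumption, not a documented Services Gatekeeper log format.
first_error() {
    log_file=$1
    pattern=${2:-ERROR}
    grep -n -- "$pattern" "$log_file" | head -n 1
}

# Usage:
#   first_error domain_root_dir/servers/server_name/trace/default.log
```

The line number in the output gives you a starting point for reading the messages that follow it in the same file.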

Changing Log Levels in Services Gatekeeper

An easy and persistent way to change the logging level is to edit the log4j configuration file under Domain_Home/log4j/log4jconfig.xml.

To obtain a complete log, change the priority value:

  1. Go to the directory where the log4jconfig.xml configuration file is located.

    By default, it is in the Services Gatekeeper domain at Domain_Home/log4j.

  2. Open the log4jconfig.xml configuration file in an appropriate text editor.

  3. Locate the priority value= entry.

  4. Set priority value to all, as shown below:

    <root>
            <priority value="all"/>
    </root>
    
  5. Save the file.
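
After you save the file, you may want to confirm the value that is now configured. The following is a minimal sketch. The function name show_log_priority is hypothetical, and the sketch assumes that the priority entry is written on a single line, as in the fragment in step 4.

```shell
# Hypothetical helper: print the value of each <priority value="..."/>
# entry found in a log4j XML configuration file.
show_log_priority() {
    sed -n 's/.*<priority value="\([^"]*\)".*/\1/p' "$1"
}

# Usage:
#   show_log_priority Domain_Home/log4j/log4jconfig.xml
```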

Collecting Log Data

Generally, server logs are important. Collect log information while the entries are fresh. If the log files are rotated, then eventually old logs will be overwritten by new ones.

Here is an example of how to collect Services Gatekeeper and WebLogic Server logs from a node. Copy and save the appropriate script to your Services Gatekeeper installation directory. Run the script from the same directory, repeating it for all nodes.

Use the script in Example 14-2 for Linux installations.

Example 14-2 Example of a Script to Collect Logs (Linux)

#!/bin/sh
# Linux version
 
#This will collect all log, out, configuration and recording files
ROOT=`pwd`
ARCHIVE_DIR=/tmp
MACHINE=`hostname`
echo $MACHINE
ARCHIVE_FILE=${ARCHIVE_DIR}/`date +%F_%H_%M_%S`
TMP_FILE_LIST=${ARCHIVE_DIR}/tarinput
 
find $ROOT | grep -e".*\.log[\.,0-9]*$" > $TMP_FILE_LIST
find $ROOT | grep -e".*\.jfr$" >> $TMP_FILE_LIST
find $ROOT | grep -e".*\.xml$" >> $TMP_FILE_LIST
find $ROOT | grep -e".*\.out$" >> $TMP_FILE_LIST
tar cvf ${ARCHIVE_FILE}_$MACHINE.tar -T $TMP_FILE_LIST
gzip ${ARCHIVE_FILE}_$MACHINE.tar
echo "Created archive ${ARCHIVE_FILE}_$MACHINE.tar.gz"
 

Use the script in Example 14-3 for Solaris installations.

Example 14-3 Example of a Script to Collect Logs (Solaris)

#!/bin/sh
# Solaris version
 
#This will collect all log, out, configuration and recording files
ROOT=`pwd`
ARCHIVE_DIR=/tmp
MACHINE=`hostname`
echo $MACHINE
ARCHIVE_FILE=${ARCHIVE_DIR}/`date +%F_%H_%M_%S`
TMP_FILE_LIST=${ARCHIVE_DIR}/tarinput
 
find $ROOT | grep ".*\.log[\.,0-9]*$" > $TMP_FILE_LIST
find $ROOT | grep ".*\.jfr$" >> $TMP_FILE_LIST
find $ROOT | grep ".*\.xml$" >> $TMP_FILE_LIST
find $ROOT | grep ".*\.out$" >> $TMP_FILE_LIST
tar cvf ${ARCHIVE_FILE}_$MACHINE.tar -I $TMP_FILE_LIST
gzip ${ARCHIVE_FILE}_$MACHINE.tar
echo "Created archive ${ARCHIVE_FILE}_$MACHINE.tar.gz"

Diagnosing Some Common Problems with Services Gatekeeper

This section describes some of the common problems you may encounter in Services Gatekeeper. It shows you how to diagnose the error messages and resolve the following issues.

Problem: The Server Will Not Start

The Services Gatekeeper server startup scripts work best with the Bash shell. One of the server startup scripts may fail with an error like this one:

./dbController.sh: 3: ./dbController.sh: Syntax Error: "(" unexpected

If this happens, edit the script, replacing the #!/bin/sh shebang with #!/bin/bash.

Problem: The Server is Hanging

A server (or node) may hang for more than one reason.

When you find that a server is hanging, regardless of the actual cause, it is always a good idea to capture a thread dump while the node is hanging.

Note:

To identify slow-moving threads, be sure to take two thread dumps thirty (30) seconds apart.

If you capture the thread dump before you restart the node, you may find it easier to understand why the node hung. To store the thread dump in a log file, either use Node Manager or start the nodes so that standard out and standard error are forwarded to a file:

  • Use the Node Manager log file

    Node Manager is a WebLogic Server utility that enables you to start, shut down, and restart Administration Server and Managed Server instances from a remote location. Although Node Manager is optional, it is recommended if your WebLogic Server environment hosts applications with high availability requirements.

    For more information, see the discussion on Log Files in Oracle Fusion Middleware Node Manager Administrator's Guide for Oracle WebLogic Server.

  • Start the servers so that standard out and standard error are forwarded to a file. Run the starting script in the following way:

    startScript.sh > out.log 2>&1
    

If you have more than one thread dump, the results can be correlated to see if the states of the threads change. Example 14-4 shows how you can find the server process using the ps command and then take two thread dumps.

Example 14-4 Obtaining Two Thread Dumps

# List the WebLogic Server processes to find the process ID (pid)
ps -ef | grep -e".*ocsg.*weblogic\.Server$"
 
# Oracle HotSpot Virtual Machine: print the threads using jcmd
jcmd <pid> Thread.print > thread-dump-1.log
# Wait for 30 seconds and do another dump, to a second file so that
# the first dump is not overwritten
jcmd <pid> Thread.print > thread-dump-2.log
 
# Any JVM (the output goes to the standard output or standard error of the JVM)
kill -QUIT <pid from ps output>
# Wait for 30 seconds and do another dump
kill -QUIT <pid from ps output>
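
The jcmd steps in Example 14-4 can also be wrapped in a small function so that the interval between dumps is consistent. The following is a minimal sketch, not a supported tool. The function name take_thread_dumps and the output file names are hypothetical, and the sketch assumes that jcmd from the JDK is on the PATH and that you run it as the user that owns the server process.

```shell
# Hypothetical helper: take a series of thread dumps of one JVM.
# Arguments: process ID, number of dumps (default 2),
#            seconds between dumps (default 30).
take_thread_dumps() {
    pid=$1
    count=${2:-2}
    interval=${3:-30}
    i=1
    while [ "$i" -le "$count" ]; do
        # Write each dump to its own file so that later dumps do not
        # overwrite earlier ones.
        jcmd "$pid" Thread.print > "thread-dump-$i.log"
        # Wait between dumps, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Usage, with the pid taken from the ps output:
#   take_thread_dumps 12345
```

Keeping each dump in a separate numbered file makes it straightforward to compare the thread states between dumps.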

Problem: Memory Issues

Garbage collection (GC) could result in long pauses that might affect performance.

To look for long GC pauses that might affect performance, add the -verbose:gc flag to your start script (setDomainEnv.sh for WebLogic-based servers). Example 14-5 shows the output seen when an example server is running with this flag.

Example 14-5 Example Garbage Collection Entries

[GC 307767K->235359K(375296K), 0.0803370 secs]
[GC 311327K->235207K(377024K), 0.0777140 secs]
[GC 313671K->216031K(344512K), 0.0520790 secs]
[GC 294495K->218928K(376448K), 0.0493060 secs]
[GC 295472K->218713K(341952K), 0.0441110 secs]

You can use the output to monitor the GC pauses while running traffic.
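
To pick out the longer pauses from a large log, you can filter the entries by pause time. The following is a minimal sketch. The function name long_gc_pauses and the default threshold of 0.5 seconds are assumptions chosen for illustration, and the sketch assumes that entries have the format shown in Example 14-5, with the pause time immediately before "secs]".

```shell
# Hypothetical helper: print -verbose:gc entries whose pause time is at
# least a given number of seconds (default 0.5, an arbitrary example).
# Reads the GC output from standard input.
long_gc_pauses() {
    threshold=${1:-0.5}
    awk -v limit="$threshold" '
        / secs\]/ {
            # The pause time is the second-to-last field on the line.
            if ($(NF - 1) + 0 >= limit + 0) print
        }'
}

# Usage:
#   long_gc_pauses 0.08 < out.log
```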

Getting Help for Problems with Services Gatekeeper

If you cannot resolve your problems with Services Gatekeeper, contact Oracle.

Before You Contact Oracle

Problems can often be fixed simply by shutting down Services Gatekeeper and restarting the computer that the Services Gatekeeper system runs on. See "Starting, Stopping, and Administering Servers" in Services Gatekeeper System Administrator's Guide.

Note:

Oracle will ask you for the relevant log files and thread dumps to troubleshoot an issue.

Therefore, before you shut down Services Gatekeeper, be sure to obtain the relevant log files and thread dumps associated with the issue.

If that does not solve the problem, the first troubleshooting step is to look at the error log for the application or process that reported the problem. See "Using Error Logs to Troubleshoot Services Gatekeeper". Be sure to review "General Checklist for Resolving Problems with Services Gatekeeper" before reporting the problem to Oracle.

Reporting Problems

If "General Checklist for Resolving Problems with Services Gatekeeper" does not help you resolve the problem, write down the pertinent information:

  • A clear and concise description of the problem, including when it began to occur.

  • Relevant configuration files.

  • Recent changes in your system, even if you do not think they are relevant.

  • List of all Services Gatekeeper components and patches installed on your system.

When you are ready, report the problem to Oracle.