7 Troubleshooting

This chapter explains the important processes on each of the server components in Convergent Charging Controller, and describes a number of example troubleshooting methods that can help you diagnose problems before you raise a support ticket.

Common Troubleshooting Procedures

To troubleshoot the product, first you must identify the system which is responsible for the service that needs troubleshooting.

As explained in the Product System Architecture section, there are three main server components in the Convergent Charging Controller:

  • Service Logic Controller (SLC)

The SLC is responsible for most real-time service processing (for example, voice/SMS/data). Call handling issues are likely to require troubleshooting on the SLC.

  • Service Management System (SMS)

The SMS is responsible for provisioning, data warehousing and replication. Issues specific to certain subscribers, coinciding with important changes to rating, concerning EDRs or with external provisioning (via the Provisioning Interface (PI)) require troubleshooting on the SMS.

  • Voucher and Wallet Server (VWS)

The VWS is responsible for voucher redemption and call rating (this includes balance management and promotions tracking). Issues concerning subscribers’ balances, top-ups and vouchers are likely to require troubleshooting on the VWS.

Important notice

Please note that Convergent Charging Controller packages are complete versions and were tested as such.

If you have any questions or problems, please contact Oracle.

General tools

The following information is not specific to any particular type of node, and can be helpful when investigating any problem situation.

The process list shown by the pslist command (see the examples below) is built from inittab, and any defined processes that are not running are highlighted. If a SLEE is present, its configuration is parsed and the SLEE processes are included in the list.

Process status

There are a few basic checks that can be run on any of the machines, which are provided as part of the supportScp (SLC/VWS) or supportSms (SMS) packages. These give you a quick look at what processes are running.

Example - pslist

This example shows the pslist command used with no parameters.

Command:

$ pslist

Result:


------------------------ Thu Oct 24 04:56:53 GMT 2010 -------------------------- 
C APP  USER       PID PPID    STIME COMMAND 
1 ACS  acs_oper  1004    1   04-Oct N/service_packages/ACS/bin/acsCompilerDaemon 
1 ACS  acs_oper  1008    1   04-Oct /service_packages/ACS/bin/acsProfileCompiler 
1 ACS  acs_oper 13833    1 00:12:38 ice_packages/ACS/bin/acsStatisticsDBInserter 
1 OSD  acs_oper  1047    1   04-Oct /service_packages/OSD/bin/osdWsdlRegenerator 
1 CCS  ccs_oper  1011    1   04-Oct /IN/service_packages/CCS/bin/ccsCDRLoader 
1 CCS  ccs_oper  1033    1   04-Oct service_packages/CCS/bin/ccsCDRFileGenerator 
1 CCS  ccs_oper 11411    1   13-Oct /IN/service_packages/CCS/bin/ccsBeOrb 
2 CCS  ccs_oper  1406 1043   04-Oct IN/service_packages/CCS/bin/ccsProfileDaemon 
1 CCS  ccs_oper  9413    1   04-Oct /IN/service_packages/CCS/bin/ccsChangeDaemon 
1 EFM  smf_oper   995    1   04-Oct /IN/service_packages/EFM/bin/smsAlarmManager 
1 PI   smf_oper  1080    1   04-Oct /IN/service_packages/PI/bin/PImanager 
6 PI   smf_oper  1319 1080   04-Oct PIprocess 
1 PI   smf_oper  9186 1080   04-Oct PIbeClient 
2 SMS  smf_oper  6173    1   21-Oct /IN/service_packages/SMS/bin/smsMaster 
1 SMS  smf_oper   941    1   04-Oct /IN/service_packages/SMS/bin/smsAlarmRelay 
1 SMS  smf_oper   943    1   04-Oct /IN/service_packages/SMS/bin/smsNamingServer 
1 SMS  smf_oper   944    1   04-Oct IN/service_packages/SMS/bin/smsReportsDaemon 
1 SMS  smf_oper   946    1   04-Oct /service_packages/SMS/bin/smsReportScheduler 
1 SMS  smf_oper   947    1   04-Oct /IN/service_packages/SMS/bin/smsAlarmDaemon 
1 SMS  smf_oper   948    1   04-Oct N/service_packages/SMS/bin/smsStatsThreshold 
1 SMS  smf_oper   949    1   04-Oct /IN/service_packages/SMS/bin/smsTaskAgent 
1 SMS  smf_oper   969    1   04-Oct /IN/service_packages/SMS/bin/smsTrigDaemon 
2 SMS  smf_oper   979    1   04-Oct /IN/service_packages/SMS/bin/smsConfigDaemon 
1 SMS  smf_oper   980    1   04-Oct N/service_packages/SMS/bin/smsStatsDaemonRep 
total processes found = 32 [ 32 expected ] 
================================= run-level 3 ================================== 
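
The summary line at the end of the output lends itself to a scripted check. The following is a minimal sketch, assuming the summary line keeps the format shown in the example above; the function name pslist_shortfall is hypothetical and is not part of the support packages.

```sh
# Hypothetical helper: read pslist output on stdin and print how many
# expected processes are missing, based on the summary line
# "total processes found = N [ M expected ]".
pslist_shortfall() {
    awk '/total processes found/ {
             found = $5
             expected = $7
             print expected - found
             seen = 1
         }
         END { if (!seen) exit 2 }'
}
```

For example, `pslist | pslist_shortfall` prints 0 when every expected process is running, and the function returns a non-zero status if no summary line is found.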

Example - pslist -d

This example shows the pslist command used with the -d parameter. From time to time, processes will be added to or removed from inittab/SLEE. The -d parameter instructs pslist to reconstruct the list.

Command:

$ pslist -d

Result:

Scanning input file. 
[ /etc/inittab ] 
Scanning input file. 
[ /IN/service_packages/SLEE/etc/SLEE.cfg ] 
Info: Did not find SLEE config file [ /IN/service_packages/SLEE/etc/SLEE.cfg ] 
Does the SLEE application exist on this machine? 
<---- 
############################################################################ 
# pslist: default process list configuration (plc) file used to match and  # 
# display running processes.                                               # 
# File creation time: Thu Nov 13 04:19:29 GMT 2008                         # 
# Lines beginning with a hash (#) character are ignored.                   # 
# $1="grouped-apps name (max 5-char)" $2="regex of process" [$3+=comments] # 
############################################################################ 
ACS   acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsCompilerDaemon            inittab  
ACS   acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsProfileCompiler           inittab  
ACS   acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsStatisticsDBInserter      inittab  
CCS   ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsBeOrb                     inittab  
CCS   ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsCDRFileGenerator          inittab  
CCS   ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsCDRLoader                 inittab  
CCS   ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsChangeDaemon              inittab  
CCS   ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsProfileDaemon             inittab  
EFM   smf_oper.*\/IN\/service_packages\/EFM\/bin\/smsAlarmManager              inittab  
OSD   acs_oper.*\/IN\/service_packages\/OSD\/bin\/osdWsdlRegenerator           inittab  
PI    smf_oper.*PIbeClient                                                     inittab: PI Manager child process 
PI    smf_oper.*PIprocess                                                      inittab: PI Manager child process 
PI    smf_oper.*\/IN\/service_packages\/PI\/bin\/PImanager                     inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsAlarmDaemon               inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsConfigDaemon              inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsMaster                    inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsNamingServer              inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsReportScheduler           inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsReportsDaemon             inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsStatsDaemonRep            inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsStatsThreshold            inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsTaskAgent                 inittab  
SMS   smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsTrigDaemon                inittab  
----> 
Default process list configuration file created. 
[ /IN/service_packages/SMS/tmp/ps_processes.testusms.plc ]  
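
The header of the generated file describes its layout: the first field is the application group, the second is a regular expression for the process, and anything after that is a comment. As an illustration of how such a file can be matched against a process listing, here is a rough sketch. The function name plc_missing and the idea of saving `ps -ef` output to a file are assumptions made for this example; pslist itself may work differently.

```sh
# Hypothetical helper: list plc entries that match no line of a saved
# process listing. $1 = plc file, $2 = file holding "ps -ef" output.
plc_missing() {
    plc_file=$1
    ps_file=$2
    # Skip comment and blank lines; keep the app name and the regex.
    grep -v -e '^#' -e '^[[:space:]]*$' "$plc_file" |
    while read -r app regex _comment; do
        # The plc file escapes slashes as \/ ; unescape them for grep -E.
        pattern=$(printf '%s\n' "$regex" | sed 's#\\/#/#g')
        if ! grep -E -q -- "$pattern" "$ps_file"; then
            printf '%s %s\n' "$app" "$pattern"
        fi
    done
}
```

Each line of output names an application group and the pattern for which no running process was found.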

Process configuration

Configuration for Convergent Charging Controller products and processes is made almost exclusively in the file /IN/service_packages/eserv.config.

The file is broken down into sections and subsections, grouped together by {} brackets. Each product comes with an example eserv.config inside its <Product>/etc directory, and each configuration option is documented in the associated Technical Guide.
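
Because sections are delimited by {} brackets, an unbalanced bracket is an easy mistake to make when editing the file by hand. The sketch below is one way to get a quick hint; it is not a product tool, the function name check_braces is hypothetical, and it counts every bracket character without understanding comments or quoted strings.

```sh
# Hypothetical helper: report whether the { and } characters in a
# configuration file ($1) balance. Comments and quoted strings are not
# parsed, so treat the result as a hint only.
check_braces() {
    awk '{
             n = length($0)
             for (i = 1; i <= n; i++) {
                 c = substr($0, i, 1)
                 if (c == "{") depth++
                 else if (c == "}") {
                     depth--
                     if (depth < 0 && !bad) bad = NR
                 }
             }
         }
         END {
             if (bad)        { print "unexpected } on line " bad; exit 1 }
             if (depth != 0) { print depth " unclosed { at end of file"; exit 1 }
             print "braces balanced"
         }' "$1"
}
```

For example, `check_braces /IN/service_packages/eserv.config` could be run before restarting a process that reads the file.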

There are some exceptions, notably ACS and SLEE, which have some separate configuration files in /IN/service_packages/ACS/etc/acs.conf and /IN/service_packages/SLEE/etc/SLEE.cfg respectively.

Some NCA interface configuration is also housed in a separate file; for example, for SIGTRAN interfaces (sua_if/m3ua_if) the configuration is often specified in /IN/service_packages/SLEE/etc/sigtran.config or <interface>_service.config.

Note: Processes also have command-line arguments, which are passed in the calling shell script, normally named /IN/service_packages/<Product>/bin/<ProcessName>Startup.sh.

Remote Diagnostic Agent

Remote Diagnostic Agent (RDA) is an Oracle cross-product diagnostic tool used to help Oracle engineers in troubleshooting and analyzing issues.

RDA supports Oracle Communications Convergent Charging Controller.

For a more general usage guide of the Remote Diagnostic Agent tool, please refer to the references included in the following sections.

Installing RDA

To install RDA, please review My Oracle Support Note 314422.1.

For consistency across all platforms and Convergent Charging Controller nodes, upload the RDA package to the /IN/service_packages/SUPPORT/ directory and proceed with the installation from this location.

To install RDA on your Convergent Charging Controller nodes:

  1. Navigate to the directory where you downloaded the RDA package. For example, /IN/service_packages/SUPPORT/
  2. Uncompress the RDA file as the smf_oper user. This creates a subfolder named rda in the current folder, containing all the files needed to run RDA.

    Note:

    Due to restrictive security policies, RDA should not be installed or run as the root user. The smf_oper user should have all the access and permissions it needs.

Configuring RDA

To set up the RDA profile and activate the Convergent Charging Controller module, navigate to the RDA directory and use the following command:

smf_oper@server$ ./rda.sh -vdSp Com_NCC

The tool prompts you with a few questions about your environment. Most default answers should be sufficient. However, you must select the Yes option to collect information from your Oracle database. Review each prompt to ensure that the responses are appropriate for your environment.

Additionally:

  • The prompt about ADDM, AWR, and ASH is needed because use of these features is restricted for licensing reasons.
  • The system user should not connect as sysdba if the smf_oper user in your environment does not have sysdba permissions for your Convergent Charging Controller database.
  • Configuration, except passwords, is stored by the tool for future runs.

Example RDA Output

Here is an example RDA output:


bash-4.1$ ./rda.sh -vdSp Com_NCC 
Setting up ...
-------------------------------------------------------------------------------
S000INI: Initializes the Data Collection
-------------------------------------------------------------------------------
RDA uses the output file prefix to identify all files belonging to the same
data collection. The prefix must start with a letter and must contain only
alphanumeric characters.
Enter the prefix to be used for all the generated files
Hit 'Return' to accept the default (RDA)
>
Enter the directory used for all the files to be generated
Hit 'Return' to accept the default (/IN/service_packages/SUPPORT/rda/output)
>
Do you want to keep report packages from previous runs (Y/N)?
Hit 'Return' to accept the default (N)
>
Enter the Oracle home to be used for data analysis
Hit 'Return' to accept the default (/u01/app/oracle/product/12.1.0)
>
Enter the network domain name for this server
Hit 'Return' to accept the default (us.oracle.com)
>
-------------------------------------------------------------------------------
S010CFG: Collects Key Configuration Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S090OCM: Set up the Configuration Manager Interface
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S909RDSP: Produces the Remote Data Collection Reports
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S919LOAD: Produces the External Collection Reports
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S999END: Finalizes the Data Collection
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S100OS: Collects the Operating System Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S105PROF: Collects the User Profile
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S110PERF: Collects Performance Information
-------------------------------------------------------------------------------
Can ADDM, AWR, and ASH be used (Y/N)?
Hit 'Return' to accept the default (Y)
>
-------------------------------------------------------------------------------
S120NET: Collects Network Information
-------------------------------------------------------------------------------
Do you want RDA to perform the network ping tests (Y/N)?
Hit 'Return' to accept the default (N)
>
-------------------------------------------------------------------------------
S122ONET: Collects Oracle Net Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S200DB: Controls Oracle RDBMS Data Collection
-------------------------------------------------------------------------------
Is the database associated to the current Oracle home (Y/N)?
Hit 'Return' to accept the default (Y)
>
Enter the Oracle SID to be analyzed
Hit 'Return' to accept the default (SMF)
>
Is the INIT.ORA for the database to be analyzed located on this system? (Y/N)
Hit 'Return' to accept the default (Y)
>
Enter the location of the spfile or the INIT.ORA (including the directory and
file name)
Hit 'Return' to accept the default
(/u01/app/oracle/product/12.1.0/dbs/initSMF.ora)
Enter an Oracle User ID (userid only) to view DBA_ and V$ tables. If RDA will
be run under the Oracle software owner's ID, enter a forward slash (/) here,
and enter Y at the SYSDBA prompt to avoid a prompt for the database password
at runtime.
Hit 'Return' to accept the default (system)
>
Is 'system' a SYSDBA user (will connect as SYSDBA) (Y/N)?
Hit 'Return' to accept the default (N)
>
-------------------------------------------------------------------------------
S201DBA: Collects Oracle RDBMS Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S204LOG: Collects Oracle Database Trace and Log Files
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S491NCC: Collects Network Charging and Control Information
-------------------------------------------------------------------------------
Enter the full path of the Network Charging and Control home directory
Hit 'Return' to accept the default (/IN/service_packages)
>
WARNING: RDBMS information is collected from Oracle Database only.
Do you want to collect application information from an Oracle Database (Y/N)?
Hit 'Return' to accept the default (N)
> Y
Enter the Oracle SID of the database
Hit 'Return' to accept the default (SMF)
>
Enter an Oracle User ID (userid only) to view application specific database
information
Hit 'Return' to accept the default (smf)
> smf
-------------------------------------------------------------------------------
S990FLTR: Controls Report Content Filtering
-------------------------------------------------------------------------------
Updating the setup file ...

Collecting Data

Use the following recommended flags (or adapt them according to your needs):

smf_oper@server$ ./rda.sh -vfCRP

where server is the Convergent Charging Controller node where RDA runs and the flags are defined as follows:

  • v: verbose
  • f: force execution of all commands
  • C: collect
  • R: render into HTML
  • P: package contents of output directory into an archive

The script will request the system password and may request a user with the statspack tool installed based on your selections during the configuration.

RDA generates multiple files in the output/ folder, along with a zip archive containing all of the output files. The script may take several minutes to complete. For submission, download only the zip file from the output/ folder on the server where RDA was run.

Note: Subsequent RDA script execution overwrites the previous reports.
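
Because each run overwrites the previous reports, it can be useful to copy an existing archive elsewhere before running RDA again. The following is a minimal sketch of that idea, not part of RDA itself; the function name preserve_rda_archives and the backup location are assumptions chosen for illustration.

```sh
# Hypothetical helper: copy RDA zip archives from the output directory into
# a timestamped subdirectory of a backup location before the next run.
# $1 = RDA output directory, $2 = backup root (both chosen by the caller).
preserve_rda_archives() {
    out_dir=$1
    backup_root=$2
    stamp=$(date +%Y%m%d%H%M%S)
    dest="$backup_root/rda-$stamp"
    mkdir -p "$dest" || return 1
    for f in "$out_dir"/*.zip; do
        # The glob stays unexpanded when no zip exists, so test each name.
        if [ -f "$f" ]; then
            cp -p "$f" "$dest"/ || return 1
        fi
    done
    # Print the destination so the caller can record it.
    printf '%s\n' "$dest"
}
```

For example, `preserve_rda_archives /IN/service_packages/SUPPORT/rda/output /var/tmp` keeps a copy under /var/tmp; the backup root here is only an example.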

Using Output Immediately

After the archive file is uploaded to Oracle Support, the data is post-processed. Post-processing does not add, remove, or modify the data; it only organizes it and applies some formatting. Oracle recommends uploading RDA output files for post-processing. However, it is possible to unzip the file on any computer and browse the files directly.

To view the RDA output before sending the data to Oracle Support, unzip the archive completely and double-click the file named RDA__start.htm. This opens the RDA web interface in your default web browser.

Attaching the ZIP Archive to a Service Request

Upload the generated zip file to a previously opened Service Request in My Oracle Support.

cmnPushFiles/cmnReceiveFiles

cmnPushFiles is responsible for monitoring a location on the SLC/VWS for new files, and will "push" the files to the SMS.

cmnPushFiles is called from inittab and runs in run-level 3, generally as multiple instances.

Each instance monitors the EDRs of a certain product or process (for example, MM EDRs created by xmsTrigger, or ACS EDRs created by slee_acs). However, it can also be used to push expiry messages or notifications between machines.

In order for cmnPushFiles to successfully "push" files to the SMS, the network service cmnReceiveFiles must be configured on the SMS in /etc/inetd.conf and /etc/services.

cmnPushFiles is crucial to the EDR processing chain. If it is not running or is configured incorrectly, files build up on the SLC/VWS until the system runs out of disk space.
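
One simple way to spot such a build-up is to count the files waiting in a monitored directory and watch whether the number keeps growing. This is a rough sketch only; the function name pending_file_count is hypothetical, and the count includes files in subdirectories.

```sh
# Hypothetical helper: print the number of regular files waiting under a
# directory that cmnPushFiles monitors ($1), including subdirectories.
pending_file_count() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example, `pending_file_count /IN/service_packages/E2BE/logs/CDR-out` run a few minutes apart shows whether files are being pushed or are accumulating.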

Example - PushFiles

Consider this sample output from a VWS:

$ ps -ef | grep Push 
ebe_oper 12479 … cmnPushFiles -d /IN/service_packages/E2BE/logs/CDR-out -r /IN/service_packages/ 
ccs_oper 12519 … cmnPushFiles -d /IN/service_packages/CCS/logs/expiryMessage/ -r /IN/service_pac 
ccs_oper 12480 … cmnPushFiles -d /IN/service_packages/CCS/logs/wallet -r /IN/service_packages/CC 
ccs_oper 12482 … cmnPushFiles -d /IN/service_packages/CCS/logs/ccsNotificationWrite/ -r /IN/serv 

The command response shows there are four instances of cmnPushFiles running.

The arguments passed to a process usually show what it is responsible for:

$ pargs 12479 
12479:  cmnPushFiles -d /IN/service_packages/E2BE/logs/CDR-out -r /IN/service_packages/ 
argv[0]: cmnPushFiles 
argv[1]: -d 
argv[2]: /IN/service_packages/E2BE/logs/CDR-out 
argv[3]: -r 
argv[4]: /IN/service_packages/CCS/logs/CDR-in 
argv[5]: -h 
argv[6]: usms.CdrPush 
argv[7]: -F 

Here we see this cmnPushFiles is taking completed EDRs from CDR-out on the VWS and sending them to CDR-in on the SMS.
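
When several instances are running, picking the directories out of the pargs output by eye becomes tedious. The sketch below extracts the value that follows a given option, assuming the `argv[N]: value` layout shown above; the function name pargs_option is hypothetical.

```sh
# Hypothetical helper: read pargs output on stdin and print the value that
# follows a given option ($1, for example -d, -r or -h).
pargs_option() {
    awk -v opt="$1" '
        /^argv\[[0-9]+\]: / {
            value = $0
            sub(/^argv\[[0-9]+\]: /, "", value)
            sub(/[ \t]+$/, "", value)
            if (want) { print value; exit }
            if (value == opt) want = 1
        }'
}
```

For example, `pargs 12479 | pargs_option -d` prints the directory being monitored, and `pargs 12479 | pargs_option -r` prints the destination directory.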

Space issues

If the cmnPushFiles log file (/IN/service_packages/E2BE/tmp/cmnPushFiles) or the syslog reports insufficient space, the first step in diagnosing the problem is to check the available space in CDR-out on the VWS and CDR-in on the SMS.
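
As a sketch of that first check, the helper below prints how full the file system behind a directory is, using the portable `df -P` output format. The function name fs_used_percent is hypothetical, and the sketch assumes the file system name contains no spaces.

```sh
# Hypothetical helper: print the percentage of space used on the file
# system that holds directory $1, taken from the Capacity column of df -P.
fs_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, run `fs_used_percent /IN/service_packages/E2BE/logs/CDR-out` on the VWS and `fs_used_percent /IN/service_packages/CCS/logs/CDR-in` on the SMS.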

Core files

When monitoring a platform, or investigating issues, it is important to check for core files.

Processes running from inittab will be automatically restarted by Solaris, and processes running inside the SLEE will be restarted by the watchdog if they stop running.

If a process cores due to a recurring traffic scenario, it will be restarted and continue to core until the mount point runs out of disk space.

Core file location

The location of core files differs depending on configuration, and how the process was started.

The first thing to check is the output of coreadm, which specifies how the operating system will handle core files.

Multiple core locations

In this example, core files are written to the directory the process was started from (in the case of SLEE processes, this will be /IN/service_packages/SLEE/bin), and are named simply core. In this situation, the majority of /IN/service_packages will need to be checked for core files.

$ coreadm                                                                                                       
global core file pattern:  
init core file pattern: core 
global core dumps: disabled 
per-process core dumps: enabled 
global setid core dumps: disabled 
per-process setid core dumps: disabled 
global core dump logging: disabled 
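
With this configuration, a search of the whole tree is the practical way to locate core files. A minimal sketch follows; the function name find_core_files is hypothetical.

```sh
# Hypothetical helper: list regular files named exactly "core" under a
# directory tree ($1). Unreadable directories are skipped silently.
find_core_files() {
    find "$1" -type f -name core 2>/dev/null
}
```

For example, `find_core_files /IN/service_packages` prints one path per core file found.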

Single core location

However, if configured as in this example, all core files will be written to one central location (often on a separate mount point). In this situation, only one directory/mount needs to be checked.

This can also reduce the risk of an important mount point getting filled up with core files.

$ coreadm 
global core file pattern: /var/crash/core-%n-%p-%f 
global core file content: default 
init core file pattern: core 
init core file content: default 
global core dumps: enabled 
per-process core dumps: disabled 
global setid core dumps: enabled 
per-process setid core dumps: disabled 
global core dump logging: enabled 
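
The central directory can be read straight from the coreadm output. The sketch below assumes the output layout shown in the two examples above; the function name global_core_dir is hypothetical.

```sh
# Hypothetical helper: read coreadm output on stdin and print the directory
# part of the global core file pattern. Prints nothing when no absolute
# global pattern is set.
global_core_dir() {
    awk -F': *' '/^global core file pattern:/ && $2 ~ "^/" {
        path = $2
        sub(/[ \t]+$/, "", path)
        sub("/[^/]*$", "", path)
        print path
    }'
}
```

For example, `coreadm | global_core_dir` prints /var/crash for the configuration shown above.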

Diagnostic information

Processes that core can be a risk to the platform for many reasons, and should be dealt with as quickly as possible.

In general they indicate a software fault that will require investigation by Oracle Engineering, so it is important to collect the following diagnostic information:

Gdb backtrace

In order for Oracle Engineering to investigate a core file, the most important piece of information (apart from the core itself) is the gdb backtrace.

Follow these steps to collect the backtrace.
  1. If the process that created the core cannot be determined from the file name, use the file command.
    $ file core 
    core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, 
    from '/IN/service_packages/ACS/bin/slee_acs'
  2. Open the core using gdb, with the original binary and the core file as arguments.

    Note:

    The exact binaries and libraries that generated the core file are required. If the product version has changed, it is unlikely gdb will be able to interpret the core correctly.

    $ gdb /IN/service_packages/ACS/bin/slee_acs core 
    GNU gdb (Red Hat Enterprise Linux) 14.2-3.0.1.el9 
    Copyright (C) 2023 Free Software Foundation, Inc. 
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
    This is free software: you are free to change and redistribute it. 
    There is NO WARRANTY, to the extent permitted by law. 
    Type "show copying" and "show warranty" for details. 
    This GDB was configured as "x86_64-redhat-linux-gnu". 
    Type "show configuration" for configuration details. 
    For bug reporting instructions, please see: 
    <https://www.gnu.org/software/gdb/bugs/>. 
    Find the GDB manual and other documentation resources online at: 
    <http://www.gnu.org/software/gdb/documentation/>. 
    For help, type "help". 
    Type "apropos word" to search for commands related to "word"... 
    Reading symbols from slee_acs... 
    warning: Can't open file /usr/lib64/libgcc_s-11-20231218.so.1 during file-backed 
    mapping note processing 
    [New LWP 473485] 
    warning: Build-id of /lib64/libstdc++.so.6 does not match core file. 
    [Thread debugging using libthread_db enabled] 
    Using host libthread_db library "/lib64/libthread_db.so.1". 
    Core was generated by `/IN/service_packages/ACS/bin/slee_acs'

    Result: Eventually you will be presented with the most recent frame of the core, the signal which ended the process, and a (gdb) prompt.

    Program terminated with signal 10, Bus error. 
    #0  0xfe2d6328 in _smalloc () from /lib/libc.so.1 
    (gdb) 
  3. To view all frames in the core, initiate a summary backtrace by typing bt at the prompt (see Example summary backtrace).
    (gdb) bt 
  4. To view all frames and all their information in the core, initiate a full backtrace by typing bt full at the prompt (see Example full backtrace).
    (gdb) bt full 

    Note:

    This information will need to be provided to Oracle Support for further investigation.
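
The same two backtraces can also be captured to a file without an interactive session, which makes them easier to attach to a service request. This is a sketch using gdb's standard -batch and -ex options; the function name collect_backtrace, the GDB override variable, and the output file name are assumptions made for this example.

```sh
# Hypothetical helper: write a summary and a full backtrace to a text file.
# $1 = binary that produced the core, $2 = core file, $3 = output file.
# GDB may be set to choose a different gdb executable (an assumption of
# this sketch, not a product setting).
collect_backtrace() {
    "${GDB:-gdb}" -batch -ex "bt" -ex "bt full" "$1" "$2" > "$3" 2>&1
}
```

For example, `collect_backtrace /IN/service_packages/ACS/bin/slee_acs core /tmp/slee_acs_bt.txt` writes both backtraces to /tmp/slee_acs_bt.txt. If gdb itself fails partway through, the file still contains whatever was printed up to that point.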

Example summary backtrace

Typing bt at the prompt initiates a summary backtrace, which shows all frames in the core:

(gdb) bt 
#0  0xfe2d6328 in _smalloc () from /lib/libc.so.1 
#1  0xfe2d639c in malloc () from /lib/libc.so.1 
#2  0xfef63450 in operator new (sz=4) at new_op.cc:48 
#3  0xfd318fa4 in cmn::escher::Array::push_back (this=0x18c1ce8, val=@0xffbfd030) at 
cmnEscherEntry.hh:229 
#4  0xfdc55248 in ccs::Message::CDR::appendFromString (this=0xffbfd0d8, fields= 
{static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 
0x18c08c4 "CLI=10101010101|ACS_CUST_ID=12|PC_AC=1|PC_PRC=1|TZ=NZ|PC_SCD=D07"}, static 
_S_empty_rep_storage = {0, 0, 0, 0}}) at /volB/DEV_BASE/nondebug/CCS/include/ccsMessage.hh:1581 
#5  0xfdd030ac in fox::ExtendedWalletUpdate::doAction (this=0x1a13cb0, request=@0x1, 
responseRequired=@0xffbfeb40, actionResponse=0x29c00,
context=@0xffbfd0e0, serviceContext=@0x19eb4b8, parms=@0xffbfd2b0) at /opt/gcc
3.2.3/include/c++/3.2.3/bits/stl_alloc.h:664 
#6  0xfdc47fac in fox::FOXActionHandler::doAction (this=0x1a13cb0, request=@0xffbfeb40, 
responseRequired=@0xffbfd21f, actionResponse=0x1a8c890, 
context=@0x19eb4b8, parms=@0xffbfd2b0) at FOXActionHandler.cc:1891 
#7  0xff283128 in acsActionsAPI::ActionHandler::doAction (this=0x1a13cb0, parms=@0xffbfd2b0) at 
acsActionHandler.cc:271 
#8  0x000e7df4 in acsChassisInvokeAPluggableAction (event=0xffbfeb38, context=0x1a8c880, 
actionStack=0x18c6588, result=0x1a8c888, callEnded=0xffbfd404, 
waitingForExternal=0xffbfd400, logErrorIfNotFound=1) at acsPluggableChassisAction.cc:358 
#9  0x000e7670 in acsChassisInvokePluggableAction (event=0xffbfeb38, context=0x1a8c880, 
actionStack=0x18c6588, result=0x1a8c888, callEnded=0xffbfd404, 
waitingForExternal=0xffbfd400) at acsPluggableChassisAction.cc:253 
#10 0x00076a38 in acsSLEEChassis_t::doAction (this=0x18c6580, action=@0xffbfeb38, 
actionYields=@0xffbfe957, actionExpectsResponse=@0xffbfe956) 
at acsChassis.cc:3854 
#11 0x000723c4 in acsSLEEChassis_t::processCall (this=0x18c6580, context=0x1a8c880) at 
acsChassis.cc:2498 
#12 0x0006f9c8 in acsSLEEChassis_t::main (this=0x18c6580) at acsChassis.cc:1822 
#13 0x0005b3c8 in main (argc=1, argv=0xffbff80c) at slee_acs.cc:134

Example full backtrace

Typing bt full at the prompt initiates a full backtrace, which shows all frames and all the information contained in them. The output can run to many pages and may include large amounts of irrelevant data; collect as much as appears useful. In the example below, gdb itself crashes partway through frame #5:

(gdb) bt full 
#0  0xfe2d6328 in _smalloc () from /lib/libc.so.1 
No symbol table info available. 
#1  0xfe2d639c in malloc () from /lib/libc.so.1 
No symbol table info available. 
#2  0xfef63450 in operator new (sz=4) at new_op.cc:48 
p = (void *) 0x4 
#3  0xfd318fa4 in cmn::escher::Array::push_back (this=0x18c1ce8, val=@0xffbfd030) at 
cmnEscherEntry.hh:229 
this = (Entry * const) 0x18c1ce8 
this = (class ArrayImpl * const) 0x18c1ce8 
val = (const Map &) @0xffbfd030: {pimpl = {rep = 0x0}} 
#4  0xfdc55248 in ccs::Message::CDR::appendFromString (this=0xffbfd0d8, fields= 
{static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 
0x18c08c4 "CLI=10101010101|ACS_CUST_ID=12|PC_AC=1|PC_PRC=1|TZ=NZ|PC_SCD=D07"}, static 
_S_empty_rep_storage = {0, 0, 0, 0}}) at /volB/DEV_BASE/nondebug/CCS/include/ccsMessage.hh:1581 
field = {pimpl = {rep = 0x1a25af0}} 
key = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, 
_M_p = 0x1a80f94 "PC_SCD"}, static _S_empty_rep_storage = { 
0, 0, 0, 0}} 
val = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, 
_M_p = 0x1aa7d3c "D07"}, static _S_empty_rep_storage = {0, 
0, 0, 0}} 
cdrEntry = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data 
fields>}, _M_p = 0x1a80eec "PC_SCD=D07"}, 
static _S_empty_rep_storage = {0, 0, 0, 0}} 
equals = 4290760752 
start = 4290760768 
end = 64 
#5  0xfdd030ac in fox::ExtendedWalletUpdate::doAction (this=0x1a13cb0, request=@0x1, 
responseRequired=@0xffbfeb40, actionResponse=0x29c00, 
context=@0xffbfd0e0, serviceContext=@0x19eb4b8, parms=@0xffbfd2b0) at /opt/gcc
3.2.3/include/c++/3.2.3/bits/stl_alloc.h:664 
cdr = {<Array> = {pimpl = {rep = 0x1a28d40}}, <No data fields>} 
parms = (acsChassisActionParms &) @0x1: <error reading variable> 
ewur = (class ExtendedWalletUpdateRequest *) 0xffbfeb40 
balanceInfoArray = {<Array> = {pimpl = {rep = 0x1a28cb8}}, <No data fields>} 
addBalanceInfoArray = true 
sbbia = (class SmallBalanceBucketInfoArray 
Segmentation Fault (core dumped)

Memory leaks

While monitoring the platform, you may find that the memory usage of a certain process is constantly increasing, which indicates a memory leak.

Memory leaks can be a great risk to the platform, as other processes will struggle to run if the machine does not have enough free memory. In low-memory situations the OS starts paging information in and out of memory, which degrades performance and can make the system unstable.

A slow leak may pose little danger to the platform; however it is prudent to investigate sooner rather than later. In general leaks indicate a software fault that will require investigation by Oracle Engineering, so it is important to collect the following diagnostic information as soon as possible:

Diagnosing Memory Libraries

To check the memory libraries:

  1. Log in as the root user.
  2. Open the startup script. Add the following entries:
    
    MALLOC_CHECK_=3 
    export MALLOC_CHECK_ 

    Result: The process will abort with a core file when a memory check fails.

  3. Log out of the root user.

Log files

All Convergent Charging Controller processes write to their own log file, usually /IN/service_packages/<Product>/tmp/<Process>.log.

They also write errors to the syslog, which generally has a longer retention period than the log files. Log files are maintained by smsLogCleaner, which runs from each user’s crontab, usually once per hour, using the configuration in /IN/service_packages/<Product>/etc/logjob.conf.

Logs are archived to /IN/service_packages/<Product>/tmp/archive/ and usually kept for seven days (configurable on the command line).

When a process is run with debug enabled, the extra debug output is written to the log file only, not to the syslog.

Note: Files archived by smsLogCleaner can have their names changed.

Debug

All Convergent Charging Controller processes contain debug flags, which can be used to collect useful diagnostic information in the event of issues.

This is done in two main ways:

  1. By specifying debug flags in the startup script - which results in debug for all processing as long as the process is up.
  2. By setting tracing parameters inside configuration files.

The first is available to all Convergent Charging Controller processes, the second to a select few traffic handling applications which require more targeted debugging.

Startup flags

After locating the process startup script, debug flags can be specified through the DEBUG environment variable, as in the DEBUG and export DEBUG lines of this example:

$ vi slee_acs.sh 
#!/usr/bin/ksh 
DEBUG=all,-COMMON_escher,-COMMON_escher_detail,-COMMON_FD,-COMMON_Utils,-slee_api 
export DEBUG 
exec /IN/service_packages/ACS/bin/slee_acs >> /IN/service_packages/ACS/tmp/slee_acs.log 2>&1

The flags available differ by process, and Oracle Support will generally advise which flags are required. DEBUG=all enables every debug flag defined in the process. Because the resulting output is very verbose, it should be limited.

To limit it, either subtract flags from "all" by prefixing them with a minus sign, or specify individual flags.

Note:

You can change the time zone for debug message timestamps by setting the environment variable in each associated startup script. Example:

DEBUG_TZ=America/Costa_Rica

export DEBUG_TZ

Available flags

To find out all the options available to a specific process, use the strings command along with grep.

For example type:

$ strings slee_acs | grep cmnDebug_FLAG

Result: All the flags available are listed.

cmnDebug_FLAG_Engine 
cmnDebug_FLAG_Chassis 
cmnDebug_FLAG_ACS_Chassis_CdrWrite 
cmnDebug_FLAG_slee_acs 
cmnDebug_FLAG_misc 
cmnDebug_FLAG_COMMON_Utils 
cmnDebug_FLAG_COMMON_Utils_cmnUnit 
cmnDebug_FLAG_acsChassisSLEE 
cmnDebug_FLAG_acsNOA 
cmnDebug_FLAG_acsAWOL 
cmnDebug_FLAG_acsCommon 
cmnDebug_FLAG_acsCdr 
cmnDebug_FLAG_Config 
cmnDebug_FLAG_ConfigFileImpl 
cmnDebug_FLAG_cmnPrefixTree 
cmnDebug_FLAG_COMMON_cmnTime 
cmnDebug_FLAG_cmnAssert 
cmnDebug_FLAG_ACS_NotifIF 

Note:

The cmnDebug_FLAG_ prefix is assumed by debug, so it can be left off when setting the DEBUG variable.
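Because the prefix is not needed in the DEBUG value, it can be convenient to strip it while listing the flags. This is a small sketch; strip_flag_prefix is a hypothetical helper name, and the binary path in the comment is only an example.

```shell
# Hypothetical helper: read "strings" output on stdin and print each
# debug flag once, without the cmnDebug_FLAG_ prefix.
strip_flag_prefix() {
    # -n with the p flag prints only the lines where the prefix was found.
    sed -n 's/^.*cmnDebug_FLAG_//p' | LC_ALL=C sort -u
}

# Typical use against a binary (example path):
#   strings /IN/service_packages/ACS/bin/slee_acs | strip_flag_prefix
```

The names printed this way are in the form expected in the DEBUG value, for example acsCdr or COMMON_Utils.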

Flags to avoid

The following flags are used by the majority of processes, and result in a lot of debug.

Subtracting them from the DEBUG value is recommended unless otherwise requested.

  • COMMON_escher[_detail]
  • COMMON_FD
  • COMMON_Utils
  • slee_api
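As a sketch of how these flags can be subtracted from "all" in a startup script, the following builds the DEBUG value from a list of flags to exclude. The helper name debug_all_except is hypothetical; the resulting value matches the one shown in the slee_acs.sh example earlier in this section.

```shell
# Hypothetical helper: build a DEBUG value of "all" minus the given flags.
debug_all_except() {
    value=all
    for flag in "$@"; do
        # Each subtracted flag is prefixed with a minus sign.
        value="$value,-$flag"
    done
    echo "$value"
}

DEBUG=$(debug_all_except COMMON_escher COMMON_escher_detail COMMON_FD COMMON_Utils slee_api)
export DEBUG
echo "$DEBUG"
```

Writing the value out by hand, as in the startup script example, works equally well; the helper only makes it harder to forget one of the noisy flags.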

Selective tracing

Selective debug is available to some of the more important real-time traffic handling processes. These include:

  • slee_acs
  • beVWARS
  • xmsTrigger

In each case, a configurable tracing section lists the criteria that trigger tracing (A-party and B-party for slee_acs, walletid for beVWARS). When a transaction matches, the process temporarily switches to debug for the duration of the triggering event.

Configuration can be made in eserv.config in the tracing{} section of the process, which is explained in full detail in the technical guides.

Once set, the process can be sent a SIGHUP signal to re-read its configuration, including the tracing section.
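Sending the signal can be sketched as follows. The helper name send_hup is hypothetical; it assumes the process ID has already been found (for example with ps -ef | grep slee_acs) and that the command is run as the user that owns the process, or as root.

```shell
# Hypothetical helper: ask a running process to re-read its configuration.
# Usage: send_hup PID
send_hup() {
    pid=$1
    # kill -0 sends no signal; it only checks that the PID exists and
    # that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "send_hup: cannot signal process $pid" >&2
        return 1
    fi
    kill -HUP "$pid"
}
```

Only send SIGHUP to processes that are documented to re-read their configuration on this signal. For a program that does not handle it, the default action of SIGHUP is to terminate the process.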

Tracing example

For example, here is an ACS tracing{} section for slee_acs:

tracing = { 
# Is tracing enabled? (default false) 
enabled = true 
# Originating Addresses that we want to trace 
origAddress = [ 
"12345" 
] 
# Destination Addresses that we want to trace 
destAddress = [ 
"12345" 
] 
# What debug level should the tracing be at? 
traceDebugLevel = "all" 
} 

xmsTrigger tracing

xmsTrigger tracing is configured in the same way. However, the resulting information is written to a separate file, xmsTrigger.trc. This file does not contain debug output, but it does capture all the major decision points in a transaction.

Trace points are defined as:

Input

  1. Message received from network
    • With which addresses?
  2. Message decoding information
    • Do we allow alternate delivery?
    • Which protocol version is this?
    • What was the message text (if showPrivate)?
  3. Message passed to Messaging Manager
    • Result from ParentContext::handleSMSubmit?
  4. Response received from MM
  5. Response sent to network

Output

  1. SMSubmit received from Messaging Manager
    • Is the delivery type SME or MC?
    • Do we need to consult a third party (for example, HLR) for any reason?
    • What are the addresses involved?
  2. Outgoing encoding information
    • Which protocol version are we using?
  3. Message sent to network
  4. Response received from network
  5. Response sent to Messaging Manager

Snoop traces

When dealing with issues related to real-time traffic handling, it is imperative to have reference snoop traces to observe the behavior of the Convergent Charging Controller software at the network/signaling level.

This information allows analysis of incoming messages, the responses sent back, and their timing. Each signaling protocol is thoroughly documented, and the messages exchanged must conform to the appropriate specifications.

Snoop traces remove any uncertainty about the conversation between the Convergent Charging Controller platform and external components.

Running a snoop trace

Snoops are initiated as the root user. Command line arguments give the user a fair amount of control over what gets collected; from the interface to the port and transport protocol.

At a rudimentary level, snoop can be instructed to display all incoming traffic for an interface. However, it is more useful to first determine what traffic is required (the more detail the better) and save to a file for analysis in a trace interpreter.

To see a list of all the snoop command line parameters, type:

$ man snoop

This gives a full list, with definitions.

Snoop example

In this example, diameterControlAgent has a handle on the local address 172.21.153.142 on port 3868. Using ifconfig, this is shown to be on interface e1000g1.

Note:

Network Connectivity Agents (NCAs) commonly use more than one interface for receiving/sending information. There are failover and loadsharing scenarios where this is required. The groupname specified will sometimes indicate the type of traffic, for example, "SIG-A" and "SIG-B" shows that more than one interface is used for SIGTRAN.

First, determine the interface the target process is attached to. To do this, inspect the process with pfiles to find its local socket addresses, then match those addresses against the output of ifconfig:

$ ps -ef | grep diameterControlAgent 
acs_oper   160     1   0   Oct 20 ?         251:34 diameterControlAgent 
$ pfiles 160 | grep sock 
sockname: AF_UNIX /tmp/dcaIf-0.0.112.20101020123758 
sockname: AF_INET 0.0.0.0  port: 3868 
sockname: AF_INET 172.21.153.142  port: 3868 
sockname: AF_INET 172.21.153.142  port: 3868 
$ ifconfig -a 
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000  
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 
inet 172.21.153.82 netmask ffffffc0 broadcast 172.21.153.127 
groupname mgmt 
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2 
inet 172.21.153.80 netmask ffffffc0 broadcast 172.21.153.127 
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3 
inet 172.21.153.142 netmask ffffffe0 broadcast 172.21.153.159 
groupname chrg 
e1000g1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3 
inet 172.21.153.140 netmask ffffffe0 broadcast 172.21.153.159 
e1000g2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4 
inet 172.21.5.100 netmask ffffff00 broadcast 172.21.5.255 
groupname sig 
e1000g2:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4 
inet 172.21.5.104 netmask ffffff00 broadcast 172.21.5.255 
e1000g3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5 
inet 172.21.205.27 netmask ffffff00 broadcast 172.21.205.255 
nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 6 
inet 172.21.153.81 netmask ffffffc0 broadcast 172.21.153.127 
groupname mgmt 
nxge1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 7 
inet 172.21.153.141 netmask ffffffe0 broadcast 172.21.153.159 
groupname chrg 
nxge2: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 8 
inet 172.21.5.101 netmask ffffff00 broadcast 172.21.5.255 
groupname sig 
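The cross-check between the pfiles addresses and the ifconfig output can also be scripted. This is a minimal sketch that assumes the ifconfig layout shown above (a header line containing flags= followed by an inet line); iface_for_ip is a hypothetical helper name.

```shell
# Hypothetical helper: read "ifconfig -a" output on stdin and print the
# interface(s) configured with the given IPv4 address.
# Usage: ifconfig -a | iface_for_ip 172.21.153.142
iface_for_ip() {
    awk -v ip="$1" '
        # An interface header line, e.g. "e1000g1: flags=1000843<...>"
        /flags=/ { iface = $1; sub(/:$/, "", iface) }
        # An address line, e.g. "inet 172.21.153.142 netmask ffffffe0 ..."
        $1 == "inet" && $2 == ip { print iface }'
}
```

With the output above, looking up 172.21.153.142 prints e1000g1. The address is compared as a whole field, so 172.21.153.140 and 172.21.153.141 are not matched by mistake.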

Information level of detail

To collect all information on the example interface, we can use the -d argument along with -o to get an output snoop file for our interpreter to use:

$ snoop -d e1000g1 -o diameterControlAgent.snoop

However, to target the snoop even more, we can also restrict it to port 3868 by appending a filter expression to the end of the command.

$ snoop -d e1000g1 -o diameterControlAgent.snoop tcp port 3868

Note:

tcp is specified because Diameter is carried over TCP in this example.

To run a snoop for an extended period of time, it can be called with nohup or suffixed with & to have it run in the background. In this situation it is recommended to also use the -q argument, which suppresses the packet count.
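A long-running capture along these lines might be started as sketched below. The helper name background_snoop and the SNOOP_DRY_RUN variable are hypothetical. The sketch assumes it is run as root and passes the filter (tcp port) as the trailing expression of the snoop command.

```shell
# Hypothetical helper: capture traffic for one TCP port in the background.
# Usage: background_snoop INTERFACE PORT OUTPUT_FILE
# Set SNOOP_DRY_RUN=1 to print the command instead of running it.
background_snoop() {
    iface=$1
    port=$2
    outfile=$3
    # -q suppresses the packet count; the filter expression comes last.
    set -- snoop -q -d "$iface" -o "$outfile" tcp port "$port"
    if [ "${SNOOP_DRY_RUN:-0}" = 1 ]; then
        echo "$*"
        return 0
    fi
    # nohup keeps the capture running after the terminal is closed.
    nohup "$@" >/dev/null 2>&1 &
    echo "snoop started with PID $!"
}
```

For example, background_snoop e1000g1 3868 /tmp/diameterControlAgent.snoop starts the capture and prints its process ID. Stop the capture with kill on that process ID once the problem has been reproduced, because the output file keeps growing until snoop is stopped.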

Snoop interpreter

Once a snoop has been collected, an interpreter can be used to view the packets in a graphical interface.

Wireshark is one such widely used protocol analyzer, and contains plugins for decoding many telephony protocols, including:

  • INAP
  • CAMEL
  • MAP
  • Diameter

Wireshark contains many useful features, which are outside of the scope of this document. In general, it will work quite well out of the box, automatically recognizing and decoding protocols without need for special configuration. For more information, see the Wireshark website www.wireshark.org.

Process failure

You can check whether a process is restarting using the SMS Alarms subsystem.

Processes raise alarms when they are stopped or started. The alarms include:

  • Their name
  • The time the alarm was logged
  • Some other information about why the event may have occurred

Further information about the specific alarm can be found in the application's alarms guide.

Alarms can be accessed from the:

  • Syslog on the local machine and the SMS(s). For more information, see SMS Technical Guide.
  • Alarms tab in the SMS Alarms Management screen. For more information, see SMS User's Guide.

Checking installed packages

To check the details of an installed package, use the pkginfo command.

Example command:
pkginfo -l smsSms

Example output:

   PKGINST:  smsSms
      NAME:  Oracle smsSms
  CATEGORY:  application
      ARCH:  sun4u
   VERSION:  3.1.0
    VENDOR:  Oracle
    PSTAMP:  smsNode20041020104925
  INSTDATE:  Oct 20 2004 13:15
     EMAIL:  support@oracle.com
    STATUS:  completely installed
     FILES:  348 installed pathnames
              39 directories
              89 executables
          152448 blocks used (approx)

For more information about the pkginfo utility, see the system documentation.
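When only one or two values are needed, for example to quote the version in a support ticket, a single field can be extracted from the pkginfo output. This is a sketch that assumes one "LABEL: value" pair per line; pkg_field is a hypothetical helper name.

```shell
# Hypothetical helper: read "pkginfo -l" output on stdin and print the
# value of one field, for example VERSION or STATUS.
# Usage: pkginfo -l smsSms | pkg_field VERSION
pkg_field() {
    awk -v field="$1:" '
        $1 == field {
            $1 = ""            # drop the label, leaving only the value
            sub(/^ +/, "")     # trim the separator left behind
            print
        }'
}
```

Runs of spaces inside a value are collapsed to single spaces, which is harmless for fields such as VERSION, STATUS, and INSTDATE.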

Checking access to Oracle database

A number of services and functions rely on access to the Oracle database. To check that the Oracle database is available to a service, check the following:

  1. Use sqlplus to check that you can log in to the Oracle database with the username and password the service is using to connect.

    Example command:

    sqlplus smf/smf
  2. Where the tables required for a service are known, use SQL queries to check that:
    • The tables exist
    • The tables have appropriate content

For more information about SQL queries, see the Oracle documentation.
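The two checks can be combined in a short script, sketched here under some assumptions: the helper names are hypothetical, the table name passed in is whichever table the service is known to need, and the table is assumed to be owned by the user that logs in (user_tables lists only that user's own tables).

```shell
# Hypothetical helper: print SQL that checks a table exists in the
# current schema and counts its rows.
table_check_sql() {
    table=$1
    cat <<EOF
SELECT table_name FROM user_tables WHERE table_name = UPPER('$table');
SELECT COUNT(*) FROM $table;
EXIT
EOF
}

# Run the checks as the service's database user.
# Usage: check_table USER/PASSWORD TABLE
check_table() {
    # -s runs sqlplus in silent mode, without the banner and prompts.
    table_check_sql "$2" | sqlplus -s "$1"
}
```

If the first query returns no rows, the table does not exist for that user. If the count is zero, the table exists but is empty, which may or may not be expected for the service in question.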

Checking network connectivity

Network connectivity will affect any process which requires communication between two different network addresses.

Network connectivity should support SSH sessions between the two machines experiencing the problem.

If you can open an SSH session between the two machines, check the following before contacting Level 1 support with details:

  • If the address of either of the machines specified in the Node Management screens is a hostname, check that the hostnames used in the SSH sessions are the hostnames specified in the Node Management screen.

If you cannot SSH, check the following before contacting Level 1 support with details:

  • Check that the hostname is resolving correctly in the DNS.
  • Check that the physical network connection is working correctly.
  • Check that inetd and sshd are running.
  • Check that sshd is listening on the expected port.
  • Check that the smf_oper and acs_oper accounts are not locked, and that the username and password combinations being used are correct.
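The first checks can be scripted as a quick triage before contacting support. This is a sketch with hypothetical helper names. It assumes getent and an OpenSSH client are available. Note that getent consults all configured name services (including the local hosts file), not only the DNS, and that BatchMode disables password prompts, so only key-based logins are tested.

```shell
# Hypothetical helpers for the checks listed above.

# Succeeds if the hostname resolves through the configured name services.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a non-interactive SSH login works (key-based, no prompt).
# Usage: ssh_ok USER HOST
ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true
}

# Usage: check_host USER HOST
check_host() {
    if ! resolves "$2"; then
        echo "$2: hostname does not resolve"
        return 1
    fi
    if ! ssh_ok "$1" "$2"; then
        echo "$2: resolves, but SSH as $1 failed"
        return 2
    fi
    echo "$2: SSH as $1 succeeded"
}
```

For example, check_host smf_oper followed by the destination hostname reports which of the two stages fails. A failure at the SSH stage still needs the remaining checks in the list, since it does not distinguish a stopped sshd from a locked account.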

Replication

Replication may be failing for the following reasons:

  • SSH keys have not been correctly set up between origin and destination machines.
  • The destination node has been incorrectly set up in the Node Management screens of the SMS Java screens.
  • Oracle is not running correctly.
  • A new replication.cfg file has not been created after a change.
  • replication.cfg may not be successfully copying to the destination machine (an error should display when the Create Config File button on the Node Management screens is clicked).
  • The partition on the destination machine where the data is being replicated to may be full.
  • The updateLoader on the destination machine may be running incorrectly.
  • The destination database may be substantially out of sync with the SMF. Run a resync.
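Of these causes, a full partition is the quickest to rule out from the command line. The following sketch reads the capacity column of df -k. The helper names and the default limit of 90 percent are assumptions, and the path to check depends on where the replicated data is stored on the destination machine.

```shell
# Hypothetical helper: read "df -k" output on stdin and print the
# capacity percentage (without the % sign) from the last line.
capacity_pct() {
    awk '
        # Capacity is the column before the mount point.
        NF >= 2 { cap = $(NF - 1) }
        END { sub(/%/, "", cap); print cap }'
}

# Warn when the file system holding PATH is at or above LIMIT percent.
# Usage: check_partition PATH [LIMIT]
check_partition() {
    limit=${2:-90}
    pct=$(df -k "$1" | capacity_pct)
    case $pct in
        ''|*[!0-9]*)
            echo "could not read usage for $1" >&2
            return 2
            ;;
    esac
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}
```

Run check_partition on the destination machine against the directory that holds the replicated data.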