7 Troubleshooting
This chapter explains the important processes on each of the server components in Convergent Charging Controller, and describes a number of troubleshooting methods you can try before you raise a support ticket.
Common Troubleshooting Procedures
To troubleshoot the product, first identify the component responsible for the service that needs troubleshooting.
As explained in the Product System Architecture section, there are three main server components in the Convergent Charging Controller:
- Service Logic Controller (SLC)
The SLC is responsible for most real-time service processing (for example, voice/SMS/data). Call handling issues are likely to require troubleshooting on the SLC.
- Service Management System (SMS)
The SMS is responsible for provisioning, data warehousing, and replication. Issues that are specific to certain subscribers, that coincide with important changes to rating, or that concern EDRs or external provisioning (through the Provisioning Interface (PI)) require troubleshooting on the SMS.
- Voucher and Wallet Server (VWS)
The VWS is responsible for voucher redemption and call rating (this includes balance management and promotions tracking). Issues concerning subscribers’ balances, top-ups and vouchers are likely to require troubleshooting on the VWS.
Important notice
Please note that Convergent Charging Controller packages are released and tested as complete versions.
If you have any questions or problems, please contact Oracle.
General tools
The following information is not specific to any particular type of node, and can be helpful when investigating any problem situation.
The pslist process list is built from inittab and highlights any defined processes that are not running. If a SLEE is present, its configuration is also parsed and SLEE processes are included in the list.
Process status
There are a few basic checks that can be run on any of the machines, which are provided as part of the supportScp (SLC/VWS) or supportSms (SMS) packages. These give you a quick look at what processes are running.
Example - pslist
This example shows the pslist command used with no parameters.
Command:
$ pslist
Result:
------------------------ Thu Oct 24 04:56:53 GMT 2010 --------------------------
C APP USER PID PPID STIME COMMAND
1 ACS acs_oper 1004 1 04-Oct N/service_packages/ACS/bin/acsCompilerDaemon
1 ACS acs_oper 1008 1 04-Oct /service_packages/ACS/bin/acsProfileCompiler
1 ACS acs_oper 13833 1 00:12:38 ice_packages/ACS/bin/acsStatisticsDBInserter
1 OSD acs_oper 1047 1 04-Oct /service_packages/OSD/bin/osdWsdlRegenerator
1 CCS ccs_oper 1011 1 04-Oct /IN/service_packages/CCS/bin/ccsCDRLoader
1 CCS ccs_oper 1033 1 04-Oct service_packages/CCS/bin/ccsCDRFileGenerator
1 CCS ccs_oper 11411 1 13-Oct /IN/service_packages/CCS/bin/ccsBeOrb
2 CCS ccs_oper 1406 1043 04-Oct IN/service_packages/CCS/bin/ccsProfileDaemon
1 CCS ccs_oper 9413 1 04-Oct /IN/service_packages/CCS/bin/ccsChangeDaemon
1 EFM smf_oper 995 1 04-Oct /IN/service_packages/EFM/bin/smsAlarmManager
1 PI smf_oper 1080 1 04-Oct /IN/service_packages/PI/bin/PImanager
6 PI smf_oper 1319 1080 04-Oct PIprocess
1 PI smf_oper 9186 1080 04-Oct PIbeClient
2 SMS smf_oper 6173 1 21-Oct /IN/service_packages/SMS/bin/smsMaster
1 SMS smf_oper 941 1 04-Oct /IN/service_packages/SMS/bin/smsAlarmRelay
1 SMS smf_oper 943 1 04-Oct /IN/service_packages/SMS/bin/smsNamingServer
1 SMS smf_oper 944 1 04-Oct IN/service_packages/SMS/bin/smsReportsDaemon
1 SMS smf_oper 946 1 04-Oct /service_packages/SMS/bin/smsReportScheduler
1 SMS smf_oper 947 1 04-Oct /IN/service_packages/SMS/bin/smsAlarmDaemon
1 SMS smf_oper 948 1 04-Oct N/service_packages/SMS/bin/smsStatsThreshold
1 SMS smf_oper 949 1 04-Oct /IN/service_packages/SMS/bin/smsTaskAgent
1 SMS smf_oper 969 1 04-Oct /IN/service_packages/SMS/bin/smsTrigDaemon
2 SMS smf_oper 979 1 04-Oct /IN/service_packages/SMS/bin/smsConfigDaemon
1 SMS smf_oper 980 1 04-Oct N/service_packages/SMS/bin/smsStatsDaemonRep
total processes found = 32 [ 32 expected ]
================================= run-level 3 ==================================
Example - pslist -d
This example shows the pslist command used with the -d parameter. From time to time, processes will be added to or removed from inittab/SLEE. The -d parameter instructs pslist to reconstruct the list.
Command:
$ pslist -d
Result:
Scanning input file.
[ /etc/inittab ]
Scanning input file.
[ /IN/service_packages/SLEE/etc/SLEE.cfg ]
Info: Did not find SLEE config file [ /IN/service_packages/SLEE/etc/SLEE.cfg ]
Does the SLEE application exist on this machine?
<----
############################################################################
# pslist: default process list configuration (plc) file used to match and #
# display running processes. #
# File creation time: Thu Nov 13 04:19:29 GMT 2008 #
# Lines beginning with a hash (#) character are ignored. #
# $1="grouped-apps name (max 5-char)" $2="regex of process" [$3+=comments] #
############################################################################
ACS acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsCompilerDaemon inittab
ACS acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsProfileCompiler inittab
ACS acs_oper.*\/IN\/service_packages\/ACS\/bin\/acsStatisticsDBInserter inittab
CCS ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsBeOrb inittab
CCS ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsCDRFileGenerator inittab
CCS ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsCDRLoader inittab
CCS ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsChangeDaemon inittab
CCS ccs_oper.*\/IN\/service_packages\/CCS\/bin\/ccsProfileDaemon inittab
EFM smf_oper.*\/IN\/service_packages\/EFM\/bin\/smsAlarmManager
inittab
OSD acs_oper.*\/IN\/service_packages\/OSD\/bin\/osdWsdlRegenerator inittab
PI smf_oper.*PIbeClient inittab: PI
Manager child process
PI smf_oper.*PIprocess
inittab: PI
Manager child process
PI smf_oper.*\/IN\/service_packages\/PI\/bin\/PImanager inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsAlarmDaemon inittab
SMS
smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsConfigDaemon inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsMaster inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsNamingServer inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsReportScheduler inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsReportsDaemon inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsStatsDaemonRep
inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsStatsThreshold inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsTaskAgent inittab
SMS smf_oper.*\/IN\/service_packages\/SMS\/bin\/smsTrigDaemon
inittab ---->
Default process list configuration file created.
[ /IN/service_packages/SMS/tmp/ps_processes.testusms.plc ]
Process configuration
Configuration for Convergent Charging Controller products and processes is made almost exclusively in the file /IN/service_packages/eserv.config.
The file is broken down into sections and subsections, grouped together by {} brackets. Each product comes with an example eserv.config inside its respective <Product>/etc directory, and each configuration option is documented in the associated Technical Guide.
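The bracketed layout can be sketched as follows; the section and parameter names here are purely illustrative, not real configuration options (consult the relevant Technical Guide for those):

```
# eserv.config fragment (illustrative names only)
ACS = {
    SomeSection = {
        exampleFlag = true        # hypothetical boolean option
        exampleLimit = 100        # hypothetical numeric option
    }
}
CCS = {
    AnotherSection = {
        exampleName = "value"     # hypothetical string option
    }
}
```

Each top-level section is typically named after the product whose processes read it.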
There are some exceptions, notably ACS and SLEE, which have some separate configuration files in /IN/service_packages/ACS/etc/acs.conf and /IN/service_packages/SLEE/etc/SLEE.cfg respectively.
Some NCA interface configuration is also housed in a separate file; for example, for SIGTRAN interfaces (sua_if/m3ua_if) the configuration is often specified in /IN/service_packages/SLEE/etc/sigtran.config or in a per-interface <interface>_service.config file.
Note: Processes also take command-line arguments, which are passed in the calling shell script - normally named /IN/service_packages/<Product>/bin/<ProcessName>Startup.sh.
Remote Diagnostic Agent
Remote Diagnostic Agent (RDA) is an Oracle cross-product diagnostic tool used to help Oracle engineers in troubleshooting and analyzing issues.
RDA supports Oracle Communications Convergent Charging Controller.
For more general usage information about the Remote Diagnostic Agent tool, refer to the references included in the following sections.
Installing RDA
To install RDA, please review My Oracle Support Note 314422.1.
For consistency across all platforms and Convergent Charging Controller nodes, upload the RDA package to the /IN/service_packages/SUPPORT/ directory and proceed with the installation from this location.
To install RDA on your Convergent Charging Controller nodes:
- Navigate to the directory where you downloaded the RDA package. For example, /IN/service_packages/SUPPORT/
- Uncompress the RDA file as the smf_oper user. This creates a subfolder named rda in the current folder, containing all the files needed to run RDA.
Note:
Due to restrictive security policies, RDA should not be installed or run as the root user; smf_oper should have all the access and permissions it needs.
Configuring RDA
To set up the RDA profile and activate the Convergent Charging Controller module, navigate to the RDA directory and use the following command:
smf_oper@server$./rda.sh -vdSp Com_NCC
The tool prompts you with a few questions about your environment. Most default answers should be sufficient. However, you must select Yes when asked whether to collect information from your Oracle database. Review each prompt, ensuring that the responses are correct for your environment.
Additionally:
- The prompt about ADDM, AWR, and ASH is necessary because use of these features is restricted for licensing reasons.
- The system user should not connect as sysdba if the smf_oper user in your environment does not have sysdba permissions for your Convergent Charging Controller database.
- Configuration, except passwords, is stored by the tool for future runs.
Example RDA Output
Here is an example RDA output:
bash-4.1$ ./rda.sh -vdSp Com_NCC
Setting up ...
-------------------------------------------------------------------------------
S000INI: Initializes the Data Collection
-------------------------------------------------------------------------------
RDA uses the output file prefix to identify all files belonging to the same
data collection. The prefix must start with a letter and must contain only
alphanumeric characters.
Enter the prefix to be used for all the generated files
Hit 'Return' to accept the default (RDA)
>
Enter the directory used for all the files to be generated
Hit 'Return' to accept the default (/IN/service_packages/SUPPORT/rda/output)
>
Do you want to keep report packages from previous runs (Y/N)?
Hit 'Return' to accept the default (N)
>
Enter the Oracle home to be used for data analysis
Hit 'Return' to accept the default (/u01/app/oracle/product/12.1.0)
>
Enter the network domain name for this server
Hit 'Return' to accept the default (us.oracle.com)
>
-------------------------------------------------------------------------------
S010CFG: Collects Key Configuration Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S090OCM: Set up the Configuration Manager Interface
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S909RDSP: Produces the Remote Data Collection Reports
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S919LOAD: Produces the External Collection Reports
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S999END: Finalizes the Data Collection
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S100OS: Collects the Operating System Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S105PROF: Collects the User Profile
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S110PERF: Collects Performance Information
-------------------------------------------------------------------------------
Can ADDM, AWR, and ASH be used (Y/N)?
Hit 'Return' to accept the default (Y)
>
-------------------------------------------------------------------------------
S120NET: Collects Network Information -------------------------------------------------------------------------------
Do you want RDA to perform the network ping tests (Y/N)?
Hit 'Return' to accept the default (N)
>
-------------------------------------------------------------------------------
S122ONET: Collects Oracle Net Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S200DB: Controls Oracle RDBMS Data Collection
-------------------------------------------------------------------------------
Is the database associated to the current Oracle home (Y/N)?
Hit 'Return' to accept the default (Y)
>
Enter the Oracle SID to be analyzed
Hit 'Return' to accept the default (SMF)
>
Is the INIT.ORA for the database to be analyzed located on this system? (Y/N)
Hit 'Return' to accept the default (Y)
>
Enter the location of the spfile or the INIT.ORA (including the directory and file name)
Hit 'Return' to accept the default
(/u01/app/oracle/product/12.1.0/dbs/initSMF.ora)
Enter an Oracle User ID (userid only) to view DBA_ and V$ tables. If RDA will be run under the Oracle software owner's ID, enter a forward slash (/) here, and enter Y at the SYSDBA prompt to avoid a prompt for the database password at runtime.
Hit 'Return' to accept the default (system)
>
Is 'system' a SYSDBA user (will connect as SYSDBA) (Y/N)?
Hit 'Return' to accept the default (N)
>
-------------------------------------------------------------------------------
S201DBA: Collects Oracle RDBMS Information
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S204LOG: Collects Oracle Database Trace and Log Files
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
S491NCC: Collects Network Charging and Control Information
-------------------------------------------------------------------------------
Enter the full path of the Network Charging and Control home directory
Hit 'Return' to accept the default (/IN/service_packages)
>
WARNING: RDBMS information is collected from Oracle Database only.
Do you want to collect application information from an Oracle Database (Y/N)?
Hit 'Return' to accept the default (N)
> Y
Enter the Oracle SID of the database
Hit 'Return' to accept the default (SMF)
>
Enter an Oracle User ID (userid only) to view application specific database information
Hit 'Return' to accept the default (smf)
> smf
-------------------------------------------------------------------------------
S990FLTR: Controls Report Content Filtering
-------------------------------------------------------------------------------
Updating the setup file ...
Collecting Data
Use the following recommended flags (or adapt them according to your needs):
smf_oper@server$./rda.sh -vfCRP
where server is the Convergent Charging Controller node where RDA runs and the flags are defined as follows:
- v: verbose output
- C: collect data
- R: render the collected data into HTML
- P: package the contents of the output directory into an archive
- f: force execution of all commands
The script requests the system password and, depending on your selections during configuration, may request the password of a user with the statspack tool installed.
RDA generates multiple files in the output/ folder, together with a zip archive containing all of them. For submission, download only the zip file from the output/ folder on the server where the RDA report was run. The script may take several minutes to complete.
Note: Subsequent RDA script execution overwrites the previous reports.
Using Output Immediately
After the archive file is uploaded to Oracle Support, post-processing of the data occurs. The post-processing does not add, remove, or modify the data; it only organizes it and applies some formatting. Oracle recommends uploading RDA output files for post-processing. However, it is possible to unzip the file on any computer and browse the files directly.
To view the RDA output immediately, before sending the data to Oracle Support, completely unzip the archive and double-click the file named RDA__start.htm. This opens the RDA web interface in your default web browser.
Attaching the ZIP Archive to a Service Request
Upload the generated zip file to a previously opened Service Request in My Oracle Support.
cmnPushFiles/cmnReceiveFiles
cmnPushFiles monitors a location on the SLC/VWS for new files and "pushes" them to the SMS.
cmnPushFiles is called from inittab, runs at run-level 3, and generally runs as multiple instances.
Each instance monitors the EDRs of a certain product or process (for example, MM EDRs created by xmsTrigger, or ACS EDRs created by slee_acs); instances can also push expiry messages or notifications between machines.
In order for cmnPushFiles to successfully "push" files to the SMS, the network service cmnReceiveFiles must be configured on the SMS in /etc/inetd.conf and /etc/services.
cmnPushFiles is crucial to the EDR processing chain; if it is not running or is configured incorrectly, files will build up on the SLC/VWS indefinitely until the system runs out of disk space.
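As a sketch, the SMS-side service definition resembles the following; the port number, user, and path are illustrative only (the real values are set at installation time):

```
# /etc/services (port number illustrative)
cmnReceiveFiles    2027/tcp

# /etc/inetd.conf (a single line; user and binary path illustrative)
cmnReceiveFiles stream tcp nowait smf_oper /IN/service_packages/SMS/bin/cmnReceiveFiles cmnReceiveFiles
```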
Example - PushFiles
Consider this sample output from a VWS:
$ ps -ef | grep Push
ebe_oper 12479 … cmnPushFiles -d /IN/service_packages/E2BE/logs/CDR-out -r /IN/service_packages/
ccs_oper 12519 … cmnPushFiles -d /IN/service_packages/CCS/logs/expiryMessage/ -r /IN/service_pac
ccs_oper 12480 … cmnPushFiles -d /IN/service_packages/CCS/logs/wallet -r /IN/service_packages/CC
ccs_oper 12482 … cmnPushFiles -d /IN/service_packages/CCS/logs/ccsNotificationWrite/ -r /IN/serv
The command response shows there are four instances of cmnPushFiles running.
The arguments given to the process usually indicate what it is responsible for:
$ pargs 12479
12479: cmnPushFiles -d /IN/service_packages/E2BE/logs/CDR-out -r /IN/service_packages/
argv[0]: cmnPushFiles
argv[1]: -d
argv[2]: /IN/service_packages/E2BE/logs/CDR-out
argv[3]: -r
argv[4]: /IN/service_packages/CCS/logs/CDR-in
argv[5]: -h
argv[6]: usms.CdrPush
argv[7]: -F
Here we see this cmnPushFiles is taking completed EDRs from CDR-out on the VWS and sending them to CDR-in on the SMS.
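On Linux nodes, where pargs may not be available, the same argument list can be read from /proc; a minimal sketch (the PID here is an example):

```shell
# Print one argument per line for a given PID; /proc/<pid>/cmdline stores
# the argument vector as NUL-separated strings.
pid=$$                      # example: inspect the current shell
tr '\0' '\n' < /proc/"$pid"/cmdline
```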
Space issues
If the cmnPushFiles log file (/IN/service_packages/E2BE/tmp/cmnPushFiles) or the syslog is reporting insufficient space, checking the available space in CDR-out on the VWS and CDR-in on the SMS is the first step in diagnosing the problem.
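A quick check along those lines might look like this sketch (the helper function is illustrative; the paths are the ones discussed above):

```shell
# Count the files queued in a push/receive directory; a steadily growing
# count suggests a stalled cmnPushFiles or a full disk at the far end.
queued() { ls "$1" 2>/dev/null | wc -l; }

queued /IN/service_packages/E2BE/logs/CDR-out   # outbound queue on the VWS
queued /IN/service_packages/CCS/logs/CDR-in     # inbound directory on the SMS
df -k /IN/service_packages                      # free space on the mount
```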
Core files
When monitoring a platform, or investigating issues, it is important to check for core files.
Processes running from inittab are restarted automatically by the operating system, and processes running inside the SLEE are restarted by the watchdog if they stop running.
If a process cores due to a recurring traffic scenario, it will be restarted and continue to core until the mount point runs out of disk space.
Core file location
The location of core files differs depending on configuration, and how the process was started.
The first thing to check is the output of coreadm, which specifies how the operating system will handle core files.
Multiple core locations
In this example, core files will write to the directory they were called from (in the case of SLEE processes, this will be /IN/service_packages/SLEE/bin), and will be named simply core. In this situation, the majority of /IN/service_packages will need to be checked for core files.
$ coreadm
global core file pattern:
init core file pattern: core
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: disabled
per-process setid core dumps: disabled
global core dump logging: disabled
Single core location
However, if configured as in this example, all core files will be written to one central location (often on a separate mount point). In this situation, only one directory/mount needs to be checked.
This can also reduce the risk of an important mount point getting filled up with core files.
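Either way, a periodic sweep for recent core files can be sketched as follows (the one-day window and both locations follow the examples above):

```shell
# Look for per-process cores under the product tree and for pattern-named
# cores in the central crash directory, modified within the last day.
find /IN/service_packages -name 'core*' -mtime -1 2>/dev/null
find /var/crash -name 'core-*' -mtime -1 2>/dev/null
```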
$ coreadm
global core file pattern: /var/crash/core-%n-%p-%f
global core file content: default
init core file pattern: core
init core file content: default
global core dumps: enabled
per-process core dumps: disabled
global setid core dumps: enabled
per-process setid core dumps: disabled
global core dump logging: enabled
Diagnostic information
A process that dumps core can pose a risk to the platform for many reasons, and should be dealt with as quickly as possible.
In general they indicate a software fault that will require investigation by Oracle Engineering, so it is important to collect the following diagnostic information:
Gdb backtrace
In order for Oracle Engineering to investigate a core file, the most important piece of information (apart from the core itself) is the gdb backtrace.
Follow these steps to collect the backtrace.
- If not possible from the filename itself, determine what process created the core, using the file command.
$ file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/IN/service_packages/ACS/bin/slee_acs'
- Open the core using gdb, with the original binary and the core file as arguments.
Note: The exact binaries and libraries that generated the core file are required. If the product version has changed, it is unlikely gdb will be able to interpret the core correctly.
$ gdb /IN/service_packages/ACS/bin/slee_acs core
GNU gdb (Red Hat Enterprise Linux) 14.2-3.0.1.el9
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from slee_acs...
warning: Can't open file /usr/lib64/libgcc_s-11-20231218.so.1 during file-backed mapping note processing
[New LWP 473485]
warning: Build-id of /lib64/libstdc++.so.6 does not match core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/IN/service_packages/ACS/bin/slee_acs'
Result: Eventually you will be presented with the most recent frame of the core, the signal which ended the process, and a (gdb) prompt.
Program terminated with signal 10, Bus error.
#0 0xfe2d6328 in _smalloc () from /lib/libc.so.1
(gdb)
- To view all frames in the core, initiate a summary backtrace by typing bt at the prompt, see Example summary backtrace.
(gdb) bt
- To view all frames and all their information in the core, initiate a full backtrace by typing bt full at the prompt, see Example full backtrace.
(gdb) bt full
Note: This information will need to be provided to Oracle Support for further investigation.
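The interactive steps above can also be scripted with gdb's batch mode; this is a sketch, assuming gdb is installed, reusing the slee_acs paths from the example (the helper function name is hypothetical):

```shell
# Build the non-interactive gdb command line that collects both backtraces.
# -batch exits after running the -ex commands, so the call is scriptable:
#   <command> > backtrace.txt 2>&1
build_bt_cmd() {
  binary="$1"; corefile="$2"
  printf 'gdb -batch -ex bt -ex "bt full" %s %s\n' "$binary" "$corefile"
}

build_bt_cmd /IN/service_packages/ACS/bin/slee_acs core
```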
Example summary backtrace
Initiate a summary backtrace by typing bt at the prompt; all frames in the core will be shown:
(gdb) bt
#0 0xfe2d6328 in _smalloc () from /lib/libc.so.1
#1 0xfe2d639c in malloc () from /lib/libc.so.1
#2 0xfef63450 in operator new (sz=4) at new_op.cc:48
#3 0xfd318fa4 in cmn::escher::Array::push_back (this=0x18c1ce8, val=@0xffbfd030) at
cmnEscherEntry.hh:229
#4 0xfdc55248 in ccs::Message::CDR::appendFromString (this=0xffbfd0d8, fields=
{static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p =
0x18c08c4 "CLI=10101010101|ACS_CUST_ID=12|PC_AC=1|PC_PRC=1|TZ=NZ|PC_SCD=D07"}, static
_S_empty_rep_storage = {0, 0, 0, 0}}) at /volB/DEV_BASE/nondebug/CCS/include/ccsMessage.hh:1581
#5 0xfdd030ac in fox::ExtendedWalletUpdate::doAction (this=0x1a13cb0, request=@0x1,
responseRequired=@0xffbfeb40, actionResponse=0x29c00,
context=@0xffbfd0e0, serviceContext=@0x19eb4b8, parms=@0xffbfd2b0) at /opt/gcc
3.2.3/include/c++/3.2.3/bits/stl_alloc.h:664
#6 0xfdc47fac in fox::FOXActionHandler::doAction (this=0x1a13cb0, request=@0xffbfeb40,
responseRequired=@0xffbfd21f, actionResponse=0x1a8c890,
context=@0x19eb4b8, parms=@0xffbfd2b0) at FOXActionHandler.cc:1891
#7 0xff283128 in acsActionsAPI::ActionHandler::doAction (this=0x1a13cb0, parms=@0xffbfd2b0) at
acsActionHandler.cc:271
#8 0x000e7df4 in acsChassisInvokeAPluggableAction (event=0xffbfeb38, context=0x1a8c880,
actionStack=0x18c6588, result=0x1a8c888, callEnded=0xffbfd404,
waitingForExternal=0xffbfd400, logErrorIfNotFound=1) at acsPluggableChassisAction.cc:358
#9 0x000e7670 in acsChassisInvokePluggableAction (event=0xffbfeb38, context=0x1a8c880,
actionStack=0x18c6588, result=0x1a8c888, callEnded=0xffbfd404,
waitingForExternal=0xffbfd400) at acsPluggableChassisAction.cc:253
#10 0x00076a38 in acsSLEEChassis_t::doAction (this=0x18c6580, action=@0xffbfeb38,
actionYields=@0xffbfe957, actionExpectsResponse=@0xffbfe956)
at acsChassis.cc:3854
#11 0x000723c4 in acsSLEEChassis_t::processCall (this=0x18c6580, context=0x1a8c880) at
acsChassis.cc:2498
#12 0x0006f9c8 in acsSLEEChassis_t::main (this=0x18c6580) at acsChassis.cc:1822
#13 0x0005b3c8 in main (argc=1, argv=0xffbff80c) at slee_acs.cc:134
Example full backtrace
Initiate a full backtrace by typing bt full at the prompt; all frames and all the information contained in them will be shown. This can run to many pages, and can sometimes end in junk information - collect as much as appears useful.
The example below causes gdb to crash after the 5th frame:
(gdb) bt full
#0 0xfe2d6328 in _smalloc () from /lib/libc.so.1
No symbol table info available.
#1 0xfe2d639c in malloc () from /lib/libc.so.1
No symbol table info available.
#2 0xfef63450 in operator new (sz=4) at new_op.cc:48
p = (void *) 0x4
#3 0xfd318fa4 in cmn::escher::Array::push_back (this=0x18c1ce8, val=@0xffbfd030) at
cmnEscherEntry.hh:229
this = (Entry * const) 0x18c1ce8
this = (class ArrayImpl * const) 0x18c1ce8
val = (const Map &) @0xffbfd030: {pimpl = {rep = 0x0}}
#4 0xfdc55248 in ccs::Message::CDR::appendFromString (this=0xffbfd0d8, fields=
{static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p =
0x18c08c4 "CLI=10101010101|ACS_CUST_ID=12|PC_AC=1|PC_PRC=1|TZ=NZ|PC_SCD=D07"}, static
_S_empty_rep_storage = {0, 0, 0, 0}}) at /volB/DEV_BASE/nondebug/CCS/include/ccsMessage.hh:1581
field = {pimpl = {rep = 0x1a25af0}}
key = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>},
_M_p = 0x1a80f94 "PC_SCD"}, static _S_empty_rep_storage = {
0, 0, 0, 0}}
val = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>},
_M_p = 0x1aa7d3c "D07"}, static _S_empty_rep_storage = {0,
0, 0, 0}}
cdrEntry = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data
fields>}, _M_p = 0x1a80eec "PC_SCD=D07"},
static _S_empty_rep_storage = {0, 0, 0, 0}}
equals = 4290760752
start = 4290760768
end = 64
#5 0xfdd030ac in fox::ExtendedWalletUpdate::doAction (this=0x1a13cb0, request=@0x1,
responseRequired=@0xffbfeb40, actionResponse=0x29c00,
context=@0xffbfd0e0, serviceContext=@0x19eb4b8, parms=@0xffbfd2b0) at /opt/gcc
3.2.3/include/c++/3.2.3/bits/stl_alloc.h:664
cdr = {<Array> = {pimpl = {rep = 0x1a28d40}}, <No data fields>}
parms = (acsChassisActionParms &) @0x1: <error reading variable>
ewur = (class ExtendedWalletUpdateRequest *) 0xffbfeb40
balanceInfoArray = {<Array> = {pimpl = {rep = 0x1a28cb8}}, <No data fields>}
addBalanceInfoArray = true
sbbia = (class SmallBalanceBucketInfoArray
Segmentation Fault (core dumped)
Memory leaks
While monitoring the platform, it may be determined that a certain process is constantly increasing in memory, indicating a memory leak.
Memory leaks can be a great risk to the platform, as other processes will struggle to run if the machine does not have a sufficient amount of free memory. In low memory situations the OS will start paging information in and out of memory, causing a performance impact, and system instability.
A slow leak may pose little danger to the platform; however it is prudent to investigate sooner rather than later. In general leaks indicate a software fault that will require investigation by Oracle Engineering, so it is important to collect the following diagnostic information as soon as possible:
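A first step is simply recording the process's resident set size at intervals; a minimal sketch (the helper function is illustrative, and the PID shown is an example):

```shell
# Sample the resident set size (in KB) of a process. Log this periodically;
# a steadily rising value indicates a probable leak.
rss_kb() { ps -o rss= -p "$1" | tr -d ' '; }

rss_kb $$    # example: the current shell's RSS in KB
```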
Diagnosing Memory Libraries
To check the memory libraries:
- Log in as the root user.
- Open the startup script and add the following entries:
MALLOC_CHECK_=3
export MALLOC_CHECK_
Result: The process will abort with a core file when a memory check fails.
- Log out of the root user.
Log files
All Convergent Charging Controller processes write to their own log file, usually /IN/service_packages/<Product>/tmp/<Process>.log.
They also write errors to the syslog, which generally has a longer retention period than log files. Log files are maintained by smsLogCleaner, which runs from each user's crontab, usually once per hour, using configuration in /IN/service_packages/<Product>/etc/logjob.conf.
Logs are archived to /IN/service_packages/<Product>/tmp/archive/ and usually kept for seven days (configurable on the command line).
When a process is put in debug, this extra information is written to the log file only.
Note: Files archived by smsLogCleaner may be renamed during archiving.
Debug
All Convergent Charging Controller processes contain debug flags, which can be used to collect useful diagnostic information in the event of issues.
This is done in two main ways:
- By specifying debug flags in the startup script - which results in debug for all processing as long as the process is up.
- By setting tracing parameters inside configuration files.
The first is available to all Convergent Charging Controller processes, the second to a select few traffic handling applications which require more targeted debugging.
Startup flags
After locating the process startup script, debug flags can be specified through the DEBUG environment variable:
$ vi slee_acs.sh
#!/usr/bin/ksh
DEBUG=all,-COMMON_escher,-COMMON_escher_detail,-COMMON_FD,-COMMON_Utils,-slee_api
export DEBUG
exec /IN/service_packages/ACS/bin/slee_acs >>
/IN/service_packages/ACS/tmp/slee_acs.log 2>&1
The flags available differ by process, and Oracle Support will generally advise which flags are required. DEBUG=all covers all debug defined in the process, but is quite verbose, so it should be limited.
Flags can be subtracted from "all", or individual flags can be specified.
Note:
You can change the time zone for debug message timestamps by setting the environment variable in each associated startup script. Example:
DEBUG_TZ=America/Costa_Rica
export DEBUG_TZ
Available flags
To find out all the options available to a specific process, use the strings command along with grep.
For example type:
$ strings slee_acs | grep cmnDebug_FLAG
Result: All the flags available are listed.
cmnDebug_FLAG_Engine
cmnDebug_FLAG_Chassis
cmnDebug_FLAG_ACS_Chassis_CdrWrite
cmnDebug_FLAG_slee_acs
cmnDebug_FLAG_misc
cmnDebug_FLAG_COMMON_Utils
cmnDebug_FLAG_COMMON_Utils_cmnUnit
cmnDebug_FLAG_acsChassisSLEE
cmnDebug_FLAG_acsNOA
cmnDebug_FLAG_acsAWOL
cmnDebug_FLAG_acsCommon
cmnDebug_FLAG_acsCdr
cmnDebug_FLAG_Config
cmnDebug_FLAG_ConfigFileImpl
cmnDebug_FLAG_cmnPrefixTree
cmnDebug_FLAG_COMMON_cmnTime
cmnDebug_FLAG_cmnAssert
cmnDebug_FLAG_ACS_NotifIF
Note:
The cmnDebug_FLAG_ prefix is assumed by the debug subsystem, so it can be left off when configuring debug flags.
Flags to avoid
The following flags are used by the majority of processes and produce a large amount of debug output.
Remove them unless otherwise requested.
- COMMON_escher[_detail]
- COMMON_FD
- COMMON_Utils
- slee_api
Selective tracing
Selective debug is available to some of the more important real-time traffic handling processes. These include:
- slee_acs
- beVWARS
- xmsTrigger
In each case, a configurable tracing section lists the criteria that trigger tracing (A-party and B-party numbers for slee_acs, wallet ID for beVWARS); when a matching event arrives, the process temporarily switches to debug for the duration of that event.
Configuration is made in eserv.config, in the tracing{} section for the process, which is explained in full detail in the technical guides.
Once set, the process can be sent a SIGHUP signal to re-read its configuration, including the tracing section.
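As a sketch, the reload step can be scripted; slee_acs is the example process from this section, and pgrep is assumed to be available on the platform:

```shell
# Find the running process and ask it to re-read its configuration,
# including the tracing{} section.
# (slee_acs is an example; substitute the process you are tracing.)
pid=$(pgrep -x slee_acs || true)
if [ -n "$pid" ]; then
    kill -HUP "$pid"    # SIGHUP makes the process re-read eserv.config
else
    echo "slee_acs is not running" >&2
fi
```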
Tracing example
For example, here is an ACS tracing{} section for slee_acs:
tracing = {
    # Is tracing enabled? (default false)
    enabled = true

    # Originating Addresses that we want to trace
    origAddress = [
        "12345"
    ]

    # Destination Addresses that we want to trace
    destAddress = [
        "12345"
    ]

    # What debug level should the tracing be at?
    traceDebugLevel = "all"
}
xmsTrigger tracing
xmsTrigger tracing is set in the same fashion; however, the resulting information goes to a separate file, xmsTrigger.trc, which does not contain debug output but does capture all the major decision points in a transaction.
Trace points are defined as:
Input
- Message received from network
- With which addresses?
- Message decoding information
- Do we allow alternate delivery?
- Which protocol version is this?
- What was the message text (if showPrivate)?
- Message passed to Messaging Manager
- Result from ParentContext::handleSMSubmit?
- Response received from MM
- Response sent to network
Output
- SMSubmit received from Messaging Manager
- Is the delivery type SME or MC?
- Do we need to consult a third party (for example, HLR) for any reason?
- What are the addresses involved?
- Outgoing encoding information
- Which protocol version are we using?
- Message sent to network
- Response received from network
- Response sent to Messaging Manager
Snoop traces
When dealing with issues related to real-time traffic handling, it is imperative to have reference snoop traces to observe the behavior of the Convergent Charging Controller software at the network/signaling level.
This information allows analysis of the incoming messages, the responses sent back, and the timing. Each signaling standard is thoroughly documented, and the messages must conform to the appropriate specifications.
Snoop traces allow there to be no uncertainty about the conversation between the Convergent Charging Controller platform and external components.
Running a snoop trace
Snoops are initiated as the root user. Command-line arguments give the user a fair amount of control over what gets collected, from the interface to the port and transport protocol.
At a rudimentary level, snoop can be instructed to display all incoming traffic for an interface. However, it is more useful to first determine what traffic is required (the more detail the better) and save the output to a file for analysis in a trace interpreter.
To see a list of all the snoop command line parameters, type:
$ man snoop
This gives a full list, with definitions.
Snoop example
In this example, diameterControlAgent has a handle on the local address 172.21.153.142 on port 3868. Using ifconfig, this is shown to be on interface e1000g1.
Note:
Network Connectivity Agents (NCAs) commonly use more than one interface for receiving and sending information; there are failover and load-sharing scenarios where this is required. The group name specified will sometimes indicate the type of traffic; for example, "SIG-A" and "SIG-B" show that more than one interface is used for SIGTRAN.
First, determine the interface the target process is attached to. This can be achieved by checking the output of ifconfig, inspecting the process with pfiles, and cross-checking the results:
$ ps -ef | grep diameterControlAgent
acs_oper 160 1 0 Oct 20 ? 251:34 diameterControlAgent
$ pfiles 160 | grep sock
sockname: AF_UNIX /tmp/dcaIf-0.0.112.20101020123758
sockname: AF_INET 0.0.0.0 port: 3868
sockname: AF_INET 172.21.153.142 port: 3868
sockname: AF_INET 172.21.153.142 port: 3868
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 172.21.153.82 netmask ffffffc0 broadcast 172.21.153.127
groupname mgmt
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500
index 2
inet 172.21.153.80 netmask ffffffc0 broadcast 172.21.153.127
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 172.21.153.142 netmask ffffffe0 broadcast 172.21.153.159
groupname chrg
e1000g1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500
index 3
inet 172.21.153.140 netmask ffffffe0 broadcast 172.21.153.159
e1000g2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 172.21.5.100 netmask ffffff00 broadcast 172.21.5.255
groupname sig
e1000g2:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500
index 4
inet 172.21.5.104 netmask ffffff00 broadcast 172.21.5.255
e1000g3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
inet 172.21.205.27 netmask ffffff00 broadcast 172.21.205.255
nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 6
inet 172.21.153.81 netmask ffffffc0 broadcast 172.21.153.127
groupname mgmt
nxge1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 7
inet 172.21.153.141 netmask ffffffe0 broadcast 172.21.153.159
groupname chrg
nxge2: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 8
inet 172.21.5.101 netmask ffffff00 broadcast 172.21.5.255
groupname sig
Information level of detail
To collect all information on the example interface, we can use the -d argument along with -o to get an output snoop file for our interpreter to use:
$ snoop -d e1000g1 -o diameterControlAgent.snoop
However, to target the snoop even more, we can also restrict the capture to port 3868 by appending a filter expression:
$ snoop -d e1000g1 -o diameterControlAgent.snoop tcp port 3868
Note: The tcp keyword is appropriate here because Diameter is a TCP-based protocol.
To run a snoop for an extended period of time, it can be called with nohup or suffixed with & to have it run in the background. In this situation it is recommended to also use the -q argument, which suppresses the packet count.
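For example, a long-running background capture might be started like this (a sketch; the interface, port, and output path are the examples used above):

```shell
# Start snoop in the background, immune to hangups, with the packet
# count suppressed (-q). The trailing filter expression limits the
# capture to Diameter traffic on port 3868.
SNOOP_ARGS="-q -d e1000g1 -o /var/tmp/dca.snoop"
nohup snoop $SNOOP_ARGS tcp port 3868 >/dev/null 2>&1 &
echo "snoop running as PID $!"
# Stop the capture later with: kill <PID>
```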
Snoop interpreter
Once a snoop has been collected, an interpreter can be used to view the packets in a graphical interface.
Wireshark is one such widely used protocol analyzer, and contains plugins for decoding many telephony protocols, including:
- INAP
- CAMEL
- MAP
- Diameter
Wireshark contains many useful features, which are outside of the scope of this document. In general, it will work quite well out of the box, automatically recognizing and decoding protocols without need for special configuration. For more information, see the Wireshark website www.wireshark.org.
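If a graphical session is not available, Wireshark's command-line companion tshark (assumed to be installed alongside Wireshark) can read snoop capture files directly; the -Y option applies a display filter:

```shell
# Print only the Diameter packets from the capture taken earlier.
if command -v tshark >/dev/null 2>&1; then
    tshark -r diameterControlAgent.snoop -Y diameter
fi
```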
Process failure
You can check whether a process is restarting using the SMS Alarms subsystem.
Processes raise alarms when they are stopped or started. The alarms include:
- Their name
- The time the alarm was logged
- Some other information about why the event may have occurred
Further information about the specific alarm can be found in the application's alarms guide.
Alarms can be accessed from the:
- Syslog on the local machine and the SMS(s). For more information, see SMS Technical Guide.
- Alarms tab in the SMS Alarms Management screen. For more information, see SMS User's Guide.
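For example, to look for recent start/stop alarms for a process in the local syslog (a sketch: /var/adm/messages is the usual Solaris syslog location, but the actual file depends on syslog.conf, and slee_acs is an example process name):

```shell
# Show the most recent syslog entries mentioning the process.
grep slee_acs /var/adm/messages 2>/dev/null | tail -20
```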
Checking installed packages
To check the details of an installed package, use the pkginfo command. For example:
$ pkginfo -l smsSms
   PKGINST:  smsSms
      NAME:  Oracle smsSms
  CATEGORY:  application
      ARCH:  sun4u
   VERSION:  3.1.0
    VENDOR:  Oracle
    PSTAMP:  smsNode20041020104925
  INSTDATE:  Oct 20 2004 13:15
     EMAIL:  support@oracle.com
    STATUS:  completely installed
     FILES:      348 installed pathnames
                  39 directories
                  89 executables
              152448 blocks used (approx)
For more information about the pkginfo utility, see the system documentation.
Checking access to Oracle database
A number of services and functions rely on access to the Oracle database. To check that the Oracle database is available to a service, check the following:
- Use sqlplus to check that you can log in to the Oracle database with the username and password the service is using to connect. Example command:
sqlplus smf/smf
- Where the tables required for a service are known, use SQL queries to check that:
- The tables exist
- The tables have appropriate content
For more information about SQL queries, see the Oracle documentation.
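The two checks above can be combined into a non-interactive sketch; smf/smf is the example login from this section, and the table name CCS_ACCT is a hypothetical placeholder for whichever tables your service uses:

```shell
# Verify the login works, the table exists, and it has content.
# (CCS_ACCT is a hypothetical table name for illustration.)
if command -v sqlplus >/dev/null 2>&1; then
    sqlplus -S smf/smf <<'EOF'
select table_name from user_tables where table_name = 'CCS_ACCT';
select count(*) from CCS_ACCT;
exit
EOF
else
    echo "sqlplus not found in PATH" >&2
fi
```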
Checking network connectivity
Network connectivity will affect any process which requires communication between two different network addresses.
Network connectivity should support SSH sessions between the two machines experiencing the problem.
If you can open an SSH session between the two machines, check the following before contacting Level 1 support with details:
- If the address of either of the machines specified in the Node Management screens is a hostname, check that the hostnames used in the SSH sessions are the hostnames specified in the Node Management screen.
If you cannot SSH, check the following before contacting Level 1 support with details:
- Check that the hostname is resolving correctly in the DNS.
- Check that the physical network connection is working correctly.
- Check that inetd and sshd are running.
- Check that sshd is listening on the expected port.
- Check that the smf_oper and acs_oper accounts are not locked, and that the username and password combinations being used are correct.
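The checklist above can be walked through from the command line; in this sketch the hostname sms1 is an example, and the smf_oper account comes from the text:

```shell
# Triage connectivity toward a peer: name resolution first, then a
# verbose, non-interactive SSH attempt to see exactly where it fails.
check_peer() {
    host=$1
    getent hosts "$host" || echo "name lookup failed for $host"
    ssh -o BatchMode=yes -o ConnectTimeout=5 -v "smf_oper@$host" exit \
        || echo "SSH to $host failed: check sshd, its port, and account locks"
}
check_peer sms1
```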
Replication
Replication may be failing for the following reasons:
- SSH keys have not been correctly set up between origin and destination machines.
- The destination node has been incorrectly set up in the Node Management screens of the SMS Java screens.
- Oracle is not running correctly.
- A new replication.cfg file has not been created after a change.
- replication.cfg may not be successfully copying to the destination machine (an error should display when the Create Config File button on the Node Management screens is clicked).
- The partition on the destination machine where the data is being replicated to may be full.
- The updateLoader on the destination machine may be running incorrectly.
- The destination database may be substantially out of sync with the SMF. Run a resync.
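Several of the causes above can be spot-checked from the shell on the destination machine; in this sketch the /IN mount point and the 90% threshold are assumptions:

```shell
# Is updateLoader running? ([u] stops grep from matching itself.)
ps -ef | grep '[u]pdateLoader' || echo "updateLoader does not appear to be running"

# Is the destination partition close to full?
df -k /IN | awk 'NR>1 && $5+0 >= 90 {print "WARNING:", $6, $5, "full"}'
```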