Troubleshooting ECE

Learn how to troubleshoot Oracle Communications Billing and Revenue Management Elastic Charging Engine (ECE).

ECE Troubleshooting Checklist

Check the following to determine the state of the ECE system:

Verify that nodes are running by using a JMX editor, such as JConsole.
Verify the state-machine state of each node.

Before it processes requests, ECE passes through several states, such as startup, configuration, update, and processing. A node in any state other than USAGE_PROCESSING does not process usage requests.
Verify the charging-server health threshold.

The number of running nodes may have gone below the threshold. See "Configuring the Charging-Server Health Threshold" for information.
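If you record each node's state while working through this checklist, a short script can flag the nodes that will not process usage requests. The following Python sketch assumes you have copied each node's state from JConsole into a dictionary. The node names, the STARTING state, the threshold value, and the helper functions are hypothetical, and counting only USAGE_PROCESSING nodes toward the threshold is a simplifying assumption, not ECE's exact rule.

```python
# Minimal sketch: flag nodes that are not ready to process usage requests.
# Node names, the "STARTING" state, and the threshold are hypothetical examples.
USAGE_PROCESSING = "USAGE_PROCESSING"


def nodes_not_processing(states: dict[str, str]) -> list[str]:
    """Return the names of nodes whose state is not USAGE_PROCESSING."""
    return sorted(name for name, state in states.items() if state != USAGE_PROCESSING)


def below_health_threshold(states: dict[str, str], threshold: int) -> bool:
    """Return True if fewer than `threshold` nodes are in USAGE_PROCESSING.

    Treating only USAGE_PROCESSING nodes as running is an assumption of this sketch.
    """
    processing = sum(1 for state in states.values() if state == USAGE_PROCESSING)
    return processing < threshold


if __name__ == "__main__":
    states = {"ecs1": USAGE_PROCESSING, "ecs2": USAGE_PROCESSING, "ecs3": "STARTING"}
    print(nodes_not_processing(states))       # ['ecs3']
    print(below_health_threshold(states, 3))  # True
```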

Collecting Diagnostic Information

To collect diagnostic information, turn on ECC feedback mode, which produces extra output when you run commands. For example:

ecc:000>set feedback true

The feedback mode setting is saved in your local profile so you do not need to set it every time you start ECC.

Collecting Log Files for Sending to Oracle Technical Support

Use the infoCollector ECC command to collect log files from your ECE system. The command copies log files from all server machines and the driver machine and stores them in a TAR file, which you can send to Oracle technical support.

The infoCollector command does not collect files from clients such as Offline Mediation Controller.

To collect log files:

(Optional) Create a directory to hold the collected log files.

If you do not specify a directory, the command puts the collected log files in ECE_home.
Log on to the driver machine.
Go to the ECE_home/bin directory.
Run the following command to start ECC:
```
./ecc
```
At the ECC prompt, run the infoCollector command:
```
infoCollector
```
See "infoCollector Syntax" for information about syntax and arguments for running the command.

If you run the infoCollector command with no arguments, the following occurs:
- From each server machine in your topology, the VERSION file and the logs, config, brm_config, and odi_transformation directories are copied to the following locations on the driver machine (where user_home is the user home directory of the driver machine and server_host is the IP address or host name of the server machine as defined in the ECE_home/config/eceTopology.conf file):
  - user_home/server_host/VERSION
  - user_home/server_host/logs
  - user_home/server_host/config
  - user_home/server_host/brm_config
  - user_home/server_host/odi_transformation
- From the driver machine, the VERSION file and the config and brm_config directories are copied to the following locations on the driver machine (where user_home is the user home directory of the driver machine and driver_host is the IP address or host name of the driver machine as defined in the ECE_home/config/eceTopology.conf file):
  - user_home/driver_host/VERSION
  - user_home/driver_host/config
  - user_home/driver_host/brm_config
- A compressed TAR file is created with the extension tar.gz in the user home directory of the driver machine (for example, user_home/info_collector.tar.gz).
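To confirm that a default run collected everything, you can compare the driver machine's contents against the list above. This Python sketch builds the expected paths from host names (as defined in eceTopology.conf) and reports any that are missing. The host names and helper functions are hypothetical.

```python
import os
from pathlib import PurePosixPath

# Items copied by a default infoCollector run, per host role (from the list above).
SERVER_ITEMS = ["VERSION", "logs", "config", "brm_config", "odi_transformation"]
DRIVER_ITEMS = ["VERSION", "config", "brm_config"]


def expected_paths(user_home: str, server_hosts: list[str], driver_host: str) -> list[str]:
    """Build the paths a default run should create on the driver machine."""
    home = PurePosixPath(user_home)
    paths = [str(home / host / item) for host in server_hosts for item in SERVER_ITEMS]
    paths += [str(home / driver_host / item) for item in DRIVER_ITEMS]
    return paths


def missing_paths(paths: list[str]) -> list[str]:
    """Return the expected paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Hypothetical host names; use the names from eceTopology.conf.
    paths = expected_paths(os.path.expanduser("~"), ["server1", "server2"], "driver1")
    for path in missing_paths(paths):
        print("missing:", path)
```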

For ECE processes started by running the ecc command, the GC debug logs are enabled by default. The GC debug log files are stored in the ECE logs directory (ECE_home/logs).

If you suspect a problem with these processes, look for errors in the ECE_home/logs/Instance_Name_GC.log files, where Instance_Name is the name of the ECE process-node instance (the name of the process you defined in the ECE topology file) that you need to troubleshoot; for example, emGateway1_GC.log. By default, four GC log files are available in the directory. You can change the number of GC log files by setting the numberOfGCLogFiles entry in the ECE_home/config/ece.properties file.

You can also disable the generation of GC log files by setting the enableGCLogs entry in the ECE_home/config/ece.properties file.
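The following Python sketch locates the GC log for a given instance and reads the two GC-related entries from ece.properties. It assumes the entries appear as plain key=value lines and that GC logs follow the Instance_Name_GC.log naming described above; the helper names and the /opt/ece path are hypothetical.

```python
from pathlib import PurePosixPath


def gc_log_path(ece_home: str, instance_name: str) -> str:
    """Return ECE_home/logs/Instance_Name_GC.log for a node instance."""
    return str(PurePosixPath(ece_home) / "logs" / f"{instance_name}_GC.log")


def read_gc_settings(properties_text: str) -> dict[str, str]:
    """Extract numberOfGCLogFiles and enableGCLogs from ece.properties text.

    Assumes plain key=value lines; lines starting with # or ! are comments.
    """
    wanted = {"numberOfGCLogFiles", "enableGCLogs"}
    settings: dict[str, str] = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in wanted:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Hypothetical ECE_home location.
    print(gc_log_path("/opt/ece", "emGateway1"))  # /opt/ece/logs/emGateway1_GC.log
```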

infoCollector Syntax

The infoCollector command syntax is:

infoCollector [-v] [-nc] [-l] [-gc] [-d dir] [-t ["SUBSCRIPTION|SUBSCRIPTION.SESSION IDENTIFIER", "..."]] [-td dir] [-s] [-e "file_filter", "file_filter2", "..."]

where:

-v outputs information pertaining to the collected files.
-nc does not compress the resulting directory into a TAR file.
-l includes all log files in the collection. If -l is not used, the command collects only the log files named after a node, with the .log suffix (for example, ecs1.log).
-gc collects all the GC log files.
-d dir specifies the directory to hold the data.
-t ["SUBSCRIPTION|SUBSCRIPTION.SESSION IDENTIFIER", "..."] adds all the trace files matching the specified subscription or session identifiers to the collection.
-td dir collects all the trace files from the provided directory or location.
-s includes all files from the ECE_home/sample_data directory.
-e "file_filter" specifies the path name, or a path-name pattern, of custom directories or files that you want to collect and include in the compressed TAR file.

Separate multiple filters with a comma.

For example, assume in your ECE_home directory you have the files notes.txt, comments_1.txt, comments_2.txt, and comments_3.txt, and you also have a directory named observations that contains the files observation1.txt, observation2.txt, and observation3.txt.

The path name pattern can be an explicit file name, such as notes.txt; an entire directory tree, such as observations; or a pattern containing the * wildcard, as shown for the files that begin with comments in the following example:
```
infoCollector -e "/home/example/ECE_11.2.0.2/notes.txt", "/home/example/ECE_11.2.0.2/observations", "/home/example/ECE_11.2.0.2/comments*"
```
The preceding infoCollector command puts all custom files and directories into a directory named extra_files, and the resulting collection has the following directory structure:
```
infoCollector_2014-05-07T11:02:10
  localhost-DRIVER
    extra_files
      observations
        observation3.txt
        observation2.txt
        observation1.txt
      notes.txt
      comments_3.txt
      comments_2.txt
      comments_1.txt
    config
    brm_config
    VERSION
```
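Before running a command like the one above, you might preview which entries a wildcard filter such as comments* selects, and format the -e argument list with the quoting and comma separation shown in the syntax. This Python sketch uses fnmatch's shell-style matching, which is an assumption: infoCollector's own matching may differ in edge cases. The command helper only builds a string for you to paste at the ECC prompt; it does not run ECC, covers only the -l, -gc, and -e options, and all function names are hypothetical.

```python
import fnmatch
import os


def matching_names(names: list[str], pattern: str) -> list[str]:
    """Return the names that match a shell-style pattern such as comments*."""
    return sorted(name for name in names if fnmatch.fnmatchcase(name, pattern))


def preview_filter(directory: str, pattern: str) -> list[str]:
    """List the entries in `directory` that `pattern` would select."""
    return matching_names(os.listdir(directory), pattern)


def info_collector_command(file_filters: list[str], all_logs: bool = False,
                           gc_logs: bool = False) -> str:
    """Format (but do not run) an infoCollector command with -l, -gc, and -e.

    Each filter is wrapped in double quotes and filters are comma-separated,
    following the documented syntax.
    """
    parts = ["infoCollector"]
    if all_logs:
        parts.append("-l")
    if gc_logs:
        parts.append("-gc")
    if file_filters:
        if any('"' in f for f in file_filters):
            raise ValueError("filters must not contain double quotes")
        parts.append("-e " + ", ".join(f'"{f}"' for f in file_filters))
    return " ".join(parts)


if __name__ == "__main__":
    names = ["notes.txt", "comments_1.txt", "comments_2.txt", "comments_3.txt", "observations"]
    print(matching_names(names, "comments*"))
    # ['comments_1.txt', 'comments_2.txt', 'comments_3.txt']
    print(info_collector_command(["/home/example/ECE_11.2.0.2/comments*"]))
    # infoCollector -e "/home/example/ECE_11.2.0.2/comments*"
```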

Troubleshooting Performance Issues by Using Coherence JMX Metrics

ECE provides Coherence metrics that can help you troubleshoot performance problems, tune performance, and isolate hardware issues.

To troubleshoot performance issues by using Coherence JMX metrics:

Access the ECE configuration MBeans in a JMX editor, such as JConsole. See "Accessing ECE Configuration MBeans".
Expand the Coherence node.
Expand Service.
View the Coherence JMX metrics that apply to your troubleshooting scenario:
- Expand InvocationService, select the appropriate node, and expand Attributes.
  
  The TaskCount attribute specifies the number of request batches received by the node.
  
  The TaskAverageDuration attribute specifies the average request batch latency for the node.
- Expand BRMDistributedCache, select the appropriate node, and expand Attributes.
  
  The RequestTotalCount attribute specifies the following, depending on the request type:
  - For Initiate, Update, Terminate, and Cancel requests, it specifies the number of entry processor invocations.
  - For Auth Query requests, it specifies the number of get() operations.
  
  The RequestAverageDuration attribute specifies the following, depending on the request type:
  - For Initiate, Update, and Terminate requests, it specifies the entry processor latency.
  - For Auth Query requests, it specifies the get() latency.

To reset the attribute values, expand Operations, select the resetStatistics operation, and then click the resetStatistics button.
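Because these attributes are reported per node, comparing TaskAverageDuration across nodes is one way to spot a node whose hardware may be slowing it down. The following Python sketch flags nodes whose average exceeds a multiple of the median across nodes. The input values, node names, and the factor of 2.0 are illustrative assumptions, not ECE recommendations.

```python
from statistics import median


def slow_nodes(task_avg_duration: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return nodes whose TaskAverageDuration exceeds `factor` times the median.

    `task_avg_duration` maps each node name to the TaskAverageDuration value read
    from its InvocationService MBean. The default factor is an arbitrary example.
    """
    if not task_avg_duration:
        return []
    typical = median(task_avg_duration.values())
    return sorted(node for node, value in task_avg_duration.items() if value > factor * typical)


if __name__ == "__main__":
    # Hypothetical readings for three charging-server nodes.
    print(slow_nodes({"ecs1": 2.0, "ecs2": 2.5, "ecs3": 9.0}))  # ['ecs3']
```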

Troubleshooting Failed Usage Requests

ECE may occasionally fail to process usage requests. For example, a usage request could fail because the customer does not own a relevant charge offer. To help you troubleshoot the reason for a failure, ECE, HTTP Gateway, and Diameter Gateway use log4j to collect information about failed usage requests, such as:

Reason for the failure
Customer identifier
Session identifier

For example, log4j could log the following information about a failed usage request in Diameter Gateway:

ERROR -  -  -  - Failing Usage Request for subscriber ID : <<PUID>>, session ID :<<Session ID>>, 
reasons : [NO_QUALIFIED_CHARGE_OFFERS, ZERO_RUM_QUANTITY]

You can use these log files to determine the reason for a failure, so you can fix any issues before reprocessing the usage request.

Note:

Log4j logs usage request errors for ECE, HTTP Gateway, and Diameter Gateway even when debug mode is not enabled.
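To summarize failures across a large log, you can extract the fields from each failed-request message. This Python sketch matches the message format in the sample above, assuming each message is written on a single line in the log file. The regular expression, helper names, and sample values are hypothetical and may need adjusting for your log4j pattern layout.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Matches the "Failing Usage Request" message shown above. Spacing around the
# colons varies in the sample, so the pattern tolerates optional whitespace.
FAILED_REQUEST = re.compile(
    r"Failing Usage Request for subscriber ID\s*:\s*(?P<subscriber>[^,]+),"
    r"\s*session ID\s*:\s*(?P<session>.*?),\s*reasons\s*:\s*\[(?P<reasons>[^\]]*)\]"
)


class FailedRequest(NamedTuple):
    subscriber_id: str
    session_id: str
    reasons: list[str]


def parse_failed_request(line: str) -> Optional[FailedRequest]:
    """Return the fields of a failed-usage-request line, or None if it does not match."""
    match = FAILED_REQUEST.search(line)
    if match is None:
        return None
    reasons = [r.strip() for r in match.group("reasons").split(",") if r.strip()]
    return FailedRequest(match.group("subscriber").strip(), match.group("session").strip(), reasons)


def count_reasons(lines) -> Counter:
    """Count how often each failure reason appears across log lines."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_failed_request(line)
        if parsed is not None:
            counts.update(parsed.reasons)
    return counts


if __name__ == "__main__":
    # Hypothetical subscriber and session values.
    sample = ("ERROR -  -  -  - Failing Usage Request for subscriber ID : 6501110001, "
              "session ID :sess-42, reasons : [NO_QUALIFIED_CHARGE_OFFERS, ZERO_RUM_QUANTITY]")
    print(parse_failed_request(sample))
```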

Troubleshooting Problems with Rating

ECE returns a failure message with a new reason code, NO_RATING_GRAPH_CONFIGURED, if the graph for the RUM configured for rating cannot be found in the specified path. However, in a multiple-RUM scenario, if the graph cannot be found for any of the RUMs configured for rating, ECE uses the NO_RATED_QUANTITY reason code to indicate the rating failure.

By default, ECE treats a missing graph as an error and returns a failure message with the NO_RATING_GRAPH_CONFIGURED reason code. However, you can configure ECE not to treat this as an error by setting the treatNoRatingGraphAsError attribute in the ECE_home/config/management/charging-settings.xml file to false. When the attribute is set to false, ECE reports the NO_RATING_GRAPH_CONFIGURED reason code in its response and continues rating.

To skip a missing graph and continue rating:

Access the ECE configuration MBeans in a JMX editor, such as JConsole. See "Accessing ECE Configuration MBeans".
Expand the ECE Configuration node.
Expand charging.server.
Expand Attributes.
Set the treatNoRatingGraphAsError attribute to false.
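To check which value the charging-settings.xml file on disk currently holds, you can search it for the setting. The following Python sketch does not assume a particular XML layout: it reports the setting whether it appears as an XML attribute or as an element. It only reads the file content and does not change the running configuration; the helper name and the sample XML fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_setting(xml_text: str, name: str = "treatNoRatingGraphAsError") -> list[str]:
    """Return every value of `name` found in the XML, as attribute or element.

    Namespace prefixes on element tags are ignored, so the search does not
    depend on the layout of charging-settings.xml.
    """
    root = ET.fromstring(xml_text)
    values = []
    for element in root.iter():
        if name in element.attrib:
            values.append(element.attrib[name])
        local_tag = element.tag.rsplit("}", 1)[-1]
        if local_tag == name and element.text is not None:
            values.append(element.text.strip())
    return values


if __name__ == "__main__":
    # Hypothetical fragment; pass the real file's text, e.g. open(path).read().
    sample = '<config><chargingServer treatNoRatingGraphAsError="true"/></config>'
    print(find_setting(sample))  # ['true']
```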

Troubleshooting Problems with Rerating

Rerating errors are handled as follows for each rerating stage:

In the prepare to rerate stage

Errors are logged in the CustomerUpdater.log file. No acknowledgement is sent, so the acknowledgement queue is empty.
During rerating

Errors are logged in the emGateway.log file.
In the rerate complete stage

Errors are logged in the CustomerUpdater.log file. ECE sends a notification to the BRM server using BRM Gateway to create a new rerate job.

Diameter Gateway Error Codes

This appendix describes Oracle Communications Billing and Revenue Management Elastic Charging Engine (ECE) Diameter Gateway error codes.

For more information, see the Result-Code AVP section on the IETF website: https://tools.ietf.org/html/rfc3588#section-7.1.
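Diameter result codes are grouped into classes by their first digit, as defined in Section 7.1 of RFC 3588. The following Python sketch maps a Result-Code, such as 5005 (DIAMETER_MISSING_AVP), to its class; the function name is hypothetical.

```python
# Result-Code classes from RFC 3588, Section 7.1 (keyed by the first digit).
RESULT_CODE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Protocol Errors",
    4: "Transient Failures",
    5: "Permanent Failures",
}


def result_code_class(code: int) -> str:
    """Return the RFC 3588 class name for a four-digit Diameter Result-Code."""
    if not 1000 <= code <= 5999:
        raise ValueError(f"not a Diameter Result-Code: {code}")
    return RESULT_CODE_CLASSES[code // 1000]


if __name__ == "__main__":
    print(result_code_class(5005))  # Permanent Failures
```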

Troubleshooting a Corrupted ECE Configuration File

When you configure ECE charging, subscriber preferences, and so on, ECE stores the information in the ECE_home/config/management/charging-settings.xml file. This file is loaded into cache when ECE is started, and run-time changes can be made using a JMX editor, such as JConsole.

If the charging-settings.xml file is accidentally deleted or becomes corrupted while the ECE system is running, you can rebuild the file from the data in the cache. The rebuilt file contains the original configuration plus any run-time configuration changes.

To rebuild the charging-settings.xml file from the data available in ECE cache:

Access the ECE configuration MBeans in a JMX editor, such as JConsole. See "Accessing ECE Configuration MBeans".
Expand the ECE Configuration node.
Expand systemAdmin.
Expand Operations.
Click rebuildChargingSettingsFile.

The operation rebuilds the charging-settings.xml file and saves it to the path ECE_home/config/management/.
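Before rebuilding the file, you can check whether it is missing or no longer parses as XML. This Python sketch treats a file that is not well-formed XML as corrupted, which is a simplifying assumption: a well-formed file can still contain incorrect values. The helper names and the /opt/ece path are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def is_well_formed(xml_data) -> bool:
    """Return True if xml_data (str or bytes) parses as XML."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError:
        return False
    return True


def charging_settings_status(ece_home: str) -> str:
    """Return "missing", "corrupted", or "ok" for charging-settings.xml.

    "corrupted" only means the file is not well-formed XML.
    """
    path = os.path.join(ece_home, "config", "management", "charging-settings.xml")
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        return "ok" if is_well_formed(f.read()) else "corrupted"


if __name__ == "__main__":
    # Hypothetical ECE_home location.
    print(charging_settings_status("/opt/ece"))
```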

Troubleshooting JVM and Coherence

When troubleshooting ECE, you may need to troubleshoot JVM and Coherence. For troubleshooting JVM issues, note the following:

See the Java Help Center website:

https://www.java.com/en/download/help/index.html
Java VM arguments and properties are defined in the ECE_home/config/ece.properties file.
Java tuning profiles for nodes are defined in the ECE_home/config/defaultTuningProfile.properties file.
When ECE is running, you can use JConsole to access JMX information provided by the JVM. See "Using JConsole" in Java Platform, Standard Edition Monitoring and Management Guide for more information.

For troubleshooting Coherence issues, note the following:

You can access runtime information about Coherence through JMX, specifically through the Coherence MBeans (which are used to manage and monitor different parts of Coherence).

See "Oracle Coherence MBeans Reference" in Oracle Fusion Middleware Managing Oracle Coherence for more information.
See the Oracle Coherence Knowledge Base website:

https://coherence.java.net/

Troubleshooting Failed Diameter-Message Processing in Diameter Gateway

If you suspect a problem with how Diameter Gateway nodes are processing Diameter messages, look in the ECE_home/logs/instance_name.log files for errors, where instance_name is the name of the Diameter Gateway-node instance (a name you defined in the ECE topology file) that you need to troubleshoot. For example, look in diameterGateway1.log.

To set log levels for Diameter Gateway nodes, obtain the Diameter Gateway module names in the ECE_home/config/log4j2.xml file, and then set the log levels by module as described in "Reading Log Files".

Diameter Gateway returns all Diameter result codes (Result-Code AVPs) as part of the Credit Control Answer (CCA) message. When an error occurs, the error ID and name are returned in the Result-Code AVP. For example, if the Credit Control Request (CCR) is missing an Event-Timestamp AVP, the CCA contains the following error:

DiameterTalk Answers =[
Diameter Message: CCA
Version: 1
Msg Length: 144
Cmd Flags: PXY
Cmd Code: 272
App-Id: 4
Hop-By-Hop-Id: 1497412149
End-To-End-Id: 734750287
  Session-Id (263,M,l=11) = 111
  Result-Code (268,M,l=12) = DIAMETER_MISSING_AVP (5005)
  Origin-Host (264,M,l=24) = dgw1.example.com
  Origin-Realm (296,M,l=19) = example.com
  Auth-Application-Id (258,M,l=12) = 4
  CC-Request-Type (416,M,l=12) = INITIAL_REQUEST (1)
  CC-Request-Number (415,M,l=12) = 0
  Failed-AVP (279,M,l=20) = 
    Event-Timestamp (55,M,l=12) = 3627391363 (Fri Dec 12 08:42:43 PST 2014)

For this error, Diameter Gateway writes an error message to the Instance_Name.log file (for example, diameterGateway1.log) that describes the nature of the error and includes its stack trace.

For information about Diameter Gateway result codes, see "Diameter Gateway Error Codes".

Diameter Gateway nodes must be started after the customer data is loaded into the ECE grid; otherwise, they cannot process Diameter requests.
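If you save CCA dumps like the one above to a text file, a small parser can pull out the Result-Code and any AVPs listed under Failed-AVP. This Python sketch assumes the layout of the sample output, including the deeper indentation of AVPs nested under Failed-AVP; the helper names and sample text are hypothetical.

```python
import re
from typing import Optional

# Matches lines such as: Result-Code (268,M,l=12) = DIAMETER_MISSING_AVP (5005)
RESULT_CODE = re.compile(r"Result-Code \(268,[^)]*\) = (?P<name>\w+) \((?P<code>\d+)\)")


def parse_result_code(dump: str) -> Optional[tuple[str, int]]:
    """Return (name, code) from the Result-Code AVP line, or None."""
    match = RESULT_CODE.search(dump)
    if match is None:
        return None
    return match.group("name"), int(match.group("code"))


def failed_avps(dump: str) -> list[str]:
    """Return the names of AVPs indented under a Failed-AVP line."""
    lines = dump.splitlines()
    names = []
    for i, line in enumerate(lines):
        if not line.strip().startswith("Failed-AVP"):
            continue
        indent = len(line) - len(line.lstrip())
        for child in lines[i + 1:]:
            # Stop at a blank line or at the next AVP that is not nested deeper.
            if not child.strip() or len(child) - len(child.lstrip()) <= indent:
                break
            names.append(child.split()[0])
    return names


if __name__ == "__main__":
    sample = (
        "  Result-Code (268,M,l=12) = DIAMETER_MISSING_AVP (5005)\n"
        "  Failed-AVP (279,M,l=20) = \n"
        "    Event-Timestamp (55,M,l=12) = 3627391363\n"
    )
    print(parse_result_code(sample))  # ('DIAMETER_MISSING_AVP', 5005)
    print(failed_avps(sample))        # ['Event-Timestamp']
```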

Troubleshooting Failed RADIUS-Message Processing in RADIUS Gateway

If you suspect a problem with how RADIUS Gateway nodes are processing RADIUS messages, see the following files for errors that you need to troubleshoot:

ECE_home/logs/Instance_Name.log files, where Instance_Name is the name of the RADIUS Gateway-node instance (a name you defined in the ECE topology file); for example, ECE_home/logs/radiusGateway1.log.
Charging-server node log files; for example, ECE_home/logs/ecs1.log.

To set log levels for RADIUS Gateway nodes, obtain the RADIUS Gateway module names in the ECE_home/config/log4j2.xml file, and then set the log levels by module as described in "Reading Log Files".

RADIUS Gateway returns all results in the Reply-Message attribute-value pair (AVP) of the RADIUS response. For example, if the user password in an authentication request is incorrect, the following error message is returned in the RADIUS response:

Session_Timeout AVP after Deletion : null
2016-03-07 23:37:58.896 PST DEBUG -  -  -  - ECE Radius server - Sending the response to client
 Code: Access-Reject(3)
 Identifier: 0
 Length: 20
 Authenticator: 0x00000000000000000000000000000000
 Reply-Message: RadiusGatewayMessagesBundle-31015: Incorrect password from User
 User-Name: 0049100033
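To scan many RADIUS responses for rejections, you can extract the response code and the Reply-Message text. The following Python sketch assumes the layout of the sample response above; the helper name and the returned dictionary keys are hypothetical.

```python
import re

# Patterns based on the sample RADIUS response above.
CODE = re.compile(r"^\s*Code:\s*(?P<name>[\w-]+)\((?P<number>\d+)\)", re.MULTILINE)
REPLY_MESSAGE = re.compile(r"Reply-Message:\s*(?P<id>\w+-\d+):\s*(?P<text>.*)")


def parse_radius_response(dump: str) -> dict:
    """Return the response code and Reply-Message parts; missing parts are None."""
    result = {"code": None, "code_number": None, "message_id": None, "text": None}
    code = CODE.search(dump)
    if code:
        result["code"] = code.group("name")
        result["code_number"] = int(code.group("number"))
    reply = REPLY_MESSAGE.search(dump)
    if reply:
        result["message_id"] = reply.group("id")
        result["text"] = reply.group("text").strip()
    return result


if __name__ == "__main__":
    sample = (
        " Code: Access-Reject(3)\n"
        " Reply-Message: RadiusGatewayMessagesBundle-31015: Incorrect password from User\n"
    )
    print(parse_radius_response(sample))
```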