36 Troubleshooting Performance

Learn how to troubleshoot performance problems in Oracle Communications Billing and Revenue Management (BRM).

Topics in this document:

Troubleshooting Poor Performance

When troubleshooting poor performance, first consider the following:

  • Under-configured hardware.

  • Inefficient table layout.

  • Database bottlenecks.

  • Inefficient custom application code.

  • Repeated runtime errors resulting from configuration problems.

In addition, you can look for different problems depending on whether CPU utilization is high or low.

Low Performance with High CPU Utilization

If performance is low and CPU utilization is high, or if there are performance spikes, there is probably a configuration or indexing issue. Check the following:

  • Hardware limitations.

  • Table/volume layout.

  • A spin count that is too high.

  • Lack of proper indexes. This can show up as very high CPU utilization with no other apparent problems except for a high number of processes. Find which columns are being accessed in the operation being performed and ensure that they are properly indexed.

  • Not enough database buffers.

  • Swapping.

  • Kernel parameters that are set too low.
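One of the checks above is for missing indexes. As a quick sketch, you can list the indexed columns of a heavily accessed table from SQL*Plus and compare them with the columns used in the slow operation (EVENT_T is only an example table name; substitute the table involved in your workload):

```
SELECT index_name, column_name, column_position
FROM   user_ind_columns
WHERE  table_name = 'EVENT_T'
ORDER  BY index_name, column_position;
```

If a column that appears in the operation's WHERE clause is not listed, it may need an index.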

Low Performance with Low CPU Utilization

If performance is low and CPU utilization is low, check for a bottleneck between different system tiers (for example, between the DM and the database).

  • Use the database monitoring tools to analyze the performance of the database system.

  • Use SQL tracing and timing to check for inefficient application code.

  • Check for an under-configured BRM system, which could be one of the following:

    • CM Proxy with a low number of children.

    • DMs with a low number of back ends.

    • A system logging level that is too high.

Monitor DM and Oracle system utilization, and tune the number of DM back ends accordingly. A good starting point is eight DM back ends per processor.
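For example, the rule of thumb above suggests 32 back ends on a four-processor machine. The back-end count for the Oracle DM is set with the dm_n_be entry in its pin.conf file (the value below is illustrative only; tune it against your own utilization measurements):

```
# BRM_home/sys/dm_oracle/pin.conf (fragment)
# 4 processors x 8 = 32 back ends as a starting point
- dm dm_n_be 32
```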

For more information, see "Improving Data Manager and Queue Manager Performance".

Quick Troubleshooting Steps

  • Run quick timing tests by using the testnap utility with op_timing turned on to ping each CM and DM (with the PCM_OP_TEST_LOOPBACK opcode). If the operations are relatively slow, it indicates a problem in the basic configuration.
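    A loopback test in testnap looks roughly like the following (a sketch; the database number and flist contents depend on your system):

    ```
    % testnap
    r << XX 1
    0 PIN_FLD_POID POID [0] 0.0.0.1 /dummy 0 0
    XX
    xop PCM_OP_TEST_LOOPBACK 0 1
    ```

    With op_timing turned on, the elapsed time of each call is logged, so unusually slow loopback calls point to the basic CM or DM configuration rather than to application code.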

  • Run the system with a log level of DEBUG on the CM and DM and analyze log files.

  • Check network usage data for collisions.

  • Check if you have logging (debugging) turned on in the CM. Logging is good for troubleshooting, but it should not be turned on in a production environment because it reduces performance.

  • Verify that the performance parameters in the pin.conf files are large enough to handle your load. The most likely problems are in the DM entries.
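    The DM entries most often involved are the front-end and back-end counts and the shared-memory sizes. A fragment of BRM_home/sys/dm_oracle/pin.conf might look like the following (the values are illustrative only; size them for your load):

    ```
    - dm dm_n_fe 16            # number of DM front ends
    - dm dm_max_per_fe 16      # maximum connections per front end
    - dm dm_shmsize 33554432   # shared-memory segment size, in bytes
    - dm dm_bigsize 4194304    # portion reserved for large data operations
    ```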

  • Check if you have enough DM back ends to handle your transaction load.

  • Try putting tables and indexes on different disks.

  • Check the size of redo and rollback logs and database configuration parameters.

  • Send a few kill -USR1 commands to the DMs and CMs that seem to be having problems. This causes them to dump their state to the BRM error log files, which may contain information that indicates the nature of the problem. Take snapshots up to 20 minutes apart.
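    For example (dm_pid is a placeholder for the process ID found with ps):

    ```
    % ps -ef | grep dm_oracle
    % kill -USR1 dm_pid
    # ...wait, then take a second snapshot for comparison:
    % kill -USR1 dm_pid
    ```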

  • Turn on SQL tracing and analyze the query plans. Look for full table scans, and ensure that indexes are on the appropriate columns for the queries being run. Pay particular attention to any customizations.

  • Turn on the timed_statistics parameter. Look for unusually long execution times for SQL commands.
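    The two tracing steps above can be sketched in SQL*Plus as follows (session-level settings are shown; your DBA may prefer instance-level settings, and the trace-file and login names are placeholders):

    ```
    ALTER SESSION SET timed_statistics = TRUE;
    ALTER SESSION SET sql_trace = TRUE;
    -- run the workload, then format the resulting trace file:
    -- % tkprof ora_NNNN.trc ora_NNNN.prf explain=user/password
    ```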

  • Monitor hardware activity:

    • On Linux systems, use vmstat, netstat, and sar.

    • Drill down to the storage device level by using sar with the -d parameter. This should help you find the source of the problem.

      Note:

      If the file systems are configured from logical volumes that are built on physical disks, different file systems could share the same underlying disk. Map each file system to its underlying disks so that you can isolate potential contention (waiting on I/O).
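      For example, to sample activity every 5 seconds, 12 times (a sketch):

      ```
      % sar -u 5 12    # CPU utilization
      % sar -d 5 12    # per-device I/O activity
      ```

      Devices that show consistently high utilization or long average queues are candidates for the contention described in the note above.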

  • Problems such as intermittent daemon failures can be indicated by core files. Try the following command to locate them:

    % find BRM_home -name core -exec file {} \;
    

    If there are no core files, try turning on maximal debugging. You do not want to do this for very long, especially on a production system, because the log files fill up rapidly.

    % pin_ctl stop cm
    % setenv CMAP_DEBUG 0x1331f3
    % setenv CM_DEBUG 0x0001
    % setenv cm_loglevel 3
    % pin_ctl start cm

    System-level tracing can also be useful (truss is shown for Solaris; on Linux, use strace -p cm_pid):

    # ps -ef | grep cm
    # truss -p cm_pid