24 Improving BRM Performance

Learn how to improve performance in your Oracle Communications Billing and Revenue Management (BRM) system.

See also "Improving Connection Manager Performance" and "Improving Data Manager and Queue Manager Performance".

BRM Account and Rating Performance Considerations

Certain aspects of basic BRM functionality can affect performance; other aspects have no effect:

  • The number of accounts does not affect performance.

  • There is no performance difference when using different payment methods, such as invoice or credit card.

  • BRM client applications, such as Billing Care, have little impact on system performance.

  • Cycle events are rated faster than usage events.

  • Performance decreases when accounts own a large number of charge offers.

  • Performance may decrease when the database contains a large number of ratable usage metrics (RUMs). In addition to computing the RUM for the specific event, BRM computes the RUMs for all base types of that event. For example, if you configured a RUM for the /a/b/c/d event, BRM also computes RUMs configured for the /a/b/c and /a/b events. You can increase performance by removing unused RUMs from your database.

About Benchmarking

To determine the best possible performance for your system, identify the desired transaction capacity at each tier and ensure that the system can handle several times that capacity. You can determine the maximum capacity or transaction capacity threshold by running benchmark scenarios.

The primary goal is to achieve full utilization of the database system. This is best accomplished by measuring system performance as the following operations are carried out:

  • Increase the load to get maximum throughput.

  • Increase the number of DM back ends to get maximum RDBMS utilization for a given workload.

  • Increase database utilization for a given number of DM back ends.

  • Slowly reduce the load to keep the same performance but with faster response time.

  • Repeat the above steps over multiple iterations.

The general process for benchmarking is:

  1. Create a workload on the system. The best way to do this is to run your own workload program; the second-best choice is to use the ITM-C workloads.

  2. Measure the results. Most programs need a ramp-up period of 200 seconds to reach a steady-state condition during which actual measurements should take place. Running the program for 10 to 20 minutes should produce the same results as running the program for hours, with the exception of the amount of disk space used. If you run the system for a long time, indexes might become imbalanced and need to be rebuilt, especially before billing.

    Use monitoring tools on the systems where BRM is running to determine system load and identify performance issues. Be sure to turn monitoring tools off for your final test runs. When you start the system, turn on level 3 debugging in all BRM processes and ensure that there are no error messages while running the benchmark programs. When there are no more errors, turn off logging (see the logging sketch after this list).

  3. Monitor the hardware, operating system, and BRM utilization.
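
For example, a minimal sketch of the pin.conf logging entries used to switch between debug and normal logging between runs (the loglevel entry and values shown here assume the standard CM pin.conf format; confirm them against your own configuration files):

    # Debug logging (level 3) while validating the benchmark setup
    - cm loglevel 3

    # Error-only logging (level 1) for the final, measured runs
    - cm loglevel 1

The CM reads pin.conf at startup, so restart the CM after changing the entry.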

BRM Performance Diagnosis Checklist

When troubleshooting performance, use this checklist to help you look for problems.

You can also use this checklist to gather information that Support needs when diagnosing trouble tickets. If you submit a performance issue to technical support, you should also include the following:

  • All applicable error log files (for example, complete or partial log files for the CM, DM, and client applications).

  • Operating system settings such as maximum shared memory segment size and number of processes. Provide a full list (see the sketch at the end of this list for commands that capture these settings):

    • Solaris: /etc/system

    • Linux: /etc/sysctl.conf

    • AIX: lsattr -E -l sys0
      
  • Administrative tools for managing systems:

    • Solaris: Admintool

    • Linux: Webmin and Easilix

  • The pin.conf files for CMs, DMs, and clients.

  • For Oracle, the init.ora file.
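
To capture the operating system settings listed above, a sketch of commands you might run and attach to the ticket (parameter names vary by operating system and kernel version):

    % cat /etc/sysctl.conf                    # Linux: persistent kernel settings
    % sysctl kernel.shmmax kernel.shmall      # Linux: current shared memory limits
    % cat /etc/system                         # Solaris: kernel settings
    % ulimit -a                               # per-process limits for the BRM user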

Describe the Problem

  • What part of the system is experiencing the problem?

  • What operation or application is running (for example, billing or credit card processing)?

  • What is the actual and expected performance?

  • What appears to be the problem?

  • What are the error messages?

Describe the Configuration

Provide Support with information about the following configurations:

Hardware Configuration

For each system in the configuration:

  • What is the manufacturer, model, number, and types of CPUs, and amount of RAM?

  • What is the swap size?

For the database server system:

  • What is the RAID level?

  • How many disks, and what is their size?

  • How are logical volumes configured?

Operating System Configuration
  • What is the operating system version?

  • What are the operating system settings for maximum shared memory segment size, number of processes, and so forth?

  • Which patches have been applied?

BRM Configuration
  • Which release of BRM are you using?

  • Which systems do the following components run on? Which systems have multiple components, and which components run on multiple systems? How are the pin.conf files configured?

    • CMMP

    • CM Proxy

    • CM

    • DM

    • BRM client applications

    • Custom applications

    • Billing utilities

  • Which BRM operations are slow?

  • Which PCM_OPs are those slow operations associated with? (This can be found by using log level 3.)

  • What is the estimated number of accounts in the database?

  • What is the average number of charge offers per account?

  • What is the largest quantity of charge offers owned by one account?

  • What percentage of accounts use which payment method (for example, credit card or invoice)?

  • What is the estimated number of events in the database? (A sketch for estimating these counts follows this list.)
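
One way to estimate the account, charge offer, and event counts is to read the optimizer statistics for the corresponding tables (a sketch; the table names assume the default BRM schema, and num_rows is only as current as the last statistics gathering):

    SQL> select table_name, num_rows
         from user_tables
         where table_name in ('ACCOUNT_T', 'PURCHASED_PRODUCT_T', 'EVENT_T');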

Network Configuration
  • How are the systems connected (for example, 10BaseT or 100BaseT)?

  • Are separate networks used for each DM database connection?

Database Server Configuration
  • What are the index and data file sizes?

  • What are the database hot spots?

  • What is the disk layout?

  • What is the assignment of tablespaces to logical devices?

  • Are disk volumes used?

  • Are redo logs on their own disk?

Oracle Configuration
  • What is the Oracle version?

  • How is the init.ora file configured?

    The following Oracle init.ora parameters are particularly important:

    • db_block_buffers

    • shared_pool_size

    • use_async_io

    • db_block_size

    • max_rollback_segments

    • processes

    • dml_locks

    • log_buffer

    Compare how your parameters are configured to those in the example BRM performance configurations.

    • Does the SGA roughly equal half the physical RAM?

    • What are the sizes and number of rollbacks?

    • Is check-pointing or archiving enabled?

    • How fragmented are the indexes and tables?

    • How many extents are there, and what is the next extent size?

    • Run the query select index_name from user_indexes to list the indexes, and compare the indexed columns against the columns used in the WHERE clauses of your queries (see the sketch after this list).

    • Which optimizer option is being used (CHOOSE or RULE)?
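
A sketch of the queries for comparing existing indexes with the columns your slow queries filter on (EVENT_T is used here only as an illustrative table name):

    SQL> select index_name, table_name from user_indexes order by table_name;
    SQL> select index_name, column_name, column_position
         from user_ind_columns
         where table_name = 'EVENT_T'
         order by index_name, column_position;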

Describe the Activity

  • Are there any messages in any error logs (CM, DM, application)?

  • Are there any operating system or database system error messages?

  • Are there any bad blocks?

  • Are you using any nonstandard resources, custom code (especially in the CM), or debugging aids such as writing log records to files that might result in contention or bottlenecks?

  • Is there enough free swap space?

  • What is the CPU utilization on servers used for BRM processes?

  • Database system:

    • What are the I/Os per disk per second, the size of the disk queues, the disk service time, and the percentage of time spent waiting for I/O? (See the sketch after this list.)

    • What is the CPU utilization on the database system?
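
A sketch of the monitoring commands that answer the I/O and CPU questions above (sampling intervals and counts are arbitrary):

    % sar -u 5 12        # CPU utilization
    % sar -d 5 12        # per-device I/O rates, queue lengths, and service times
    % vmstat 5 12        # run queue, memory, and percent of time waiting for I/O
    % iostat -x 5 12     # Linux: extended per-disk statistics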

Troubleshooting Poor Performance

When troubleshooting poor performance, first consider the following:

  • Under-configured hardware.

  • Inefficient table layout.

  • Database bottlenecks.

  • Inefficient custom application code.

  • Repeated runtime errors resulting from configuration problems.

In addition, you can look for different problems depending on whether CPU utilization is high or low.

Low Performance with High CPU Utilization

If performance is low and CPU utilization is high, or if there are performance spikes, there is probably a configuration or indexing issue. Check the following:

  • Hardware limitations.

  • Table/volume layout.

  • Spin count is too high.

  • Lack of proper indexes. This can show up as very high CPU utilization with no other apparent problems except for a high number of processes. Find which columns are being accessed in the operation being performed and ensure that they are properly indexed.

  • Not enough database buffers.

  • Swapping (see the sketch after this list).

  • Kernel parameters too low.
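
To confirm whether the system is swapping, a quick sketch (column names differ slightly between Linux and Solaris):

    % vmstat 5 5         # sustained nonzero swap-in/swap-out columns indicate swapping
    % free -m            # Linux: memory and swap usage
    % swap -l            # Solaris: configured swap devices and free blocks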

Low Performance with Low CPU Utilization

If performance is low and CPU utilization is low, check for a bottleneck between different system tiers (for example, between the DM and the database).

  • Use the database monitoring tools to analyze the performance of the database system.

  • Use SQL tracing and timing to check for inefficient application code.

  • Check for an under-configured BRM system, which could be one of the following:

    • CM Proxy with a low number of children.

    • DMs with a low number of back ends.

    • System logging level is too high.

Monitor the DM system utilization and Oracle system utilization and tune the number of DM back ends accordingly. A good starting point for DM back-end numbers is eight times the number of processors.
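
For example, a minimal dm pin.conf sketch for a database host with four processors, using the eight-per-processor starting point (dm_n_be is the standard entry for the number of DM back ends; confirm the entry name in your DM's pin.conf and tune the value based on monitoring data):

    # 4 processors x 8 = 32 back ends as a starting point
    - dm dm_n_be 32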

For more information, see "Improving Data Manager and Queue Manager Performance".

Quick Troubleshooting Steps

  • Run quick timing tests by using the testnap utility with op_timing turned on to ping each CM and DM (with the PCM_OP_TEST_LOOPBACK opcode). If the operations are relatively slow, it indicates a problem in the basic configuration.
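
    For example, a minimal testnap input sketch for the loopback test (the flist contents and buffer number are illustrative only; enable op_timing in the testnap configuration before running it):

    r << XX 1
    0 PIN_FLD_POID    POID [0] 0.0.0.1 /dummy 1 0
    XX
    xop PCM_OP_TEST_LOOPBACK 0 1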

  • Run the system with a log level of DEBUG on the CM and DM and analyze log files.

  • Check network usage data for collisions.

  • Check if you have logging (debugging) turned on in the CM. Logging is good for troubleshooting, but it should not be turned on in a production environment because it reduces performance.

  • Check that the performance-related parameters in the pin.conf files are large enough to handle the load. The most likely problems are in the DM entries.

  • Check if you have enough DM back ends to handle your transaction load.

  • Try putting tables and indexes on different disks.

  • Check the size of redo and rollback logs and database configuration parameters.

  • Send a few kill -USR1 commands to the DMs and CMs that seem to be having problems. This causes them to dump their state to the BRM error log files. Snapshots should be up to 20 minutes apart. These log files may contain information that indicates the nature of the problem.
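
    For example (dm_pid is a placeholder for the actual process ID):

    % ps -ef | grep dm_oracle
    % kill -USR1 dm_pid
    % kill -USR1 dm_pid      # repeat after an interval and compare the state dumps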

  • Turn on SQL tracing and analyze query plans. Look for full table scans and ensure that indexes exist on the columns used by the query being run. Pay particular attention to any customizations.

  • Turn on the timed_statistics parameter. Look for unusually long execution times for SQL commands.
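
    For example, from a SQL*Plus session with the appropriate privileges (a sketch; on recent Oracle releases you may prefer the DBMS_MONITOR package for tracing other sessions):

    SQL> alter system set timed_statistics = true;
    SQL> alter session set sql_trace = true;

    Process the resulting trace files with tkprof to see execution times and query plans.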

  • Monitor hardware activity:

    • On Linux and Solaris systems, use vmstat, netstat, and sar.

    • Drill down to the storage device level by using sar with the -d parameter. This should help you find the source of the problem.

      Note:

      If the file systems are configured from logical volumes built on physical disks, different file systems could be sharing the same underlying disk. It is important to map which file systems use which disks in order to isolate potential contention (waiting on I/O).
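
      On Linux systems that use LVM, for example, the following sketch shows which physical disks back each logical volume (other volume managers have their own equivalents):

      % pvs
      % lvs -o +devices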

  • Problems such as intermittent daemon failures can be indicated by core files. Try the following command to locate them:

    % find BRM_home -name core -exec file {} \;
    

    If there are no core files, try turning on maximal debugging. You do not want to do this for very long, especially on a production system, because the log files fill up rapidly.

    % pin_ctl stop cm
    % setenv CMAP_DEBUG 0x1331f3
    % setenv CM_DEBUG 0x0001
    % setenv cm_loglevel 3
    % pin_ctl start cm

    System-level tracing can also be useful:

    # ps -ef | grep cm
    # truss -p cm_pid