10 Improving IP Service Activator Performance

This chapter explains performance considerations and tuning techniques that can improve the performance of Oracle Communications IP Service Activator in a particular deployment.

Performance Considerations

This section discusses important performance considerations to keep in mind while planning how to deploy IP Service Activator.

Note:

You must regularly monitor system resource usage in order to ensure ongoing system performance and stability.

Table 10-1 highlights various network and hardware parameters and how they affect various IP Service Activator components.

Table 10-1 IP Service Activator Network and Hardware Parameters

Metric | Most Influential Component | Influential Attributes | Other Factors
------ | -------------------------- | ---------------------- | -------------
GUI commit times | GUI workstation | CPU speed | Available RAM, workstation process load
End-to-End response times - single device | Policy Server | CPU speed | Proxy Server CPU speed, Proxy Server RAM
GUI login time | GUI workstation | CPU speed, sufficient RAM | Policy Server CPU speed
OIM commit time | OIM Server | CPU speed, sufficient RAM | Policy Server CPU speed
End-to-End response times - global change (configuration to all devices) | Proxy Server or Network Processor | CPU speed, sufficient RAM | Policy Server CPU speed, Oracle Database Server resources, network speed
Time to restart Proxy Server | Proxy Server | CPU speed, sufficient RAM | Policy Server CPU speed
Time to restart Network Processor | Network Processor | CPU speed, sufficient RAM | Policy Server CPU speed, Oracle Database Server resources, network speed
Time to restart Policy Server | Policy Server | CPU speed, sufficient RAM | Oracle Database Server resources, network speed


Memory Considerations

When the memory requirements of the IP Service Activator components exceed the available real memory, responsiveness and throughput drop significantly. IP Service Activator processes that are pushed into virtual memory behave as reliably as those residing in real memory, but suffer a significant degradation in performance. The most notable examples are the time required to start the Proxy Server, and the time required to propagate a system-wide (global) configuration change on a Proxy Server or Network Processor.

In the case of the global configuration change, the time required to propagate in a swapping system is up to eight times longer than that of a system residing in real memory. The capacity and efficiency of the network through which the IP Service Activator components are connected will also have an impact on the performance of the IP Service Activator deployment.

When determining how many Proxy Servers or Network Processors are required to effectively support a network, attention must be paid to the memory footprint of the IP Service Activator processes on each server. See "Hardware requirements" in IP Service Activator Installation Guide for network ranges, and contact Oracle Global Customer Support for assistance.

IP Service Activator can use large amounts of system memory. It is important to be aware of operating system limits regarding the memory footprint of a single process. Under Solaris (versions 8, 9, and 10, for 32-bit applications), the per-process memory limit is 4 GB. This may impact the design of a very large system with respect to the number of Proxy Agents or Network Processors deployed.

Although the Network Processor instances can be configured to access a database schema different from that of the Policy Server, there is no performance benefit in deploying in this fashion. The Network Processor instances of an IP Service Activator deployment should share the database schema (user ID and password) with the Policy Server of that deployment. Failure to do so can result in the two schemas falling out of synchronization, and may impede newer IP Service Activator features that logically join data from the Network Processor components of the data model with those of the Policy Server.

Tuning Techniques

Careful benchmarking should be conducted to evaluate the relative merit of each tuning adjustment.

Deploying on Solaris

Impacts: Improved IP Service Activator throughput.

Reasoning: The threading libraries in newer Solaris releases have improved, which has a positive impact on the performance of both the Policy Server and the Network Processor.

Re-ordering IP Service Activator Component Initiation

Impacts: Reduced system start-up times on systems where the Policy Server and Network Processor are co-located on the same host.

Reasoning: At IP Service Activator system start-up, the Policy Server will load the Object Model and then 'push' those portions of the model that pertain to each individual Network Processor. This Proxy Push stage is what consumes most of the start-up time, and after a Proxy Push begins, another cannot initiate until the current one has completed.

With this in mind, it is advantageous to ensure that all Network Processor instances in an IP Service Activator deployment have started up, and are ready to receive a Proxy Push, before the Policy Server begins the Proxy Push stage of its start-up. This ensures that the Policy Server can push data to each Network Processor in turn without waiting for late-starting instances.

Action: Move the Network Processor start-up command line, in the Component Manager configuration file, so that it is listed above the Policy Server start-up command line (in the same file). The Component Manager configuration file can be found at /opt/OracleCommunications/ServiceActivator/Config/cman.cfg. The Network Processor part of this file begins with the phrase #NetworkProcessorEntry and ends with the phrase #End. Cut these markers, and all the content between them, and move the block to just above the phrase #PolicyServerEntry. No cman.cfg file editing is required on machines or containers that are only hosting the Network Processor component.
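After the move, cman.cfg should look broadly like the sketch below. The actual start-up command lines are deployment-specific and are elided here; only the #NetworkProcessorEntry, #End, and #PolicyServerEntry markers are taken from this procedure:

#NetworkProcessorEntry
(Network Processor start-up command line and options)
#End
#PolicyServerEntry
(Policy Server start-up command line and options)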

Reducing the Number of Stored Transactions

Impacts: GUI login time, Policy Server start-up time, and Object Model memory consumption.

Reasoning: Every committed transaction performed by IP Service Activator (through either the GUI or the OIM) is stored in the Object Model, and hence in the database. Over a short period of time, this store can grow very large. By reducing the number of saved transactions, the Object Model can be loaded faster, and both the GUI and the Policy Server require less memory to store their copies of the Object Model.

Action: Modify the Transaction Archive Limit.

From the Tools menu, select Options to access the Options dialog box. In the Transaction archive limit field, specify a new archive limit. Set the value to 200 (default) or less.

See "Setting the Archive Limit for Transactions" for more details.

Reducing the Impact of Discovery Actions on the IP Service Activator GUI

Impacts: The responsiveness of all active clients (GUIs) during a network discovery operation.

Reasoning: When a new object is discovered, the Policy Server sends an Object Model update notification to all active clients (both GUI and OIM). If an active GUI has a Map pane displayed, and the new object(s) resides in that map, the GUI will be unresponsive for up to 60 seconds while the Map is updated.

By working within the detailed or compact device lists for domain, customer, or VPN objects, the Object Model can still be updated and the list regenerated, but in a much shorter time (2 to 4 seconds).

The impact on GUI responsiveness affects not only the GUI that launched the discovery operation, but all active GUIs.

Action: Avoid displaying map panes unless there is an operational need to do so. Detailed or compact lists can accommodate most workflow requirements.

Apply Policies at the Highest Granularity Possible

Impacts: GUI performance, transaction propagation time.

Reasoning: All the clients (GUI and OIM clients) are related to each other through the Policy Server. When one client updates and commits an object model change, the object models on each of the other clients must be updated as well. While the object model update occurs, the GUI can appear to be unresponsive or 'frozen'. Observing the CPU usage of the workstation (using the Task Manager) confirms that the GUI is functioning, but that it is consuming CPU cycles as it updates its internal Object Model.

More granular policies reduce the number of devices impacted by a configuration change, and thus reduce the impact on the Policy Server and cartridges.

Comparing the extremes (policy applied to a device versus a domain), the difference in propagation time can be up to two orders of magnitude.

Additionally, policy applied at the device level causes update and commit commands to be sent only to the Network Processor that manages the target devices, while policy applied at the domain, customer, or network level causes the same commit commands to be propagated to all Network Processors in the IP Service Activator instance.

The transaction is not complete until all Network Processors have processed the commit commands. Applying policy at levels of the hierarchy higher than the device therefore impacts the throughput of the IP Service Activator instance.

Action: To mitigate this behavior, apply policies at the device level as much as possible. This reduces the scope of any object model change requests in the clients and at the cartridges.

Reducing the Component Manager Shutdown Timeout

Impacts: The length of time that the Component Manager waits, after the shutdown signal has been sent, for components to shut down gracefully before killing the processes.

Reasoning: The default component manager shutdown timeout period is 960 seconds (16 minutes). This means that the component manager waits for 16 minutes after first initiating a component shutdown before forcefully shutting down any components that have not exited gracefully. Most components take far less time to shut down if they are able to do so, and it should not take 16 minutes to determine that a component must be shut down forcefully. By reducing this time period, a system can be shut down faster.

Be aware, however, that forcefully shutting down components that are still attempting to shut down gracefully results in program core files being generated.

Action: Leave the default setting unless there is a strict need for a reduced system shutdown time. If faster shutdown is required, reduce the ShutdownTimeout setting, using the Configuration GUI tool, from 960 seconds to a value between 210 and 480 seconds.

See "Setting the Component Manager Shutdown Timeout Period" for more information.

Increasing the ORB TCP Timeout Threshold

Impacts: Increases the amount of time that the Configuration Management Server can wait for a response from the OSS Integration Manager (OIM) when conducting a device search on a very large IP Service Activator network.

Reasoning: When a network managed by IP Service Activator becomes very large (over 10,000 devices), the OIM may take longer to respond to a Configuration Management device search than the ORB default message timeout value allows. Increasing the ORBTCPReadTimeouts setting ensures that the Configuration Management client receives the proper data from an initiated device search, rather than a time-out notification.

Action: Edit the Configuration Management server start-up file (startWebLogic.sh), typically found at {$BEA_HOME}/user_projects/domains/CMDomain/bin/, and change the line

JAVA_OPTIONS="${SAVE_JAVAJ_OPTIONS}"

to

JAVA_OPTIONS="${SAVE_JAVAJ_OPTIONS} -Dcom.sun.CORBA.transport.ORBTCPReadTimeouts=1:15000:300:1"

The Configuration Management server must be restarted after this file has been edited in order for the change to take effect.

Avoiding Errors When Disabling and Re-enabling CTM

The action of disabling CTM happens automatically; it is not done manually by issuing commands on cartridges, such as Network Processor managed devices. When CTM is re-enabled, commands are sent back to the device. On some devices and configurations, these are accepted without warnings being raised. For example, on Cisco IOS devices, you can re-enable CTM without any issues. On other device types, including Huawei, the device may report configuration errors when you try to re-enable CTM and existing configuration is resent to it.

To avoid such errors, run a cleanup script as often as possible.

Additionally, do the following to ensure error-free re-enabling of CTM:

  • Ensure that the CTM server cleans up generic policies after successful installation.

  • Optimize the interaction with the router. If the command set is empty (when deleting a configlet generic policy that only performs a login, conf t, exit), then the Network Processor will not perform the device login.

Tuning the Network Processor

The following sections provide information on how to increase Network Processor performance and stability.

Setting optimizeFirstCommit

Impacts: Improved Network Processor restart times.

Reasoning: By default, the Network Processor checks the status and performs configuration work on every device assigned to a particular Network Processor instance, every time the Network Processor is started or restarted. This work is performed even if there have been no changes to a device (as tracked by the Policy Server) since the last shutdown of the Network Processor.

This default behavior can be modified such that only devices that have been the target of service or network changes, since the Network Processor was brought down, will have the device configuration processing applied. This can reduce the start-up workload of the Network Processor by several orders of magnitude and hence reduce the time required to start or restart the Network Processor.

Action: Edit the Network Processor default.properties file (usually found at /opt/OracleCommunications/ServiceActivator/Config/networkProcessor/com/serviceactivator/networkprocessor/default.properties). Change the optimizeFirstCommit processing flag from false to true.
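Assuming this file uses the standard Java key=value properties format, the change amounts to a single line:

optimizeFirstCommit=true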

Note:

Prior to IP Service Activator 5.2.3, the optimizeFirstCommit flag was called optimizeCommit.

Increasing the Allocated Memory of the Network Processor

Impacts: Improved Network Processor throughput.

Reasoning: The Network Processor loads and retains portions of device models in memory as it processes the devices it is assigned. The size of the allocated memory can be controlled by altering the size of the Java Virtual Machine (JVM) heap size that is established at Network Processor start-up. A larger heap will allow for more device models to be concurrently stored in memory and thus reduce the number of database fetches that Network Processor must request.

Action: Edit the networkprocessor initiation script (usually found at /opt/OracleCommunications/ServiceActivator/bin/networkprocessor). The default JVM heap size setting is -Xmx2048m (2 GB). The heap size should be large enough to allow for caching and buffering of the managed devices, but not so large as to cause excessive Java garbage collection. Additionally, this heap size should not be set to any value larger than the real memory capacity of the Network Processor host machine.
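As an illustration only (the appropriate value depends on the host's physical memory and the number of managed devices; 4 GB here is simply an example), doubling the default heap means changing the JVM option in the networkprocessor script from:

-Xmx2048m

to

-Xmx4096m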

Increasing the CORBA Message Size for the Network Processor

This parameter is part of the default installation.

Impacts: Improved Network Processor stability on large networks, when Configuration Management is deployed.

Reasoning: When Configuration Management is deployed, large messages are sometimes passed between the Policy Server and Network Processor relating to device lists and audit file lists. To ensure that these messages can be sent in full, the parameter ORBgiopMaxMsgSize must be set to its highest level.

Action: Edit the Component Manager configuration script (/opt/OracleCommunications/ServiceActivator/Config/cman.cfg) and add the parameter and value ORBgiopMaxMsgSize 4294967295 to the Network Processor startup command.
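For illustration, the Network Processor start-up command in cman.cfg would then carry the new parameter at the end; the rest of the command line is deployment-specific and elided here:

networkprocessor ... ORBgiopMaxMsgSize 4294967295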

Limiting the Growth of Caches Within the Network Processor Buffer

Impacts: Increased device managing capacity of the Network Processor.

Reasoning: The Network Processor has a fixed amount of memory, established as the JVM heap size, but some of the caches residing within that heap can be unbounded, or very large. It is therefore important to set a limit on the size of the transaction queuing cache. With a limit in place, in the event of receiving a very large transaction or group of transactions, the Network Processor and the Policy Server co-ordinate the transfer of the transactions to the Network Processor queue, rather than the queue growing without bound and overflowing the JVM heap.

Action: Edit the Network Processor default.properties file (usually found at /opt/OracleCommunications/ServiceActivator/Config/networkProcessor/com/serviceactivator/networkprocessor/default.properties). Add the cacheCapacity processing flag and set its value to 20000000.
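Assuming the same key=value properties format as the other flags in this file, the added line would read:

cacheCapacity=20000000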

Increasing the Processor Thread Limit of the Network Processor

Impacts: Improved Network Processor throughput.

Reasoning: The Network Processor is a multi-threaded application, and as such can take advantage of available CPU-core resources on the host server. By default, a Network Processor instance is configured to utilize two CPU-cores. This can be expanded to utilize all available CPU-cores on the host server. Care must be taken not to configure a Network Processor instance to utilize more CPU-cores than are actually available, as this will impede performance.

Action: Edit the Network Processor default.properties file (usually found at /opt/OracleCommunications/ServiceActivator/Config/networkProcessor/com/serviceactivator/networkprocessor/default.properties). The processing flag to change is the maxThreads flag. The default is 2; change it to between two and four times the actual number of available CPU-cores. For example, if a Network Processor is deployed on a Sun T2000 (1x8 core CPU), and there are no other applications requiring resources on that host machine, the Oracle recommended maxThreads setting would be 32 (4 x 8 CPU-cores).

Additionally, adding the default.properties parameter maxSchedulerThreads, and setting it to the same value, improves Network Processor throughput.
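Continuing the T2000 example above, and assuming the same key=value properties format used elsewhere in this file, the two entries would read:

maxThreads=32
maxSchedulerThreads=32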