Tuning a BEA Tuxedo ATMI Application

The MSSQ scheme offers additional load balancing in BEA Tuxedo ATMI environments. One queue is accommodated by several servers offering identical services at all times. If the server queue to which a request is sent is part of an MSSQ set, the message is dequeued to the first available server. Thus load balancing is provided at the individual queue level.

When a server is part of an MSSQ set, it must be configured with its own reply queue. When the server makes requests to other servers, the replies must be returned to the original requesting server; they must not be dequeued by other servers in the MSSQ set.

You can configure MSSQ sets to be dynamic so they automatically spawn and eliminate servers based upon a queue load.

You Should Use MSSQ Sets If . . .	You Should Not Use MSSQ Sets If . . .
You have between 2 and 12 servers.	There are many servers. (A compromise is to use many MSSQ sets.)
Buffer sizes are not too large, that is, large enough to exhaust a queue.	Buffer sizes are large enough to exhaust one queue.
All servers offer identical sets of services.	Each server offers different services.
Messages are relatively small.	Large messages are being passed to the services, causing the queue to be exhausted. When a queue is exhausted, either nonblocking sends fail or blocking sends block.
Optimization and consistency of service turnaround time are paramount.

A situation analogous to the appropriate use of MSSQ sets can be found in a bank at which several tellers performing identical services handle a single line of customers. The next available teller always takes the next person in line. In this scenario, each teller must be able to perform all customer services. In a BEA Tuxedo environment, all servers set up to share a single queue must offer an identical set of services at all times. The advantage of MSSQ sets is that they offer a second form of load balancing at the individual queue level.
A supermarket at which different cashiers accept different forms of payment (some accept credit cards, while others accept only cash) is similar to a BEA Tuxedo application in which MSSQ sets should not be used.

How to Enable Load Balancing

To alleviate the performance degradation resulting from heavy system traffic, you may want to implement a load balancing algorithm on your entire application. With load balancing, a load factor is applied to each service within the system, and you can track the total load on every server. Every service request is sent to the qualified server that is least loaded.

Run your application for an extended period of time.
Note the average amount of time it takes for each service to be performed.
In the RESOURCES section of the configuration file:

Set LDBAL to Y.
Assign a LOAD value of 50 (LOAD=50) to any service that takes approximately the average amount of time.
For any service taking longer than the average amount of time, set LOAD>50; for any service taking less than the average amount of time, set LOAD<50.

How to Measure Service Performance Time

Administratively—in the configuration file, you can arrange to have a log of services that are performed to be written to standard error. In the SERVICES section, specify:

servopts -r

To analyze the information in the log, run the txrpt(1) command.

For details about servopts(5) and txrpt(1), see the File Formats, Data Descriptions, MIBs, and System Processes Reference and BEA Tuxedo Command Reference, respectively.

Programmatically—insert a call to time() at the beginning and end of a service routine. Services that take the longest time receive the highest load; those that take the shortest time receive the lowest load. (For details about time(), see the documentation for your C language libraries.)

How to Assign Priorities to Interfaces or Services

Assigning priorities enables you to exert significant control over the flow of data in an application, provide faster service to the most important requests, and provide slower service to the less important requests. You can also give priority to specific users—at all times or in specific circumstances.

Administratively—in the SERVICES section of the configuration file, specify the PRIO parameter for each service named.
Programmatically—add calls to the tpsprio() function to the appropriate client and server applications, to allow designated clients and servers to change a priority dynamically. Only preferred clients should be able to increase the service priority. In a system on which servers perform service requests, the server can call tpsprio() to increase the priority of its interface or service calls so the user does not wait in line for every interface or service request that is required.

Example of Using Priorities

Server 1 offers Interfaces A, B, and C. Interfaces A and B have a priority of 50; Interface C, a priority of 70. An interface requested for C is always dequeued before a request for A or B. Requests for A and B are dequeued equally with respect to one another. The system dequeues every tenth request in first-in, first-out (FIFO) order to prevent a message from waiting indefinitely on the queue.

Using the PRIO Parameter to Enhance Performance

The PRIO parameter determines the priority of an interface or a service on a server's queue. It should be used cautiously. Once priorities are assigned, it may take longer for some messages to be dequeued. Depending on the order of messages on the queue (for example, A, B, and C), some (such as A and B) are dequeued only one in ten times when there are more than 10 requests for C. This means reduced performance and potential slow turnaround time for some services.

When you are deciding whether to use the PRIO parameter, keep the following implications in mind:

Bundling Services into Servers

The easiest way to package services into servers is to avoid packaging them at all. Unfortunately, if you do not package services, the number of servers, message queues, and semaphores rises beyond an acceptable level. Thus there is a trade-off between no bundling and too much bundling.

When to Bundle Services

We recommend that you bundle services if you have one of the situations or requirements described in the following list.

Functional similarity—if multiple services play a similar role in the application, you can bundle them in the same server. The application can offer all or none of them at a given time. In the bankapp application, for example, the WITHDRAW, DEPOSIT, and INQUIRY services are all operations that can be grouped together in a "bank teller operations" server. Administration of services is simplified when functionally similar services are bundled.
Similar libraries—less disk space is required if you bundle services that use the same libraries. For example, if you have three services that use the same 100K library and three services that use different 100K libraries, bundling the first three services saves 200K. Functionally equivalent services often use similar libraries.
Filling the queue—bundle only as many services into a server as the queue can handle. Each service added to an unfilled MSSQ set may add relatively little to the size of an executable, and nothing to the number of queues in the system. Once the queue is filled, however, system performance is degraded and you must create more executables to compensate.

Do not put two or more services that call each other, that is, call-dependent services, in the same server. If you do so, the server issues a call to itself, causing a deadlock.

Enhancing Overall System Performance

The following performance enhancement controls can be applied to BEA Tuxedo release 8.0 or later.

Service and Interface Caching

BEA Tuxedo release 8.0 or later allows you to cache service and interface entries, and to use the cached copies of the service or interface without locking the bulletin board. This feature represents a significant performance improvement, especially in systems with large numbers of clients and only a few services.

The SICACHEENTRIESMAX option has been added to the MACHINES and SERVERS sections of the configuration file to allow you to define the maximum number of service cache entries that any process and/or server can hold.

Since caching may not be useful for every client or every application, the TMSICACHEENTRIESMAX environment variable has been added to control the cache size. The default value for TMSICACHEENTRIESMAX is preconfigured so that no administrative changes are necessary when upgrading from previous releases. TMSICACHEENTRIESMAX can also control the number of cache entries, since it is not desirable for clients to grow too large.

Service Caching Limitations

Removing Authorization and Auditing Security

For BEA Tuxedo release 7.1, the AAA (authentication, authorization, and auditing) security features were added so that implementations using the AAA plug-in functions would not need to base security on the BEA Tuxedo administrative option. As a result, the BEA Engine AAA security functions are always called in the main BEA Tuxedo 7.1 code path. Since many applications do not use security, they should not pay the overhead price of these BEA Engine security calls.

For BEA Tuxedo release 8.0 or later, the NO_AA option has been added to the OPTIONS parameter in the RESOURCES section of the configuration file. The NO_AA option will circumvent the calling of the authorization and auditing security functions. Since most applications need authentication, this feature cannot be turned off.

If the NO_AA option is enabled, the following SECURITY parameters may be affected:

The parameters NONE, APP_PW, and USER_AUTH parameters will continue to work properly—except that no authorization or auditing will be done.
The ACL and MANDATORY_ACL parameters will continue to work properly, but will only use the default BEA security mechanism.

Using the Multithreaded Bridge

Because only one Bridge process is running per host machine in a multiple machine Tuxedo domain, all traffic from a host machine passes through a single Bridge process to all other host machines in the domain. The Bridge process supports both single-threaded and multithreaded execution capabilities. The availability of multithreaded Bridge processing improves the data throughput potential. To enable multithreaded Bridge processing, you can configure the BRTHREADS parameter in the MACHINES section of the UBBCONFIG file.

Setting BRTHREADS=Y configures the Bridge process for multithreaded execution. Setting BRTHREADS=N or accepting the default N, configures the Bridge process for single-threaded execution.

Configurations with BRTHREADS=Y on the local machine and BRTHREADS=N on the remote machine are allowed, but the thoughput between the machines will not be greater than that for the single-threaded Bridge process.

Setting BRTHREADS=Y makes sense only if a machine has multiple CPUs; however, having multiple CPUs id not a prerequisite for setting BRTHREADS=Y.
If the MODEL parameter in the RESOURCES section of the UBBCONFIG file is set to SHM, the BRTHREADS parameter has no effect and is ignored.
If BRTHREADS=Y and the Bridge environment contains TMNOTHREADS=Y, the Bridge starts up in threaded mode and logs a warning message. Basically, BRTHREADS overrides TMNOTHREADS and the warning message states that the Bridge is ignoring the TMNOTHREADS setting.

Turning Off Multithreaded Processing

BEA Tuxedo has a generalized threading feature. Due to the generality of the architecture, all ATMI calls must call mutexing functions in order to protect sensitive state information. Furthermore, the layering of the engine and caching schemes used in the libraries cause more mutexing. For applications that do not use threads, turning them off can result in significant performance improvements without making changes to the application code.

To turn off multithreaded processing use the TMNOTHREADS environment variable. With this setting, individual processes can turn threads on and off without introducing a new API or flag in order to do so.

Turning Off XA Transactions

Although not all BEA Tuxedo applications use XA transactions, all processes pay the cost of transactional semantics by calling internal transactional verbs. To boost performance for applications that don't use XA transactions for BEA Tuxedo release 8.0 or later, the NO_XA flag has been has been added to the OPTIONS parameter in the RESOURCES section of the configuration file.

No XA transactions are allowed when the NO_XA flag is set. It is important to remember though, that any attempt to configure TMS services in the GROUPS section will fail if the NO_XA option has been specified.

Determining Your System IPC Requirements

The IPC requirements for your system are determined by the values of several system parameters:

You can use the tmboot -c command to display the minimum IPC requirements of your configuration.

Table 8-1 Parameters for Tuning IPC Resources
Parameter(s)	Description
`MAXACCESSSERS`	Equals the number of semaphores. Number of message queues is almost equal to `MAXACCESSERS` + number of servers with reply queues (number of servers in `MSSQ` set * number of `MSSQ` sets).
`MAXSERVERS`, `MAXSERVICES`, and `MAXGTT`	While `MAXSERVERS`, `MAXSERVICES`, `MAXGTT`, and the overall size of the `ROUTING`, `GROUP`, and `NETWORK` sections affect the size of shared memory, an attempt to devise formulas that correlate these parameters can become complex. Instead, simply run `tmboot -c` or `tmloadcf` `-c` to calculate the minimum IPC resource requirements for your application.
Queue-related kernel parameters	Need to be tuned to manage the flow of buffer traffic between clients and servers. The maximum total size (in bytes) of a queue must be large enough to handle the largest message in the application. A typical queue is not more than 75 to 85 percent full. Using a smaller percentage of a queue is wasteful; using a larger percentage causes message sends to block too frequently. Set the maximum size for a message to handle the largest buffer that the application sends. The maximum queue length (the largest number of messages that are allowed to sit on a queue at once) must be adequate for the application's operations. Simulate or run the application to measure the average fullness of a queue or its average length. This process may require a lot of trial and error; you may need to estimate values for your tunables before running the application, and then adjust them after running under performance analysis. For a large system, analyze the effects of parameter settings on the size of the operating system kernel. If they are unacceptable, reduce the number of application processes or distribute the application across more machines to reduce `MAXACCESSERS`.

Tuning IPC Parameters

The following application parameters enable you to enhance the efficiency of your system:

MAXACCESSERS, MAXSERVERS, MAXINTERFACES, and MAXSERVICES
MAXGTT, MAXBUFTYPE, and MAXBUFSTYPE
SANITYSCAN, BLOCKTIME, and individual transaction timeouts
BBLQUERY and DBBLWAIT

Setting the MAXACCESSERS, MAXSERVERS, MAXINTERFACES, and MAXSERVICES Parameters

The MAXACCESSERS, MAXSERVERS, MAXINTERFACES, and MAXSERVICES parameters increase semaphore and shared memory costs, so you should carefully weigh these costs against the expected benefits before using these parameters, and choose the values that best satisfy the needs of your system. You should take into account any increased resources your system may require for a potential migration. You should also allow for variation in the number of clients accessing the system simultaneously. Defaults may be appropriate for a generous allocation of IPC resources; however, it is prudent to set these parameters to the lowest appropriate values for the application.

Setting the MAXGTT, MAXBUFTYPE, and MAXBUFSTYPE Parameters

To determine whether the default is adequate for your application, multiply the number of clients in the system times the percentage of time they are committing a transaction. If the product of this multiplication is close to 100, you should increase the value of the MAXGTT parameter. As a result of increasing MAXGTT:

To limit the number of buffer types and subtypes allowed in the application, set the MAXBUFTYPE and MAXBUFSTYPE parameters, respectively. The current default for MAXBUFTYPE is 16. If you plan to create eight or more user-defined buffer types, you should set MAXBUFTYPE to a higher value. Otherwise, you do not need to specify this parameter; the default value is used.

The current default for MAXBUFSTYPE is 32. You may want to set this parameter to a higher value if you intend to use many different VIEW subtypes.

Tuning with the SANITYSCAN, BLOCKTIME, BBLQUERY, and DBBLWAIT Parameters

If a system is running on slow processors (for example, due to heavy usage), you can increase the timing parameters: SANITYCAN, BLOCKTIME, and individual transaction timeouts.

If networking is slow, you can increase the value of the BLOCKTIME, BBLQUERY, and DBBLWAIT parameters.

Recommended Values for Tuning-related Parameters

In the following table are recommended values for the parameters available for tuning an application.

Use These Parameters . . .	To . . .
`MAXACCESSERS`, `MAXSERVERS`, `MAXINTERFACES`, and `MAXSERVICES`	Set the smallest satisfactory value because of IPC cost. (Allow for extra clients.)
`MAXGTT`, `MAXBUFTYPE`, and `MAXBUFSTYPE`	Increase MAXGTT for many clients; set MAXGTT to 0 for nontransactional applications. Use MAXBUFTYPE only if you create eight or more user-defined buffer types. Increase the value of MAXBUFSTYPE if you use many different `VIEW` subtypes.
`BLOCKTIME`, `TRANTIME`, and `SANITYSCAN`	Increase the values if the system is slow.
`BLOCKTIME`, `TRANTIME`, `BBLQUERY`, and `DBBLWAIT`	Increase the values if networking is slow.

Measuring System Traffic

As on any road that supports a lot of traffic, bottlenecks can occur in your system. On a highway, cars can be counted with a cable strung across the road, that causes a counter to be incremented each time a car drives over it.

You can use a similar method to measure service traffic. For example, when a server is started (that is, when tpsvrinit() is invoked), you can initialize a global counter and record a starting time. Subsequently, each time a particular service is called, the counter is incremented. When the server is shut down (through the tpsvrdone() function), the final count and the ending time are recorded. This mechanism allows you to determine how busy a particular service is over a specified period of time.

In the BEA Tuxedo system, bottlenecks can originate from problematic data flow patterns. The quickest way to detect bottlenecks is to measure the amount of time required by relevant services from the client's point of view.

Example of Detecting a System Bottleneck

Client 1 requires 4 seconds to display the results. Calls to time() determine that the tpcall to service A is the culprit with a 3.7-second delay. Service A is monitored at the top and bottom and takes 0.5 seconds. This finding implies that a queue may be clogged, a situation that can be verified by running the pq command in tmadmin.

On the other hand, suppose service A takes 3.2 seconds. The individual parts of service A can be bracketed and measured. Perhaps service A issues a tpcall to service B, which requires 2.8 seconds. Knowing this, you should then be able to isolate queue time or message send blocking time. Once the relevant amount of time has been identified, the application can be retuned to handle the traffic.

Detecting Bottlenecks on UNIX Platforms

The UNIX system sar(1) command provides valuable performance information that can be used to find system bottlenecks. You can run sar(1) to do the following:

Use This Option...	To...
`-u`	Gather CPU utilization numbers, including percentages of time during which the system: runs in user mode, runs in system mode, remains idle with some process waiting for block I/O, and otherwise remains idle.
`-b`	Report buffer activity, including number of data transfers, per second, between system buffers and disk (or other block devices).
`-c`	Report activity of system calls of all types, as well as specific system calls, such as `fork`(2) and `exec`(2).
`-w`	Monitor system swapping activity, including the number of transfers for swapins and swapouts.
`-q`	Report average queue lengths while queues are occupied, and the percentage of time they are occupied.
`-m`	Report message and system semaphore activities, including the number of primitives per second.
`-p`	Report paging activity, including the number of address translation page faults, page faults and protection errors, and valid pages reclaimed for free lists.
`-r`	Report the number of unused memory pages and disk blocks, including the average number of pages available to user processes and disk blocks available for process swapping.

Detecting Bottlenecks on Windows Platforms

On Windows platforms, you can use the Performance Monitor to collect system information and detect bottlenecks. To open the Performance Monitor, select the following options from the Start menu:

Administering a BEA Tuxedo Application at Run Time