Sun Java System Application Server Standard and Enterprise Edition 7 2004Q2 Performance and Tuning Guide 

Chapter 6
Tuning for High-Availability

This chapter discusses the following topics:


Tuning HADB

The high-availability database (HADB) used for storing persistent session state must be tuned according to the load of the Application Server. The data volume, transaction frequency, and size of each transaction may affect the performance of the HADB, and consequently the performance of Application Server.

The following topics are discussed in this chapter:

Disk Usage

This section discusses how to calculate HADB data device size and explains the use of separate disks for multiple data devices.

Calculating HADB Data Device Size

When the HADB database is created, you must specify the number and size of the data devices. These devices must have room for all the user data to be stored. In addition, extra space must be allocated to account for internal overhead, as discussed in the following section.

If the database runs out of device space, the HADB returns error codes 4593 or 4592 to the Application Server.


Note

See the Application Server Error Message Reference for more information on these error messages.


These error messages are also written to the history files. In this case, HADB blocks any client request to insert or update data; however, delete operations are still accepted.

In the Application Server context, the HADB is used for storing session states as binary data. The session state is serialized and stored as a BLOB (binary large object). Each BLOB is split into chunks of approximately 7KB each. Each chunk is stored as a database row (in this context, row is synonymous with tuple or record). Database rows are stored in pages of size 16KB.

There is a small internal overhead for each row stored (approximately 30 bytes for the Application Server). With the most compact allocation of rows (BLOB chunks), two rows can be stored in a page. However, internal fragmentation in the B-tree storage structure may result in only one row per page. On average, 50% of each page contains user data.

For availability in case of node failure, HADB always replicates user data. The HADB node stores its own data, plus a copy of the data from its mirror node. Hence, all data is stored twice. Since 50% of the space on a node is user data (on average), and each node is mirrored, the data devices must have space for at least four times the volume of the user data.

During data refragmentation, HADB keeps both the old and the new versions of a table while the refragmentation operation is running. All application requests are performed on the old table while the new table is being created. Assuming that the database is primarily used for one large table containing BLOB data for session states, this multiplies the device space requirement by another factor of two. Consequently, to be able to add nodes to a running database and refragment the data to use all nodes, you need eight times the volume of user data available.

In addition to the factors described above, you must also account for the device space that HADB reserves for its internal use: four times the LogBufferSize. This disk space is used for temporary storage of the log buffer during high-load conditions (see hadbm deviceinfo --details).
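Taken together, these multipliers give a rough sizing rule. The following sketch is an illustration only (the helper name and example figures are hypothetical, not an official sizing tool); it combines the 2x mirroring, 2x page-utilization, and optional 2x refragmentation factors with the log buffer reserve:

```python
def hadb_device_size_mb(user_data_mb, log_buffer_mb, allow_refragmentation=True):
    """Estimate total HADB data device capacity, in megabytes.

    Multipliers from the text: x2 for mirroring, x2 for ~50% page
    utilization, and x2 more if online refragmentation must remain
    possible, plus four times the LogBufferSize reserved internally.
    """
    factor = 8 if allow_refragmentation else 4
    return factor * user_data_mb + 4 * log_buffer_mb

# Example: 500 MB of session data, 44 MB log buffer
print(hadb_device_size_mb(500, 44))                             # 4176
print(hadb_device_size_mb(500, 44, allow_refragmentation=False))  # 2176
```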

Tuning DataDeviceSize

To increase the size of the data device(s) of HADB, use the following command:

hadbm set TotalDatadeviceSizePerNode

The hadbm restarts all the nodes, one by one, for the change to take effect. For more information on using this command, see the Application Server Administrator's Guide, “Configuring the High-Availability Database” chapter.


Note

The current version of hadbm does not add data devices to a running database instance.


See the Application Server System Deployment Guide for detailed information.

Placing HADB files on Physical Disks

For best performance, data devices should be allocated on separate physical disks. This applies if there are nodes with more than one data device, or if there are multiple nodes on the same host.

Placing devices belonging to different nodes on different physical disks is strongly recommended in general, and especially on Red Hat AS 2.1, where HADB nodes have been observed to wait for asynchronous I/O when the same disk holds devices belonging to more than one node.

An HADB node writes informational messages, warnings, and errors to the history file synchronously, rather than with the asynchronous writes normally used for output devices. Therefore, any disk wait when writing to the history file affects HADB behavior and performance. This situation is indicated by the following message in the history file:

BEWARE - last flush/fputs took too long

To avoid this problem, keep the HADB executables and the history files on physical disks other than those keeping the data devices.

Memory Allocation

It is essential to ensure that HADB is allocated sufficient memory, especially when HADB is co-located with other processes.

The HADB Node Supervisor Process (NSUP) tracks the time elapsed since it last performed monitoring work. If that duration exceeds a specified maximum (2500 ms by default), NSUP concludes that it was blocked too long and restarts the node. This situation is especially likely to arise when other processes on the system compete for memory, producing extensive swapping and multiple page faults as processes are rescheduled.

Then, when the blocked node restarts, all active transactions on that node are aborted.

If you see that the Application Server throughput slows down and requests abort or time out, make sure that swapping is not the cause. To monitor swapping activity, use this command on Unix systems:

vmstat -S

In addition, look for the following message in the HADB history files. It is written when the HADB node is restarted, and M is greater than N:

Process blocked for M sec, max block time is N sec

The presence of aborted transactions will be signaled by the error message HADB00224: Transaction timed out or HADB00208: Transaction aborted.

Performance

For best performance, all HADB processes (clu_xxx_srv) should fit in the physical memory. They should not be paged or swapped. The same applies for shared memory segments in use. You can configure the size of some of the shared memory segments. If these segments are too small, performance suffers – user transactions are delayed, or even aborted. If the segments are too large then the physical memory is wasted.

The following parameters may be configured:

DataBufferPoolSize

HADB stores data on data devices, which are allocated on disks. The data must be in the main memory before it can be processed. The HADB node allocates a portion of shared memory for this purpose. If the allocated database buffer is small compared to the data being processed, a lot of processing capacity is wasted on the disk input/output. In a system with write-intensive operations (for example, session states that are frequently updated) the database buffer must be big enough so that the processing capacity used on the disk input/output does not hamper the request processing.

The database buffer is similar to a file system cache. For good performance, the cache must be utilized as much as possible, so that there is no need to wait for disk read operations. For best performance, the entire database contents would fit in the database buffer; in most settings this is not feasible, so aim to keep the "working set" of the client applications in the buffer.

Disk input/output should also be monitored: if HADB performs many disk read operations, the database is low on buffer space. The database buffer is partitioned into blocks of size 16KB, and the same block size is used on disk. HADB schedules multiple blocks for read/write in one input/output operation. Use hadbm to monitor disk usage:

Table 6-1  hadbm deviceinfo --details

Node No.    TotalSize    FreeSize    Usage
0           512          504         1%
1           512          504         1%

Table 6-2  hadbm resourceinfo --databuf

Node No.    Avail    Free    Access       Misses     Copy-on-Write
0           32       0       205910260    8342738    400330
1           32       0       218908192    8642222    403466

Legend

TotalSize
Size of device, in megabytes.

FreeSize
Free size, in megabytes.

Usage
Percent used.

Avail
Size of buffer, in megabytes.

Free
Free size, in megabytes, when the data volume is larger than the buffer. (The entire buffer is used at all times.)

Access
Number of times blocks have been accessed in the buffer.

Misses
Number of block requests that missed the cache, that is, the user had to wait for a disk read.

Copy-on-Write
Number of times a block was modified while it was being written to disk.

For a well-tuned system, the number of misses (and hence the number of disk reads) must be very small compared to the number of writes. The example numbers above show a miss rate of about 4% (roughly 206 million accesses and 8 million misses). Whether these figures are acceptable depends on the client application's requirements.
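The miss rate can be computed directly from the Access and Misses counters reported by hadbm resourceinfo --databuf. For example (the helper below is illustrative, not part of hadbm):

```python
def buffer_miss_rate(accesses, misses):
    """Fraction of block requests that had to wait for a disk read."""
    return misses / accesses

# Counters from Table 6-2, node 0:
rate = buffer_miss_rate(205910260, 8342738)
print(f"{rate:.1%}")  # 4.1%
```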

Tuning DataBufferPoolSize

To change the size of the database buffer, use the following command:

hadbm set DataBufferPoolSize

The hadbm restarts all the nodes, one by one, for the change to take effect. For more information on using this command, see the Application Server Administrator's Guide, “Configuring the High-Availability Database” chapter.

LogbufferSize

User requests consist of operations such as inserting, deleting, updating or reading data. The HADB logs all operations which modify the database before executing them. Log records describing the operations are placed in a portion of shared memory, referred to as the (tuple) log buffer. These log records are used for undoing operations in case transactions are aborted, for recovery in case of node crash, and for replication between mirror nodes.

The log records remain in the buffer until they are processed locally, and shipped to the mirror node. The log records are kept until the outcome (commit or abort) of the transaction is certain. If the HADB node runs low on tuple log, the user transactions are delayed, and possibly timed out.

Tuning the Attribute

Begin with the default value. Look for HIGH LOAD informational messages in the history files. All the relevant messages will contain tuple log or simply log, and a description of the internal resource contention that occurred.

The following command gives information on log buffer size and usage:

hadbm resourceinfo --logbuf

Under normal operation, the log is reported as 70-80% full. This is because space reclamation is lazy: HADB keeps as much data in the log as possible, in case a node crashes and recovery must be performed.

Table 6-3  hadbm resourceinfo --logbuf

Node No.    Avail    Free Size
0           44       42
1           44       42

Legend

Node No.

The node number.

Avail

Size of buffer, in megabytes.

Free Size

Free size, in megabytes, when the data volume is larger than the buffer. (The entire buffer is used at all times.)

To change the size of the log buffer use the following command:

hadbm set LogbufferSize

The hadbm restarts all the nodes one by one, for the change to take effect. For more information on using this command, see the Application Server Administrator's Guide, “Configuring the High-Availability Database” chapter.

InternalLogbufferSize

The node internal log (nilog) records physical operations (as opposed to logical, row-level operations) at the local node, such as disk block allocations and deallocations, and B-tree block splits. This buffer is maintained in shared memory and is also checkpointed to disk (a separate log device) at regular intervals. The page size of this buffer, and of the associated log device, is 4096 bytes.

Large BLOBs necessarily allocate many disk blocks, and thus create a high load on the node internal log. This is normally not a problem, since each entry in the nilog is small.

Tuning the Attribute

Begin with the default value. Look out for HIGH LOAD informational messages in the history files. The relevant messages will contain nilog, and a description of the internal resource contention that occurred.

Table 6-4  hadbm resourceinfo --nilogbuf

Node No.    Avail    Free Size
0           11       11
1           11       11

Legend

Node No.

The node number.

Avail

Size of buffer, in megabytes.

Free Size

Free size, in megabytes, when the data volume is larger than the buffer. (The entire buffer is used at all times.)

To change the size of the nilog buffer, use the following command:

hadbm set InternalLogbufferSize

The hadbm restarts all the nodes, one by one, for the change to take effect. For more information on using this command, see the Application Server Administrator's Guide, “Configuring the High-Availability Database” chapter.


Note

If you change the size of the nilog buffer, the associated log device (located in the same directory as the data devices) also changes. The size of the internal log buffer must be equal to the size of the internal log device; the command hadbm set InternalLogBufferSize ensures this requirement. It stops a node, increases the InternalLogBufferSize, reinitializes the internal log device, and brings up the node. This sequence is performed on all nodes.


NumberOfLocks

Each row-level operation requires a lock in the database. Locks are held until the transaction commits or rolls back. Locks are set at the row (BLOB chunk) level, which means that a large session state requires many locks. Locks are needed on both the primary and the mirror node. Hence, a BLOB operation allocates the same number of locks on two HADB nodes.

When a table refragmentation is performed, HADB needs extra lock resources; ordinary user transactions can therefore acquire only half of the allocated locks.

If the HADB node has no lock objects available, errors are written to the log file. See error codes HADB02080 and HADB02096 in the Application Server Error Message Reference for a description of the error message and action to be taken to correct the error.

To calculate the number of locks

To calculate the number of locks, you need to estimate two parameters: the maximum number of concurrent users at any time, X (that is, X session data records are present in the HADB), and the session size (for the session and modified-session scopes) or attribute size (for modified-attribute), Y. The number of records written to HADB is then:

X*((Y/7000)+2)

For record operations such as insert, delete, update and read, one lock will be used per record.


Note

Locks are held for both primary records and hot-standby records. Hence, for insert, update and delete operations a transaction will need twice as many locks as the number of records. Read operations need locks only on the primary records. During refragmentation and creation of secondary indices, log records for the involved table are also sent to the fragment replicas being created. In that case, a transaction needs four times as many locks as the number of involved records. (Assuming all queries are for the affected table.)


Summary

The number of locks to be configured is as follows:

nlocks = 4*X*((Y/7000)+2)    if refragmentation is performed

or

nlocks = 2*X*((Y/7000)+2)    otherwise
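As a sketch, the summary formulas can be expressed directly. The helper name is hypothetical, and the ceiling on the chunk count is an assumption (the text writes plain Y/7000, but a partial 7KB chunk still occupies a row):

```python
import math

def hadb_locks(max_sessions, session_size_bytes, refragmentation=True):
    """Estimate NumberOfLocks from the formulas above.

    Each session state is split into ~7 KB BLOB chunks plus two
    overhead records; each record needs a lock on the primary and
    mirror nodes (x2), doubled again during refragmentation (x4).
    """
    records = max_sessions * (math.ceil(session_size_bytes / 7000) + 2)
    return (4 if refragmentation else 2) * records

# Example: 1000 concurrent sessions of 14 KB each
print(hadb_locks(1000, 14000))                        # 16000
print(hadb_locks(1000, 14000, refragmentation=False))  # 8000
```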

Tuning the Attribute

Start with the default value. Look for exceptions with the indicated error codes in the Application Server log files. To get information on allocated locks and locks in use, use the following command:

hadbm resourceinfo --locks

Remember that under normal operations (no ongoing refragmentation) only half of the locks may be acquired by the client application.

Table 6-5  hadbm resourceinfo --locks

Node No.    Avail    Free     Waits
0           50000    50000    na
1           50000    50000    na

Legend

Node No.

The node number.

Avail

Number of locks available.

Free

Number of locks not currently in use.

Waits

Number of transactions that have waited for a lock.
“na” (not applicable) if all locks are available.

To change the number of locks you can use the following command:

hadbm set NumberOfLocks

The hadbm restarts all the nodes, one by one, for the change to take effect. For more information on using this command, see the Application Server Administrator's Guide, “Configuring the High-Availability Database” chapter.

Timeouts

This section describes some of the timeout values that affect performance.


Tip

Timeouts are documented in the DTD files. In particular, see lb.dtd for load balancer timeouts, and server.dtd for server timeouts.


JDBC connection pool timeouts

These values govern how much time the server waits for a connection from the pool before it times out. In most cases, the default values work well. For detailed tuning information, see JDBC Connection Pool Tuning.

Load Balancer timeouts

Some values to be aware of are:

The combination of the health checker’s interval-in-seconds and timeout-in-seconds values determines how much additional traffic goes from the load balancer plug-in (lbplugin) in the web server to the server instances.

HADB timeouts

Some HADB timeouts also make a difference.

Operating System Configuration

The following section describes configuration of the operating system.

Semaphores

If the number of semaphores is too low, HADB may fail and the following error message is displayed:

No space left on device

This may occur either while starting the database, or during run time. Since the semaphores are provided as a global resource by the operating system, the configuration depends on all processes running on the host, and not the HADB alone. In Solaris, you can configure the semaphore settings by editing the /etc/system file.

To run NNODES nodes (the number of nodes given implicitly by the --hosts option to HADB) and NCONNS connections (the HADB configuration parameter NumberOfSessions, default value 100) per host, the following semaphore settings may be used1:

set semsys:seminfo_semmap = <default=10> + NNODES

set semsys:seminfo_semmni = <default=10> + NNODES

set semsys:seminfo_semmns = <default=60> + (NNODES * 8)

set semsys:seminfo_semmnu = <default=30> + NNODES + NCONNS

If you plan to run multiple nodes per host, make sure semmap = NNODES. The commands sysinfo and sysdef may be used to inspect the settings.
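The settings above can be computed for a given deployment. The helper below is an illustration only (the function name is hypothetical; the base values are the Solaris 8 defaults quoted in the text):

```python
def solaris_semaphore_settings(nnodes, nconns=100):
    """Suggested /etc/system semaphore values for HADB on a host.

    nnodes: HADB nodes on this host; nconns: NumberOfSessions per host.
    The base values (10, 10, 60, 30) are the Solaris 8 defaults from
    the text; HADB's requirements are added on top of them.
    """
    return {
        "semsys:seminfo_semmap": 10 + nnodes,
        "semsys:seminfo_semmni": 10 + nnodes,
        "semsys:seminfo_semmns": 60 + 8 * nnodes,
        "semsys:seminfo_semmnu": 30 + nnodes + nconns,
    }

# Two HADB nodes, default 100 sessions per host:
for name, value in solaris_semaphore_settings(2).items():
    print(f"set {name} = {value}")
```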

Shared Memory

Set the maximum shared memory size to the total amount of physical RAM. Additionally, set the maximum number of shared memory segments per process to at least six, to accommodate the HADB processes. The number of system-wide shared memory identifiers should be set depending on the number of nodes running on the host.

Solaris

In Solaris 9, because of kernel changes, the shmsys:shminfo_shmseg variable is obsolete. In Solaris 8, add the following settings to the /etc/system file:

set shmsys:shminfo_shmmax = 0xffffffff
set shmsys:shminfo_shmseg = <default=6>
set shmsys:shminfo_shmmni = <default=100> + (6 * NNODES)


Note

The host must be rebooted after changing these settings.


Linux

To increase the shared memory to 512 MB, run the following:

echo 536870912 > /proc/sys/kernel/shmmax
echo 536870912 > /proc/sys/kernel/shmall

The file shmmax contains the maximum size of a single shared memory segment, and shmall contains the total shared memory to be made available.

This value is large enough for a standard HADB node that uses default values. If you change the default values, you should consider changing these values, as well.

To make these changes permanent, add those lines to /etc/rc.local on your Linux machine. (In modern Red Hat versions of Linux, you can also set these kernel parameters in sysctl.conf.)


Tuning the Application Server for High-Availability

This section discusses how you can configure the high availability features of Application Server for your application.

Configuring and Tuning the Application Server

This section discusses the following topics:

To ensure highly available web applications with persistent session data, Application Server provides a backend store, HADB, to save HttpSession data. There is an overhead involved in saving the data to HADB and reading it back. Understanding the different session persistence schemes and their impact on performance and availability will help you configure Application Server appropriately for deployment.

It is recommended that a 1:2 ratio be maintained between application server instances and HADB nodes: for every application server instance, you need two HADB nodes.

Implications of Persistence Frequency on HTTP Session Performance

Application Server provides HTTP session persistence, and thus failover, by writing the session data to HADB. The frequency at which session data is written to HADB is controlled by the persistence frequency setting.

Specify the persistence frequency in server.xml, under the following tree:

<session-config>
  <session-manager>
    <manager-properties>
      <property name="persistenceFrequency" value="web-method"/>
    </manager-properties>
  </session-manager>
</session-config>

The valid values for the persistenceFrequency are web-method and time-based.

Assuming that all other parameters related to the server and the application remain the same, time-based provides better performance than web-method, but lower availability. This is because the session state is written to the persistent store (HADB) only at the configured interval, reapIntervalSeconds (default 60 seconds). If the server instance fails within that interval, any updates made to the session state since it was last written to HADB are lost.

web-method as the Persistence Frequency

When web-method is specified as the persistence frequency, the server writes the current HTTP session state to HADB before sending its response to the client request. The response time therefore varies with the size of the data to be persisted. This mode is recommended for applications where availability is critical and some performance degradation is acceptable. For more details on web-method as the persistence frequency, see the “Configuring Session Persistence” chapter in the Application Server Administrator’s Guide.

time-based as the Persistence Frequency

When time-based is specified as the persistence frequency, session persistence happens at the interval specified by reapIntervalSeconds (60 seconds by default). A background thread wakes up at each reapInterval, iterates over the valid sessions in memory, and saves their session data. This mode is recommended when performance is critical and availability is less so: responses to clients are not held back by the save operation to HADB. The default interval can be changed under the <manager-properties> tag in the following way:

<property name="reapIntervalSeconds" value="valueInSeconds"/>

Summary

The supported schemes for persistence frequency are web-method and time-based. In terms of performance, time-based performs better than web-method. If you are willing to trade some availability during the reapIntervalSeconds window, the time-based scheme is recommended as the persistence frequency.

Implications of Persistence Scope on HTTP Session Performance

The Application Server allows the deployer to specify the scope of the persistence in addition to persistence frequency.

The valid persistence-scope values are session, modified-session, and modified-attribute, described below.

For more information, see Implications of Persistence Frequency on HTTP Session Performance.

An example for specifying the persistence scope is shown in the following code:

<session-config>
  <session-manager persistence-type="ha">
    <manager-properties>
      <property name="persistenceFrequency" value="time-based"/>
    </manager-properties>
    <store-properties>
      <property name="persistenceScope" value="session"/>
    </store-properties>
  </session-manager>
  <session-properties>
  </session-properties>
</session-config>

session as the Persistence Scope

When session is specified as the persistence scope, the entire session data is written to HADB, regardless of whether it has been modified. This ensures that the data in the backend store is always current, but performance degrades because all the session data is persisted on every request.

modified-session as the Persistence Scope

When modified-session is specified as the persistence scope, the server examines the state of the HTTP session and persists it to HADB only if the data appears to have been modified. modified-session performs better than session because calls to HADB to persist data happen only when the session has been modified.

modified-attribute as the Persistence Scope

The modified-attribute persistence scope is useful when the session attributes contain no cross-references and the application uses setAttribute and getAttribute semantics to manipulate HTTP session data. Applications written this way can take advantage of this scope to obtain better performance.

For detailed description of different persistence scopes, see the “Configuring Session Persistence” chapter in the Application Server Administrator’s Guide.

Impact of Checkpointing on Stateful Session Bean Performance

Checkpointing saves a stateful session bean’s (SFSB) state to a highly available persistent store (HADB), so that if the server instance servicing requests for the SFSB fails, requests can fail over to another available instance in the cluster and the bean state can be recovered.

The size of the data being checkpointed and the frequency at which checkpointing happens determine the additional overhead in response time for a given client interaction.

From a performance point of view, checkpointing should be explicitly specified for only those methods that alter the bean state significantly, by adding the <checkpointed-methods> tag in the sun-ejb-jar.xml file. For more details, refer to the Developer’s Guide to Enterprise JavaBeans Technology.

Configuring the JDBC Connection Pool

The Application Server relies on JDBC connectivity to the backend persistent store (HADB) to store and retrieve HTTP session data. Hence, it is imperative that the connection pool be configured so that all read-write operations against HADB are as fast as possible.

The connection pool can be configured by running the cladmin script.

For more information on configuring the JDBC connection pool, see JDBC Connection Pool Tuning. A more detailed discussion can be found in the Application Server, Administrator’s Guide, chapter, “Monitoring the Application Server.”

The connection pool configuration settings are as follows:

<jdbc-connection-pool steady-pool-size="8" max-pool-size="16"
    max-wait-time-in-millis="60000" pool-resize-quantity="2"
    idle-timeout-in-seconds="300" is-isolation-level-guaranteed="true"
    is-connection-validation-required="true"
    connection-validation-method="meta-data" fail-all-connections="false"
    datasource-classname="com.sun.hadb.jdbc.ds.HadbDataSource"
    name="CluJDBC" transaction-isolation-level="repeatable-read">

  <description>HADB Connection pool configuration</description>

  <property value="test" name="password"/>

  <property value="test" name="username"/>

  <property value="host1:15105,host2:15125" name="serverList"/>

  <property name="maxStatements" value="30"/>

  <property name="cacheDatabaseMetaData" value="false"/>

  <property name="eliminateRedundantEndTransaction" value="true"/>

</jdbc-connection-pool>

For optimal performance, the pool should contain eight to 16 connections per HADB node. For example, if you have four nodes configured, set steady-pool-size to 32 and max-pool-size to 64.
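The eight-to-16 connections-per-node rule can be expressed as a small helper (hypothetical, for illustration):

```python
def jdbc_pool_sizes(hadb_nodes, per_node_min=8, per_node_max=16):
    """Return (steady-pool-size, max-pool-size) from the 8-16
    connections-per-node rule in the text."""
    return hadb_nodes * per_node_min, hadb_nodes * per_node_max

# Four HADB nodes:
steady, maximum = jdbc_pool_sizes(4)
print(steady, maximum)  # 32 64
```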


The following settings must be adhered to:

is-connection-validation-required     true
connection-validation-method          meta-data
cacheDatabaseMetaData                 false
eliminateRedundantEndTransaction      true
transaction-isolation-level           repeatable-read



Note

The values of idle-timeout-in-seconds and pool-resize-quantity must be adjusted based on the monitoring statistics.


Impact of Session Size on Performance

It is critical to be aware of the impact of HTTP session size on performance. Performance has an inverse relationship with the size of the session data that needs to be persisted. Session data is stored in HADB in a serialized manner. There is an overhead in serializing the data and inserting it as a BLOB and also deserializing it for retrieval.

In our tests, we have observed that for session sizes up to 24KB, performance remains unchanged. When the session size reaches 100KB and above, with the same backend store and the same number of connections, throughput drops by 90%.

Pay careful attention to HTTP session size. If you create large HTTP session objects, calculate the required HADB nodes as discussed in Tuning HADB.

Load Balancer Configuration

Application Server provides a load balancer plug-in for the Sun ONE Web Server that can balance the load of requests among the multiple instances that are part of the cluster. For more information on configuring the load balancer, see the Application Server Administrator’s Guide, “Configuring Load Balancer.”

Before you tune the parameters in the load balancer configuration file (loadbalancer.xml), see the Web Server Performance and Tuning Guide for information on how to tune the web server for best results.


Note

It is assumed in the following section that the web server is tuned effectively to service the incoming requests.


The load balancer periodically checks all configured Application Server instances that are marked as unhealthy, based on the values specified in the health-checker element. Enabling the health checker is optional; if it is not enabled, periodic health checks of unhealthy instances are not performed.

The load balancer’s health check mechanism communicates with the application server instance using HTTP. The health checker sends an HTTP request to the specified URL and waits for a response. An instance is considered healthy if the status code in the HTTP response header is between 100 and 500.

To enable the health checker, edit the following properties in the loadbalancer.xml file:

url

Specifies the listener’s URL that the load balancer checks to determine its state of health.

interval-in-seconds

Specifies the interval at which health checks of instances occur. The default is 30 seconds.

timeout-in-seconds

Specifies the timeout interval within which a response must be obtained for a listener to be considered healthy. The default is 10 seconds.

If the typical response from the server takes n seconds, and under peak load takes m seconds, configure the health checker as follows:

<health-checker url="http://hostname.domain:port" interval-in-seconds="n" timeout-in-seconds="m+n"/>

For more information, see the “Configuring Load Balancer” chapter in the Application Server Administrator’s Guide.

1. Default values are those provided for Solaris 8. The HADB resource requirements should be added to the previous values of the variables, regardless of whether those are the default values.





Copyright 2004 Sun Microsystems, Inc. All rights reserved.