Sun Java System Application Server 9.1 High Availability Administration Guide

Monitoring HADB

You can monitor the activities of HADB by:

Getting the Status of HADB
Getting Device Information
Getting Runtime Resource Information

These sections briefly describe the hadbm status, hadbm deviceinfo, and hadbm resourceinfo commands. For information on interpreting HADB information, see Performance in Sun Java System Application Server 9.1 Performance Tuning Guide.

Getting the Status of HADB

Use the hadbm status command to display the status of the database or its nodes. The command syntax is:

hadbm status  
[--nodes]  
[--adminpassword=password | --adminpasswordfile=file]  
[--agent=maurl] 
[dbname]

The dbname operand specifies the database name. The default is hadb.

The --nodes option (short form -n) displays information on each node in the database. For more information, see Node Status. See General Options for a description of other command options.

For more information, see hadbm-status(1).

Example 3–15 Example of getting HADB status

For example:

hadbm status --nodes

Database States

A database’s state summarizes its current condition. The following table describes the possible database states.

Table 3–14 HADB States


Database State	Description
High-Availability Fault Tolerant (HAFaultTolerant)	Database is fault tolerant and has at least one spare node on each DRU.
Fault Tolerant	All the mirrored node pairs are up and running.
Operational	At least one node in each mirrored node pair is running.
Non Operational	One or more mirrored node pairs are missing both nodes. If the database is non-operational, clear the database as described in Clearing a database.
Stopped	No nodes are running in the database.
Unknown	Cannot determine the state of the database.
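
The state definitions in the table above can be read as a decision rule over the mirrored node pairs. The following is a minimal sketch of that rule, not HADB's actual implementation: the function name, the pair representation, and the spare_on_each_dru flag are hypothetical, and returning Unknown for an empty input is an assumption made only for illustration.

```python
from typing import List, Tuple


def database_state(pairs: List[Tuple[bool, bool]],
                   spare_on_each_dru: bool = False) -> str:
    """Derive a database state from mirrored node pairs.

    Each pair is (node_running, mirror_running). Spare nodes are
    represented only by the hypothetical spare_on_each_dru flag.
    """
    if not pairs:
        # Assumption: with no node information, the state cannot be determined.
        return "Unknown"
    if not any(a or b for a, b in pairs):
        # No nodes are running in the database.
        return "Stopped"
    if any(not a and not b for a, b in pairs):
        # At least one mirrored pair is missing both nodes.
        return "Non Operational"
    if all(a and b for a, b in pairs):
        # All mirrored pairs are up; spares decide the HA variant.
        return "HAFaultTolerant" if spare_on_each_dru else "Fault Tolerant"
    # Every pair has at least one running node, but not all nodes are up.
    return "Operational"
```

For example, database_state([(True, True), (True, False)]) yields Operational, because every pair still has one running node.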

Node Status

Use the --nodes option to make the hadbm status command display the following information for each node in the database:

Node number
Name of the machine where the node is running
Port number of the node
Role of the node. For a list of roles and their meanings, see Roles of a Node.
State of the node. For a list of states and their meanings, see States of a Node.
Number of the corresponding mirror node.

A node’s role and state can change as described in these sections:

Roles of a Node

A node is assigned a role during its creation and can take any one of these roles:

Active: Stores data and allows client access. Active nodes are in mirrored pairs.
Spare: Allows client access, but does not store data. After initializing data devices, monitors other data nodes to initiate repair if another node becomes unavailable.
Offline: Provides no services until its role changes. When the node is placed back online, it can return to its former role.
Shutdown: An intermediate step between active and offline, waiting for a spare node to take over its functioning. After the spare node has taken over, the node is taken offline.

States of a Node

A node can be in any one of the following states:

Starting: The node is starting.
Waiting: The node cannot decide its start level and is offline. If a single node is in this state for more than two minutes, stop the node and then start it at the repair level; see Stopping a Node, Starting a Node, and Clearing a database.
Running: The node is providing all services that are appropriate for its role.
Stopping: The node is in the process of stopping.
Stopped: The node is inactive. Repair of a stopped node is prohibited.
Recovering: The node is being recovered. When a node fails, the mirror node takes over the functions of the failed node. The failed node tries to recover by using the data and log records in main memory or on disk. The failed node uses the log records from the mirror node to catch up with the transactions performed when it was down. If recovery is successful, the node becomes active. If recovery fails, the node state changes to repairing.
Repairing: The node is being repaired. This operation reinitializes the node and copies the data and log records from the mirror node. Repair is more time consuming than recovery.

Getting Device Information

Monitor free space in HADB data (disk storage) devices:

Routinely, to check the trend in disk space use.
As part of preventive maintenance: if the user load has increased and you want to resize or scale the database configuration.
As part of scaling up the database: Before running hadbm addnodes to add new nodes to the system, check whether there is enough device space. Remember, you need around 40-50% free space on the existing nodes to add nodes.
When you see messages in the history files and server.log file such as:
- No free blocks on data devices
- No unreserved blocks on data devices

Use the hadbm deviceinfo command to get information about free space in data devices. This command displays the following information for each node of the database:

Total device size allocated, in MB (Totalsize).
Free space in MB (Freesize).
Percent of device currently being used (Usage).

The command syntax is:

hadbm deviceinfo  [--details]  
[--adminpassword=password | --adminpasswordfile=file]  
[--agent=maurl]  [dbname]

The dbname operand specifies the database name. The default is hadb.

The --details option displays the following additional information:

Number of read operations by the device.
Number of write operations by the device.
Name of the device.

See General Options for a description of other command options.

For more information, see hadbm-deviceinfo(1).

To determine the space available for user data, take the total device size, then subtract the space reserved for HADB: four times the LogBufferSize plus 1% of the device size. If you do not know the size of the log buffer, use the command hadbm get LogBufferSize. For example, if the total device size is 128 MB and the LogBufferSize is 24 MB, the space available for user data is approximately 128 – (4 x 24) = 32 MB (this estimate leaves out the 1% term, which is about 1.3 MB here). Of the 32 MB, half is used for replicated data and around one percent is used for the indices, and only 25 percent is available for the real user data.

The space available for user data is the difference between the total size and reserved size. If the data is refragmented in the future, the free size must be approximately equal to 50% of the space available for user data. If refragmentation is not relevant, the data devices can be exploited to their maximum. Resource consumption warnings are written to the history files when the system is running short on device space.
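
The reserved-space arithmetic can be checked with a few lines of code. This is a minimal sketch of the formula as stated in this section (four times LogBufferSize plus 1% of the device size); the function and parameter names are hypothetical, and the include_one_percent switch exists only to reproduce the rounded 32 MB figure used in the worked example.

```python
def user_data_space_mb(device_size_mb: float,
                       log_buffer_size_mb: float,
                       include_one_percent: bool = True) -> float:
    """Space left for user data after subtracting HADB's reserved space."""
    # Reserved space: four times the log buffer size ...
    reserved_mb = 4 * log_buffer_size_mb
    if include_one_percent:
        # ... plus 1% of the total device size.
        reserved_mb += 0.01 * device_size_mb
    return device_size_mb - reserved_mb


# Worked example from the text: 128 MB device, 24 MB log buffer.
print(user_data_space_mb(128, 24, include_one_percent=False))
print(round(user_data_space_mb(128, 24), 2))
```

The first call gives the rounded figure quoted in the example; the second applies the full formula, which comes out a little over 1 MB lower.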

For more information about tuning HADB, see the Sun Java System Application Server Performance Tuning Guide.

Example 3–16 Example of getting device information

The following command:

hadbm deviceinfo --details

Displays the following example results:

NodeNO Totalsize Freesize Usage NReads NWrites DeviceName
0      128       120      6%    10000  5000    C:\Sun\SUNWhadb\hadb.data.0
1      128       124      3%    10000  5000    C:\Sun\SUNWhadb\hadb.data.1
2      128       126      2%     9500  4500    C:\Sun\SUNWhadb\hadb.data.2
3      128       126      2%     9500  4500    C:\Sun\SUNWhadb\hadb.data.3
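
To track the trend in disk use, output like the example above can be parsed and checked against a free-space threshold. The following is a minimal sketch that assumes the real output has exactly the whitespace-separated columns shown in this example (with --details); the function names and the 50% default threshold are illustrative choices, the latter taken from the 40-50% guideline for adding nodes.

```python
from typing import List

# Sample text copied from the example output in this section.
SAMPLE = r"""NodeNO Totalsize Freesize Usage NReads NWrites DeviceName
0      128       120      6%    10000  5000    C:\Sun\SUNWhadb\hadb.data.0
1      128       124      3%    10000  5000    C:\Sun\SUNWhadb\hadb.data.1
2      128       126      2%     9500  4500    C:\Sun\SUNWhadb\hadb.data.2
3      128       126      2%     9500  4500    C:\Sun\SUNWhadb\hadb.data.3
"""


def parse_deviceinfo(text: str) -> List[dict]:
    """Parse 'hadbm deviceinfo --details' style output into dictionaries."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        # Split into at most 7 fields so a device name keeps any spaces.
        f = line.split(None, 6)
        rows.append({
            "node": int(f[0]),
            "total_mb": int(f[1]),
            "free_mb": int(f[2]),
            "usage_pct": int(f[3].rstrip("%")),
            "reads": int(f[4]),
            "writes": int(f[5]),
            "device": f[6].strip(),
        })
    return rows


def nodes_below_free(rows: List[dict], min_free_fraction: float = 0.5) -> List[int]:
    """Return node numbers whose free fraction is under the threshold."""
    return [r["node"] for r in rows
            if r["free_mb"] / r["total_mb"] < min_free_fraction]


rows = parse_deviceinfo(SAMPLE)
print(nodes_below_free(rows))
```

With the sample data every node has well over half of its device free, so no node is flagged.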

Getting Runtime Resource Information

The hadbm resourceinfo command displays HADB runtime resource information. You can use this information to help identify resource contention and reduce performance bottlenecks. For details, see Tuning HADB in Sun Java System Application Server 9.1 Performance Tuning Guide.

The command syntax is:

hadbm resourceinfo  [--databuf]  [--locks]  [--logbuf]  [--nilogbuf]  
[--adminpassword=password | --adminpasswordfile=file]  
[--agent=maurl]  
[dbname]

The dbname operand specifies the database name. The default is hadb.

The following table describes the hadbm resourceinfo special command options. See General Options for a description of other command options.

For more information, see hadbm-resourceinfo(1).

Table 3–15 hadbm resourceinfo Command Options


Option	Description
--databuf -d	Display data buffer pool information. See Data Buffer Pool Information below for more information.
--locks -l	Display lock information. See Lock Information below for more information.
--logbuf -b	Display log buffer information. See Log Buffer Information below for more information.
--nilogbuf -n	Display node internal log buffer information. See Node Internal Log Buffer Information below for more information.

Data Buffer Pool Information

Data buffer pool information contains the following:

NodeNo: Node number.
Avail: Total space available in the pool, in MBytes.
Free: Free space available, in MBytes.
Access: Cumulative number of accesses to the data buffer, from database start until now.
Misses: Cumulative number of page faults that have occurred from database start until now.
Copy-on-Write: Cumulative number of pages copied internally in the data buffer due to checkpointing.

When a user transaction performs an operation on a record, the page containing the record must be in the data buffer pool. If it is not, a miss or a page fault occurs. The transaction then has to wait until the page is retrieved from the data device file on the disk.

If the miss rate is high, increase the data buffer pool. Since the misses are cumulative, run hadbm resourceinfo periodically and use the difference between two runs to see the trend of miss rate. Do not be concerned if free space is very small, since the checkpointing mechanism will make new blocks available.

Example 3–17 Example data buffer pool information

For example:

NodeNO Avail Free Access Misses Copy-on-Write
0      256   128  100000 50000  1000
1      256   128  110000 45000  950
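
Because Access and Misses are cumulative, the miss rate for a given period comes from the difference between two runs of hadbm resourceinfo. The following is a minimal sketch of that calculation; the function name is hypothetical, and treating a decreasing counter as a reset (for example after a node restart) is an assumption, not documented HADB behavior.

```python
from typing import Optional


def interval_miss_rate(prev_access: int, prev_misses: int,
                       curr_access: int, curr_misses: int) -> Optional[float]:
    """Miss rate between two samples of the cumulative counters.

    Returns None when no accesses occurred in the interval or when a
    counter went backwards (assumed here to mean the counters were reset).
    """
    accesses = curr_access - prev_access
    misses = curr_misses - prev_misses
    if accesses <= 0 or misses < 0:
        return None
    return misses / accesses


# Hypothetical samples for one node, taken some minutes apart.
print(interval_miss_rate(100000, 50000, 110000, 52000))
```

In this made-up sample, 2000 of the 10000 accesses in the interval were misses. Comparing such interval rates over time shows the trend more clearly than the lifetime totals do.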

Lock Information

Lock information is as follows:

NodeNo: Node Number.
Avail: Total number of locks available on the node.
Free: Number of free locks.
Waits: Number of transactions waiting to acquire locks. This is cumulative.

A single transaction cannot use more than 25% of the available locks on a node, so take this limit into account when designing transactions that operate on many records. It is best to perform such work in batches, with each batch committed as a separate transaction. This is needed because read operations running at the repeatable read isolation level, as well as delete, insert, and update operations, hold locks that are released only when the transaction terminates.

To change the NumberOfLocks, see Clearing and Archiving History Files.
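
The 25% limit suggests an upper bound on how much work one batch should do. The following is a minimal sketch of sizing batches from the Avail value reported by hadbm resourceinfo --locks. The function names are hypothetical, and locks_per_row is an assumed figure: this section does not state how many locks an operation takes, so measure or estimate it for your workload.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def max_batch_size(available_locks: int,
                   locks_per_row: int = 1,
                   limit_fraction: float = 0.25) -> int:
    """Largest number of rows one transaction can touch under the lock limit.

    locks_per_row is an assumption, not a documented HADB value.
    """
    return int(available_locks * limit_fraction) // locks_per_row


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items; commit after processing each one."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


size = max_batch_size(50000)              # Avail value from the lock example
work = list(range(30000))                 # hypothetical row identifiers
print(size, [len(b) for b in batches(work, size)])
```

In practice a batch size somewhat below the computed maximum leaves room for locks held by other concurrent transactions.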

Example 3–18 Example lock information

For example:

NodeNO Avail Free  Waits
0      50000 20000 10
1      50000 20000 0

Log Buffer Information

Log buffer information is:

NodeNo: Node Number
Available: amount of memory allocated for the log buffer in MB
Free: amount of free memory in MB

Do not worry if free space is very small, since HADB starts compressing the log buffer. HADB starts compression from the head of the ring buffer and performs it on consecutive log records. Compression cannot proceed when HADB encounters a log record that has not been executed by the node and received by the mirror node.

Example 3–19 Example of log buffer information

For example:

NodeNO Avail Free
0      16    2
1      16    3

Node Internal Log Buffer Information

Node internal log buffer information is:

NodeNo: Node Number
Available: amount of memory allocated for the log device in MB
Free: amount of free memory in MB

Example 3–20 Example of internal log buffer information

For example:

NodeNO Avail Free
0      16    2
1      16    3