Sun Java System Application Server 7 2004Q2 Update 1 Standard and Enterprise Edition Troubleshooting Guide 

Chapter 6
Administration Problems

This chapter discusses problems that you may encounter while administering the Application Server. Full reference material and instructions for performing administration tasks can be found in the Administrator’s Guide and Administrator’s Guide to Security.

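The following sections are contained in this chapter:

  • Server Logs
  • General Problems
  • Command-Line Interface Problems
  • Graphical Interface Problems
  • Monitoring Problems
  • Authentication/Authorization Problems
  • HADB Administration Problems
  • Cluster Administration Problems
  • Common Administration and Recovery Actions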


Server Logs

This section covers:

Application Server logs

The Application Server collects and stores event information in two log files which are located in the logs directory:

Log entries can also be directed to another log file as specified by the administrator. In addition, each virtual server within an Application Server instance has its own identity and can have its own log file.

The following components and subsystems can utilize selective logging of server messages:

Extensive information on how these logs work and the information gathered in them is available in the Using Logging chapter of the Sun Java System Application Server Administrator’s Guide. Log levels are also described in the online help of the Administration interface.

HADB History Files

Inspection of the history files is a common procedure described under Examining the HADB history files.


General Problems

This section covers the following problems:

All administrative operations fail

A message like the following appears while performing administrative operations:

Manual changes to [server1] configuration detected.
Please reconfigure (Keep or Discard manual changes).

All subsequent operations then fail, except for the operation that gets you out of this situation.

This problem arises when users edit any of the major configuration files manually, without using the administrative interfaces (Admin Console or CLI).

Explanation

In reality, there are two sets of configuration files, one used by the server, and one that the administrative interfaces manipulate. The existence of two files requires the system to “apply” any changes you make using an administrative interface. For example, if you create a resource with the asadmin create-resource command, the server does not see that resource until you reconfigure the server.

The administrative server continuously examines the following configuration files of a server instance in <instance_dir>/config:

If any of these files is manually modified, the administrative server blocks users from doing any administrative tasks, to prevent any possibility of conflict between the manual modification and changes made through administrative interface.

Important Note:
The administration server uses the files in the <instance_dir>/config/backup directory, rather than the configuration files that the server runtime uses, in <instance_dir>/config. Never modify the backup files by hand. If you happen to change the actual configuration files listed above, you will have to use the reconfigure command in asadmin.

Solution

To resolve this situation, take the following actions. They are presented in order of safety, so each one requires an increased degree of caution.

  1. Be absolutely sure that the manual modification is correct from both an XML validation point of view (it should resolve correctly against the DTD) and a general verification point of view (for example, you have not set an undeployed web module as the default web module of a virtual server).
  2. Start asadmin in multi or single mode. In single mode you generally need to specify administrative server credentials on the command line.
  3. Issue one of the following reconfigure commands:
    • Issue reconfigure --keepmanualchanges server-id
      where server-id is the target server whose configuration was modified. This command will change the administration server's view of the server's configuration and modify itself to use the manual changes. Use this option only if you are absolutely sure that your manual changes were correct. Both the servers will then be in sync.
    • Or issue reconfigure --discardmanualchanges server-id
      where server-id is the target server whose configuration was modified. This command changes the target server's view of its own configuration. In most cases, the administration server will notify the target server (if running) of the changes, and the changes will be dynamically applied. The administration server and target server will then be in sync.
  4. If you are absolutely sure about the changes that you made to the real configuration files, you can take the following actions, but they are not recommended. Use them only as a last resort, and only if you are a power user:
    • Stop both the administrative server and target server.
    • Overwrite the modified files (for example, server.xml and init.conf) in the <instance_dir>/config/backup directory with the files in the <instance_dir>/config directory.
    • Delete <instance_dir>/config/backup/server.xml.timestamp
      and <instance_dir>/config/backup/init.conf.timestamp.


Command-Line Interface Problems

This section discusses problems that you may encounter while using the command-line interface of the Application Server.

Can’t access the command-line utility.

After installing the Application Server software, you will need to configure your environment to include the bin directory of the Application Server if you are going to do any of the following:

Solution

Add the install_dir/bin directory to your PATH environment variable. If you are not familiar with the process of setting environment variables, refer to the post-installation instructions in the Sun Java System Application Server Installation Guide.
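
As a minimal sketch of this step for a POSIX-style shell, the function below prepends a directory to PATH only when it is not already listed. The function name add_to_path and the example location in the comment are illustrative assumptions; substitute your own install_dir.

```shell
# Prepend a directory to PATH unless it is already listed.
# add_to_path is a hypothetical helper name, not part of the product.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put the directory first
    esac
    export PATH
}

# Example (the install location is an assumption):
# add_to_path /opt/SUNWappserver7/bin
```

To make the setting permanent, the call would typically go in your shell startup file. Checking before prepending keeps PATH from growing each time the file is sourced.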


Note

If your Admin Server is running under SSL, the --secure flag must be used.


Can’t access the Application Server man pages.

For the Solaris unbundled version of the product, you will not be able to access the man pages until you add install_dir/man to the MANPATH environment variable.

Solution

Add install_dir/man to your MANPATH environment variable.


Graphical Interface Problems

This section discusses problems that you may encounter while using the Administration interface of the Application Server.


Note

If your Admin Server is running under SSL, https://... must be used for browser access.


This section addresses the following issues:

Can’t access the Administration interface.

If the connection was refused when attempting to invoke the graphical Administration interface, it is likely that the Admin Server is not running.

Solution

Refer to Connection Refused when accessing the Admin Server for information on troubleshooting this problem.

Can’t undo accidental “changes.”

If an instance has been flagged for Apply Changes Required, and you decide NOT to make changes (perhaps the changes were a mistake and you want to forget the whole thing), there is no obvious method to unset the Apply Changes Required condition. Clicking Apply Changes seems to be forced at this point.

Shutting down your browser, restarting the Application Server instance, and so on does NOT clear the Apply Changes flag. You are still prompted to apply the changes (since the backup configuration file is different from the current and applied configuration file).


Monitoring Problems

This section covers:

Load Balancer Plug-in isn’t being monitored

Logging for the load balancer plug-in is not automatically turned on. To turn on load balancer plug-in log messages:

  1. Set the web server logging level to DEBUG.
  2. Set the value of the require-monitor-data property to true. For example:

    <property name="require-monitor-data" value="true" />


    Tip

    When logging is enabled on the load balancer plug-in, the load balancer writes HTTP session IDs in the web server log files. Therefore, if the web server hosting the load balancer plug-in is located in the DMZ, we recommend that you do not use the DEBUG or similar log level in production environments. If you must use the DEBUG logging level, then you should turn off load balancer logging by setting the require-monitor-data property to false in the loadbalancer.xml file.


For more information, refer to the Configuring Load Balancer chapter of the Sun Java System Application Server Administrator’s Guide.


Authentication/Authorization Problems

This section addresses the following problems:

Don’t know the admin username/password

You don’t have the admin username or password you need to administer the system.

Solution 1

Try the user name admin. This is the default user name specified in the server configuration dialog during installation. Typical passwords are adminadmin or administrator.

Solution 2

Examine the following file (assuming your admin server is under domain1):

Solaris: /sun/appserver7/domains/domain1/admin-server/config/admpw

Windows 2000: D:\Sun\AppServer7\domains\domain1\admin-server\config\admpw

The file consists of a single line such as:

admin:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=

The first field (before the colon) is the user name and the second field is the encrypted password. Although you can’t read the password, you can see the username, which may jog your memory.

Note:
The config directory that contains the admpw file can be accessed only by the user who installed the product.
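
Reading the user name can be sketched as a one-line helper that prints everything before the first colon of the file. The name admpw_user is a hypothetical helper, not a product command, and you pass it the path to your own admpw file.

```shell
# Print the user name stored in an admpw file (the text before the first colon).
# admpw_user is a hypothetical helper name; pass the path to your admpw file.
admpw_user() {
    cut -d: -f1 "$1"
}

# Example (Solaris path from this section):
# admpw_user /sun/appserver7/domains/domain1/admin-server/config/admpw
```

Run it as the user who installed the product, since only that user can read the config directory.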

Solution 3

Change the username and reset the password to nothing by modifying the admpw file described in Solution 2 (assuming your admin server is under domain1).

To change the username, type it in place of the existing name. To reset the password, delete all text after the colon. Then save the file and restart the admin server. You can now log in to the admin UI using the specified username, with no password. You should then immediately set a new password by navigating to Admin Server -> Security -> Access Control.
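
The edit can also be scripted. The sketch below assumes the file holds the single user:password line described in Solution 2; it keeps a copy of the original next to it and then writes the chosen user name followed by an empty password field. The helper name reset_admpw and the .bak suffix are assumptions made for illustration.

```shell
# Rewrite an admpw file so it holds the given user name and an empty password.
# reset_admpw is a hypothetical helper; the original is kept as <file>.bak.
reset_admpw() {
    file=$1
    user=$2
    cp "$file" "$file.bak" || return 1   # stop if the backup cannot be made
    printf '%s:\n' "$user" > "$file"     # user name, colon, nothing after it
}

# Example: reset_admpw /path/to/admpw admin
```

Restart the admin server afterward and set a new password immediately, as described above.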

Solution 4

Delete the administrative domain and recreate it with a new password.

Solution 5

As a last resort, uninstall and reinstall the Application Server.

Don’t know the Admin Server port number

If you do not know the HTTP server port number of the Admin Server, you can inspect the Admin Server's configuration file to determine the HTTP server port number:

  1. Navigate to domain_config_dir/domain1/admin-server/config/ and open the server.xml file in a text editor.
  2. Look for the following element:

    http-listener id="http-listener-1" address="0.0.0.0" port="4848"...

    Here, port 4848 is the HTTP port number used by the admin server.
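
The same lookup can be sketched as a short shell helper. It assumes each http-listener start tag and its port attribute sit on one line, as in the element shown above; the name listener_ports is hypothetical.

```shell
# Print the port attribute of every http-listener element in a server.xml file.
# listener_ports is a hypothetical helper; it assumes the start tag and its
# port attribute are on the same line.
listener_ports() {
    sed -n 's/.*<http-listener[^>]*[[:space:]]port="\([0-9][0-9]*\)".*/\1/p' "$1"
}

# Example: listener_ports domain_config_dir/domain1/admin-server/config/server.xml
```

If the file defines several listeners, one port is printed per matching line.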

Connection Refused when accessing the Admin Server

If the connection is refused when attempting to access the Admin Server with your browser, it is likely that the Admin Server is not running.

Solution

Start the Admin Server, if you have not already done so, using the instructions in “Starting and Stopping the Server” in the Installation Guide. Otherwise consult the Admin Server log file to determine why it failed to start, as described in “Using Logging” in the Administrator’s Guide.

Can’t import the certificate for my server.

Consider the following:

Has the trust database been created?

If you haven't created the trust database in Sun Java System Application Server, you need to do that.

Solution

In the Security page of the Administration interface, click the Manage Database tab and create the trust database by entering its password.

Was the certificate generated with the right tool?

The Application Server supports NSS databases only, so certutil and openssl are compatible tools. You can’t use certificates generated by keytool directly on the Application Server.

Solution

Generate the certificate with certutil or openssl.

The server does not recognize my certificate.

There are three certificates involved in client certificate authentication.

  1. First is the server certificate with which you will enable security in the server instance. This must be installed in the server as a Certificate for “This server.”
  2. Second is the client certificate which you will install in the browser to authenticate yourself to the server when client-cert authentication is enabled.
  3. Third is the server certificate chain which links the prior two certificates. This must be installed in the server instance as the certificate for “Server certificate chain.” If this certificate is not installed on the server instance, the instance doesn't know which client certificate to authenticate.

Solution

Verify that all the certificates have been implemented correctly. Be sure that you implement the chain in #3 and that the ROOT Certificate Authority (CA) is trusted.

LDAP authentication/authorization is not working.

In order for the Application Server to use an LDAP-based directory server for authentication and authorization, the security realm must be configured and the LDAP realm must be activated.

Solution

  1. In the left pane of the Administration interface, expand the server1/Security/Realms/ldap tree.
  2. In the right pane, verify that the Classname field contains the following information:

    com.iplanet.ias.security.auth.realm.ldap.LDAPRealm

    This class is the interface between the Application Server and the LDAP-based directory server.

  3. Click Properties to display the pane for configuring specifics for the Directory Server implementation. Enter data similar to the following:
    • Name: directory Value: ldap://localhost:389
    • Name: base-dn Value: dc=sun,dc=com
    • Name: jaas-context Value: ldapRealm
  4. In the left pane of the Administration interface, expand the server1/Security hierarchy and change the Default Realm to ldap.
  5. Apply Changes and Restart your instance as prompted.


HADB Administration Problems

In the Sun Java System Application Server 7, Enterprise Edition, the hadbm command, with its many subcommands and options, is provided for administering the high-availability database (HADB). A summary of the hadbm commands is contained in Summary of High Availability Commands.

The hadbm command is located in the install_dir/SUNWhadb/4/bin directory.

Refer to the chapter on Configuring the High Availability Database in the Sun Java System Application Server Administrator’s Guide for a full explanation of this command. Specifics on the various hadbm subcommands are explained in the hadbm man pages.

The following problems are addressed in this section:

hadbm command fails: host unreachable.

The command fails with the error, “Host unreachable: <hostname>”.

The host could be unreachable either because it is down, or because the communication pathway has not been established. To isolate the problem, consider the following:

Is the host up and running?

If the remote host isn’t running or can’t accept connections, attempts to access it will fail.

Solution

Try pinging the host to see if it is up and running, ready to accept communications:

ping <hostname>

Is RSH or SSH set up and running?

The communication pathway must be established before the hadbm command can succeed.

Solution

The hadbm commands will not work if host communication has not been set up. That is, the HADB nodes must have been configured for Remote Shell (RSH) or Secure Shell (SSH). Refer to “Preparing for HADB setup” in the Installation Guide for guidelines on verifying RSH and SSH.

If the verification does not work, remote communication for the cluster has not been set up correctly. Instructions for doing this are contained in the Setting Up Host Communication section of the Sun Java System Application Server Installation Guide.

Are the SSH binaries in the proper location?

When using SSH, the relevant binaries must be in the proper location.

Solution

Make sure that the ssh and scp binaries are in /usr/bin.
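
A quick way to run this check on each host is sketched below. The helper name check_binaries is hypothetical; it reports each program as found or missing and returns a non-zero status if any is missing.

```shell
# Report whether each named program exists as an executable in a directory.
# check_binaries is a hypothetical helper; it returns non-zero if any is missing.
check_binaries() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "found: $dir/$name"
        else
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_binaries /usr/bin ssh scp
```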

Is your communication protocol configured properly?

Your communication protocol (RSH/SSH) must be configured properly.

Solution

If you are using clsetup, and you plan to use RSH for your communication, make sure you uncomment the following line in the clresource.conf file:

set managementProtocol=rsh

If you are using SSH, make sure you closely follow all the SSH configuration steps contained in the Sun Java System Application Server Installation Guide.

hadbm command fails: command not found

The hadbm command can be run from the current directory or you can set the search PATH to access the hadb commands from anywhere, which is much more convenient. The error, “hadbm: Command not found”, indicates that neither of these conditions has been met.

Solution 1

You can cd to the directory that contains the hadbm command and run it from there:

cd <install_dir>/SUNWhadb/4/bin/
hadbm

Solution 2

You can use the full path to invoke the hadbm command:

<install_dir>/SUNWhadb/4/bin/hadbm

Solution 3

You can use the hadbm command from anywhere by setting the PATH variable. Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.

To verify that the PATH settings are correct, run the following commands:

which asadmin
which hadbm

These commands should echo the paths to the utilities.

hadbm command fails: JAVA_HOME not defined

The message “Error: JAVA_HOME is not defined correctly” indicates that the JAVA_HOME environment variable has not been set properly.

If multiple Java versions are installed on the system, you must ensure that the JAVA_HOME environment variable points to the correct Java version (1.4.1_03 or above for Enterprise Edition).

Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
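
Before rerunning hadbm, a basic sanity check of the variable can be sketched as follows. The helper name java_home_ok is hypothetical. It confirms only that JAVA_HOME is set and contains an executable bin/java; it does not check the version.

```shell
# Succeed only if JAVA_HOME is set and contains an executable bin/java.
# java_home_ok is a hypothetical helper name.
java_home_ok() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable found at $JAVA_HOME/bin/java"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

If the check passes, running "$JAVA_HOME/bin/java" -version shows whether that installation meets the version requirement stated above.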

create fails: “path does not exist on a host”

After issuing the hadbm create command, an error like the following appears on the console:

./hadbm create ...
...
hadbm:Error 22022: Specified path does not exist on a host. Please specify a valid path: [ machineName ... ]

This error message indicates that the HADB server component is not installed on the machine on which you are trying to create the HA database.

Solution

Install the HADB server in the <install_dir> directory, and run the command again.


Note

HADB executables cannot be installed on different paths on different hosts.


database doesn’t start.

The create or start command fails with the console error message:

hadbm: Error 22095: Database could not be started...

Consider the following possibilities:

Was there a shared memory get segment failure?

If the history files show the error message:

..'systemerr'..HADB-S-01760: Shared memory get segment failed..

Solution 1: Use sync;sync and reboot instead of init 6

The hadbm create command can fail with this error after you make changes to /etc/system and reset the system with the init 6 command.

Instead of restarting the machine with init 6, run sync;sync as the root user and then reboot.

Solution 2: Increase the amount of shared memory

There may not be as much shared memory as the HADB needs. The amount of shared memory required by HADB depends on parameters such as DataBufferPoolSize and LogBufferSize. Look in the /etc/system file and set shmsys:shminfo_shmmax to the maximum value possible (the preferred value is 0xffffffff).

Verify that other shared memory settings are configured correctly. After making your changes, issue the hadbm stop command and (for Solaris) reboot the machine. (For Linux, rebooting is not necessary.)

For more information on the mechanics of configuring shared memory, consult the chapter, “Preparing for HADB Setup” in the Sun Java System Application Server Installation Guide. For guidelines on choosing the best settings, consult the Performance Tuning Guide.

Solution 3: Verify /etc/system settings

Verify the settings in the system file. Even a single mistyped character can create problems.
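
To make typos easier to spot, it can help to print only the relevant lines of the file. The sketch below assumes the settings appear as lines beginning with "set shmsys:" or "set semsys:", as with the shmsys:shminfo_shmmax setting mentioned above. The helper name show_ipc_settings is hypothetical.

```shell
# Print the shared-memory and semaphore "set" lines from a system file.
# show_ipc_settings is a hypothetical helper; the default path is /etc/system.
show_ipc_settings() {
    grep -e '^[[:space:]]*set[[:space:]][[:space:]]*shmsys:' \
         -e '^[[:space:]]*set[[:space:]][[:space:]]*semsys:' \
         "${1:-/etc/system}"
}
```

A mistyped line, such as a misspelled module name, will not match either pattern, so a setting that you expect but do not see in the output is worth a closer look.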

Solution 4: Resolve conflicts

Use ipcs to see if there are any shared memory segments or semaphores occupied unnecessarily by you or the other users. Use ipcrm to free them and then try starting the database.

Solution 5: Increase the number of semaphores

If the problem persists, then the operating system may not have enough shared memory or semaphores, etc. Increase them according to the number of nodes you have in the machine. (For details, see the Deployment Guide). Note that after making these changes, you must restart the machine to make them available.

Do the history files contain errors?

If the problem still persists, inspect the history files, as described in Examining the HADB history files.

Some of the more likely error messages to look for are:

Once you’ve verified that none of the above errors have occurred, try the following remedies, in order:

For more information, refer to the Error Message Reference.

Do you need a simple solution?

As a last resort, try the following possible solutions.

Solution 1: Delete the database

Issue the hadbm delete command, and see if that allows the hadbm create to proceed normally.

Solution 2: Reboot the machine.

Sometimes a system reboot is the necessary last resort. Issue hadbm delete, reboot, and then rerun the hadbm create command.

clear command failed

When this command fails, the history files are likely to explain why. See Do the history files contain errors? for instructions on viewing the history files and a list of some common error messages.

create-session-store failed

The asadmin create-session-store command could fail for one of these reasons:

Invalid user name or password

This error occurs when the --dbsystempassword you supplied to the create-session-store command is not the same password as the one given at the time of database creation.

Solution 1

Try the command again with the correct password.

Solution 2

If you can't remember the dbsystem password, you’ll need to clear the database using hadbm clear and provide a new dbsystem password.

SQLException: No suitable driver

The create-session-store command produced the error: SessionStoreException: java.sql.SQLException: No suitable driver.

Solution 1

This error can occur when asadmin is not able to find hadbjdbc4.jar from the AS_HADB path defined in the asenv.conf file in the Application Server config directory.

To solve it, change AS_HADB to point to the location of your HADB installation.

Here is a sample AS_HADB entry from an asenv.conf file:

AS_HADB=/export/home0/hercules/0815/SUNWhadb/4.2.2-17
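
One way to check the setting is sketched below: read AS_HADB from asenv.conf, confirm that it names a directory, and search that directory for hadbjdbc4.jar. The helper name find_hadb_jar is hypothetical. The sketch searches the whole tree rather than assuming where in the HADB installation the jar sits.

```shell
# Read AS_HADB from an asenv.conf file and look for hadbjdbc4.jar beneath it.
# find_hadb_jar is a hypothetical helper; it prints the jar path(s) it finds.
find_hadb_jar() {
    conf=$1
    # Take the last AS_HADB= line and strip optional double quotes.
    hadb_dir=$(sed -n 's/^AS_HADB=//p' "$conf" | tail -n 1 | tr -d '"')
    if [ -z "$hadb_dir" ] || [ ! -d "$hadb_dir" ]; then
        echo "AS_HADB is unset or not a directory: $hadb_dir"
        return 1
    fi
    find "$hadb_dir" -name hadbjdbc4.jar -print
}
```

Empty output with a zero exit status means the directory exists but no hadbjdbc4.jar was found under it, which suggests AS_HADB points at the wrong location.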

Solution 2

This error can also occur if you provide the incorrect value for --storeUrl. To solve that problem, obtain the correct URL using hadbm get jdbcURL.

Attaching shared memory segment fails due to insufficient space

You get an error message like the following:

Attaching shared memory segment with key xx failed,
OS status=12 OS message: Not enough space.

Solution: Increase shared memory

See Solution 2: Increase the amount of shared memory.

Can’t restart the HADB

HADB restart will not work after a double node failure. Additional recovery actions are needed before HADB can be restarted.

Symptoms of a double node failure include:

This problem occurs when mirror HADB host machines have failed or been rebooted, typically after a power outage, or when a machine is rebooted without first stopping the HADB (in a single-machine installation), or when a pair of mirror machines from both Data Redundancy Units (DRUs) are rebooted.

If mirror host machine pairs are rebooted, or if host failures cause an unplanned reboot of one or more mirror host machine pairs, then the mirror nodes on these machines are not available, and the data is likely to be in an inconsistent state, because a record may have been in the process of being committed when the power failed or the reboot occurred.


Tip

To prevent such problems, be sure to use the procedure described in the HADB chapter of the Admin Guide when rebooting as a part of a planned maintenance.


HADB cannot heal itself automatically in such 'double failure' situations, because the part of the data that resided on the pair nodes is lost. In such cases, the hadbm start command does not succeed, and the hadbm status command shows that HADB is in a non-operational state.

Explanation

The HADB does much of its data management in memory, for performance. If both DRUs are rebooted, then the HADB doesn’t have a chance to write its data blocks to disk.

For more information on the DRUs and HADB configuration, see "Administering the High Availability Database" in the Administration Guide, and the Deployment Guide.


Tip

If you notice strange HADB behavior (for example consistent timeout problems) and want to check whether a restart cures the problem, use the hadbm restart command.

When you restart the HADB in that manner, data remains available. On the other hand, if you stop and start HADB in separate operations using hadbm stop and hadbm start, data is unavailable while HADB is stopped.


Solution
  1. In the Admin Guide, in the chapter "Administering the High Availability Database", follow the instructions under, "Recovering from Session Data Corruption".
  2. If other parts of the system are running, take the steps described in Maintaining service while taking the HADB offline.
  3. Verify that the node states show Starting/Recovering, then reset the database by following the instructions in Clearing the database and recreating the session store.

Error: Specified database does not exist

This error message occurs when the management host you are using to issue the hadbm command is different from the management host that was used to create the HADB.

Solution

Use the same host that was used when the HADB was created.

hadbm command doesn’t return control to user.

Many hadbm commands, in particular hadbm set, restart all the nodes of the database in order. If some problem has occurred, then the command may not return.

Solution 1

From another window/shell, look at the history files for all the nodes to see if an error has occurred or if the command is still in progress. Run hadbm status --nodes to see if all the nodes are up and running. If they are not and there appears to be a permanent failure, you will need to cancel the command, and then try running hadbm restart.

Solution 2

If Solution 1 fails, and your command was an attempt to set a configuration value for hadbm, try resetting it back to its old value and see if the database restarts correctly.

If the restart continues to fail, follow the instructions in Clearing the database and recreating the session store to reset the database.

Solution 3

If clearing the database is unsuccessful, you’ll have to delete the database using hadbm delete, recreate it using hadbm create, and then recreate the session store using asadmin create-session-store.


Cluster Administration Problems

In the Sun Java System Application Server 7, Enterprise Edition, you can use the cladmin command to run the following asadmin commands simultaneously on all application server instances in a cluster: start-instance, stop-instance, deploy, undeploy, create-jdbc-resource, create-jdbc-connection-pool, configure-session-persistence, delete-jdbc-resource, delete-jdbc-connection-pool. This simplifies the task of cluster administration.

The cladmin command is located in the install_dir/bin directory. The default location of the cladmin input files, clinstance.conf and clpassword.conf, is /etc/opt/SUNWappserver7.

Refer to the chapter on Using the cladmin Command in the Sun Java System Application Server Administrator’s Guide for a full explanation of this command.

This section addresses the following problems:

Refragmentation of the HADB fails.

The attempt to refragment the HADB failed.

Consider the following possibility:

Is there enough space on the data devices?

Messages like these indicate that refragmentation failed for lack of space on the data devices:

HIGH LOAD: about to run out of device space ...
HIGH LOAD: about to run out of device space on mirror node ...

The problem occurs when data devices are filled beyond 50% or 60% of the available space, which does not leave enough extra space to carry out the refragmentation.


Tip

Monitor your data device usage using the hadbm deviceinfo command.


Solution 1: Make more unreserved blocks available on data devices

Use the df command to see how much space has been used on the machine. To determine the space available for user data, take 99% of the total device size, then subtract 4 times the LogBufferSize. The difference between the total device size and the free size is the user data size. If the data may be refragmented in the future, the user data size should not exceed 50% of the space available for user data. If refragmentation is not relevant, close to 100% may be used.
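
The arithmetic in this rule can be sketched with shell integer arithmetic. The helper name user_data_limits is hypothetical, both arguments must be in the same unit (megabytes are assumed in the example), and results are rounded down because the shell works in whole numbers.

```shell
# Apply the rule above: available = 99% of device size - 4 * LogBufferSize,
# and the limit when refragmentation is planned is 50% of that.
# user_data_limits is a hypothetical helper; pass both sizes in the same unit.
user_data_limits() {
    device_size=$1
    log_buffer_size=$2
    available=$(( device_size * 99 / 100 - 4 * log_buffer_size ))
    echo "available for user data: $available"
    echo "refragmentation limit: $(( available / 2 ))"
}

# Example with assumed sizes in MB: user_data_limits 1024 48
```

For a 1024 MB device and a 48 MB log buffer, 99% of 1024 rounds down to 1013, subtracting 4 x 48 = 192 leaves 821, and half of that rounds down to 410.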

Solution 2

Find out what disk the data devices are on with the hadbm get DevicePath command and check for free space on that disk. If there is room, increase the size of the data devices using the following command:

hadbm set TotalDataDevicePerNode=size

If the data devices cannot accommodate a copy of the user data during refragmentation, then refragmentation will not succeed. If the refragmenting is performed while adding nodes, you will need to delete the database and create a new database including the new nodes. In that case, the data is lost.

The cladmin command is not working.

Consider the following possibilities:

Are the Admin Servers of all the instances in the cluster started?

Before running the clsetup command, all the Admin Servers in the cluster must be running.

Do all the instances in the cluster have same administrator user name and password?

During installation, the installation program creates a clinstance.conf file with entries for two instances. If you add more instances to the cluster, you must add information about these instances in the clinstance.conf file.

Are the input files correct?

The order in which entries appear in the clinstance.conf file is important and must not be changed from the default order. If you add information about more application server instances, entries for these instances must be in the correct order.

Solution

Verify that any changes you have made to the input files follow the format specified in the Sun Java System Application Server Administrator’s Guide.

Are the input files on all instances in the cluster identical?

The values in the input files must be identical on all instances in the cluster. The cladmin command is not designed to set up each instance with different values.

Solution

Verify that the cladmin input files are identical on all instances in the cluster.

Application is not available on the cluster.

Consider the following possibilities:

Did the application deploy successfully to the cluster?

It’s possible that the deploy operation failed. To find out, run this command against each instance in the cluster:

asadmin list-components --type web

Solution

If the application isn’t listed, try redeploying it and look for errors during deployment.


Common Administration and Recovery Actions

This section describes common administrative and recovery procedures that are used in a variety of situations.

This section covers:

Examining the HADB history files

The history files are generally found at their default location, /var/tmp. If they are not at that location, use hadbm get HistoryPath to find the path to the history files.

The history file names are of the form <dbname>.out.<nodenumber>. The default database name is hadb, so for the default database name, the history file for node 0 would be hadb.out.0.

For example, for an HADB instance named failover, with two nodes on the same system, the history file names would be failover.out.0 and failover.out.1.
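
The naming rule can be sketched as a small helper that builds the path for a given database name and node number. The name history_file is hypothetical; the directory defaults to /var/tmp and can be overridden with a third argument when hadbm get HistoryPath reports a different location.

```shell
# Build the history file path for a database name and node number,
# following the <dbname>.out.<nodenumber> pattern described above.
# history_file is a hypothetical helper; the directory defaults to /var/tmp.
history_file() {
    echo "${3:-/var/tmp}/$1.out.$2"
}

# Example: show the end of the history file for node 0 of the default database
# tail -n 50 "$(history_file hadb 0)"
```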


Note

The hadbm utility cleans up all the files it created when hadbm create fails. In that case, the messages about the cause of the error are lost. But if the client machine has a historypath directory (default /var/tmp), then the history files are preserved there when the command fails.

If the historypath directory does not exist, you need to examine the syslogs on the hosts for error and warning messages from HADB. Messages are prefixed with “HADB” (the default syslogprefix value, which can be changed using the create command’s --set option).
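The syslog search can be done with a plain text filter. This is a minimal sketch, assuming a POSIX shell. The function name hadb_messages is hypothetical, and the location of the syslog file depends on your operating system and syslog configuration, so it is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: prints the lines of a syslog file that contain the
# HADB prefix.
#   $1 = path to the syslog file (system dependent)
#   $2 = prefix to search for (default: HADB, the default syslogprefix)
hadb_messages() {
    logfile=$1
    prefix=${2:-HADB}
    grep -F -- "$prefix" "$logfile"
}
```

If you changed the syslogprefix value when you created the database, pass that value as the second argument.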


For additional information on the history files, refer to the Configuring the High Availability Database chapter in the Administrator’s Guide.

Maintaining service while taking the HADB offline

Any command that makes HADB unavailable (such as hadbm clear) causes the application servers to start reporting errors in the error log. Client requests then take a long time to complete, because the application keeps retrying its requests to HADB, which cannot answer while it is unavailable.

You can avoid this situation by disabling session persistence prior to clearing or stopping the database. This procedure takes time, but it lets the system maintain full service of your application(s) while HADB is down.

Perform the following steps:

  1. Disable session persistence by using cladmin to set availability-enabled to false for the cluster. (See the “Session Persistence” section of the Admin Guide for the details of this procedure.)
  2. Restart all your instances using the following approach:
    • Disable half of the instances in your cluster (or as many as you can at a time to maintain the necessary level of service for your application) by marking them as disabled in the load balancer configuration file. (See the load balancing section of the Admin Guide for details.)
    • After the quiescence period has been reached, restart the disabled instances, and then re-enable them in the load balancer.
    • Repeat those steps for the next batch of instances until you have restarted all the instances.
  3. Once HADB is back up and running again, set availability-enabled to true and follow the restart process again.
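Before starting the rolling restart in step 2, it can be useful to write down which instances go into which batch. The following is a minimal sketch, assuming a POSIX shell. It only plans the batches and does not disable, restart, or re-enable anything. The function name plan_batches and the instance names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: splits a list of instance names into batches.
#   $1 = maximum number of instances to take out of service at once
#   remaining arguments = instance names
# Prints one line per batch; it performs no administrative action.
plan_batches() {
    size=$1
    shift
    batch=1
    count=0
    line=""
    for instance in "$@"; do
        line="$line $instance"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "batch $batch:$line"
            batch=$((batch + 1))
            count=0
            line=""
        fi
    done
    # Print the final, partially filled batch, if any.
    if [ -n "$line" ]; then
        echo "batch $batch:$line"
    fi
}
```

For example, plan_batches 2 server1 server2 server3 prints "batch 1: server1 server2" followed by "batch 2: server3". Choose the batch size so that the instances left running can carry your application's load.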

Starting the HADB nodes after rebooting the host machine

When one or more HADB host machines undergo an unplanned reboot, use the hadbm status command to see whether the machines hosted HADB mirror nodes. If they did, then:

Clearing the database and recreating the session store

Clearing the database, restarting it, and recreating the session store is always the quickest way to fix your database. The session data stored in HADB is lost, but active sessions remain available because their data also exists in the application server cache. (The only exception is sessions that have been passivated. They are not in the application server cache and are therefore lost when you clear HADB.)


Tip

If you need to keep servicing user requests, follow the instructions in “Maintaining service while taking the HADB offline”.



Tip

Avoid losing important data.

For transient session data, losing the data is generally not an issue. The problem concerns losing passivated session state, that is, when a session no longer has any data in the Application Server cache and the state has been passivated to the HADB. Sessions are passivated when the number of sessions exceeds the maximum specified in the sun-web.xml file for each application.

To avoid losing data, set the maximum number of sessions to a very high value and configure the application to reject any new session requests beyond that maximum. This prevents passivation, thus avoiding the risk of losing session data if you need to clear the HADB.


  1. Use this command to clear the database, reinitialize all data devices, and recreate all system tables:

    hadbm clear --spares x --dbpassword=tttt smokedb

    where x is the number of spares you originally had, and tttt is the database password.

    This command clears the database; all your old data is lost.

  2. Get the JDBC URL:

    asadmin hadbm get jdbcURL smokedb

  3. Recreate the appserver schemas and set up session persistence:

    asadmin create-session-store \
      --storeurl <jdbc url returned from step 2> \
      --storeuser appservusr --storepassword <password>

Stopping a node when inetd is active

If you have set up inetd, a node is automatically restarted when you issue the stopnode command. The command hadbm status --nodes then shows the node as starting, even though you just stopped it. The node then resumes the running state, but is in the offline role.

To make changes to the host which require a reboot, you need to perform some additional tasks to stop the node:

  1. Comment out the inetd entry for that node from the inetd configuration files (otherwise, the node is automatically restarted as soon as you stop it).
  2. Re-add the entry to the inetd files after you have restarted the node.
  3. Restart inetd by sending a SIGHUP to the process. For example:

    ps -e | grep inetd      (to find the PID)
    kill -HUP <PID_inetd>
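The PID lookup in step 3 can be scripted so that it does not also match the grep process itself or similarly named daemons. This is a minimal sketch, assuming a POSIX shell and the usual ps -e layout in which the PID is the first column and the command name is the last. The function name inetd_pid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -e` output on standard input and prints the
# PID of every process whose command name is exactly "inetd".
# Assumes the PID is the first column and the command is the last column.
inetd_pid() {
    awk '$NF == "inetd" { print $1 }'
}
```

You could then send the signal with kill -HUP "$(ps -e | inetd_pid)". Check that the helper prints exactly one PID before passing its output to kill.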

For additional information, refer to the HADB Setup chapter of the Administrator’s Guide.

Rebooting a machine that has HADB nodes

HADB achieves fault tolerance by replicating data on mirror nodes. Mirror nodes should be placed on separate Data Redundancy Units in a deployment environment. (See the Admin Guide, chapter "Administering the High Availability Database" and the Deployment Guide for details on DRUs and HADB configurations.)


Note

Single machine configurations are recommended only for development and test environments.


HADB tolerates single point failures—the failure of one node, the failure of one machine, the simultaneous failure of multiple machines belonging to the same DRU, or the failure of the whole DRU. However, HADB does not tolerate double failures—simultaneous failure of one or more mirror machine pairs.

For that reason, you need to exercise care when rebooting a machine that has HADB nodes. For a complete description of the procedure to follow, see the Administration Guide, HADB chapter, “Maintaining the HADB machines”.





Copyright 2004 Sun Microsystems, Inc. All rights reserved.