Sun Java System Application Server 7 2004Q2 Update 1 Standard and Enterprise Edition Troubleshooting Guide
Chapter 6
Administration Problems
This chapter discusses problems that you may encounter while administering the Application Server. Full reference material and instructions for performing administration tasks can be found in the Administrator’s Guide and Administrator’s Guide to Security.
The following sections are contained in this chapter:
Server Logs
This section covers:
Application Server logs
The Application Server collects and stores event information in two log files which are located in the logs directory:
Log entries can also be directed to another log file as specified by the administrator. In addition, each virtual server within an Application Server instance has its own identity and can have its own log file.
The following components and subsystems can utilize selective logging of server messages:
Extensive information on how these logs work and the information gathered in them is available in the Using Logging chapter of the Sun Java System Application Server Administrator’s Guide. Log levels are also described in the online help of the Administration interface.
HADB History Files
Inspection of the history files is a common procedure described under Examining the HADB history files.
General Problems
This section covers the following problems:
All administrative operations fail
A message like the following appears while performing administrative operations:
Manual changes to [server1] configuration detected.
Please reconfigure (Keep or Discard manual changes).
All subsequent operations then fail, except for the operation that gets you out of this situation.
This problem arises when users edit any of the major configuration files manually, without using the administrative interfaces (Admin Console or CLI).
Explanation
In reality, there are two sets of configuration files, one used by the server, and one that the administrative interfaces manipulate. The existence of two files requires the system to “apply” any changes you make using an administrative interface. For example, if you create a resource with the asadmin create-resource command, the server does not see that resource until you reconfigure the server.
The administrative server continuously examines the following configuration files in a server instance's <instance_dir>/config directory:
- server.xml - Configuration for all the J2EE subsystems, resources, applications, and so on. One per instance.
- init.conf - Configuration for the core Web Engine. One per instance.
- <virtual-server-id>.obj.conf - Virtual server configuration. One per virtual server, so an instance can have several.
- mime.types - Configuration for the MIME types that the server supports. An instance can have several.
If any of these files is manually modified, the administrative server blocks users from doing any administrative tasks, to prevent any possibility of conflict between the manual modification and changes made through administrative interface.
Important Note:
The administration server uses the files in the <instance_dir>/config/backup directory, rather than the configuration files that the server runtime uses, in <instance_dir>/config. Never modify the backup files by hand. If you happen to change the actual configuration files listed above, you will have to use the reconfigure command in asadmin.
Solution
To resolve this situation, take the following actions. They are presented in order of decreasing safety, so each one requires more caution than the one before.
- Be absolutely sure that the manual modification is correct from both XML validation point of view (it should resolve correctly against the DTD) and general verification point of view (for example, you have not set an undeployed web module as the default web module of a virtual server).
- Start asadmin in multi or single mode. In single mode you generally need to specify the administrative server credentials on the command line.
- Issue one of the following reconfigure commands:
  - Issue reconfigure --keepmanualchanges server-id
    where server-id is the target server whose configuration was modified. This command changes the administration server's view of the server's configuration so that it uses the manual changes. Use this option only if you are absolutely sure that your manual changes were correct. Both servers will then be in sync.
  - Or issue reconfigure --discardmanualchanges server-id
    where server-id is the target server whose configuration was modified. This command changes the target server's view of its own configuration. In most cases, the administration server will notify the target server (if running) of the changes, and the changes will be dynamically applied. The administration server and target server will then be in sync.
- If you are absolutely sure about the changes that you made to the real configuration files, you can take the following actions, but they are not recommended. Use them only as a last resort, and only if you are a power user:
  - Stop both the administrative server and the target server.
  - Overwrite the modified files (for example, server.xml and init.conf) in the <instance_dir>/config/backup directory with the files in the <instance_dir>/config directory.
  - Delete <instance_dir>/config/backup/server.xml.timestamp and <instance_dir>/config/backup/init.conf.timestamp.
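Before choosing between keeping and discarding manual changes, it can help to see which files actually differ. The following is a minimal sketch, not a product tool: it assumes that the backup directory holds same-named copies of the watched files, and the function name changed_config_files is hypothetical.

```shell
#!/bin/sh
# List the watched configuration files in a config directory that differ
# from the same-named copy in its backup subdirectory.
# Assumption: <instance_dir>/config/backup mirrors the file names in
# <instance_dir>/config. The function name is hypothetical.
# Usage: changed_config_files <instance_dir>/config
changed_config_files() {
    config_dir=$1
    for f in "$config_dir"/server.xml "$config_dir"/init.conf \
             "$config_dir"/*.obj.conf "$config_dir"/mime.types; do
        [ -f "$f" ] || continue      # skip absent files and unmatched globs
        name=$(basename "$f")
        backup="$config_dir/backup/$name"
        if [ ! -f "$backup" ]; then
            echo "$name (no backup copy)"
        elif ! cmp -s "$f" "$backup"; then
            echo "$name"             # contents differ from the backup copy
        fi
    done
}
```

An empty result suggests that the runtime files and the backup copies agree. The sketch only reads files, so it does not interfere with the reconfigure commands described above.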
Command-Line Interface Problems
This section discusses problems that you may encounter while using the command-line interface of the Application Server.
Can’t access the command-line utility.
After installing the Application Server software, you will need to configure your environment to include the bin directory of the Application Server if you are going to do any of the following:
Solution
Add the install_dir/bin directory to your PATH environment variable. If you are not familiar with the process of setting environment variables, refer to the post-installation instructions in the Sun Java System Application Server Installation Guide.
Can’t access the Application Server man pages.
For the Solaris unbundled version of the product, you cannot access the man pages until you add install_dir/man to the MANPATH environment variable.
Solution
Add install_dir/man to your MANPATH environment variable.
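As an illustration of both environment settings, here is a minimal Bourne shell sketch. The install location /sun/appserver7 is only a placeholder for your own install_dir, and the helper prepend_once is a hypothetical name, not part of the product.

```shell
#!/bin/sh
# Print "dir:list", or list unchanged when dir is already one of its entries.
# prepend_once is a hypothetical helper name.
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *)          printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

# Placeholder install location; substitute your own install_dir.
install_dir=/sun/appserver7

PATH=$(prepend_once "$install_dir/bin" "$PATH")
export PATH

# Note: if MANPATH was previously empty, it now names only this directory,
# which may hide the system default man directories; add them as needed.
MANPATH=$(prepend_once "$install_dir/man" "${MANPATH:-}")
export MANPATH
```

Placing these lines in your shell startup file makes the settings persistent. Checking for an existing entry first keeps the variables from growing each time the file is sourced.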
Graphical Interface Problems
This section discusses problems that you may encounter while using the Administration interface of the Application Server.
This section addresses the following issues:
Can’t access the Administration interface.
If the connection was refused when attempting to invoke the graphical Administration interface, it is likely that the Admin Server is not running.
Solution
Refer to Can't access the Admin Server for information on troubleshooting this problem.
Can’t undo accidental “changes.”
If an instance has been flagged for Apply Changes Required, and you decide NOT to make changes (perhaps the changes were a mistake and you want to forget the whole thing), there is no obvious method to unset the Apply Changes Required condition. Clicking Apply Changes seems to be forced at this point.
Shutting down your browser, restarting the Application Server instance, and so on does not clear the Apply Changes flag. You are still prompted to apply the changes (since the backup configuration file is different from the current and applied configuration file).
Monitoring Problems
This section covers:
Load Balancer Plug-in isn’t being monitored
Logging for the load balancer plug-in is not automatically turned on. To turn on load balancer plug-in log messages:
- Set the web server logging level to DEBUG.
- Set the value of the require-monitor-data property to true. For example:
<property name="require-monitor-data" value="true" />
For more information, refer to the Configuring Load Balancer chapter of the Sun Java System Application Server Administrator’s Guide.
Authentication/Authorization Problems
This section addresses the following problems:
Don’t know the admin username/password
You don’t have the admin username or password you need to administer the system.
Solution 1
Try the user name admin. This is the default user name specified in the server configuration dialog during installation. Typical passwords are adminadmin or administrator.
Solution 2
Examine the following file (assuming your admin server is under domain1):
Solaris: /sun/appserver7/domains/domain1/admin-server/config/admpw
Windows 2000: D:\Sun\AppServer7\domains\domain1\admin-server\config\admpw
The file consists of a single line such as:
admin:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
The first field (before the colon) is the user name and the second field is the encrypted password. Although you can’t read the password, you can see the username, which may jog your memory.
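If you have a few candidate passwords in mind, you can compare them with the stored field instead of guessing at the login prompt. The sketch below assumes that the {SHA} prefix denotes the Base64 encoding of a binary SHA-1 digest, as in the common LDAP-style convention, and that the openssl command is installed. All function names are hypothetical.

```shell
#!/bin/sh
# Print the user name: the text before the first colon of an admpw line.
admpw_user() {
    printf '%s\n' "${1%%:*}"
}

# Print the stored password field: the text after the first colon.
admpw_hash() {
    printf '%s\n' "${1#*:}"
}

# Hash a candidate password under the assumed {SHA} scheme:
# Base64 of the binary SHA-1 digest. Requires the openssl command.
sha_field() {
    printf '{SHA}%s\n' \
        "$(printf '%s' "$1" | openssl dgst -sha1 -binary | openssl base64)"
}

# Succeed when the candidate password hashes to the stored field.
# Usage: admpw_matches <admpw_line> <candidate_password>
admpw_matches() {
    [ "$(admpw_hash "$1")" = "$(sha_field "$2")" ]
}
```

For example, `admpw_matches "$(cat admpw)" 'candidate' && echo match` prints match only when the candidate is the stored password. Run it as the user who installed the product, since only that user can read the file.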
Note:
The config directory that contains the admpw file can be accessed only by the user who installed the product.
Solution 3
Change the username and reset the password to nothing by modifying the admpw file (assuming your admin server is under domain1):
To change the username, type it in place of the existing name. To reset the password, delete all text after the colon. Then save the file and restart the admin server. You can now log in to the admin UI using the specified username, with no password. You should then immediately set a new password by navigating to Admin Server -> Security -> Access Control.
Solution 4
Delete the administrative domain and recreate it with a new password.
Solution 5
As a last resort, uninstall and reinstall the Application Server.
Don’t know the Admin Server port number
If you do not know the HTTP server port number of the Admin Server, you can inspect the Admin Server's configuration file to determine the HTTP server port number:
Connection Refused when accessing the Admin Server
If the connection is refused when attempting to access the Admin Server with your browser, it is likely that the Admin Server is not running.
Solution
Start the Admin Server, if you have not already done so, using the instructions in “Starting and Stopping the Server” in the Installation Guide. Otherwise, consult the Admin Server log file to determine why it failed to start, as described in “Using Logging” in the Administrator’s Guide.
Can’t import the certificate for my server.
Consider the following:
Has the trust database been created?
If you haven't created the trust database in Sun Java System Application Server, you need to do that.
Solution
In the Security page of the Administration interface, click the Manage Database tab and create the trust database by entering its password.
Was the certificate generated with the right tool?
The Application Server supports only the NSS certificate database, so certutil and openssl are compatible tools. You can’t use certificates generated by keytool directly on the Application Server.
Solution
Generate the certificate with certutil or openssl.
The server does not recognize my certificate.
There are three certificates involved in client certificate authentication.
- First is the server certificate with which you will enable security in the server instance. This must be installed in the server as a Certificate for "This server."
- Second is the client certificate which you will install in the browser to authenticate yourself to the server when client-cert authentication is enabled.
- Third is the server certificate chain which links the prior two certificates. This must be installed in the server instance as the certificate for "Server certificate chain." If this certificate is not installed on the server instance, the instance doesn't know which client certificate to authenticate.
Solution
Verify that all the certificates have been installed correctly. Be sure that you install the certificate chain (the third item above) and that the root Certificate Authority (CA) is trusted.
LDAP authentication/authorization is not working.
In order for the Application Server to use an LDAP-based directory server for authentication and authorization, the security realm must be configured and the LDAP realm must be activated.
Solution
- In the left pane of the Administration interface, expand the server1/Security/Realms/ldap tree.
- In the right panel, verify that the Classname field contains the following information:
com.iplanet.ias.security.auth.realm.ldap.LDAPRealm
This class is the interface between the Application Server and the LDAP-based directory server.
- Click Properties to display the pane for configuring specifics for the Directory Server implementation. Enter data similar to the following:
- In the left pane of the Administration interface, expand the server1/Security hierarchy and change the Default Realm to ldap.
- Apply Changes and Restart your instance as prompted.
HADB Administration Problems
In the Sun Java System Application Server 7, Enterprise Edition, the hadbm command, with its many subcommands and options, is provided for administering the high-availability database (HADB). A summary of the hadbm commands is contained in Summary of High Availability Commands.
The hadbm command is located in the install_dir/SUNWhadb/4/bin directory.
Refer to the chapter on Configuring the High Availability Database in the Sun Java System Application Server Administrator’s Guide for a full explanation of this command. Specifics on the various hadbm subcommands are explained in the hadbm man pages.
The following problems are addressed in this section:
hadbm command fails: host unreachable.
The command fails with the error, “Host unreachable: <hostname>”.
The host could be unreachable either because it is down, or because the communication pathway has not been established. To isolate the problem, consider the following:
Is the host up and running?
If the remote host isn’t running or can’t accept connections, attempts to access it will fail.
Solution
Try pinging the host to see if it is up and running, ready to accept communications:
ping <hostname>
Is RSH or SSH set up and running?
The communication pathway must be established before the hadbm command can succeed.
Solution
The hadbm commands will not work if host communication has not been set up. That is, the HADB nodes must have been configured for Remote Shell (RSH) or Secure Shell (SSH). Refer to “Preparing for HADB setup” in the Installation Guide for guidelines on verifying RSH and SSH.
If the verification does not work, remote communication for the cluster has not been set up correctly. Instructions for doing this are contained in the Setting Up Host Communication section of the Sun Java System Application Server Installation Guide.
Are the SSH binaries in the proper location?
When using SSH, the relevant binaries must be in the proper location.
Solution
Make sure that the ssh and scp binaries are in /usr/bin.
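A quick way to perform this check is sketched below. The function name missing_binaries is hypothetical. It only tests that each named file exists in the given directory and is executable.

```shell
#!/bin/sh
# Print each named binary that is missing, or not executable, in a directory.
# missing_binaries is a hypothetical helper name.
# Usage: missing_binaries <dir> <name>...
missing_binaries() {
    dir=$1
    shift
    for b in "$@"; do
        [ -x "$dir/$b" ] || echo "$b"
    done
}

# Check the location named in this section.
missing=$(missing_binaries /usr/bin ssh scp)
if [ -n "$missing" ]; then
    echo "Not found in /usr/bin: $missing" >&2
fi
```

No output means that both binaries are present in /usr/bin.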
Is your communication protocol configured properly?
Your communication protocol (RSH/SSH) must be configured properly.
Solution
If you are using clsetup, and you plan to use RSH for your communication, make sure you uncomment the following line in the clresource.conf file:
set managementProtocol=rsh
If you are using SSH, make sure you closely follow all the SSH configuration steps contained in the Sun Java System Application Server Installation Guide.
hadbm command fails: command not found
The hadbm command can be run from the current directory or you can set the search PATH to access the hadb commands from anywhere, which is much more convenient. The error, “hadbm: Command not found”, indicates that neither of these conditions has been met.
Solution 1
You can cd to the directory that contains the hadbm command and run it from there:
cd <install_dir>/SUNWhadb/4/bin/
hadbm
Solution 2
Use the full path to invoke the hadbm command:
<install_dir>/SUNWhadb/4/bin/hadbm
Solution 3
You can use the hadbm command from anywhere by setting the PATH variable. Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
To verify that the PATH settings are correct, run the following commands:
which asadmin
which hadbm
These commands should echo the paths to the utilities.
hadbm command fails: JAVA_HOME not defined
The message “Error: JAVA_HOME is not defined correctly” indicates that the JAVA_HOME environment variable has not been set properly.
If multiple Java versions are installed on the system, you must ensure that the JAVA_HOME environment variable points to the correct Java version (1.4.1_03 or above for Enterprise Edition).
Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
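To see what JAVA_HOME currently points to, a check along the following lines may help. It is a sketch under simple assumptions: it treats a directory as a usable JAVA_HOME when it contains an executable bin/java, and the function name java_home_ok is hypothetical. It does not compare version numbers; read the reported version yourself.

```shell
#!/bin/sh
# Succeed when the given directory contains an executable bin/java.
# java_home_ok is a hypothetical helper name.
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if java_home_ok "${JAVA_HOME:-}"; then
    # java -version writes the version string to standard error.
    "$JAVA_HOME/bin/java" -version
else
    echo "JAVA_HOME is unset or does not contain bin/java" >&2
fi
```

Compare the reported version with the minimum stated above before rerunning hadbm.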
create fails: “path does not exist on a host”
After issuing the hadbm create command, an error like the following appears on the console:
./hadbm create ...
...
hadbm:Error 22022: Specified path does not exist on a host. Please specify a valid path: [ machineName ... ]
This error message indicates that the HADB server component is not installed on the machine on which you are trying to create the HA database.
Solution
Install the HADB server in the <install_dir> directory, and run the command again.
database doesn’t start.
The create or start command fails with the console error message:
hadbm: Error 22095: Database could not be started...
Consider the following possibilities:
Was there a shared memory get segment failure?
If the history files show the error message:
..'systemerr'..HADB-S-01760: Shared memory get segment failed..
Solution 1: Use sync;sync and reboot instead of init 6
The hadbm create command can fail with this error after you make changes to /etc/system and reset the system with the init 6 command.
Instead of restarting the machine with init 6, run sync;sync as the root user and then reboot.
Solution 2: Increase the amount of shared memory
There may not be as much shared memory as the HADB needs. The amount of shared memory required by HADB depends on parameters such as DataBufferPoolSize and LogBufferSize. Look in the file /etc/system and set shmsys:shminfo_shmmax to the maximum value possible (the preferred value is 0xffffffff).
Verify that other shared memory settings are configured correctly. After making your changes, issue the hadbm stop command and (for Solaris) reboot the machine. (For Linux, rebooting is not necessary.)
For more information on the mechanics of configuring shared memory, consult the chapter, “Preparing for HADB Setup” in the Sun Java System Application Server Installation Guide. For guidelines on choosing the best settings, consult the Performance Tuning Guide.
Solution 3: Verify /etc/system settings
Verify the settings in the system file. Even a single mistyped character can create problems.
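One way to catch a mistyped entry is to print the value that the file actually assigns. The sketch below assumes the usual Solaris /etc/system syntax, in which an assignment has the form set module:variable=value and comment lines begin with an asterisk. The function name shmmax_setting is hypothetical.

```shell
#!/bin/sh
# Print the value assigned to shmsys:shminfo_shmmax in an /etc/system-style
# file. Comment lines (starting with *) do not match because the pattern
# requires the line to begin with "set". If the variable is assigned more
# than once, the last assignment is printed.
# shmmax_setting is a hypothetical helper name.
shmmax_setting() {
    sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*shmsys:shminfo_shmmax[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" |
        tail -n 1
}

# Usage: shmmax_setting /etc/system
```

An empty result means that no well-formed assignment was found, which usually points to a missing or mistyped line.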
Solution 4: Resolve conflicts
Use ipcs to see if there are any shared memory segments or semaphores occupied unnecessarily by you or the other users. Use ipcrm to free them and then try starting the database.
Solution 5: Increase the number of semaphores
If the problem persists, then the operating system may not have enough shared memory or semaphores, etc. Increase them according to the number of nodes you have in the machine. (For details, see the Deployment Guide). Note that after making these changes, you must restart the machine to make them available.
Do the history files contain errors?
If the problem still persists, inspect the history files, as described in Examining the HADB history files.
Some of the more likely error messages to look for are:
- Shared memory get segment failed
  The system has not been set up with enough shared memory. (Discussed in the previous section.)
- Could not verify node address
  This message occurs when another process is using the port that an HADB server is trying to use. It can occur in several situations:
  - The portBase is used by another process running on this host machine.
    In that case, set the PortBase attribute to another value using the command hadbm set portbase=<value>.
  - You tried to stop the hadb node for maintenance, but that action failed.
    Try again to stop the node with the hadbm command. If that fails, kill the OS process clu_nsup_srv for this node without the -9 option. The nsup process should then stop its hadb child process. If the parent process nsup does not exist, kill all the child processes using kill -9. (For more information, see Is There a Disk Contention?)
  - You stopped the hadb node for maintenance and an inetd process restarted the hadb node before you intended to start it.
    In that case, before you stop the node, make sure that inetd will not restart it, by following the instructions in Stopping a node when inetd is active.
- 'hadbm <command>' fails with internal error:
  "The database could not be started"
Check the following:
- RSH and SSH are set up correctly, and you can communicate with all the machines in the HADB configuration.
- Shared memory is all correct on all machines in the HADB configuration.
- No other HADB databases are running on the machines, or any other processes that could be using the same port numbers.
- All necessary directories exist and have write permissions.
- There is enough space in the directory where the devices are going to be written.
Once you’ve verified that none of the above errors have occurred, try the following remedies, in order:
- Delete the database and retry.
- Delete the database, reboot, and retry.
- Delete the database, reinstall the HADB software, and retry.
- Contact Product Support.
For more information, refer to the Error Message Reference.
Do you need a simple solution?
As a last resort, try the following possible solutions.
Solution 1: Delete the database
Issue the hadbm delete command, and see if that allows the hadbm create to proceed normally.
Solution 2: Reboot the machine.
Sometimes a system reboot is the necessary last resort. Issue hadbm delete, reboot, and then rerun the hadbm create command.
clear command failed
When this command fails, the history files are likely to explain why. See Do the history files contain errors? for instructions on viewing the history files and a list of some common error messages.
create-session-store failed
The asadmin create-session-store command could fail for one of these reasons:
Invalid user name or password
This error occurs when the --dbsystempassword you supplied to the create-session-store command is not the same as the password given when the database was created.
Solution 1
Try the command again with the correct password.
Solution 2
If you can't remember the dbsystem password, you’ll need to clear the database using hadbm clear and provide a new dbsystem password.
SQLException: No suitable driver
The create-session-store command produced the error: SessionStoreException: java.sql.SQLException: No suitable driver.
Solution 1
This error can occur when asadmin cannot find hadbjdbc4.jar in the AS_HADB path defined in the asenv.conf file in the Application Server config directory.
To solve it, change AS_HADB to point to the location of your HADB installation.
Here is a sample AS_HADB entry from an asenv.conf file:
AS_HADB=/export/home0/hercules/0815/SUNWhadb/4.2.2-17
Solution 2
This error can also occur if you provide an incorrect value for --storeUrl. To solve that problem, obtain the correct URL using hadbm get jdbcURL.
Attaching shared memory segment fails due to insufficient space
You get an error message like the following:
Attaching shared memory segment with key xx failed,
OS status=12 OS message: Not enough space.
Solution: Increase shared memory
See Solution 2: Increase the amount of shared memory.
Can’t restart the HADB
HADB restart will not work after a double node failure. Additional recovery actions are needed before HADB can be restarted.
Symptoms of a double node failure include:
This problem occurs when mirror HADB host machines have failed or been rebooted, typically after a power outage, or when a machine is rebooted without first stopping the HADB (in a single-machine installation), or when a pair of mirror machines from both Data Redundancy Units (DRUs) are rebooted.
If mirror host machine pairs are rebooted, or if host failures cause an unplanned reboot of one or more mirror host machine pairs, then the mirror nodes on these machines are not available, and the data is likely to be in an inconsistent state, because a record may have been in the process of being committed when the power failed or the reboot occurred.
Tip
To prevent such problems, be sure to use the procedure described in the HADB chapter of the Admin Guide when rebooting as a part of a planned maintenance.
HADB cannot heal itself automatically in such 'double failure' situations, because the part of the data that resided on the pair nodes is lost. In such cases, hadbm start command does not succeed, and the hadbm status command shows that HADB is in a non-operational state.
Explanation
The HADB does much of its data management in memory, for performance. If both DRUs are rebooted, then the HADB doesn’t have a chance to write its data blocks to disk.
For more information on the DRUs and HADB configuration, see "Administering the High Availability Database" in the Administration Guide, and the Deployment Guide.
Solution
- In the Admin Guide, in the chapter "Administering the High Availability Database", follow the instructions under "Recovering from Session Data Corruption".
- If other parts of the system are running, take the steps described in Maintaining service while taking the HADB offline.
- Verify that the node states show Starting/Recovering, then reset the database by following the instructions in Clearing the database and recreating the session store.
Error: Specified database does not exist
This error message occurs when the management host you are using to issue the hadbm command is different from the management host that was used to create the HADB.
Solution
Use the same host that was used when the HADB was created.
hadbm command doesn’t return control to user.
Many hadbm commands, in particular hadbm set, restart all the nodes of the database in order. If some problem has occurred, then the command may not return.
Solution 1
From another window/shell, look at the history files for all the nodes to see if an error has occurred or if the command is still in progress. Run hadbm status --nodes to see if all the nodes are up and running. If they are not and there appears to be a permanent failure, you will need to cancel the command, and then try running hadbm restart.
Solution 2
If Solution 1 fails, and your command was an attempt to set a configuration value for hadbm, try resetting it back to its old value and see if the database restarts correctly.
If the restart continues to fail, follow the instructions in Clearing the database and recreating the session store to reset the database.
Solution 3
If clearing the database is unsuccessful, you’ll have to delete the database using hadbm delete, recreate it using hadbm create, and then recreate the session store using asadmin create-session-store.
Cluster Administration Problems
In the Sun Java System Application Server 7, Enterprise Edition, you can use the cladmin command to run the following asadmin commands simultaneously on all application server instances in a cluster: start-instance, stop-instance, deploy, undeploy, create-jdbc-resource, create-jdbc-connection-pool, configure-session-persistence, delete-jdbc-resource, delete-jdbc-connection-pool. This simplifies the task of cluster administration.
The cladmin command is located in the install_dir/bin directory. The default location of the cladmin input files, clinstance.conf and clpassword.conf, is /etc/opt/SUNWappserver7.
Refer to the chapter on Using the cladmin Command in the Sun Java System Application Server Administrator’s Guide for a full explanation of this command.
This section addresses the following problems:
Refragmentation of the HADB fails.
The attempt to refragment the HADB failed.
Consider the following possibility:
Is there enough space on the data devices?
Messages like these indicate that refragmentation failed for lack of space on the data devices:
HIGH LOAD: about to run out of device space ...
HIGH LOAD: about to run out of device space on mirror node ...
The problem occurs when data devices are filled beyond 50% or 60% of the available space, which does not leave enough extra space to carry out the refragmentation.
Solution 1: Make more unreserved blocks available on data devices
Use the df command to see how much space has been used on the machine. To determine the space available for user data, take 99% of the total device size, then subtract 4 times the LogBufferSize. The difference between the total device size and the free size is the user data size. If the data may be refragmented in the future, the user data size should not exceed 50% of the space available for user data. If refragmentation is not relevant, close to 100% may be used.
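The arithmetic in this solution can be sketched as follows. The function names and the sample sizes are hypothetical. Both sizes must be given in the same unit (for example, MB), and the shell's integer arithmetic rounds down.

```shell
#!/bin/sh
# Space available for user data: 99% of the total device size,
# minus 4 times the LogBufferSize.
# Usage: available_for_user_data <total_device_size> <log_buffer_size>
available_for_user_data() {
    echo $(( $1 * 99 / 100 - 4 * $2 ))
}

# Largest user data size that still leaves room for refragmentation:
# 50% of the space available for user data.
refragmentation_limit() {
    echo $(( $(available_for_user_data "$1" "$2") / 2 ))
}

# Hypothetical example: a 1024 MB device with a 48 MB log buffer.
available_for_user_data 1024 48    # prints 821
refragmentation_limit 1024 48      # prints 410
```

In this example, user data above 410 MB would leave too little room for a refragmentation, even though 821 MB is available when refragmentation is not relevant.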
Solution 2
Find out what disk the data devices are on with the hadbm get DevicePath command and check for space on that disk. If there is room, increase the size of the data devices using the following command:
hadbm set TotalDataDevicePerNode=size
If the data devices cannot accommodate a copy of the user data during refragmentation, then refragmentation will not succeed. If the refragmenting is performed while adding nodes, you will need to delete the database and create a new database including the new nodes. In that case, the data is lost.
The cladmin command is not working.
Consider the following possibilities:
Are the Admin Servers of all the instances in the cluster started?
Before running the clsetup command, all the Admin Servers in the cluster must be running.
Do all the instances in the cluster have the same administrator user name and password?
During installation, the installation program creates a clinstance.conf file with entries for two instances. If you add more instances to the cluster, you must add information about these instances in the clinstance.conf file.
Are the input files correct?
The order in which entries appear in the clinstance.conf file is important and must not be changed from the default order. If you add information about more application server instances, entries for these instances must be in the correct order.
Solution
Verify that any changes you have made to the input files follow the format specified in the Sun Java System Application Server Administrator’s Guide.
Are the input files on all instances in the cluster identical?
The values in the input files must be identical on all instances in the cluster. The cladmin command is not designed to set up each instance with different values.
Solution
Verify that the cladmin input files are identical on all instances in the cluster.
Application is not available on the cluster.
Consider the following possibilities:
Did the application deploy successfully to the cluster?
It’s possible that the deploy operation failed. To find out, run this command against each instance in the cluster:
asadmin list-components --type web
Solution
If the application isn’t listed, try redeploying it and look for errors during deployment.
Common Administration and Recovery Actions
This section describes common administrative and recovery procedures that are used in a variety of situations.
This section covers:
Examining the HADB history files
The history files are generally found at their default location, /var/tmp. If they are not at that location, use hadbm get HistoryPath to find the path to the history files.
The history file names are of the form <dbname>.out.<nodenumber>. The default database name is hadb, so for the default database name, the history file for node 0 would be hadb.out.0.
For example, for an HADB instance named failover, with two nodes on the same system, the history file names would be failover.out.0 and failover.out.1.
For additional information on the history files, refer to the Configuring the High Availability Database chapter in the Administrator’s Guide.
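The naming rule above, together with a search for messages quoted in this chapter, can be sketched in shell as follows. The function names are hypothetical, and the message list is limited to strings that appear in this chapter. Extend it as needed.

```shell
#!/bin/sh
# Print the history file path for one node of a database.
# Usage: history_file <history_dir> <dbname> <nodenumber>
history_file() {
    printf '%s/%s.out.%s\n' "$1" "$2" "$3"
}

# Print the numbered lines of a history file that contain one of the
# error messages quoted in this chapter (matched as fixed strings).
# Usage: scan_history <history_file_path>
scan_history() {
    grep -n -F \
        -e 'Shared memory get segment failed' \
        -e 'Could not verify node address' \
        -e 'HIGH LOAD: about to run out of device space' \
        "$1"
}

# Example for the default database name and location:
#   scan_history "$(history_file /var/tmp hadb 0)"
```

If the history files are not in /var/tmp, pass the directory reported by hadbm get HistoryPath instead.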
Maintaining service while taking the HADB offline
Any command that makes HADB unavailable (such as hadbm clear) causes the application servers to start reporting errors in the error log. Client requests will then take a long time to get handled as the application continues retrying its requests to HADB, which can’t answer because it is unavailable.
You can avoid this situation by disabling session persistence prior to clearing or stopping the database. This procedure takes time, but it lets the system maintain full service of your application(s) while HADB is down.
Perform the following steps:
- Disable session persistence by using cladmin to set availability-enabled to false for the cluster. (See the “Session Persistence” section of the Admin Guide for the details of this procedure.)
- Restart all your instances using the following approach:
- Disable half of the instances in your cluster (or as many as you can at a time to maintain the necessary level of service for your application) by marking them as disabled in the load balancer configuration file. (See the load balancing section of the Admin Guide for details.)
- After the quiescence period has been reached, restart the disabled instances, and then re-enable them in the load balancer.
- Repeat those steps for the next batch of instances until you have restarted all the instances.
- Once HADB is back up and running again, set availability-enabled to true and follow the restart process again.
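Before starting the rolling restart described in the steps above, it can help to write down which instances go in which batch. The following sketch only prints such a plan; it does not disable or restart anything. The function name plan_batches and the instance names are hypothetical.

```shell
# Sketch: print a rolling-restart plan, one batch of instances per line.
# Usage: plan_batches BATCH_SIZE INSTANCE...
# Disabling, restarting, and re-enabling each batch is still done
# by hand, following the documented procedure.
plan_batches() {
    size=$1
    shift
    batch=""
    count=0
    number=1
    for instance in "$@"; do
        batch="$batch $instance"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "batch $number:$batch"
            batch=""
            count=0
            number=$((number + 1))
        fi
    done
    # Print a final, smaller batch if any instances are left over
    if [ "$count" -gt 0 ]; then
        echo "batch $number:$batch"
    fi
}

# Four instances, half of the cluster at a time:
plan_batches 2 server1 server2 server3 server4
```

Choose the batch size so that the instances that stay enabled can carry the load your application requires.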
Starting the HADB nodes after rebooting the host machine
When one or more HADB host machines undergo an unplanned reboot, use the hadbm status command to see if the machines hosted HADB mirror nodes. If they did, then:
- If the database status is “functional”, then the HADB nodes on the rebooted machines have to be restarted after the host machines have come up. Use hadbm status --nodes to find out which nodes are not running and which machines host those nodes, then start the nodes using hadbm startnode on their host machines.
- If the database status is “non-functional”, see Can’t restart the HADB.
Clearing the database and recreating the session store
Clearing the database, restarting it, and recreating the session store is the quickest way to fix your database. The session data stored in HADB is lost, but active sessions remain available because they also exist in the application server cache. (The only exception is sessions that have been passivated. They are not in the application server cache and are therefore lost when you clear HADB.)
Tip
If you need to keep servicing user requests, follow the instructions in the previous section, “Maintaining service while taking the HADB offline”.
- Use this command to clear the database, reinitialize all data devices, and recreate all system tables:
hadbm clear --spares x --dbpassword=tttt smokedb
where x is the number of spares you originally had, and tttt is the database password.
This command clears the database—all your old data is lost.
- Get the JDBC URL:
hadbm get jdbcURL smokedb
- Recreate the appserver schemas and set up session persistence:
asadmin create-session-store
--storeurl <jdbc url returned from the previous step>
--storeuser appservusr --storepassword <password>
Stopping a node when inetd is active
If you have set up inetd, a node is automatically restarted when you issue the stopnode command. The command hadbm status --nodes then shows the node as starting, even though you just stopped it. The node then resumes the running state, but is in the offline role.
To make changes to the host which require a reboot, you need to perform some additional tasks to stop the node:
- Comment out the inetd entry for that node from the inetd configuration files (otherwise, the node is automatically restarted as soon as you stop it).
- Re-add the entry to the inetd files after you have restarted the node.
- Restart inetd by sending a SIGHUP to the process. For example:
ps -e |grep inetd (to find PID)
kill -HUP <PID_inetd>
For additional information, refer to the HADB Setup chapter of the Administrator’s Guide.
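Commenting out the node's inetd entry, as described in the steps above, could be sketched as follows. This is an illustration under stated assumptions: the function name comment_out_entry is hypothetical, and MARKER stands for whatever text uniquely identifies the node's entry in your configuration file, which this guide does not define. The sketch writes the modified file to standard output so that you can review it before installing it.

```shell
# Sketch: write a copy of an inetd configuration file to standard output
# with every active line that contains MARKER commented out.
# Usage: comment_out_entry FILE MARKER
# MARKER is an assumption: use text that identifies only the node's entry.
comment_out_entry() {
    awk -v marker="$2" '
        # Leave existing comments alone; prefix matching lines with "#"
        index($0, marker) > 0 && $0 !~ /^#/ { print "#" $0; next }
        { print }
    ' "$1"
}

# Example (hypothetical marker and output path):
# comment_out_entry /etc/inetd.conf hadb > /tmp/inetd.conf.new
```

After installing the edited file, inetd still has to reread its configuration, which is what the SIGHUP step above does.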
Rebooting a machine that has HADB nodes
HADB achieves fault tolerance by replicating data on mirror nodes. Mirror nodes should be placed on separate Data Redundancy Units in a deployment environment. (See the Admin Guide, chapter "Administering the High Availability Database" and the Deployment Guide for details on DRUs and HADB configurations.)
HADB tolerates single point failures—the failure of one node, the failure of one machine, the simultaneous failure of multiple machines belonging to the same DRU, or the failure of the whole DRU. However, HADB does not tolerate double failures—simultaneous failure of one or more mirror machine pairs.
For that reason, you need to exercise care when rebooting a machine that has HADB nodes. For a complete description of the procedure to follow, see the Administration Guide, HADB chapter, “Maintaining the HADB machines”.