
Sun ONE Application Server 7, Enterprise Edition Troubleshooting Guide

Chapter 5
Administration Problems

This chapter discusses problems that you may encounter while administering the Sun™ Open Net Environment (ONE) Application Server 7 product. Full reference material and instructions for performing administration tasks can be found in the Sun ONE Application Server Administrator’s Guide and Administrator’s Guide to Security.

The following sections are contained in this chapter:


Server Logs

This section covers:

Application Server logs

The Application Server collects and stores event information in two log files which are located in the logs directory:

Log entries can also be directed to another log file as specified by the administrator. In addition, each virtual server within an Application Server instance has its own identity and can have its own log file.

The following components and subsystems can utilize selective logging of server messages:

Extensive information on how these logs work and the information gathered in them is available in the Using Logging chapter of the Sun ONE Application Server Administrator’s Guide. Log levels are also described in the online help of the Administration interface.

HADB History Files

The HADB history files are named using the format dbname.out.node_number. There should be a file for each node on a host. If there are four nodes on a host locally, then four files are created, each with the node number corresponding to the node.

For example, for an HADB instance named failover, with two nodes on the same system, the history file names would be failover.out.0 and failover.out.1.

The history files for the current server are located in the /var/tmp directory.
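As a minimal sketch of the naming convention above, the following shell function prints the history file paths you would expect on a host. The function name is hypothetical, and it assumes that the local nodes are numbered consecutively from 0, as in the failover example; on a real installation the node numbers on a host may differ.

```shell
# Print the history file path expected for each node on this host.
# Arguments: history directory, database name, number of local nodes.
# Assumption: local nodes are numbered 0 .. nodes-1.
hadb_history_files() {
    dir=$1
    dbname=$2
    nodes=$3
    n=0
    while [ "$n" -lt "$nodes" ]; do
        printf '%s/%s.out.%s\n' "$dir" "$dbname" "$n"
        n=$((n + 1))
    done
}

# Example: hadb_history_files /var/tmp failover 2
```

For the failover example, the call shown in the comment lists /var/tmp/failover.out.0 and /var/tmp/failover.out.1.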

For additional information on the history files, refer to the Configuring the High Availability Database chapter in the Sun ONE Application Server Administrator’s Guide.

HADB logs

The logs associated with high-availability administration include the following:

Some guidelines on using logs:

Device directory location: /var/opt/SUNWhadb

Configuration files location: /etc/opt/SUNWhadb/dbdef


Command-Line Interface Problems

This section discusses problems that you may encounter while using the command-line interface of the Application Server.

Can’t access the command-line utility.

After installing the Application Server software, you will need to configure your environment to include the bin directory of the Application Server if you are going to do any of the following:

Solution

Add the install_dir/bin directory to your PATH environment variable. If you are not familiar with the process of setting environment variables, refer to the post-installation instructions in the Sun ONE Application Server Installation Guide.
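One way to do this in a Bourne-compatible shell is sketched below. The add_to_path helper and the AS_INSTALL_DIR variable are hypothetical names, and /opt/SUNWappserver7 is only an assumed example value for install_dir; substitute your actual installation directory.

```shell
# install_dir is site-specific; /opt/SUNWappserver7 is an assumed example.
AS_INSTALL_DIR="${AS_INSTALL_DIR:-/opt/SUNWappserver7}"

# Prepend a directory to PATH only if it is not already present.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$AS_INSTALL_DIR/bin"
```

Placing these lines in your shell startup file makes the setting persistent. The check for an existing entry keeps PATH from growing each time the file is read.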


Note

If your Admin Server is running under SSL, the --secure flag must be used.


Can’t access the Application Server man pages.

For the Solaris unbundled version of the product, you cannot access the man pages until you add install_dir/man to the MANPATH environment variable.

Solution

Add install_dir/man to your MANPATH environment variable.


Graphical Interface Problems

This section discusses problems that you may encounter while using the Administration interface of the Application Server.


Note

If your Admin Server is running under SSL, https://... must be used for browser access.


This section addresses the following issues:

Can’t access the Administration interface.

If the connection was refused when attempting to invoke the graphical Administration interface, it is likely that the Admin Server is not running.

Solution

Refer to "Can't access the Admin Server." for information on troubleshooting this problem.

Can’t undo accidental “changes.”

If an instance has been flagged for Apply Changes Required, and you decide NOT to make changes (perhaps the changes were a mistake and you want to forget the whole thing), there is no obvious method to unset the Apply Changes Required condition. Clicking Apply Changes seems to be forced at this point.

Shutting down your browser, restarting the Application Server instance, and so on does NOT clear the Apply Changes flag. You are still prompted to apply the changes (since the backup configuration file is different from the current and applied configuration file).

Solution

Copy over the latest updated server.xml file from the backup before updates are applied. This should effectively turn off the Apply Changes Required flag.


Monitoring Problems

This section covers:

Load Balancer Plug-in isn’t being monitored

Logging for the load balancer plug-in is not automatically turned on. To turn on load balancer plug-in log messages:

  1. Set the web server logging level to DEBUG.
  2. Set the value of the require-monitor-data property to true. For example:

    <property name="require-monitor-data" value="true" />


    Tip

    When logging is enabled on the load balancer plug-in, the load balancer writes HTTP session IDs in the web server log files. Therefore, if the web server hosting the load balancer plug-in is located in the DMZ, we recommend that you do not use the DEBUG or similar log level in production environments. If you must use the DEBUG logging level, then you should turn off load balancer logging by setting the require-monitor-data property to false in the loadbalancer.xml file.


For more information, refer to the Configuring Load Balancer chapter of the Sun ONE Application Server Administrator’s Guide.
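To confirm which way the property is currently set, a quick check along the following lines may help. This is a sketch only: the function name is hypothetical, the path to loadbalancer.xml depends on your web server installation, and the pattern assumes that the name and value attributes appear on one line in the order shown in the example above.

```shell
# Succeed if the given loadbalancer.xml sets require-monitor-data to true.
# Assumption: name and value attributes are on one line, in this order.
lb_monitor_data_enabled() {
    grep -q 'name="require-monitor-data"[[:space:]]\{1,\}value="true"' "$1"
}

# Example (the path is site-specific):
#   lb_monitor_data_enabled /path/to/loadbalancer.xml && echo "monitoring data on"
```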


Authentication/Authorization Problems

This section addresses the following problems:

Can’t import the certificate for my server.

Consider the following:

Has the trust database been created?

If you haven't created the trust database in Sun ONE Application Server, you need to do that.

Solution

In the Security page of the Administration interface, click the Manage Database tab and create the trust database by entering its password.

Was the certificate generated with the right tool?

The Application Server supports only the NSS database format, so certutil and openssl are compatible tools. You can’t use certificates generated by keytool directly on the Application Server.

Solution

Generate the certificate with certutil or openssl.

The server does not recognize my certificate.

There are three certificates involved in client certificate authentication.

  1. First is the server certificate with which you will enable security in the server instance. This must be installed in the server as a certificate for “This server.”
  2. Second is the client certificate which you will install in the browser to authenticate yourself to the server when client-cert authentication is enabled.
  3. Third is the server certificate chain which links the prior two certificates. This must be installed in the server instance as the certificate for “Server certificate chain.” If this certificate is not installed on the server instance, the instance doesn't know which client certificate to authenticate.

Solution

Verify that all the certificates have been implemented correctly. Be sure that you implement the chain in #3 and that the ROOT Certificate Authority (CA) is trusted.

LDAP authentication/authorization is not working.

In order for the Application Server to use an LDAP-based directory server for authentication and authorization, the security realm must be configured and the LDAP realm must be activated.

Solution

  1. In the left pane of the Administration interface, expand the server1/Security/Realms/ldap tree.
  2. In the right pane, verify that the Classname field contains the following information:

    com.iplanet.ias.security.auth.realm.ldap.LDAPRealm

    This class is the interface between the Application Server and the LDAP-based directory server.

  3. Click Properties to display the pane for configuring specifics for the Directory Server implementation. Enter data similar to the following:
    • Name: directory Value: ldap://localhost:389
    • Name: base-dn Value: dc=sun,dc=com
    • Name: jaas-context Value: ldapRealm
  4. In the left pane of the Administration interface, expand the server1/Security hierarchy and change the Default Realm to ldap.
  5. Apply Changes and Restart your instance as prompted.


HADB Administration Problems

In Sun ONE Application Server 7, Enterprise Edition, the hadbm command, with its many subcommands and options, is provided for administering the high-availability database (HADB). A summary of the hadbm commands is contained in "Summary of High Availability Commands".

The hadbm command is located in the install_dir/SUNWhadb/4/bin directory.

Refer to the chapter on Configuring the High Availability Database in the Sun ONE Application Server Administrator’s Guide for a full explanation of this command. Specifics on the various hadbm subcommands are explained in the hadbm man pages.

The following problems are addressed in this section:

hadbm command fails: host unreachable.

The command fails with the error, “Host unreachable: <hostname>”.

The host could be unreachable either because it is down, or because the communication pathway has not been established. To isolate the problem, consider the following:

Is the host up and running?

If the remote host isn’t running or can’t accept connections, attempts to access it will fail.

Solution

Try pinging the host to see if it is up and running, ready to accept communications:

ping <hostname>

Is RSH or SSH set up and running?

The communication pathway must be established before the hadbm command can succeed.

Solution

The hadbm commands will not work if host communication has not been set up. That is, the HADB nodes must have been configured for Remote Shell (RSH) or Secure Shell (SSH). Refer to “Preparing for HADB setup” in the Sun ONE Application Server Installation Guide for guidelines on verifying RSH and SSH.

If the verification does not work, remote communication for the cluster has not been set up correctly. Instructions for doing this are contained in the Setting Up Host Communication section of the Sun ONE Application Server Installation Guide.

Are the SSH binaries in the proper location?

When using SSH, the relevant binaries must be in the proper location.

Solution

If you use ssh, make sure that the binaries are in /usr/bin.

hadbm command fails: command not found

The hadbm command can be run from the current directory or you can set the search PATH to access the hadb commands from anywhere, which is much more convenient. The error, “hadbm: Command not found”, indicates that neither of these conditions has been met.

Solution 1

You can cd to the directory that contains the hadbm command and run it from there:

cd install_dir/SUNWhadb/4/bin
./hadbm ...

Solution 2

You can use the hadbm command from anywhere by setting the PATH variable. Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.

To verify that the PATH settings are correct, run the following commands:

which asadmin
which hadbm

These commands should echo the paths to the utilities.
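If you prefer a check that reports missing utilities explicitly, the following sketch uses the POSIX command -v built-in in place of which. The check_on_path function is a hypothetical helper, not part of the product.

```shell
# Report each named command as found (with its path) or missing.
# Returns non-zero if any command is missing from PATH.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if path=$(command -v "$cmd" 2>/dev/null); then
            printf '%s: %s\n' "$cmd" "$path"
        else
            printf '%s: not found on PATH\n' "$cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_on_path asadmin hadbm
```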

hadbm command fails: JAVA_HOME not defined

The message “Error: JAVA_HOME is not defined correctly” indicates that the JAVA_HOME environment variable has not been set properly.

If multiple Java versions are installed on the system, you must ensure that the JAVA_HOME environment variable points to the correct Java version (1.4.1_03 for Enterprise Edition).

Instructions for setting environment variables are contained in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
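A quick sanity check of the current setting might look like the sketch below. The java_home_ok function is a hypothetical helper. It only verifies that JAVA_HOME points at a directory containing an executable bin/java, and does not check the Java version itself.

```shell
# Succeed only if JAVA_HOME is set and contains an executable bin/java.
java_home_ok() {
    [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]
}

# Example:
#   java_home_ok || echo "JAVA_HOME is not defined correctly" >&2
#   "$JAVA_HOME/bin/java" -version    # compare with the required version
```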

create fails: “path does not exist on a host”

After issuing the hadbm create command, an error like the following appears on the console:

./hadbm create ...
...
hadbm:Error 22022: Specified path does not exist on a host. Please specify a valid path: [ machineName ... ]

This error message indicates that the HADB server component is not installed on the machine on which you are trying to create the HA database.

Solution

Install the HADB server on the machine you are creating the HADB on and run the command again.

database doesn’t start.

The create or start command fails with the console error message:

hadbm: Error 22095: Database could not be started...

Consider the following possibilities:

Was there a shared memory get segment failure?

If the history files show the error message:

..'systemerr'..HADB-S-01760: Shared memory get segment failed..

Solution 1: Use sync;sync and reboot instead of init 6

The hadbm create command can fail with this error after you make changes to /etc/system and reset the system with the init 6 command.

Instead of rebooting the machine with init 6, run sync;sync as the root user and then reboot.

Solution 2: Increase the amount of shared memory

There may not be as much shared memory as the HADB needs. The amount of shared memory required by HADB depends on parameters such as DataBufferPoolSize and LogBufferSize. Look in the file /etc/system and set shmsys:shminfo_shmmax to the maximum value possible (the preferred value is 0xffffffff).

Verify that other shared memory settings are configured correctly. After making your changes, issue the hadbm stop command and reboot the machine.

For more information on the mechanics of configuring shared memory, consult the chapter, “Preparing for HADB Setup” in the Sun ONE Application Server Installation Guide. For guidelines on choosing the best settings, consult the Performance Tuning Guide.

Solution 3: Verify /etc/system settings

Verify the settings in the system file. Even a single mistyped character will create problems.
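As one way to spot a mistyped entry, the sketch below prints the value that an /etc/system-style file assigns to shmsys:shminfo_shmmax. The shmmax_setting function is a hypothetical helper. It assumes the usual "set module:variable=value" line format and skips comment lines, which begin with an asterisk.

```shell
# Print the value assigned to shmsys:shminfo_shmmax in the given file.
# Prints nothing if no correctly spelled "set" line is found.
# If the variable is set more than once, the last assignment is printed.
shmmax_setting() {
    sed -n 's/^[[:space:]]*set[[:space:]]\{1,\}shmsys:shminfo_shmmax[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" |
        tail -n 1
}

# Example: shmmax_setting /etc/system
```

An empty result for a file that you believe contains the setting suggests a misspelled variable name or a commented-out line.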

Solution 4: Resolve conflicts

Use ipcs to see if there are any shared memory segments or semaphores occupied unnecessarily by you or the other users. Use ipcrm to free them and then try starting the database.

Solution 5: Increase the number of semaphores

If the problem persists, the operating system may not have enough shared memory or semaphores. Increase them according to the number of nodes on the machine. (For details, see the Deployment Guide.) After making these changes, you must restart the machine for them to take effect.

Do the history files contain errors?

If the problem still persists, look into the HADB history files.

Note:
The hadbm utility cleans up all the files it created when hadbm create fails. In that case, the messages about the cause of the error are lost. But if the client machine has a historypath directory (default /var/tmp), then the history files are preserved there when the command fails.

If the historypath directory does not exist, you need to examine the syslogs on the hosts for error and warning messages from HADB. Messages are prefixed with “HADB” (the default syslogprefix value, which can be changed using the create command’s --set option).

Some of the more likely error messages to look for are:

Once you’ve verified that none of the above errors have occurred, try the following remedies, in order:

For more information, refer to the Error Message Reference.

Do you need a simple solution?

As a last resort, try the following possible solutions.

Solution 1: Delete the database

Issue the hadbm delete command, and see if that allows the hadbm create to proceed normally.

Solution 2: Reboot the machine.

Sometimes a system reboot is the necessary last resort. Issue hadbm delete, reboot, and then rerun the hadbm create command.

clear command failed

When this command fails, the history files are likely to explain why. See "Do the history files contain errors?" for instructions on viewing the history files and a list of some common error messages.

create-session-store failed

The asadmin create-session-store command could fail for one of these reasons:

Invalid user name or password

This error occurs when the --dbsystempassword you supplied to create-session-store command is not the same password as the one given at the time of database creation.

Solution 1

Try the command again with the correct password.

Solution 2

If you can't remember the dbsystem password, you’ll need to clear the database using hadbm clear and provide a new dbsystem password.

SQLException: No suitable driver

The create-session-store command produced the error: SessionStoreException: java.sql.SQLException: No suitable driver.

Solution 1

This error can occur when asadmin is not able to find hadbjdbc4.jar in the AS_HADB path defined in the asenv.conf file in the Application Server config directory.

To solve it, change AS_HADB to point to the location of your HADB installation.

Here is a sample AS_HADB entry from an asenv.conf file:

AS_HADB=/export/home0/hercules/0815/SUNWhadb/4.2.2-17

Solution 2

This error can also occur if you provide an incorrect value for --storeurl. To solve that problem, obtain the correct URL using hadbm get jdbcURL.

“No space left on device” appears in server.log

When the error message “No space left on device” appears at regular intervals in the server.log, it can indicate that the HADB has run out of shared memory. To solve the problem, see "Solution 2: Increase the amount of shared memory".

On the other hand, if the message does not come and go intermittently, then you need to add a device or increase the size of existing devices.

Solution: Determine available space for user data

To determine the space available for user data, take 99% of the total device size, then subtract 4 times the LogBufferSize. The difference between the total device size and the free size is the user data size. If the data may be refragmented in the future, the user data size should not exceed 50% of the space available for user data. If refragmentation is not relevant, close to 100% may be used.
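The calculation above can be sketched in shell arithmetic as follows. The function names are hypothetical, both arguments are assumed to be whole numbers in the same unit (megabytes here), and integer division rounds the results down.

```shell
# Space available for user data: 99% of the device size minus 4 x LogBufferSize.
# Arguments: total device size, LogBufferSize (same unit, MB assumed).
user_data_space() {
    total=$1
    logbuf=$2
    echo $(( total * 99 / 100 - 4 * logbuf ))
}

# Upper bound on stored user data if refragmentation must stay possible (50%).
refrag_safe_limit() {
    echo $(( $(user_data_space "$1" "$2") / 2 ))
}

# Example: user_data_space 1024 48; refrag_safe_limit 1024 48
```

With an illustrative 1024 MB device and a 48 MB LogBufferSize, the first call prints 821 and the second prints 410.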

node status is “starting” after issuing stopnode.

After issuing the stopnode command, hadbm status --nodes shows the node as starting.

This situation occurs if you have set up inetd, because the node is automatically restarted when you stop a node. The result is that the node resumes the running state but is in the offline role.

Therefore, if you have set up inetd, and you want to make changes to the host which requires a reboot, then you need to perform some additional tasks to stop the node:

Solution
  1. Comment out the inetd entry for that node from the inetd configuration files (or the node is automatically restarted as soon as you stop it).
  2. Re-add the entry to the inetd files after you have restarted the node.
  3. Restart inetd by sending a SIGHUP to the process. For example:

    ps -e | grep inetd (to find PID)
    kill -HUP <PID_inetd>

For additional information, refer to the HADB Setup chapter of the Sun ONE Application Server Administrator’s Guide.

Node failure occurred.

If hadbm status shows that a node has stopped, you should first follow the instructions in "Examine the history files" to see what caused the node failure. A common situation is a power failure on the machine where the node resides. (If a power failure hasn’t occurred, examine the logs to find the cause.)


Tip

You may want to restart a node if you notice strange behavior in a node (for example excessive CPU consumption) and want to check whether a restart cures the problem. Use the hadbm restartnode command to restart an HADB node.


Solution

To get the node back up and running, do the following (example for node 2):

Double node failure occurred.

A properly configured HADB installation will handle single node failures. Double node failures (two mirror nodes are down) are not handled—all persistent data are lost. In the case of power failure, data will be consistent and available, as long as there isn’t a double node failure.

When a double node failure occurs, the hadbm status command shows the database as “non-operational”.

Solution

Reset the database by following the instructions in "Clear the database and recreate session store".

Can’t restart the HADB after an ungraceful shutdown.

Situation: On a machine with a running HADB, the machine was shut down without first stopping the database. After restarting the machine, the database does not start. The node status shows that the nodes are in Starting or Recovering state. Even after stopping and then restarting each of the nodes, they remain in the Starting state. Eventually, the node status changes to Stopped.

In the case of a power failure, it is most likely that data is in an inconsistent state, that is, a record may have been in the process of being committed when the power failed.


Tip

If you notice strange HADB behavior (for example consistent timeout problems) and want to check whether a restart cures the problem, use the hadbm restart command.

When you restart the HADB in that manner, data remains available. On the other hand, if you stop and start HADB in separate operations using hadbm stop and hadbm start, data is unavailable while HADB is stopped.


Solution

To rectify the situation, verify that the node states show Starting/Recovering, then reset the database by following the instructions in "Clear the database and recreate session store".

hadbm command doesn’t return control to user.

Many hadbm commands, in particular hadbm set, restart all the nodes of the database in order. If some problem has occurred, then the command may not return.

Solution 1

From another window/shell, look at the history files for all the nodes to see if an error has occurred or if the command is still in progress. Run hadbm status --nodes to see if all the nodes are up and running. If they are not and there appears to be a permanent failure, you will need to cancel the command, and then try running hadbm restart.

Solution 2

If Solution 1 fails, and your command was an attempt to set a configuration value for hadbm, try resetting it back to its old value and see if the database restarts correctly.

If the restart continues to fail, follow the instructions in "Clear the database and recreate session store" to reset the database.

Solution 3

If clearing the database is unsuccessful, you’ll have to delete the database using hadbm delete, recreate it using hadbm create, and then recreate the session store using asadmin create-session-store.


Cluster Administration Problems

In the Sun ONE Application Server 7, Enterprise Edition, you can use the cladmin command to run the following asadmin commands simultaneously on all application server instances in a cluster: start-instance, stop-instance, deploy, undeploy, create-jdbc-resource, create-jdbc-connection-pool, configure-session-persistence, delete-jdbc-resource, delete-jdbc-connection-pool. This simplifies the task of cluster administration.

The cladmin command is located in the install_dir/bin directory. The default location of the cladmin input files, clinstance.conf and clpassword.conf, is /etc/opt/SUNWappserver7.

Refer to the chapter on Using the cladmin Command in the Sun ONE Application Server Administrator’s Guide for a full explanation of this command.

This section addresses the following problems:

Refragmentation of the HADB fails.

The attempt to refragment the HADB failed.

Consider the following possibility:

Is there enough space on the data devices?

Messages like these indicate that refragmentation failed for lack of space on the data devices:

HIGH LOAD: about to run out of device space ...
HIGH LOAD: about to run out of device space on mirror node ...

The problem occurs when data devices are filled beyond 50% or 60% of the available space, which does not leave enough extra space to carry out the refragmentation. (To see how much space has been used on the machine, use the df command. To calculate the amount of space that can be used for user data, see "Solution: Determine available space for user data".)


Tip

Monitor your data device usage using the hadbm deviceinfo command.


Solution 1

If your history files are becoming too large, clear them using the hadbm clearhistory command. This is the simplest solution, so try it first. History files are located in /var/tmp.

Solution 2

Find out what disk the data devices are on with the hadbm get DevicePath command and check for space on that disk. If there is room, increase the size of the data devices using the following command:

hadbm set TotalDataDevicePerNode=size

Solution 3

If your data devices are using more than 50-60% of capacity and you cannot increase the size of your device as suggested above, do one of the following:

Solution 4

If your devices are running at 80% or 90%, and all else fails, follow the instructions in "Clear the database and recreate session store".

The cladmin command is not working.

Consider the following possibilities:

Are the Admin Servers of all the instances in the cluster started?

Before running the clsetup command, all the Admin Servers in the cluster must be running.

Do all the instances in the cluster have same administrator user name and password?

During installation, the installation program creates a clinstance.conf file with entries for two instances. If you add more instances to the cluster, you must add information about these instances in the clinstance.conf file.

Are the input files correct?

The order in which entries appear in the clinstance.conf file is important and must not be changed from the default order. If you add information about more application server instances, entries for these instances must be in the correct order.

Solution

Verify that any changes you have made to the input files follow the format specified in the Sun ONE Application Server Administrator’s Guide.

Are the input files on all instances in the cluster identical?

The values in the input files must be identical on all instances in the cluster. The cladmin command is not designed to set up each instance with different values.

Solution

Verify that the cladmin input files are identical on all instances in the cluster.

Application is not available on the cluster.

Consider the following possibilities:

Did the application deploy successfully to the cluster?

It’s possible that the deploy operation failed. To find out, run this command against each instance in the cluster.

asadmin list-components --type web

Solution

If the application isn’t listed, try redeploying it and look for errors during deployment.


Common Administration and Recovery Actions

This section describes common administrative and recovery procedures that are used in a variety of situations.

This section covers:

Examine the history files

The history files are generally found at their default location, /var/tmp. If they are not at that location, use hadbm get HistoryPath to find the path to the history files.

The history file names are of the form <dbname>.out.<nodenumber>. The default database name is hadb, so for the default database name, the history file for node 0 would be hadb.out.0.
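To pull likely error lines out of all history files for a database in one pass, a sketch such as the following may be useful. The scan_hadb_history function is a hypothetical helper, and the search pattern is an assumption that generalizes from message codes of the form HADB-S-01760 shown elsewhere in this chapter.

```shell
# Print lines carrying an HADB message code (for example HADB-S-01760)
# from every history file of the given database, prefixed by file name.
# Arguments: history directory, database name.
scan_hadb_history() {
    for f in "$1"/"$2".out.*; do
        [ -f "$f" ] || continue      # the pattern matched no file
        # /dev/null as a second operand makes grep print the file name;
        # "|| :" keeps a file without matches from being treated as an error.
        grep 'HADB-[A-Z]-[0-9][0-9]*' "$f" /dev/null || :
    done
}

# Example: scan_hadb_history /var/tmp hadb
```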

Maintain service while taking HADB offline

Any command that makes HADB unavailable (such as hadbm clear) causes the application servers to start reporting errors in the error log. Client requests will then take a long time to get handled as the application continues retrying its requests to HADB, which can’t answer because it is unavailable.

You can avoid this situation by disabling session persistence prior to clearing or stopping the database. This procedure takes time, but it lets the system maintain full service of your application(s) while HADB is down.

Perform the following steps:

  1. Disable session persistence by using cladmin to set availability-enabled to false for the cluster. (See the “Session Persistence” section of the Admin Guide for the details of this procedure.)
  2. Restart all your instances using the following approach:
    • Disable half of the instances in your cluster (or as many as you can at a time to maintain the necessary level of service for your application) by marking them as disabled in the load balancer configuration file. (See the load balancing section of the Admin Guide for details.)
    • After the quiescence period has been reached, restart the disabled instances, and then re-enable them in the load balancer.
    • Repeat those steps for the next batch of instances until you have restarted all the instances.
  3. Once HADB is back up and running again, set availability-enabled to true and follow the restart process again.

Clear the database and recreate session store

Clearing the database, restarting it, and recreating the session store is always the quickest way to fix your database. All session data stored in HADB will be lost, but active session data will still be available because it exists in the application server cache. (The only exception is sessions that have been passivated. They will not be in the application server cache and thus will be lost when you clear HADB.)


Tip

If you need to keep servicing user requests, follow the instructions in the previous section, “Maintain service while taking HADB offline”.



Tip

Avoid losing important data.

For transient session data, losing the data is generally not an issue. The problem concerns losing passivated session state, that is, when a session no longer has any data in the Application Server cache and the state has been passivated to the HADB. Sessions are passivated when the maximum number of sessions exceeds the number specified in sun-web.xml file for each application.

To avoid losing data, configure the application to reject any new session requests when the maximum number of sessions is met by setting the maximum number of sessions to a very high value and rejecting any sessions beyond that. This prevents passivation, thus avoiding the risk of losing session data if you need to clear the HADB.


  1. Use this command to clear the database, reinitialize all data devices, and recreate all system tables:

    hadbm clear --spares x --dbpassword=tttt smokedb

    where x is the number of spares you originally had, and tttt is the database password.

    This command clears the database; all your old data is lost.

  2. Get the JDBC URL:

    hadbm get jdbcURL smokedb

  3. Recreate the appserver schemas and set up session persistence:

    asadmin create-session-store \
      --storeurl <JDBC URL returned from step 2> \
      --storeuser appservusr --storepassword <password>





Copyright 2003 Sun Microsystems, Inc. All rights reserved.