Administration Problems

This chapter discusses problems that you may encounter while administering the Sun™ Open Net Environment (ONE) Application Server 7 product. Full reference material and instructions for performing administration tasks can be found in the Sun ONE Application Server Administrator’s Guide and Administrator’s Guide to Security.

Server Logs

Application Server logs

The Application Server collects and stores event information in two log files which are located in the logs directory:

Log entries can also be directed to another log file as specified by the administrator. In addition, each virtual server within an Application Server instance has its own identity and can have its own log file.

The following components and subsystems can utilize selective logging of server messages:

CORBA-based clients (ORB)

Web container

Enterprise JavaBeans (EJB) container

Message-driven bean (MDB) container

Java Transaction Service (JTS)

Java Message Service (JMS)

Virtual Servers

Extensive information on how these logs work and the information gathered in them is available in the Using Logging chapter of the Sun ONE Application Server Administrator’s Guide. Log levels are also described in the online help of the Administration interface.

HADB History Files

The HADB history files are named using the format dbname.out.node_number. There should be a file for each node on a host. If there are four nodes on a host locally, then four files are created, each with the node number corresponding to the node.

For example, for an HADB instance named failover, with two nodes on the same system, the history file names would be failover.out.0 and failover.out.1.

For additional information on the history files, refer to the Configuring the High Availability Database chapter in the Sun ONE Application Server Administrator’s Guide.

HADB logs

The logs associated with high-availability administration include the following:

Web server errors are written to the Application Server log file, server.log (default location /admin-server/logs/server.log). This file is the equivalent of the old web server errors.log file.

Database creation errors are written to server.log.

Cluster administration errors are written to:

Set the value of the require-monitor-data property to true in the loadbalancer.xml file in order to see monitoring details in the log.

The UnhealthyInstances messages that appear in the log should be particularly helpful in troubleshooting.

Setting a large tuple log size will increase performance of the logging facility.

The cladmin.log file may be useful in troubleshooting cluster administration.

Command-Line Interface Problems

This section discusses problems that you may encounter while using the command-line interface of the Application Server.

Can’t access the command-line utility.

After installing the Application Server software, you will need to configure your environment to include the bin directory of the Application Server if you are going to do any of the following:

Solution

Add the install_dir/bin directory to your PATH environment variable. If you are not familiar with the process of setting environment variables, refer to the post-installation instructions in the Sun ONE Application Server Installation Guide.
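As a minimal sketch for Bourne-compatible shells, the helper below prepends a directory to PATH only if it is not already listed. The function name path_prepend is hypothetical and not part of the product; replace install_dir with your actual installation directory.

```shell
# Prepend a directory to PATH unless it is already listed.
# Hypothetical helper for illustration; not shipped with the Application Server.
# Usage: path_prepend install_dir/bin
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;       # put the new directory first
    esac
    export PATH
}
```

Calling path_prepend with the bin directory from a shell startup file such as .profile makes the setting persist across logins.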

Can’t access the Application Server man pages.


Note	If your Admin Server is running under SSL, the --secure flag must be used.

For the Solaris unbundled version of the product, you will not be able to access the man pages until you add install_dir/man to the MANPATH environment variable.

Solution

Graphical Interface Problems

This section discusses problems that you may encounter while using the Administration interface of the Application Server.



Note	If your Admin Server is running under SSL, https://... must be used for browser access.

Can’t access the Administration interface.

If the connection was refused when attempting to invoke the graphical Administration interface, it is likely that the Admin Server is not running.

Solution

Can’t undo accidental “changes.”

If an instance has been flagged for Apply Changes Required, and you decide NOT to make changes (perhaps the changes were a mistake and you want to forget the whole thing), there is no obvious method to unset the Apply Changes Required condition. Clicking Apply Changes seems to be forced at this point.

Shutting down your browser, restarting the Application Server instance, and so on does NOT clear the Apply Changes flag. You are still prompted to apply the changes (since the backup configuration file is different from the current and applied configuration file).

Solution

Copy over the latest updated server.xml file from the backup before updates are applied. This should effectively turn off the Apply Changes Required flag.

Monitoring Problems

Load Balancer Plug-in isn’t being monitored

Logging for the load balancer plug-in is not automatically turned on. To turn on load balancer plug-in log messages:

Set the web server logging level to DEBUG.

Set the value of the require-monitor-data property to true. For example:



Tip	When logging is enabled on the load balancer plug-in, the load balancer writes HTTP session IDs in the web server log files. Therefore, if the web server hosting the load balancer plug-in is located in the DMZ, we recommend that you do not use the DEBUG or similar log level in production environments. If you must use the DEBUG logging level, then you should turn off load balancer logging by setting the require-monitor-data property to false in loadbalancer.xml file.

For more information, refer to the Configuring Load Balancer chapter of the Sun ONE Application Server Administrator’s Guide.
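To confirm the current setting of require-monitor-data, a check like the following could be used. This is a sketch only: it assumes the property is written in loadbalancer.xml as a property element with a name attribute followed by a value attribute, and the function name monitor_data_enabled is hypothetical. Verify the actual element layout in your own file.

```shell
# Succeeds if the given loadbalancer.xml sets require-monitor-data to true.
# Assumption: the property appears as
#   <property name="require-monitor-data" value="true"/>
# with the name attribute written before the value attribute.
monitor_data_enabled() {
    grep -E 'name="require-monitor-data"[[:space:]]+value="true"' "$1" >/dev/null 2>&1
}
```

For example, `monitor_data_enabled /path/to/loadbalancer.xml && echo "monitoring data is logged"` reports when the property is on; on a production web server in the DMZ you would expect the check to fail.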

Authentication/Authorization Problems

Can’t import the certificate for my server.

Has the trust database been created?

If you haven't created the trust database in Sun ONE Application Server, you need to do that.

Solution

In the Security page of the Administration interface, click the Manage Database tab and create the trust database by entering its password.

Was the certificate generated with the right tool?

The Application Server supports only the NSS database format, so certutil and openssl are compatible tools. You cannot use certificates generated by keytool directly with the Application Server.

Solution

The server does not recognize my certificate.

First is the server certificate with which you will enable security in the server instance. This must be installed in the server as a Certificate for "This server."

Second is the client certificate which you will install in the browser to authenticate yourself to the server when client-cert authentication is enabled.

Third is the server certificate chain which links the prior two certificates. This must be installed in the server instance as the certificate for "Server certificate chain." If this certificate is not installed on the server instance, the instance doesn't know which client certificate to authenticate.

Solution

Verify that all the certificates have been implemented correctly. Be sure that you implement the chain in #3 and that the ROOT Certificate Authority (CA) is trusted.

LDAP authentication/authorization is not working.

In order for the Application Server to use an LDAP-based directory server for authentication and authorization, the security realm must be configured and the LDAP realm must be activated.

Solution

In the left pane of the Administration interface, expand the server1/Security/Realms/ldap tree.

In the right panel, verify that the Classname field contains the following information:

com.iplanet.ias.security.auth.realm.ldap.LDAPRealm

This class is the interface between the Application Server and the LDAP-based directory server.

Click Properties to display the pane for configuring specifics for the Directory Server implementation. Enter data similar to the following:

Name: directory Value: ldap://localhost:389

Name: base-dn Value: dc=sun,dc=com

Name: jaas-context Value: ldapRealm

In the left pane of the Administration interface, expand the server1/Security hierarchy and change the Default Realm to ldap.

Apply Changes and Restart your instance as prompted.

HADB Administration Problems

In the Sun ONE Application Server 7, Enterprise Edition, the hadbm command and its many subcommands and options are provided for administering the high-availability database (HADB). A summary of the hadbm commands is contained in "Summary of High Availability Commands".

Refer to the chapter on Configuring the High Availability Database in the Sun ONE Application Server Administrator’s Guide for a full explanation of this command. Specifics on the various hadbm subcommands are explained in the hadbm man pages.

hadbm command fails: host unreachable.

The host could be unreachable either because it is down, or because the communication pathway has not been established. To isolate the problem, consider the following:

Is the host up and running?

If the remote host isn’t running or can’t accept connections, attempts to access it will fail.

Solution

Try pinging the host to see if it is up and running, ready to accept communications:
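A reachability check along these lines can be sketched as follows. The function name host_is_up is hypothetical. The sketch assumes that on Solaris ping accepts the form "ping host timeout", and that on other systems ping accepts -c to limit the number of probes; check the ping man page on your platform.

```shell
# Succeeds if the host answers a ping; produces no output.
# Hypothetical helper; the ping options are platform assumptions.
host_is_up() {
    case "$(uname -s)" in
        SunOS) ping "$1" 5 >/dev/null 2>&1 ;;     # Solaris form: ping host timeout_seconds
        *)     ping -c 1 "$1" >/dev/null 2>&1 ;;  # send a single probe
    esac
}
```

For example, `host_is_up host1 || echo "host1 is not answering"` flags a host that is down or unreachable. A successful ping shows only that the host is on the network, not that RSH or SSH is configured.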

Is RSH or SSH set up and running?

The communication pathway must be established before the hadbm command can succeed.

Solution

The hadbm commands will not work if host communication has not been set up. That is, the HADB nodes must have been configured for Remote Shell (RSH) or Secure Shell (SSH). Refer to “Preparing for HADB setup” in the Sun ONE Application Server Installation Guide for guidelines on verifying RSH and SSH.

If the verification does not work, remote communication for the cluster has not been set up correctly. Instructions for doing this are contained in the Setting Up Host Communication section of the Sun ONE Application Server Installation Guide.

Are the SSH binaries in the proper location?

Solution

hadbm command fails: command not found

The hadbm command can be run from the current directory or you can set the search PATH to access the hadb commands from anywhere, which is much more convenient. The error, “hadbm: Command not found”, indicates that neither of these conditions has been met.

Solution 1

You can cd to the directory that contains the hadbm command and run it from there:

Solution 2

You can use the hadbm command from anywhere by setting the PATH variable. Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
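After changing PATH, you can confirm that the shell now finds the command. The sketch below uses the standard command -v lookup; the function name have_command is hypothetical.

```shell
# Succeeds if the named command can be found on the current PATH.
# Hypothetical helper for illustration.
have_command() {
    command -v "$1" >/dev/null 2>&1
}
```

For example, `have_command hadbm || echo "hadbm is still not on PATH" >&2` prints a warning when the PATH change did not take effect in the current shell.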

hadbm command fails: JAVA_HOME not defined

The message “Error: JAVA_HOME is not defined correctly” indicates that the JAVA_HOME environment variable has not been set properly.

If multiple Java versions are installed on the system, you must ensure that the JAVA_HOME environment variable points to the correct Java version (1.4.1_03 for Enterprise Edition).

Instructions for setting the PATH variable are contained in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
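A quick way to check the JAVA_HOME setting is sketched below. It only tests that JAVA_HOME is set and contains an executable bin/java; the function name java_home_ok is hypothetical.

```shell
# Succeeds if JAVA_HOME is set and contains an executable bin/java.
# Hypothetical helper; it does not check the Java version itself.
java_home_ok() {
    [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]
}
```

If the check succeeds, running `"$JAVA_HOME/bin/java" -version` shows which Java version that directory provides, so you can compare it with the version required by your edition.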

create fails: “path does not exist on a host”

After issuing the hadbm create command, an error like the following appears on the console:

./hadbm create ...
...
hadbm:Error 22022: Specified path does not exist on a host. Please specify a valid path: [ machineName ... ]

This error message indicates that the HADB server component is not installed on the machine on which you are trying to create the HA database.

Solution

Install the HADB server on the machine you are creating the HADB on and run the command again.

database doesn’t start.

Was there a shared memory get segment failure?

Solution 1: Use sync;sync and reboot instead of init 6

The hadbm create command can fail with this error after you make changes to /etc/system and reset the system with the init 6 command.

Instead of restarting the machine with init 6, run sync;sync as the root user and then reboot.

Solution 2: Increase the amount of shared memory

There may not be as much shared memory as the HADB needs. The amount of shared memory required by HADB depends on parameters such as DataBufferPoolSize and LogBufferSize. In the /etc/system file, set shmsys:shminfo_shmmax to the maximum value possible (the preferred value is 0xffffffff).

Verify that other shared memory settings are configured correctly. After making your changes, issue the hadbm stop command and reboot the machine.

For more information on the mechanics of configuring shared memory, consult the chapter, “Preparing for HADB Setup” in the Sun ONE Application Server Installation Guide. For guidelines on choosing the best settings, consult the Performance Tuning Guide.
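To see what value is currently configured, the system file can be inspected with a short helper. This sketch assumes the usual Solaris /etc/system syntax, in which a tunable is assigned on a line of the form "set shmsys:shminfo_shmmax=value" and comment lines begin with an asterisk. The function name shmmax_setting is hypothetical.

```shell
# Print the value assigned to shmsys:shminfo_shmmax in a Solaris system file.
# Reads /etc/system unless another file is given. Prints nothing if no
# matching "set" line exists; if several exist, prints the last one.
shmmax_setting() {
    sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*shmsys:shminfo_shmmax[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' \
        "${1:-/etc/system}" | tail -1
}
```

Running shmmax_setting with no arguments on each HADB host gives a quick way to compare the configured values. Remember that a value in the file takes effect only after the machine has been rebooted.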

Solution 3: Verify /etc/system settings

Verify the settings in the system file. Even a single mistyped character will create problems.

Solution 4: Resolve conflicts

Use ipcs to see if there are any shared memory segments or semaphores occupied unnecessarily by you or the other users. Use ipcrm to free them and then try starting the database.

Solution 5: Increase the number of semaphores

If the problem persists, the operating system may not have enough shared memory or semaphores. Increase them according to the number of nodes you have on the machine. (For details, see the Deployment Guide.) Note that after making these changes, you must restart the machine for them to take effect.

Do the history files contain errors?

Note:
The hadbm utility cleans up all the files it created when hadbm create fails. In that case, the messages about the cause of the error are lost. But if the client machine has a historypath directory (default /var/tmp), then the history files are preserved there when the command fails.

If the historypath directory does not exist, you need to examine the syslogs on the hosts for error and warning messages from HADB. Messages are prefixed with “HADB” (the default syslogprefix value, which can be changed using the create command’s --set option).

Shared memory get segment failed
The system has not been set up with enough shared memory. (Discussed in the previous section.)

Could not verify my node address
Another process is using a port that one of the HADB server processes needs. This can be resolved by stopping the other process, or by setting the PortBase attribute to another value using the command hadbm set portbase=<value>.

hadbm <command> fails with internal error:
"The database could not be started"

Check the following:

RSH and SSH are set up correctly, and you can communicate with all the machines in the HADB configuration.

Shared memory is configured correctly on all machines in the HADB configuration.

No other HADB databases are running on the machines, or any other processes that could be using the same port numbers.

All necessary directories exist and have write permissions.

There is enough space in the directory where the devices are going to be written.

Once you’ve verified that none of the above errors have occurred, try the following remedies, in order:

Delete the database and retry.

Delete the database, reboot, and retry.

Delete the database, reinstall the HADB software, and retry.

Contact Technical Support, as described in "Product Support".

Do you need a simple solution?

Solution 1: Delete the database

Issue the hadbm delete command, and see if that allows the hadbm create to proceed normally.

Solution 2: Reboot the machine.

Sometimes a system reboot is the necessary last resort. Issue hadbm delete, reboot, and then rerun the hadbm create command.

clear command failed

When this command fails, the history files are likely to explain why. See "Do the history files contain errors?" for instructions on viewing the history files and a list of some common error messages.

create-session-store failed

Invalid user name or password

This error occurs when the --dbsystempassword you supplied to the create-session-store command is not the same password as the one given at the time of database creation.

Solution 1

Solution 2

If you can't remember the dbsystem password, you’ll need to clear the database using hadbm clear and provide a new dbsystem password.

SQLException: No suitable driver

The create-session-store produced the error: SessionStoreException: java.sql.SQLException: No suitable driver.

Solution 1

This error can occur when asadmin is not able to find hadbjdbc4.jar from the AS_HADB path defined in the asenv.conf file in the Application Server config directory.

To solve it, change AS_HADB to point to the location of your HADB installation.
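The check can be sketched in shell as shown below. The sketch assumes that asenv.conf contains a line of the form AS_HADB="/path" (quotes optional) and searches the whole directory tree under that path, because the exact location of the jar within the HADB installation is not stated here. The function name find_hadb_jdbc_jar is hypothetical.

```shell
# Print the path of hadbjdbc4.jar found under the AS_HADB directory named in
# the given asenv.conf. Fails if AS_HADB is missing, is not a directory, or
# does not contain the jar.
# Assumption: asenv.conf holds a line like AS_HADB="/path/to/hadb".
find_hadb_jdbc_jar() {
    hadb_dir=$(sed -n 's/^[[:space:]]*AS_HADB=//p' "$1" | tail -1 | tr -d '"')
    [ -n "$hadb_dir" ] && [ -d "$hadb_dir" ] || return 1
    jar=$(find "$hadb_dir" -name hadbjdbc4.jar -type f 2>/dev/null | head -1)
    [ -n "$jar" ] && printf '%s\n' "$jar"
}
```

If the function fails for your asenv.conf, AS_HADB most likely points to the wrong directory.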

Solution 2

This error can also occur if you provide an incorrect value for --storeurl. To solve that problem, obtain the correct URL using hadbm get jdbcURL.

“No space left on device” appears in server.log

When the error message “No space left on device” appears at regular intervals in the server.log, it can indicate that the HADB has run out of shared memory. To solve the problem, see "Solution 2: Increase the amount of shared memory".

On the other hand, if the message does not come and go intermittently, then you need to add a device or increase the size of existing devices.

Solution: Determine available space for user data

To determine the space available for user data, take 99% of the total device size, then subtract 4 times the LogBufferSize. The difference between the total device size and the free size is the user data size. If the data may be refragmented in the future, the user data size should not exceed 50% of the space available for user data. If refragmentation is not relevant, close to 100% may be used.
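The calculation above can be worked through with shell integer arithmetic. The function names and the sample figures are hypothetical; all sizes must be given in the same unit, and integer division rounds down.

```shell
# Space available for user data: 99% of the total device size
# minus 4 times the LogBufferSize.
# Usage: user_data_space TOTAL_DEVICE_SIZE LOG_BUFFER_SIZE
user_data_space() {
    echo $(( $1 * 99 / 100 - 4 * $2 ))
}

# Ceiling for user data when refragmentation may be needed later:
# 50% of the space available for user data.
refrag_safe_limit() {
    echo $(( $(user_data_space "$1" "$2") / 2 ))
}
```

For a hypothetical device of 1000 MB with a LogBufferSize of 25 MB, user_data_space 1000 25 prints 890 and refrag_safe_limit 1000 25 prints 445, so user data should stay below about 445 MB if the database may be refragmented.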

node status is “starting” after issuing stopnode.

After issuing the stopnode command, hadbm status --nodes shows the node as starting.

This situation occurs if you have set up inetd, because the node is automatically restarted when you stop a node. The result is that the node resumes the running state but is in the offline role.

Therefore, if you have set up inetd and you want to make changes to the host that require a reboot, you need to perform some additional tasks to stop the node:

Solution

Comment out the inetd entry for that node from the inetd configuration files (or the node is automatically restarted as soon as you stop it).

Re-add the entry to the inetd files after you have restarted the node.

Restart inetd by sending a SIGHUP to the process. For example:

ps -e |grep inetd (to find PID)

kill -HUP <PID_inetd>

For additional information, refer to the HADB Setup chapter of the Sun ONE Application Server Administrator’s Guide.

Node failure occurred.

If hadbm status shows that a node has stopped, you should first follow the instructions in "Examine the history files" to see what caused the node failure. A common situation is a power failure on the machine where the node resides. (If a power failure hasn’t occurred, examine the logs to find the cause.)



Tip	You may want to restart a node if you notice strange behavior in a node (for example excessive CPU consumption) and want to check whether a restart cures the problem. Use the hadbm restartnode command to restart an HADB node.

Solution

For a spare node:

host1$ hadbm startnode 2

For an active node:

host1$ hadbm startnode -l=repair 2

Double node failure occurred.

A properly configured HADB installation will handle single node failures. Double node failures (two mirror nodes are down) are not handled—all persistent data are lost. In the case of power failure, data will be consistent and available, as long as there isn’t a double node failure.

When a double node failure occurs, the hadbm status command shows the database as “non-operational”.

Solution

Can’t restart the HADB after an ungraceful shutdown.

Situation: On a machine with a running HADB, the machine was shut down without first stopping the database. After restarting the machine, the database does not start. The node status shows that the nodes are in Starting or Recovering state. Even after stopping and then restarting each of the nodes, they remain in the Starting state. Eventually, the node status changes to Stopped.

In the case of a power failure, it is most likely that data is in an inconsistent state, that is, a record may have been in the process of being committed when the power failed.



Tip	If you notice strange HADB behavior (for example consistent timeout problems) and want to check whether a restart cures the problem, use the hadbm restart command. When you restart the HADB in that manner, data remains available. On the other hand, if you stop and start HADB in separate operations using hadbm stop and hadbm start, data is unavailable while HADB is stopped.

Solution

hadbm command doesn’t return control to user.

Many hadbm commands, in particular hadbm set, restart all the nodes of the database in order. If some problem has occurred, then the command may not return.

Solution 1

From another window/shell, look at the history files for all the nodes to see if an error has occurred or if the command is still in progress. Run hadbm status --nodes to see if all the nodes are up and running. If they are not and there appears to be a permanent failure, you will need to cancel the command, and then try running hadbm restart.

Solution 2

If Solution 1 fails, and your command was an attempt to set a configuration value for hadbm, try resetting it back to its old value and see if the database restarts correctly.

Solution 3

If clearing the database is unsuccessful, you’ll have to delete the database using hadbm delete, recreate it using hadbm create, and then recreate the session store using asadmin create-session-store.

Cluster Administration Problems

In the Sun ONE Application Server 7, Enterprise Edition, you can use the cladmin command to run the following asadmin commands simultaneously on all application server instances in a cluster: start-instance, stop-instance, deploy, undeploy, create-jdbc-resource, create-jdbc-connection-pool, configure-session-persistence, delete-jdbc-resource, delete-jdbc-connection-pool. This simplifies the task of cluster administration.

The cladmin command is located in the install_dir/bin directory. The default location of the cladmin input files, clinstance.conf and clpassword.conf, is /etc/opt/SUNWappserver7.

Refer to the chapter on Using the cladmin Command in the Sun ONE Application Server Administrator’s Guide for a full explanation of this command.

Refragmentation of the HADB fails.

Is there enough space on the data devices?

Messages like these indicate that refragmentation failed for lack of space on the data devices:

HIGH LOAD: about to run out of device space ...
HIGH LOAD: about to run out of device space on mirror node ...

The problem occurs when data devices are filled beyond 50% or 60% of the available space, which does not leave enough extra space to carry out the refragmentation. (To see how much space has been used on the machine, use the df command. To calculate the amount of space that can be used for user data, see "Solution: Determine available space for user data".)

Solution 1


Tip	Monitor your data device usage using the hadbm deviceinfo command.

If your history files are becoming too large, clear them using the hadbm clearhistory command. This is the simplest solution, so try it first. History files are located in /var/tmp.

Solution 2

Find out what disk the data devices are on with the hadbm get DevicePath command and check for space on that disk. If there is room, increase the size of the data devices using the following command:

Solution 3

If your data devices are using more than 50-60% of capacity and you cannot increase the size of your device as suggested above, do one of the following:

If your machine has extra disks or has the possibility of adding additional disks, use hadbm stop and hadbm delete to stop and delete the database. Then create a new database with an increased number of devices.

Add the nodes without refragmenting using

hadbm addnodes --no-refragment

Then recreate the session store so that it is applied to the new nodes by following the instructions in "Clear the database and recreate session store".

Solution 4

The cladmin command is not working.

Are the Admin Servers of all the instances in the cluster started?

Before running the clsetup command, all the Admin Servers in the cluster must be running.

Do all the instances in the cluster have same administrator user name and password?

During installation, the installation program creates a clinstance.conf file with entries for two instances. If you add more instances to the cluster, you must add information about these instances in the clinstance.conf file.

Are the input files correct?

The order in which entries appear in the clinstance.conf file is important and must not be changed from the default order. If you add information about more application server instances, entries for these instances must be in the correct order.

Solution

Verify that any changes you have made to the input files follow the format specified in the Sun ONE Application Server Administrator’s Guide.

Are the input files on all instances in the cluster identical?

The values in the input files must be identical on all instances in the cluster. The cladmin command is not designed to set up each instance with different values.

Solution

Verify that the cladmin input files are identical on all instances in the cluster.
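One way to compare the files is sketched below. It assumes you have already collected a copy of the input file from each host into local files (the file names in the usage example are hypothetical) and uses the standard cmp utility for a byte-for-byte comparison. The function name same_as_first is also hypothetical.

```shell
# Succeeds if every listed file is byte-for-byte identical to the first one.
# Prints one line for each file that differs.
# Usage: same_as_first REFERENCE_FILE OTHER_FILE...
same_as_first() {
    ref=$1
    shift
    status=0
    for f in "$@"; do
        if ! cmp -s "$ref" "$f"; then
            echo "differs from $ref: $f"
            status=1
        fi
    done
    return $status
}
```

For example, `same_as_first clinstance.conf.host1 clinstance.conf.host2 clinstance.conf.host3` lists every copy that does not match the one from host1. Repeat the check for clpassword.conf.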

Application is not available on the cluster.

Did the application deploy successfully to the cluster?

It’s possible that the deploy operation failed. To find out, run this command against each instance in the cluster.

Solution

If the application isn’t listed, try redeploying it and look for errors during deployment.

Common Administration and Recovery Actions

This section describes common administrative and recovery procedures that are used in a variety of situations.

Examine the history files

The history files are generally found at their default location, /var/tmp. If they are not at that location, use hadbm get HistoryPath to find the path to the history files.

The history file names are of the form <dbname>.out.<nodenumber>. The default database name is hadb, so for the default database name, the history file for node 0 would be hadb.out.0.
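Because the file names follow a fixed pattern, the history files for all nodes of a database can be searched in one pass. The sketch below looks for the two error messages listed under "Do the history files contain errors?". The function name scan_history is hypothetical, and the exact wording of the messages in your files may differ, so the match ignores case.

```shell
# Print matching lines from the history files of one database, each line
# prefixed with the name of the file it came from.
# Usage: scan_history HISTORY_DIR DBNAME
scan_history() {
    # /dev/null is passed as an extra file so that grep always prints file names
    grep -i -E 'Shared memory get segment failed|Could not verify my node address' \
        "$1"/"$2".out.* /dev/null 2>/dev/null
}
```

With the defaults described above, `scan_history /var/tmp hadb` checks every hadb.out.<nodenumber> file in /var/tmp.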

Maintain service while taking HADB offline

Any command that makes HADB unavailable (such as hadbm clear) causes the application servers to start reporting errors in the error log. Client requests will then take a long time to get handled as the application continues retrying its requests to HADB, which can’t answer because it is unavailable.

You can avoid this situation by disabling session persistence prior to clearing or stopping the database. This procedure takes time, but it lets the system maintain full service of your application(s) while HADB is down.

Disable session persistence by using cladmin to set availability-enabled to false for the cluster. (See the “Session Persistence” section of the Admin Guide for the details of this procedure.)

Restart all your instances using the following approach:

Disable half of the instances in your cluster (or as many as you can at a time to maintain the necessary level of service for your application) by marking them as disabled in the load balancer configuration file. (See the load balancing section of the Admin Guide for details.)

After the quiescence period has been reached, restart the disabled instances, and then re-enable them in the load balancer.

Repeat those steps for the next batch of instances until you have restarted all the instances.

Once HADB is back up and running again, set availability-enabled to true and follow the restart process again.

Clear the database and recreate session store

Clearing the database, restarting it, and recreating the session store is always the quickest way to fix your database. Session data stored in HADB will be lost, but data for active sessions remains available because it also exists in the application server cache. (The only exception is sessions that have been passivated. They are not in the application server cache and thus are lost when you clear HADB.)



Tip	If you need to keep servicing user requests, follow the instructions in the previous section, “Maintain service while taking HADB offline”.



Tip	Avoid losing important data. For transient session data, losing the data is generally not an issue. The problem concerns losing passivated session state, that is, when a session no longer has any data in the Application Server cache and the state has been passivated to the HADB. Sessions are passivated when the maximum number of sessions exceeds the number specified in sun-web.xml file for each application. To avoid losing data, configure the application to reject any new session requests when the maximum number of sessions is met by setting the maximum number of sessions to a very high value and rejecting any sessions beyond that. This prevents passivation, thus avoiding the risk of losing session data if you need to clear the HADB.

Use this command to clear the database, reinitialize all data devices, and recreate all system tables:

hadbm clear --spares x --dbpassword=tttt smokedb

where x is the number of spares you originally had, and tttt is the database password.

This command clears the database—all your old data is lost.

Get the JDBC URL:

hadbm get jdbcURL smokedb

Recreate the appserver schemas and set up session persistence:

asadmin create-session-store \
--storeurl <JDBC URL returned by the previous command> \
--storeuser appservusr --storepassword <password>

When issuing this command, make sure the user and password used in create-session-store match the user and password specified in the JDBC connection pool for HADB.

For more information, see the Sun ONE Application Server Administrator’s Guide.

Chapter 5 Administration Problems
