Sun Java System Application Server 7 2004Q2 Update 1 Standard and Enterprise Edition Troubleshooting Guide
Chapter 2
Installation and Uninstallation Problems
The high-availability components of the Sun Java System Application Server, Enterprise Edition include the HADB, the HADB Management Client, and the load balancer plug-in. During installation, these components can be installed with the rest of the Application Server components, or separately. The load balancer plug-in is usually installed separately from the Application Server components.
This chapter addresses problems that you may encounter while performing installation or uninstallation of the Sun Java System Application Server, Enterprise Edition or its components or plug-ins.
The following sections are contained in this chapter:
Install/Uninstall Logs
This section describes the log files that are relevant when installing and uninstalling the Sun ONE Application Server. For a description of the format and the kinds of messages that can appear in a log file, see Evaluate Messages and Examine Log Files.
The following Application Server logs can be useful for troubleshooting problems you may have with installation or uninstallation:
/var/sadm/install/logs/Sun_ONE_Application_Server_install.log
/var/sadm/install/logs/Sun_ONE_Application_Server_uninstall.log
Use the following log for troubleshooting problems with the clsetup command:
/var/tmp/clsetup.log
In addition to these log files, low-level installation and uninstallation log files are created at these locations:
/var/sadm/install/logs/Sun_ONE_Application_Server_install.<timestamp>
/var/sadm/install/logs/Sun_ONE_Application_Server_uninstall.<timestamp>
The following logs are associated with the high-availability components:
- Web server errors, including load balancer error messages, are written into the web server error.log.
- Application server messages, including deployment errors, are logged in the respective instance server's server.log file (the default location is /var/opt/SUNWappserver7/domains/domain1/server1/logs).
- Admin-server messages are logged in the admin-server's server.log file (the default location is /var/opt/SUNWappserver7/domains/domain1/admin-server/logs).
- Database creation errors are written to /var/sadm/install/logs/clsetup.log.
- Initial cluster setup errors are written to /var/tmp/clsetup.log.
- Cluster administration errors are written to /var/tmp/cladmin.log.
Some guidelines on using the logs:
- Set the value of the require-monitor-data property to true in the loadbalancer.xml file in order to see monitoring details in the log.
- The UnhealthyInstances messages that appear in the log should be particularly helpful in troubleshooting load balancer problems.
- The cladmin.log file may be useful in troubleshooting cluster administration.
- The clsetup.log file may be helpful in finding out what went wrong during installation when you establish a new cluster.
- The HADB history files may also be useful; they are described in Examining the HADB history files.
Can’t install remotely using the graphical interface
On UNIX, if you are installing the Application Server software remotely using the graphical interface, you must enable the display configuration on the machine where you are installing the product.
Solution
Set the DISPLAY environment variable to contain the name of the server and domain, using this format:
Then run the following command on the remote client:
xhost +
Setup failure during Linux install
When an unsupported JDK is present, choosing "Upgrade JDK" or "Reuse existing JDK" can make the setup program fail while installing SUNWicu. Installing the RPM manually with rpm -ihv then results in a segmentation fault.
Solution
Install the J2SE platform first, and then retry the Application Server installation.
Pre-existing JDK prevents installation, even after it has been removed
This error typically occurs on Solaris. The Application Server installer complains about a pre-existing JDK even though you removed the /usr/j2se directory. Installation does not proceed.
Explanation
The JDK can be installed in two different ways:
When the JDK is installed as packages, it always resides under /usr/j2se. In that case it is not enough to remove the directory; the packages have to be removed as well.
To find out whether your system has any JDK packages, execute the following commands as the root user:
pkginfo SUNWj3dev
pkginfo SUNWj3rt
pkginfo SUNWj3man
pkginfo SUNWj3dmo
If you receive a description of the queried package, this package is installed on your system.
Solution
Remove the JDK packages that the installer checks for by executing the following commands as the root user:
pkgrm SUNWj3dev
pkgrm SUNWj3rt
pkgrm SUNWj3man
pkgrm SUNWj3dmo
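The check described above can be sketched as a small shell helper that reports which of the four JDK packages are still installed. This is a minimal sketch, assuming a Solaris system where pkginfo accepts the -q (quiet) option and returns exit status 0 for an installed package; the function name installed_jdk_pkgs is hypothetical.

```shell
#!/bin/sh
# Package names are taken from the list above.
JDK_PKGS="SUNWj3dev SUNWj3rt SUNWj3man SUNWj3dmo"

# Print the name of each JDK package that pkginfo reports as installed.
# Assumption: "pkginfo -q PKG" exits with status 0 when PKG is present.
installed_jdk_pkgs() {
    for pkg in $JDK_PKGS; do
        if pkginfo -q "$pkg" 2>/dev/null; then
            echo "$pkg"
        fi
    done
}
```

Run as root, installed_jdk_pkgs prints one package name per line. Each name it prints is a candidate for pkgrm; an empty result suggests that no package-based JDK remains.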
Install or upgrade of J2SE fails
The following types of errors may occur if you attempt to upgrade your J2SE during installation:
Incompatible J2SE version---cannot upgrade
This message occurs when the Solaris J2SE packages do not reside on the machine where you are performing the installation, or when the installed J2SE version is outside the upgradable range, that is, earlier than version 1.3, or version 1.4.1_03 or later.
Note
The installation program can only upgrade a package-based J2SE installation, not a file-based J2SE installation.
Solution 1
Verify that the following packages are present using the pkginfo -i -l command:
SUNWj3rt
SUNWj3dev
SUNWj3man
SUNWj3dmo
Solution 2
Fix the Solaris packages, or completely remove them (if they are not used by any other application programs) using the pkgrm command.
You can then install the J2SE component using the installation program by selecting the Install Java 2 SDK (1.4.2_04) option in the Java Configuration panel.
Failure to install J2SE
The install log file reports that J2SE could not be installed. This error can occur when the J2SE directory (/usr/j2se, by default) is not writable by the user performing the installation.
Solution
Make the J2SE directory writable.
Can’t reinstall the server
If installation and uninstallation are performed according to the documented instructions and they complete normally, you will be able to reinstall the server with no problems. However, if you have used another method to remove the Application Server files, or if there has been a failure during installation or uninstallation, the system might be in an inconsistent state, leaving behind files or processes specific to the Application Server in the /var/sadm/install/productregistry file. These leftover files and processes will provoke an error message similar to the following on a subsequent installation:
You will need to clean up these files or processes before attempting a new installation.
Solution: Clean up leftover files and processes
- Log in as root.
- Navigate to your installation directory and check the content of the /var/sadm/install/productregistry file for installed packages, that is, files having the SUNW string. For example:
cat /var/sadm/install/productregistry | grep SUNW
- Run pkgrm for the SUNW packages that were found in the product registry. For example:
pkgrm SUNWasaco
- Remove the following files, if they are present:
/tmp/setupSDKNative
/tmp/SolarisNativeToolkit_3.0_1
- After the packages have been removed, use the prodreg registry editor to remove the Application Server-specific entries.
- At the command line, kill all appservd processes that may be running by typing the following:
ps -ef | grep appservd
pkill appservd
- Remove all remaining files under the Application Server installation directories. Refer to Conventions Referring to Directories for further information on bundled and unbundled structures.
- If present, remove the following log file:
/var/sadm/install/logs/Sun_ONE_Application_Server_install.log
(This step is helpful because the installation process appends information to this file, if it exists.)
- Remove the Application Server directories.
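The registry inspection step in the procedure above can be sketched as a helper that lists each distinct SUNW name found in the product registry. This is a minimal sketch, assuming an awk that provides match() (POSIX awk, or nawk on older Solaris releases); the function name list_sunw_entries is hypothetical, and not every name it prints is necessarily an installed package, so confirm each one with pkginfo before running pkgrm.

```shell
#!/bin/sh
# Print each distinct SUNW* name found in a product registry file.
# The registry path defaults to the one named in the procedure above.
list_sunw_entries() {
    registry=${1:-/var/sadm/install/productregistry}
    awk '{
        line = $0
        # Extract every SUNW* token on the line, not just the first one.
        while (match(line, /SUNW[A-Za-z0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$registry" | sort -u
}
```

Compared with a plain grep, this prints only the names themselves, once each, which makes the output easier to review.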
Silent installation is not working correctly
Consider the following:
Is the silent installation configuration file correct?
To run a silent installation, you must have created a silent installation configuration file by running a standard installation using the savestate option as described in the Sun Java System Application Server Installation Guide.
./setup -savestate
In tailoring the file for your silent installation, if you have introduced any errors in the configuration file, for example mistyping a variable name, the silent installation may not run.
Solution
Verify that the silent installation configuration file is correct and that you have not introduced any errors that may invalidate the file.
Uninstallation failure needs cleanup
If an uninstallation fails, you may need to clean up some leftover files or processes before attempting a new installation.
Solution
Follow the instructions in Solution: Clean up leftover files and processes.
Can’t install the load balancer plug-in
Consider the following possibilities:
Is your web server installed?
Before you can install the load balancer plug-in, you must have the web server already installed (Sun ONE Web Server 6.0, SP6 and above, or Apache Web Server 1.3.27). The web server is not required for the other Enterprise Edition components, just for the load balancer plug-in.
Solution
Install the web server before installing the load balancer plug-in.
Is there a previously installed load balancer or reverse proxy plug-in on your system?
The Sun Java System Application Server 7, Enterprise Edition requires that any load balancer or reverse proxy plug-in already on your system be removed before you install the load balancer plug-in.
Solution
Remove the existing plug-in using the uninstallation program. On a clean system, the following message should display if you try to access the plug-in:
ERROR: information for "SUNWaspx" was not found.
Has the load balancer plug-in already been installed?
If the load balancer plug-in component is disabled or grayed out on the Component Selection page, the correct version is already installed.
Are the configuration files correct?
The installation program checks to see if the appropriate configuration files for the load balancer plug-in are found in the location you specify.
For the Web Server plug-in, the following files are searched:
<install_dir>/config/magnus.conf
<install_dir>/config/obj.conf
For the Apache Web Server plug-in, this file is searched:
<install_dir>/conf/httpd.conf
Solution
Specify the correct location.
Shared memory creation failed
This error occurs while running hadbm create or clsetup (which calls hadbm create). When the HADB server processes are booted for the first time on each machine in the HADB configuration, they create the shared memory segments which constitute the database.
The typical message in this case is:
Failed to create shared memory
This message indicates that the hadbm create command could not allocate the shared memory to the database segments.
If you see this error in the history file, consider the following:
Have you configured shared memory?
Shared memory must be configured for the HADB host machines before you can work with the HADB.
Solution
Configure shared memory by following the instructions in the Configuring Shared Memory and Semaphores section in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide. For detailed information on the settings to use, consult the Performance and Tuning Guide, “Tuning for High Availability:Tuning HADB:Operating System Configuration”.
Is there an error in your /etc/system file?
You may have made a mistake or a typing error when you configured shared memory for the HADB.
Solution
Verify that you have followed the instructions in the Configuring Shared Memory and Semaphores section in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide. Correct any typing error.
Did you reboot the machine after configuring shared memory?
The shared memory changes in the /etc/system file will not take effect until you have rebooted the machine.
Solution
Reboot the machine.
Are there any other processes consuming shared memory?
This situation can occur when there are too many AS instances and HADB nodes on the same machine.
Solution
Move some of the processes to another machine.
Are old AS or HADB installations occupying shared memory and semaphores?
Solution
Use the ipcs command to check the shared memory. If you find that the shared memory segments or semaphores are occupied unnecessarily, release them using ipcrm -s for semaphores and ipcrm -m for shared memory (in Unix systems).
clsetup is not working
The clsetup command is used to automate the process of setting up a cluster. After the Sun Java System Application Server 7, Enterprise Edition software and high-availability components are installed, this script uses three input files to set up a basic cluster. The most likely problems are errors in the input files (if they have been edited) and clsetup requirements not being met.
Consider the following possibilities:
Was a previous clsetup terminated prematurely?
During clsetup, the HADB is created (a process that takes some time). Terminating clsetup during that process can leave the HADB in an indeterminate state. This situation can produce a variety of errors, including a SessionStoreException when creating database tables.
Solution
Delete the HADB and rerun clsetup.
Have you configured shared memory?
Shared memory must be set up before you can use the clsetup command. See Have you configured shared memory?.
Has remote communication been set up correctly?
RSH or SSH must be set up before the clsetup command can be run.
To verify that remote communication has been established, rsh to each host in the cluster. The identity should be returned from the remote host. For example:
rsh computer99.zmtn.company.com uname -a
Instructions for setting up host communications are contained in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
Solution
If the verification does not work, remote communication for the cluster has not been set up correctly. If you are using SSH, make sure that the scp and ssh binaries, or soft links to them, exist in /usr/bin. For further instructions, see the Setting Up Host Communication section of the Sun Java System Application Server Installation Guide.
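The verification described above can be repeated for every host in the cluster with a short loop. This is a minimal sketch; the function name check_remote_hosts and its output format are hypothetical, and the remote command (rsh or ssh) is passed as the first argument so that the same helper covers both setups.

```shell
#!/bin/sh
# Run "uname -a" on each host through the given remote command (rsh or ssh)
# and report whether the command succeeded on that host.
check_remote_hosts() {
    remote_cmd=$1
    shift
    for host in "$@"; do
        if "$remote_cmd" "$host" uname -a >/dev/null 2>&1; then
            echo "OK $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

For example, check_remote_hosts rsh computer99.zmtn.company.com tests the host used in the example above. Any host reported as FAIL, or any host that stops to prompt for a password, still needs its remote communication set up.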
Under SSH, are the HADB and the Application Server co-located on the same machine?
If you are co-locating the HADB and the Application Server on the same machine using SSH, a known_hosts file must exist under the /.ssh directory. That file is necessary for the HADB management client to communicate with the HADB nodes. Since hadbm does not permit use of localhost, you must use the actual host name instead.
Solution
If the known_hosts file is not under the /.ssh directory, run the ssh hostname command and answer yes to the prompt.
Are the application server and HADB installed in the same directories on each machine?
The clsetup program cannot work when the files are installed in different directories on different machines.
Solution
Reinstall the Application Server and HADB in the same directories on each machine.
Are all the Admin Servers on the application server instances in the cluster running?
Before running the clsetup command, all the Admin Servers in the cluster must be running.
Are the input files on all instances in the cluster identical?
The clsetup command is not designed to set up each instance with different values. For example, this command cannot create a JDBC connection with different settings for each instance.
Solution
Verify that the input files are identical on all instances in the cluster.
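One way to carry out this verification is a byte-for-byte comparison of each copy against a reference copy. This is a minimal sketch, assuming the input files from the other instances have already been copied to (or are otherwise readable from) the local machine; the function name compare_copies is hypothetical.

```shell
#!/bin/sh
# Compare a reference file against one or more copies, byte for byte.
# Prints each copy that differs; returns non-zero if any copy differs.
compare_copies() {
    reference=$1
    shift
    status=0
    for copy in "$@"; do
        if ! cmp -s "$reference" "$copy"; then
            echo "DIFFERS: $copy"
            status=1
        fi
    done
    return $status
}
```

The helper would be run once per input file, with the local file as the reference and the copies from the other instances as the remaining arguments.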
HADB database creation fails
The error occurs when using clsetup to start the database. The typical message in this case is:
failed to start database : HADB Database creation failed
To determine the cause of the problem, inspect the /var/tmp/clsetup.log file. Some possible errors are:
No available memory
Insufficient memory is available to create the database.
Solution 1
This problem can occur when changes are made to /etc/system and the init 6 command is given to reset the system. The following error message occurs in the database log file:
System aborted with message:
'Could not create shared DictCache segment'
...Shared memory get segment failed'
To avoid this problem, run sync;sync as the root user and then run reboot instead of init 6.
Solution 2
This error can occur when insufficient swap space has been allocated. Review the documentation on shared memory requirements in the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
Specified hosts are not reachable
This could happen when you run clsetup to configure the cluster. You might see errors similar to the following in the log file:
CREATING HADB DATABASE...
/opt/SUNWhadb/4.2.2-17/bin/hadbm create --installpath=/opt/SUNWhadb/4.2.2-17 --configpath=/etc/opt/SUNWhadb/dbdef --historypath=/var/tmp --devicepath=/opt/SUNWhadb/4 --datadevices=1 --portbase=15200 --spares=0 --inetd=false --inetdsetupdir=/tmp --devicesize=512 --dbpassword=password --hosts=eas-v880-1,eas-v880-1 hadb
hadbm:Error 22024: Specified hosts are not reachable: [ eas-v880-1 ]
HADB Database creation failed.
Solution
See hadbm command fails: host unreachable.
Too few semaphores
The history file contains the following entry:
No space left on device
This can be caused when the number of semaphores is too low. Since the semaphores are provided as a global resource by the operating system, the configuration depends on all processes running on the host, not only the HADB. This can occur either while starting the HADB, or during runtime.
Solution
Configure the semaphore settings by editing the /etc/system file. Instructions and guidelines are contained in the Configuring Shared Memory and Semaphores section of the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
hadbm create Fails
hadbm create fails with the following error message:
Node-x NSUP timestamp HADB-S-00240: Illegal node number
The likely cause for this error is that another process is occupying the port that the NSUP process on node X is trying to open.
To resolve this issue, check for processes running on this machine which may have taken the port that is required by the NSUP process.
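A quick way to look for a process holding the port is to filter the output of netstat -an. This is a minimal sketch; the function name lines_for_port is hypothetical, and the port number must be the one the node is configured to use, which is not stated here. The filter accepts both the host.port address notation and the host:port notation.

```shell
#!/bin/sh
# Read "netstat -an" style output on stdin and print every line in which
# some address field ends in the given port, written as ".PORT" or ":PORT".
lines_for_port() {
    port=$1
    awk -v port="$port" '{
        for (i = 1; i <= NF; i++) {
            n = length($i) - length(port)
            # The field must end in the port, preceded by "." or ":".
            if (n > 0 && substr($i, n + 1) == port && substr($i, n, 1) ~ /[.:]/) {
                print
                next
            }
        }
    }'
}
```

Typical use would be netstat -an | lines_for_port PORT, with PORT replaced by the actual port number. A line in the LISTEN state suggests that another process already occupies the port.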
HADB Database Nodes Cannot be Reached
If the HADB database nodes cannot be reached and the database does not function, check whether dynamic IP addresses (DHCP) are used for hosts used in hadbm createdomain (or in other hadbm commands).
Hosts using DHCP are not supported by HADB.
HADB Creation Failures on Windows platform
The following issues might show up when running HADB 4.4 on Windows.
The unexpected behavior could be node restarts, network partitions or reconnects with messages "Network Partition: *** Reconnect detected ***", written in HADB history files as well as on the HADB host terminals.
In such cases, messages from nodes belonging to one database instance could be delivered to nodes belonging to the other database instance. This will lead to different problems, e.g. false network partitions and reconnects of partitions detected.
Solution: If management domains share HADB hosts, ensure that the nodes on the common host do not use the same port number.
hadbm create will give error when using a host with both single and multiple nets.
Scenario: A host has multiple network interfaces. The user issues commands, hadbm create/hadbm addnodes.
Solution: If a host has multiple network interfaces, specify the network interface to be used by HADB when issuing the commands, hadbm create/hadbm addnodes. If the hostname is used, the first interface registered on the host will be used, and there is no guarantee that the HADB nodes will be able to communicate.
Problems when running clsetup as non-root
If you want to run the clsetup command as a user other than root, you’ll need to set up administration for non-root.
Solution
Follow the instructions in the Setting Up Administration for Non-Root section in the Sun Java System Application Server Installation Guide.
Can’t test the ssh setting as root
In trying to test the SSH setting using the following command:
# ssh hostname date
the console prompts for the root password:
# root@hostname's password:
In Solaris 9, when you are using the Sun version of the SSH software and running the HADB admin clients as root, the sshd configuration (/etc/sshd_config) on all machines in the cluster must have PermitRootLogin set to yes. Sun SSH does not permit root login by default; it is set to no.
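Solution
Set PermitRootLogin to yes in the sshd configuration on all machines in the cluster, and restart sshd so that the change takes effect.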
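The PermitRootLogin requirement described above can be checked on each machine with a small helper. This is a minimal sketch; the function name permit_root_login is hypothetical, and the configuration file is passed as an argument because its location may differ between systems.

```shell
#!/bin/sh
# Print the PermitRootLogin value from an sshd configuration file.
# Only the first uncommented occurrence is reported; prints "unset"
# if the file contains no uncommented PermitRootLogin line.
permit_root_login() {
    config=$1
    value=$(awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$config")
    echo "${value:-unset}"
}
```

A result other than yes on any machine in the cluster would explain the password prompt shown above.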
Can’t get ssh to skip the login prompt
An error similar to the following occurs, suggesting that the sshd server is not running on the destination machine:
Secure connection to vortex-dev1 refused; reverting to insecure method.
Using rsh. WARNING: Connection will not be encrypted.
Password:
You can set up your local environment to use the HADB commands from anywhere by setting the PATH variable after you have implemented SSH. You should not have to log in.
Solution
- Verify that the SSH server is running by issuing the following command on the server machine:
ps -e |grep sshd
- If the SSH server is not running, start it as follows:
/etc/init.d/sshd start
- Check the ~<ssh-user>/.ssh/authorized_keys file on each destination machine to ensure that all the public keys from all the machines are listed in that file.
- For both the user's home directory (~<ssh-user>) and the .ssh subdirectory, ensure that write permission is not granted to group or to other.
For further information on setting up host communications for the HADB, refer to the Preparing for HADB Setup chapter of the Sun Java System Application Server Installation Guide.
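The permission check in the last step of the solution above can be sketched with find. This is a minimal sketch; the function name writable_by_group_or_other is hypothetical, and the helper only reports directories without changing them.

```shell
#!/bin/sh
# Print each given directory that is writable by group or by other.
# -prune keeps find from descending into the directory, so only the
# directory's own permission bits are tested.
writable_by_group_or_other() {
    for dir in "$@"; do
        find "$dir" -prune \( -perm -020 -o -perm -002 \) -print
    done
}
```

Run as the SSH user on each destination machine, for example as writable_by_group_or_other "$HOME" "$HOME/.ssh". Any directory that is printed can be corrected with chmod go-w.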
Error configuring JMS Physical Destinations
The following error message occurs when you attempt to configure Physical Destinations for a JMS Service node:
[C4003]: Error occured on connection creation. - caught java.net.ConnectException JMSService not available.
This error means that the Application Server instance could not connect to the JMS Service. Possible reasons for connection failure include: