Sun ONE Application Server 7, Enterprise Edition Troubleshooting Guide
Chapter 2
Installation and Uninstallation Problems
The high-availability components of the Sun Open Net Environment (ONE) Application Server 7, Enterprise Edition product include the HADB, the HADB Management Client, and the load balancer plug-in. During installation, these components can be installed together with the rest of the Application Server components or separately. The load balancer plug-in is usually installed separately from the Application Server components.
This chapter addresses problems that you may encounter while performing installation or uninstallation of the Sun ONE Application Server 7, Enterprise Edition product or its components or plug-ins.
Install/Uninstall Logs
The following Application Server logs can be useful for troubleshooting installation or uninstallation problems:
/var/sadm/install/logs/Sun_ONE_Application_Server_install.log
/var/sadm/install/logs/Sun_ONE_Application_Server_uninstall.log
Use the following log for troubleshooting problems with the clsetup command:
/var/tmp/clsetup.log
In addition to these log files, low-level installation and uninstallation log files are created at these locations:
/var/sadm/install/logs/Sun_ONE_Application_Server_install.<timestamp>
/var/sadm/install/logs/Sun_ONE_Application_Server_uninstall.<timestamp>
The following logs are associated with the high-availability components:
- Web server errors, including load balancer error messages, are written into the web server error.log.
- Application server messages are logged in the respective instance server's server.log file (the default location is /var/opt/SUNWappserver7/domains/domain1/server1/logs). This log includes admin server messages and deployment errors.
- Admin-server messages, including application deployment messages, are logged in the admin-server's server.log file (the default location is /var/opt/SUNWappserver7/domains/domain1/admin-server/logs).
- Database creation errors are written to /var/sadm/install/logs/clsetup.log.
- Initial cluster setup errors are written to /var/tmp/clsetup.log.
- Cluster administration errors are written to /var/tmp/cladmin.log.
Some guidelines on using logs:
- Set the value of the require-monitor-data property to true in the loadbalancer.xml file in order to see monitoring details in the log.
- The UnhealthyInstances messages that appear in the log should be particularly helpful in troubleshooting load balancer problems.
- Using hadbm to set a large tuple log size will increase performance of the logging facility.
- The cladmin.log file may be useful in troubleshooting cluster administration.
- The clsetup.log file may be helpful in finding out what went wrong during installation when you establish a new cluster.
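As a starting point for working through these logs, the following sketch prints every line that mentions an error from whichever of the given log files exist. The function name is hypothetical, and matching on the word "error" is only an assumption about how failures are marked in these files.

```shell
# Print lines containing "error" (any case) from each log file that exists.
# Each match is prefixed with the file name; missing files are skipped.
scan_logs_for_errors() {
    for log in "$@"; do
        if [ -f "$log" ]; then
            # Adding /dev/null makes grep print the file name even for one file.
            grep -i 'error' "$log" /dev/null || true
        fi
    done
}
```

For example, `scan_logs_for_errors /var/tmp/clsetup.log /var/tmp/cladmin.log` checks the two cluster logs named above.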
Device directory location: /var/opt/SUNWhadb
Configuration files location: /etc/opt/SUNWhadb/dbdef
Can’t install remotely using the graphical interface.
On UNIX, if you are installing the Application Server software remotely using the graphical interface, you must enable the display configuration on the machine where you are installing the product.
Solution
Set the DISPLAY environment variable to contain the name of the server and domain, using this format:
Then run the following command on the remote client:
xhost +
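A minimal sketch of the DISPLAY step, assuming the conventional X11 form host:display.screen with display 0 and screen 0. The helper name make_display and the host name client1.example.com are hypothetical; substitute the machine whose screen should show the installer windows.

```shell
# Build a DISPLAY value in the conventional X11 form host:display.screen.
# $1 = fully qualified host name, $2 = display number (defaults to 0)
make_display() {
    printf '%s:%s.0\n' "$1" "${2:-0}"
}

# "client1.example.com" is a hypothetical host name used for illustration.
DISPLAY=$(make_display client1.example.com)
export DISPLAY
echo "DISPLAY is set to $DISPLAY"
```

Note that `xhost +` turns off X access control for all hosts; `xhost +hostname` grants access to a single host instead.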
Can’t reinstall the server.
If installation and uninstallation are performed according to the documented instructions and complete normally, you can reinstall the server without problems. However, if you used another method to remove the Application Server files, or if there has been a failure during installation or uninstallation, the system might be in an inconsistent state, with leftover Application Server files or processes and stale entries in the /var/sadm/install/productregistry file. These leftovers provoke an error message on a subsequent installation.
You will need to clean up these files or processes before attempting a new installation.
Solution: Clean up leftover files and processes
- Log in as root.
- Navigate to your installation directory and check the content of the /var/sadm/install/productregistry file for installed packages, that is, files having the SUNW string. For example:
cat /var/sadm/install/productregistry | grep SUNW
- Run pkgrm for the SUNW packages that were found in the product registry. For example:
pkgrm SUNWasaco
- Remove the following files, if they are present:
/tmp/setupSDKNative
/tmp/SolarisNativeToolkit_3.0_1
- After the packages have been removed, use the prodreg registry editor to remove the Application Server-specific entries.
- At the command line, kill all appservd processes that may be running by typing the following:
ps -ef | grep appservd
pkill appservd
- Remove all remaining files under the Sun ONE Application Server installation directories. Refer to "Conventions Referring to Directories" for further information on bundled and unbundled structures.
- Remove the Sun ONE Application Server 7 directories.
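The package lookup and removal steps above can be sketched as a dry run. The function names are hypothetical, and the sketch assumes that every token beginning with SUNW in the registry is a package name. It only prints the pkgrm commands so that you can review them before running any as root.

```shell
# List the distinct SUNW package names mentioned in a product registry file.
# $1 = path to the registry (the guide uses /var/sadm/install/productregistry)
list_sunw_packages() {
    # Split the file into alphanumeric tokens, one per line, then keep SUNW* ones.
    tr -cs '[:alnum:]' '\n' < "$1" | grep '^SUNW' | sort -u
}

# Print, without running it, one pkgrm command per package found.
# Review the list first: the registry may also name SUNW packages that
# belong to other products.
print_pkgrm_commands() {
    list_sunw_packages "$1" | while read -r pkg; do
        echo "pkgrm $pkg"
    done
}
```

Running `print_pkgrm_commands /var/sadm/install/productregistry` produces one line per package, for example `pkgrm SUNWasaco`.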
Silent installation is not working correctly.
Consider the following:
Is the silent installation configuration file correct?
To run a silent installation, you must first create a silent installation configuration file by running a standard installation with the savestate option, as described in the Sun ONE Application Server Installation Guide:
./setup -savestate
In tailoring the file for your silent installation, if you have introduced any errors in the configuration file, for example mistyping a variable name, the silent installation may not run.
Solution
Verify that the silent installation configuration file is correct and that you have not introduced any errors that may invalidate the file.
Uninstallation failure needs cleanup.
If an uninstallation fails, you may need to clean up some leftover files or processes before attempting a new installation.
Solution
Follow the instructions in "Solution: Clean up leftover files and processes".
Can’t install the load balancer plug-in.
Consider the following possibilities:
Is your web server installed?
Before you can install the load balancer plug-in, you must have the web server already installed (Sun ONE Web Server 6.0, SP6 or Apache Web Server 1.3.27). The web server is not required for the other Enterprise Edition components, just for the load balancer plug-in.
Solution
Install the web server before installing the load balancer plug-in.
Is there a previously installed load balancer or reverse proxy plug-in on your system?
Sun ONE Application Server 7, Enterprise Edition requires that any existing load balancer or reverse proxy plug-in on your system be removed before you install the load balancer plug-in.
Solution
Remove the existing plug-in using the uninstallation program. On a clean system, the following message should display if you try to access the plug-in:
ERROR: information for "SUNWaspx" was not found.
Has the load balancer plug-in already been installed?
If the load balancer plug-in component is disabled or grayed out on the Component Selection page, the correct version is already installed.
Are the configuration files correct?
The installation program checks to see if the appropriate configuration files for the load balancer plug-in are found in the location you specify.
For the Sun ONE Web Server plug-in, the following files are searched:
<install_dir>/config/magnus.conf
<install_dir>/config/obj.conf
For the Apache Web Server plug-in, this file is searched:
<install_dir>/conf/httpd.conf
Solution
Specify the correct location.
Load balancer won’t start.
A message similar to the following might appear in the load balancer log file when you try to start the load balancer:
lb.configurator: CNFG1008 : Multiple instances with the same name : are not allowed for the cluster cluster1
The most likely problem is that the load balancer configuration file, loadbalancer.xml, is not configured correctly.
Solution
Check your loadbalancer.xml file and make sure that each instance name is unique within its cluster.
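One way to look for repeated names is sketched below. It assumes that instances are declared as `<instance name="...">` elements, at most one per line, which is an assumption about the layout of loadbalancer.xml rather than something this chapter states. The function name is hypothetical.

```shell
# Print every instance name that occurs more than once in the given file.
# $1 = path to loadbalancer.xml
# Assumes at most one <instance ...> element per line.
find_duplicate_instances() {
    sed -n 's/.*<instance[^>]*[[:space:]]name="\([^"]*\)".*/\1/p' "$1" |
        sort | uniq -d
}
```

The check covers the whole file, so a name that is reused in two different clusters is also reported. No output means that no duplicates were found.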
Shared memory creation failed.
This error occurs while running hadbm create or clsetup (which calls hadbm create). When the HADB server processes are booted for the first time on each machine in the HADB configuration, they create the shared memory segments that constitute the database.
The typical message in this case is:
Failed to create shared memory
This message indicates that the hadbm create command could not allocate the shared memory to the database segments.
If you see this error in the history file, consider the following:
Have you configured shared memory?
Shared memory must be configured for the HADB host machines before you can work with the HADB.
Solution
Configure shared memory by following the instructions in the Configuring Shared Memory and Semaphores section in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
Is there an error in your /etc/system file?
You may have made a mistake or a typing error when you configured shared memory for the HADB.
Solution
Verify that you have followed the instructions in the Configuring Shared Memory and Semaphores section in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide. Correct any typing error.
Did you reboot the machine after configuring shared memory?
The shared memory changes in the /etc/system file will not take effect until you have rebooted the machine.
Solution
Reboot the machine.
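To review what is currently configured before rebooting, a sketch like the following lists the active shared memory and semaphore entries. It assumes the Solaris convention that these tunables are set with lines of the form `set shmsys:...` and `set semsys:...`, and that a POSIX grep with the -E option is available. The function name is hypothetical; the parameters and values that the HADB requires are given in the Installation Guide, not here.

```shell
# Print the shared memory and semaphore "set" lines from a Solaris system file.
# $1 = path to the file (defaults to /etc/system)
# Comment lines in /etc/system start with "*"; they are skipped because the
# pattern requires the line to begin with "set".
show_ipc_settings() {
    grep -E '^[[:space:]]*set[[:space:]]+(shmsys|semsys):' "${1:-/etc/system}"
}
```

A misspelled entry, such as a wrong module prefix, simply does not appear in the output, which makes typing errors easier to spot.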
clsetup is not working.
The clsetup command automates the process of setting up a cluster in a typical single-machine configuration. After the Sun ONE Application Server 7, Enterprise Edition software and high-availability components are installed, this script uses three input files to set up a basic cluster. The most likely problems are errors in the input files (if they have been edited) and unmet clsetup requirements.
Consider the following possibilities:
Have you configured shared memory?
Shared memory must be set up before you can use the clsetup command. Instructions for setting up shared memory are contained in the Sun ONE Application Server Installation Guide.
Has remote communication been set up correctly?
RSH or SSH must be set up before the clsetup command can be run.
To verify that remote communication has been established, run a command through rsh on each host in the cluster. The remote host should return its identity. For example:
rsh computer99.zmtn.company.com uname -a
Instructions for setting up host communications are contained in the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
Solution
If the verification does not work, remote communication for the cluster has not been set up correctly. Instructions for doing this are contained in the Setting Up Host Communication section of the Sun ONE Application Server Installation Guide.
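The verification can be repeated for every host in the cluster with a short loop. In this sketch the function name is hypothetical, the first argument is the remote shell program to use (rsh or ssh), and the host names in the usage example are placeholders.

```shell
# Run "uname -a" on each host through the given remote shell and report
# whether the connection worked.
# $1 = remote shell program (rsh or ssh), remaining arguments = host names
check_hosts() {
    remote="$1"
    shift
    failed=0
    for host in "$@"; do
        # -n keeps the remote shell from reading standard input.
        if "$remote" -n "$host" uname -a >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_hosts rsh host1.example.com host2.example.com`. If a host prompts for a password, the command waits for input, which is itself a sign that host communication is not set up as the HADB requires.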
Under SSH, are the HADB and the Application Server co-located on the same machine?
If you are co-locating the HADB and the Application Server on the same machine using SSH, a known_hosts file must exist under the .ssh directory. That file is necessary for the nodes to communicate with each other, so that the HADB cluster functions properly.
Solution
If the known_hosts file is not under the .ssh directory, run either the ssh localhost command or the ssh hostname command before using the clsetup command.
Are the application server and HADB installed in the same directories on each machine?
The clsetup program cannot work when the files are installed in different directories on different machines.
Solution
Reinstall the Sun ONE Application Server and HADB in the same directories on each machine.
Are all the Admin Servers on the application server instances in the cluster running?
Before running the clsetup command, all the Admin Servers in the cluster must be running.
Are the input files on all instances in the cluster identical?
The clsetup command is not designed to set up each instance with different values. For example, this command cannot create a JDBC connection with different settings for each instance.
Solution
Verify that the input files are identical on all instances in the cluster.
Unable to create HADB database using clsetup.
This can happen when you run clsetup to configure the cluster. You might see errors similar to the following in the /var/tmp/clsetup.log file:
CREATING HADB DATABASE...
/opt/SUNWhadb/4.2.2-17/bin/hadbm create --installpath=/opt/SUNWhadb/4.2.2-17 --configpath=/etc/opt/SUNWhadb/dbdef --historypath=/var/tmp --devicepath=/opt/SUNWhadb/4 --datadevices=1 --portbase=15200 --spares=0 --inetd=false --inetdsetupdir=/tmp --devicesize=512 --dbpassword=password --hosts=eas-v880-1,eas-v880-1 hadb
hadbm:Error 22024: Specified hosts are not reachable: [ eas-v880-1 ]
HADB Database creation failed.
Solution
Make sure your communication protocol (RSH/SSH) is configured properly before running the clsetup command. If you plan to use RSH for your communication, make sure you uncomment the following line in the clresource.conf file before running the clsetup command:
set managementProtocol=rsh
If you are using SSH, make sure you closely follow all the SSH configuration steps contained in the Sun ONE Application Server Installation Guide.
Problems when running clsetup as non-root.
If you want to run the clsetup command as a user other than root, you’ll need to set up administration for non-root.
Solution
Follow the instructions in the Setting Up Administration for Non-Root section in the Sun ONE Application Server Installation Guide.
Insufficient space.
Consider the following possibilities:
Is the number of semaphores too low?
The typical message in this case is:
failed to start database : HADB Database creation failed
The history file then contains the following entry:
No space left on device
This can happen when the number of semaphores is too low. Because semaphores are a global resource provided by the operating system, the required configuration depends on all processes running on the host, not only on the HADB. The error can occur either while starting the HADB or during runtime.
Solution
Configure the semaphore settings by editing the /etc/system file. Instructions and guidelines are contained in the Configuring Shared Memory and Semaphores section of the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.
Can’t test the ssh setting as root.
When you try to test the SSH setting using the following command:
# ssh hostname date
the console prompts for the root password:
# root@hostname's password:
When running the HADB admin clients as root, the sshd configuration (/etc/sshd_config) on all machines in the cluster must have PermitRootLogin set to yes. Sun SSH does not permit root login by default; it is set to no.
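Solution
Set PermitRootLogin to yes in the sshd configuration file on every machine in the cluster, then restart the sshd server so that the change takes effect.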
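To see what a given machine is currently configured with, a sketch like the following prints the active PermitRootLogin value. The function name is hypothetical, and the file path is passed as an argument because the location of the sshd configuration file varies between SSH distributions.

```shell
# Print the value of the first uncommented PermitRootLogin line in an sshd
# configuration file. Prints nothing if the keyword is not set explicitly.
# $1 = path to the sshd configuration file
root_login_setting() {
    # Keyword names are case-insensitive; lines starting with "#" do not match.
    grep -i '^[[:space:]]*PermitRootLogin[[:space:]]' "$1" |
        head -n 1 | awk '{print $2}'
}
```

An empty result means that the server default applies, which for Sun SSH is no, as described above.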
Can’t get ssh to skip the login prompt.
An error similar to the following occurs, suggesting that the sshd server is not running on the destination machine:
Secure connection to vortex-dev1 refused; reverting to insecure method.
Using rsh. WARNING: Connection will not be encrypted.
Password:
You can set up your local environment to use the HADB commands from anywhere by setting the PATH variable after you have implemented SSH. You should not have to log in.
Solution
- Verify that the SSH server is running by issuing the following command on the server machine:
ps -e | grep sshd
- If the SSH server is not running, start it as follows:
/etc/init.d/sshd start
- Check the ~<ssh-user>/.ssh/authorized_keys file on each destination machine to ensure that all the public keys from all the machines are listed in that file.
- For both the user's home directory (~<ssh-user>) and the .ssh subdirectory, ensure that write permission is not granted to group or to other.
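The permission check in the last step can be sketched as follows. Both function names are hypothetical, and the check reads the permission string printed by `ls -ld` instead of relying on any platform-specific tool.

```shell
# Succeed only if neither group nor other has write permission on a directory.
# $1 = directory to check
no_group_other_write() {
    perms=$(ls -ld "$1" | cut -c1-10)            # for example drwxr-xr-x
    group_w=$(printf '%s' "$perms" | cut -c6)    # group write position
    other_w=$(printf '%s' "$perms" | cut -c9)    # other write position
    [ "$group_w" != "w" ] && [ "$other_w" != "w" ]
}

# Check the home directory and its .ssh subdirectory for one SSH user.
# $1 = that user's home directory
check_ssh_dirs() {
    rc=0
    for dir in "$1" "$1/.ssh"; do
        if no_group_other_write "$dir"; then
            echo "OK   $dir"
        else
            echo "FIX  $dir (run: chmod go-w $dir)"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it on each destination machine with the SSH user's home directory as the argument.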
For further information on setting up host communications for the HADB, refer to the Preparing for HADB Setup chapter of the Sun ONE Application Server Installation Guide.