Sun GlassFish Enterprise Server 2.1 Release Notes

High Availability

This section describes known high availability database (HADB) issues and associated solutions.

Load balancer plug-in health check generates a large number of background connections and disconnections (6453946)

Description

The load balancer plug-in health check generates a large number of connections and disconnections in the background, adding load. For health-check purposes, a runDaemonMonitor thread connects to and disconnects from every Application Server listener. This can lead to connection saturation on Enterprise Server.

Solution

A new attribute, monitor-interval-in-seconds, has been added to the loadbalancer.xml file. Use this attribute to insert a pause between connect/disconnect events when hundreds of listeners are configured for the load balancer plug-in. The default pause value is 0.

HADB Configuration with Double Networks (no ID)

HADB configured with double networks on two subnets works properly on Solaris SPARC. However, due to problems in the operating system or network drivers on some hardware platforms, it has been observed that Solaris x86 and Linux platforms do not always handle double networks properly. This causes the following problems with HADB:

HADB Database Creation Fails (no ID)

Description

Creating a new database may fail with the following error, stating that too few shared memory segments are available:

HADB-E-21054: System resource is unavailable: HADB-S-05512: Attaching shared memory segment with key "xxxxx" failed, OS status=24 OS error message: Too many open files.

Solution

Verify that shared memory is configured and the configuration is working. In particular, on Solaris 8, inspect the file /etc/system, and check that the value of the variable shmsys:shminfo_shmseg is at least six times the number of nodes per host.
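As a rough aid, this check can be scripted. The sketch below assumes a POSIX shell such as ksh or bash, a decimal value in a line of the form set shmsys:shminfo_shmseg=N, and a hypothetical helper name, check_shmseg; it is not part of HADB.

```shell
# Hypothetical helper (not part of HADB): check_shmseg FILE NODES_PER_HOST
# Compares the shmsys:shminfo_shmseg value set in FILE (normally /etc/system)
# with the recommended minimum of 6 * NODES_PER_HOST.
# Prints "ok VALUE MIN", "too-low VALUE MIN" or "unset MIN".
check_shmseg() {
    min=$((6 * $2))
    # Take the value from the last "set shmsys:shminfo_shmseg=N" line.
    # Comment lines in /etc/system start with "*", so they never match.
    value=$(awk -F= '
        /^[ \t]*set[ \t]+shmsys:shminfo_shmseg[ \t]*=/ {
            v = $2; gsub(/[ \t]/, "", v); last = v
        }
        END { print last }' "$1")
    if [ -z "$value" ]; then
        echo "unset $min"
        return 1
    elif [ "$value" -ge "$min" ]; then
        echo "ok $value $min"
    else
        echo "too-low $value $min"
        return 1
    fi
}
```

For example, on a host running six HADB nodes, check_shmseg /etc/system 6 reports whether the value is at least 36.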

hadbm set does not check resource availability (disk and memory space) (5091280)

Description

The management system checks resource availability when creating databases or adding nodes, but it does not check whether sufficient resources are available when device or main-memory buffer sizes are increased using hadbm set.

Solution

Verify that there is enough free disk/memory space on all hosts before increasing any of the devicesize or buffersize configuration attributes.
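The disk part of this check can be sketched in shell as follows. has_free_kb is a hypothetical helper that relies on the POSIX df -P -k output format; it does not check memory, which has to be verified with platform-specific tools, and it must be run on every host.

```shell
# Hypothetical helper: has_free_kb DIR KB
# Succeeds if the file system that holds DIR has at least KB kilobytes free.
# "df -P -k" prints one header line, then one line per file system whose
# fourth field is the available space in 1024-byte blocks.
has_free_kb() {
    avail=$(df -P -k "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, has_free_kb /path/to/hadb/devices 2097152 succeeds only if at least 2097152 KB (2 GB) is available; the directory is a placeholder for the actual device location.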

Heterogeneous paths for packagepath not supported (5091349)

Description

It is not possible to register the same software package under the same name with different locations on different hosts; for example:


hadbm registerpackage test --packagepath=/var/install1 --hosts europa11
Package successfully registered.
hadbm registerpackage test --packagepath=/var/install2 --hosts europa12
hadbm:Error 22171: A software package has already been registered with 
the package name test.

Solution

HADB does not support heterogeneous paths across nodes in a database cluster. Make sure that the HADB server installation directory (--packagepath) is the same across all participating hosts.

hadbm createdomain may fail (6173886, 6253132)

Description

If the management agent runs on a host with multiple network interfaces, the createdomain command may fail if the interfaces are not all on the same subnet:


hadbm:Error 22020: The management agents could not establish a 
domain, please check that the hosts can communicate with UDP multicast.

Unless configured otherwise, the management agents use the "first" interface for UDP multicast, where "first" is determined by the order returned by java.net.NetworkInterface.getNetworkInterfaces().

Solution

The best solution is to tell the management agent which subnet to use by setting ma.server.mainternal.interfaces in the configuration file, for example, ma.server.mainternal.interfaces=10.11.100.0. Alternatively, configure the router between the subnets to route multicast packets (the management agent uses multicast address 228.8.8.8).
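If you are unsure which value to use, the subnet address can be derived from an interface's IP address and netmask. The sketch below assumes that ma.server.mainternal.interfaces expects the network address of the subnet, as the example value 10.11.100.0 suggests; network_address is a hypothetical shell helper.

```shell
# Hypothetical helper: network_address IP NETMASK
# Prints the network address of IP under NETMASK (bitwise AND of each
# octet) in dotted-decimal form, e.g. as a candidate value for
# ma.server.mainternal.interfaces.
network_address() {
    old_ifs=$IFS
    IFS=.
    # Split both dotted-decimal arguments into eight positional parameters:
    # $1-$4 are the address octets, $5-$8 the netmask octets.
    set -- $1 $2
    IFS=$old_ifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}
```

For example, network_address 10.11.100.57 255.255.255.0 prints 10.11.100.0.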

Before retrying with a new configuration of the management agents, you may have to clean up the management agent repository. Stop all agents in the domain, and delete all files and directories in the repository directory (identified by repository.dr.path in the management agent configuration file). This must be done on all hosts before restarting the agents with a new configuration file.
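The cleanup step can be sketched as below. It assumes that the management agent configuration file uses simple key=value lines and that all agents in the domain have already been stopped; ma_repo_path and clean_ma_repo are hypothetical helper names. Run it on every host before restarting the agents.

```shell
# Hypothetical helpers; the configuration file is assumed to use
# simple "key=value" lines.

# ma_repo_path CONFIG: print the value of repository.dr.path (last match wins).
ma_repo_path() {
    sed -n 's/^[[:space:]]*repository\.dr\.path[[:space:]]*=[[:space:]]*//p' "$1" |
        tail -n 1
}

# clean_ma_repo CONFIG: delete everything inside the repository directory,
# keeping the directory itself. Stop all agents in the domain first.
clean_ma_repo() {
    dir=$(ma_repo_path "$1")
    # Refuse to act on an empty path or on the file system root.
    case $dir in
        '' | /) echo "refusing to clean '$dir'" >&2; return 1 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Remove ordinary entries and hidden ones ("." and ".." excluded).
    rm -rf -- "$dir"/* "$dir"/.[!.]* "$dir"/..?*
}
```

For example, clean_ma_repo /path/to/ma-config-file, where the path is a placeholder for the configuration file of your installation.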

Starting, stopping, and reconfiguring HADB may fail or hang (6230792, 6230415)

Description

On Solaris 10 Opteron, starting, stopping or reconfiguring HADB using the hadbm command may fail or hang with one of the following errors:


hadbm:Error 22009: The command issued had no progress in the last 
300 seconds.
HADB-E-21070: The operation did not complete within the time limit, 
but has not been cancelled and may complete at a later time.

This can happen if there are inconsistencies when reading from or writing to a file (nomandevice) used by the clu_noman_srv process. This problem can be detected by looking for the following messages in the HADB history files:


n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 
does not respond.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Have not heard from it in 
104.537454 sec.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 
did not start.
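To scan history files for these messages, a grep along the following lines may help. It assumes that each message occupies a single line in the history file and that you pass the history file paths yourself; find_noman_hangs is a hypothetical helper name.

```shell
# Hypothetical helper: find_noman_hangs HISTORY_FILE...
# Prints history-file lines reporting that a noman child process does not
# respond or did not start. grep exits with status 1 if nothing matches.
find_noman_hangs() {
    grep -E 'Child process noman[0-9]+ [0-9]+ (does not respond|did not start)' "$@"
}
```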

Solution

The following workaround is unverified, as the problem has not been reproduced manually. However, running this command for the affected node should solve the problem.


hadbm restartnode --level=clear nodeno dbname

Note that all devices for the node will be reinitialized. You may have to stop the node before reinitializing it.

The management agent terminates with the exception "IPV6_MULTICAST_IF failed" (6232140)

Description

When the management agent starts on a host running Solaris 8 with several network interface cards (NICs) installed, some with IPv6 enabled and some with IPv4 enabled, it may terminate with the exception "IPV6_MULTICAST_IF failed."

Solution

Set the environment variable JAVA_OPTIONS to -Djava.net.preferIPv4Stack=true; for example:


export JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"

Alternatively, use Solaris 9 or later, which do not exhibit this problem.

clu_trans_srv cannot be interrupted (6249685)

Description

There is a bug in the 64-bit version of Red Hat Enterprise Linux 3.0 that makes the clu_trans_srv process end up in an uninterruptible mode when performing asynchronous I/O. This means that kill -9 does not work and the operating system must be rebooted.

Solution

Use a 32-bit version of Red Hat Enterprise Linux 3.0.

hadbm does not support passwords containing capital letters (6262824)

Description

Capital letters in passwords are converted to lowercase when the password is stored in HADB.

Solution

Do not use passwords containing capital letters.
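If passwords are generated or checked by a script, a simple guard such as the following rejects any password containing an uppercase letter before it is passed to hadbm. password_ok is a hypothetical helper and assumes a POSIX shell.

```shell
# Hypothetical helper: password_ok PASSWORD
# Fails if PASSWORD contains an uppercase letter, because hadbm would
# store it converted to lowercase.
password_ok() {
    case $1 in
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}
```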

Downgrading from HADB Version 4.4.2.5 to HADB Version 4.4.1.7 causes ma to fail with different error codes (6265419)

Description

When downgrading to a previous HADB version, the management agent may fail with different error codes.

Solution

It is possible to downgrade the HADB database; however, the management agent cannot be downgraded if changes have been made to the repository objects. After a downgrade, you must keep using the management agent from the latest HADB version.

Install/removal and symlink preservation (6271063)

Description

When installing or removing the HADB c package (Solaris: SUNWhadbc, Linux: sun-hadb-c) version <m.n.u-p>, the symlink /opt/SUNWhadb/<m> is never modified once it exists. As a result, an orphaned symlink may be left behind.

Solution

Delete the symlink before installing or after uninstalling, unless it is still in use.
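Orphaned symlinks can be located with a short shell loop such as the sketch below. orphaned_links is a hypothetical helper; it only lists dangling links directly inside the given directory and deletes nothing.

```shell
# Hypothetical helper: orphaned_links DIR
# Lists symbolic links directly inside DIR whose targets no longer exist.
orphaned_links() {
    for link in "$1"/*; do
        # -h: the entry is a symlink; -e follows the link and is false
        # when the target is missing, i.e. the link is dangling.
        if [ -h "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}
```

For example, orphaned_links /opt/SUNWhadb lists candidates; remove each one with rm only after confirming it is not in use.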

Management agents in global and local zones may interfere (6273681)

Description

On Solaris 10, stopping a management agent by using the ma-initd script in a global zone stops the management agent in the local zone as well.

Solution

Do not install the management agent in both the global zone and a local zone.

hadbm/ma should give a better error message when a session object has timed out and been deleted at MA (6275103)

Description

Sometimes, a resource contention problem on the server may cause a management client to become disconnected. When the client reconnects, a misleading error message "hadbm:Error 22184: A password is required to connect to the management agent" may be returned.

Solution

Check if there is a resource problem on the server, take proper action (e.g., add more resources), and retry the operation.

Non-root users cannot manage HADB (6275319)

Description

Installing with Java Enterprise System (as root) does not permit non-root users to manage HADB.

Solution

Always log in as root to manage HADB.

The Management Agent should not use special-use interfaces (6293912)

Description

Special-use interfaces with IP addresses such as 0.0.0.0 should not be registered in the Management Agent as valid interfaces for HADB nodes. Registering such interfaces may cause problems if HADB nodes are set up on them, which can happen when a user issues a hadbm create command that specifies host names instead of IP addresses. The nodes are then unable to communicate, causing the create command to hang.

Solution

When using hadbm create on hosts with multiple interfaces, always specify the IP addresses explicitly in dotted-decimal notation (DDN).
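When scripting hadbm create, the address list can be checked before use. The sketch below is a hypothetical is_ddn helper for a POSIX shell; it accepts four decimal octets in the range 0 to 255 and also rejects the special-use address 0.0.0.0 mentioned above. It does not check whether the address actually belongs to the host.

```shell
# Hypothetical helper: is_ddn ADDRESS
# Succeeds if ADDRESS is an IPv4 address in dotted-decimal notation
# (four decimal octets, each 0-255) other than 0.0.0.0.
is_ddn() {
    # Reject non-digit/non-dot characters, empty octets, and leading or
    # trailing dots (field splitting below would silently drop those).
    case $1 in
        *[!0-9.]* | *..* | .* | *.) return 1 ;;
    esac
    if [ "$1" = 0.0.0.0 ]; then
        return 1
    fi
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
}
```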

Reassembly failures on Windows (6291562)

Description

On the Windows platform, with certain configurations and loads, there may be a large number of reassembly failures in the operating system. The problem has been seen with configurations of more than twenty nodes when running several table scans (select *) in parallel. The symptoms may be that transactions abort frequently, repair or recovery may take a long time to complete, and there may be frequent timeouts in various parts of the system.

Solution

To fix the problem, the Windows registry variable HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters can be set to a value higher than the default 100. It is recommended that you increase this value to 0x1000 (4096). For more information, see article 811003 on the Microsoft support pages.

Session state not maintained if the browser has another cookie with / path (6553415)

Description

Cookies with a path equal to “/” interfere with the cookies of a highly available web application deployed at a context root other than “/” that uses in-memory replication as its persistence type, making it impossible for the highly available web application to maintain any HTTP session state. One common scenario where this may happen is when using the same browser to access both the Admin GUI (which is deployed at “/”) and the highly available web application.

Solution

Use a different browser to access the web application deployed at “/” (for example, the Admin GUI).

LB does not work with IIS 6; SASL32.DLL and ZLIB.DLL missing under as-install/lib (6572184)

Description

SASL32.DLL and ZLIB.DLL are required files for Load Balancer to work with Windows IIS 6. These files are currently not available under as-install/lib.

Solution

Copy the two DLL files manually to as-install/lib. These files can be downloaded from:


http://download.java.net/javaee5/external/OS/aslb/jars/

Where OS represents the desired platform, and can be one of the following values:

DAS creation/startup and HA package propagation issues in Global Zone (6573511)

Description

Two issues arise when installing or uninstalling Enterprise Server with High Availability packages in a Global Zone:

  1. HA packages get installed in all zones, which may not be desirable.

  2. When uninstalling, HA, MQ, and JDK packages get removed from all zones, which may not be desirable.

This problem does not occur when installing or uninstalling from a root local zone.

Solution

Perform installations and uninstallations from a local root zone rather than a global zone.

Highly available webapps deployed at “/” unable to resume in-memory replicated HTTP sessions (Issue Tracker 2972)

Description

Highly available web applications deployed at “/” are unable to maintain any HTTP sessions when using in-memory replication as their persistence type.

Solution

Deploy highly available web applications that use in-memory replication as their persistence type to a context root other than “/”. If you want to make such a web application available at “/”, you may designate it as the default-web-module of the virtual server to which the web application has been deployed.

AS LB installer did not put /usr/lib/mps path in apachectl LD_LIBRARY_PATH, can not start Apache SSL (6591878)

Description

During Enterprise Server Load Balancer installation for Apache on Solaris, the installer updates LD_LIBRARY_PATH in the apachectl script. However, the installer does not correctly write the /usr/lib/mps path. On Solaris, the Apache security instance will not start without this path in LD_LIBRARY_PATH.

Solution

This issue exists only on Solaris platforms. To work around the issue, add /opt/SUNWappserver/appserver/lib/lbplugin/lib to your LD_LIBRARY_PATH.
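One way to add the directory without creating duplicate entries is sketched below. add_lib_path is a hypothetical helper for a POSIX shell; where to call it (for example, near the point where apachectl sets LD_LIBRARY_PATH) depends on your setup and is an assumption here.

```shell
# Hypothetical helper: add_lib_path DIR
# Prepends DIR to LD_LIBRARY_PATH unless it is already present,
# then exports the variable.
add_lib_path() {
    # Surround the list with ":" so that every entry, including the first
    # and last, can be matched as ":DIR:".
    case ":${LD_LIBRARY_PATH}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH=$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} ;;
    esac
    export LD_LIBRARY_PATH
}
```

For example: add_lib_path /opt/SUNWappserver/appserver/lib/lbplugin/lib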

Enable/disable LB for an instance/cluster should show correct status (6595113)

Description

The Enable LoadBalance button is always enabled on the Clustered/Instance general page, regardless of what is saved in domain.xml.

Solution

AS9.1 EE IFR b58f/JES5 UR1. Cannot install Registry Server, because “incomplete” HA was detected. (6602508)

Description

(Solaris only) After installing Enterprise Server 2.1 on SPARC Solaris 10 with HADB, you may receive the following error after starting Enterprise Server and then attempting to install JES 5 UR1 with Registry Server:


Dependency Error:  Installation can not proceed because the version of HA
Session Store 4.4.3 detected on this host is incomplete , and a compatible
version is required by Servervice Registry Deployment Support.

Solution

It is not possible to install Registry Server from JES 5 UR1 with Enterprise Server 2.1 IFR on Solaris machines. The Registry Server packages have to be installed manually using the pkgadd command from the following JES5 UR1 distribution directory:


path/OS/Products/registry-svr/Packages

Internet Explorer 6.0/7.0 browser specific: Exporting load balancer configuration file throws error (6516068)

Description

(Internet Explorer 6 and 7 only) When attempting to export the Load Balancer configuration file (loadbalancer.xml) from Internet Explorer 6 or 7, the browser displays an error message saying that the sun-loadbalancer_1_2.dtd DTD file cannot be located.

Solution

To save the file, use the following workaround:

  1. Click Export on the Load Balancer page in Internet Explorer.

    The “XML page cannot be displayed” message is displayed.

  2. Click the error frame, and then choose File->Save As in Internet Explorer.

  3. Save the loadbalancer.xml file to the directory of your choice.