This section describes known high availability database (HADB) issues and associated solutions.
HADB configured with double networks on two subnets works properly on Solaris SPARC. However, due to problems in the operating system or network drivers on some hardware platforms, it has been observed that Solaris x86 and Linux platforms do not always handle double networks properly. This causes the following problems with HADB:
On Linux, some of the HADB processes are blocked when sending messages. This causes HADB node restarts and network partitioning.
On Solaris x86, some problems may arise after a network failure that prevent switching to the other network interface. This does not happen all the time, so it is still better to have two networks than one. These problems are partially solved in Solaris 10.
Trunking is not supported.
HADB does not support double networks on Windows 2003 (ID 5103186).
Creating a new database may fail with the following error, stating that too few shared memory segments are available:
HADB-E-21054: System resource is unavailable: HADB-S-05512: Attaching shared memory segment with key "xxxxx" failed, OS status=24 OS error message: Too many open files.
Verify that shared memory is configured and the configuration is working. In particular, on Solaris 8, inspect the file /etc/system, and check that the value of the variable shmsys:shminfo_shmseg is at least six times the number of nodes per host.
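The check on /etc/system can be sketched as a small shell helper. This is only an illustration: the function name check_shmseg is hypothetical (not part of HADB), and it assumes the setting is written as a decimal "set shmsys:shminfo_shmseg=N" line.

```shell
# check_shmseg SYSTEM_FILE NODES_PER_HOST
# Hypothetical helper: succeeds if shmsys:shminfo_shmseg in SYSTEM_FILE
# (normally /etc/system) is at least six times NODES_PER_HOST.
check_shmseg() {
    system_file=$1
    nodes_per_host=$2
    required=$((nodes_per_host * 6))
    # Assumes a decimal value; lines starting with '*' are comments and
    # do not match. If the setting appears more than once, the last wins.
    current=$(sed -n 's/^[[:space:]]*set[[:space:]]\{1,\}shmsys:shminfo_shmseg[[:space:]]*=[[:space:]]*\([0-9]\{1,\}\).*/\1/p' "$system_file" | tail -n 1)
    if [ -z "$current" ]; then
        echo "shmsys:shminfo_shmseg is not set in $system_file"
        return 1
    fi
    if [ "$current" -lt "$required" ]; then
        echo "shmsys:shminfo_shmseg=$current is below the required $required"
        return 1
    fi
    echo "shmsys:shminfo_shmseg=$current is sufficient (required: $required)"
}
```

For a host that runs two HADB nodes, the call would be check_shmseg /etc/system 2.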
HADB 4.3-0.16 and later is configured to use Intimate Shared Memory when it creates and attaches to its shared memory segments (it uses the SHM_SHARE_MMU flag). This flag essentially locks the shared memory segments into physical memory and prevents them from being paged out, which can easily cause problems on low-end machines with little physical memory.
For example, a developer whose machine has 512 MB of memory and plenty of swap space, and who ran Application Server 7.0 EE without trouble, will encounter problems after installing 7.1 EE or later when configuring the default clsetup cluster. That cluster creates two HADB nodes, each with a devicesize of 512, so there is not enough physical RAM to support the shared memory that both nodes require.
Make sure you have the recommended amount of memory when co-locating Application Server and HADB. See HADB Requirements and Supported Platforms for more information.
The management system checks resource availability when creating databases or adding nodes, but it does not check whether sufficient resources are available when device or main-memory buffer sizes are increased using hadbm set.
Verify that there is enough free disk/memory space on all hosts before increasing any of the devicesize or buffersize configuration attributes.
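The disk part of this check can be sketched as follows. The helper name has_free_space is hypothetical, the size is assumed to be given in MB, and the directory to test (the one holding the HADB devices) depends on the installation. Checking free memory is platform specific and is not shown.

```shell
# has_free_space DIR REQUIRED_MB
# Hypothetical helper: succeeds if the filesystem that holds DIR has at
# least REQUIRED_MB megabytes available.
has_free_space() {
    dir=$1
    required_kb=$(( $2 * 1024 ))
    # df -P prints one header line and one data line per filesystem;
    # with -k, column 4 is the available space in kilobytes.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$available_kb" ] && [ "$available_kb" -ge "$required_kb" ]
}
```

Run the check on every host in the database, using the additional space the new devicesize would need on that host.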
It is not possible to register a software package under the same name with different locations on different hosts; for example:
hadbm registerpackage test --packagepath=/var/install1 --hosts europa11
Package successfully registered.
hadbm registerpackage test --packagepath=/var/install2 --hosts europa12
hadbm:Error 22171: A software package has already been registered with the package name test.
HADB does not support heterogeneous paths across nodes in a database cluster. Make sure that the HADB server installation directory (--packagepath) is the same across all participating hosts.
If running the management agent on a host with multiple network interfaces, the create domain command may fail if not all network interfaces are on the same subnet:
hadbm:Error 22020: The management agents could not establish a domain, please check that the hosts can communicate with UDP multicast.
The management agents will (if not configured otherwise) use the “first” interface for UDP multicasts (“first” as defined by the result from java.net.NetworkInterface.getNetworkInterfaces()).
The best solution is to tell the management agent which subnet to use by setting ma.server.mainternal.interfaces in the configuration file, for example ma.server.mainternal.interfaces=10.11.100.0. Alternatively, configure the router between the subnets to route multicast packets (the management agent uses multicast address 228.8.8.8).
Before retrying with a new configuration of the management agents, you may have to clean up the management agent repository. Stop all agents in the domain, and delete all files and directories in the repository directory (identified by repository.dr.path in the management agent configuration file). This must be done on all hosts before restarting the agents with a new configuration file.
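The repository cleanup can be sketched as below. The helper names are hypothetical, and the sketch assumes the agent configuration file uses plain key=value lines. Stop all agents in the domain first; how they are stopped depends on the installation and is not shown.

```shell
# ma_repository_dir CONFIG_FILE
# Print the directory named by repository.dr.path in the management
# agent configuration file (assumed to contain key=value lines).
ma_repository_dir() {
    sed -n 's/^[[:space:]]*repository\.dr\.path[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# clean_ma_repository CONFIG_FILE
# Delete every file and directory inside the repository directory,
# keeping the directory itself.
clean_ma_repository() {
    repo=$(ma_repository_dir "$1")
    if [ -z "$repo" ] || [ ! -d "$repo" ]; then
        echo "no repository directory found via $1" >&2
        return 1
    fi
    # POSIX idiom for "each entry directly under the current directory",
    # including names that start with a dot.
    (cd "$repo" && find . ! -name . -prune -exec rm -rf {} +)
}
```

Run clean_ma_repository with the agent configuration file as its argument on every host before restarting the agents.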
After deleting an HADB instance, subsequent attempts to create new instances with the configure-ha-cluster command fail. The problem is that old directories are left from the original HADB instance in ha_install_dir/rep/* and ha_install_dir/config/hadb/instance_name.
Be sure to manually delete these directories after deleting an HADB instance.
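A minimal sketch of this manual cleanup follows. The function name cleanup_hadb_instance is hypothetical; its two arguments stand for ha_install_dir and instance_name as used above.

```shell
# cleanup_hadb_instance HA_INSTALL_DIR INSTANCE_NAME
# Hypothetical helper: remove what a deleted HADB instance leaves behind
# in HA_INSTALL_DIR/rep/* and HA_INSTALL_DIR/config/hadb/INSTANCE_NAME.
cleanup_hadb_instance() {
    ha_install_dir=$1
    instance_name=$2
    # Refuse to run with an empty argument, so rm never sees "/rep/*".
    if [ -z "$ha_install_dir" ] || [ -z "$instance_name" ]; then
        echo "usage: cleanup_hadb_instance ha_install_dir instance_name" >&2
        return 1
    fi
    rm -rf "$ha_install_dir"/rep/* "$ha_install_dir/config/hadb/$instance_name"
}
```

Other instances under config/hadb are left in place; only the named instance directory is removed.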
On Solaris 10 Opteron, starting, stopping or reconfiguring HADB using the hadbm command may fail or hang with one of the following errors:
hadbm:Error 22009: The command issued had no progress in the last 300 seconds.
HADB-E-21070: The operation did not complete within the time limit, but has not been cancelled and may complete at a later time.
This may happen if there are inconsistencies reading/writing to a file (nomandevice) which the clu_noman_srv process uses. This problem can be detected by looking for the following messages in the HADB history files:
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 does not respond.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Have not heard from it in 104.537454 sec.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 did not start.
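Searching the history files for these messages can be sketched with grep. The function name find_noman_trouble is hypothetical, and the location of the history files depends on the installation, so the files are passed as arguments.

```shell
# find_noman_trouble HISTORY_FILE...
# Hypothetical helper: print history lines reporting that a noman child
# process does not respond or did not start. Returns non-zero if no such
# line is found.
find_noman_trouble() {
    grep -E 'Child process noman[0-9]+ [0-9]+ (does not respond|did not start)' "$@"
}
```

The node number appears at the start of each matching line (n:3 in the messages above) and identifies the node to restart.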
The following workaround is unverified, as the problem has not been reproduced manually. However, running this command for the affected node should solve the problem.
hadbm restartnode --level=clear nodeno dbname
Note that all devices for the node will be reinitialized. You may have to stop the node before reinitializing it.
When the management agent starts on a host running Solaris 8 with several network interface cards (NICs) installed, and some of the cards have IPv6 enabled while others have IPv4 enabled, the agent may terminate with the exception "IPV6_MULTICAST_IF failed."
Set the environment variable JAVA_OPTIONS to -Djava.net.preferIPv4Stack=true before starting the management agent.
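In a Bourne-compatible shell, the variable can be set as sketched below in the environment from which the management agent is started. How the agent is then started depends on the installation and is not shown.

```shell
# Make the JVM that runs the management agent prefer the IPv4 stack.
JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"
export JAVA_OPTIONS
```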
Alternatively, use Solaris 9 or later, which do not exhibit this problem.
There is a bug in the 64-bit version of Red Hat Enterprise Linux 3.0 that makes the clu_trans_srv process end up in an uninterruptible mode when performing asynchronous I/O. This means that kill -9 does not work and the operating system must be rebooted.
Use a 32-bit version of Red Hat Enterprise Linux 3.0.
Capital letters in passwords are converted to lowercase when the password is stored in HADB.
Do not use passwords containing capital letters.
When downgrading to a previous HADB version, the management agent may fail with various error codes.
It is possible to downgrade the HADB database. However, the management agent cannot be downgraded if changes have been made to the repository objects. After a downgrade, you must keep using the management agent from the latest HADB version.
When the HADB c package (Solaris: SUNWhadbc, Linux: sun-hadb-c) version <m.n.u-p> is installed or removed, the symlink /opt/SUNWhadb/<m> is never touched once it exists. An orphaned symlink can therefore be left behind.
Unless the symlink is in use, delete it before installing the package or after uninstalling it.
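Whether the symlink is orphaned, that is, whether it points to a target that no longer exists, can be tested as sketched below. The function name is_orphaned_symlink is hypothetical. The test says nothing about whether a valid symlink is still in use by another installed HADB version.

```shell
# is_orphaned_symlink PATH
# Hypothetical helper: succeeds only if PATH is a symbolic link whose
# target does not exist. [ -e ] follows the link, so it fails for a
# dangling link even though [ -L ] succeeds.
is_orphaned_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}
```

Call it with the path of the symlink under /opt/SUNWhadb for the major version in question, and remove the symlink with rm only when the check succeeds.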
On Solaris 10, stopping a management agent by using the ma-initd script in a global zone stops the management agent in the local zone as well.
Do not install the management agent both in the global and local zone.
Sometimes, a resource contention problem on the server may cause a management client to become disconnected. When reconnecting, a misleading error message "hadbm:Error 22184: A password is required to connect to the management agent" may be returned.
Check if there is a resource problem on the server, take proper action (e.g., add more resources), and retry the operation.
Installing with Java Enterprise System (as root) does not permit non-root users to manage HADB.
Always log in as root to manage HADB.
Special use interfaces with IP addresses like 0.0.0.0 should not be registered as valid interfaces to be used for HADB nodes in the Management Agent. Registering such interfaces may cause problems if HADB nodes are set up on these interfaces by means of a user issuing a hadbm create command using host names instead of IP addresses. The nodes will then be unable to communicate, causing the create command to hang.
When using hadbm create on hosts with multiple interfaces, always specify the IP addresses explicitly in dotted decimal notation (DDN).
On the Windows platform, with certain configurations and loads, there may be a large number of reassembly failures in the operating system. The problem has been seen with configurations of more than twenty nodes when running several table scans (select *) in parallel. The symptoms may be that transactions abort frequently, repair or recovery may take a long time to complete, and there may be frequent timeouts in various parts of the system.
To fix the problem, the Windows registry variable HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters can be set to a value higher than the default 100. It is recommended that you increase this value to 0x1000 (4096). For more information, see article 811003 from the Microsoft support pages.
It is possible when a machine is under load that the masking mechanism fails and some characters from the password being entered are exposed. This poses a minor security risk, and the password should always be masked.
Put the passwords in their own password files (the method normally recommended since Application Server 8.1) and refer to these files with the --adminpasswordfile or --dbpasswordfile option.
When the Application Server is installed in a Solaris Global Zone to /usr/SUNWappserver, the HADB component installed with that Application Server instance will not be available in Sparse Local Zones.
The problem is that HADB is installed to /opt/SUNWhadb in the Global Zone, but this directory is not readable from Sparse Local Zones. Unfortunately, the HADB bundle in JES5 is not relocatable.
Because the Application Server HADB component is not relocatable, the HADB component must be installed separately in each Sparse Local Zone from which you want to access HADB.