This section describes the known high availability issues and their solutions.
**6301842:** On Windows, the management agent sometimes cannot deregister the service while it is running; ma -r fails with the error message Could not identify program. **Solution:** Open a Windows command prompt and run sc stop HADBMgmtAgent, then sc delete HADBMgmtAgent. If the service was installed and started with ma -i -n servicename, use that servicename in the sc commands.

**6293912:** The management agent should not use special-use interfaces. **Solution:** When issuing hadbm create on hosts with multiple network interfaces, always specify the IP addresses explicitly, in dotted decimal notation.

**6291562:** Reassembly failures on Windows. On the Windows platform, with certain configurations and loads, the operating system may report a large number of IP reassembly failures. The problem has been seen with configurations of more than 20 nodes running several table scans (select *) in parallel. Symptoms include frequently aborted transactions, long-running repair and recovery, and frequent timeouts in various parts of the system. **Solution:** Set the relevant reassembly parameter under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to a value higher than its default of 100; we recommend 0x1000 (4096). For more information, see article 811003 on the Microsoft support pages: http://support.microsoft.com/default.aspx?scid=kb;en-us;811003

**6275319:** Non-root users cannot manage HADB. Installing with Java Enterprise System (as root) does not permit non-root users to manage HADB. **Solution:** Always log in as root to manage HADB.

**6275103:** The hadbm management agent should give a better error message when a session object has timed out and been deleted at the management agent. Sometimes a resource contention problem on the server causes a management client to become disconnected; on reconnecting, a misleading error message may be returned:

    hadbm:Error 22184: A password is required to connect to the management agent

**Solution:** Check whether there is a resource problem on the server, take appropriate action (for example, add more resources), and retry the operation.

**6273681:** Management agents in global and local zones may interfere. On Solaris 10, stopping a management agent with the ma-initd script in the global zone also stops the management agent in the local zone. **Solution:** Do not install the management agent in both the global zone and a local zone.

**6271063:** Install/removal and symlink preservation. During installation and removal of the HADB c package (Solaris: SUNWhadbc; Linux: sun-hadb-c) version <m.n.u-p>, the symlink /opt/SUNWhadb/<m> is never touched once it exists, so an orphaned symlink may be left behind. **Solution:** Delete the symlink before installing or after uninstalling, unless it is in use.

**6265419:** Downgrading from HADB version 4.4.2.5 to HADB version 4.4.1.7 causes the management agent to fail with various error codes. **Solution:** The HADB database itself can be downgraded, but the management agent cannot be downgraded once changes have been made to the repository objects. After a downgrade, you must continue to use the management agent from the latest HADB version.

**6262824:** hadbm does not support passwords containing uppercase letters; capital letters are converted to lowercase when the password is stored in HADB. **Solution:** Do not use passwords containing uppercase letters.

**6173886, 6253132:** hadbm createdomain may fail. If the management agents run on hosts with multiple network interfaces, the createdomain command may fail when the interfaces are not all on the same subnet:

    hadbm:Error 22020: The management agents could not establish a domain, please check that the hosts can communicate with UDP multicast.

Unless configured otherwise, the management agents use the first interface for UDP multicast (first as returned by java.net.NetworkInterface.getNetworkInterfaces()). **Solution:** The best solution is to tell the management agent which subnet to use by setting ma.server.mainternal.interfaces in the configuration file (for example, ma.server.mainternal.interfaces=10.11.100.0). Alternatively, configure the router between the subnets to route multicast packets (the management agent uses multicast address 228.8.8.8). Before retrying with a new management agent configuration, clean up the management agent's repository: stop all agents in the domain and delete all files and directories in the repository directory (identified by repository.dr.path in the management agent configuration file). This must be done on all hosts before restarting the agents with the new configuration file.

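The cleanup steps above can be sketched as a shell script. This is a minimal sketch, not the product's own tooling: the configuration file name (ma.cfg) and the repository path are examples, and the script must be run on every host in the domain after all agents have been stopped.

```shell
# Hedged sketch of the repository cleanup. Read the real repository path from
# the repository.dr.path entry of your own management agent configuration
# file, and stop all agents in the domain before deleting anything.
printf 'repository.dr.path=/tmp/hadb-repository\n' > ma.cfg   # example config
mkdir -p /tmp/hadb-repository/objects                          # example contents

# Extract the repository directory from the configuration file.
REPO=$(awk -F= '$1 == "repository.dr.path" {print $2}' ma.cfg)

# Delete everything under it; repeat on every host in the domain.
rm -rf "${REPO:?}"/*
ls -A "$REPO"    # prints nothing: the repository is now empty
```

Once the repository is empty on all hosts, restart the agents with the new configuration file and retry hadbm createdomain.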
**6249685:** The clu_trans_srv process cannot be interrupted on Linux. A bug in the 64-bit version of Red Hat Enterprise Linux 3.0 causes the clu_trans_srv process to end up in an uninterruptible state when performing asynchronous I/O; kill -9 does not work, and the operating system must be rebooted. **Solution:** Use the 32-bit version of Red Hat Enterprise Linux 3.0.

**6230792, 6230415:** Starting, stopping, or reconfiguring HADB may fail or hang. On AMD Opteron systems running Solaris 10, starting, stopping, or reconfiguring HADB with the hadbm command may fail or hang with one of the following errors:

    hadbm:Error 22009: The command issued had no progress in the last 300 seconds.
    HADB-E-21070: The operation did not complete within the time limit, but has not been cancelled and may complete at a later time.

This can happen when there are inconsistencies in reading or writing the file (nomandevice) used by the clu_noman_srv process. The problem can be detected by looking for the following messages in the HADB history files:

    n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 does not respond.
    n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Have not heard from it in 104.537454 sec
    n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 did not start.

**Solution:** Run the following command for the affected node:

    hadbm restartnode --level=clear nodeno dbname

Note that all devices for the node will be reinitialized. You may have to stop the node before reinitializing it.

**None:** HADB database creation fails. Creating a new database may fail with the following error, stating that too few shared memory segments are available:

    HADB-E-21054: System resource is unavailable : HADB-S-05512: Attaching shared memory segment with key "xxxxx" failed, OS status=24 OS error message: Too many open files.

**Solution:** Verify that shared memory is configured and that the configuration works. In particular, on Solaris 8, inspect the file /etc/system and check that the value of the variable shmsys:shminfo_shmseg is at least six times the number of nodes per host.

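For example, on a Solaris 8 host running four HADB nodes, the setting would need to be at least 6 × 4 = 24. A hedged /etc/system fragment (the node count, and therefore the value, is illustrative):

```
* /etc/system fragment (Solaris 8), illustrative values:
* allow each process to attach at least 6 x 4 = 24 shared memory
* segments on a host running four HADB nodes.
set shmsys:shminfo_shmseg=24
```

A reboot is required for /etc/system changes to take effect.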
**6232140:** The management agent terminates with the exception "IPV6_MULTICAST_IF failed." The management agent may terminate with this exception when starting on a Solaris 8 host that has several NIC cards, some with IPv6 enabled and some with only IPv4. The root cause is described in bugs 4418866 and 4418865. **Solution:**

**6171832, 6172138:** Stale sessions are not cleaned up, leading to degraded HADB performance or a full data device. **Solution:** To remove stale sessions efficiently, edit the sun-ejb-jar.xml file and set cache-idle-timeout-in-seconds to a value less than removal-timeout-in-seconds. If cache-idle-timeout-in-seconds is equal to or greater than removal-timeout-in-seconds, old sessions are not cleaned up in HADB; this is the expected behavior. If stale sessions persist even after setting these properties as recommended, contact product support.

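A hedged sun-ejb-jar.xml fragment illustrating the recommended relationship between the two timeouts. The bean name and the values 600 and 3600 are examples, and the element nesting is a sketch of a typical descriptor rather than a verbatim excerpt from a product file:

```xml
<!-- Sketch: keep the idle timeout below the removal timeout so stale
     sessions are cleaned up in HADB. Values are illustrative. -->
<sun-ejb-jar>
  <enterprise-beans>
    <ejb>
      <ejb-name>MySessionBean</ejb-name>
      <bean-cache>
        <cache-idle-timeout-in-seconds>600</cache-idle-timeout-in-seconds>
        <removal-timeout-in-seconds>3600</removal-timeout-in-seconds>
      </bean-cache>
    </ejb>
  </enterprise-beans>
</sun-ejb-jar>
```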
**6171994:** Improper permissions in the security.policy file cause a startup hang: hadb-jdbc has insufficient access permissions in the security.policy file. **Solution:** If there is an intermittent hang during startup, widen the socket permission in the security.policy file. By default, the file contains:

    permission java.net.SocketPermission "*", "connect";

Change it to:

    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";

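As a sketch, the amended permission sits inside a grant block of the security.policy file; the surrounding grant block shown here is an assumption about a typical policy layout, not a verbatim excerpt:

```
// security.policy sketch: widen the socket permission used by hadb-jdbc.
// SocketPermission actions are a comma-separated list.
grant {
    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
};
```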
**5042351:** New tables created after new nodes are added are not spread across the added nodes. If a user creates a database instance and later adds nodes to it, any tables created after the addnodes operation are not fragmented across the newly added nodes; only tables that existed before hadbm addnodes are refragmented onto them. This is because create table uses the sysnode node group, which is created when the database is booted (when hadbm create is executed). **Solution:** Run hadbm refragment after the new tables have been created, or create the new tables on the node group all_nodes.

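As an illustration, a new table could be placed on the all_nodes node group at creation time. This sketch assumes that create table accepts the same nodegroup clause as the alter table statements shown elsewhere in these notes; the table name and columns are hypothetical:

```sql
-- Hypothetical table; the nodegroup clause mirrors the documented
-- "alter table ... nodegroup all_nodes" syntax.
create table mynewtable (
    id integer primary key,
    payload varchar(255)
) nodegroup all_nodes;
```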
**6158393:** HADB problems with Red Hat AS 3.0 in co-located mode under load. When HADB runs co-located with Application Server on Red Hat Linux AS 3.0, transactions may be aborted and performance may suffer. This is caused by excessive swapping performed by the operating system. **Solution:** This issue appears to be resolved when HADB is run on Red Hat Linux AS 3.0 Update 4.

**6214601:** Addnodes fails with a table-not-found error because hadbm searches for user tables in the sysroot schema. The hadbm refragment command fails with:

    hadbm:Error 22042: Database could not be refragmented. Please retry with hadbm refragment command to refragment the database.
    Caused by: HADB-E-11701: Table singlesignon not found

**Solution:** Refragment the Application Server tables manually with the help of clusql:

    clusql <serverhost>:<port> system+<dbpassword>
    SQL: set autocommit on;
    SQL: set schema haschema;
    SQL: alter table sessionattribute nodegroup all_nodes;
    SQL: alter table singlesignon nodegroup all_nodes;
    SQL: alter table statefulsessionbean nodegroup all_nodes;
    SQL: alter table sessionheader nodegroup all_nodes;
    SQL: alter table blobsessions nodegroup all_nodes;
    SQL: quit;

Here <dbpassword> is the system password specified when the database was created.

**6159633:** configure-ha-cluster may hang. When the asadmin configure-ha-cluster command is used to create or configure a highly available cluster spanning more than one host, the command hangs; no exceptions are thrown by the HADB management agent or the Application Server. **Solution:** HADB does not support heterogeneous paths across the nodes of a database cluster. Make sure that the HADB server installation directory and configuration directory are the same on all participating hosts. Additionally, clear the repository directories before running the command again.

**6197822:** hadbm set can bring the database instance into a state from which it is difficult to recover. The hadbm set command may fail when changing a database configuration variable; for example, setting DataBufferPoolSize to a larger value fails because of insufficient shared memory on node 0. The command then leaves the database with node 0 stopped and node 1 running. Resetting the pool size to its original value with hadbm set fails with:

    22073: The operation requires restart of node 1. Its mirror node is currently not available. Use hadbm status --nodes to see the status of the nodes.

In this state, hadbm startnode 0 also fails. **Solution:** Stop the database, restore the old values using hadbm set, and restart the database.

**6200133:** Failure in configure-ha-cluster; creating an HADB instance fails. Attempts to create an HADB cluster fail with the message HADB-E-00208: The transaction was aborted. The boot transaction that populates the SQL dictionary tables is aborted. **Solution:** Run the configure-ha-cluster command again. If you run hadbm create directly and it fails with the same message, rerun it as well.

**5091349:** Heterogeneous install paths are not supported. It is not possible to register the same software package under the same name at different locations on different hosts. **Solution:** HADB does not support heterogeneous paths across the nodes of a database cluster. Ensure that the HADB server installation directory and configuration directory are the same on all participating hosts.

**5091280:** hadbm set does not check resource availability (disk and memory space). When device or main-memory buffer sizes are increased with hadbm set, the management system does not check whether sufficient resources are available, although it does perform this check when creating databases or adding nodes. **Solution:** Check that there is enough free disk and memory space on all hosts before increasing any devicesize or buffersize configuration attribute.

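The disk-space part of that check can be sketched in shell. This is a hedged sketch, not part of hadbm: the device path and required size are example values to substitute with your node's device directory and the planned size increase, and it must be run on every host.

```shell
# Hedged sketch: verify free disk space before raising a devicesize
# attribute with hadbm set. Path and size below are illustrative.
DEVICE_PATH=/tmp            # where this HADB node keeps its devices (example)
NEEDED_MB=512               # planned size increase in MB (example)

# Fourth column of POSIX df output is the available space, here in MB.
FREE_MB=$(df -Pm "$DEVICE_PATH" | awk 'NR==2 {print $4}')

if [ "$FREE_MB" -ge "$NEEDED_MB" ]; then
    echo "ok: ${FREE_MB} MB free on $DEVICE_PATH"
else
    echo "insufficient: ${FREE_MB} MB free, need ${NEEDED_MB} MB" >&2
fi
```

A similar check of free main memory is advisable before increasing buffersize attributes.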
**4855623:** When the host of one of the nodes is down, the hadbm stop command does not exit. hadbm stop may be unable to shut down a database completely if HADB nodes do not receive the shutdown messages because of network problems. The typical symptom is that hadbm takes more than 60 seconds to complete; in this situation, hadbm stop and hadbm delete do not work. **Solution:** Specify the nodes that need to be shut down.

**4861337:** If an active data node fails while hadbm stopdb is executing, hadbm startdb will fail. hadbm status should return non-operational when the database is unable to start. **Solution:** To correct the problem:

**4958827:** Child process transaction does not respond. When a host machine accommodates more than one HADB node and all nodes place their devices on the same disk, disk I/O becomes a bottleneck: HADB processes wait for asynchronous I/O and therefore fail to answer the node supervisor's heartbeat check, causing the node supervisor to restart them. Although this problem can occur on any operating system, it has been observed on Red Hat Linux AS 2.1 and 3. **Solution:** Place the devices belonging to different HADB nodes residing on the same machine on separate disks.

**None:** HADB configuration with double networks. HADB configured with double networks on two subnets works properly on Solaris SPARC. However, because of problems in the operating system or network drivers on some hardware platforms, it has been observed that Solaris x86 and Linux platforms do not handle double networks properly. This causes the following problems for HADB:
