Oracle Solaris Cluster 3.3 3/13 Release Notes
The following known issues and bugs affect the operation of the Oracle Solaris Cluster 3.3 3/13 release. Contact your Oracle support representative to learn whether a fix becomes available. Bugs and issues are grouped into the following categories:
Problem Summary: The MTU of the cluster clprivnet interface is always set to the default value of 1500 and does not match the MTU of the underlying private interconnects. Therefore, you cannot set the jumbo frame MTU size for the clprivnet interface.
Workaround: There is no known workaround. Contact your Oracle support representative to learn whether a patch becomes available.
Problem Summary: The cluster check utility may report a violation of check S6708502, indicating that real-time process ora_dism is not supported for Oracle Solaris Cluster.
Workaround: Ignore the check violation for this specific process. This real-time process is new for Oracle RAC 12c and is allowed for Oracle Solaris Cluster.
Problem Summary: When the HA-Oracle database is configured to use the Grid Infrastructure SCAN listener, the HA-Oracle database resource does not fail over when the public network fails.
Workaround: When using the Oracle Grid Infrastructure SCAN listener with an HA-Oracle database, add a logical host with an IP address that is on the same subnet as the SCAN listener to the HA-Oracle database resource group.
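The workaround above can be sketched with the Oracle Solaris Cluster CLI. The resource-group name, hostname, and resource name below are hypothetical placeholders; the hostname must resolve to an address on the same subnet as the SCAN listener:

```shell
# Hypothetical names: ora-rg (HA-Oracle resource group), ora-lh (a hostname
# on the SCAN listener's subnet), ora-lh-rs (new logical-hostname resource).
# clreslogicalhostname create -g ora-rg -h ora-lh ora-lh-rs
# clresourcegroup online -eM ora-rg
```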
Problem Summary: Extended attributes are not currently supported by cluster file systems. When a user mounts a cluster file system with the xattrmount option, the following behavior is seen:
The extended attribute operations on a regular file fail with an ENOENT error.
Extended attribute operations on a directory result in normal operations on the directory itself.
Therefore, any program accessing the extended attributes of files in a cluster file system might not get the expected results.
Workaround: Mount a cluster file system with the noxattrmount option.
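As a sketch, the option can be carried in the cluster file system's /etc/vfstab entry; the device and mount point below are hypothetical placeholders:

```shell
# Hypothetical /etc/vfstab entry; the last field carries the mount options,
# including noxattrmount to disable extended-attribute mounts.
/dev/global/dsk/d10s0  /dev/global/rdsk/d10s0  /global/oradata  ufs  2  yes  global,logging,noxattrmount
```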
Problem Summary: If a failover data service, such as HA for Oracle, is configured with the ScalMountpoint resource to probe and detect NAS storage access failure, and the network interface is lost, such as due to a loss of cable connection, the monitor probe hangs. If the Failover_mode property of the data service resource is set to SOFT, this results in a stop-failed status and the resource does not fail over. The associated error message is similar to the following:
SC[SUNW.ScalMountPoint:3,scalmnt-rg,scal-oradata-11g-rs,/usr/cluster/lib/rgm/rt/scal_mountpoint/scal_mountpoint_probe]:
Probing thread for mountpoint /oradata/11g is hanging for timeout period 300 seconds
Workaround: Change the Failover_mode property on the data service resource to HARD.
# clresource set -p Failover_mode=HARD ora-server-rs
# clresource show -v ora-server-rs | grep Failover_mode
Failover_mode:                                  HARD
Problem Summary: The current implementation requires an RTR file, rather than a symbolic link to the file, to be present in /usr/cluster/lib/rgm/rtreg.
Workaround: Perform the following commands as superuser on one node of the global cluster.
# cp /opt/SUNWscor/oracle_asm/etc/SUNW.scalable_acfs_proxy /usr/cluster/lib/rgm/rtreg/
# clrt register -Z zoneclustername SUNW.scalable_acfs_proxy
# rm /usr/cluster/lib/rgm/rtreg/SUNW.scalable_acfs_proxy
Problem Summary: The clzonecluster boot, reboot, and halt subcommands fail, even if one of the cluster nodes is not in the cluster. An error similar to the following is displayed:
root@pnode1:~# clzc reboot zoneclustername
clzc: (C827595) "pnode2" is not in cluster mode.
clzc: (C493113) No such object.
root@pnode1:~# clzc halt zoneclustername
clzc: (C827595) "pnode2" is not in cluster mode.
clzc: (C493113) No such object.
The clzonecluster boot, reboot, and halt subcommands should skip over nodes that are in noncluster mode, rather than fail.
Workaround: Use the following option with the clzonecluster boot or clzonecluster halt commands to specify the list of nodes for the subcommand:
The -n option allows running the subcommands on the specified subset of nodes. For example, if, in a three-node cluster with the nodes pnode1, pnode2, and pnode3, the node pnode2 is down, you could run the following clzonecluster subcommands to exclude the down node:
clzonecluster halt -n pnode1,pnode3 zoneclustername
clzonecluster boot -n pnode1,pnode3 zoneclustername
clzonecluster reboot -n pnode1,pnode3 zoneclustername
Problem Summary: The chmod command might fail to change setuid permissions on a file in a cluster file system. If the chmod command is run on a non-global zone and the non-global zone is not on the PxFS primary server, the chmod command fails to change the setuid permission.
# chmod 4755 /global/oracle/test-file
chmod: WARNING: can't change /global/oracle/test-file
Workaround: Do one of the following:
Perform the operation on any global-cluster node that accesses the cluster file system.
Perform the operation on any non-global zone that runs on the PxFS primary node that has a loopback mount to the cluster file system.
Switch the PxFS primary to the global-cluster node where the non-global zone that encountered the error is running.
Problem Summary: When you use an XML configuration file to create resources, if any of the resources have extension properties that are not tunable, that is, the Tunable resource property attribute is set to None, the command fails to create the resource.
Workaround: Edit the XML configuration file to remove the non-tunable extension properties from the resource.
Problem Summary: Turning off fencing for a shared device with an active I/O load might result in a reservation conflict panic for one of the nodes that is connected to the device.
Workaround: Quiesce I/O to a device before you turn off fencing for that device.
Problem Summary: During cluster configuration on logical domains with hybrid I/O, autodiscovery does not report any paths for the cluster interconnect.
Workaround: When you run the interactive scinstall utility, choose to configure the sponsor node and additional nodes in separate operations, rather than by configuring all nodes in a single operation. When the utility prompts "Do you want to use autodiscovery?", answer "no". You can then select transport adapters from the list that is provided by the scinstall utility.
Problem Summary: If you attempt to switch over a Hitachi TrueCopy device group whose replica pair is in the COPY state, or an EMC SRDF device group whose replica pair is split, the switchover fails. Furthermore, the device group is unable to come back online on the original node until the replica pair has been returned to a paired state.
Workaround: Verify that TrueCopy replicas are not in the COPY state, or that SRDF replicas are not split before you attempt to switch the associated Oracle Solaris Cluster global-device group to another cluster node.
Problem Summary: Changing a cluster configuration from a three-node cluster to a two-node cluster might result in complete loss of the cluster, if one of the remaining nodes leaves the cluster or is removed from the cluster configuration.
Workaround: Immediately after removing a node from a three-node cluster configuration, run the cldevice clear command on one of the remaining cluster nodes.
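As a sketch, the command takes no operands and is run as superuser on one of the surviving nodes; cldevice clear removes DID references to devices that are no longer attached:

```shell
# Run as superuser on one of the remaining cluster nodes.
# cldevice clear
```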
Problem Summary: The scdidadm and cldevice commands are unable to verify that replicated SRDF devices that are being combined into a single DID device are, in fact, replicas of each other and belong to the specified replication group.
Workaround: Take care when combining DID devices for use with SRDF. Ensure that the specified DID device instances are replicas of each other and that they belong to the specified replication group.
Problem Summary: During an ungraceful shutdown of a cluster node, such as a node panic, the Oracle Clusterware sun.storage-proxy-resource of type sun.storage_proxy.type might stay offline upon node bootup. This in turn will cause the Oracle Solaris Cluster RAC server proxy resource to stay offline.
Workaround: Perform the following steps:
Bring up the ACFS storage proxy resource manually.
# crsctl stop res sun.storage-proxy-resource -n nodename
# crsctl start res sun.storage-proxy-resource -n nodename
Bring online the Oracle Solaris Cluster RAC server-proxy resource.
# clresourcegroup online rac-server-proxy-resource-group
Problem Summary: The TimesTen active-standby configuration requires an integration of Oracle Solaris Cluster methods in the TimesTen ttCWadmin utility. This integration has not yet occurred, even though it is described in the Oracle Solaris Cluster Data Service for Oracle TimesTen Guide. Therefore, do not use the TimesTen active-standby configuration with Oracle Solaris Cluster HA for TimesTen and do not use the TimesTen ttCWadmin utility on Oracle Solaris Cluster.
The Oracle Solaris Cluster TimesTen data service comes with a set of resource types. Most of these resource types are meant to be used with TimesTen active-standby configurations. You must use only the ORCL.TimesTen_server resource type for your highly available TimesTen configurations with Oracle Solaris Cluster.
Workaround: Do not use the TimesTen active-standby configuration.
Problem Summary: If you run the clzonecluster halt zonecluster command, followed by the clzonecluster boot zonecluster command, one or more of the nodes fail to boot with the following error:
root@node1:/# clzonecluster boot zc1
Waiting for zone boot commands to complete on all the nodes of the zone cluster "zc1"...
clzc: (C215301) Command execution failed on node node2.
zoneadm: zone 'zc1': These file-systems are mounted on subdirectories of /gpool/zones/zone1/root:
zoneadm: zone 'zc1':   /gpool/zones/zone1/root/u01
zoneadm: zone 'zc1': call to zoneadmd failed
The zone cluster node will not boot and the clzonecluster status command shows the node(s) as offline.
Workaround: In the global zone of each offline node, unmount the file system (in the example above):
# /usr/sbin/umount /gpool/zones/zone1/root/u01
Then run the following command in the global zone of any node of the zone cluster:
# /usr/cluster/bin/clzonecluster boot -n offline-node zonecluster
Verify that the offline nodes are now online by running the /usr/cluster/bin/clzonecluster status command.
Problem Summary: When the HA for Oracle VM Server for SPARC (HA for Logical Domains) resource is disabled during manual maintenance operations, the zpool export fails. This failure occurs because the bound state of the logical domain keeps the ZFS zpool, which is dependent on the failover ZFS resource, in a busy state. Switchovers and failovers are not affected.
Workaround: Perform the following steps:
Release resources from the logical domain.
# ldm unbind-dom ldom
Clear the HASP resource that is in a STOP_FAILED state.
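The STOP_FAILED state can then be cleared with the clresource command; the resource name below is a hypothetical placeholder:

```shell
# hasp-rs is a placeholder for the HAStoragePlus resource in STOP_FAILED state.
# clresource clear -f STOP_FAILED hasp-rs
```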
Problem Summary: The HAStoragePlus probe does not automatically remount a Solaris ZFS file system if it has been unmounted.
Workaround: Provide a mount point to the dataset, and then any manually unmounted file systems are automatically remounted by the HAStoragePlus probe. For example:
# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
pool-1         414K   417G    32K  none
pool-1/test1  31.5K   417G  31.5K  /testmount1
pool-1/test2  31.5K   417G  31.5K  /testmount2
If pool-1 is given a mount point, then any manually unmounted file systems are automatically remounted by the HAStoragePlus probe.
# zfs set mountpoint=/pool-1 pool-1
# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
pool-1         414K   417G    32K  /pool-1
pool-1/test1  31.5K   417G  31.5K  /testmount1
pool-1/test2  31.5K   417G  31.5K  /testmount2
Problem Summary: When you stop the HA-OHS data service, the /bin/grep: illegal option -q message appears even though the resource goes offline successfully.
Workaround: Ensure that the OHS processes are not running before bringing the OHS application under cluster control. If the processes are not running outside the cluster control, you can ignore this message.
Problem Summary: Error messages related to removal of a CCR entry (domain configuration) appear when the resource is being deleted.
Workaround: These error messages are harmless and can be ignored.
Problem Summary: If the WebLogic Server application is configured on a failover file system, the wizard fails to create a resource. If the WebLogic Server instances are configured to listen on "All IP Addresses", the wizard also fails to create a resource.
Workaround: If the WebLogic Server application is configured on a failover file system and the wizard fails to create a resource, manually create the HA-WLS resource for a failover file system. If the WebLogic Server instances are configured to listen on "All IP Addresses" and the wizard fails to create a resource, configure the instances to listen only on logical hosts, as recommended in the documentation, or manually create the HA-WLS resource.
Problem Summary: Configuring RAC framework using the data service wizard fails in a zone cluster if the wizard is run from a node that is not a part of the zone cluster.
Workaround: If configuring the data service in a zone cluster, run the wizard from one of the cluster nodes that is hosting the zone cluster.
Problem Summary: After applying the Samba patch 119757-20 (SPARC) or 119758-20 (x86), the binary locations change from /usr/sfw/bin to /usr/bin and /usr/sfw/lib to /usr/lib/samba. This breaks the Oracle Solaris Cluster Data Service for Samba.
Workaround: If the patches listed above have been installed, then the Oracle Solaris Cluster Data Service for Samba resource needs to be reregistered (you must remove it and register it again). The /opt/SUNWscsmb/util/samba_config file must specify the new binary locations as described above. After the samba_config file has been changed, the /opt/SUNWscsmb/util/samba_register file must be executed to register the resource again.
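A minimal sketch of the reregistration sequence, assuming the file paths given above; the exact entries to edit inside samba_config depend on your configuration:

```shell
# Remove the existing resource, point the configured binary locations at
# /usr/bin and /usr/lib/samba, then register the resource again.
# vi /opt/SUNWscsmb/util/samba_config
# /opt/SUNWscsmb/util/samba_register
```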
Problem Summary: If you set the Debug_level property to 1, a dialogue instance resource cannot be started on any node.
Workaround: Use Debug_level=2, which is a superset of Debug_level=1.
Problem Summary: When the /etc/vfstab file entry for a cluster file system has a mount-at-boot value of no and the cluster file system is configured in a SUNW.HAStoragePlus resource that belongs to a scalable resource group, the SUNW.HAStoragePlus resource fails to come online. The resource stays in the Starting state until the prenet_start method times out.
Workaround: In the /etc/vfstab file's entry for the cluster file system, set the mount-at-boot value to yes.
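For example, a corrected /etc/vfstab entry might look like the following, where the devices and mount point are hypothetical and the sixth field is the mount-at-boot value:

```shell
#device                device to fsck          mount point      FS type  pass  at boot  options
/dev/global/dsk/d5s0   /dev/global/rdsk/d5s0   /global/appdata  ufs      2     yes      global,logging
```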
Problem Summary: If scalable applications configured to run in different zone clusters bind to INADDR_ANY and use the same port, then scalable services cannot distinguish between the instances of these applications that run in different zone clusters.
Workaround: Do not configure the scalable applications to bind to INADDR_ANY as the local IP address, or bind them to a port that does not conflict with another scalable application.
Problem Summary: When adding or removing a NAS device, running the clnas add or clnas remove command on multiple nodes at the same time might corrupt the NAS configuration file.
Workaround: Run the clnas add or clnas remove command on one node at a time.
Problem Summary: When a native brand non-global zone is added to the node list of a resource group that contains an HAStoragePlus resource with ZFS pools configured, the HAStoragePlus resource might enter the Faulted state. This problem happens only when the physical node that hosts the native zone is part of the resource-group node list.
Workaround: Restart the resource group that contains the faulted HAStoragePlus resource.
# clresourcegroup restart faulted-resourcegroup
Problem Summary: The Oracle RAC configuration wizard fails with the message, ERROR: Oracle ASM is either not installed or the installation is invalid!.
Workaround: Ensure that the “ASM” entry is first within the /var/opt/oracle/oratab file, as follows:
root@phys-schost-1:~# more /var/opt/oracle/oratab
…
+ASM1:/u01/app/11.2.0/grid:N   # line added by Agent
MOON:/oracle/ora_base/home:N
Problem Summary: The takeover operation fails when the primary cluster loses access to the storage device.
Workaround: Shut down the primary cluster that has lost access to the storage.
Problem Summary: When Geographic Edition switches a protection group (PG) to the secondary role, it incorrectly unmanages ASM device groups. If the cluster is then restarted and the LUNs are read-write, these device groups are incorrectly re-enabled. When Geographic Edition is restarted, writes to these LUNs will be disabled and the user might see several fatal write errors on the system console. These errors do not indicate a serious problem and can be ignored. Geographic Edition will operate correctly.
Workaround: Ignore the messages.
Problem Summary: If a node leaves the cluster when the site is the primary, the projects or iSCSI LUNs are fenced off. However, after a switchover or takeover when the node joins the new secondary, the projects or iSCSI LUNs are not unfenced and the applications on this node are not able to access the file system after it is promoted to the primary.
Workaround: Reboot the node.
Problem Summary: If Oracle Solaris Cluster Geographic Edition is configured in a zone cluster, duplicate notification emails about loss of connection to partner clusters are sent from both the zone cluster and the global cluster. Emails should only be sent from the zone cluster.
Workaround: This is a side effect of the cluster event handling. It is harmless and you should ignore the duplicate emails.
Problem Summary: The DR state is reported as unknown, even though the DR resources correctly report the replication state.
Workaround: Run the geopg validate protection-group command to force a resource-group state notification to the protection group.
Problem Summary: If you use the browser user interface (BUI) to stop replication, the protection group goes to a configuration error state when protection-group validation fails.
Workaround: From the BUI, perform the following actions to stop replication:
Under the Shares tab, select the project being replicated.
Click on the Replication tab and select the Scheduled option.
Wait until the status changes to manual, then click the Enable/Disable button.
Problem Summary: When you use the centralized install, the scinstall utility fails to configure the cluster when DES authentication is enabled and nodes are specified as fully qualified host names. An error message similar to the following appears:
Updating file ("ntp.conf.cluster") on node <FQ-host-name> ... failed
scinstall:  Failed to configure ("ntp.conf.cluster") on node <FQ-host-name>
scinstall:  scinstall did NOT complete successfully!
Workaround: Rerun the scinstall utility, and this time select the option to configure one node at a time. Specify each node name without the domain name. If you are configuring a two-node cluster, the quorum configuration will fail and install mode will not be reset. In that case, manually reset install mode after the nodes boot into cluster mode.
Problem Summary: A zone cluster can be in a Ready-Offline state when it is booted because of incomplete Oracle Solaris system configuration in the zones of the zone cluster. The zones of the zone cluster are in interactive system configuration mode, waiting for input. This occurs when the system configuration file (/etc/sysidcfg) does not exist, or does not contain all required system configuration properties, in the global zone on the cluster node.
Workaround: Before installing a zone cluster, create the /etc/sysidcfg file and specify all required system configuration properties on all cluster nodes. The configuration properties in the file are used to do the Oracle Solaris system configuration automatically during the first boot of the zone cluster. The list of required Oracle Solaris system configuration properties can vary according to the Oracle Solaris OS version. See the Oracle Solaris Administration: Basic Administration for more details. Alternatively, after the zone cluster is installed and booted, use zlogin -C zone to log in on all nodes of the zone cluster to manually complete the Oracle Solaris system configuration.
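A minimal /etc/sysidcfg sketch follows; every value is a hypothetical placeholder, and the set of required keywords varies with the Oracle Solaris OS version:

```shell
# Hypothetical example values; adjust for your site.
system_locale=C
terminal=xterm
network_interface=primary {hostname=zc-node1}
security_policy=NONE
name_service=NONE
timezone=US/Pacific
root_password=<encrypted-password-hash>
nfs4_domain=dynamic
```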
Problem Summary: Running the scinstall -u command leaves 'installed' zones in a 'mounted' state. This state causes an issue for live upgrade when the system is rebooted as it fails to fix the zonepaths for the alternate boot environment.
Workaround: Perform the following steps:
Run the svcadm disable zones command.
All running zones should now be in a mounted state. The service might end up in maintenance after 100 seconds, but this is not a problem.
Run the zoneadm -z zonename unmount command for all zones.
Type init 6.
Problem Summary: The Oracle Enterprise Manager Ops Center Agent for Oracle Solaris 10 uses JavaDB software for its configuration database. When installing the Oracle Solaris Cluster software by using the installer utility, the JavaDB software package is reinstalled, causing an existing agent configuration database to be deleted.
The following error messages are reported from the Ops Center Agent as a result of the package getting removed:
java.sql.SQLException: Database '/var/opt/sun/xvm/agentdb' not found.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
The agent is now broken and must be unconfigured and then configured again.
Workaround: Manually install the following additional JavaDB packages from the Oracle Solaris Cluster media on all cluster nodes:
Running the installer utility does not remove the existing JavaDB database packages.
Problem Summary: When you use the installer utility in the Simplified Chinese and Traditional Chinese locales to install Oracle Solaris Cluster software, the software that checks the system requirements incorrectly reports that the swap space is 0 Mbytes.
Workaround: Ignore this reported information. In these locales, you can run the following command to determine the correct swap space:
# df -h | grep swap
Problem Summary: The clzonecluster interactive configuration utility (opened by the clzonecluster configure zcname command) can crash in some circumstances when the cancel subcommand is issued. The error message Error executing zone configure command is displayed.
Workaround: You can safely ignore this problem. Only unsaved configuration data is lost due to the problem. To avoid a crash of the configuration utility, do not use the cancel command.
Problem Summary: Any environment variables that are specified in the service manifest are not recognized when the service is put under SUNW.Proxy_SMF_failover resource type control.
Workaround: Modify the service methods to set the environment variables directly.
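As a sketch, an edited start method might export its variables directly; the paths, variable name, and start command below are hypothetical:

```shell
#!/sbin/sh
# Set the environment in the method script itself, because variables from
# the service manifest are not seen under SUNW.Proxy_SMF_failover control.
LD_LIBRARY_PATH=/opt/myapp/lib
export LD_LIBRARY_PATH
exec /opt/myapp/bin/myapp start
```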
Problem Summary: Cluster transport paths go offline with accidental use of the ifconfig unplumb command on the private transport interface.
Workaround: Disable and re-enable the cable that the disabled interface is connected to.
Determine the cable to which the interface is connected.
# /usr/cluster/bin/clinterconnect show | grep Cable
Disable the cable for this interface on this node.
# /usr/cluster/bin/clinterconnect disable cable
Re-enable the cable to bring the path online.
# /usr/cluster/bin/clinterconnect enable cable
Problem Summary: Logical hostname failover requires getting the netmask from the network if nis/ldap is enabled for the netmasks name service. This call to getnetmaskbyaddr() hangs for a while due to CR 7051511, which might hang long enough for the Resource Group Manager (RGM) to put the resource in the FAILED state. This occurs even though the correct netmask entries are in the /etc/netmasks local files. This issue affects only multi-homed clusters, such as cluster nodes that reside on multiple subnets.
Workaround: Configure the /etc/nsswitch.conf file, which is handled by an SMF service, to only use files for netmasks lookups.
# /usr/sbin/svccfg -s svc:/system/name-service/switch setprop config/netmask = astring:\"files\"
# /usr/sbin/svcadm refresh svc:/system/name-service/switch