Oracle® Solaris Cluster 4.2 Release Notes

Updated: September 2014, E39651-02
 
 

LDom Stop Timeout in SUNWscxvm.stop Prevents LDom Unbound (18335346)

Problem Summary: If ldm stop times out in SUNWscxvm.stop, the logical domain (LDom) remains bound, which prevents the cluster resource group from cleanly shutting down the domain. As a result, the resource group cannot fail over successfully. In addition, the STOP TIMEOUT value is currently not taken into account when the LDom is stopped, so the stop automatically times out after 60 seconds.

You might encounter one of the following error messages.

[ID 885590 daemon.notice] Domain domain_name has been forcefully terminated. 
[ID 567783 daemon.notice] domain stop result code : 0 - ldom_name stop timed out. The domain might still be in the process of shutting down. 
[ID 567783 daemon.notice] domain stop result code : 0 - Either let it continue, or specify -f to force it to stop. 
[ID 567783 daemon.notice] domain stop result code : 0 - LDom ldom_name cannot be unbound because it is stopping 
[ID 567783 daemon.notice] domain stop result code : 0 - LDom ldom_name stopped
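
To check whether a node has logged this condition, the syslog output can be searched for the timeout message text. The following is a minimal sketch only: count_stop_timeouts is a hypothetical helper name, and the sample lines stand in for real log content.

```shell
# Hypothetical helper: count log lines on stdin that report an ldm stop timeout.
count_stop_timeouts() {
    grep -c 'stop timed out'
}

# Sample input modeled on the messages listed above.
printf '%s\n' \
    '[ID 567783 daemon.notice] domain stop result code : 0 - ldom-2 stop timed out. The domain might still be in the process of shutting down.' \
    '[ID 567783 daemon.notice] domain stop result code : 0 - LDom ldom-2 stopped' \
    | count_stop_timeouts
```

On a cluster node, the helper could be fed the system log instead, for example count_stop_timeouts < /var/adm/messages, assuming that is where syslog writes daemon.notice messages on that node.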

When you run ldm list, you will see that the LDom is in a bound state.

# ldm list
NAME      STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary   active  -n-cv-  UART  16    15872M  0.1%  0.1%  28m
ldom-1    active  -n----  5000  8     8G      0.0%  0.0%  28m
ldom-2    bound   ------  5001  112   112G
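
The state check above can also be scripted. The following is a minimal sketch, assuming the default column layout shown in the listing (domain name in the first column, state in the second). ldom_state is a hypothetical helper name, and the sample text is copied from the listing rather than produced by a live system.

```shell
# Hypothetical helper: print the STATE column for one domain,
# reading `ldm list` style output from stdin.
ldom_state() {
    awk -v dom="$1" '$1 == dom { print $2 }'
}

# Sample input copied from the listing above.
sample='NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 16 15872M 0.1% 0.1% 28m
ldom-1 active -n---- 5000 8 8G 0.0% 0.0% 28m
ldom-2 bound ------ 5001 112 112G'

printf '%s\n' "$sample" | ldom_state ldom-2
```

On a node, the same helper could read live output with ldm list | ldom_state ldom-2. A result of bound after a stop attempt matches the symptom described here.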

If you issue the clresourcegroup online command after the LDom stop timeout messages appear, the command hangs because the LDom was forcibly terminated.

Workaround: This workaround applies only to nodes that run Logical Domains Manager 3.1 or later. To modify the LDom timeout value:

Edit the /opt/SUNWscxvm/bin/functions file on all the nodes by replacing:

${HATIMERUN} -t ${MAX_STOP_TIMEOUT} -k KILL ${LDM} stop-domain ${DOMAIN} >> $LOGFILE 2>&1

with:

LDOM_TIMEOUT=$((MAX_STOP_TIMEOUT*80/100))
${HATIMERUN} -t ${MAX_STOP_TIMEOUT} -k KILL ${LDM} stop-domain -t ${LDOM_TIMEOUT} ${DOMAIN} >> $LOGFILE 2>&1

With this change, the ldm stop timeout is LDOM_TIMEOUT seconds instead of 60 seconds. The ldm stop-domain command is issued first. If the LDom does not shut down within LDOM_TIMEOUT seconds, ldm stop-domain -q is issued, which forces the LDom to stop. To leave time for ldm stop-domain -q to run, LDOM_TIMEOUT is set to 80% of MAX_STOP_TIMEOUT.
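
As a worked example of the 80% calculation, the sketch below evaluates the same integer expression used in the workaround for a few timeout values. The helper name ldom_timeout and the sample values are illustrative assumptions, not part of the SUNWscxvm scripts.

```shell
# Hypothetical helper: apply the workaround's expression
# MAX_STOP_TIMEOUT*80/100 using shell integer arithmetic.
ldom_timeout() {
    echo $(( $1 * 80 / 100 ))
}

# Illustrative MAX_STOP_TIMEOUT values, in seconds.
for t in 60 300 600; do
    echo "MAX_STOP_TIMEOUT=$t LDOM_TIMEOUT=$(ldom_timeout "$t")"
done
```

With a MAX_STOP_TIMEOUT of 300 seconds, LDOM_TIMEOUT is 240 seconds, which leaves 60 seconds for the quick stop before the outer timer expires. Because the arithmetic is integer-only, any fractional result is truncated.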