Oracle® Solaris Cluster Data Services Developer's Guide

Updated: September 2015
 
 

Supporting Resource Types That Perform Resource Migration From Their Stop Method

Most RGM resource types separate their stopping and starting actions by using the Stop or Postnet_stop method and the Start or Prenet_start method. In some cases, such as an Oracle VM Server for SPARC resource that supports live migration, the Stop method performs the entire switchover of the resource. For example, if an Oracle VM Server for SPARC logical domain starts out on node1 and you execute a switchover to node2, the RGM executes the Stop method on node1 and the resource goes offline. During the execution of the Stop method on node1, the logical domain is moved to node2 by live migration, so the Start method that is subsequently executed on node2 is a no-op.

A problem can occur if another resource group declares a strong negative (SN) affinity for the resource's group or if the RGM load limits are in use. In such a case, the logical domain actually migrates onto the target node before the RGM can evict any resource group or groups that need to be evicted due to strong negative affinities or load limits. Hence, there is a time interval in which the hard load limit or SN affinity is violated, resulting in overload of the node and consequent failure of the live migration attempt.

The following two features allow a data service that performs a switchover by using live migration from its Stop method to find out the target node of the switchover and to trigger, before the switchover is initiated, any resource group evictions that strong negative affinities or load limits require.

  • SCHA_TARGET_NODES query

  • Pre_evict resource property

SCHA_TARGET_NODES Query

The SCHA_TARGET_NODES query tag for scha_resourcegroup_get returns a list of target node names to which the given resource group is currently in the process of switching over. This query returns a meaningful value while the resource group is in PENDING_OFFLINE state on its current master or in PENDING_ONLINE state on the target node of the switchover. This query is typically called from a Stop or Postnet_stop method of a resource within the switching resource group. The output argument type is scha_str_array_t**. The corresponding operation tag for the command line form of scha_resourcegroup_get is Target_nodes. See the scha_resourcegroup_get(1HA) and the scha_resourcegroup_open(3HA) man pages for more information.

Note the following points about the SCHA_TARGET_NODES query tag:

  • If a switchover is not in progress on the given resource group, the Target_nodes query returns an empty list.

  • If a switchover is in progress on a multi-mastered resource group, in which the Maximum_primaries setting is greater than 1, the list returned by the Target_nodes query contains all nodes in the targeted mastery of the resource group, excluding current masters. For a single-mastered resource group, the returned list contains at most one target node.

  • The SCHA_TARGET_NODES query returns the target nodes for failbacks, scha_control giveovers, and user-initiated resource group switch or resource group remaster commands. Resource group switch commands include clresourcegroup switch and clresourcegroup online.

  • The SCHA_TARGET_NODES query tag does not return the destination nodes for resource groups that are taken offline due to a node evacuation, a resource group eviction, a strong affinity, or a Start method failure. In such cases, the SCHA_TARGET_NODES query returns an empty list.
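The rule for multi-mastered resource groups in the list above (targeted mastery minus current masters) can be pictured with a small model. This is only an illustrative sketch and not RGM code: the function name model_target_nodes and its parameters are hypothetical, and it simply computes the set difference that the text describes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative model only, not RGM code. Copies into `out` every node of
 * the targeted mastery that is not already a current master, and returns
 * the number of entries written. All names here are hypothetical.
 */
size_t
model_target_nodes(const char *targeted[], size_t n_targeted,
    const char *masters[], size_t n_masters,
    const char *out[], size_t out_cap)
{
	size_t i, j, n_out = 0;

	for (i = 0; i < n_targeted && n_out < out_cap; i++) {
		int is_master = 0;

		/* Skip nodes that already master the resource group. */
		for (j = 0; j < n_masters; j++) {
			if (strcmp(targeted[i], masters[j]) == 0) {
				is_master = 1;
				break;
			}
		}
		if (!is_master)
			out[n_out++] = targeted[i];
	}
	return (n_out);
}
```

For example, under this model a group that is currently mastered on node1 and node2 and is being remastered onto node2, node3, and node4 would report node3 and node4. When no switchover is in progress, the targeted mastery is empty and so is the result.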

The SCHA_TARGET_NODES query is called by a stopping method of a resource to determine the target node for a live migration. Note that the return value of an empty list indicates that the resource group is going offline but it does not necessarily imply that the resource group will remain offline. For example, in the case of a start failure or node evacuation of a single-mastered resource group, the SCHA_TARGET_NODES query returns an empty list but the resource group might still fail over to another node. As a result of this limitation, a data service might not be able to perform live migration for node evacuations or for failovers due to the failure of Start methods.

If a resource group is switching nodes due to a strong affinity only, the SCHA_TARGET_NODES query returns an empty list. For example, if RG1 declares a strong positive (++ or +++) affinity for RG2 and you switch RG2 from node1 to node2, RG1 is also switched from node1 to node2. However, a SCHA_TARGET_NODES query on RG1 will return an empty list, which would prevent live migration from being applied. If RG1 and RG2 are in the same global cluster or zone cluster, the Oracle Solaris Cluster administrator can work around this issue by using the clresourcegroup switch command to switch both RG1 and RG2 together, instead of relying on the strong positive affinity. Having both resource groups as operands of the clresourcegroup switch command allows the Target_nodes query to work correctly on both of them.
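Because an empty list can be returned even when the resource group will come online elsewhere, a stopping method presumably needs a fallback path. The following minimal sketch shows one way to make that decision. The names stop_action_t and choose_stop_action are hypothetical, and the nodes and count parameters stand in for the str_array and array_cnt fields of scha_str_array_t. Treating a list with more than one entry as "no migration" is an assumption of this sketch, not documented behavior.

```c
#include <stddef.h>

/* Hypothetical action codes for a Stop method that supports live migration. */
typedef enum {
	STOP_PLAIN,	/* no single target reported: perform an ordinary stop */
	STOP_MIGRATE	/* exactly one target reported: live-migrate to it */
} stop_action_t;

/*
 * Choose the stop action from a target-node list such as the one returned
 * by the SCHA_TARGET_NODES query. On STOP_MIGRATE, *target is set to the
 * node name; otherwise it is set to NULL.
 */
stop_action_t
choose_stop_action(char *const nodes[], size_t count, const char **target)
{
	*target = NULL;
	/* Empty list (or, by assumption here, several targets): plain stop. */
	if (nodes == NULL || count != 1 || nodes[0] == NULL)
		return (STOP_PLAIN);
	*target = nodes[0];
	return (STOP_MIGRATE);
}
```

A Stop method built along these lines would shut the logical domain down normally on STOP_PLAIN and would attempt live migration to the reported node only on STOP_MIGRATE.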

Example 3  C Form of the SCHA_TARGET_NODES Query
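#include <stdio.h>
#include <scha.h>
...
scha_err_t err;
scha_str_array_t *node_list;
scha_resourcegroup_t handle;
int ix;
char *rgname = "example_RG";

/* Open a handle on the resource group; skip the query if the open fails. */
err = scha_resourcegroup_open(rgname, &handle);
if (err == SCHA_ERR_NOERR) {
	err = scha_resourcegroup_get(handle, SCHA_TARGET_NODES, &node_list);
	if (err == SCHA_ERR_NOERR) {
		/* An empty list means that no switchover target is reported. */
		for (ix = 0; ix < node_list->array_cnt; ix++) {
			printf("Group %s target node: %s\n", rgname,
			    node_list->str_array[ix]);
		}
	}
	/* Closing the handle also frees the memory returned by the query. */
	(void) scha_resourcegroup_close(handle);
}
...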
Example 4  CLI Form of the SCHA_TARGET_NODES Query
# scha_resourcegroup_get -G mygroup -O Target_nodes
node2
#

Pre_evict Resource Property

The Pre_evict property allows the RGM to prestart a resource group, RG1, on a target node before RG1 has actually switched offline from its current master. The prestart action evicts from the target node any resource groups that must be evicted, before RG1 executes its switchover onto the target node.

The prestart action is executed before the live migration so that the target node will satisfy the necessary load and affinity prerequisites for hosting the migrating resource group.

Note the following points for the Pre_evict resource property:

  • The RGM evicts a resource group RG2 from the target node prior to executing the switchover of RG1 when all of the following conditions are met:

    • RG1 contains a resource with Pre_evict set to TRUE

    • RG1 is being switched from one node to another

    • RG1 preempts RG2 on the target node due to strong negative affinities or load limits, which causes the eviction of RG2

    The eviction is completed before the execution of any stopping methods of resources in RG1.

  • During the pre-eviction action, the state of RG1 on its current master is set to PENDING_OFFLINE. The status message of each resource in RG1 that has Pre_evict=TRUE is set on RG1's current master node to Waiting for resource group evictions on node %s, where %s indicates the name of the switchover target node. The status message is localized to the system locale that is applicable to rgmd's environment, which is normally the default system locale.

  • The pre-eviction action is taken only for single-mastered resource groups, that is, for resource groups with the Maximum_primaries property set to 1. For a resource group with the Maximum_primaries property set to a value greater than 1, the Pre_evict setting of its resources is ignored, and prestart/eviction actions do not take place on the target node or nodes.

  • If RG1 is only being brought online on a new master but not being switched off of any current master, resource group evictions take place on the new master before bringing RG1's resources online, regardless of the setting of the Pre_evict resource property. In such a case, pre-eviction is equivalent to the default eviction because there is no stopping phase for RG1.

  • If RG1 goes offline due to a Start method failure, a resource group eviction, a strong affinity, or a node evacuation, the prestart/eviction action will not take place regardless of the setting of Pre_evict.

    For example, if RG1 declares a strong positive (++ or +++) affinity for RG2 and you switch RG2 from node1 to node2, RG1 is also switched from node1 to node2 but pre-eviction is not performed on RG1. This prevents live migration from being applied to RG1. If RG1 and RG2 are in the same global cluster or zone cluster, the Oracle Solaris Cluster administrator can work around this issue by using the clresourcegroup switch command to switch both RG1 and RG2 together instead of relying on the strong positive affinity. Having both resource groups as operands of the clresourcegroup switch command allows the pre-eviction feature to work correctly on both of them.

Note that in such cases where pre-eviction is not performed, the evictions are done in the default manner. Evictions occur after stopping the switching resource group on its original master and before starting it on the target node.
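The conditions listed above can be summarized as a single predicate. The following is a sketch of that summary under stated assumptions, not the RGM's actual logic: the type offline_cause_t, the function pre_eviction_applies, and its parameters are all hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical reasons for a resource group leaving its current master. */
typedef enum {
	CAUSE_SWITCHOVER,	/* for example, a switch command or a giveover */
	CAUSE_START_FAILURE,
	CAUSE_EVICTION,
	CAUSE_STRONG_AFFINITY,
	CAUSE_NODE_EVACUATION
} offline_cause_t;

/*
 * Return true when, per the points in this section, evictions on the
 * target node would be completed before the group's stopping methods run.
 */
bool
pre_eviction_applies(bool any_resource_pre_evict, int maximum_primaries,
    bool has_current_master, bool must_evict_on_target, offline_cause_t cause)
{
	return (any_resource_pre_evict &&	/* Pre_evict=TRUE on a resource */
	    maximum_primaries == 1 &&		/* single-mastered groups only */
	    has_current_master &&		/* moving between nodes */
	    must_evict_on_target &&		/* SN affinity or load limit */
	    cause == CAUSE_SWITCHOVER);		/* not one of the excluded causes */
}
```

When the predicate is false but an eviction is still required, the eviction falls back to the default ordering: after the switching group stops on its original master and before it starts on the target node.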

For information about the Pre_evict resource property, see the r_properties(5) man page.

Example of a Giveover of a Resource Group With Live Migration

The following figure shows an example of a giveover of the resource group RG1, which contains an Oracle VM Server for SPARC resource.

Figure 2  Example of a Giveover of a Resource Group

image: The diagram shows an example of a giveover of resource group RG1, which contains an Oracle VM Server for SPARC resource.

The giveover switches RG1 from Node1 to Node2, which results in eviction of resource group RG2 from Node2.

With the default RGM behavior, RG1 would execute its stopping methods on Node1 before RG2 is evicted from Node2.

With pre-eviction, RG2 is evicted from Node2 before RG1 even begins to stop on Node1. This clears the way for RG1 to use live migration without confronting the workload imposed by RG2 on Node2. The Stop method of the resource type, SUNW.ldom, can use the SCHA_TARGET_NODES query to find out the switchover target node, Node2, to which the live migration needs to be performed.
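The difference between the two orderings in this example can be written down as a short sequence model. The sketch below is illustrative only: giveover_step_t, GIVEOVER_STEPS, and giveover_order are hypothetical names, and the three steps are a simplification of the phases that this section describes.

```c
/* Hypothetical labels for the phases of the giveover in the figure. */
typedef enum {
	STEP_EVICT_RG2_FROM_NODE2,
	STEP_STOP_RG1_ON_NODE1,
	STEP_START_RG1_ON_NODE2
} giveover_step_t;

#define	GIVEOVER_STEPS	3

/*
 * Fill `steps` with the order of phases for the RG1 giveover. With
 * pre-eviction, RG2 leaves Node2 before RG1 begins to stop. By default,
 * RG2 is evicted after RG1 stops and before RG1 starts on Node2.
 */
void
giveover_order(int pre_evict, giveover_step_t steps[GIVEOVER_STEPS])
{
	if (pre_evict) {
		steps[0] = STEP_EVICT_RG2_FROM_NODE2;
		steps[1] = STEP_STOP_RG1_ON_NODE1;
	} else {
		steps[0] = STEP_STOP_RG1_ON_NODE1;
		steps[1] = STEP_EVICT_RG2_FROM_NODE2;
	}
	steps[2] = STEP_START_RG1_ON_NODE2;
}
```

Because the live migration happens inside the stop step, only the pre-eviction ordering guarantees that RG2 has already left Node2 when the logical domain arrives there.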

Error Handling

During a switchover of a resource group in which the prestart and eviction actions are triggered, if the evicted resource group encounters Stop method failures, the switchover attempt returns a failure and the resource group that was being switched over remains on its current master.