23 Using Whole Server Migration and Service Migration in an Enterprise Deployment

The Oracle WebLogic Server migration framework supports Whole Server Migration and Service Migration. The following sections explain how these features can be used in an Oracle Fusion Middleware enterprise topology.

About Whole Server Migration and Automatic Service Migration in an Enterprise Deployment

Oracle WebLogic Server provides a migration framework that is an integral part of any highly available environment. The following sections provide more information about how this framework can be used effectively in an enterprise deployment.

Understanding the Difference between Whole Server and Service Migration

The Oracle WebLogic Server migration framework supports two distinct types of automatic migration:

  • Whole Server Migration, where the Managed Server instance is migrated to a different physical system upon failure.

    Whole server migration provides for the automatic restart of a server instance, with all its services, on a different physical machine. When a failure occurs in a server that is part of a cluster configured for server migration, the server is restarted on one of the other machines that host members of the cluster.

    For this to happen, the servers must use a floating IP as the listen address, and the required resources (transaction logs and JMS persistent stores) must be available on the candidate machines.

    See Whole Server Migration in Administering Clusters for Oracle WebLogic Server.

  • Service Migration, where specific services are moved to a different Managed Server within the cluster.

    To understand service migration, it's important to understand pinned services.

    In a WebLogic Server cluster, most subsystem services are hosted homogeneously on all server instances in the cluster, enabling transparent failover from one server to another. In contrast, pinned services, such as JMS-related services, the JTA Transaction Recovery Service, and user-defined singleton services, are hosted on individual server instances within a cluster. For these services, the WebLogic Server migration framework supports failure recovery with service migration, as opposed to failover.

    See Understanding the Service Migration Framework in Administering Clusters for Oracle WebLogic Server.

Implications of Using Whole Server Migration or Service Migration in an Enterprise Deployment

Using Whole Server Migration (WSM) or Automatic Service Migration (ASM) in an Enterprise Deployment has implications for the infrastructure and configuration requirements.

The implications are:

  • The resources used by servers must be accessible to both the original and failover system

    Initially, resources are accessed by the original server or service. When a server or service is failed over or restarted on another system, the same resources (such as external resources, databases, and stores) must be available on the failover system. Otherwise, the service cannot resume the same operations. It is for this reason that both whole server migration and service migration require that all members of a WebLogic cluster have access to the same transaction and JMS persistent stores (whether the persistent store is file-based or database-based).

    Oracle allows you to use JDBC stores, which leverage the consistency, data protection, and high availability features of an Oracle database and make resources available to all the servers in the cluster. Alternatively, you can use shared storage. Whether you configure persistent stores in the database or in shared storage, you must ensure that if a failover occurs (whole server migration or service migration), the failover system is able to access the same stores without any manual intervention.

  • Leasing Datasource

    Both server migration and service migration (whether in static or dynamic clusters) require the configuration of a leasing datasource that servers use to store alive timestamps. These timestamps are used to determine the health of a server or service, and are key to the correct behavior of server and service migration (they are used to mark servers or services as failed and to trigger failover).

    Note:

    Oracle does not recommend that you use consensus leasing for HA purposes.

  • Virtual IP address

    In addition to shared storage, Whole Server Migration requires the procurement and assignment of a virtual IP address (VIP) for each individual server and the corresponding Virtual Host Name which is mapped to this IP and used as the listen address for the involved server. When a Managed Server fails over to another machine, the VIP is enabled in the failover node by Node Manager. Service migration does not require a VIP.

Since server migration requires a full restart of a managed server, it involves a higher failover latency than service migration. Table 23-1 summarizes the different aspects.

Table 23-1 Different Aspects of WSM and ASM

Cluster Protection    Failover Time    Capacity Planning      Reliability    Shared Storage/DB    VIP per Managed Server
WSM                   4–5 mins         Full server running    DB Leasing     Yes                  Yes
ASM                   30 secs          Mem/CPU of services    DB Leasing     Yes                  No


Understanding Which Products and Components Require Whole Server Migration and Service Migration

The following table lists the recommended best practice for each component. It does not preclude you from using Whole Server Migration or Automatic Service Migration for those components that support it.

Component                              Whole Server Migration (WSM)    Automatic Service Migration (ASM)
Oracle Web Services Manager (OWSM)     NO                              NO
Oracle SOA Suite                       NO                              YES
Oracle Service Bus                     NO                              YES
Oracle Business Process Management     NO                              YES
Oracle Enterprise Scheduler            NO                              NO
Oracle Business Activity Monitoring    NO                              YES
Oracle B2B                             NO                              YES
Managed File Transfer                  NO                              YES

Creating a GridLink Data Source for Leasing

Whole Server Migration and Automatic Service Migration require a data source for the leasing table, which is created automatically as part of the Oracle WebLogic Server schemas by the Repository Creation Utility (RCU).

Note:

To accomplish data source consolidation and connection usage reduction, you can reuse the WLSSchemaDatasource as is for database leasing. This datasource is already configured with the FMW1221_WLS_RUNTIME schema, where the leasing table is stored.

For an enterprise deployment, you should create a GridLink data source:

  1. Log in to the Oracle WebLogic Server Administration Console.
  2. If you have not already done so, in the Change Center, click Lock & Edit.
  3. In the Domain Structure tree, expand Services, then select Data Sources.
  4. On the Summary of Data Sources page, click New and select GridLink Data Source, and enter the following:
    • Enter a logical name for the data source in the Name field. For example, Leasing.

    • Enter a JNDI name. For example, jdbc/leasing.

    • For the Database Driver, select Oracle's Driver (Thin) for GridLink Connections Versions: Any.

    • Click Next.

  5. In the Transaction Options page, clear the Supports Global Transactions check box, and then click Next.
  6. In the GridLink Data Source Connection Properties Options screen, select Enter individual listener information and click Next.
  7. Enter the following connection properties:
    • Service Name: Enter the service name of the database with lowercase characters. For a GridLink data source, you must enter the Oracle RAC service name. For example:

      soaedg.example.com

    • Host Name and Port: Enter the SCAN address and port for the RAC database, separated by a colon. For example:

      db-scan.example.com:1521
      

      Click Add to add the host name and port to the list box below the field.

      Figure 23-1 Specifying SCAN Address for the RAC Database

      You can identify the SCAN address by querying the appropriate parameter in the database using the TCP Protocol:

      SQL> show parameter remote_listener;

      NAME                 TYPE        VALUE
      -------------------- ----------- ------------------------------
      remote_listener      string      db-scan.example.com

      Note:

      For Oracle Database 11g Release 1 (11.1), use the virtual IP and port of each database instance listener, for example:

      dbhost1-vip.mycompany.com (port 1521) 

      and

      dbhost2-vip.mycompany.com (port 1521)
      

      For Oracle Database 10g, use multi data sources to connect to an Oracle RAC database. For information about configuring multi data sources, see Using Multi Data Sources with Oracle RAC.

    • Database User Name: Enter the following:

      FMW1221_WLS_RUNTIME

      In this example, FMW1221 is the prefix you used when you created the schemas as you prepared to configure the initial enterprise manager domain.

      Note that in previous versions of Oracle Fusion Middleware, you had to manually create a user and tablespace for the migration leasing table. In Fusion Middleware 12c (12.2.1), the leasing table is created automatically when you create the WLS schemas with the Repository Creation Utility (RCU).

    • Password: Enter the password you used when you created the WLS schema in RCU.

    • Confirm Password: Enter the password again and click Next.

  8. On the Test GridLink Database Connection page, review the connection parameters and click Test All Listeners.

    Here is an example of a successful connection notification:

    Connection test for jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db-scan.example.com)
    (PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=soaedg.example.com))) succeeded.
    

    Click Next.

  9. In the ONS Client Configuration page, do the following:
    • Select FAN Enabled to subscribe to and process Oracle FAN events.

    • Enter the SCAN address in the ONS Host and Port field, and then click Add.

      This value should be the ONS host and ONS remote port for the RAC database. To find the ONS remote port for the database, you can use the following command on the database host:

      [orcl@db-scan1 ~]$ srvctl config nodeapps -s
       
      ONS exists: Local port 6100, remote port 6200, EM port 2016
      
    • Click Next.

    Note:

    For Oracle Database 11g Release 1 (11.1), use the hostname and port of each database's ONS service, for example:

    custdbhost1.example.com (port 6200)
    

    and

    custdbhost2.example.com (port 6200)
    
  10. On the Test ONS Client Configuration page, review the connection parameters and click Test All ONS Nodes.

    Here is an example of a successful connection notification:

    Connection test for db-scan.example.com:6200 succeeded.

    Click Next.

  11. In the Select Targets page, select the cluster that you are configuring for Whole Server Migration or Automatic Service Migration, and then select All Servers in the cluster.
  12. Click Finish.
  13. Click Activate Changes.
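
If you prefer to script this configuration rather than use the console, the following WLST sketch creates an equivalent GridLink-style leasing data source. It is a minimal outline only: the administrator credentials, the ADMINVHN address, the cluster name (SOA_Cluster), the SCAN address, the service name, and the schema password are placeholders that you must replace with the values for your environment.

import jarray
from java.lang import String

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

# Create the data source and its JDBC resource
jdbcSR  = create('Leasing', 'JDBCSystemResource')
jdbcRes = jdbcSR.getJDBCResource()
jdbcRes.setName('Leasing')

# JNDI name and non-XA behavior (Supports Global Transactions cleared)
dsParams = jdbcRes.getJDBCDataSourceParams()
dsParams.setJNDINames(jarray.array([String('jdbc/leasing')], String))
dsParams.setGlobalTransactionsProtocol('None')

# Thin driver, RAC service through the SCAN address, and the WLS runtime schema
drvParams = jdbcRes.getJDBCDriverParams()
drvParams.setDriverName('oracle.jdbc.OracleDriver')
drvParams.setUrl('jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)'
                 '(HOST=db-scan.example.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=soaedg.example.com)))')
drvParams.setPassword('wls_runtime_schema_password')
drvParams.getProperties().createProperty('user').setValue('FMW1221_WLS_RUNTIME')

# GridLink-specific settings: FAN events and the ONS host:port
oracleParams = jdbcRes.getJDBCOracleParams()
oracleParams.setFanEnabled(true)
oracleParams.setOnsNodeList('db-scan.example.com:6200')

# Target the data source to the cluster that uses migration
jdbcSR.addTarget(getMBean('/Clusters/SOA_Cluster'))

save()
activate()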

Configuring Whole Server Migration for an Enterprise Deployment

After you have prepared your domain for whole server migration or automatic service migration, you can configure Whole Server Migration for specific Managed Servers within a cluster.

Note:

As mentioned earlier, for server migration to work, servers must use a virtual host name that maps to a floating IP as the listen address. You can specify the listen address directly in the Configuration Wizard or update it in the Administration Console.

Editing the Node Manager's Properties File to Enable Whole Server Migration

Use this section to edit the Node Manager properties file on the two nodes where the servers are running.

  1. Locate and open the following file with a text editor:
    MSERVER_HOME/nodemanager/nodemanager.properties
    
  2. If not done already, set the StartScriptEnabled property in the nodemanager.properties file to true.

    This is required to enable Node Manager to start the managed servers.

  3. Add the following properties to the nodemanager.properties file to enable server migration to work properly:
    • Interface

      Interface=eth0
      

      This property specifies the interface name for the floating IP (eth0, for example).

      Note:

      Do not specify the subinterface, such as eth0:1 or eth0:2. Specify the interface name without the :0 or :1 suffix.

      The Node Manager's scripts traverse the different :X enabled IPs to determine which to add or remove. For example, valid values in Linux environments are eth0, eth1, eth2, and so on, depending on the number of interfaces configured.

    • NetMask

      NetMask=255.255.255.0
      

      This property specifies the net mask for the interface for the floating IP.

    • UseMACBroadcast

      UseMACBroadcast=true
      

      This property specifies whether to use a node's MAC address when sending ARP packets, that is, whether to use the -b flag in the arping command.

  4. Restart the Node Manager.
  5. Verify in the output of Node Manager (the shell where the Node Manager is started) that these properties are in use. Otherwise, problems may occur during migration. The output should be similar to the following:
    ...
    SecureListener=true
    LogCount=1
    eth0=*,NetMask=255.255.255.0
    ...
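
As a convenience, you can script this check instead of scanning the file and the Node Manager output by hand. The following sketch (plain Python/Jython) reads the properties file and prints the migration-related settings; the MSERVER_HOME path shown is only an example and must be adjusted to your environment.

# Hypothetical MSERVER_HOME path; adjust to your environment
mserver_home = '/u02/oracle/config/domains/soaedg'

props = {}
f = open(mserver_home + '/nodemanager/nodemanager.properties')
for line in f:
    line = line.strip()
    # Skip comments and blank lines; keep key=value pairs
    if line and not line.startswith('#') and '=' in line:
        key, value = line.split('=', 1)
        props[key] = value
f.close()

# Print the properties that whole server migration depends on
for key in ['Interface', 'NetMask', 'UseMACBroadcast', 'StartScriptEnabled']:
    print '%-18s %s' % (key, props.get(key, 'MISSING'))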

Setting Environment and Superuser Privileges for the wlsifconfig.sh Script

Use this section to set the environment and superuser privileges for the wlsifconfig.sh script, which is used to transfer IP addresses from one machine to another during migration. It must be able to run ifconfig, which is generally only available to superusers.

For more information about the wlsifconfig.sh script, see Configuring Automatic Whole Server Migration in Administering Clusters for Oracle WebLogic Server.

Refer to the following sections for instructions on preparing your system to run the wlsifconfig.sh script.

Setting the PATH Environment Variable for the wlsifconfig.sh Script

Ensure that the commands listed in the following table are included in the PATH environment variable for each host computer.

File                   Directory Location
wlsifconfig.sh         MSERVER_HOME/bin/server_migration
wlscontrol.sh          WL_HOME/common/bin
nodemanager.domains    MSERVER_HOME/nodemanager

Granting Privileges to the wlsifconfig.sh Script

Grant sudo privilege to the operating system user (for example, oracle) with no password restriction, and grant execute privilege on the /sbin/ifconfig and /sbin/arping binaries.

Note:

For security reasons, sudo should be restricted to the subset of commands required to run the wlsifconfig.sh script.

Ask the system administrator for the sudo and system rights as appropriate to perform this required configuration task.

The following is an example of an entry inside /etc/sudoers granting sudo execution privilege for oracle to run ifconfig and arping:

Defaults:oracle !requiretty
oracle ALL=NOPASSWD: /sbin/ifconfig,/sbin/arping

Configuring Server Migration Targets

To configure migration in a cluster:

  1. Sign in to the Oracle WebLogic Server Administration Console.

  2. In the Domain Structure window, expand Environment and select Clusters. The Summary of Clusters page is displayed.

  3. Click the cluster for which you want to configure migration in the Name column of the table.

  4. Click the Migration tab.

  5. Click Lock & Edit.

  6. Select Database as Migration Basis. From the drop-down list, select Leasing as Data Source For Automatic Migration.

  7. Under Candidate Machines For Migratable Server, in the Available field, select the Managed Servers in the cluster and click the right arrow to move them to Chosen.

  8. Click Save.

  9. Set the Candidate Machines for Server Migration. You must perform this task for all of the managed servers as follows:

    1. In Domain Structure window of the Oracle WebLogic Server Administration Console, expand Environment and select Servers.

    2. Select the server for which you want to configure migration.

    3. Click the Migration tab.

    4. Select Automatic Server Migration Enabled and click Save.

      This enables the Node Manager to start a failed server on the target node automatically.

      For information on targeting applications and resources, see Using Multi Data Sources with Oracle RAC.

    5. In the Available field, located in the Migration Configuration section, select the machines to which to allow migration and click the right arrow.

      In this step, you are identifying the host to which the Managed Server should fail over if the current host is unavailable. For example, for the Managed Server on HOST1, select HOST2; for the Managed Server on HOST2, select HOST1.

    Tip:

    Click Customize this table in the Summary of Servers page, and move Current Machine from the Available window to the Chosen window to view the machine on which the server is currently running. This can differ from the configured machine if the server has been migrated automatically.

  10. Click Activate Changes.

  11. Restart the Administration Server and the servers for which server migration has been configured.
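
The same configuration can be applied with WLST instead of the console. The following is a minimal sketch only; the administrator credentials, the ADMINVHN address, and the cluster, server, and machine names (SOA_Cluster, WLS_SOA1/WLS_SOA2, SOAHOST1/SOAHOST2) are placeholders for your own topology.

import jarray
from weblogic.management.configuration import MachineMBean

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

# Cluster level: database leasing with the Leasing data source and candidate machines
cd('/Clusters/SOA_Cluster')
cmo.setMigrationBasis('database')
cmo.setDataSourceForAutomaticMigration(getMBean('/JDBCSystemResources/Leasing'))
cmo.setCandidateMachinesForMigratableServers(jarray.array(
    [getMBean('/Machines/SOAHOST1'), getMBean('/Machines/SOAHOST2')], MachineMBean))

# Server level: enable automatic migration and cross-assign the failover machines
cd('/Servers/WLS_SOA1')
cmo.setAutoMigrationEnabled(true)
cmo.setCandidateMachines(jarray.array([getMBean('/Machines/SOAHOST2')], MachineMBean))

cd('/Servers/WLS_SOA2')
cmo.setAutoMigrationEnabled(true)
cmo.setCandidateMachines(jarray.array([getMBean('/Machines/SOAHOST1')], MachineMBean))

save()
activate()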

Testing Whole Server Migration

Perform the steps in this section to verify that automatic whole server migration is working properly.

To test from Node 1:

  1. Stop the managed server process.

    kill -9 pid
    

    pid specifies the process ID of the managed server. You can identify the pid in the node by running this command:

    ps -ef | grep managed_server_name

  2. Watch the Node Manager console (the terminal window where Node Manager is running). You should see a message indicating that the managed server's floating IP has been disabled.

  3. Wait for the Node Manager to try a second restart of the Managed Server. Node Manager waits for a period of 30 seconds before trying this restart.

  4. After Node Manager restarts the server, and before it reaches the RUNNING state, kill the associated process again.

    Node Manager should log a message indicating that the server will not be restarted again locally.

    Note:

    The number of restart attempts is determined by the RestartMax parameter in the server's Node Manager startup properties. The default value is RestartMax=2.

To test from Node 2:

  1. Watch the local Node Manager console. Thirty seconds after the last restart attempt on Node 1, Node Manager on Node 2 should report that the floating IP for the managed server is being brought up and that the server is being restarted on this node.

  2. Access a product URL by using the same IP address. If the request succeeds, the migration was successful.
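
You can also confirm the result from WLST by checking the state of the migrated server and the machine on which it is currently running. A minimal sketch, assuming the Administration Server listens on ADMINVHN:7001 and the migrated server is WLS_SOA1 (placeholder names):

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
domainRuntime()
cd('ServerRuntimes/WLS_SOA1')
# CurrentMachine shows where the server is actually running after the migration
print 'State:           ' + cmo.getState()
print 'Current machine: ' + cmo.getCurrentMachine()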

Verification From the Administration Console

You can also verify migration using the Oracle WebLogic Server Administration Console:

  1. Log in to the Administration Console.
  2. In the Domain Structure pane on the left, click the domain name.
  3. Click the Monitoring tab and then the Migration subtab.

    The Migration Status table provides information on the status of the migration.

Note:

After a server is migrated, to fail it back to its original machine, stop the managed server from the Oracle WebLogic Administration Console and then start it again. The appropriate Node Manager starts the managed server on the machine to which it was originally assigned.

Configuring Automatic Service Migration in an Enterprise Deployment

You may need to configure automatic service migration for specific services in an enterprise deployment.

Setting the Leasing Mechanism and Data Source for an Enterprise Deployment Cluster

Before you can configure automatic service migration, you must verify the leasing mechanism and data source that is used by the automatic service migration feature. You must configure the leasing mechanism and datasource for both static and dynamic clusters.

Note:

To accomplish data source consolidation and connection usage reduction, you can reuse the WLSSchemaDatasource datasource as is for database leasing. This datasource is already configured with the FMW1221_WLS_RUNTIME schema, where the leasing table is stored.

The following procedure assumes that you have configured the Leasing data source, either by reusing the WLSSchemaDatasource or by creating a custom data source as described in Creating a GridLink Data Source for Leasing.

  1. Log in to the Oracle WebLogic Server Administration Console.
  2. Click Lock & Edit.
  3. In the Domain Structure window, expand Environment and select Clusters.
    The Summary of Clusters page appears.
  4. In the Name column of the table, click the cluster for which you want to configure migration.
  5. Click the Migration tab.
  6. Verify that Database is selected in the Migration Basis drop-down menu.
  7. From the Data Source for Automatic Migration drop-down menu, select the Leasing data source that you created in Creating a GridLink Data Source for Leasing. Select the WLSSchemaDatasource for data source consolidation.
  8. Click Save.
  9. Activate changes.
  10. Restart the managed servers for the changes to be effective. If you are configuring other aspects of ASM in the same configuration change session, you can perform a single restart at the end to reduce downtime.
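
A minimal WLST sketch of the same change, assuming the cluster is named SOA_Cluster and that you reuse the WLSSchemaDatasource for leasing (replace the names and credentials with the values for your domain):

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

cd('/Clusters/SOA_Cluster')
cmo.setMigrationBasis('database')
# Point to the WLSSchemaDatasource, or to the custom Leasing data source if you created one
cmo.setDataSourceForAutomaticMigration(getMBean('/JDBCSystemResources/WLSSchemaDatasource'))

save()
activate()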

After you complete the database leasing configuration, continue with the configuration of service migration for either static or dynamic clusters, as described in the following sections.

Configuring Automatic Service Migration for Static Clusters

After you have configured the leasing for the cluster as described in Setting the Leasing Mechanism and Data Source for an Enterprise Deployment Cluster, you can configure automatic service migration for specific services in an enterprise deployment. The following sections explain how to configure and validate Automatic Service Migration for static clusters.

Changing the Migration Settings for the Managed Servers in the Cluster

After you set the leasing mechanism and data source for the cluster, you can then enable automatic JTA migration for the Managed Servers that you want to configure for service migration. Note that this topic applies only if you are deploying JTA services as part of your enterprise deployment.

To change the migration settings for the Managed Servers in each cluster:
  1. If you haven’t already, log in to the Administration Console, and click Lock & Edit.
  2. In the Domain Structure pane, expand the Environment node and then click Servers.
    The Summary of Servers page appears.
  3. Click the name of the server that you want to modify in the Name column of the table.
    The settings page for the selected server appears and defaults to the Configuration tab.
  4. Click the Migration tab.
  5. From the JTA Migration Policy drop-down menu, select Failure Recovery.
  6. In the JTA Candidate Servers section of the page, select the Managed Servers in the Available list box, and then click the move button to move them into the Chosen list box.
  7. In the JMS Service Candidate Servers section of the page, select the Managed Servers in the Available list box, and then click the move button to move them into the Chosen list box.
  8. Click Save.
  9. Restart the managed servers and the Administration Server for the changes to be effective. If you are configuring other aspects of ASM in the same configuration change session, you can perform a single restart at the end to reduce downtime.
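
If you prefer WLST over the console, the following sketch applies the same JTA settings. It covers only the JTA migratable target; the connection details and server names (ADMINVHN, WLS_SOA1, WLS_SOA2) are placeholders, and the JMS candidate servers are handled through the migratable targets described in the next topics.

import jarray
from weblogic.management.configuration import ServerMBean

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

# All members of the cluster are JTA candidate servers
candidates = jarray.array(
    [getMBean('/Servers/WLS_SOA1'), getMBean('/Servers/WLS_SOA2')], ServerMBean)

for name in ['WLS_SOA1', 'WLS_SOA2']:
    cd('/Servers/' + name + '/JTAMigratableTarget/' + name)
    cmo.setMigrationPolicy('failure-recovery')
    cmo.setConstrainedCandidateServers(candidates)

save()
activate()
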
About Selecting a Service Migration Policy

When you configure Automatic Service Migration, you select a Service Migration Policy for each cluster. This topic provides guidelines and considerations when selecting the Service Migration Policy.

For example, products or components running singletons or using Path services can benefit from the Auto-Migrate Exactly-Once policy. With this policy, if at least one Managed Server in the candidate server list is running, the services hosted by the migratable target remain active somewhere in the cluster when servers fail or are administratively shut down (either gracefully or forcibly). This can cause multiple homogeneous services to end up on one server on startup.

When you use this policy, you should monitor the cluster startup to identify which services are running on each server. You can then perform a manual failback, if necessary, to place the system in a balanced configuration.

Other Fusion Middleware components are better suited for the Auto-Migrate Failure-Recovery Services policy.

Based on these guidelines, the following policies are recommended for an Oracle SOA Suite enterprise topology:

  • SOA_Cluster: Auto-Migrate Failure-Recovery Services

  • OSB_Cluster: Auto-Migrate Failure-Recovery Services

  • BAM_Cluster: Auto-Migrate Exactly-Once Services

  • MFT_Cluster: Auto-Migrate Failure-Recovery Services

See Policies for Manual and Automatic Service Migration in Administering Clusters for Oracle WebLogic Server.

Setting the Service Migration Policy for Each Managed Server in the Cluster
After you modify the migration settings for each server in the cluster, you can then identify the services and set the migration policy for each Managed Server in the cluster, using the WebLogic Administration Console:
  1. If you have not already, log in to the Administration Console, and click Lock & Edit.
  2. In the Domain Structure pane, expand Environment, then expand Clusters, then select Migratable Targets.
  3. Click the name of the first Managed Server in the cluster.
  4. Click the Migration tab.
  5. From the Service Migration Policy drop-down menu, select the appropriate policy for the cluster.
  6. Click Save.
  7. Repeat steps 2 through 6 for each of the additional Managed Servers in the cluster.
  8. Activate the changes.
  9. Restart the managed servers for the changes to be effective. If you are configuring other aspects of ASM in the same configuration change session, you can perform a single restart at the end to reduce downtime.
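
The equivalent WLST sketch for this step is shown below. The migratable target names follow the default server_name (migratable) convention and are placeholders, and the policy string should match the value that you selected for the cluster ('failure-recovery' or 'exactly-once').

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

# Use 'exactly-once' instead of 'failure-recovery' for clusters such as BAM_Cluster
for name in ['WLS_SOA1 (migratable)', 'WLS_SOA2 (migratable)']:
    cd('/MigratableTargets/' + name)
    cmo.setMigrationPolicy('failure-recovery')

save()
activate()
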
Validating Automatic Service Migration in Static Clusters
After you configure automatic service migration for your cluster and Managed Servers, validate the configuration, as follows:
  1. If you have not already done so, log in to the Administration Console.
  2. In the Domain Structure pane, expand Environment, and then expand Clusters.
  3. Click Migratable Targets.
  4. Click the Control tab.
    The console displays a list of migratable targets and their current hosting server.
  5. In the Migratable Targets table, select the row for one of the migratable targets.
  6. Note the value in the Current Hosting Server column.
  7. Use the operating system command line to stop the first Managed Server.

    Use the following command to end the Managed Server Process and simulate a crash scenario:

    kill -9 pid
    

    In this example, replace pid with the process ID (PID) of the Managed Server. You can identify the PID by running the following UNIX command:

    ps -ef | grep managed_server_name
    

    Note:

    After you kill the process, the Managed Server might be configured to start automatically. In this case, you must kill the second process using the kill -9 command again.

  8. Watch the terminal window (or console) where the Node Manager is running.

    You should see a message indicating that the selected Managed Server has failed. The message is similar to the following:

    <INFO> <domain_name> <server_name> 
    <The server 'server_name' with process id 4668 is no longer alive; waiting for the process to die.>
    <INFO> <domain_name> <server_name> 
    <Server failed during startup. It may be retried according to the auto restart configuration.>
    <INFO> <domain_name> <server_name>
    <Server failed but will not be restarted because the maximum number of restart attempts has been exceeded.>
  9. Return to the Oracle WebLogic Server Administration Console and refresh the table of migratable targets; verify that the migratable targets are transferred to the remaining, running Managed Server in the cluster:
    • Verify that the Current Hosting Server for the process you killed is now updated to show that it has been migrated to a different host.
    • Verify that the value in the Status of Last Migration column for the process is Succeeded.
  10. Open and review the log files for the Managed Servers that are now hosting the services; look for any JTA or JMS errors.

    Note:

    For JMS tests, it is a good practice to get message counts from destinations and make sure that there are no stuck messages in any of the migratable targets:

    For example, for uniform distributed destinations (UDDs):

    1. Access the JMS Subdeployment module in the Administration Console:

      In the Domain Structure pane, select Services, then Messaging, and then JMS Modules.

    2. Click the JMS Module.

    3. In the Summary of Resources table, click Destinations, and then click the Monitoring tab.

    4. Review the Messages Total and Messages Pending values. If these columns do not appear in the table, click Customize this table to add them.
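
    Instead of checking each destination in the console, you can collect the same counters from the domain runtime with WLST. A minimal sketch, assuming placeholder administrator credentials and URL:

    connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
    domainRuntime()

    # Walk every running server, its JMS servers, and their destinations,
    # printing the current and pending message counts
    for server in cmo.getServerRuntimes():
        jmsRuntime = server.getJMSRuntime()
        if jmsRuntime is None:
            continue
        for jmsServer in jmsRuntime.getJMSServers():
            for dest in jmsServer.getDestinations():
                print '%-30s %-50s current=%d pending=%d' % (
                    jmsServer.getName(), dest.getName(),
                    dest.getMessagesCurrentCount(), dest.getMessagesPendingCount())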

Failing Back Services After Automatic Service Migration

When Automatic Service Migration occurs, Oracle WebLogic Server does not support failing back services to their original server when a server is back online and rejoins the cluster.

As a result, after the Automatic Service Migration migrates specific JMS services to a backup server during a fail-over, it does not migrate the services back to the original server after the original server is back online. Instead, you must migrate the services back to the original server manually.

To fail back a service to its original server, follow these steps:

  1. If you have not already done so, in the Change Center of the Administration Console, click Lock & Edit.

  2. In the Domain Structure tree, expand Environment, expand Clusters, and then select Migratable Targets.

  3. To migrate one or more migratable targets at once, on the Summary of Migratable Targets page:

    1. Click the Control tab.

    2. Use the check boxes to select one or more migratable targets to migrate.

    3. Click Migrate.

    4. Use the New hosting server drop-down to select the original Managed Server.

    5. Click OK.

      A request is submitted to migrate the JMS-related service. In the Migratable Targets table, the Status of Last Migration column indicates whether the requested migration has succeeded or failed.

    6. Release the edit lock after the migration is successful.

Configuring Automatic Service Migration for Dynamic Clusters

After you have configured the leasing for the cluster as described in Setting the Leasing Mechanism and Data Source for an Enterprise Deployment Cluster, you can continue with the Service Migration configuration.

Dynamic Clusters simplify the configuration for service migration because the services are targeted to the entire cluster. However, you still have to configure the migration policy at the custom persistent store level and for the JTA service. These policies determine the migration behavior of JMS and JTA services, respectively.

About Selecting a Service Migration Policy for Dynamic Clusters

When you configure service migration for dynamic clusters, you select a Service Migration Policy for each persistent store. This topic provides guidelines and considerations when you select the Service Migration Policy. The following options are available:

  • Off: Disables migration and restart support for cluster-targeted JMS service objects, including the ability to restart a failed persistent store instance and its associated services. You cannot combine this policy with the Singleton Migration Policy.

  • On-Failure: Enables automatic migration and restart of instances on the failure of a subsystem Service or the WebLogic Server instance, including automatic fail-back and load balancing of instances.

  • Always: Provides the same behavior as On-Failure and automatically migrates instances even if a graceful shutdown or a partial cluster start occurs.

Products or components that run singletons or use Path services can benefit from the Always policy. With this policy, if at least one Managed Server is running, the instances remain active somewhere in the cluster when servers fail or are administratively shut down (either gracefully or forcibly). This type of failure or shutdown can cause multiple homogeneous services to end up on one server on startup.

Other Fusion Middleware components are better suited for the On-Failure policy.

Based on these guidelines, the following policies are recommended for an Oracle SOA Suite enterprise topology:

  • SOA_Cluster: On-Failure

  • OSB_Cluster: On-Failure

  • MFT_Cluster: On-Failure

For information about the JMS configuration for high availability, see Simplified JMS Cluster and High Availability Configuration.

Changing the Migration Settings for the Persistent Stores
After you choose the migration policy for each cluster, you can identify the persistent stores of the cluster and set the migration policy for each of them by using the WebLogic Administration Console:
  1. Log in to the Administration Console, if you have not already done so, and click Lock & Edit.
  2. In the Domain Structure pane, expand Environment, expand Services, and then select Persistent Stores.
  3. Click the name of the Persistent Store that you want to modify.

    Note:

    When you use JDBC persistent stores, additional unused File Stores are automatically created but are not targeted to your clusters. Ignore these File Stores.

  4. Click the High Availability tab.
  5. From the Migration Policy drop-down menu, select the appropriate policy for the cluster. See About Selecting a Service Migration Policy for Dynamic Clusters.
  6. Click Save.
  7. Repeat steps 2 through 6 for each additional persistent store in the cluster.
  8. Click Activate Changes.
  9. Restart the managed servers for the changes to be effective. If you are configuring other aspects of service migration in the same configuration change session, you can perform a single restart at the end to reduce downtime.
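
A WLST sketch of the same change is shown below. The store names are examples only (they follow the UMSJMSJDBCStore_auto_N pattern used elsewhere in this chapter), and the policy string is assumed to match the console value, for example On-Failure.

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

# Example JDBC store names; apply the policy to every store targeted to the cluster
for store in ['UMSJMSJDBCStore_auto_1', 'UMSJMSJDBCStore_auto_2']:
    cd('/JDBCStores/' + store)
    cmo.setMigrationPolicy('On-Failure')

save()
activate()
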
Changing the Migration Settings for the JTA Service
You must set the appropriate migration policy for the JTA service in each server so that any member in the cluster can resume the XA logs in the event of a failure or shutdown of one of the members of the dynamic cluster. To set the migration policy for the servers in a dynamic cluster, follow these steps:
  1. Log in to the FMW Control Console by accessing ADMINVHN:7001/em and by using the required credentials.
  2. Click the lock icon on the upper right corner and click Lock & Edit.
  3. On the target navigation tree on the left, select the relevant domain.
  4. Click WebLogic Domain > Environment > Server Templates.
  5. Click the relevant template and then, click the Migration tab.
  6. From the JTA Migration Policy drop-down list, select the required migration policy for the service. The settings required for each SOA component are as follows. (Some may not be shown, depending on what has been installed.):
    • SOA_Cluster: Failure Recovery

    • OSB_Cluster: Failure Recovery

    • MFT_Cluster: Failure Recovery

  7. Click Save.
  8. Click the lock icon on the upper right corner and click Activate Changes.
  9. Restart the managed servers and the Administration Server for the changes to be effective.
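
The same setting can also be scripted with WLST. A minimal sketch, assuming a hypothetical server template name and that the JTA migratable target bean under the server template mirrors the server-level bean path:

connect('admin_user','admin_password','t3://ADMINVHN:7001')   # placeholder credentials and URL
edit()
startEdit()

template = 'SOA-server-template'   # hypothetical template name; use the template of your dynamic cluster
cd('/ServerTemplates/' + template + '/JTAMigratableTarget/' + template)
cmo.setMigrationPolicy('failure-recovery')

save()
activate()
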
Validating Automatic Service Migration in Dynamic Clusters
After you configure service migration for your dynamic cluster, validate the configuration, as follows:
  1. Log in to the Administration Console, if you have not already done so.
  2. In the Domain Structure pane, select Environment, and then Clusters.
  3. Click the cluster in which you want to verify the service migration.
  4. Click the Monitoring tab, then Health.
    The console displays a list of the servers of the cluster and their state.
  5. Expand each managed server and verify that its persistent stores are okay.
  6. In Domain Structure pane, select Environment > Services > Messaging > JMS Servers.
  7. Click on one of the JMS Servers of the cluster, and then click the Monitoring tab.

    Verify that you see two instances (one per dynamic server) and that each instance is running on one of the dynamic servers.

  8. Use the operating system command line to stop the first Managed Server. Use the following command to end the Managed Server process and simulate a crash scenario:
    kill -9 pid
    In this example, replace pid with the process ID (PID) of the Managed Server. You can identify the PID by running the following UNIX command:
    ps -ef | grep managed_server_name

    Note:

    You can configure the Managed Server to start automatically after you initially kill the process. In this case, you must kill the second process by using the kill -9 command again.

  9. Watch the terminal window (or console) where the Node Manager is running.
    You see a message indicating that the selected Managed Server has failed. The message appears as follows:
    <INFO> <domain_name> <server_name>
    <The server 'server_name' with process id 4668 is no longer alive; waiting for the process to die.>
    <INFO> <domain_name> <server_name>
    <Server failed during startup. It may be retried according to the auto restart configuration.>
    <INFO> <domain_name> <server_name>
    <Server failed but will not be restarted because the maximum number of restart attempts has been exceeded.>
  10. Return to the Oracle WebLogic Server Administration Console and refresh the table of Cluster > Monitoring > Health. Verify that the persistent stores are now running in the remaining Managed Server that is still running.
  11. In Domain Structure pane, select Environment > Services > Messaging > JMS Servers.
  12. Click on one of the JMS Servers of the cluster, and then click the Monitoring tab.

    Verify that both instances continue to run on the remaining Managed Server that is still running.

  13. Open and review the log files for the Managed Servers that are now hosting the services. Look for any JTA or JMS errors.

    Note:

    For JMS tests, it is a good practice to get message counts from destinations and ensure that messages are not stuck in the migratable targets. For example, for uniform distributed destinations (UDDs):

    1. Access the JMS Subdeployment module in the Administration Console.

    2. In the Domain Structure pane, select Services > Messaging > JMS Modules.

    3. Click the JMS Module.

    4. In the Summary of Resources table, click Destinations, and then click the Monitoring tab. Review the Messages Total and Messages Pending values.

    If these columns do not appear in the table, click Customize this table to add them.

  14. Review the logs. The messages appear as follows in the remaining server:
    <Info> <Cluster> <soahost1> <WLS_SOA1> <[STANDBY] ExecuteThread: 
    '43' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> 
    <> <49c99f17-a5d6-487d-a710-65eef0262ebc-0000063c> <1489481002608> 
    <[severity-value: 64] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-000189> 
    <The Singleton Service UMSJMSJDBCStore_auto_1_WLS_SOA2 is now active on this server.>
    
    <Info> <Cluster> <soahost1> <WLS_SOA1> <[STANDBY] ExecuteThread: 
    '43' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> 
    <> <49c99f17-a5d6-487d-a710-65eef0262ebc-0000063c> <1489481002609> 
    <[severity-value: 64] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-003130> 
    <UMSJMSJDBCStore_auto_1_WLS_SOA2 successfully activated on server WLS_SOA1.>

    For more information, you can debug with the following flags:

    -Dweblogic.debug.DebugSingletonServices=true
    -Dweblogic.debug.DebugServerMigration=true
Failing Back Services After Automatic Service Migration

With dynamic clustering, when a distributed instance is migrated from its preferred server, it tries to fail back when the preferred server is restarted. Therefore, after the service migration process migrates specific persistent store services to a backup server during a failover, it migrates the services back to the original server after the original server is back online.