Oracle® Enterprise Manager Ops Center

Recovering Logical Domains from a Failed Server

12c Release 2 (12.2.0.0.0)

E48171-01

January 2014

This guide provides an end-to-end example for how to use Oracle Enterprise Manager Ops Center.

Introduction

Oracle Enterprise Manager Ops Center provides options to install and configure Oracle VM Server for SPARC systems, create logical domains, and provision Oracle Solaris operating systems on the logical domains. You can pool the Oracle VM Server for SPARC systems in a server pool which provides load balancing, high availability capabilities, and sharing resources with all the members of the pool.

The high availability capability for an Oracle VM Server for SPARC server pool is enhanced by allowing the automatic recovery of the logical domains on a failed server.

In Oracle Enterprise Manager Ops Center, you can recover the logical domains from failed and unreachable Oracle VM Server for SPARC systems. You can enable automatic recovery for the logical domains and set the priority of recovery. The automatic recovery priority determines the order in which the logical domains are recovered. Zero (0) is the lowest automatic recovery priority and 100 is the highest; the logical domain with the highest priority is recovered first.

When an Oracle VM Server control domain fails, each of its logical domains that is configured for automatic recovery is started on another control domain in the server pool, in order of priority. The target control domain is selected according to the placement policy of the server pool, until the server pool no longer has enough available resources. If the recovery process fails for a logical domain, a critical incident is raised for that logical domain.
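The ordering and placement described above can be pictured with a small model. This is only an illustrative sketch, not the Ops Center implementation: the class names, the use of CPU count as the only resource, and the "lowest relative load" placement rule are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ControlDomain:
    """Hypothetical model of a surviving control domain in the pool."""
    name: str
    total_cpus: int
    used_cpus: int = 0
    guests: list = field(default_factory=list)

    @property
    def free_cpus(self) -> int:
        return self.total_cpus - self.used_cpus

    @property
    def relative_load(self) -> float:
        return self.used_cpus / self.total_cpus


@dataclass
class LogicalDomain:
    """Hypothetical model of a logical domain on the failed server."""
    name: str
    cpus: int
    auto_recovery: bool
    priority: int = 0  # 0 is the lowest priority, 100 the highest


def plan_recovery(failed_guests, survivors):
    """Return (placements, incidents) for the guests of a failed server."""
    placements, incidents = {}, []
    candidates = [g for g in failed_guests if g.auto_recovery]
    # Highest priority first; sorted() is stable, so ties keep input order.
    for guest in sorted(candidates, key=lambda g: g.priority, reverse=True):
        fits = [cd for cd in survivors if cd.free_cpus >= guest.cpus]
        if not fits:
            # Modeled after the critical incident raised on a failed recovery.
            incidents.append(f"critical: could not recover {guest.name}")
            continue
        # Assumed placement policy: lowest relative load wins.
        target = min(fits, key=lambda cd: cd.relative_load)
        target.used_cpus += guest.cpus
        target.guests.append(guest.name)
        placements[guest.name] = target.name
    return placements, incidents


survivors = [ControlDomain("smt4-14", total_cpus=16, used_cpus=4)]
guests = [
    LogicalDomain("guest1", cpus=4, auto_recovery=False),
    LogicalDomain("guest2", cpus=4, auto_recovery=True, priority=100),
]
placements, incidents = plan_recovery(guests, survivors)
print(placements)  # {'guest2': 'smt4-14'}
```

In this model a logical domain that does not fit is reported and skipped, so a smaller, lower-priority domain may still be placed afterward. Whether the product behaves the same way is not stated in this guide.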

For a logical domain with redundant storage access, you can authorize recovery without the redundant access configuration if no control domain in the server pool can provide it. If you do so, the logical domain is automatically recovered and a warning incident is raised for the logical domain to report the loss of the redundant configuration.

There are many scenarios and conditions that determine the recovery of the logical domains. In this example, two such scenarios are described:

  • Automatic recovery of logical domains

  • Manual recovery of logical domains

Some logical domains cannot be recovered, either automatically or manually, because they meet one or more of the following conditions:

  • They have non-shared storage.

  • They have non-shared metadata.

  • They have attached I/O bus, PCI endpoint, or virtual functions.
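The three conditions above amount to a simple eligibility check. The following sketch restates them as a function; the `DomainConfig` fields are hypothetical names chosen for the example and do not correspond to any Ops Center API.

```python
from dataclasses import dataclass


@dataclass
class DomainConfig:
    """Hypothetical summary of a logical domain's configuration."""
    name: str
    shared_storage: bool
    shared_metadata: bool
    has_io_bus: bool = False
    has_pci_endpoint: bool = False
    has_virtual_functions: bool = False


def recovery_blockers(cfg):
    """List the conditions that prevent recovery of this logical domain."""
    blockers = []
    if not cfg.shared_storage:
        blockers.append("non-shared storage")
    if not cfg.shared_metadata:
        blockers.append("non-shared metadata")
    if cfg.has_io_bus or cfg.has_pci_endpoint or cfg.has_virtual_functions:
        blockers.append("attached I/O bus, PCI endpoint, or virtual functions")
    return blockers


def is_recoverable(cfg):
    """A domain is recoverable only when no blocking condition applies."""
    return not recovery_blockers(cfg)
```

For example, `is_recoverable(DomainConfig("guest2", shared_storage=True, shared_metadata=True))` returns `True`, while a domain with a PCI endpoint attached is reported as not recoverable.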

In this example, the control domain is placed in a server pool and has logical domains running in it. When the control domain becomes unreachable, the logical domains that have been enabled for automatic recovery are recovered and started in another control domain in the server pool automatically. When the logical domains are not enabled for automatic recovery, the logical domains can be recovered from the failed control domain using the manual procedure described in this guide.

See Related Articles and Resources for links to related information and articles.

What You Will Need?

You need the following to demonstrate the recovery of the logical domains:

  • Two Oracle VM Server for SPARC servers installed and configured with the Oracle VM Server for SPARC agent.

  • The Oracle VM Server for SPARC servers are placed in a server pool in which the option to power off a failed server on automatic recovery is enabled. Both servers must use the same Oracle VM Server for SPARC version.

  • Two logical domains installed and configured on one of the Oracle VM Server for SPARC systems using Oracle Enterprise Manager Ops Center.

Hardware and Software Configuration

The Oracle VM Server for SPARC systems have the following configuration:

  • In this example, the servers named smt4-14 and smt4-15 are installed and configured with Oracle VM Server for SPARC 3.0.0.4 using Oracle Enterprise Manager Ops Center.

    [Figure: ldom_server_version.png]

  • Control domains are placed in a server pool with the following policies:

    • Place guest in Oracle VM Server with lowest relative load.

    • Do not automatically balance the server pool.

    • Power off a failed server from Service Processor, given capabilities, before automatic recovery of attached logical domains.

  • Two logical domains, guest1 and guest2, are created in the control domain smt4-15.

    [Figure: logical_domain_view.png]

Recovering Logical Domains

In this example, the following two scenarios are described:

  • Automatic recovery

  • Manual recovery

This example uses two logical domains, guest1 and guest2. The logical domain guest1 is designated for manual recovery and the logical domain guest2 for automatic recovery. The control domain smt4-15, in which the logical domains reside, becomes unreachable. The following sections show how the recovery procedures are executed.

Automatic Recovery of Logical Domains

To recover the logical domains automatically, you must have enabled the automatic recovery of the logical domains. You can enable the automatic recovery of logical domains in the following ways:

  • Set the automatic recovery option when you create the logical domain profile. Select automatic recovery and provide the priority value in the logical domain profile.

  • Select the logical domain and use the option Enable Automatic Recovery in the Actions pane to trigger the recovery of logical domains automatically when a server fails. Edit the Automatic Recovery Priority using the Edit Attributes option for a logical domain. The Enable Automatic Recovery option is shown in the figure below.

    [Figure: disabled_auto.png]

The option to enable automatic recovery is not available for logical domains that are not recoverable.

In this example, the logical domain guest2 is enabled for automatic recovery with an Automatic Recovery Priority of 100, which is the highest priority for recovery.

[Figure: guest2_recovery.png]

When an Oracle VM Server for SPARC in the server pool fails, the logical domains that have been enabled for automatic recovery are recovered and started on another Oracle VM Server in the server pool without any user intervention.

When the control domain smt4-15 fails and becomes unreachable, the automatic recovery of the logical domain guest2 is triggered. The status of smt4-15 is unreachable, as shown in the figure below. According to the server pool policy for this example, if Oracle Enterprise Manager Ops Center fails to power off the server, you must perform a manual recovery.

[Figure: unreach_status_view.png]

You can view the job running in the job pane.

[Figure: auto_recov_job.png]

Select the job and view the job details such as the task flow execution.

[Figure: recovery_job_details.png]

From the job details, you can see that the server smt4-15 is powered off according to the server pool policy. The recovery of the logical domain guest2 is initiated, and the logical domain is created successfully on the control domain smt4-14 in the server pool. When the logical domain guest2 is recovered, the server pool status is as in the following figure:

[Figure: after_recovery1.png]

You can see that the logical domain guest2 is recovered and running on the control domain smt4-14. The control domain smt4-15 is in unreachable status and the logical domain guest1 has disappeared from the list.

When the logical domain is recovered on the other host in the server pool, Oracle Enterprise Manager Ops Center automatically boots the operating system of the logical domain. Allow some time for the logical domain to start on the new virtualization host while its operating system boots.

If there are no resources available in the server pool to recover the logical domains, Oracle Enterprise Manager Ops Center checks periodically for free resources to retry the automatic recovery mechanism.

When the failed server is repaired and restarted, the logical domains that were not recovered are started in the control domain. For the logical domains that are recovered and running on other servers, Oracle Enterprise Manager Ops Center cleans up the repaired server and removes those logical domains.
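The cleanup rule in the preceding paragraph can be summarized as a split of the repaired server's logical domains into two groups. The function below is a minimal sketch of that rule only; its name and inputs are hypothetical, and it does not reflect how the agent is actually implemented.

```python
def reconcile_after_restart(local_domains, recovered_elsewhere):
    """Split the repaired server's domains into (to_remove, to_start).

    local_domains: names of the logical domains still defined on the
        repaired server.
    recovered_elsewhere: names of the logical domains that are already
        running on another server in the pool.
    """
    recovered = set(recovered_elsewhere)
    # Domains recovered on another server are removed from this server.
    to_remove = [d for d in local_domains if d in recovered]
    # Domains that were not recovered are started here.
    to_start = [d for d in local_domains if d not in recovered]
    return to_remove, to_start
```

With the domains from this example, `reconcile_after_restart(["guest1", "guest2"], ["guest2"])` returns `(["guest2"], ["guest1"])`: guest2 is removed from smt4-15 because it already runs on smt4-14, and guest1 is started.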

In a scenario where you cannot repair the failed server, you must manually recover the logical domains.

Manual Recovery of Logical Domains

If you have not enabled automatic recovery for logical domains, or if the server pool does not have enough resources to recover them, use the manual procedure to recover the logical domains.

When the control domain smt4-15 becomes unreachable, do not try to remove it from the server pool using the option Remove from Server Pool. You cannot remove a control domain with running guests from a server pool.

[Figure: cannot_remove_frm_pool.png]

As described in the previous section, the logical domain guest1 was not enabled for automatic recovery. Use the following procedure to manually recover the logical domain.

  1. Isolate the failed server.

    You can log in to the ALOM or ILOM of the physical server and shut down the server.

  2. Power off the failed server.

    [Figure: power_off.png]

  3. Check whether the failed server is flagged as unavailable in the Oracle Enterprise Manager Ops Center UI. This status is updated within approximately 5 minutes. Select All Assets as the filter in the Navigation pane.

  4. Select the Managed Assets tab in the center pane.

  5. Select the unreachable control domain from the list. Ensure that you select the control domain and not the operating system of the control domain. Verify that the value of the Description column is Oracle VM Server for SPARC.

  6. Click Delete Assets to delete the asset.

    [Figure: delete_asset.png]

  7. Click Delete to confirm the delete action.

    [Figure: delete_the_asset.png]

    After the delete asset job completes, the service processor and the control domain disappear from the assets tree.

  8. Select the server pool in which the control domain was originally placed. Verify that the logical domain guest1 appears in the server pool under the Shutdown Guests list.

    [Figure: result_guest1.png]

    From the figure, you can see that the logical domain guest2, which was enabled for automatic recovery, was recovered and is running in another control domain in the server pool. The logical domain guest1 is also recovered and is available as a shutdown guest in the server pool.

What's Next?

Use the option Start to start the shutdown logical domain on another Oracle VM Server of the server pool.

Do the following when you want to reintroduce the repaired Oracle VM Server into Oracle Enterprise Manager Ops Center:

  1. Start the failed server manually.

  2. Log in to the console of the failed server to verify that the logical domains that were recovered on another control domain in the server pool are started, but with their OS not booted, to prevent any data corruption.

    Logical domains are started with their OS not booted until the Oracle Enterprise Manager Ops Center Agent Controller starts and checks, for each logical domain, whether it was recovered on another server. The Ops Center agent:

    • Removes the logical domains that were recovered on other servers, without ever booting their OS since the last startup.

    • Boots the OS of the logical domains that were not recovered on other servers.

  3. Wait until the agent finishes its startup on the control domain.

    Wait until the Oracle Solaris command /usr/bin/svcs -xv no longer reports that the service svc:/application/management/common-agent-container-1:scn-agent is starting.

  4. Discover the control domain in Oracle Enterprise Manager Ops Center by selecting the option Enable Oracle VM for Sparc management. Refer to Discovering Existing Oracle VM Server for SPARC Environments for more information about the procedure.

  5. Add the control domain to the server pool using the Add Oracle VM Servers action. Refer to Exploring your Server Pools for an example of adding an Oracle VM Server.
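The wait in step 3 can be scripted rather than checked by hand. The sketch below is one possible approach and is not part of the product: instead of reading the svcs -xv report mentioned in step 3, it polls the service state with svcs -H -o state and treats the state online as the end of the startup. The function names, the polling interval, and the timeout are assumptions made for this example.

```python
import subprocess
import time

# Service FMRI taken from step 3 of the procedure above.
AGENT_FMRI = "svc:/application/management/common-agent-container-1:scn-agent"


def svcs_state(fmri):
    """Return the SMF state of a service instance, for example 'online'."""
    result = subprocess.run(
        ["/usr/bin/svcs", "-H", "-o", "state", fmri],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_online(fmri, get_state=svcs_state, interval=10.0,
                      timeout=600.0, sleep=time.sleep):
    """Poll the service state until it is 'online' or the timeout expires.

    Returns True when the service is online, False on timeout.
    get_state and sleep are injectable so the loop can be tested without
    a Solaris host.
    """
    waited = 0.0
    while True:
        if get_state(fmri) == "online":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

On the control domain, calling `wait_until_online(AGENT_FMRI)` blocks until the agent service reports online or ten minutes pass. A service that ends in the maintenance state never becomes online, so the function returns False in that case and the service must be investigated separately.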

Related Articles and Resources

See the following guides for more information:

For more end-to-end examples, see the workflows and how to documentation in the Operate How To library at http://docs.oracle.com/cd/E40871_01/nav/operatehowto.htm.

For more information, see the Oracle Enterprise Manager Ops Center Documentation Library at http://docs.oracle.com/cd/E40871_01/index.htm.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.



Copyright © 2007, 2014, Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.