8 Managing a Topology

8.1 Exalogic Startup and Shutdown Procedure

It is important to follow the proper sequence when starting up and shutting down Exalogic and its components.

This section contains the following topics:

  • Exalogic Startup Sequence

  • Exalogic Shutdown Sequence

  • ZFS Storage Appliance Power On and Off Procedure

  • Procedures to Start up or Shutdown Exalogic, Control Stack, and Guest vServers

8.1.1 Exalogic Startup Sequence

Start up the components in the following sequence:

  1. Power on the PDUs of the Exalogic rack
  2. Network switches

    Note:

    Ensure that power has been applied to the switches for a few minutes, so that they can complete their power-on configuration, before you start the storage nodes and compute nodes. Status LEDs are located on the rear of the InfiniBand Gateway switch (the side into which the InfiniBand cables are plugged). The OK LED at the bottom right (above the USB port) must be steady green, which indicates that the gateway is functional and has no fault. You can also log in to one of the InfiniBand switches over SSH and run the ibswitches command to confirm that it lists all the InfiniBand Gateway switches.
  3. Storage nodes
  4. Compute nodes
  5. Exalogic Control stack vServers and services (if using virtual Exalogic)
  6. All guest vServers (if using virtual Exalogic)
  7. User application services

8.1.2 Exalogic Shutdown Sequence

Shut down the components in the following sequence:

  1. All user application services
  2. All guest vServers (if using virtual Exalogic)
  3. All Exalogic Control stack services (if using virtual Exalogic)
  4. Exalogic Control vServers (if using virtual Exalogic)
  5. Power off the host (OVS) of all compute nodes and storage nodes
    • First, shut down and power off the standby storage node.

    • Then shut down and power off the active storage node.

  6. Network switches
  7. PDUs
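As an illustration only, the dependency ordering behind both sequences can be captured as data and checked programmatically. The following Python sketch is hypothetical: the tier names and ranks are assumptions that mirror the lists above, the Exalogic Control services and vServers are folded into a single tier, and it assumes compute nodes are powered off before storage nodes within shutdown step 5. It does not interact with any Exalogic component.

```python
# Hypothetical model of the startup and shutdown sequences described above.
# Each tier gets a rank; a tier depends on every tier with a lower rank.
TIER_RANK = {
    "pdus": 0,
    "network_switches": 1,
    "storage_nodes": 2,
    "compute_nodes": 3,
    "control_stack": 4,      # Exalogic Control vServers and services (virtual only)
    "guest_vservers": 5,     # virtual Exalogic only
    "user_applications": 6,
}

STARTUP = ["pdus", "network_switches", "storage_nodes", "compute_nodes",
           "control_stack", "guest_vservers", "user_applications"]

# Assumption: within shutdown step 5, compute nodes go down before storage nodes.
SHUTDOWN = ["user_applications", "guest_vservers", "control_stack",
            "compute_nodes", "storage_nodes", "network_switches", "pdus"]


def is_valid_startup(steps):
    """True if no tier is started before a tier it depends on."""
    ranks = [TIER_RANK[step] for step in steps]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def is_valid_shutdown(steps):
    """True if no tier is stopped while a tier that depends on it is still up."""
    ranks = [TIER_RANK[step] for step in steps]
    return all(a >= b for a, b in zip(ranks, ranks[1:]))


print(is_valid_startup(STARTUP))     # True
print(is_valid_shutdown(SHUTDOWN))   # True
# Powering off the switches before the storage nodes breaks the order.
print(is_valid_shutdown(["network_switches", "storage_nodes"]))   # False
```

Modelled this way, the shutdown sequence is simply the startup sequence reversed at the tier level, which is why each tier is brought down only after everything that depends on it.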

8.1.3 ZFS Storage Appliance Power On and Off Procedure

To power off the ZFS Storage Appliance, perform the following sequence on the storage nodes.

  1. Shut down and power off the standby storage node.
  2. Shut down and power off the active storage node.
    Powering off the standby node first avoids unnecessary failover of network and storage resources back and forth between the nodes.
  3. Power Off: Use either of the following methods:
    • From the CLI: maintenance system poweroff

    • From the BUI: Click the Power off appliance icon.

  4. Power On: To power on the storage nodes, use either of the following methods:
    • Press the power button on the storage controller.

    • From the ILOM CLI: start /SYS

8.1.4 Procedures to Start up or Shutdown Exalogic, Control Stack, and Guest vServers

The following references describe how to start up and shut down the Exalogic machine, Exalogic Control, and the guest vServers.

References to Startup and Shutdown Exalogic Machine

References to Startup and Shutdown Exalogic Control

  • ExaBR provides a convenient way to stop and start the control stack. To use it, you must first install the Exalogic Lifecycle toolkit. For more information, see Lifecycle Management Tools.

  • For the manual procedure, see My Oracle Support document ID 1594223.1, How To Stop and Start the Entire Exalogic Control Stack In An Exalogic EECS v2.0.6.0.0 and later Virtual releases.

    Note:

    To open a My Oracle Support note directly, sign in to My Oracle Support and enter the following URL, replacing note_id with the document ID (for example, 1594223.1):

      https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=note_id

    Alternatively, enter the note ID or a keyword in the search field at the top of the My Oracle Support page.

References to Startup and Shutdown Guest Virtual Servers

  • Always start and stop the guest vServers through Enterprise Manager Ops Center (EMOC). Use either the browser user interface (BUI) or the IaaS CLI, as described in Oracle Exalogic Elastic Cloud Administrator’s Guide at http://docs.oracle.com/cd/E18476_01/doc.220/e25258/proc.htm#BABDCBHC

    Note:

    It is important to start and stop vServers under vDC Management rather than from the Assets accordion in the EMOC UI.
  • Do not use xm commands to start or stop guest vServers.

  • Do not use the OS-level shutdown command. If you do, EMOC automatically restarts vServers that are marked HA.

8.2 Maintenance Procedures

This section provides information about Lifecycle Management Tools, ExaChk, ExaLogs, patching, and troubleshooting.

Detailed information on each section is provided in the following topics.

8.2.1 Lifecycle Management Tools

The Oracle Exalogic Lifecycle (ELLC) toolkit is a collection of tools that simplify, automate, and standardize lifecycle management on an Oracle Exalogic Elastic Cloud machine.

For more information, see My Oracle Support document ID 1912063.1, Exalogic Lifecycle Toolkit Release 14.2.
Exalogic Tools      New Features and Enhancements
EMAgent PreSetup    A new tool to prepare the Exalogic rack for Enterprise Manager 12c discovery and monitoring
ExaBR               EECS 2.0.4 support; STIG-hardened Linux compute nodes; All-ILOM target
ExaPatch            Improved platform patching
ExaLogs             Solaris support; credentials (access) option; network usage order
ExaPasswd           A new tool to automate password changes to Exalogic system components
STIGfix             A new tool to make Exalogic guest vServers and physical Linux nodes STIG compliant
ModifyLVMImg        A new tool to resize LVM-based vServers (root/swap volumes) and add or remove Linux RPMs
ExaChk              Enhanced Exalogic Health Check tool: support for no DNS; revised scoring; diff comparison

8.2.2 ExaChk

ExaChk is a health-check tool that is designed to audit important configuration settings in an Exalogic machine. Run ExaChk:

  • Every quarter, in line with the PSU cycle

  • Before and after maintenance

  • When you open a Service Request; attaching the ExaChk report to the SR saves time

  • On a scheduled basis, so that reports can be compared over time

For more information, see My Oracle Support document ID:


8.2.3 ExaLogs

ExaLogs is a command-line tool for gathering logs, diagnostics, environment and configuration information and other data from key components in an Exalogic physical or virtual configuration.

  • For more information, see My Oracle Support document ID 1912063.1, Exalogic Lifecycle Toolkit Release 14.2.

  • Run ExaLogs before and after applying patches (PSUs) or upgrades, and whenever a problem arises.

  • ExaLogs output is required when you open a Service Request (SR).

8.2.4 Patching

Oracle Exalogic Elastic Cloud Software recommended patches are available from My Oracle Support.

Exalogic Patch Set Updates (PSUs) are collections of Oracle-recommended patches. PSUs are cumulative and are released on a quarterly schedule.

This section contains the following topic:

8.2.4.1 Patching Recommendation

To ensure that the Exalogic system continues to perform optimally, Oracle periodically provides comprehensive and well-tested patches to the system as a whole.

An Exalogic Patch Set Update (PSU) is released quarterly (January, April, July, and October) with the following features:
  • Each PSU is a single download that contains patches, as necessary, for all Exalogic components (firmware, software, and OS).

  • PSUs are highly recommended updates for all Exalogic customers.

  • In addition to patches or updates for the Exalogic infrastructure components (on-node components such as the operating system, ILOM, InfiniBand and RAID controller cards, and off-node components such as InfiniBand switches and the ZFS Storage Appliance), PSUs also include patches for middleware components (WLS, Coherence, JDK).

  • PSUs contain optional patches for the guest OS image, which can be applied when the schedule allows.
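Because ExaChk is run every quarter in line with the PSU cycle, it can help to know which PSU release month comes next when planning maintenance windows. The following Python sketch is hypothetical: it uses only the January, April, July, and October schedule stated above, models months rather than exact release days, and treats "next" as the first PSU month strictly after the current month.

```python
import datetime

# Exalogic PSUs are released in January, April, July, and October.
PSU_MONTHS = (1, 4, 7, 10)


def next_psu_month(today):
    """Return (year, month) of the first PSU release month after today's month.

    Only months are modelled; the actual release day within the month is not
    assumed. If today falls in a PSU month, the following PSU month is returned.
    """
    for month in PSU_MONTHS:
        if month > today.month:
            return today.year, month
    # No PSU month left this year: wrap around to January of next year.
    return today.year + 1, PSU_MONTHS[0]


print(next_psu_month(datetime.date(2024, 2, 15)))   # (2024, 4)
print(next_psu_month(datetime.date(2024, 11, 3)))   # (2025, 1)
```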

Exalogic users should ensure that their systems are aligned with Oracle’s Exalogic releases and recommended patch levels, and should not apply patches that fall outside the Exalogic recommendations. For instance, if a new version of the Oracle ZFS Storage Appliance software is released but is not part of an Exalogic recommended patch or PSU, you must not update the racks with it. Applying patches that are not recommended can adversely affect both the functionality and the performance of the Exalogic system.

For systems that are in production or in late testing stages before production:

8.2.5 Troubleshooting and Action Plan

The My Oracle Support (MOS) note listed below describes common Exalogic outages, and the steps to restore service after them, for Exalogic Platinum customers.

Each outage is categorized as either a partial or a complete outage. The note also describes troubleshooting steps for debugging the problem and the post-issue data collection needed for root cause analysis (RCA) of the outage.

For more information, see My Oracle Support document ID 1492461.1 Exalogic Platinum Customer Outage Classifications and Restoration Action Plans.

8.3 Backup and Recovery Procedures

This section provides guidelines for Exalogic system backup and recovery procedures.

Whilst hot standby systems are extremely useful for business continuity, they are expensive to maintain and need additional infrastructure. In some circumstances, such as simple user errors, it may be quicker to fix the issue than to fail over to the disaster recovery (DR) system, especially when DNS needs updating.

Taking regular backups of a system is part of standard operating procedure for most production systems and is done irrespective of whether or not the site has a disaster recovery solution. Backups provide the flexibility to restore individual files, should something happen to the originals, or the system as a whole. Backups can also be stored off site in a secure location. On Exalogic, backups can be taken within the Exalogic system, disk-to-disk, or disk-to-tape.

The data contained in an Oracle Exalogic Machine which needs to be backed up consists of:
  • Exalogic Operating System

  • Software Binaries

  • Configuration Information

  • Transactional Data, such as transaction logs and JMS queues

  • Switch Configuration

  • Other artifacts that are stored on disk

These objects can be backed up to:
  • Disk within the same storage appliance

  • Disk on a remote machine that uses the same storage type (ZFS)

  • Disk on a remote machine that uses a different storage type

  • Tape

Backup and Recovery Concepts

Volatility

Objects can be grouped by volatility. For example, the operating system changes very infrequently and therefore does not need backing up as frequently as transactional data, which changes on a frequent basis. In a typical Exalogic deployment, objects can be grouped into the following categories:
Volatility Groups

Volatility    Example Objects
Low           Oracle binaries; operating system
Medium        Configuration information (WLS domain); Oracle instance
High          File-based JMS queues; persistent stores

Backup Frequency

The volatility of the data can be used to determine the backup frequency. In addition to volatility, the following factors may affect how frequently data is backed up:
  • Volume of data to be backed up

  • Available backup windows

  • Regulatory requirements

Based on the volatility groups above, the following backup frequencies are sensible.

Table 8-1 Backup Schedule

Volatility Group    Backup Frequency
Low                 Monthly
Medium              Weekly
High                Daily

In addition to the scheduled backups, it makes sense to perform ad hoc backups when major events occur. For example, take an additional backup of the Oracle binaries after patching or upgrading.

Retention Periods

In determining a backup strategy, you need to factor in how long you wish to keep the backups. This depends mainly on your business and regulatory requirements. Using the examples above, the following values may be appropriate.

Table 8-2 Retention Periods

Volatility Group    Retention Period
Low                 3 years
Medium              6 months
High                7 days
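To see what the backup schedule and retention periods together imply for the number of backups held at any one time, the following Python sketch divides each retention period by the backup interval. It is a rough estimate under stated assumptions: a month is approximated as 30 days, six months as 182 days (26 weeks), and a year as 365 days; ad hoc backups are not counted.

```python
# Backup interval and retention period per volatility group, in days,
# taken from Tables 8-1 and 8-2. Approximations (assumptions): one month is
# 30 days, six months is 182 days (26 weeks), and one year is 365 days.
SCHEDULE = {
    # group: (backup interval in days, retention period in days)
    "Low": (30, 3 * 365),     # monthly, kept 3 years
    "Medium": (7, 182),       # weekly, kept 6 months
    "High": (1, 7),           # daily, kept 7 days
}


def backups_retained(interval_days, retention_days):
    """Approximate number of scheduled backups held at any one time."""
    return retention_days // interval_days


for group, (interval, retention) in SCHEDULE.items():
    print(f"{group}: about {backups_retained(interval, retention)} backups retained")
```

This gives roughly 36, 26, and 7 backups for the Low, Medium, and High groups respectively, which is a useful starting point when sizing backup storage.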

Backup Types

There are two types of backups: full backups and incremental backups. A full backup backs up the entire file system as it is at that moment in time. An incremental backup backs up only the data that has changed since a previous backup. Incremental backups can be either cumulative or differential. A cumulative backup backs up all the changes since the last full backup, whilst a differential backup backs up only the changes since the last differential backup (or since the full backup, if no differential has been taken yet). Differential backups are not supported on ZFS storage appliances; they are, however, widely available when backing up to tape.
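The practical difference between the two incremental types shows up at restore time. The following Python sketch assumes a hypothetical schedule of one full backup followed by one incremental backup per day, all of the same type, and lists the backups needed to restore to the most recent one.

```python
def restore_chain(kind, days_since_full):
    """List the backups needed to restore to the most recent backup.

    Assumes a hypothetical schedule: one full backup on day 0, then one
    incremental backup per day, all of the same kind.
    """
    if days_since_full == 0:
        return ["full"]
    if kind == "cumulative":
        # Each cumulative backup holds every change since the full backup,
        # so only the latest one is applied on top of the full backup.
        return ["full", f"cumulative day {days_since_full}"]
    if kind == "differential":
        # Each differential holds only the changes since the previous backup,
        # so every differential since the full backup is applied in order.
        return ["full"] + [f"differential day {d}" for d in range(1, days_since_full + 1)]
    raise ValueError(f"unknown backup kind: {kind!r}")


print(restore_chain("cumulative", 3))
print(restore_chain("differential", 3))
```

A cumulative restore always needs two backups, while a differential restore needs one more backup for each day since the full backup; in exchange, each differential backup is typically smaller than the corresponding cumulative one.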

Incremental backups can be leveled. For example, you can perform a level 0 (full) backup each month, a level 1 (incremental) backup each Sunday, and a level 2 (incremental) backup each weekday. With this strategy, each incremental backup contains only the data that has changed since the most recent backup at a lower level. For example, the level 2 backup taken on Tuesday contains the data that has changed since the level 1 backup taken on the previous Sunday. Incremental backups are useful when the volume of data to be backed up is significant.

The advantage of a full backup is that it contains all of the information required to perform a restore. With an incremental strategy, a restore is likely to use several backups: in the three-level strategy above, you would need the last level 0 backup, plus the last level 1 backup, plus the last level 2 backup. If the volume of data to be backed up is small, it may be easier to perform a full backup each time rather than an incremental one. Incremental backups are supported by Oracle Secure Backup and by the operating system dump command.
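The following Python sketch works out the restore chain for the three-level example. The calendar rules are assumptions made for illustration only (level 0 on the first day of each month, level 1 on other Sundays, level 2 Monday to Friday, no backup on Saturday); the function walks backwards from the restore date and keeps the newest backup at each successively lower level.

```python
import datetime


def backup_level(day):
    """Backup level taken on `day` in the example three-level strategy.

    Hypothetical calendar: level 0 on the 1st of the month, level 1 on any
    other Sunday, level 2 Monday to Friday, and no backup on Saturday.
    """
    if day.day == 1:
        return 0
    if day.weekday() == 6:          # Sunday
        return 1
    if day.weekday() < 5:           # Monday to Friday
        return 2
    return None                     # Saturday: no backup


def leveled_restore_chain(target):
    """Return (date, level) pairs needed to restore to the latest backup on or before target."""
    chain = []
    lowest = 3                      # any level (0-2) is accepted at first
    day = target
    while True:
        level = backup_level(day)
        # Keep the newest backup at each successively lower level.
        if level is not None and level < lowest:
            chain.append((day, level))
            lowest = level
            if level == 0:
                break               # the full backup ends the chain
        day -= datetime.timedelta(days=1)
    return list(reversed(chain))    # oldest first, i.e. the order of application


for day, level in leveled_restore_chain(datetime.date(2024, 1, 9)):   # a Tuesday
    print(day, "level", level)
```

For Tuesday 9 January 2024 this yields the level 0 backup of 1 January, the level 1 backup of Sunday 7 January, and the level 2 backup of 9 January, which matches the three backups the strategy requires. Early in a month, before any Sunday, the chain shortens to the level 0 backup plus the latest level 2 backup.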

Recovery Point Objective (RPO) and Recovery Time Objective (RTO) determine the frequency of backups. They are critical factors in an effective business continuity plan. Refer to the following documents for additional information: