2 Maintaining Database Servers of Oracle Exadata Database Machine

This chapter contains the following topics:

Note:

For ease of reading, the name "Oracle Exadata Rack" is used when information refers to both Oracle Exadata Database Machine and Oracle Exadata Storage Expansion Rack.

2.1 Management Server on Database Servers

Management Server (MS) running on database servers provides monitoring, alerting, and other administrative capabilities. It also provides the DBMCLI command-line administration tool.

See Also:

2.2 Maintaining the Hard Disks of Oracle Database Servers

Repair of the hard disks does not require the database server in Oracle Exadata Database Machine to be shut down.

No rack downtime is required; however, individual servers may require downtime and may need to be taken out of the cluster temporarily.

2.2.1 Verifying the Database Server Configuration

The disks are configured in a RAID-5 configuration.

The disk drives in each database server are controlled by a MegaRAID SAS disk controller.

Table 2-1 Disk Configurations for Exadata Database Machine Two-Socket Systems

Server Type | RAID Controller | Disk Configuration
Oracle Exadata Database Machine X8-2 | MegaRAID SAS 9361-16i | 4 disk drives in each database server
Oracle Exadata Database Machine X7-2 | MegaRAID SAS 9361-16i | 4 disk drives in each database server
Oracle Exadata Database Machine X6-2 | MegaRAID SAS 9361-8i | 4 disk drives in each database server
Oracle Exadata Database Machine X5-2 | MegaRAID SAS 9361-8i | 4 disk drives in each database server
Oracle Exadata Database Machine X4-2 | MegaRAID SAS 9261-8i | 4 disk drives in each database server
Oracle Exadata Database Machine X3-2 | MegaRAID SAS 9261-8i | 4 disk drives in each database server
Oracle Exadata Database Machine X2-2 | MegaRAID SAS 9261-8i | 4 disk drives in each database server

Table 2-2 Disk Configurations for Exadata Database Machine Eight-Socket Systems

Server Type | RAID Controller | Disk Configuration
Oracle Exadata Database Machine X8-8 | N/A | Two NVMe flash accelerator cards in each database server
Oracle Exadata Database Machine X7-8 | N/A | Two NVMe flash accelerator cards in each database server
Oracle Exadata Database Machine X5-8 | MegaRAID SAS 9361-8i | 8 disk drives in each database server with one virtual drive created across the RAID set
Oracle Exadata Database Machine X4-8 | MegaRAID SAS 9261-8i | 7 disk drives in each database server configured as one 6-disk RAID-5 with one global hot spare drive by default
Oracle Exadata Database Machine X3-8 | MegaRAID SAS 9261-8i | 8 disk drives in each database server with one virtual drive created across the RAID set

Oracle recommends verifying the status of the database server RAID devices to avoid possible performance impact, or an outage. The impact of validating the RAID devices is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.

2.2.1.1 Verifying Disk Controller Configuration on Oracle Exadata Database Machine X7-8 or Later Systems
  • Query mdstat to view the database server disk controller configuration on Oracle Exadata Database Machine X7-8 or later systems.
    [root@dbnode01adm01 ~]# cat /proc/mdstat 
    Personalities : [raid1] 
    md34 : active raid1 nvme3n1[1] nvme1n1[0]
          3125613568 blocks super external:/md126/0 [2/2] [UU]
          
    md24 : active raid1 nvme2n1[1] nvme0n1[0]
          262144000 blocks super external:/md127/0 [2/2] [UU]
          
    md25 : active raid1 nvme2n1[1] nvme0n1[0]
          2863467520 blocks super external:/md127/1 [2/2] [UU]
          
    md126 : inactive nvme3n1[1](S) nvme1n1[0](S)
          6306 blocks super external:imsm
           
    md127 : inactive nvme2n1[1](S) nvme0n1[0](S)
          6306 blocks super external:imsm
           
    unused devices: <none> 

If the output you see is different, then investigate and correct the problem. Degraded virtual drives usually indicate absent or failed physical disks. Disks that show [1/2] and [U_] or [_U] for the state indicate that one of the NVMe disks is down. Failed disks should be replaced quickly.
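A quick way to spot a degraded software RAID on these systems is to scan the mdstat output for mirrors that are not in the [UU] state. The following is a minimal sketch based only on the /proc/mdstat output shown above; adapt it to your environment.

    # Print any md device whose member state is degraded ([U_] or [_U]); no output means all mirrors are healthy
    awk '/^md/ {dev=$1} /\[[U_]+\]/ && !/\[UU\]/ {print dev ": " $0}' /proc/mdstat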

2.2.1.2 Verifying Disk Controller Configuration on Oracle Exadata Database Machine X6-8 and Earlier

For Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X2-2, the expected output is one virtual drive, none degraded or offline, five physical devices (one controller and four disks), four disks, and no critical or failed disks.

For Oracle Exadata Database Machine X3-8 Full Rack and Oracle Exadata Database Machine X2-8 Full Rack, the expected output is one virtual drive, none degraded or offline, 11 physical devices (one controller, two SAS2 expansion ports, and eight disks), eight disks, and no critical or failed disks.

If your output is different, then investigate and correct the problem. Degraded virtual drives usually indicate absent or failed physical disks. Critical disks should be replaced immediately to avoid the risk of data loss if the number of failed disks in the node exceeds the count needed to sustain the operations of the system. Failed disks should also be replaced quickly.

Note:

If additional virtual drives or a hot spare are present, then it may be that the procedure to reclaim disks was not performed at deployment time or that a bare metal restore procedure was performed without using the dualboot=no qualifier. Contact Oracle Support Services and reference My Oracle Support note 1323309.1 for additional information and corrective steps.

When upgrading a database server that has a hot spare to Oracle Exadata System Software release 11.2.3.2.0 or later, the hot spare is removed and added as an active drive to the RAID configuration. The database servers retain the same availability in terms of RAID-5 redundancy and can survive the loss of one drive. When a drive failure occurs, Oracle Auto Service Request (ASR) sends a notification to replace the faulty drive at the earliest opportunity.

  • Use the following command to verify the database server disk controller configuration on all systems prior to Oracle Exadata Database Machine X7-8:
    if [[ -d /proc/xen && ! -f /proc/xen/capabilities ]]
    then
      echo -e "\nThis check will not run in a user domain of a virtualized environment.  Execute this check in the management domain.\n"
    else
      if [ -x /opt/MegaRAID/storcli/storcli64 ]
      then
        export CMD=/opt/MegaRAID/storcli/storcli64
      else
        export CMD=/opt/MegaRAID/MegaCli/MegaCli64
      fi
      RAW_OUTPUT=$($CMD AdpAllInfo -aALL -nolog | grep "Device Present" -A 8);
      echo -e "The database server disk controller configuration found is:\n\n$RAW_OUTPUT";
    fi;

    Note:

    This check is not applicable to Oracle Exadata Database Machine X7-8 or later database servers because they do not have any conventional disk drives.

Example 2-1 Checking the disk controller configuration for Oracle Exadata Database Machine 2-socket system (X2-2 or later) without the disk expansion kit

The following is an example of the output from the command for Oracle Exadata Database Machine 2-socket system (X2-2 or later) without the disk expansion kit.

                Device Present
                ================
Virtual Drives    : 1 
  Degraded        : 0 
  Offline         : 0 
Physical Devices  : 5 
  Disks           : 4 
  Critical Disks  : 0 
  Failed Disks    : 0 

Example 2-2 Checking the disk controller configuration for Oracle Exadata Database Machine X4-8 Full Rack

The following is an example of the output from the command for Oracle Exadata Database Machine X4-8 Full Rack:

                Device Present
                ================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 8
  Disks           : 7
  Critical Disks  : 0
  Failed Disks    : 0

Example 2-3 Checking the disk controller configuration for Oracle Exadata Database Machine X5-8 or X6-8 Full Rack

The following is an example of the output from the command for Oracle Exadata Database Machine X5-8 or X6-8 Full Rack:

                Device Present
                ================
Virtual Drives   : 1
  Degraded       : 0
  Offline        : 0
Physical Devices : 9
  Disks          : 8
  Critical Disks : 0
  Failed Disks   : 0

2.2.1.3 Verifying Virtual Drive Configuration

Use the following command to verify the virtual drive configuration:

/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Virtual Drive:";    \
/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Number Of Drives";  \
/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "^State" 

The following is an example of the output for Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X2-2. Virtual device 0 should have four drives, and the state should be Optimal.

Virtual Drive                 : 0 (Target Id: 0)
Number Of Drives              : 4
State                         : Optimal

The expected output for Oracle Exadata Database Machine X3-8 Full Rack and Oracle Exadata Database Machine X2-8 Full Rack shows the virtual device with eight drives and a state of Optimal.

Note:

If a disk replacement was performed on a database server without using the dualboot=no option, then the database server may have three virtual devices. Contact Oracle Support and reference My Oracle Support note 1323309.1 for additional information and corrective steps.

2.2.1.4 Verifying Physical Drive Configuration

Check your system for critical, degraded, or failed disks.

Use the following command to verify the database server physical drive configuration:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep "Firmware state"

The following is an example of the output for Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X2-2:

Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

The drives should show a state of Online, Spun Up. The order of the output is not important. The output for Oracle Exadata Database Machine X3-8 Full Rack or Oracle Exadata Database Machine X2-8 Full Rack should be eight lines of output showing a state of Online, Spun Up.

If your output is different, then investigate and correct the problem.

Degraded virtual drives usually indicate absent or failed physical disks. Critical disks should be replaced immediately to avoid the risk of data loss if the number of failed disks in the node exceeds the count needed to sustain the operations of the system. Failed disks should be replaced quickly.
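As a quick sanity check, you can count the drives that report the expected state and compare the result with the drive count for your model (four on two-socket systems, eight on Oracle Exadata Database Machine X3-8 and X2-8 Full Rack). This is a small sketch based on the MegaCli64 command shown above.

    # Count physical drives reporting the expected firmware state; compare with the drive count for your model
    /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -c "Firmware state: Online, Spun Up"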

2.2.2 Monitoring a Database Server RAID Set Rebuilding

If a drive in a database server RAID set is replaced, then the progress of the RAID set rebuild should be monitored.

Run the following command as the root user on the database server that has the replaced disk.

/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv \
[disk_enclosure:slot_number] -a0

In the preceding command, disk_enclosure and slot_number indicate the replacement disk identified by the MegaCli64 -PDList command. The following is an example of the output from the command:

Rebuild Progress on Device at Enclosure 252, Slot 2 Completed 41% in 13 Minutes.
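If you prefer not to re-run the command manually, a simple polling loop such as the following can track the rebuild until it finishes. This is a sketch only; the enclosure and slot values match the example above, and the exact completion message can vary by MegaCli version.

    # Poll the rebuild progress every 60 seconds until MegaCli stops reporting a completion percentage
    while true
    do
      out=$(/opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv '[252:2]' -a0)
      echo "$out"
      echo "$out" | grep -q "Completed" || break
      sleep 60
    done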

2.2.3 Reclaiming a Hot Spare Drive After Upgrading to Oracle Exadata System Software Release 12.1.2.1.0 or Later

Oracle Exadata Database Machines upgraded to Oracle Exadata System Software release 12.1.2.1.0 or later that have a hot spare drive cannot use the reclaimdisks.sh script to reclaim the drive. The following procedure describes how to manually reclaim the drive:

Note:

During the procedure, the database server is restarted twice. The steps in the procedure assume that the Oracle Grid Infrastructure restart is disabled after the server restart.

The sample output shows an Oracle Exadata Database Machine X2-2 database server with four disks. The enclosure identifier, slot number, and so on may be different for your system.

  1. Identify the hot spare drive.
    # /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
    

    The following is an example of the output from the command for the hot spare drive:

    ...
    Enclosure Device ID: 252
    Slot Number: 3
    Enclosure position: N/A
    Device Id: 8
    WWN: 5000CCA00A9FAA5F
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Hotspare Information:
    Type: Global, with enclosure affinity, is revertible
     
    Raw Size: 279.396 GB [0x22ecb25c Sectors]
    Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
    Coerced Size: 278.464 GB [0x22cee000 Sectors]
    Sector Size: 0
    Logical Sector Size: 0
    Physical Sector Size: 0
    Firmware state: Hotspare, Spun down
    Device Firmware Level: A2A8
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    ...
    

    The command identified the hot spare drive on enclosure identifier 252, slot 3.

  2. Obtain the virtual drive information.
    # /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -Aall
    

    The following is an example of the output from the command:

    Adapter 0 -- Virtual Drive Information:
    Virtual Drive: 0 (Target Id: 0)
    Name :DBSYS
    RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
    Size : 556.929 GB
    Sector Size : 512
    Is VD emulated : No
    Parity Size : 278.464 GB
    State : Optimal
    Strip Size : 1.0 MB
    Number Of Drives : 3
    Span Depth : 1
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Default Access Policy: Read/Write
    Current Access Policy: Read/Write
    Disk Cache Policy : Disabled
    Encryption Type : None
    Is VD Cached: No
    

    The command identified a RAID 5 configuration for virtual drive 0 on adapter 0.

  3. Remove the hot spare drive.
    # /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Rmv -PhysDrv[252:3] -a0
    
  4. Add the drive as an active RAID 5 drive.
    # /opt/MegaRAID/MegaCli/MegaCli64 -LDRecon -Start -r5     \
      -Add -PhysDrv[252:3] -L0 -a0
    
    Start Reconstruction of Virtual Drive Success.
    Exit Code: 0x00
    

    Note:

    If the message Failed to Start Reconstruction of Virtual Drive is displayed, then follow the instructions in My Oracle Support note 1505157.1.

  5. Monitor the progress of the RAID reconstruction.
    # /opt/MegaRAID/MegaCli/MegaCli64 -LDRecon -ShowProg -L0 -a0
    
    Reconstruction on VD #0 (target id #0) Completed 1% in 2 Minutes.
    

    The following output shows the output of the command after the hot spare drive is added to the RAID 5 configuration, and the reconstruction is finished:

    Reconstruction on VD #0 is not in Progress.
    
  6. Verify the number of drives.
    # /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -Aall
    

    The following is an example of the output from the command:

    Adapter 0 -- Virtual Drive Information:
    Virtual Drive: 0 (Target Id: 0)
    Name :DBSYS
    RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
    Size : 835.394 GB
    Sector Size : 512
    Is VD emulated : No
    Parity Size : 278.464 GB
    State : Optimal
    Strip Size : 1.0 MB
    Number Of Drives : 4
    Span Depth : 1
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Default Access Policy: Read/Write
    Current Access Policy: Read/Write
    Disk Cache Policy : Disabled
    Encryption Type : None
    Is VD Cached: No
    
  7. Check the size of the RAID.
    # parted /dev/sda print
    
    Model: LSI MR9261-8i (scsi)
    Disk /dev/sda: 598GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos
     
    Number Start End Size Type File system Flags
    1 32.3kB 132MB 132MB primary ext3 boot
    2 132MB 598GB 598GB primary lvm 
    
  8. Restart the server in order for the changes to take effect.
  9. Check the size of the RAID again.
    # parted /dev/sda print
    
    Model: LSI MR9261-8i (scsi)
    Disk /dev/sda: 897GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos
     
    Number Start End Size Type File system Flags
    1 32.3kB 132MB 132MB primary ext3 boot
    2 132MB 598GB 598GB primary lvm
    

    The increased RAID size allows for extending the volume group. To extend the volume group, you must add an additional partition to the drive.

  10. Obtain the new size, in sectors.
    # parted /dev/sda
    
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s
    (parted) print
    Model: LSI MR9261-8i (scsi)
    Disk /dev/sda: 1751949312s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos
     
    Number Start End Size Type File system Flags
    1 63s 257039s 256977s primary ext3 boot
    2 257040s 1167957629s 1167700590s primary lvm
    

    In the preceding example, a third partition can be created starting at sector 1167957630, and ending at the end of the disk at sector 1751949311.

  11. Create an additional partition on the drive.
    # parted /dev/sda
    
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s
     
    (parted) mkpart
     
    Partition type? primary/extended? primary
    File system type? [ext2]? ext2 
    Start? 1167957630
    End? 1751949311
    Warning: The resulting partition is not properly aligned for best performance.
    Ignore/Cancel? Ignore
    Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a
    result, it may not reflect all of your changes until after reboot.
    (parted)
     
    (parted) print
    Model: LSI MR9261-8i (scsi)
    Disk /dev/sda: 1751949312s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos
     
    Number Start End Size Type File system Flags
    1 63s 257039s 256977s primary ext3 boot
    2 257040s 1167957629s 1167700590s primary lvm
    3 1167957630s 1751949311s 583991682s primary
     
    (parted) set 3 lvm on 
     
    Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a
    result, it may not reflect all of your changes until after reboot.
    (parted) print
    Model: LSI MR9261-8i (scsi)
    Disk /dev/sda: 1751949312s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos
     
    Number Start End Size Type File system Flags
    1 63s 257039s 256977s primary ext3 boot
    2 257040s 1167957629s 1167700590s primary lvm
    3 1167957630s 1751949311s 583991682s primary lvm
    
  12. Restart the database server.
  13. Create the physical volume.
    # pvcreate /dev/partition_name
    
  14. Add the physical volume to the existing volume group.

    In the following example, substitute the actual names for volume_group and partition_name. (A worked sketch using the names from this procedure appears after step 15.)

    # vgextend volume_group /dev/partition_name
     
    Volume group "volume_group" successfully extended 
    
  15. Resize the logical volume and file systems as described in "Extending LVM Partitions."
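As a worked sketch of steps 13 and 14, using the partition created earlier in this procedure (/dev/sda3) and the VGExaDb volume group used on the database servers, the commands would look like the following. Confirm the actual partition and volume group names on your system before running them.

    # Create the physical volume on the new partition (partition 3 from step 11)
    pvcreate /dev/sda3

    # Add the new physical volume to the existing volume group
    vgextend VGExaDb /dev/sda3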

2.2.4 Understanding Automated File Deletion Policy

Management Server (MS) includes a file deletion policy for the / (root) directory on the database servers that is triggered when file system utilization is high. Deletion of files is triggered when file system utilization reaches 80 percent, and an alert is sent before the deletion begins. The alert includes the name of the directory and space usage for the subdirectories. In particular, the deletion policy is as follows:

Files in the following directories are deleted using a policy based on the file modification time stamp.

  • /opt/oracle/dbserver/log

  • /opt/oracle/dbserver/dbms/deploy/config/metrics

  • /opt/oracle/dbserver/dbms/deploy/log

Files that are older than the number of days set by the metricHistoryDays attribute are deleted first, then successive deletions occur for earlier files, down to files with modification time stamps older than or equal to 10 minutes, or until file system utilization is less than 75 percent. The metricHistoryDays attribute applies to files in /opt/oracle/dbserver/dbms/deploy/config/metrics. For the other log and trace files, use the diagHistoryDays attribute.
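You can view or adjust these retention attributes with DBMCLI. The following is a brief sketch; the value used in the ALTER command is only an example.

    DBMCLI> LIST DBSERVER attributes metricHistoryDays, diagHistoryDays

    DBMCLI> ALTER DBSERVER metricHistoryDays=7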

Starting with Oracle Exadata System Software release 12.1.2.2.0, the maximum amount of space for ms-odl.trc and ms-odl.log files is 100 MB (twenty 5 MB files) for *.trc files and 100 MB (twenty 5 MB files) for *.log files. Previously, it was 50 MB (ten 5 MB files) for both *.trc and *.log files.

ms-odl generation files are renamed when they reach 5 MB, and the oldest are deleted when the files use up 100 MB of space.

2.3 Maintaining Flash Disks on Exadata Database Servers

Starting with Exadata Database Machine X7-8, the database servers contain flash devices instead of hard disks. These flash devices can be replaced without shutting down the server.

2.3.1 Monitoring the Status of Flash Disks

You can monitor the status of a flash disk on the Exadata Database Machine by checking its attributes with the DBMCLI LIST PHYSICALDISK command.

For example, a flash disk with a status of failed is likely having problems and needs to be replaced.

  • Use the DBMCLI command LIST PHYSICALDISK to determine the status of a flash disk:
    DBMCLI> LIST PHYSICALDISK WHERE disktype=flashdisk AND status!=normal DETAIL
             name:               FLASH_1_1
             deviceName:         /dev/nvme0n1
             diskType:           FlashDisk
             luns:               1_1
             makeModel:          "Oracle Flash Accelerator F640 PCIe Card"
             physicalFirmware:   QDV1RD09
             physicalInsertTime: 2017-08-11T12:25:00-07:00
             physicalSerial:     PHLE6514003R6P4BGN-1
             physicalSize:       2.910957656800747T
             slotNumber:         "PCI Slot: 1; FDOM: 1"
             status:             failed - dropped for replacement

The Exadata Database Server flash disk statuses are as follows:

  • normal

  • normal - dropped for replacement

  • failed

  • failed - dropped for replacement

  • failed - rejected due to incorrect disk model

  • failed - rejected due to incorrect disk model - dropped for replacement

  • failed - rejected due to wrong slot

  • failed - rejected due to wrong slot - dropped for replacement

  • warning - peer failure

  • warning - predictive failure, write-through caching

  • warning - predictive failure

  • warning - predictive failure - dropped for replacement

  • warning - write-through caching

2.3.2 Performing a Hot-Pluggable Replacement of a Flash Disk

For Oracle Exadata Database Machine X7-8 and X8-8 models, the database server uses hot-pluggable flash disks instead of hard disk drives.

  1. Determine if the flash disk is ready to be replaced.
    When performing a hot-pluggable replacement of a flash device on Oracle Exadata Database Machine X7-8 and X8-8 database servers, the disk status should be Dropped for replacement, which indicates the flash disk is ready for online replacement.
    DBMCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS LIKE '.*dropped for replacement.*' DETAIL
    
             name:               FLASH_1_1
             deviceName:         /dev/nvme0n1
             diskType:           FlashDisk
             luns:               1_1
             makeModel:          "Oracle Flash Accelerator F640 PCIe Card"
             physicalFirmware:   QDV1RD09
             physicalInsertTime: 2017-08-11T12:25:00-07:00
             physicalSerial:     PHLE6514003R6P4BGN-1
             physicalSize:       2.910957656800747T
             slotNumber:         "PCI Slot: 1; FDOM: 1"
             status:             failed - dropped for replacement
    
  2. Locate the failed flash disk based on the PCI number and FDOM number.
    A white Locator LED is lit to help locate the affected database server. An amber Fault-Service Required LED is lit to identify the affected flash card.
  3. Make sure the DPCC OK LED is off on the card.

    Caution:

    Removing a card with the DPCC OK LED on could result in a system crash. If a failed disk has a status of Failed – dropped for replacement but the DPCC OK LED is still on, contact Oracle Support.
  4. Remove and replace the failed flash disk.
    1. Slide out the DPCC and replace the flash card inside.
    2. Slide the DPCC carrier back to the slot.
  5. Use a stylus to press both ATTN buttons on the front of the DPCC.
    • If only a single PCIe card is present, press only the corresponding ATTN button.
    • If you are not performing a hot-pluggable replacement, then this step is not necessary.

    The buttons alert the system to a request to bring the devices online. When the system acknowledges the request, it lights the DPCC OK LED indicators on the DPCC. If you do not press the ATTN buttons, then the flash disk will not be detected by the operating system.
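After the system brings the devices online, you can confirm that the replaced flash disk is recognized and healthy by re-running the DBMCLI check used earlier; the replaced slot should now report a status of normal.

    DBMCLI> LIST PHYSICALDISK ATTRIBUTES name, status WHERE disktype=flashdisk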

2.4 Adding Disk Expansion Kit to Database Servers

You can add a disk expansion kit to some Oracle Exadata Database Servers.

Note the following restrictions:

  • The disk expansion kit is supported on Oracle Exadata Database Machine X5-2, X6-2, and X7-2 systems only.

  • Oracle Exadata System Software release 12.1.2.3.0 or later is required.

  1. Replace the plastic filler with the four drives in the disk expansion kit.
    The server should be powered on so that the controller can sense the new drives.
  2. An alert is raised to indicate that the disk expansion kit has been detected, the drives have been automatically added to the existing RAID 5 configuration, and reconstruction of the corresponding virtual drive has started.
  3. Another alert is raised when the virtual drive reconstruction is complete. (You can review these alerts with the DBMCLI check shown after this procedure.)
  4. Run /opt/oracle.SupportTools/reclaimdisks.sh -extend-vgexadb to extend the VGExaDb volume group to the rest of the /dev/sda system disk.

    Note:

    • reclaimdisks.sh works only during initial deployment, before the database software is installed. If you have already been running Oracle Exadata Database Machine for a while, you do not need to run it because you should have already run it during initial deployment. In the event that it was not run during initial deployment, you cannot run it at this time because too many changes to /u01 have been made. reclaimdisks.sh returns an error message if it discovers changes to the /u01 file system. By default, the /u01 file system is empty on new systems.

    • Run this command only after the virtual drive reconstruction is complete. If you run it before reconstruction is complete, you will see the following messages:

      [WARNING ] Reconstruction of the logical drive 0 is in progress: Completed: 14%. Left: 5 Hours 32 Minutes
      [WARNING ] Continue after reconstruction is complete
      

      If this occurs, wait until the virtual drive reconstruction is complete, then re-run the command.

    If prompted to fix the GUID Partition Table (GPT) or to continue with the current settings, enter F to fix the GPT.

    [root@dbnode01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -extend-vgexadb
    Model is ORACLE SERVER X6-2
    Number of LSI controllers: 1
    Physical disks found: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 8 disks with no global and dedicated hot spare disks.
    Valid. Booted: Linux. Layout: Linux + DOM0.
    [INFO     ] Size of system block device /dev/sda: 4193GB
    [INFO     ] Last partition on /dev/sda ends on: 1797GB
    [INFO     ] Unused space detected on the system block device: /dev/sda
    [INFO     ] Label of partition table on /dev/sda: gpt
    [INFO     ] Adjust the partition table to use all of the space on /dev/sda
    [INFO     ] Respond to the following prompt by typing 'F'
    Warning: Not all of the space available to /dev/sda appears to be used, you can fix the GPT to use all of the space (an extra 4679680000 blocks) or
    continue with the current setting?
    Fix/Ignore? F
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
     
    Number  Start   End     Size    File system  Name     Flags
     1      32.8kB  537MB   537MB   ext4         primary  boot
     2      537MB   123GB   122GB                primary  lvm
     3      123GB   1690GB  1567GB               primary
     4      1690GB  1797GB  107GB                primary  lvm
     
    [INFO     ] Check for Linux with inactive DOM0 system disk
    [INFO     ] Valid Linux with inactive DOM0 system disk is detected
    [INFO     ] Number of partitions on the system device /dev/sda: 4
    [INFO     ] Higher partition number on the system device /dev/sda: 4
    [INFO     ] Last sector on the system device /dev/sda: 8189440000
    [INFO     ] End sector of the last partition on the system device /dev/sda: 3509759000
    [INFO     ] Unmount /u01 from /dev/mapper/VGExaDbOra-LVDbOra1
    [INFO     ] Remove inactive system logical volume /dev/VGExaDb/LVDbSys3
    [INFO     ] Remove xen files from /boot
    [INFO     ] Remove logical volume /dev/VGExaDbOra/LVDbOra1
    [INFO     ] Remove volume group VGExaDbOra
    [INFO     ] Remove physical volume /dev/sda4
    [INFO     ] Remove partition /dev/sda4
    [INFO     ] Remove device /dev/sda4
    [INFO     ] Remove partition /dev/sda3
    [INFO     ] Remove device /dev/sda3
    [INFO     ] Create primary partition 3 using 240132160 8189439966
    [INFO     ] Set lvm flag for the primary partition 3 on device /dev/sda
    [INFO     ] Add device /dev/sda3
    [INFO     ] Primary LVM partition /dev/sda3 has size 7949307807 sectors
    [INFO     ] Create physical volume on partition /dev/sda3
    [INFO     ] LVM Physical Volume /dev/sda3 has size 3654340511 sectors
    [INFO     ] Size of LVM physical volume less than size of device /dev/sda3
    [INFO     ] Remove LVM physical volume /dev/sda3
    [INFO     ] Reboot is required to apply the changes in the partition table
    
  5. If required, reboot the database server to apply the changes in the partition table. The message at the end of the previous command will tell you if a reboot is required.
    [root@dbnode01 ~]# reboot
    

    If a reboot is not required, skip to step 7.

  6. Run /opt/oracle.SupportTools/reclaimdisks.sh -extend-vgexadb.

    This step depends on how you want to use the additional space. For example, you can increase the size of existing volume groups, or you can create and mount new volume groups to make use of the additional space.

    This command might return an error, as shown below. This error can be ignored.

    [root@dbnode01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -extend-vgexadb
    Model is ORACLE SERVER X6-2
    Number of LSI controllers: 1
    Physical disks found: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 8 disks with no global and dedicated hot spare disks.
    Valid. Booted: Linux. Layout: Linux.
    [INFO     ] Check for Linux system disk
    [INFO     ] Number of partitions on the system device /dev/sda: 4
    [INFO     ] Higher partition number on the system device /dev/sda: 4
    [INFO     ] Last sector on the system device /dev/sda: 8189440000
    [INFO     ] End sector of the last partition on the system device /dev/sda: 8189439966
    [INFO     ] Next free available partition on the system device /dev/sda:
    [INFO     ] Primary LVM partition /dev/sda4 has size 4679680000 sectors
    [INFO     ] Create physical volume on partition /dev/sda4
    [INFO     ] LVM Physical Volume /dev/sda4 has size 4679680000 sectors
    [INFO     ] Size of LVM physical volume matches size of primary LVM partition /dev/sda4
    [INFO     ] Extend volume group VGExaDb with physical volume on /dev/sda4
    [INFO     ] Create 100Gb logical volume for DBORA partition in volume group VGExaDb
    [WARNING  ] Failed command at attempt: lvm lvcreate -L 100GB -n LVDbOra1 VGExaDb at 1/1
    [ERROR    ] Failed command: lvm lvcreate -L 100GB -n LVDbOra1 VGExaDb
    [ERROR    ] Unable to create logical volume LVDbOra1 in volume group VGExaDb
    [ERROR    ] Unable to reclaim all disk space
    
  7. Run /opt/oracle.SupportTools/reclaimdisks.sh for confirmation.
    [root@dbnode01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh
    Model is ORACLE SERVER X6-2
    Number of LSI controllers: 1
    Physical disks found: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 8 (252:0 252:1 252:2 252:3 252:4 252:5 252:6 252:7)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 8 disks with no global and dedicated hot spare disks.
    Valid. Booted: Linux. Layout: Linux.
    
  8. Resize the logical volume and file systems as described in Extending LVM Partitions.
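To review the alerts mentioned in steps 2 and 3 (disk expansion kit detected, virtual drive reconstruction started and completed), you can query the alert history on the database server with DBMCLI, for example:

    # Review recent alerts raised by Management Server on this database server
    dbmcli -e list alerthistory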

2.5 Adding Memory Expansion Kit to Database Servers

Additional memory can be added to database servers. The following procedure describes how to add the memory:

  1. Power down the database server.
  2. Replace the plastic fillers with the DIMMs.
  3. Power on the database server.
  4. Add the database server back to the cluster.

Additional notes:

  • Memory for Sun Server X4-2 Oracle Database Servers and Sun Server X3-2 Oracle Database Servers can be expanded to a maximum of 512 GB with the memory expansion kit.
  • Memory for Sun Fire X4170 Oracle Database Servers can be expanded to a maximum of 144 GB by removing the existing memory, and replacing it with three X2-2 Memory Expansion Kits.
  • Sun Fire X4170 M2 Oracle Database Servers ship from the factory with 96 GB of memory, with 12 of the 18 DIMM slots populated with 8 GB DIMMs. The optional X2-2 Memory Expansion Kit can be used to populate the remaining 6 empty slots with 16 GB DIMMs to bring the total memory to 192 GB (12 x 8 GB and 6 x 16GB).

    The memory expansion kit is primarily for consolidation workloads where many databases are run on each database server. In this scenario, the CPU usage is often low while the memory usage is very high.

    However, there is a downside to populating all the memory slots: the frequency of the memory DIMMs drops from 1333 MHz to 800 MHz. The performance effect of the slower memory appears as increased CPU utilization. The average measured increase in CPU utilization is typically between 5% and 10%. The increase varies greatly by workload. In test workloads, several workloads had almost zero increase, while one workload had as high as a 20% increase.

  • When adding memory to Oracle Exadata Database Machines running Oracle Linux, Oracle recommends updating the /etc/security/limits.conf file with the following:

    oracle    soft     memlock 75%
    oracle    hard     memlock 75%
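    After updating the limits file, you can verify the effective locked-memory limit for the oracle user with a quick check such as the following (a sketch; the value is reported in kilobytes and takes effect for new login sessions):

    # Verify the effective max locked memory for the oracle user
    su - oracle -c 'ulimit -l'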
    

2.6 Verifying and Modifying the Link Speed on the Client Network Ports for X7 and X8 Systems

Ensure you are using the correct link speed for Oracle Exadata Database Machine X7 and X8 database servers.

Note:

You should configure the client network ports using Oracle Exadata Deployment Assistant (OEDA) during the deployment of the X7 and X8 systems. See the OEDA Customer Network Configuration Page.

The following steps may be necessary to configure a client access port if the OEDA deployment was not performed or was performed incorrectly.

  1. For each network interface (designated by x) that does not have the link detected, run the following commands:
    • For 10GbE network interfaces:
      # ifdown ethx
      # ethtool -s ethx speed 10000 duplex full autoneg off
      # ifup ethx
      # ethtool ethx
    • For 25GbE network interfaces:
      # ifdown ethx
      # ethtool -s ethx speed 25000 duplex full autoneg off
      # ifup ethx
      # ethtool ethx
  2. Confirm that the output from the ethtool command shows yes for Link detected.
            Link detected: yes
  3. Edit the appropriate files in /etc/sysconfig/network-scripts, where x is the number associated with the network interface.
    1. Locate the /etc/sysconfig/network-scripts/ifcfg-ethx file. Add the following lines, if they are not already present in the file:
      • For 10GbE network interfaces:

        ONBOOT=YES
        ETHTOOL_OPTS="speed 10000 duplex full autoneg off"
      • For 25GbE network interfaces:

        ONBOOT=YES
        ETHTOOL_OPTS="speed 25000 duplex full autoneg off"
    2. Repeat the previous step for all network interfaces that do not have the ETHTOOL_OPTS setting in the associated ifcfg-ethx file and are connected to 10GbE or 25GbE switches.

    The network interface should now show the link as detected. These changes are persistent, and do not need to be repeated after a server reboot.

  4. Check the ILOM on each compute node to validate that the LAN on Motherboard is properly configured to detect the 25G transceiver.
    show /HOST/network
      /HOST/network
         Targets:
    
         Properties:
             active_media = none
             auto_media_detection = enabled
             current_active_media = (none)
    
         Commands:
             cd
             set
             show

    If the NIC is not working, change the active_media and current_active_media properties to the proper values:

    • For 25G transceivers (fiber or copper), these parameters should be set to SFP28.
    • For 10G networks using RJ45-terminated CAT6 cables, these parameters should be set to RJ45.
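After completing these steps, a quick way to confirm the negotiated speed and link state on the client network interfaces is a loop such as the following sketch; the interface names are examples and should match your cabled ports.

    # Check negotiated speed and link state on the client network interfaces (names are examples)
    for nic in eth1 eth2 eth3
    do
      echo "== $nic =="
      ethtool "$nic" | grep -E "Speed|Link detected"
    done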

2.7 Adding and Configuring an Extra Network Card on Oracle Exadata Database Machine X6-2 and Later

You can add an additional network card on Oracle Exadata Database Machine X6-2 and later systems.

Prerequisites

Ensure you are using the correct link speed for Oracle Exadata Database Machine X7-2 and X8-2 compute nodes. Complete the steps in Verifying and Modifying the Link Speed on the Client Network Ports for X7 and X8 Systems.

Oracle Exadata Database Machine X6-2

Oracle Exadata Database Machine X6-2 database server offers highly available copper 10G network on the motherboard, and an optical 10G network via a PCI card on slot 2. Oracle offers an additional Ethernet card for customers that require additional connectivity. The additional card provides either dual port 10GE copper connectivity (part number 7100488) or dual port 10GE optical connectivity (part number X1109A-Z). You install this card in PCIe slot 1 on the Oracle Exadata Database Machine X6-2 database server.

After you install the card and connect it to the network, the Oracle Exadata System Software automatically recognizes the new card and configures the two ports as eth6 and eth7 interfaces on the X6-2 database server. You can use these additional ports to provide an additional client network, or to create a separate backup or data recovery network. On a database server that runs virtual machines, you could use this to isolate traffic from two virtual machines.

Oracle Exadata Database Machine X7-2

Oracle Exadata Database Machine X7-2 and later database servers offer 2 copper (RJ45) or 2 optical (SFP28) network connections on the motherboard plus 2 optical (SFP28) network connections in PCIe card slot 1. Oracle offers an additional 4 copper (RJ45) 10G network connections for customers that require additional connectivity. The additional card is the Oracle Quad Port 10GBase-T card (part number 7111181). You install this card in PCIe slot 3 on the database server.

After you install the card and connect it to the network, the Oracle Exadata System Software automatically recognizes the new card and configures the four ports as eth5 to eth8 interfaces on the database server. You can use these additional ports to provide an additional client network, or to create a separate backup or data recovery network. On a database server that runs virtual machines, you could use this additional client network to isolate traffic from two virtual machines.

After you have added the card to the database server, you need to configure the card. For instructions, see Configuring the Additional Network Card for a Non-Oracle VM Environment and Configuring the Additional Network Card for an Oracle VM Environment.

Oracle Exadata Database Machine X8-2

Oracle Exadata Database Machine X8-2 database servers offer 2 copper (RJ45) or 2 copper/optical (SFP28) network connections on the motherboard plus 2 optical (SFP28) network connections in PCIe card slot 1. Oracle offers an additional 4 copper 1/10G (RJ45) or 2 optical 10/25G (SFP28) network connections for customers that require additional connectivity. The two additional cards are:

  • Oracle Quad Port 10GBase-T card (part number 7111181)
  • Oracle Dual Port 25 Gb Ethernet Adapter (part number 7118016)

The additional card is installed in PCIe slot 3 on the database server.

After you install the card and connect it to the network, the Oracle Exadata System Software automatically recognizes the new card and configures either the four ports as eth5 to eth8 interfaces (for the quad port card) or the two ports as eth5 and eth6 (for the dual port card) on the database server. You can use these additional ports to provide an additional client network, or to create a separate backup or data recovery network. On a database server that runs virtual machines, you could use this additional client network to isolate traffic from two virtual machines.

2.7.1 Viewing the Network Interfaces

To view the network interfaces, you can run the ipconf.pl command.

Example 2-4 Viewing the default network interfaces for an Oracle Exadata Database Machine X6-2 database server

The following example shows the output for an Oracle Exadata Database Machine X6-2 database server without the additional network card. The output shows two network cards:

  • A quad port 10Gb card, on eth0 to eth3

  • A dual port 10Gb card, on eth4 and eth5

# cd /opt/oracle.cellos/

# ./ipconf.pl
Logging started to /var/log/cellos/ipconf.log
Interface ib0   is          Linked.    hca: mlx4_0
Interface ib1   is          Linked.    hca: mlx4_0
Interface eth0  is          Linked.    driver/mac: ixgbe/00:10:e0:8b:24:b6
Interface eth1  is .....    Linked.    driver/mac: ixgbe/00:10:e0:8b:24:b7
Interface eth2  is .....    Linked.    driver/mac: ixgbe/00:10:e0:8b:24:b8
Interface eth3  is .....    Linked.    driver/mac: ixgbe/00:10:e0:8b:24:b9
Interface eth4  is          Linked.    driver/mac: ixgbe/90:e2:ba:ac:20:ec (slave of bondeth0)
Interface eth5  is          Linked.    driver/mac: ixgbe/90:e2:ba:ac:20:ec (slave of bondeth0)

Example 2-5 Viewing the default network interfaces for an Oracle Exadata Database Machine X7-2 database server

The following example shows the output for an Oracle Exadata Database Machine X7-2 or later database server without the additional network card. The output shows three network cards:

  • A single port 10Gb card, on eth0
  • A dual port 10 or 25Gb card, on eth1 and eth2
  • A dual port 25Gb card, on eth3 and eth4
# /opt/oracle.cellos/ipconf.pl
Logging started to /var/log/cellos/ipconf.log 
Interface ib0   is          Linked.    hca: mlx4_0 
Interface ib1   is          Linked.    hca: mlx4_0 
Interface eth0  is          Linked.    driver/mac: igb/00:10:e0:c3:ba:72 
Interface eth1  is          Linked.    driver/mac: bnxt_en/00:10:e0:c3:ba:73 
Interface eth2  is          Linked.    driver/mac: bnxt_en/00:10:e0:c3:ba:74 
Interface eth3  is          Linked.    driver/mac: bnxt_en/00:0a:f7:c3:14:a0 (slave of bondeth0) 
Interface eth4  is          Linked.    driver/mac: bnxt_en/00:0a:f7:c3:14:a0 (slave of bondeth0)

2.7.2 Configuring the Additional Network Card for a Non-Oracle VM Environment

You can configure the additional network card on an Oracle Exadata Database Machine X6-2 or later database server for a non-Oracle VM environment.

This procedure assumes that you have already installed the network card in the Oracle Exadata Database Machine database server but have not yet completed the configuration with Oracle Exadata Deployment Assistant (OEDA).

WARNING:

If you have already installed Oracle Grid Infrastructure on Oracle Exadata Database Machine, then refer to the Oracle Clusterware documentation. Use caution when changing the network interfaces for the cluster.
  1. Ensure you have the following information for the new network card.
    You will need to input this information when you run ipconf.pl.
    • IP address
    • Netmask
    • Gateway
  2. Run the ipconf.pl script to configure the card.

    The following example shows a sample ipconf.pl session. The output shows three network cards:

    • A quad port 10Gb card, on eth0 to eth3

    • A dual port 10Gb card, on eth4 and eth5, with only one port cabled

    • A dual port 10Gb card, on eth6 and eth7, with only one port cabled. This is the new network card.

    For sample output for Oracle Exadata Database Machine X7-2, see Viewing the Network Interfaces.

    # cd /opt/oracle.cellos/
    # ./ipconf.pl
    
    Logging started to /var/log/cellos/ipconf.log
    Interface ib0   is                      Linked.    hca: mlx4_0
    Interface ib1   is                      Linked.    hca: mlx4_0
    Interface eth0  is                      Linked.    driver/mac: 
    ixgbe/00:10:e0:8b:22:e8 (slave of vmeth0)
    Interface eth1  is                      Linked.    driver/mac: 
    ixgbe/00:10:e0:8b:22:e9 (slave of bondeth0)
    Interface eth2  is                      Linked.    driver/mac: 
    ixgbe/00:10:e0:8b:22:e9 (slave of bondeth0)
    Interface eth3  is                      Linked.    driver/mac: 
    ixgbe/00:10:e0:8b:22:eb
    Interface eth4  is                      Linked.    driver/mac: 
    ixgbe/90:e2:ba:ac:1d:e4
    Interface eth5  is .................... Unlinked.  driver/mac: 
    ixgbe/90:e2:ba:ac:1d:e5
    Interface eth6  is ...                  Linked.    driver/mac: 
    ixgbe/90:e2:ba:78:d0:10
    Interface eth7  is .................... Unlinked.  driver/mac: 
    ixgbe/90:e2:ba:78:d0:11
    
    bondeth0 eth1,eth2 UP      vmbondeth0 10.128.1.169  255.255.240.0
    10.128.0.1  SCAN       test08client02.example.com
    bondeth1 None      UNCONF 
    bondeth2 None      UNCONF 
    bondeth3 None      UNCONF 
    Select interface name to configure or press Enter to continue: eth6
    Selected interface. eth6
    IP address or up or none: 10.129.19.34
    Netmask: 255.255.248.0
    Gateway (IP address or none) or none: 10.129.16.0
    
    Select network type for interface from the list below
    1: Management
    2: SCAN
    3: Other
    Network type: 3
    
    Fully qualified hostname or none: test08adm02-bkup.example.com
    Continue configuring or re-configuring interfaces? (y/n) [y]: n
    ...
    Do you want to configure basic ILOM settings (y/n) [y]: n
    [Info]: Custom changes have been detected in /etc/sysconfig/network-script
    s/ifcfg-eth6
    [Info]: Original file /etc/sysconfig/network-scripts/ifcfg-eth6 will be 
    saved in /opt/oracle.cellos/conf/network-scripts/backup_by_Exadata_ipconf
    [Info]: Original file /etc/ssh/sshd_config will be saved in /etc/ssh/sshd_
    config.backupbyExadata
    [Info]: Generate /etc/ssh/sshd_config with ListenAddress(es) 10.128.18.106, 
    10.129.19.34, 10.128.1.169, 192.168.18.44, 192.168.18.45
    Stopping sshd:                                             [  OK  ]
    Starting sshd:                                             [  OK  ]
    [Info]: Save /etc/sysctl.conf in /etc/sysctl.conf.backupbyExadata
    [Info]: Adjust settings for IB interfaces in /etc/sysctl.conf
    Re-login using new IP address 10.128.18.106 if you were disconnected after 
    following commands
    ip addr show vmbondeth0
    ip addr show bondeth0
    ip addr show vmeth0
    ip addr show eth0
    ifup eth6
    sleep 1
    ifup vmeth6
    sleep 1
    ip addr show vmeth6
    ip addr show eth6
    sleep 4
    service sshd condrestart
    
  3. If you need to set up the network card with VLAN, perform these steps:
    1. Add the VLAN ID to the /opt/oracle.cellos/cell.conf file.
      • Locate the Ethernet interface in the file. For example:

        <Interfaces>
          <Gateway>10.129.16.0</Gateway>
          <Hostname>test08adm02-bkup.example.com</Hostname>
          <IP_address>10.129.19.34</IP_address>
          <IP_enabled>yes</IP_enabled>
          <IP_ssh_listen>enabled</IP_ssh_listen>
          <Inet_protocol>IPv4</Inet_protocol>
          <Name>eth6</Name>
          <Net_type>Other</Net_type>
          <Netmask>255.255.248.0</Netmask>
          <State>1</State>
          <Status>UP</Status>
          <Vlan_id>0</Vlan_id>
        </Interfaces>
        
      • Add the VLAN ID to the <Vlan_id> element. The following example shows the interface configured with VLAN ID of 2122.

        <Interfaces>
          <Gateway>10.129.16.0</Gateway>
          <Hostname>test08adm02-bkup.example.com</Hostname>
          <IP_address>10.129.19.34</IP_address>
          <IP_enabled>yes</IP_enabled>
          <IP_ssh_listen>enabled</IP_ssh_listen>
          <Inet_protocol>IPv4</Inet_protocol>
          <Name>eth6</Name>
          <Net_type>Other</Net_type>
          <Netmask>255.255.248.0</Netmask>
          <State>1</State>
          <Status>UP</Status>
          <Vlan_id>2122</Vlan_id>
        </Interfaces>
        
    2. Run the following command to configure the network interface using the modified cell.conf file:
      # /opt/oracle.cellos/ipconf.pl -init -force
      
    3. Validate the interface has the VLAN configured by checking that the /etc/sysconfig/network-scripts directory contains files with the VLAN ID in the filename. For example, if the VLAN ID is 2122, you should see the following files:
      # ls -ltr /etc/sysconfig/network-scripts/*2122*
      
      -rw-r----- 1 root root 250 Sep  7 14:39 /etc/sysconfig/network-scripts/ifcfg-eth6.2122
      -rw-r----- 1 root root  85 Sep  7 14:39 /etc/sysconfig/network-scripts/route-eth6.2122
      -rw-r----- 1 root root  56 Sep  7 14:39 /etc/sysconfig/network-scripts/rule-eth6.2122
  4. Reboot the database server for the changes to take effect.
  5. Check that the network is working by pinging the gateway. For example:
    # ping 10.129.16.0

2.7.3 Configuring the Additional Network Card for an Oracle VM Environment

You can configure the additional network card on an Oracle Exadata Database Machine X6-2 and later database server for an Oracle VM environment.

This procedure assumes that you have already installed the network card in the Oracle Exadata Database Machine database server but have not yet completed the configuration with Oracle Exadata Deployment Assistant (OEDA).

Caution:

Do not attempt this procedure if you have already installed Oracle Grid Infrastructure on Oracle Exadata Database Machine.
  1. Add a section for the new network in the /EXAVMIMAGES/conf/virtual_machine_config_file configuration file in dom0.

    The following example assumes the bridge is called vmeth6, and the interface is called eth1. The virtual machine configuration file name is /EXAVMIMAGES/conf/test08adm01vm01.example.com-vm.xml.

    <Interfaces>
      <Bridge>vmeth6</Bridge>
      <Gateway>10.129.16.0</Gateway>
      <Hostname>test08adm02-bkup.example.com</Hostname>
      <IP_address>10.129.19.34</IP_address>
      <Name>eth1</Name>
      <IP_enabled>yes</IP_enabled>
      <IP_ssh_listen>disabled</IP_ssh_listen>
      <Net_type>Other</Net_type>
      <Netmask>255.255.248.0</Netmask>
      <Vlan_id>0</Vlan_id>
      <State>1</State>
      <Status>UP</Status>
    </Interfaces>
    

    If you are using VLANs, enter the appropriate VLAN ID [1-4095] in the <Vlan_id> element.

  2. Create the bridge.
    1. To create an unbonded bridge named vmeth6:
      # /opt/exadata_ovm/exadata.img.domu_maker add-single-bridge-dom0 vmeth6
      
    2. To create a bonded bridge, use a command similar to the following:
      # /opt/exadata_ovm/exadata.img.domu_maker add-bonded-bridge-dom0 bridge_name slave1 slave2 [vlan]

      slave1 and slave2 are the names of the bonded interfaces.

      For example:

      # /opt/exadata_ovm/exadata.img.domu_maker add-bonded-bridge-dom0 vmbondeth1 eth6 eth7
  3. Allocate the InfiniBand GUIDs:
    # /opt/exadata_ovm/exadata.img.domu_maker allocate-guids virtual_machine_config_file virtual_machine_config_file_final
    

    The virtual machine configuration files are located in the /EXAVMIMAGES/conf directory. For example:

    # /opt/exadata_ovm/exadata.img.domu_maker allocate-guids /EXAVMIMAGES/conf/
    test08adm01vm01.example.com-vm.xml /EXAVMIMAGES/conf/final-test08adm01vm01
    .example.com-vm.xml
    
  4. Shut down the guest and restart it.
    # /opt/exadata_ovm/exadata.img.domu_maker remove-domain /EXAVMIMAGES/conf
    /final-test08adm01vm01.example.com-vm.xml
    
    # /opt/exadata_ovm/exadata.img.domu_maker start-domain /EXAVMIMAGES/conf
    /final-test08adm01vm01.example.com-vm.xml
  5. Once the guest is running, use the ip addr command to verify the interface is valid.

    The following example checks the eth1 interface.

    # ip addr show eth1
    eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:16:3e:53:56:00 brd ff:ff:ff:ff:ff:ff
      inet 10.129.19.34/21 brd 10.129.23.255 scope global eth1
         valid_lft forever preferred_lft forever
    

2.8 Increasing the Number of Active Cores on Database Servers

You can increase the number of active cores on Oracle Exadata Database Machine using capacity-on-demand.

The number of active cores on the database servers on Oracle Exadata Database Machine X4-2 and newer systems can be reduced during installation. The number of active cores can be increased when additional capacity is needed. This is known as capacity-on-demand.

Additional cores are enabled in 2-core increments on Oracle Exadata Database Machine X4-2 and newer systems, and in 8-core increments on Oracle Exadata Database Machine X4-8 Full Rack and newer systems. The following table lists the capacity-on-demand core processor configurations.

Table 2-3 Capacity-on-Demand Core Processor Configurations

Oracle Exadata Database Machine | Eligible Systems | Minimum Cores per Server | Maximum Cores per Server | Core Increments
Oracle Exadata Database Machine X7-2 and X8-2 | Any configuration except Eighth Rack | 14 | 48 | From 14 to 48, in multiples of 2: 14, 16, 18, ..., 46, 48
Oracle Exadata Database Machine X7-2 and X8-2 | Eighth Rack | 8 | 24 | From 8 to 24, in multiples of 2: 8, 10, 12, ..., 22, 24
Oracle Exadata Database Machine X6-2 | Any configuration except Eighth Rack | 14 | 44 | From 14 to 44, in multiples of 2: 14, 16, 18, ..., 42, 44
Oracle Exadata Database Machine X6-2 | Eighth Rack | 8 | 22 | From 8 to 22, in multiples of 2: 8, 10, 12, ..., 20, 22
Oracle Exadata Database Machine X5-2 | Any configuration except Eighth Rack | 14 | 36 | From 14 to 36, in multiples of 2: 14, 16, 18, ..., 34, 36
Oracle Exadata Database Machine X5-2 | Eighth Rack | 8 | 18 | From 8 to 18, in multiples of 2: 8, 10, 12, 14, 16, 18
Oracle Exadata Database Machine X4-2 | Full Rack, Half Rack, Quarter Rack | 12 | 24 | From 12 to 24, in multiples of 2: 12, 14, 16, 18, 20, 22, 24
Oracle Exadata Database Machine X7-8 and X8-8 | Any configuration | 56 | 192 | From 56 to 192, in multiples of 8: 56, 64, 72, ..., 184, 192
Oracle Exadata Database Machine X6-8 and X5-8 | Any configuration | 56 | 144 | From 56 to 144, in multiples of 8: 56, 64, 72, ..., 136, 144
Oracle Exadata Database Machine X4-8 | Full Rack | 48 | 120 | From 48 to 120, in multiples of 8: 48, 56, 64, ..., 112, 120

Note:

Oracle recommends licensing the same number of cores on each server, in case of failover.

Database servers can be added one at a time, and capacity-on-demand can be applied to the individual database servers. This option includes Oracle Exadata Database Machine X5-2 Eighth Racks.

The database server must be restarted after enabling additional cores. If the database servers are part of a cluster, then they can be enabled in a rolling fashion.

  1. Verify the number of active physical cores using the following command:
    DBMCLI> LIST DBSERVER attributes coreCount
    
  2. Use the following command to increase the number of active physical cores:
    DBMCLI> ALTER DBSERVER pendingCoreCount = new_number_of_active_physical_cores
    
  3. Verify the pending number of active physical cores using the following command:
    DBMCLI> LIST DBSERVER attributes pendingCoreCount
    
  4. Restart the server.
  5. Verify the number of active physical cores using the following command:
    DBMCLI> LIST DBSERVER attributes coreCount
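For illustration, the following sequence shows the same commands with an example value, assuming a two-socket server being increased to 16 active cores (the target value 16 is an example only; choose a value that is valid for your system, as listed in Table 2-3):

    DBMCLI> LIST DBSERVER attributes coreCount
    DBMCLI> ALTER DBSERVER pendingCoreCount = 16
    DBMCLI> LIST DBSERVER attributes pendingCoreCount

After the server restarts, run LIST DBSERVER attributes coreCount again to confirm that the new core count is active.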
    

2.9 Extending LVM Partitions

Logical Volume Manager (LVM) provides flexibility to reorganize the partitions in the database servers.

Note:

  • Keep at least 1 GB of free space in the VGExaDb volume group. This space is used for the LVM snapshot created by the dbnodeupdate.sh utility during software maintenance.

  • If you make snapshot-based backups of the / (root) and /u01 directories by following the steps in Creating a Snapshot-Based Backup of Oracle Linux Database Server, then keep at least 6 GB of free space in the VGExaDb volume group.

This section contains the following topics:

2.9.1 Extending the root LVM Partition

The procedure for extending the root LVM partition depends on your Oracle Exadata System Software release.

The following sections contain the procedures, based on the Oracle Exadata System Software release:

2.9.1.1 Extending the root LVM Partition on Systems Running Oracle Exadata System Software Release 11.2.3.2.1 or Later

The following procedure describes how to extend the size of the root (/) partition on systems running Oracle Exadata System Software release 11.2.3.2.1 or later:

Note:

  • This procedure does not require an outage on the server.

  • For management domain systems, the active and inactive Sys LVMs are LVDbSys2 and LVDbSys3 instead of LVDbSys1 and LVDbSys2.

  • Make sure that LVDbSys1 and LVDbSys2 are sized the same.

  1. Collect information about the current environment.
    1. Use the df command to identify the amount of free and used space in the root partition (/).
      # df -h /
      

      The following is an example of the output from the command:

      Filesystem            Size  Used Avail Use% Mounted on
      /dev/mapper/VGExaDb-LVDbSys1
                             30G  22G  6.2G  79% / 
      

      Note:

      The active root partition may be either LVDbSys1 or LVDbSys2, depending on previous maintenance activities.

    2. Use the lvs command to display the current volume configuration.
      # lvs -o lv_name,lv_path,vg_name,lv_size
      

      The following is an example of the output from the command:

      LV                 Path                            VG       LSize
      LVDbOra1           /dev/VGExaDb/LVDbOra1           VGExaDb  100.00g
      LVDbSwap1          /dev/VGExaDb/LVDbSwap1          VGExaDb  24.00g
      LVDbSys1           /dev/VGExaDb/LVDbSys1           VGExaDb  30.00g
      LVDbSys2           /dev/VGExaDb/LVDbSys2           VGExaDb  30.00g
      LVDoNotRemoveOrUse /dev/VGExaDb/LVDoNotRemoveOrUse VGExaDb  1.00g
      
  2. Use the tune2fs command to check the online resize option.
    tune2fs -l /dev/mapper/vg_name-lv_name | grep resize_inode
    

    For example:

    tune2fs -l /dev/mapper/VGExaDb-LVDbSys1 | grep resize_inode
    

    The resize_inode option should be listed in the output from the command. If the option is not listed, then the file system must be unmounted before resizing the partition. Refer to Extending the root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1 to resize the partition.

  3. Verify there is available space in the volume group VGExaDb using the vgdisplay command.
    # vgdisplay -s
    

    The following is an example of the output from the command:

    "VGExaDb" 834.89 GB [184.00 GB used / 650.89 GB free]
    

    The volume group must contain enough free space to increase the size of both system partitions, and maintain at least 1 GB of free space for the LVM snapshot created by the dbnodeupdate.sh utility during upgrade.

    If there is not enough free space, then verify that the reclaimdisks.sh utility has been run. If the utility has not been run, then use the following command to reclaim disk space:

    # /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim 
    

    If the utility has been run and there is not enough free space, then the LVM cannot be resized.

    Note:

    reclaimdisks.sh cannot run at the same time as a RAID rebuild (that is, a disk replacement or expansion). Wait until the RAID rebuild is complete, then run reclaimdisks.sh.

  4. Resize both LVDbSys1 and LVDbSys2 logical volumes using the lvextend command.

    In the following example, XG is the amount of space in GB that the logical volume will be extended. The amount of space added to each system partition must be the same.

    # lvextend -L +XG --verbose /dev/VGExaDb/LVDbSys1
    # lvextend -L +XG --verbose /dev/VGExaDb/LVDbSys2
    

    The following example extends the logical volumes by 10 GB:

    # lvextend -L +10G /dev/VGExaDb/LVDbSys1
    # lvextend -L +10G /dev/VGExaDb/LVDbSys2
    
  5. Resize the file system within the logical volume using the resize2fs command.
    # resize2fs /dev/VGExaDb/LVDbSys1
    # resize2fs /dev/VGExaDb/LVDbSys2
    
  6. Verify the space was extended for the active system partition using the df command.
    # df -h /
    
2.9.1.2 Extending the root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1

The following procedure describes how to extend the size of the root (/) partition on systems running Oracle Exadata System Software earlier than release 11.2.3.2.1:

Note:

  • This procedure requires the system to be offline and restarted.

  • Keep at least 1 GB of free space in the VGExaDb volume group to be used for the LVM snapshot created by the dbnodeupdate.sh utility during software maintenance. If you make snapshot-based backups of the / (root) and /u01 directories by following the steps in "Creating a Snapshot-Based Backup of Oracle Linux Database Server", then keep at least 6 GB of free space in the VGExaDb volume group.

  • For management domain systems, the active and inactive Sys LVMs are LVDbSys2 and LVDbSys3 instead of LVDbSys1 and LVDbSys2.

  • Make sure LVDbSys1 and LVDbSys2 are sized the same.

  1. Collect information about the current environment.
    1. Use the df command to identify the mount points for the root partition (/) and the non-root partition (/u01), and their respective LVMs.

      The following is an example of the output from the command:

      # df
      Filesystem                    1K-blocks   Used    Available Use% Mounted on
      /dev/mapper/VGExaDb-LVDbSys1 30963708   21867152   7523692  75%    /
      /dev/sda1                      126427      16355    103648  14%    /boot
      /dev/mapper/VGExaDb-LVDbOra1 103212320  67404336  30565104  69%    /u01
      tmpfs                         84132864   3294608  80838256   4%    /dev/shm
      

      The file system name in the df command output is in the following format:

      /dev/mapper/VolumeGroup-LogicalVolume
      

      The full logical volume name of the root file system in the preceding example is /dev/VGExaDb/LVDbSys1.

    2. Use the lvscan command to display logical volumes.
      # lvm lvscan
      
      ACTIVE            '/dev/VGExaDb/LVDbSys1'  [30.00 GB]  inherit
      ACTIVE            '/dev/VGExaDb/LVDbSwap1' [24.00 GB]  inherit
      ACTIVE            '/dev/VGExaDb/LVDbOra1'  [100.00 GB] inherit
      
    3. Use the lvdisplay command to display the current logical volume and the volume group configuration.
      # lvm lvdisplay /dev/VGExaDb/LVDbSys1
      
      --- Logical volume ---
      LV Name               /dev/VGExaDb/LVDbSys1
      VG Name               VGExaDb
      LV UUID               GScpD7-lKa8-gLg9-oBo2-uWaM-ZZ4W-Keazih
      LV Write Access       read/write
      LV Status             available
      # open                1
      LV Size               30.00 GB
      Current LE            7680
      Segments              1
      Allocation            inherit
      Read ahead sectors    auto
      - currently set to    256
      Block device          253:0
      
    4. Verify there is available space in the volume group VGExaDb so the logical volume can be extended.
      # lvm vgdisplay VGExaDb -s
      "VGExaDb" 556.80 GB [154.00 GB used / 402.80 GB free]
      

      If the command shows there is zero free space, then neither the logical volume nor the file system can be extended.

  2. Restart the system in diagnostic mode.
    1. Copy the /opt/oracle.SupportTools/diagnostics.iso file to a directory on the machine using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. Select the Remote Control tab.
    4. Select the Redirection tab.
    5. Click Launch Remote Console.

      The ILOM Remote Console window is displayed.

    6. Select the Devices menu in the ILOM Remote Console window.
    7. Click CD-ROM image.
    8. Navigate to the location of the diagnostics.iso file on the local machine in the File Open dialog box.
    9. Select the diagnostics.iso file.
    10. Click Open.

      A confirmation message appears on the console.

    11. Select Host Control from the Remote Control tab.
    12. Select CDROM as the next boot device from the list of values.
    13. Click Save.

      When the system is restarted, the diagnostics.iso image is used. The system reverts to the default after the next restart.

    14. Log in as the root user in the ILOM Remote Console window.
    15. Restart the server.
      # shutdown -r now
      

      The system restarts using the diagnostics.iso image.

  3. Enter e to enter the diagnostic shell.
    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle
    support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,
    Select:e
    
  4. Log in to the system as the root user. You will be prompted for the password.
    localhost login: root
    Password: *********
    -sh-3.1# 
    
  5. Unmount the root file system.
    # cd /
    # umount /mnt/cell
    
  6. Verify the logical volume name.
    # lvm lvscan
    ACTIVE '/dev/VGExaDb/LVDbSys1' [30.00 GB] inherit
    ACTIVE '/dev/VGExaDb/LVDbSwap1' [24.00 GB] inherit
    ACTIVE '/dev/VGExaDb/LVDbOra1' [100.00 GB] inherit
    
  7. Resize the LVDbSys1 and LVDbSys2 logical volumes, which hold the current and backup root file systems.

    In the following commands, XG is the amount of space in GB that the logical volume will be extended.

    # lvm lvextend -L+XG --verbose /dev/VGExaDb/LVDbSys1
    # lvm lvextend -L+XG --verbose /dev/VGExaDb/LVDbSys2
    

    For example, if the logical volume is expanded by 5 GB, then the commands would be:

    # lvm lvextend -L+5G --verbose /dev/VGExaDb/LVDbSys1
    # lvm lvextend -L+5G --verbose /dev/VGExaDb/LVDbSys2
    
  8. Verify the file system is valid using e2fsck.
    # e2fsck -f /dev/VGExaDb/LVDbSys1
    # e2fsck -f /dev/VGExaDb/LVDbSys2
    
  9. Resize the file system.
    # resize2fs -p /dev/VGExaDb/LVDbSys1
    # resize2fs -p /dev/VGExaDb/LVDbSys2
    
  10. Restart the system in normal mode.
    # reboot
    
  11. Log in to the system.
  12. Verify the root file system mounts without issues and reflects the new size.

2.9.2 Resizing a Non-root LVM Partition

The procedure for resizing a non-root LVM partition depends on your Oracle Exadata System Software release.

The following sections contain the procedures, based on the Oracle Exadata System Software release:

2.9.2.1 Extending a Non-root LVM Partition on Systems Running Oracle Exadata System Software Release 11.2.3.2.1 or Later

This procedure describes how to extend the size of a non-root (/u01) partition on systems running Oracle Exadata System Software release 11.2.3.2.1 or later.

This procedure does not require an outage on the server.

  1. Collect information about the current environment.
    1. Use the df command to identify the amount of free and used space in the /u01 partition.
      # df -h /u01
      

      The following is an example of the output from the command:

      Filesystem            Size  Used Avail Use% Mounted on
      /dev/mapper/VGExaDb-LVDbOra1
                            99G   25G  70G   26% /u01
    2. Use the lvs command to display the current logical volume configuration used by the /u01 file system.
      # lvs -o lv_name,lv_path,vg_name,lv_size
      

      The following is an example of the output from the command:

       LV        Path                   VG      LSize
       LVDbOra1  /dev/VGExaDb/LVDbOra1  VGExaDb 100.00G
       LVDbSwap1 /dev/VGExaDb/LVDbSwap1 VGExaDb  24.00G
       LVDbSys1  /dev/VGExaDb/LVDbSys1  VGExaDb  30.00G
       LVDbSys2  /dev/VGExaDb/LVDbSys2  VGExaDb  30.00G
      
  2. Use the tune2fs command to check the online resize option.
    tune2fs -l /dev/mapper/vg_name-lv_name | grep resize_inode
    

    The resize_inode option should be listed in the output from the command. If the option is not listed, then the file system must be unmounted before resizing the partition. Refer to "Extending a Non-root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1" when resizing the partition.

  3. Verify there is available space in the volume group VGExaDb using the vgdisplay command.
    # vgdisplay -s
    

    The following is an example of the output from the command:

    "VGExaDb" 834.89 GB [184.00 GB used / 650.89 GB free]
    

    If the output shows there is less than 1 GB of free space, then neither the logical volume nor file system should be extended. Maintain at least 1 GB of free space in the VGExaDb volume group for the LVM snapshot created by the dbnodeupdate.sh utility during an upgrade.

    If there is not enough free space, then verify that the reclaimdisks.sh utility has been run. If the utility has not been run, then use the following command to reclaim disk space:

    # /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim 
    

    If the utility has been run and there is not enough free space, then the LVM cannot be resized.

    Note:

    • reclaimdisks.sh cannot run at the same time as a RAID rebuild (that is, a disk replacement or expansion). Wait until the RAID rebuild is complete, then run reclaimdisks.sh.

  4. Resize the logical volume using the lvextend command.
    # lvextend -L +sizeG /dev/VGExaDb/LVDbOra1
    

    In the preceding command, size is the amount of space to be added to the logical volume.

    The following example extends the logical volume by 10 GB:

    # lvextend -L +10G /dev/VGExaDb/LVDbOra1
    
  5. Resize the file system within the logical volume using the resize2fs command.
    # resize2fs /dev/VGExaDb/LVDbOra1
  6. Verify the space was extended using the df command.
    # df -h /u01
    
2.9.2.2 Extending a Non-root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1

This procedure describes how to extend the size of a non-root (/u01) partition on systems running Oracle Exadata System Software earlier than release 11.2.3.2.1.

In this procedure, /dev/VGExaDb/LVDbOra1 is mounted at /u01.

Note:

  • Keep at least 1 GB of free space in the VGExaDb volume group. This space is used for the LVM snapshot created by the dbnodeupdate.sh utility during software maintenance.

  • If you make snapshot-based backups of the / (root) and /u01 directories by following the steps in Creating a Snapshot-Based Backup of Oracle Linux Database Server, then keep at least 6 GB of free space in the VGExaDb volume group.

  1. Collect information about the current environment.
    1. Use the df command to identify the mount points for the root partition (/) and the non-root partition (/u01), and their respective LVMs.
      # df
      Filesystem                    1K-blocks   Used    Available Use% Mounted on
      /dev/mapper/VGExaDb-LVDbSys1 30963708   21867152   7523692  75%    /
      /dev/sda1                      126427      16355    103648  14%    /boot
      /dev/mapper/VGExaDb-LVDbOra1 103212320  67404336  30565104  69%    /u01
      tmpfs                         84132864   3294608  80838256   4%    /dev/shm
      
    2. Use the lvm lvscan command to display logical volumes.
      ACTIVE            '/dev/VGExaDb/LVDbSys1'  [30.00 GB]  inherit
      ACTIVE            '/dev/VGExaDb/LVDbSwap1' [24.00 GB]  inherit
      ACTIVE            '/dev/VGExaDb/LVDbOra1'  [100.00 GB] inherit
      
    3. Use the lvdisplay command to display the current volume group configuration.
      # lvdisplay /dev/VGExaDb/LVDbOra1
      
      --- Logical volume ---
      LV Name               /dev/VGExaDb/LVDbOra1
      VG Name               VGExaDb
      LV UUID               vzoIE6-uZrX-10Du-UD78-314Y-WXmz-f7SXyY
      LV Write Access       read/write
      LV Status             available
      # open                1
      LV Size               100.00 GB
      Current LE            25600
      Segments              1
      Allocation            inherit
      Read ahead sectors    auto
      - currently set to    256
      Block device          253:2
      
    4. Verify there is available space in the volume group VGExaDb so the logical drive can be extended.

      If the command shows there is zero free space, then neither the logical volume or file system can be extended.

      # lvm vgdisplay VGExaDb -s
      
      "VGExaDb" 556.80 GB [154.00 GB used / 402.80 GB free]
      
  2. Shut down any software that uses /u01.

    The following software typically uses /u01:

    • Oracle Clusterware, Oracle ASM, and Oracle Database

      # Grid_home/bin/crsctl stop crs
      
    • Trace File Analyzer

      # Grid_home/bin/tfactl stop
      
    • OS Watcher

      # /opt/oracle.oswatcher/osw/stopOSW.sh
      
    • Oracle Enterprise Manager agent

      (oracle)$ agent_home/bin/emctl stop agent
      
  3. Unmount the partition as the root user.
    # umount /u01
    

    Note:

    If the umount command reports that the file system is busy, then use the fuser(1) command to identify processes still accessing the file system that must be stopped before the umount command will succeed.

    # umount /u01
    umount: /u01: device is busy
    umount: /u01: device is busy
     
    # fuser -mv /u01
     
            USER      PID ACCESS COMMAND
    /u01:   root     6788 ..c..  ssh
            root     8422 ..c..  bash
            root    11444 ..c..  su
            oracle  11445 ..c..  bash
            oracle  11816 ....m  mgr
            root    16451 ..c..  bash
  4. Verify the file system.
    # e2fsck -f /dev/VGExaDb/LVDbOra1
    
  5. Extend the partition.

    In this example, the logical volume is extended by a fixed amount; the file system is resized in a separate step later in this procedure.

    # lvextend -L+XG --verbose /dev/VGExaDb/LVDbOra1
    

    In the preceding command, XG is the amount of space, in GB, by which the logical volume will be extended. The following example shows how to extend the logical volume by an additional 200 GB:

    # lvextend -L+200G --verbose /dev/VGExaDb/LVDbOra1
    

    Caution:

    Use extreme caution when reducing the size. The new size must be large enough to hold all the original content of the partition. To reduce the size, use a command similar to the following:

    lvreduce -L60G --resizefs --verbose /dev/VGExaDb/LVDbOra1
    

    In the preceding command, the size of /u01 is reduced to 60 GB.

  6. Check the /u01 file system using the e2fsck command.
    # e2fsck -f /dev/VGExaDb/LVDbOra1
    
  7. Resize the /u01 file system.
    # resize2fs -p /dev/VGExaDb/LVDbOra1
    
  8. Mount the partition.
    # mount -t ext3 /dev/VGExaDb/LVDbOra1 /u01
    
  9. Verify the space was extended.
    $ df -h /u01
    
  10. Restart any software that was stopped in step 2.
    • Oracle Clusterware, Oracle ASM, and Oracle Database

      # Grid_home/bin/crsctl start crs
      
    • Trace File Analyzer

      # Grid_home/bin/tfactl start
      
    • OS Watcher

      # /opt/oracle.cellos/vldrun -script oswatcher
      
    • Oracle Enterprise Manager agent

      (oracle)$ agent_home/bin/emctl start agent
      
2.9.2.3 Reducing a Non-root LVM Partition on Systems Running Oracle Exadata System Software Release 11.2.3.2.1 or Later

You can reduce the size of a non-root (/u01) partition on systems running Oracle Exadata System Software release 11.2.3.2.1 or later.

Note:

  • This procedure does not require an outage on the server.

  • It is recommended that you back up your file system before performing this procedure.

  1. Use the df command to determine the amount of free and used space in the /u01 partition:
    # df -h /u01
    

    The following is an example of the output:

    Filesystem                    Size  Used Avail Use% Mounted on
    /dev/mapper/VGExaDb-LVDbOra1  193G   25G  159G  14% /u01
    
  2. Use the lvm command to display the current logical volume configuration used by the /u01 file system.

    In this example, the size of the LVDbOra1 partition needs to be reduced so that LVDbSys2 (30.00 GB in size) can be created by the dbserver_backup.sh script.

    # lvm vgdisplay VGExaDb -s
      "VGExaDb" 271.82 GB [250.04 GB used / 21.79 GB free]
    
    # lvm lvscan
      ACTIVE            '/dev/VGExaDb/LVDbSys1' [30.00 GB] inherit
      ACTIVE            '/dev/VGExaDb/LVDbSwap1' [24.00 GB] inherit
      ACTIVE            '/dev/VGExaDb/LVDbOra1' [196.04 GB] inherit
    
  3. Shut down any software that uses /u01.

    The following software typically uses /u01:

    • Oracle Clusterware, Oracle ASM, and Oracle Database

      # Grid_home/bin/crsctl stop crs
      
    • Trace File Analyzer

      # Grid_home/bin/tfactl stop
      
    • OS Watcher (releases earlier than 11.2.3.3.0)

      # /opt/oracle.oswatcher/osw/stopOSW.sh
      
    • ExaWatcher (release 11.2.3.3.0 and later)

      # /opt/oracle.ExaWatcher/ExaWatcher.sh --stop
      
    • Oracle Enterprise Manager agent

      (oracle)$ agent_home/bin/emctl stop agent
      
  4. Unmount the partition as the root user.
    # umount /u01
    

    Note:

    If the umount command reports that the file system is busy, then use the fuser(1) command to identify the processes still accessing the file system that must be stopped before the umount command will succeed.

    # umount /u01
    umount: /u01: device is busy
    umount: /u01: device is busy
    
    # fuser -mv /u01
    
            USER      PID ACCESS COMMAND
    /u01:   root     6788 ..c..  ssh
            root     8422 ..c..  bash
            root    11444 ..c..  su
            oracle  11445 ..c..  bash
            oracle  11816 ....m  mgr
            root    16451 ..c..  bash
  5. Verify the file system.
    # e2fsck -f /dev/VGExaDb/LVDbOra1
    
    fsck 1.39 (29-May-2006)
    e2fsck 1.39 (29-May-2006)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    DBORA: 72831/25706496 files (2.1% non-contiguous), 7152946/51389440 blocks
    
  6. Resize the file system to the required size (120G in the example below).
    # resize2fs /dev/VGExaDb/LVDbOra1 120G
    resize2fs 1.39 (29-May-2006)
    Resizing the filesystem on /dev/VGExaDb/LVDbOra1 to 26214400 (4k) blocks.
    The filesystem on /dev/VGExaDb/LVDbOra1 is now 26214400 blocks long.
    
  7. Resize the LVM to the desired size.
    # lvm lvreduce -L 120G --verbose /dev/VGExaDb/LVDbOra1
        Finding volume group VGExaDb
      WARNING: Reducing active logical volume to 120.00 GB
      THIS MAY DESTROY YOUR DATA (filesystem etc.)
    Do you really want to reduce LVDbOra1? [y/n]: y
        Archiving volume group "VGExaDb" metadata (seqno 8).
      Reducing logical volume LVDbOra1 to 120.00 GB
        Found volume group "VGExaDb"
        Found volume group "VGExaDb"
        Loading VGExaDb-LVDbOra1 table (253:2)
        Suspending VGExaDb-LVDbOra1 (253:2) with device flush
        Found volume group "VGExaDb"
        Resuming VGExaDb-LVDbOra1 (253:2)
        Creating volume group backup "/etc/lvm/backup/VGExaDb" (seqno 9).
      Logical volume LVDbOra1 successfully resized
    
  8. Mount the partition.
    # mount -t ext3 /dev/VGExaDb/LVDbOra1 /u01
    
  9. Verify the space was reduced.
    # df -h /u01
    Filesystem                    Size  Used Avail Use% Mounted on
    /dev/mapper/VGExaDb-LVDbOra1  119G   25G   88G  22% /u01
    
    # lvm vgdisplay -s
      "VGExaDb" 271.82 GB [174.00 GB used / 97.82 GB free]
    
  10. Restart any software that was stopped in step 3.
    • Oracle Clusterware, Oracle ASM, and Oracle Database

      # Grid_home/bin/crsctl start crs
      
    • Trace File Analyzer

      # Grid_home/bin/tfactl start
      
    • OS Watcher (releases earlier than 11.2.3.3.0)

      # /opt/oracle.cellos/vldrun -script oswatcher
      
    • ExaWatcher (release 11.2.3.3.0 and later)

      # /opt/oracle.cellos/vldrun -script oswatcher
      
    • Oracle Enterprise Manager agent

      (oracle)$ agent_home/bin/emctl start agent
      

2.9.3 Extending the Swap Partition

This procedure describes how to extend the size of the swap partition.

Note:

This procedure requires the system to be offline and restarted.

Keep at least 1 GB of free space in the VGExaDb volume group to be used for the Logical Volume Manager (LVM) snapshot created by the dbnodeupdate.sh utility during software maintenance. If you make snapshot-based backups of the / (root) and /u01 directories by following the steps in "Creating a Snapshot-Based Backup of Oracle Linux Database Server", then keep at least 6 GB of free space in the VGExaDb volume group.

  1. Collect information about the current environment.
    1. Use the swapon command to identify the swap partition.
      # swapon -s
      Filename    Type        Size       Used   Priority
      /dev/dm-2   partition   25165816   0      -1
      
    2. Use the lvm lvscan command to display the logical volumes.
      # lvm lvscan
      ACTIVE '/dev/VGExaDb/LVDbSys1' [30.00 GiB] inherit
      ACTIVE '/dev/VGExaDb/LVDbSys2' [30.00 GiB] inherit
      ACTIVE '/dev/VGExaDb/LVDbSwap1' [24.00 GiB] inherit
      ACTIVE '/dev/VGExaDb/LVDbOra1' [103.00 GiB] inherit
      ACTIVE '/dev/VGExaDb/LVDoNotRemoveOrUse' [1.00 GiB] inherit
      
    3. Use the vgdisplay command to display the current volume group configuration.
      # vgdisplay
        --- Volume group ---
        VG Name               VGExaDb
        System ID            
        Format                lvm2
        Metadata Areas        1
        Metadata Sequence No  4
        VG Access             read/write
        VG Status             resizable
        MAX LV                0
        Cur LV                3
        Open LV               3
        Max PV                0
        Cur PV                1
        Act PV                1
        VG Size               556.80 GB
        PE Size               4.00 MB
        Total PE              142541
        Alloc PE / Size       39424 / 154.00 GB
        Free  PE / Size       103117 / 402.80 GB
        VG UUID               po3xVH-9prk-ftEI-vijh-giTy-5chm-Av0fBu
      
    4. Use the pvdisplay command to display the name of the physical device created by LVM and used with the operating system.
      # pvdisplay
        --- Physical volume ---
        PV Name               /dev/sda2
        VG Name               VGExaDb
        PV Size               556.80 GB / not usable 2.30 MB
        Allocatable           yes
        PE Size (KByte)       4096
        Total PE              142541
        Free PE               103117
        Allocated PE          39424
        PV UUID               Eq0e7e-p1fS-FyGN-zrvj-0Oqd-oUSb-55x2TX
  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the database server.

    The following is an example of how to set up a virtual CD-ROM using the ILOM interface:

    1. If you installed Oracle Exadata System Software release 19.1 or later on the database server, then download the diagnostics.iso file from My Oracle Support using the relevant patch for your Oracle Exadata System Software release, for example, patch 27162010 for release 19.1.0.0.0.
    2. If you installed Oracle Exadata System Software release 18.1.x or earlier on the database server, then copy the /opt/oracle.SupportTools/diagnostics.iso file to a directory on the machine using the ILOM interface.
    3. Log in to the ILOM web interface.
    4. Select Remote Console from the Remote Control tab. This will start the console.
    5. Select the Devices menu.
    6. Select the CD-ROM image option.
    7. Navigate to the location of the diagnostics.iso file.
    8. Open the diagnostics.iso file.
    9. Select Host Control from the Remote Control tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.
  3. Restart the server in diagnostic mode.
  4. Verify the file system is valid.

    Use the following command:

    # fsck -f /dev/VGExaDb/LVDbSwap1
    
  5. Extend the partition.

    In this example, the logical volume is expanded to 80% of the physical volume size. At the same time, the file system is resized with this command. In the following command, the value for LogicalVolumePath is obtained by the lvm lvscan command, and the value for PhysicalVolumePath is obtained by the pvdisplay command.

    # lvextend -l+80%PVS --resizefs --verbose LogicalVolumePath PhysicalVolumePath
    
  6. Restart the system in normal mode.

2.10 Creating a Snapshot-Based Backup of Oracle Linux Database Server

A backup should be made before and after every significant change to the software on the database server. For example, a backup should be made before and after the following procedures:

  • Application of operating system patches
  • Application of Oracle patches
  • Reconfiguration of significant operating parameters
  • Installation or reconfiguration of significant non-Oracle software

Starting with Oracle Exadata System Software release 19.1.0, the SSHD ClientAliveInterval defaults to 600 seconds. If the time needed for completing backups exceeds 10 minutes, then you can specify a larger value for ClientAliveInterval in the /etc/ssh/sshd_config file. You must restart the SSH service for changes to take effect. After the long running operation completes, remove the modification to the ClientAliveInterval parameter and restart the SSH service.
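For example, a minimal sketch of the temporary change (the value of 3600 seconds and the service commands are illustrative; use the method appropriate for your operating system release):

    # vi /etc/ssh/sshd_config        # set, for example: ClientAliveInterval 3600
    # systemctl restart sshd         # on older releases: service sshd restart

After the backup completes, restore the original ClientAliveInterval setting and restart the SSH service again.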

This section contains the following topics:

2.10.1 Creating a Snapshot-Based Backup of Oracle Linux Database Server with Uncustomized Partitions

This procedure describes how to take a snapshot-based backup. The values shown in the procedure are examples.

If you have not customized the database server partitions from their original shipped configuration, then use the procedures in this section to take a backup and to restore the database server from that backup.

Note:

  • The recovery procedure restores the exact partitions, including their names and sizes, as they were originally shipped. If you modified the partitions in any way, then you cannot use this procedure. Modifications include changing partition sizes, renaming partitions, and adding or removing partitions.

  • All steps must be performed as the root user.

  1. Prepare a destination to hold the backup.

    The destination can be a large, writable NFS location. The NFS location should be large enough to hold the backup tar files. For uncustomized partitions, 145 GB should be adequate.

    1. Create a mount point for the NFS share.
      mkdir -p /root/tar
    2. Mount the NFS location.

      In the following command, ip_address is the IP address of the NFS server, and nfs_location is the NFS location.

      mount -t nfs -o rw,intr,soft,proto=tcp,nolock ip_address:/nfs_location/ /root/tar
      
  2. Take a snapshot-based backup of the / (root), /u01, and /boot directories.
    1. Create a snapshot named root_snap for the root directory.

      LVDbSys1 is used in the example below, but you should use the value based on the output of imageinfo. If the active image is on LVDbSys2, then the command would be: lvcreate -L1G -s -c 32K -n root_snap /dev/VGExaDb/LVDbSys2.

      lvcreate -L1G -s -c 32K -n root_snap /dev/VGExaDb/LVDbSys1
    2. Label the snapshot.
      e2label /dev/VGExaDb/root_snap DBSYS_SNAP
      
    3. Determine the file system type of the / (root) and /u01 directories.

      Database servers running Oracle Exadata System Software release 12.1.2.1.0 or later use the "ext4" file system type, while older systems use "ext3". Systems older than X5 that were updated to release 12.1.2.1.0 or later also use "ext3".

      # mount -l
      
    4. Mount the snapshot.

      In the mount command below, filesystem_type_of_root_directory is a placeholder for the file system type of the / (root) directory, as determined in the previous step.

      mkdir /root/mnt
      mount /dev/VGExaDb/root_snap /root/mnt -t filesystem_type_of_root_directory
      
    5. Create a snapshot named u01_snap for the /u01 directory.
      lvcreate -L5G -s -c 32K -n u01_snap /dev/VGExaDb/LVDbOra1
    6. Label the snapshot.
      e2label /dev/VGExaDb/u01_snap DBORA_SNAP
      
    7. Mount the snapshot.

      In the mount command below, filesystem_type_of_/u01_directory is a placeholder for the file system type as determined in step 2.c above.

      mkdir -p /root/mnt/u01
      mount /dev/VGExaDb/u01_snap /root/mnt/u01 -t filesystem_type_of_/u01_directory
      
    8. Change to the directory for the backup.
      cd /root/mnt
    9. Create the backup file using one of the following commands:
      • System does not have NFS mount points:

        # tar -pjcvf /root/tar/mybackup.tar.bz2 * /boot --exclude \
        tar/mybackup.tar.bz2 > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
        
      • System has NFS mount points:

        In the following command, nfs_mount_points are the NFS mount points. Excluding the mount points prevents the generation of large files and long backup times.

        # tar -pjcvf /root/tar/mybackup.tar.bz2 * /boot --exclude \
        tar/mybackup.tar.bz2 --exclude nfs_mount_points >         \
        /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
        
    10. Check the /tmp/backup_tar.stderr file for any significant errors.
      Errors about failing to tar open sockets, and other similar errors, can be ignored.
  3. Unmount and remove the snapshots for the / (root) and /u01 directories.
    cd /
    umount /root/mnt/u01
    umount /root/mnt
    /bin/rm -rf /root/mnt
    lvremove /dev/VGExaDb/u01_snap
    lvremove /dev/VGExaDb/root_snap
    
  4. Unmount the NFS share.
    umount /root/tar
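If you want to verify the archive, you can list its contents from any host that mounts the NFS share. The following is a quick sanity check only, not part of the documented procedure, and uses the example archive name from this procedure:

    # tar -tjf mybackup.tar.bz2 | head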

2.10.2 Creating a Snapshot-Based Backup of Oracle Linux Database Server with Customized Partitions

When you have customized the partitions, the procedure to back up is the same as the procedure used for non-customized database servers, with the following exceptions:

  • You must add any additional partitions similar to /u01 to the backup, as shown in the sketch after this list.

  • If any partitions were renamed, then use the names defined for your environment. For example, if /u01 was renamed to /myown_u01, then use /myown_u01 in the commands.
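For example, the following sketch shows the additional snapshot commands for a hypothetical custom logical volume named LVDbVar1 mounted at /var/custom (both names are illustrative and do not exist in a standard deployment; an ext4 file system is assumed). The steps mirror the /u01 snapshot steps in the preceding procedure:

    # lvcreate -L1G -s -c 32K -n var_snap /dev/VGExaDb/LVDbVar1
    # e2label /dev/VGExaDb/var_snap VAR_SNAP
    # mkdir -p /root/mnt/var/custom
    # mount /dev/VGExaDb/var_snap /root/mnt/var/custom -t ext4

Take the snapshot before creating the backup file so that the custom partition is included in the tar archive, and unmount and remove the snapshot afterward, just as for the /u01 snapshot.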

2.11 Backing up the Management Domain (dom0) and User Domains (domU) in an Oracle Virtual Server Deployment

In an Oracle Virtual Server deployment, you need to back up the management domain (dom0) and the user domains (domUs):

2.11.1 Backing up the Management Domain dom0 Using Snapshot-Based Backup

This procedure describes how to take a snapshot-based backup of the management domain, dom0.

The logical volume /dev/VGExaDb/LVDoNotRemoveOrUse is a placeholder to make sure there is always free space available to create a snapshot. If you run dbserver_backup.sh, then the placeholder LVM is removed by the script, the free space is used for a snapshot, and the LVM is re-created after the snapshot is created. If you follow the manual procedure described here, then you have to perform all these tasks manually.

The values shown in the steps below are examples. All steps must be performed as the root user.

  1. Prepare a destination to hold the backup.

    The destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup tar file(s). For non-customized partitions, the space needed for holding the backup is around 60 GB.

    The following commands may be used to prepare the backup destination.

    # mkdir -p /remote_FS
    
    # mount -t nfs -o rw,intr,soft,proto=tcp,nolock ip_address:/nfs_location/ /remote_FS
    

    ip_address is the IP address of the NFS server, and nfs_location is the NFS location holding the backups.

  2. Take a snapshot-based backup of the file system hosting the / (root) directory.
    1. Check for the existence of the LVDoNotRemoveOrUse logical volume.

      If this volume is present, then remove the volume to make space for the snapshot. Execute the script below to check for the existence of the LVDoNotRemoveOrUse logical volume and remove it if present.

      lvm lvdisplay --ignorelockingfailure /dev/VGExaDb/LVDoNotRemoveOrUse
      if [ $? -eq 0 ]; then
        # LVDoNotRemoveOrUse logical volume exists.
        lvm lvremove -f /dev/VGExaDb/LVDoNotRemoveOrUse
        if [ $? -ne 0 ]; then
             echo "Unable to remove logical volume: LVDoNotRemoveOrUse. Unable to proceed with backup"
        fi
      fi

      If the LVDoNotRemoveOrUse logical volume does not exist, then investigate the reason and do not proceed with the steps below.

    2. Create a snapshot named LVDbSys3_snap for the file system hosting the / (root) directory.

      This example assumes LVDbSys3 is the active partition.

      # lvcreate -L1G -s -n LVDbSys3_snap /dev/VGExaDb/LVDbSys3
      
    3. Label the snapshot.
      # e2label /dev/VGExaDb/LVDbSys3_snap DBSYSOVS_SNAP
      
    4. Mount the snapshot.
      # mkdir /root/mnt
      
      # mount /dev/VGExaDb/LVDbSys3_snap /root/mnt -t ext4
      
    5. Change to the directory for the backup.
      # cd /root/mnt
      
    6. Create the backup file.
      # tar -pjcvf /remote_FS/mybackup.tar.bz2 * /boot > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
      
    7. Check the /tmp/backup_tar.stderr file for any significant errors.

      Errors about failing to tar open sockets, and other similar errors, can be ignored.

  3. Unmount the snapshot and remove the snapshot for the root directory.
    # cd /
    # umount /root/mnt
    # /bin/rmdir /root/mnt
    # lvremove /dev/VGExaDb/LVDbSys3_snap
  4. Unmount the NFS share.
    # umount /remote_FS
  5. Recreate the /dev/VGExaDb/LVDoNotRemoveOrUse logical volume.
    # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb
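To confirm that the placeholder volume was recreated, you can display it using lvm lvdisplay, as in the existence check earlier in this procedure:

    # lvm lvdisplay /dev/VGExaDb/LVDoNotRemoveOrUse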

2.11.2 Backing up the User Domains

You can create a backup of all the user domains on a host, or of individual user domains.

There are three ways to back up the user domains:

  • Method 1: Back up all user domains in the storage repository using Oracle Cluster File System (OCFS) reflinks to get a consistent backup

    This method backs up the storage repository, which is the /EXAVMIMAGES OCFS2 file system. It provides a more robust and comprehensive backup than method 2 or 3. Method 3 provides a quicker and easier backup, especially in role-separated environments.

    Method 1 is best-suited for when a management domain (dom0) administrator is responsible for user domain backups.

  • Method 2: Back up individual user domains in the storage repository using Oracle Cluster File System (OCFS) reflinks to get a consistent backup.

    You select which user domains you want to back up from the /EXAVMIMAGES OCFS2 file system. The user domains are located in the /EXAVMIMAGES/GuestImages/user directories.

    Method 2 is best-suited for when a management domain (dom0) administrator is responsible for user domain backups.

  • Method 3: Back up a user domain using snapshot-based backup

    This method backs up a single user domain using snapshot-based backup from inside the user domain.

    Method 3 is ideal where a user domain administrator is responsible for the user domain backups.

2.11.2.1 Method 1: Back up All the User Domains

You can back up all the user domains by backing up the storage repository that is the /EXAVMIMAGES OCFS2 file system.

The backup destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup. The space needed for the backup is proportional to the number of Oracle VMs deployed on the system, up to a maximum space of about 1.6 TB.

This procedure assumes there are 15 or fewer user domains per management domain.

  1. Use the following script to prepare the backup destination and prepare the user domains for backup.
    ScriptStarttime=$(date +%s)
    printf "This script is going to remove the directory /EXAVMIMAGES/Backup.
    If that is not acceptable, exit the script by typing n, manually 
    remove /EXAVMIMAGES/Backup and come back to rerun the script. Otherwise, 
    press y to continue  :"
    read proceed 
    
    if [[ ${proceed} == "n" ]] || [[ ${proceed} == "N" ]]
    then
      exit 0
    fi 
    
    rm -rf /EXAVMIMAGES/Backup 
    
    ##  Create the Backup Directory 
    
    mkdirStartTime=$(date +%s)
    find /EXAVMIMAGES -type d | grep -v 'lost+found' | \
    awk '{print "mkdir -p /EXAVMIMAGES/Backup"$1}' | sh
    mkdirEndTime=$(date +%s)
    mkdirTime=$(expr ${mkdirEndTime} - ${mkdirStartTime})
    echo "Backup Directory creation time :" ${mkdirTime}" seconds" 
    
    ##  Create reflinks for files not in /EXAVMIMAGES/GuestImages
    relinkothesStartTime=$(date +%s)
    find /EXAVMIMAGES/ -not -path "/EXAVMIMAGES/GuestImages/*" \
    -not -path "/EXAVMIMAGES/Backup/*" -type f | \
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}' | sh
    relinkothesEndTime=$(date +%s)
    reflinkothesTime=$(expr ${relinkothesEndTime} - ${relinkothesStartTime})
    echo "Reflink creation time for files other than in /EXAVMIMAGES/GuestImages :" ${reflinkothesTime}" seconds" 
    
    ##  Pause the user domains
    for hostName in $(xm list | egrep -v '^Domain-0|^Name' | awk '{print $1}')
    do
    PauseStartTime=$(date +%s)
    xm pause ${hostName}
    PauseEndTime=$(date +%s)
    PauseTime=$(expr ${PauseEndTime} - ${PauseStartTime})
    echo "PauseTime for guest - ${hostName} :" ${PauseTime}" seconds" 
    
    ## Create reflinks for all the files in /EXAVMIMAGES/GuestImages
    relinkStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${hostName} -type f | \
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}' | sh
    relinkEndTime=$(date +%s)
    reflinkTime=$(expr ${relinkEndTime} - ${relinkStartTime})
    echo "Reflink creation time for guest - ${hostName} :" ${reflinkTime}" seconds" 
    
    ## Unpause the user domains
    unPauseStartTime=$(date +%s)
    xm unpause ${hostName}
    unPauseEndTime=$(date +%s)
    unPauseTime=$(expr ${unPauseEndTime} - ${unPauseStartTime})
    echo "unPauseTime for guest - ${hostName} :" ${unPauseTime}" seconds"
    done 
    
    ScriptEndtime=$(date +%s) 
    ScriptRunTime=$(expr ${ScriptEndtime} - ${ScriptStarttime}) 
    echo ScriptRunTime ${ScriptRunTime}" seconds"
  2. Create a backup of the snapshot.

     Back up the reflink files in the /EXAVMIMAGES/Backup directory, which was created by the script in step 1, to a remote location. For example (see the sketch after this procedure):

     1. Create a tarball file consisting of all files under /EXAVMIMAGES/Backup.
     2. Copy the tarball to a remote location.

     This allows for restore operations if the management domain (dom0) is permanently lost or damaged.

  3. Remove the reflinks created by the script.
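A minimal sketch of steps 2 and 3, assuming an NFS share is already mounted at /remote_FS (as in the management domain backup procedure) and using an illustrative archive name:

    # cd /EXAVMIMAGES/Backup
    # tar -pjcvf /remote_FS/exavmimages_backup.tar.bz2 * > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
    # cd /
    # rm -rf /EXAVMIMAGES/Backup
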
2.11.2.2 Method 2: Back up Individual User Domains

You can back up an individual user domain by backing up its specific folder in the /EXAVMIMAGES file system.

The backup destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup. The space needed for the backup is proportional to the number of Oracle VMs deployed on the system, up to a maximum space of about 1.6 TB.

  1. Use the following script to prepare the backup destination and prepare the user domain for backup.
    ScriptStarttime=$(date +%s)
    printf "This script is going to remove the directory /EXAVMIMAGES/Backup.
    If that is not acceptable, exit the script by typing n, manually 
    remove /EXAVMIMAGES/Backup and come back to rerun the script. Otherwise, 
    press y to continue  :"
    read proceed 
    
    if [[ ${proceed} == "n" ]] || [[ ${proceed} == "N" ]]
    then
      exit 0
    fi 
    
    rm -rf /EXAVMIMAGES/Backup 
    
    printf "Enter the name of the user domain to be backed up :"
    read userDomainName
    
    ##  Create the Backup Directory 
    
    mkdirStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${userDomainName} -type d | grep -v 'lost+found' | \
    awk '{print "mkdir -p /EXAVMIMAGES/Backup"$1}' | sh
    mkdirEndTime=$(date +%s)
    mkdirTime=$(expr ${mkdirEndTime} - ${mkdirStartTime})
    echo "Backup Directory creation time :" ${mkdirTime}" seconds" 
    
    ##  Pause the user domain
    PauseStartTime=$(date +%s)
    xm pause ${userDomainName}
    PauseEndTime=$(date +%s)
    PauseTime=$(expr ${PauseEndTime} - ${PauseStartTime})
    echo "PauseTime for guest - ${userDomainName} :" ${PauseTime}" seconds" 
    
    ## Create reflinks for all the files in /EXAVMIMAGES/GuestImages/${userDomainName}
    relinkStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${userDomainName} -type f | \
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}' | sh
    relinkEndTime=$(date +%s)
    reflinkTime=$(expr ${relinkEndTime} - ${relinkStartTime})
    echo "Reflink creation time for guest - ${userDomainName} :" ${reflinkTime}" seconds" 
    
    ## Unpause the user domain
    unPauseStartTime=$(date +%s)
    xm unpause ${userDomainName}
    unPauseEndTime=$(date +%s)
    unPauseTime=$(expr ${unPauseEndTime} - ${unPauseStartTime})
    echo "unPauseTime for guest - ${userDomainName} :" ${unPauseTime}" seconds"
    
    ScriptEndtime=$(date +%s) 
    ScriptRunTime=$(expr ${ScriptEndtime} - ${ScriptStarttime}) 
    echo ScriptRunTime ${ScriptRunTime}" seconds"
  2. Create a backup of the snapshot.

     Back up the reflink files in the /EXAVMIMAGES/Backup directory, which was created by the script in step 1, to a remote location. For example:

     1. Create a tarball file consisting of all files under /EXAVMIMAGES/Backup.
     2. Copy the tarball to a remote location.

     This allows for restore operations if the management domain (dom0) is permanently lost or damaged.

  3. Remove the reflinks created by the script.
2.11.2.3 Method 3: Back up a User Domain from Inside the User Domain

You can take a snapshot-based backup of a user domain from inside the user domain, which can then be used to restore the user domain to a workable state.

All steps are performed from inside the user domain.

Note:

This method of backing up a user domain from inside the user domain using LVM snapshots has limited value for recovery. A backup taken this way can be used for recovery only when the user domain is still bootable and allows login as the root user, that is, when some files have been lost or damaged but can be restored from the tar backup after the user domain has booted and the / (root) and boot partitions are mounted. If the damage is such that the user domain does not boot, then you need a backup taken using method 1 or 2 above, and you must perform the recovery at the user domain level using the recovery procedure described below.

This procedure backs up the following:

  • LVDbSys1
  • LVDbOra1
  • /boot partition
  • Grid Infrastructure home
  • RDBMS home

All steps must be performed as the root user.

  1. Prepare a destination to hold the backup.

    In the following example, ip_address is the IP address of the NFS server, and nfs_location is the NFS location holding the backups.

    # mkdir -p /remote_FS
    
    # mount -t nfs -o rw,intr,soft,proto=tcp,nolock ip_address:/nfs_location/ /remote_FS
  2. Take a snapshot-based backup of the file systems containing / (root) and the /u01 directories, as follows:
    1. Create a snapshot named LVDbSys1_snap for the file system containing the root directory.
      The volume group must have at least 1 GB of free space for the command to succeed.
      # lvcreate -L1G -s -n LVDbSys1_snap /dev/VGExaDb/LVDbSys1
    2. Label the snapshot.
      # e2label /dev/VGExaDb/LVDbSys1_snap DBSYS_SNAP
    3. Mount the snapshot.
      # mkdir /root/mnt
      
      # mount /dev/VGExaDb/LVDbSys1_snap /root/mnt -t ext4
    4. Create a snapshot named u01_snap for the /u01 directory.
      # lvcreate -L256M -s -n u01_snap /dev/VGExaDb/LVDbOra1
    5. Label the snapshot.
      # e2label /dev/VGExaDb/u01_snap DBORA_SNAP
    6. Mount the snapshot.
      # mkdir -p /root/mnt/u01
      
      # mount /dev/VGExaDb/u01_snap /root/mnt/u01 -t ext4
    7. Change to the directory for the backup.
      # cd /root/mnt
    8. Create the backup file to back up the two snapshots taken above, the /boot partition, the Oracle Database home directory, and the Oracle Grid Infrastructure home directory.

      In the following example: Grid_home is the location of the Oracle Grid Infrastructure home, for example, /u01/app/18.1.0/grid; DB_home is the location of the Oracle Database home, for example, /u01/app/oracle/product/18.1.0/dbhome_1.

       # tar -pjcvf /remote_FS/mybackup.tar.bz2 * /boot Grid_home \
       DB_home > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
      
    9. Check the /tmp/backup_tar.stderr file for any significant errors.

      Errors about failing to tar open sockets, and other similar errors, can be ignored.

  3. Unmount and remove the snapshots for the file system containing the root directories.
    # cd /
    # umount /root/mnt/u01
    # umount /root/mnt
    # /bin/rmdir /root/mnt
    # lvremove /dev/VGExaDb/u01_snap
    # lvremove /dev/VGExaDb/LVDbSys1_snap
  4. Unmount the NFS share.
    # umount /remote_FS

2.12 Recovering Oracle Linux Database Servers Using a Snapshot-Based Backup

This section describes how to recover the file systems of a database server running Oracle Linux from a snapshot-based backup after severe disaster conditions affect the database server, or when the server hardware is replaced to such an extent that it amounts to new hardware. For example, replacing all hard disks leaves no trace of the original software on the system; as far as the software is concerned, this is equivalent to replacing the complete system. In addition, this section provides a method for disaster recovery of the database servers using an LVM snapshot-based backup taken while the database server was healthy, before the disaster condition.

The recovery procedures use the diagnostics.iso image as a virtual CD-ROM to restart the database server in rescue mode using the ILOM.

Note:

Restoring files from tape may require additional drives to be loaded, and is not covered in this chapter. Oracle recommends backing up files to an NFS location, and using existing tape options to back up and recover from the NFS host.

The general workflow includes the following tasks:

  1. Recreate the following:

    • Boot partitions
    • Physical volumes
    • Volume groups
    • Logical volumes
    • File system
    • Swap partition
  2. Activate the swap partition.
  3. Ensure the /boot partition is the active boot partition.
  4. Restore the data.
  5. Reconfigure GRUB.
  6. Restart the server.

If you use quorum disks, then after recovering the database servers from backup, you must manually reconfigure the quorum disk for the recovered server. See Reconfigure Quorum Disk After Restoring a Database Server for more information.

The recovery procedures described in this section do not include backup or recovery of Oracle Exadata Storage Servers or database data. Oracle recommends testing the backup and recovery procedures on a regular basis. This section contains the following topics:

2.12.1 Recovering Oracle Linux Database Server with Uncustomized Partitions

You can recover the Oracle Linux database server from a snapshot-based backup when using uncustomized partitions.

This procedure is applicable when the layout of the partitions, logical volumes, file systems, and their sizes are equal to the layout when the database server was initially deployed.

Caution:

All existing data on the disks is lost during the procedure.
  1. Prepare NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the directory /export is exported as an NFS mount, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the database server to be restored.

    The following is an example of how to set up a virtual CD-ROM using the ILOM interface:

    1. If you installed Oracle Exadata System Software release 19.1 or later on the database server, then download the diagnostics.iso file from My Oracle Support using the relevant patch for your Oracle Exadata System Software release, for example, patch 27162010 for release 19.1.0.0.0.
    2. If you installed Oracle Exadata System Software release 18.1.x or earlier on the database server, then copy the /opt/oracle.SupportTools/diagnostics.iso file to a directory on the machine using the ILOM interface.
    3. Log in to the ILOM web interface.
    4. Select Remote Console from the Remote Control tab. This will start the console.
    5. Select the Devices menu.
    6. Select the CD-ROM image option.
    7. Navigate to the location of the diagnostics.iso file.
    8. Open the diagnostics.iso file.
    9. Select Host Control from the Remote Control tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.
  3. Restart the system from the ISO file.

    Choose CD-ROM as the boot device during startup. Alternatively, instead of selecting the boot device manually during startup, you can preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the database server to be restored:

    ipmitool -H ILOM_ip_address_or_hostname \
    -U root_user chassis bootdev cdrom
    
    ipmitool -H ILOM_ip_address_or_hostname \
    -U root_user chassis power cycle
    
  4. Answer as follows when prompted by the system. The responses are shown in bold.

    Note that for Oracle Exadata System Software release 12.1.2.2.0 or later, DHCP is used and you do not have to manually set up the network.

    • If you are using Oracle Exadata System Software release 18.1 or later, running on Oracle Exadata Database Machine X7 or later, then the prompt looks like the following:


      (See the illustration boot_screen_18.1.jpg for the boot screen prompt on these systems.)
    • If you are using Oracle Exadata System Software release 18.1 or later and restoring through one of the 10GbE Ethernet SFP+ ports on Oracle Exadata Database Machine X3-2, X4-2, X5-2 or X6-2, then the prompt looks like the following:

      ------------------------------------------------------------------------------ 
               Choose from the following by typing letter in '()': 
                 (e)nter interactive diagnostics shell. 
                   Use diagnostics shell password to login as root user 
                   (reboot or power cycle to exit the shell), 
                 (r)estore system from NFS backup archive, 
       Select: r 
       Continue (y/n) [n]: y 
       Rescue password: 
       [INFO     ] Enter path to the backup file on the NFS server in format: 
               Enter path to the backup file on the NFS server in format: 
               <ip_address_of_the_NFS_share>:/<path>/<archive_file> 
               For example, 10.124.1.15:/export/nfs/share/backup.2010.04.08.tar.bz2 
       NFS line: <nfs_ip>:/export/mybackup.tar.bz2 
       [INFO     ] The backup file could be created either from LVM or non-LVM 
      based COMPUTE node 
       [INFO     ] Versions below 11.2.1.3.0 do not support LVM based partitioning 
       Use LVM based scheme. (y/n) [y]: y 
       Configure network settings on host via DHCP. (y/n) [y]: n 
       Configure bonded network interface. (y/n) [y]: y 
       IP Address of bondeth0 on this host: <IP address of the DB host> 
       
      Netmask of bondeth0 on this host: <netmask for the above IP address>
       Bonding mode:active-backup or 802.3ad [802.3ad]: active-backup 
       Slave interface1 for bondeth0 (ethX) [eth4]: eth4 
       Slave interface2 for bondeth0 (ethX) [eth5]: eth5 
      ...
       [  354.619610] bondeth0: first active interface up!
       [  354.661427] ixgbe 0000:13:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
       [  354.724414] bondeth0: link status definitely up for interface eth5, 10000 Mbps full duplex
       Default gateway: <Gateway for the above IP address>
      ------------------------------------------------------------------------------ 
    • If you are using Oracle Exadata System Software release 12.1.x or 12.2.x, then the prompts look like the following:

      ------------------------------------------------------------------------------ 
       Use diagnostics shell password to login as root user
                  (reboot or power cycle to exit the shell),
                (r)estore system from NFS backup archive.
      Select: r
      Continue (y/n) [n]: y
      Rescue password:
      [INFO: ] Enter path to the backup file on the NFS server in format:
             Enter path to the backup file on the NFS server in format:
             <ip_address_of_the_NFS_share>:/<path>/<archive_file>
             For example, 10.124.1.15:/export/nfs/share/backup.2010.04.08.tar.bz2
      NFS line: <nfs_ip>:/export/mybackup.tar.bz2
      [INFO: ] The backup file could be created either from LVM or non-LVM based COMPUTE node
      [INFO: ] Versions below 11.2.1.3.0 do not support LVM based partitioning
      Use LVM based scheme. (y/n) [y]: y
      ------------------------------------------------------------------------------ 
    • If you are using Oracle Exadata System Software release earlier than 12.1.2.2.0, then the prompts look like the following:

      ------------------------------------------------------------------------------ 
            Choose from following by typing letter in '()':
         (e)nter interactive diagnostics shell. Must use credentials from Oracle
            support to login (reboot or power cycle to exit the shell),
         (r)estore system from NFS backup archive,
      Select:r
      Are you sure (y/n) [n]:y
       
      The backup file could be created either from LVM or non-LVM based compute node
      versions below 11.2.1.3.1 and 11.2.2.1.0 or higher do not support LVM based partitioning
      use LVM based scheme(y/n):y
       
      Enter path to the backup file on the NFS server in format:
      ip_address_of_the_NFS_share:/path/archive_file
      For example, 10.10.10.10:/export/operating_system.tar.bz2
      NFS line:<nfs_ip>:/export/mybackup.tar.bz2
      IP Address of this host:IP address of the DB host
      Netmask of this host:netmask for the above IP address
      Default gateway:Gateway for the above IP address. If there is no default gateway in your network, enter 0.0.0.0.
      ------------------------------------------------------------------------------ 
      

    When the recovery completes, the login screen appears.

  5. Log in as the root user.
    If you do not have the password for the root user, then contact Oracle Support Services.
  6. Use the reboot command to restart the system.
    The restoration process is complete.
  7. Verify that all Oracle software can start and function by logging in to the database server.
    The /usr/local/bin/imagehistory command indicates that the database server was reconstructed.

    The following is an example of the output:

    # imagehistory
    
    Version                  : 11.2.2.1.0
    Image activation date    : 2010-10-13 13:42:58 -0700
    Imaging mode             : fresh
    Imaging status           : success
    
    Version                  : 11.2.2.1.0
    Image activation date    : 2010-10-30 19:41:18 -0700
    Imaging mode             : restore from nfs backup
    Imaging status           : success
    
  8. If the recovery was on Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.
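
The backup archive referenced in step 1 must already be available under an NFS export. The following is a minimal sketch of how such an export might be defined on a generic Linux NFS server; the client network (10.0.0.0/24), the source path of the archive, and the export options are illustrative assumptions, not part of the Exadata procedure.

    # On the NFS server (not part of the Exadata rack): stage the backup
    # archive and export /export read-only to the database server network.
    mkdir -p /export
    cp /path/to/mybackup.tar.bz2 /export/          # source path is a placeholder
    echo '/export 10.0.0.0/24(ro,no_root_squash)' >> /etc/exports
    exportfs -ra                                   # re-read /etc/exports
    showmount -e localhost                         # confirm /export is listed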

2.12.2 Recovering Exadata Database Servers X7 or Later with Customized Partitions

This procedure describes how to recover an Oracle Exadata Database Machine X7-2 or later Oracle Linux database server from a snapshot-based backup when using customized partitions.

Note:

This task assumes you are running Oracle Exadata System Software release 18c (18.1.0) or greater.
  1. Complete step 1 through step 3 in Recovering Oracle Linux Database Server with Uncustomized Partitions.
  2. Choose to enter the diagnostics shell at step 4 in Recovering Oracle Linux Database Server with Uncustomized Partitions, and log in as the root user.

    If you do not have the password for the root user, then contact Oracle Support Services.

  3. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  4. Ensure you create a primary boot partition of size at least 128 MB to be mounted at /boot.
    The boot area cannot be an LVM partition.
  5. Create the boot partition.
    umount /mnt/cell
    parted /dev/sda
    

    The interactive shell appears. The following procedure describes how to respond to the system prompts (a non-interactive equivalent using parted -s is shown after this procedure):

    1. Assign a disk label.
      (parted) mklabel gpt
    2. Set the unit size as sector.
      (parted) unit s
    3. Check the partition table by displaying the existing partitions.
      (parted) print
    4. Remove the partitions that will be re-created.
      (parted) rm <part#>
    5. Create a new first partition.
      (parted) mkpart primary 64 1048639
    6. Specify this is a bootable partition.
      (parted) set 1 boot on
  6. Create second primary (boot) and third primary (LVM) partitions.
    1. Create a second primary partition as a UEFI boot partition with fat32.
      (parted) mkpart primary fat32 1048640s 1572927s 
      (parted) set 2 boot on
    2. Create a new third partition.
      (parted) mkpart primary 1572928 -1
    3. Configure the third partition as a physical volume.
      (parted) set 3 lvm on
    4. Write the information to disk, then quit.
      (parted) quit
  7. Use the /sbin/lvm command to re-create the customized LVM partitions and mkfs to create file systems.
    1. Create the physical volume, volume group, and the logical volumes as follows:
      # lvm pvcreate /dev/sda3
      # lvm vgcreate VGExaDb /dev/sda3

      If the logical volume or volume group already exists, then remove the logical volume or volume group as follows and then re-create them.

      # lvm vgremove VGExaDb
      # lvm pvremove /dev/sda3
      # lvm pvcreate /dev/sda3
      # lvm vgcreate VGExaDb /dev/sda3
    2. Create the logical volume for the / (root) directory, a file system, and label it.
      • Create the logical volume:

        # lvm lvcreate -n LVDbSys1 -L40G VGExaDb
        
      • Create the file system.

        # mkfs.ext4 /dev/VGExaDb/LVDbSys1
        
      • Label the file system.

        # e2label /dev/VGExaDb/LVDbSys1 DBSYS
        
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the root/u01 directory, and label it.
      • Create the logical volume:

        # lvm lvcreate -n LVDbOra1 -L100G VGExaDb
        
      • Create the file system on LVDbOra1.

        # mkfs.ext4 /dev/VGExaDb/LVDbOra1
      • Label the file system.

        # e2label /dev/VGExaDb/LVDbOra1 DBORA
        
      • Label the /dev/sda2 file system.

        # dosfslabel /dev/sda2 ESP
        
    5. Create a file system on the /boot partition, and label it.
      • Create the file system.

        # mkfs.ext4 /dev/sda1
        
      • Label the file system:

        # e2label /dev/sda1 BOOT
        

      Note:

      For customized file system layouts, additional logical volumes can be created at this time. For customized layouts, different sizes may be used.
  8. Create mount points for all the partitions to mirror the original system, and mount the respective partitions.

    For example, assuming /mnt is used as the top level directory for this, the mounted list of partitions may look like the following:

    /dev/VGExaDb/LVDbSys1 on /mnt
    /dev/VGExaDb/LVDbOra1 on /mnt/u01
    /dev/sda1 on /mnt/boot
    

    You would create the directories and mount the partitions as follows:

    
    # mount /dev/VGExaDb/LVDbSys1 /mnt -t ext4
    # mkdir /mnt/u01 /mnt/boot
    # mount /dev/VGExaDb/LVDbOra1 /mnt/u01 -t ext4
    # mount /dev/sda1 /mnt/boot -t ext4
    

    Note:

    For customized file system layouts with additional logical volumes, additional mount points need to be created during this step.
  9. Create mount point /mnt/boot/efi and mount /dev/sda2 on /mnt/boot/efi with type vfat.
    # mkdir /mnt/boot/efi
    # mount /dev/sda2 /mnt/boot/efi -t vfat
    
  10. Bring up the network.
    ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
    ip link set up eth0
    ip route add default via gateway_address dev eth0
    
  11. Mount the NFS server where you have the backup.

    In the following example, the NFS server with IP address nfs_ip exports the /export directory, which contains the backup.

    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/export /root/mnt
    
  12. Restore from backup.
    # tar -pjxvf /root/mnt/mybackup.tar.bz2 -C /mnt
    
  13. Unmount the restored file systems.
    # umount /mnt/u01
    # umount /mnt/boot/efi
    # umount /mnt/boot
    # umount /mnt
    
  14. Detach the diagnostics.iso file.
  15. Check the boot devices and boot order for the ExadataLinux_1 device.
    1. Check the available boot devices.
      # efibootmgr
      BootCurrent: 000C
      Timeout: 1 seconds
      BootOrder: 000C,0001,0002,0003,0004,0005,0007,0008,0009,000A,000B
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000B* Oracle Linux
      Boot000C* USB:SUN
      

      If the Boot0000* ExadataLinux_1 device is not listed, then create the device.

      # efibootmgr -c -d /dev/sda -p 2 -l '\EFI\REDHAT\SHIM.EFI' -L 'ExadataLinux_1'
      BootCurrent: 000C
      Timeout: 1 seconds
      BootOrder: 0000,000C,0001,0002,0003,0004,0005,0007,0008,0009,000A,000B
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000B* Oracle Linux
      Boot000C* USB:SUN
      Boot0000* ExadataLinux_1
    2. Configure the Boot0000* ExadataLinux_1 device to be first in the boot order.
      # efibootmgr -o 0000
      BootCurrent: 000B
      Timeout: 1 seconds
      BootOrder: 0000
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000B* USB:SUN
      Boot000C* UEFI OS
  16. Restart the system and update the boot order in the BIOS.
    # reboot
    

    Modify the boot order to set the ExadataLinux_1 boot device as the first device.

    1. Press F2 when booting the system.
    2. Go to the Setup Utility.
    3. Select BOOT.
    4. Set ExadataLinux_1 for Boot Option #1.
    5. Exit the Setup Utility.

    This completes the restoration procedure for the server.

  17. If the recovery was on Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.
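
Steps 5 and 6 of this procedure create the partitions interactively. The following is a non-interactive sketch of the same parted commands, assuming the /dev/sda device and the exact sector boundaries shown above; 100% is used in place of the interactive -1 to mark the last usable sector. It is only a convenience rewrite of the interactive session, not a different layout.

    # Recreate the three partitions on /dev/sda without the interactive shell.
    parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    parted -s /dev/sda mkpart primary fat32 1048640s 1572927s set 2 boot on
    parted -s /dev/sda mkpart primary 1572928s 100% set 3 lvm on
    parted -s /dev/sda unit s print        # verify the resulting layout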

2.12.3 Recovering Exadata X6 or Earlier Database Servers with Customized Partitions

This procedure describes how to recover Oracle Exadata Database Servers for Oracle Exadata Database Machine X6-2 or earlier running Oracle Linux from a snapshot-based backup when using customized partitions.

The steps are the same as those for restoring non-customized partitions until, after booting the database server using diagnostics.iso, you are prompted to choose between the (e)nter interactive diagnostics shell and (r)estore system from NFS backup archive options.

  1. Choose to enter the diagnostics shell, and log in as the root user.

    If you do not have the password for the root user, then contact Oracle Support Services.

  2. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  3. Ensure you create a primary boot partition of size at least 128 MB to be mounted at /boot.
    The boot area cannot be a LVM partition.
  4. Create the boot partition.
    umount /mnt/cell
    parted /dev/sda
    

    The interactive shell appears. The following procedure describes how to respond to the system prompts:

    1. Assign a disk label.
      • If you are running Oracle Exadata System Software release 11.2.3.3.0 or later:

        (parted) mklabel gpt
      • If you are running a release earlier than Oracle Exadata System Software release 11.2.3.3.0:

        (parted) mklabel msdos
    2. Set the unit size as sector.
      (parted) unit s
    3. Check the partition table by displaying the existing partitions.
      (parted) print
    4. Remove the partitions that will be re-created.
      (parted) rm <part#>
    5. Create a new first partition.
      (parted) mkpart primary 63 1048639
    6. Specify this is a bootable partition.
      (parted) set 1 boot on
  5. Create an additional primary (LVM) partition.
    • If using Oracle Exadata System Software release 18.1.0.0.0 or later — Create second primary (bios_grub) and third primary (LVM) partitions:
      1. Enter mkpart primary 1048640 1050687 to create a new second partition.

      2. Enter set 2 bios_grub on to specify this is a GRUB BIOS partition.

      3. Enter mkpart primary 1050688 1751949278 to create a new third partition.

      4. Enter set 3 lvm on to specify this is a physical volume.

      5. Enter quit to write the information to disk, then quit.

    • If using a release earlier than Oracle Exadata System Software release 18.1.0.0.0:
      1. Enter mkpart primary 1048640 -1 to create a new second partition.

      2. Enter set 2 lvm on to specify this is a physical volume.

      3. Enter quit to write the information to disk, then quit.

  6. Use the /sbin/lvm command to re-create the customized LVM partitions and mkfs to create file systems.
    1. Create the physical volume, volume group, and the logical volumes as follows:
      lvm pvcreate /dev/sda2
      lvm vgcreate VGExaDb /dev/sda2
      
    2. Create the logical volume for the / (root) directory, a file system, and label it.
      • Create the logical volume:

        lvm lvcreate -n LVDbSys1 -L40G VGExaDb
        
      • If using Oracle Exadata System Software release 12.1.2.2.0 or later, then create the logical volume for the reserved partition.

        # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb
        

        Note:

        Do not create any file system on this logical volume.
      • Create the file system.

        • If you previously had an ext4 file system, use the mkfs.ext4 command:

          mkfs.ext4 /dev/VGExaDb/LVDbSys1
          
        • If you previously had an ext3 file system, use the mkfs.ext3 command:

          mkfs.ext3 /dev/VGExaDb/LVDbSys1
          
      • Label the file system.

        e2label /dev/VGExaDb/LVDbSys1 DBSYS
        
    3. Create the logical volume for the swap directory, and label it.
      lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the root/u01 directory, and label it.
      • Create the logical volume:

        lvm lvcreate -n LVDbOra1 -L100G VGExaDb
        
      • Create the file system.

        • If you previously had an ext4 file system, then use the mkfs.ext4 command:

          mkfs.ext4 /dev/VGExaDb/LVDbOra1
          
        • If you previously had an ext3 file system, then use the mkfs.ext3 command:

          mkfs.ext3 /dev/VGExaDb/LVDbOra1
          
      • Label the file system.

        e2label /dev/VGExaDb/LVDbOra1 DBORA
        
    5. Create a file system on the /boot partition, and label it.
      • Create the file system.

        • If you previously had an ext4 file system, use the mkfs.ext4 command:

          mkfs.ext4 /dev/sda1
          
        • If you previously had an ext3 file system, use the mkfs.ext3 command:

          mkfs.ext3 /dev/sda1
          
      • Label the file system:

        e2label /dev/sda1 BOOT
        

      Note:

      For customized file system layouts, additional logical volumes can be created at this time. For customized layouts, different sizes may be used.
  7. Create mount points for all the partitions to mirror the original system, and mount the respective partitions.

    For example, assuming /mnt is used as the top level directory for this, the mounted list of partitions may look like the following:

    /dev/VGExaDb/LVDbSys1 on /mnt
    /dev/VGExaDb/LVDbOra1 on /mnt/u01
    /dev/sda1 on /mnt/boot
    

    Note:

    For customized file system layouts with additional logical volumes, additional mount points need to be created during this step.

    The following is an example for Oracle Exadata Database Machine X6-2 and earlier systems of how to mount the root file system, and create two mount points. In the commands below, filesystem_type_of_/_directory specifies the file system of the / (root) directory: it is either ext3 or ext4.

    mount /dev/VGExaDb/LVDbSys1 /mnt -t filesystem_type_of_/_directory
    mkdir /mnt/u01 /mnt/boot
    
    mount /dev/VGExaDb/LVDbOra1 /mnt/u01 -t filesystem_type_of_/u01_directory
    mount /dev/sda1 /mnt/boot -t filesystem_type_of_/boot_directory
    
  8. Bring up the network.
    • If the operating system is Oracle Linux 6 or later:
      ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
      ip link set up eth0
      ip route add default via gateway_address dev eth0
      
    • If the operating system is Oracle Linux 5:
      ifconfig eth0 ip_address_for_eth0 netmask netmask_for_eth0 up
      
  9. Mount the NFS server where you have the backup.

    In the following example, the NFS server with IP address nfs_ip exports the /export directory, which contains the backup.

    mkdir -p /root/mnt
    mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/export /root/mnt
    
  10. Restore from backup.
    tar -pjxvf /root/mnt/mybackup.tar.bz2 -C /mnt
    
  11. Unmount the restored file systems, and remount the /boot partition. (To double-check the file system labels first, see the sketch after this procedure.)
    umount /mnt/u01
    umount /mnt/boot
    umount /mnt
    mkdir /boot
    mount /dev/sda1 /boot -t filesystem_type_of_/boot_directory
    
  12. Set up the boot loader.

    In the following instructions, /dev/sda1 is the /boot area.

    • If using Oracle Exadata System Software release 18.1.0.0.0 or later:
      grub2-install /dev/sda
      
      Installing for i386-pc platform.
      Installation finished. No error reported.
    • If using a release earlier than Oracle Exadata System Software release 18.1.0.0.0:
      grub
      find /I_am_hd_boot     (1)
      root (hdX,0)
      setup (hdX)
      quit
      

      In the preceding commands, (1) finds the hard disk hdX that has the file I_am_hd_boot, such as (hd0,0).

  13. Detach the diagnostics.iso file.
  14. Unmount the /boot partition.
    umount /boot
    
  15. Restart the system.
    reboot
    

    This completes the restoration procedure for the server.

  16. If the recovery was on Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in "Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery".
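
Step 6 of this procedure labels each restored file system (DBSYS, DBORA, BOOT, and SWAP). Before setting up the boot loader, you can confirm that the labels match what the restored /etc/fstab expects. This is a minimal sketch that assumes the default device names used above; e2label prints the current label when no new label is given.

    # Print the current label of each file system created in step 6.
    e2label /dev/VGExaDb/LVDbSys1      # expected: DBSYS
    e2label /dev/VGExaDb/LVDbOra1      # expected: DBORA
    e2label /dev/sda1                  # expected: BOOT
    blkid /dev/VGExaDb/LVDbSwap1       # expected: LABEL="SWAP" TYPE="swap"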

2.12.4 Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery

After the Oracle Linux database server in Oracle Exadata Database Machine Eighth Rack has been re-imaged, restored, or rescued, you can then reconfigure the eighth rack. Use one of the following procedures:

2.12.4.1 Configuring Eighth Rack On X3-2 or Later Machines Running Oracle Exadata Storage Server Release 12.1.2.3.0 or Later

The following procedure should be performed after the Oracle Linux database server in Oracle Exadata Database Machine Eighth Rack has been re-imaged, restored, or rescued.

For X3-2 systems, use this method only if you are running Oracle Exadata System Software release 12.1.2.3.0 or later.

  1. On the recovered server, check that the resourcecontrol utility exists in the /opt/oracle.SupportTools directory. If not, copy it from another database server to the recovered server.
  2. Ensure proper permissions are set on the resourcecontrol utility.
    # chmod 740 /opt/oracle.SupportTools/resourcecontrol
    
  3. Verify the current configuration.
    # dbmcli -e LIST DBSERVER ATTRIBUTES coreCount
    

    See Table 2-3 for the number of cores allowed for each machine configuration. If the correct value is shown, then no configuration changes are necessary. If that value is not shown, then continue to step 4 of this procedure. (A scripted version of this check appears after this procedure.)

  4. Change the enabled core configuration.
    # dbmcli -e ALTER DBSERVER pendingCoreCount=new_core_count FORCE
    

    new_core_count for an Eighth Rack is:

    • X8-2: 24
    • X7-2: 24
    • X6-2: 22
    • X5-2: 18
    • X4-8: 60
    • X4-2: 12
  5. Restart the server.
    # reboot
    
  6. Verify the changes to the configuration.
    # dbmcli -e LIST DBSERVER ATTRIBUTES coreCount
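
The check in step 3 and the verification in step 6 can also be scripted. The following is a minimal sketch that assumes an X7-2 or X8-2 Eighth Rack, where 24 enabled cores are expected, and that the dbmcli output contains the enabled core count as the first number; adjust the expected value to the Table 2-3 figure for your model.

    # Compare the enabled core count against the value expected for this model.
    expected=24    # assumption: X7-2/X8-2 Eighth Rack; use the value for your model
    actual=$(dbmcli -e LIST DBSERVER ATTRIBUTES coreCount | grep -oE '[0-9]+' | head -1)
    if [ "$actual" = "$expected" ]; then
        echo "coreCount is $actual - no change needed"
    else
        echo "coreCount is $actual, expected $expected - continue with step 4"
    fi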
    
2.12.4.2 Configuring Eighth Rack On X3-2 Machines Running Oracle Exadata Storage Server Release 12.1.2.2.3 or Earlier

The following procedure should be performed after the Oracle Linux database server in Oracle Exadata Database Machine Eighth Rack has been re-imaged, restored, or rescued.

  1. Copy the /opt/oracle.SupportTools/resourcecontrol utility from another database server to the /opt/oracle.SupportTools directory on the recovered server. (See the sketch after this procedure for an example using scp.)
  2. Ensure proper permissions are set on the utility.
    # chmod 740 /opt/oracle.SupportTools/resourcecontrol
    
  3. Verify the current configuration.

    The output from the command is shown in this example.

    # /opt/oracle.SupportTools/resourcecontrol -show
    
      Validated hardware and OS. Proceed.
      Number of cores active: 8
    

    For an eighth rack configuration, eight cores should be enabled. If that value is shown, then no configuration changes are necessary. If that value is not shown, then continue to step 4 of this procedure.

    Note:

    If there is an error similar to the following after running the utility, then restarting the server one or more times usually clears the error:

    Validated hardware and OS. Proceed.
    Cannot get ubisconfig export. Cannot Proceed. Exit.
  4. Change the configuration for enabled cores.
    # /opt/oracle.SupportTools/resourcecontrol -cores 8
    
  5. Restart the server.
    # reboot
    
  6. Verify the changes to the configuration.
    # /opt/oracle.SupportTools/resourcecontrol -show
    

    The following is an example of the expected output from the command for the database server:

    This is a Linux database server.
    Validated hardware and OS. Proceed.
    Number of cores active per socket: 4
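
Step 1 of this procedure copies the resourcecontrol utility from another database server. The following is a minimal sketch using scp; the source host name dbnode02 is only an example, and you can use any healthy database server in the rack.

    # Copy the utility from a healthy database server (example host: dbnode02),
    # then set the permissions described in step 2.
    scp root@dbnode02:/opt/oracle.SupportTools/resourcecontrol /opt/oracle.SupportTools/
    chmod 740 /opt/oracle.SupportTools/resourcecontrol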

2.13 Recovering in an Oracle VM Server Deployment

The following procedures describe how to recover an Oracle VM Server from a snapshot-based backup when severe disaster conditions damage the Oracle VM Server, or when the server hardware is replaced to such an extent that it amounts to new hardware. For example, replacing all hard disks leaves no trace of the original software on the system; as far as the software is concerned, this is equivalent to replacing the complete system. This section also provides a method for disaster recovery of the database servers using an LVM snapshot-based backup taken while the database server was healthy, before the disaster condition.

The recovery procedures use the diagnostics.iso image as a virtual CD-ROM to restart the Oracle VM Server in rescue mode using the Integrated Lights Out Manager (ILOM). At a high level, the steps are as follows:

  1. Re-create the following:

    • Boot partitions

    • Physical volumes

    • Volume groups

    • Logical volumes

    • File system

    • Swap partition

  2. Activate the swap partition.

  3. Ensure the /boot partition is the active boot partition (a brief sketch covering items 2 and 3 appears after this overview).

  4. Restore the data.

  5. Reconfigure GRUB.

  6. Restart the server.

The recovery procedures described in this section do not include backup or recovery of Oracle Exadata Storage Servers or Oracle Database data. Oracle recommends testing the backup and recovery procedures on a regular basis.
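
The detailed procedures that follow create the swap area with mkswap -L SWAP and mark the boot partition with the parted boot flag. As a brief sketch of overview items 2 and 3, assuming the default SWAP label and a /dev/sda1 boot partition as used later in this section:

    # Item 2: activate the swap area created with mkswap -L SWAP.
    swapon -L SWAP
    swapon -s                            # list the active swap devices

    # Item 3: confirm the boot flag is set on the first partition of /dev/sda.
    parted -s /dev/sda print | grep boot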

2.13.1 Scenario 1: Recovering an Oracle VM Server and Its User Domains from Backup

You can recover the Oracle VM Server and all its user domains from a backup.

The following procedures step you through the recovery process. Choose one of the following procedures, based on the version of Oracle Exadata System Software that is installed on your system.

Note:

All existing data on the disks is lost during these procedures.
2.13.1.1 Recovering an Oracle Virtual Server and Its User Domains (Releases Prior to 12.2.1.1.0)

You can recover an Oracle Virtual Server (OVS) from a snapshot-based backup when severe disaster conditions damage the OVS, or when the server hardware is replaced to such an extent that it amounts to new hardware.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the /export directory is exported over NFS, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the Oracle Virtual Server to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control , and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during startup

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the Oracle Virtual Server to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact Oracle Support Services.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda2 --force
  7. Remove the existing partitions and clean up the drive.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) rm 1 
    sda: sda2 sda3
    (parted) rm 2 
    sda: sda3
    (parted) rm 3 
    sda:
    (parted) q
    
    # dd if=/dev/zero of=/dev/sda bs=64M count=2
  8. Create the three partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving dom0 and store it in a variable:
      # end_sector=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the partition that will hold the LVMs, /dev/sda2.
      # parted -s /dev/sda mkpart primary 1048640s 240132159s set 2 lvm on
    4. Create the OCFS2 storage repository partition, /dev/sda3.
      # parted -s /dev/sda mkpart primary 240132160s ${end_sector}s set 3
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda2
      # lvm vgcreate VGExaDb /dev/sda2
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs.ext4 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs.ext4 /dev/VGExaDb/LVDbSys2
      
    5. Create the logical volume for the reserved partition.
      # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb

      Note:

      Do not create any file system on this logical volume.
    6. Create a file system on the /dev/sda1 partition, and label it.

      In the mkfs.ext3 command below, the -I 128 option is needed to set the inode size to 128.

      # mkfs.ext3 -I 128 /dev/sda1
      # tune2fs -c 0 -i 0 /dev/sda1
      # e2label /dev/sda1 BOOT
      
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top level directory, the mounted list of partitions may look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt

    • /dev/sda1 on /mnt/boot

    The following example mounts the root file system, and creates two mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext3
    
  11. Bring up the network on eth0 and assign the host's IP address and netmask to it.
    # ifconfig eth0 ip_address_for_eth0 netmask netmask_for_eth0 up
    # route add -net 0.0.0.0 netmask 0.0.0.0 gw gateway_ip_address
    
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in "Backing up the Management Domain dom0 Using Snapshot-Based Backup", restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Unmount the restored /dev/sda1 partition, and remount it on /boot.
    # umount /mnt/boot
    # mkdir /boot
    # mount /dev/sda1 /boot -t ext3
    
  15. Set up the grub boot loader using the command below:
    # grub --device-map=/boot/grub/device.map << DOM0_GRUB_INSTALL
    root (hd0,0)
    setup (hd0)
    quit
    DOM0_GRUB_INSTALL
    
  16. Unmount the /boot partition.
    # umount /boot
  17. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  18. Check the restored /etc/fstab file and comment out any reference to /EXAVMIMAGES and /dev/sda3.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES or /dev/sda3.

  19. Restart the system.
    # reboot

    This completes the restoration procedure for the Oracle VS/dom0.

  20. Convert to eighth rack, if required.

    If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in "Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery".

  21. When the server comes back up, build an OCFS2 file system on the /dev/sda3 partition.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/sda3 --force
  22. Mount the OCFS2 partition /dev/sda3 on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/sda3 /EXAVMIMAGES
  23. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/sda3 that you commented out in step 18.
  24. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system which holds all the user domain images.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  25. Restore the /EXAVMIMAGES file system.
    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  26. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg

At this point all the user domains should come up along with Oracle Grid Infrastructure and the database instances, and the database instances should join the Oracle RAC cluster formed by the other surviving Oracle Virtual Server nodes.
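
To confirm that the user domains started, you can list the running domains from the management domain (dom0). This is a minimal sketch using the same xm toolstack as the xm create command in step 26.

    # List the running domains; each user domain started in step 26 should appear.
    xm list

    # Optionally, check how long each domain has been up.
    xm uptime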

2.13.1.2 Recovering an Oracle Virtual Server and Its User Domains (Releases 12.2.1.1.0 and Later)

You can recover an Oracle Virtual Server (OVS) from a snapshot-based backup when severe disaster conditions damage the OVS, or when the server hardware is replaced to such an extent that it amounts to new hardware.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the /export directory is exported over NFS, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the Oracle Virtual Server to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control , and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during startup

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the Oracle Virtual Server to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact Oracle Support Services.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda2 --force
  7. Remove the existing partitions and clean up the drive.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) rm 1 
    [12064.253824] sda: sda2
    (parted) rm 2 
    [12070.579094] sda: 
    (parted) q
    
    # dd if=/dev/zero of=/dev/sda bs=64M count=2
  8. Create the two partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving dom0 and store it in a variable:
      # end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
      # end_sector=$( expr $end_sector_logical - 34 )
      

      The values for the start and end sectors in the commands below were taken from a surviving dom0. Because these values can change over time, it is recommended that these values are checked from a surviving dom0 using the following command:

      # parted -s /dev/sda unit S print
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the partition that will hold the LVMs, /dev/sda2.
      # parted -s /dev/sda mkpart primary 1048640s 3509759966s set 2 lvm on
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda2
      # lvm vgcreate VGExaDb /dev/sda2
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs -t ext4 -b 4096 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs -t ext4 -b 4096 /dev/VGExaDb/LVDbSys2
    5. Create the logical volume for the reserved partition.
      # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb

      Note:

      Do not create any file system on this logical volume.
    6. Create the logical volume for the guest storage repository.
      # lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
      
    7. Create a file system on the /dev/sda1 partition, and label it.

      In the mkfs.ext3 command below, the -I 128 option is needed to set the inode size to 128.

      # mkfs.ext3 -I 128 /dev/sda1
      # tune2fs -c 0 -i 0 /dev/sda1
      # e2label /dev/sda1 BOOT
      
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt

    • /dev/sda1 on /mnt/boot

    The following example mounts the root file system, and creates two mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext3
    
  11. Bring up the network on eth0 and assign the host's IP address and netmask to it.
    # ifconfig eth0 ip_address_for_eth0 netmask netmask_for_eth0 up
    # route add -net 0.0.0.0 netmask 0.0.0.0 gw gateway_ip_address
    
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in "Backing up the Management Domain dom0 Using Snapshot-Based Backup", restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Unmount the restored /dev/sda1 partition, and remount it on /boot.
    # umount /mnt/boot
    # mkdir -p /boot
    # mount /dev/sda1 /boot -t ext3
    
  15. Set up the grub boot loader using the command below:
    # grub --device-map=/boot/grub/device.map << DOM0_GRUB_INSTALL
    root (hd0,0)
    setup (hd0)
    quit
    DOM0_GRUB_INSTALL
    
  16. Unmount the /boot partition.
    # umount /boot
  17. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  18. Check the restored /etc/fstab file and comment out any reference to /EXAVMIMAGES.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES.

  19. Restart the system.
    # reboot

    This completes the restoration procedure for the Oracle VS/dom0.

  20. Convert to eighth rack, if required.

    If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in "Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery".

  21. When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume, which was created in step 9.f.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
  22. Mount the OCFS2 partition on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
  23. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages that you commented out in step 18. (See the sketch after this procedure for an example using sed.)
  24. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  25. Restore the /EXAVMIMAGES file system.
    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  26. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg

At this point all the user domains should come up along with Oracle Grid Infrastructure and the database instances, and the database instances should join the Oracle RAC cluster formed by the other surviving Oracle Virtual Server nodes.
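
Steps 18 and 23 of this procedure comment out and later restore the /EXAVMIMAGES entries in /etc/fstab. The edits can be made with any editor; the sed commands below are only one possible sketch and assume that the relevant lines contain the string EXAVMIMAGES.

    # Step 18 equivalent: comment out /EXAVMIMAGES entries in the restored fstab
    # (the restored root file system is still mounted at /mnt at this point).
    sed -i.orig '/EXAVMIMAGES/ s/^[^#]/#&/' /mnt/etc/fstab

    # Step 23 equivalent, after the reboot: remove the leading comment character.
    sed -i '/EXAVMIMAGES/ s/^#//' /etc/fstab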

2.13.1.3 Recovering an Oracle VM Server and Its User Domains (Release 18.1 and X7 and Later)

You can recover an Oracle VM Server (OVS) from a snapshot-based backup when severe disaster conditions damage the OVS, or when the server hardware is replaced to such an extent that it amounts to new hardware.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the /export directory is exported over NFS, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the Oracle VM Server to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control, and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during start up

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the Oracle VM Server to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact My Oracle Support.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda3 --force
  7. Remove the existing partitions, then verify all partitions were removed.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) print 
    Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name     Flags
     1      32.8kB  537MB   537MB   ext4         primary  boot
     2      537MB   805MB   268MB   fat32        primary  boot
     3      805MB   4193GB  4192GB               primary  lvm
    
    (parted) rm 1
    [ 1730.498593]  sda: sda2 sda3 
    (parted) rm 2 
    [ 1736.203794]  sda: sda3
    
    (parted) rm 3 
    [ 1738.546845]  sda:
    (parted) print
     Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start  End  Size  File system  Name  Flags
    
    (parted) q 
    Information: You may need to update /etc/fstab.
  8. Create the three partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving management domain (dom0) and store it in a variable:
      # end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
      # end_sector=$( expr $end_sector_logical - 34 )
      # echo $end_sector

      The values for the start and end sectors in the commands below were taken from a surviving management domain (dom0). Because these values can change over time, it is recommended that you check these values on a surviving dom0. For example, for an Exadata Database Machine X7-2 database server with 8 hard disk drives, you might see the following:

      # parted -s /dev/sda unit s print
      Model: AVAGO MR9361-16i (scsi)
      Disk /dev/sda: 8189440000s
      Sector size (logical/physical): 512B/512B
      Partition Table: gpt
      
      Number  Start     End          Size         File system  Name     Flags
       1      64s       1048639s     1048576s     ext4         primary  boot
       2      1048640s  1572927s     524288s      fat32        primary  boot
       3      1572928s  8189439966s  8187867039s               primary  lvm
      

      Note:

      The s (sector) values in the following sub-steps are based on a system with 8 hard disk drives. If you have 4 hard disk drives, then you need to view the partition table from the management domain on a surviving node and adjust the sector values accordingly.
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the UEFI boot partition, /dev/sda2.
      # parted -s /dev/sda mkpart primary fat32 1048640s 1572927s set 2 boot on
    4. Create the partition that will hold the LVMs, /dev/sda3.
      # parted -s /dev/sda mkpart primary 1572928s 8189439966s set 3 lvm on
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create the file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda3
      # lvm vgcreate VGExaDb /dev/sda3
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys2
    5. Create the logical volume for the guest storage repository.
      # lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
      
    6. Create a file system on the /dev/sda1 partition, and label it.
      # mkfs.ext4 /dev/sda1
      # e2label /dev/sda1 BOOT
      # tune2fs -l /dev/sda1
    7. Create a file system on the /dev/sda2 partition, and label it.
      # mkfs.vfat -v -c -F 32 -s 2 /dev/sda2
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt
    • /dev/sda1 on /mnt/boot
    • /dev/sda2 on /mnt/boot/efi

    The following example mounts the root (/) file system, and creates three mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext4
    # mkdir /mnt/boot/efi
    # mount /dev/sda2 /mnt/boot/efi -t vfat
    
  11. Bring up the network on eth0 and (if not using DHCP) assign the host's IP address and netmask to it.

    If you are using DHCP then you do not have to manually configure the IP address for the host.

    # ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
    # ip link set up eth0
    # ip route add default via gateway_ip_address dev eth0
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Use the efibootmgr command to set the boot device.
    1. Disable and delete the Oracle Linux boot device. If you see the entry ExadataLinux_1, then remove this entry and recreate it.

      For example:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 000F,0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000D,000E
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000D* Oracle Linux
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      

      In this example, you would disable and remove Oracle Linux (Boot000D) and ExadataLinux_1 (Boot0000). Use commands similar to the following to disable and delete the boot devices:

      Disable 'Oracle Linux':
      # efibootmgr -b 000D -A
      Delete 'Oracle Linux':
      # efibootmgr -b 000D -B
      Disable old 'ExadataLinux_1':
      # efibootmgr -b 0000 -A
      Delete old 'ExadataLinux_1':
      # efibootmgr -b 0000 -B

    2. Recreate the boot entry for ExadataLinux_1 and then view the boot order entries.
      # efibootmgr -c -d /dev/sda -p 2 -l '\EFI\XEN\XEN.EFI' -L 'ExadataLinux_1'
      
      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      Boot0000* ExadataLinux_1

      In the output from the efibootmgr command, make note of the boot order number for ExadataLinux_1 and use that value in the following commands:

      # efibootmgr -b (entry number) -A
      # efibootmgr -b (entry number) -a

      For example, in the previous output shown in step 14.a, ExadataLinux_1 was listed as Boot0000. So you would use the following commands:

      # efibootmgr -b 0000 -A
      # efibootmgr -b 0000 -a
    3. Set the correct boot order.
      Set ExadataLinux_1 as the first boot device. The remaining devices should stay in the same boot order, except for USB:SUN, which should be last.
      # efibootmgr -o 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      

      The boot order should now look like the following:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
    4. Check the boot order using the ubiosconfig command.
      # ubiosconfig export all -x /tmp/ubiosconfig.xml
      Make sure the ExadataLinux_1 entry is the first child element of boot_order.
       <boot_order>
          <boot_device>
            <description>ExadataLinux_1</description>  
            <instance>1</instance>
          </boot_device>
          <boot_device>
            <description>NET0:PXE IP4 Intel(R) I210 Gigabit  Network
      Connection</description>
            <instance>1</instance>
          </boot_device>
      ...
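
      For example, to display only the first boot device in the exported file (assuming the export path used in the previous command), you could run:

      # grep -m1 -A2 '<boot_device>' /tmp/ubiosconfig.xml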
  15. Check the restored /etc/fstab file and comment out any reference to /EXAVMIMAGES.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES.
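
    For example, a minimal sketch that comments out the matching lines (assuming the restored file is at /mnt/etc/fstab):

    # sed -i.bak '/EXAVMIMAGES/s/^/#/' /mnt/etc/fstab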

  16. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  17. Unmount the restored partitions so that /dev/sda1 can be remounted on /boot.
    # umount /mnt/boot/efi
    # umount /mnt/boot
    # umount /mnt
    # umount /root/mnt
  18. Restart the system.
    # reboot

    This completes the restoration procedure for the Oracle VM Server management domain (dom0).

  19. Convert to eighth rack, if required.

    If the recovery is on an Oracle Exadata Database Machine eighth rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

  20. When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume, which was created in step 9.e.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
  21. Mount the OCFS2 partition on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
  22. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages that were commented out in step 15.
  23. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  24. Restore the /EXAVMIMAGES file system.
    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  25. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
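
    To start every user domain in one pass, a minimal sketch (assuming all user domains reside under /EXAVMIMAGES/GuestImages):

    # for cfg in /EXAVMIMAGES/GuestImages/*/vm.cfg; do xm create "$cfg"; done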

At this point all the user domains should come up along with Oracle Grid Infrastructure and the database instances, and the database instances should join the Oracle RAC cluster formed by the other surviving Oracle VM Server nodes.

2.13.2 Scenario 2: Reimaging dom0 and Restoring User Domains from Backups

The following procedure can be used when the OVS/dom0 is damaged beyond repair and no backup exists for the dom0, but there is a backup available of the storage repository (/EXAVMIMAGES file system) housing all the user domains. This procedure reimages the dom0 and reconstructs all the user domains.

  1. Re-image the OVS with the image used on the other OVS/dom0 servers in the rack, using the procedure described in "Re-Imaging Oracle Exadata Database Servers".

  2. Run the following commands:

    # /opt/oracle.SupportTools/switch_to_ovm.sh
    
    # /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
    
  3. If the recovery is on Oracle Exadata Database Machine eighth rack, then perform the procedure described in "Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery".

  4. Rebuild the ocfs2 file system on the /dev/sda3 partition.

    # umount /EXAVMIMAGES
    
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/sda3 --force
    
  5. Mount the ocfs2 partition /dev/sda3 on /EXAVMIMAGES.

    # mount -t ocfs2 /dev/sda3 /EXAVMIMAGES
    
  6. Mount the backup NFS server to restore the /EXAVMIMAGES file system which holds the user domain images:

    # mkdir -p /remote_FS
    
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /remote_FS
    
  7. Restore the /EXAVMIMAGES file system.

    # tar -Spxvf /remote_FS/backup-of-exavmimages.tar -C /EXAVMIMAGES
    

    Note:

    The restore process recreates the user domain files (the files under /EXAVMIMAGES/GuestImages/<user_domain>/) as regular files, not as the OCFS2 reflinks that were originally created in the storage repository when the user domains were created. Consequently, the space usage in /EXAVMIMAGES after the restore may be higher than the original space usage at the time of the backup.
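
    For example, you can check the space usage after the restore:

    # df -h /EXAVMIMAGES
    # du -sh /EXAVMIMAGES/GuestImages/*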

  8. Set up the network bridges manually.

    1. Determine the version of the ovmutils rpm:

      # rpm -qa|grep ovmutils
      
    2. If the version of the ovmutils rpm is earlier than 12.1.2.2.0, perform these steps:

      • Back up /opt/exadata_ovm/exadata.img.domu_maker. You will need the backup copy later.

        # cp /opt/exadata_ovm/exadata.img.domu_maker /opt/exadata_ovm/exadata.img.domu_maker-orig
        
      • Open the /opt/exadata_ovm/exadata.img.domu_maker file in a text editor such as vi, and search for "g_do_not_set_bridge=yes". It should be located a few lines below the case statement option "network-discovery)".

        Change it to "g_do_not_set_bridge=no".

        Save and exit /opt/exadata_ovm/exadata.img.domu_maker.

      • Run /opt/exadata_ovm/exadata.img.domu_maker manually for every xml file in the /EXAVMIMAGES/conf directory.

        # cd /EXAVMIMAGES/conf
        # ls -1|while read file; do /opt/exadata_ovm/exadata.img.domu_maker network-discovery $file /tmp/netdisc-$file; done
        
      • Restore /opt/exadata_ovm/exadata.img.domu_maker from the backup copy.

        # cp /opt/exadata_ovm/exadata.img.domu_maker-orig /opt/exadata_ovm/exadata.img.domu_maker
        
    3. If the version of the ovmutils rpm is 12.1.2.2.0 or later, run the following command:

      # /opt/exadata_ovm/exadata.img.domu_maker add-bonded-bridge-dom0 vmbondeth0 eth4 eth5
      
  9. For each user domain directory in the /EXAVMIMAGES/GuestImages directory, perform the steps below.

    1. Get the UUID of the user domain.

      # grep ^uuid /EXAVMIMAGES/GuestImages/<user domain hostname>/vm.cfg|awk -F"=" '{print $2}'|sed s/"'"//g|sed s/" "//g
      

      The command returns the uuid value, which is used in the commands below.
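
      For example, a minimal sketch (assuming the user domain host name is dbm01db08vm01) that captures the value in a shell variable for use in the following steps:

      # uuid=$(grep ^uuid /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg | awk -F"=" '{print $2}' | sed s/"'"//g | sed s/" "//g)
      # echo $uuid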

    2. # mkdir -p /OVS/Repositories/uuid

    3. # ln -s /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg /OVS/Repositories/uuid/vm.cfg

    4. # ln -s /OVS/Repositories/uuid/vm.cfg /etc/xen/auto/user_domain_hostname.cfg

    5. # mkdir /OVS/Repositories/uuid/VirtualDisks

    6. # cd /OVS/Repositories/uuid/VirtualDisks

    7. Create four symbolic links in this directory using the four disk image names in the vm.cfg file, pointing to the four ".img" files in /EXAVMIMAGES/GuestImages/user_domain_hostname directory.

      For example, below is a sample disk entry in a sample vm.cfg file in a /OVS/Repositories/uuid directory:

      disk =  ['file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/dfd641a1c6a84bd69643da704ff98594.img,xvda,w','file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/d349fd420a1e49459118e6a6fcdbc2a4.img,xvdb,w','file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/8ac470eeb8704aab9a8b3adedf1c3b04.img,xvdc,w','file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/333e7ed2850a441ca4d2461044dd0f7c.img,xvdd,w']
      

      You can list the four ".img" files in the /EXAVMIMAGES/GuestImages/user_domain_hostname directory:

      ls /EXAVMIMAGES/GuestImages/user_domain_hostname/*.img
      
      /EXAVMIMAGES/GuestImages/user_domain_hostname/System.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/grid12.1.0.2.2.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/db12.1.0.2.2-3.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/pv1_vgexadb.img
      

      In this case, the commands below may be used to create the four symbolic links where dbm01db08vm01 is the user domain hostname:

      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/System.img $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg|awk -F":" '{print $2}'|awk -F"," '{print $1}'|awk -F"/" '{print $6}')
      
      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/grid12.1.0.2.2.img $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg|awk -F":" '{print $3}'|awk -F"," '{print $1}'|awk -F"/" '{print $6}')
      
      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/db12.1.0.2.2-3.img $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg|awk -F":" '{print $4}'|awk -F"," '{print $1}'|awk -F"/" '{print $6}')
      
      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/pv1_vgexadb.img $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg|awk -F":" '{print $5}'|awk -F"," '{print $1}'|awk -F"/" '{print $6}')
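
      After creating the links, you can verify that each link resolves to an existing image file (using the same uuid placeholder as in the earlier steps), for example:

      # ls -lL /OVS/Repositories/uuid/VirtualDisks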
      
  10. Bring up each user domain.

    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
    

At this point, all the user domains should come up along with Oracle Grid Infrastructure and the database instances in them, and the database instances should join the Oracle RAC cluster formed by the other surviving OVS nodes.

2.13.3 Scenario 3: Restoring and Recovering User Domains from Snapshot Backups

Use this procedure to restore lost or damaged files of a user domain using a snapshot-based user domain backup taken from inside a user domain. The user domain backup was created using the procedure described in "Method 3: Back up a User Domain from Inside the User Domain".

  1. Log in to the user domain as the root user.
  2. Mount the backup NFS server to restore the damaged or lost files.
    # mkdir -p /root/mnt
    
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  3. Extract the damaged or lost files from the backup to a staging area.

    Prepare a staging area to hold the extracted files. The backup LVM LVDbSys2 can be used for this:

    # mkdir /backup-LVM
    
    # mount /dev/mapper/VGExaDb-LVDbSys2 /backup-LVM
    
    # mkdir /backup-LVM/tmp_restore
    
    # tar -pjxvf /root/mnt/tar_file_name -C /backup-LVM/tmp_restore absolute_path_of_file_to_be_restored
    
  4. Restore the damaged or lost files from the temporary staging area as needed.
  5. Reboot the user domain.

2.14 Re-Imaging Oracle Exadata Database Servers

The re-image procedure is necessary when an Oracle Linux database server needs to be brought to an initial state for various reasons.

Some example scenarios for re-imaging the database server are:

  • You want to install a new server and need to use an earlier release than is in the image already installed on the server.
  • You need to replace a damaged database server with a new database server.
  • Your database server had multiple disk failures causing local disk storage failure and you do not have a database server backup.
  • You want to repurpose the server to a new rack.

During the re-imaging procedure, the other database servers on Oracle Exadata Database Machine are available. When the new server is added to the cluster, the software is copied from an existing database server to the new server. It is your responsibility to restore scripting, CRON jobs, maintenance actions, and non-Oracle software.

Note:

The procedures in this section assume the database is Oracle Database 11g Release 2 (11.2) or later. If the database is Oracle Database 11g Release 1 (11.1), then refer to the documentation for that release for information about adding and deleting a server from a cluster.

Starting with Oracle Exadata System Software release 19.1.0, Secure Eraser is automatically started during re-imaging if the hardware supports Secure Eraser. This significantly simplifies the re-imaging procedure while maintaining performance. Now, when re-purposing a rack, you only have to image the rack and the secure data erasure is taken care of transparently as part of the process.

The following tasks describe how to re-image an Oracle Exadata Database Server running Oracle Linux:

  1. Contact Oracle Support Services

  2. Download Latest Release of Cluster Verification Utility

  3. Remove the Database Server from the Cluster

  4. Image the Database Server

  5. Configure the Re-imaged Database Server

  6. Prepare the Re-imaged Database Server for the Cluster

  7. Apply Oracle Exadata System Software Patch Bundles to the Replacement Database Server

  8. Clone Oracle Grid Infrastructure to the Replacement Database Server

  9. Clone Oracle Database Homes to the Replacement Database Server

2.14.1 Contact Oracle Support Services

If a failed server is being replaced, open a support request with Oracle Support Services.

The support engineer will identify the failed server, and send a replacement. The support engineer will ask for the output from the imagehistory command run from a surviving database server. The output provides a link to the computeImageMaker file that was used to image the original database server, and provides a means to restore the system to the same level.

2.14.2 Download Latest Release of Cluster Verification Utility

The latest release of the cluster verification utility (cluvfy) is available from My Oracle Support.

See My Oracle Support note 316817.1 for download instructions and other information.

2.14.3 Remove the Database Server from the Cluster

If you are reimaging a failed server or repurposing a server, follow the steps in this task to remove the server from the cluster before you reimage it. If you are reimaging the server for a different reason, skip this task and proceed with the reimaging task next.

The steps in this task are performed using a working database server in the cluster. In the following commands, working_server is a working database server, and failed_server is the database server you are removing, either because it failed or it is being repurposed.

  1. Log in as the oracle user on a database server in the cluster.
    The oracle user refers to the operating system user that owns the Oracle Database software installation. The $ORACLE_HOME variable should point to the location of the database home.
  2. Disable the listener that runs on the failed server.
    $ srvctl disable listener -n failed_server
    $ srvctl stop listener -n failed_server
    
  3. Delete the Oracle home from the Oracle inventory.

    In the following command, list_of_working_servers is a list of the servers that are still working in the cluster, such as dm01db02, dm01db03, and so on.

    In the following command, replace /u01/app/oracle/product/12.1.0.2/dbhome_1 with the location of your Oracle Database home directory.

    $ cd $ORACLE_HOME/oui/bin
    $ ./runInstaller -updateNodeList \
      ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1 \
      "CLUSTER_NODES=list_of_working_servers"
    
  4. Log in as the grid user on the database server.
    The grid user refers to the operating system user that owns the Oracle Grid Infrastructure software installation. The $ORACLE_HOME variable should point to the location of the Grid home.
  5. Verify the failed server is unpinned.
    $ olsnodes -s -t
    

    The following is an example of the output from the command:

    dm01db01        Inactive        Unpinned
    dm01db02        Active          Unpinned
    
  6. Log in as the root user on the database server.
  7. Stop and delete the VIP resources for the failed database server.
    # srvctl stop vip -i failed_server-vip
    PRCC-1016 : failed_server-vip.example.com was already stopped
    
    # srvctl remove vip -i failed_server-vip
    Please confirm that you intend to remove the VIPs failed_server-vip (y/[n]) y
    
  8. Delete the server from the cluster.
    # crsctl delete node -n failed_server
    CRS-4661: Node dm01db01 successfully deleted.
    

    If you receive an error message similar to the following, then relocate the voting disks.

    CRS-4662: Error while trying to delete node dm01db01.
    CRS-4000: Command Delete failed, or completed with errors.
    

    To relocate the voting disks use the following steps:

    1. Determine the current location of the voting disks.
      # crsctl query css votedisk
      

      The following is an example of the output from the command. The current location is DBFS_DG.

      ##  STATE    File Universal Id          File Name                Disk group
      --  -----    -----------------          ---------                ----------
      1. ONLINE   123456789abab (o/192.168.73.102/DATA_CD_00_dm01cel07) [DBFS_DG]
      2. ONLINE   123456789cdcd (o/192.168.73.103/DATA_CD_00_dm01cel08) [DBFS_DG]
      3. ONLINE   123456789efef (o/192.168.73.100/DATA_CD_00_dm01cel05) [DBFS_DG]
      Located 3 voting disk(s).
      
    2. Relocate the voting disks to another disk group.
      # ./crsctl replace votedisk +DATA
      
      Successful addition of voting disk 2345667aabbdd.
      ...
      CRS-4266: Voting file(s) successfully replaced
      
    3. Relocate the voting disks to the original location using a command similar to the following:
      # ./crsctl replace votedisk +DBFS_DG
      
    4. Delete the server from the cluster.
  9. Log in as the grid user on the database server.
    The grid user refers to the operating system user that owns the Oracle Grid Infrastructure software installation. The $ORACLE_HOME variable should point to the location of the Grid home.
  10. Update the Oracle inventory.

    In the following command, replace /u01/app/12.1.0.2/grid with the location of your Oracle Grid Infrastructure home directory.

    $ cd $ORACLE_HOME/oui/bin
    $ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/12.1.0.2/grid \
      "CLUSTER_NODES=list_of_working_servers" CRS=TRUE
    
  11. Verify the server was deleted successfully.
    $ cluvfy stage -post nodedel -n failed_server -verbose
    

    The following is an example of the output from the command:

    Performing post-checks for node removal
    Checking CRS integrity...
    The Oracle clusterware is healthy on node "dm01db02"
    CRS integrity check passed
    Result:
    Node removal check passed
    Post-check for node removal was successful.

2.14.4 Image the Database Server

After the database server has been installed or replaced, you can image the new database server.

You can use installation media on a USB thumb drive, or a touchless option using PXE or ISO attached to the ILOM. See Imaging a New System in Oracle Exadata Database Machine Installation and Configuration Guide for the details.

2.14.5 Configure the Re-imaged Database Server

The re-imaged database server does not have any host names, IP addresses, DNS or NTP settings. The steps in this task describe how to configure the re-imaged database server.

You need the following information prior to configuring the re-imaged database server:

  • Name servers
  • Time zone, such as Americas/Chicago
  • NTP servers
  • IP address information for the management network
  • IP address information for the client access network
  • IP address information for the InfiniBand network
  • Canonical host name
  • Default gateway

The information should be the same on all database servers in Oracle Exadata Database Machine. The IP addresses can be obtained from DNS. In addition, a document with the information should have been provided when Oracle Exadata Database Machine was installed.

The following procedure describes how to configure the re-imaged database server:

  1. Power on the replacement database server. When the system boots, it automatically runs the Configure Oracle Exadata routine, and prompts for information.
  2. Enter the information when prompted, and confirm the settings. The startup process will continue.

Note:

  • If the database server does not use all network interfaces, then the configuration process stops, and warns that some network interfaces are disconnected. It prompts whether to retry the discovery process. Respond with yes or no, as appropriate for the environment.

  • If bonding is used for the client access network, then it is set in the default active-passive mode at this time.

2.14.6 Prepare the Re-imaged Database Server for the Cluster

This task describes how to ensure that the changes made during the initial installation are applied to the re-imaged, bare metal database server.

Note:

For Oracle VM systems, follow the procedure in Expanding an Oracle VM Oracle RAC Cluster on Exadata.
  1. Copy or merge the contents of the following files using files on a working database server as reference:
    1. Copy the contents of the /etc/security/limits.conf file.
    2. Merge the contents of the /etc/hosts files.
    3. Copy the /etc/oracle/cell/network-config/cellinit.ora file.
    4. Update the /etc/oracle/cell/network-config/cellinit.ora file with the IP_ADDRESS of the ifcfg-bondib0 interface (in case of active/passive bonding) or ib0 and ib1 interfaces (in case of active/active bonding) of the replacement server.
    5. Copy the /etc/oracle/cell/network-config/cellip.ora file.
      The content of the cellip.ora file should be the same on all database servers.
    6. Configure additional network requirements, such as 10 GbE.
    7. Copy the modprobe configuration.

      The contents of the configuration file should be the same on all database servers.

      • Oracle Linux 5 or 6: The file is located at /etc/modprobe.conf.
      • Oracle Linux 7: The file is located at /etc/modprobe.d/exadata.conf.
    8. Copy the /etc/sysctl.conf file.
      The contents of the file should be the same on all database servers.
    9. Update the cellroute.ora.

      Make a copy of the /etc/oracle/cell/network-config/cellroute.ora file. Modify the contents on the replacement server to use the local InfiniBand interfaces on the new node.

    10. Restart the database server so the network changes take effect.
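
    To spot-check that a copied file matches its source, you can compare checksums on the working and replacement servers. For example, run the following on both servers and compare the output:

    # md5sum /etc/oracle/cell/network-config/cellip.ora /etc/sysctl.conf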
  2. Set up the users for the software owners on the replacement database server by adding groups.
    If you are using role-separated management, then the users are usually oracle and grid. If you use a single software owner, then the user is usually oracle. The group information is available on a working database server.
    1. Obtain the current group information from a working database server.
      # id oracle
      uid=1000(oracle) gid=1001(oinstall) groups=1001(oinstall),1002(dba),1003(oper),1004(asmdba)
      
    2. Use the groupadd command to add the group information to the replacement database server.
      # groupadd -g 1001 oinstall
      # groupadd -g 1002 dba
      # groupadd -g 1003 oper
      # groupadd -g 1004 asmdba
      
    3. Obtain the current user information from a working database server.
      # id oracle
      uid=1000(oracle) gid=1001(oinstall) groups=1001(oinstall),1002(dba),1003(oper),1004(asmdba)
      
    4. Add the user information to the replacement database server.
      # useradd -u 1000 -g 1001 -G 1001,1002,1003,1004 -m -d /home/oracle -s \
        /bin/bash oracle
      
    5. Create the Oracle Base and Grid home directories, such as /u01/app/oracle and /u01/app/12.2.0.1/grid.
      # mkdir -p /u01/app/oracle
      # mkdir -p /u01/app/12.2.0.1/grid
      # chown -R oracle:oinstall /u01/app
      
    6. Change the ownership on the cellip.ora and cellinit.ora files.

      The ownership is usually oracle:oinstall.

      # chown -R oracle:oinstall /etc/oracle/cell/network-config
      
    7. Secure the restored database server.
      $ chmod u+x /opt/oracle.SupportTools/harden_passwords_reset_root_ssh
      $ /opt/oracle.SupportTools/harden_passwords_reset_root_ssh
      

      The database server restarts. Log in as the root user when prompted by the system. You are prompted for a new password. Set the password to match the root password of the other database servers.

    8. Set the password for the Oracle software owner.
      The owner is usually oracle.
      # passwd oracle
      
  3. Set up SSH for the oracle account.
    1. Log in to the oracle account on the replacement database server.
      # su - oracle
      
    2. Create the dcli group file on the replacement database server listing the servers in the Oracle cluster.
    3. Run the following command on the replacement database server.
      $ dcli -g dbs_group -l oracle -k
      
    4. Log in as the oracle user on the replacement database server.
      # su - oracle
      
    5. Verify SSH equivalency.
      $ dcli -g dbs_group -l oracle date
      
  4. Set up or copy any custom login scripts from a working database server to the replacement database server.

    In the following command, replacement_server is the name of the new server, such as dm01db01.

    $ scp .bash* oracle@replacement_server:. 
    

2.14.7 Apply Oracle Exadata System Software Patch Bundles to the Replacement Database Server

Oracle periodically releases Oracle Exadata System Software patch bundles for Oracle Exadata Database Machine.

If a patch bundle later than the release of the computeImageMaker file has been applied to the working database servers, then the patch bundle must also be applied to the replacement Oracle Exadata Database Server. Determine whether a patch bundle has been applied as follows:

  • Prior to Oracle Exadata System Software release 11.2.1.2.3, the database servers did not maintain version history information. To determine the release number, log in to Oracle Exadata Storage Server, and run the following command:

    imageinfo -ver
    

    If the command shows a different release than the release used by the computeImageMaker file, then an Oracle Exadata System Software patch has been applied to Oracle Exadata Database Machine and must also be applied to the replacement Oracle Exadata Database Server.

  • Starting with Oracle Exadata System Software release 11.2.1.2.3, the imagehistory command exists on the Oracle Exadata Database Server. Compare the information on the replacement Oracle Exadata Database Server to the information on a working Oracle Exadata Database Server. If the working database server has a later release, then apply the Oracle Exadata System Software patch bundle to the replacement Oracle Exadata Database Server.
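
For example, you can run the imagehistory command on both the replacement database server and a working database server and compare the most recent entries:

    imagehistory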

2.14.8 Clone Oracle Grid Infrastructure to the Replacement Database Server

This procedure describes how to clone Oracle Grid Infrastructure to the replacement database server.

In the following commands, working_server is a working database server, and replacement_server is the replacement database server. The commands in this procedure are run from a working database server as the Grid home owner. When the root user is needed to run a command, it will be called out.

  1. Verify the hardware and operating system installation using the cluster verification utility (cluvfy).
    $ cluvfy stage -post hwos -n replacement_server,working_server -verbose
    

    The phrase Post-check for hardware and operating system setup was successful should appear at the end of the report. If the cluster verification utility fails to validate the storage on the replacement server, you can ignore those messages.

  2. Verify peer compatibility.
    $ cluvfy comp peer -refnode working_server -n replacement_server  \
      -orainv oinstall -osdba dba | grep -B 3 -A 2 mismatched
    

    The following is an example of the output:

    Compatibility check: Available memory [reference node: dm01db02]
    Node Name Status Ref. node status Comment
    ------------ ----------------------- ----------------------- ----------
    dm01db01 31.02GB (3.2527572E7KB) 29.26GB (3.0681252E7KB) mismatched
    Available memory check failed
    Compatibility check: Free disk space for "/tmp" [reference node: dm01db02]
    Node Name Status Ref. node status Comment
    ------------ ----------------------- ---------------------- ----------
    dm01db01 55.52GB (5.8217472E7KB) 51.82GB (5.4340608E7KB) mismatched
    Free disk space check failed
    

    If the only failed components are related to the physical memory, swap space and disk space, then it is safe to continue.

  3. Perform the requisite checks for adding the server.
    1. Ensure the GRID_HOME/network/admin/samples directory has permissions set to 750.
    2. Validate the addition of the database server.

      Run the following command as the oracle user. The command prompts for the password of the root user.

      $ cluvfy stage -pre nodeadd -n replacement_server -fixup -method root -verbose
      Enter "ROOT" password:

      If the only failed component is related to swap space, then it is safe to continue.

      If the command returns an error, then set the following environment variable and rerun the command:

      $ export IGNORE_PREADDNODE_CHECKS=Y
      
  4. Add the replacement database server to the cluster.

    If you are using Oracle Grid Infrastructure release 12.1 or higher, include the CLUSTER_NEW_NODE_ROLES attribute, as shown in the following example.

    $ cd GRID_HOME/addnode
    
    $ ./addnode.sh -silent "CLUSTER_NEW_NODES={replacement_server}" \
         "CLUSTER_NEW_VIRTUAL_HOSTNAMES={replacement_server-vip}" \
         "CLUSTER_NEW_NODE_ROLES={hub}"
    

    The second command causes Oracle Universal Installer to copy the Oracle Clusterware software to the replacement database server. A message similar to the following is displayed:

    WARNING: A new inventory has been created on one or more nodes in this session.
    However, it has not yet been registered as the central inventory of this
    system. To register the new inventory please run the script at
    '/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes
    'dm01db01'. If you do not register the inventory, you may not be able to update
    or patch the products you installed.
    
    The following configuration scripts need to be executed as the "root" user in
    each cluster node:
    
    /u01/app/oraInventory/orainstRoot.sh #On nodes dm01db01
    
    /u01/app/12.1.0.2/grid/root.sh #On nodes dm01db01
    
  5. Run the configuration scripts.
    As the root user, first disable HAIP, then run the orainstRoot.sh and root.sh scripts on the replacement database server using the commands shown in the following example.
    # export HAIP_UNSUPPORTED=true
    # /u01/app/oraInventory/orainstRoot.sh
    Creating the Oracle inventory pointer file (/etc/oraInst.loc)
    Changing permissions of /u01/app/oraInventory.
    Adding read,write permissions for group.
    Removing read,write,execute permissions for world.
    Changing groupname of /u01/app/oraInventory to oinstall.
    The execution of the script is complete.
     
    # GRID_HOME/root.sh
    

    Note:

    Check the log files in GRID_HOME/install/ for the output of the root.sh script.

    If you are running Oracle Grid Infrastructure release 11.2, then the output file created by the script reports that the listener resource on the replaced database server failed to start. This is the expected output.

    /u01/app/11.2.0/grid/bin/srvctl start listener -n dm01db01 \
    ...Failed
    /u01/app/11.2.0/grid/perl/bin/perl \
    -I/u01/app/11.2.0/grid/perl/lib \
    -I/u01/app/11.2.0/grid/crs/install \
    /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

    After the scripts are run, the following message is displayed:

    The Cluster Node Addition of /u01/app/12.1.0.2/grid was successful.
    Please check '/tmp/silentInstall.log' for more details.
    
  6. Check the cluster.
    $ GRID_HOME/bin/crsctl check cluster -all
    
    **************************************************************
    node1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
  7. If you are running Oracle Grid Infrastructure release 11.2, then re-enable the listener resource.

    Run the following commands on the replacement database server.

    # GRID_HOME/bin/srvctl enable listener -l LISTENER \
      -n replacement_server

    # GRID_HOME/bin/srvctl start listener -l LISTENER  \
      -n replacement_server
  8. Start the disk groups on the replacement server.
    1. Check disk group status.

      In the following example, notice that disk groups are offline on the replacement server.

      $ crsctl stat res -t
      --------------------------------------------------------------------------------
      Name           Target  State        Server                   State details       
      --------------------------------------------------------------------------------
      Local Resources
      --------------------------------------------------------------------------------
      ora.DATAC1.dg
                     ONLINE  ONLINE       node1              STABLE
                     OFFLINE OFFLINE      node2              STABLE
      ora.DBFS_DG.dg
                     ONLINE  ONLINE       node1              STABLE
                     ONLINE  ONLINE       node2              STABLE
      ora.LISTENER.lsnr
                     ONLINE  ONLINE       node1              STABLE
                     ONLINE  ONLINE       node2              STABLE
      ora.RECOC1.dg
                     ONLINE  ONLINE       node1              STABLE
                     OFFLINE OFFLINE      node2              STABLE
      
    2. Run the START DISKGROUP command for each disk group that is offline on either the original server or the replacement server.
      $ srvctl start diskgroup -diskgroup dgname
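
      For example, based on the status output shown above (assuming the disk group names DATAC1 and RECOC1, with node2 as the replacement server):

      $ srvctl start diskgroup -diskgroup DATAC1 -node node2
      $ srvctl start diskgroup -diskgroup RECOC1 -node node2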

2.14.9 Clone Oracle Database Homes to the Replacement Database Server

The following procedure describes how to clone the Oracle Database homes to the replacement server.

Run the commands from a working database server as the oracle user. When the root user is needed to run a command, it will be called out.

  1. Add the Oracle Database ORACLE_HOME to the replacement database server using the following commands:
    $ cd /u01/app/oracle/product/12.1.0.2/dbhome_1/addnode
    
    $ ./addnode.sh -silent "CLUSTER_NEW_NODES={replacement_server}"
    

    The second command causes Oracle Universal Installer to copy the Oracle Database software to the replacement database server.

    WARNING: The following configuration scripts need to be executed as the "root"
    user in each cluster node.
    /u01/app/oracle/product/12.1.0.2/dbhome_1/root.sh #On nodes dm01db01
    To execute the configuration scripts:
    Open a terminal window.
    Log in as root.
    Run the scripts on each cluster node.
     

    After the scripts are finished, the following messages appear:

    The Cluster Node Addition of /u01/app/oracle/product/12.1.0.2/dbhome_1 was successful.
    Please check '/tmp/silentInstall.log' for more details.
    
  2. Run the following script on the replacement database server:
    # /u01/app/oracle/product/12.1.0.2/dbhome_1/root.sh
     

    Check the /u01/app/oracle/product/12.1.0.2/dbhome_1/install/root_replacement_server.com_date.log file for the output of the script.

  3. Run the Oracle Database Configuration Assistant (DBCA) in interactive mode to add database instances to the target nodes.
    1. Start up DBCA.

      $ cd /u01/app/oracle/product/12.1.0.2/dbhome_1/bin
      
      $ ./dbca
    2. On the Database Operation screen, select Instance Management. Click Next.

    3. On the Instance Operation screen, select Add an instance. Click Next.

    4. On the Database List screen, select the cluster database to which you want to add an instance.

    5. The List Instance screen displays the current instances. Click Next to add a new instance.

    6. The Add Instance screen displays the default name and the newly added node to the cluster. Accept the defaults and click Next.

    7. On the Summary screen, verify the plan and click Finish.

    8. On the Progress screen, watch for 100% completion.

    9. On the Finish screen, acknowledge the confirmation that the new instance was successfully added.

    Verify that the instance has been added:

    $ srvctl config database -db dbm01
    

    Verify the administrative privileges on the target node:

    $ cd /u01/app/oracle/product/12.1.0.2/dbhome_1/bin
    
    $ ./cluvfy comp admprv -o db_config -d /u01/app/oracle/product/12.1.0.2/dbhome_1 -n new_node
  4. Ensure the instance parameters are set for the replaced database instance. The following is an example for the CLUSTER_INTERCONNECTS parameter.
    SQL> SHOW PARAMETER cluster_interconnects
    
    NAME                                 TYPE        VALUE
    ------------------------------       --------    -------------------------
    cluster_interconnects                string
     
    SQL> ALTER SYSTEM SET cluster_interconnects='192.168.73.90' SCOPE=spfile SID='dbm1';
    
  5. Validate the configuration files as follows:
    • The Oracle_home/dbs/initSID.ora file points to the SPFILE in the Oracle ASM shared storage.

    • The password file copied to the Oracle_home/dbs directory has been renamed to orapwSID.

  6. Check any services that included this instance before, and ensure that the services are updated to include the replacement instance.
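
    For example, a minimal sketch (assuming, for illustration, a database named dbm01, a service named svc1, and instances dbm011 and dbm012; substitute your own names):

    $ srvctl status service -db dbm01
    $ srvctl modify service -db dbm01 -service svc1 -modifyconfig -preferred "dbm011,dbm012"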
  7. If this procedure was performed on Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

2.15 Changing Existing Elastic Configurations for Database Servers

Elastic configurations provide a flexible and efficient mechanism to change the server configuration of your Oracle Exadata Database Machine.

2.15.1 Adding a New Database Server to the Cluster

You can add a new database server to an existing Oracle Real Application Clusters (Oracle RAC) cluster running on Oracle Exadata Database Machine.

  1. Determine if the new database server needs to be re-imaged or upgraded.

    Check the image label of the database servers in the cluster to which you want to add the new database server.
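
    For example, you can check the image label on the existing database servers with a single command (assuming a dbs_group file listing the servers in the cluster):

    # dcli -g dbs_group -l root imageinfo -ver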

  2. Add the database server to the cluster by completing the following tasks:
  3. Download and run the latest version of Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata Database Machine.

2.15.2 Moving an Existing Database Server to a Different Cluster

You can repurpose an existing database server and move it to a different cluster within the same Oracle Exadata Rack.

  1. Remove the database server from the existing Oracle Real Application Clusters (Oracle RAC) cluster.
    1. Stop Oracle Grid Infrastructure on the database server.
      Grid_home/bin/crsctl stop crs
      
    2. Remove the database server from the cluster by completing the steps in Remove the Database Server from the Cluster.
  2. Determine if the database server that is being repurposed needs to be reimaged.

    Check the image label of the existing database servers in the cluster to which you want to add the database server. If the image label of the database server being added does not match the image label of the existing database servers in the cluster, then reimage the database server being added. Complete the following tasks:

    If an upgrade is required, the upgrade may be performed using patchmgr. See Updating Exadata Software for details.

  3. Add the database server to the cluster.
  4. Download and run the latest version of Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata Database Machine.

2.15.3 Dropping a Database Server from an Oracle RAC Cluster

You can remove a database server that is a member of an Oracle Real Application Clusters (Oracle RAC) cluster.

  1. Stop Oracle Grid Infrastructure on the database server to be removed.
    $ Grid_home/bin/crsctl stop crs
    
  2. Remove the database server from the cluster by completing the steps in Remove the Database Server from the Cluster.
  3. Download and run the latest Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata Database Machine.

2.16 Managing Quorum Disks for High Redundancy Disk Groups

This section contains the following subsections:

2.16.1 Overview of Quorum Disk Manager

The Quorum Disk Manager utility, introduced in Oracle Exadata release 12.1.2.3.0, helps you to manage the quorum disks.

The utility enables you to create an iSCSI quorum disk on two of the database nodes and store a voting file on those two quorum disks. These two additional voting files are used to meet the minimum requirement of five voting files for a high redundancy disk group. This feature is only applicable to Oracle Exadata racks that meet the following requirements:

  • The Oracle Exadata rack has fewer than five storage servers.

  • The Oracle Exadata rack has at least two database nodes.

  • The Oracle Exadata rack has at least one high redundancy disk group.

The feature allows for the voting files to be stored in a high redundancy disk group on Oracle Exadata racks with fewer than five storage servers due to the presence of two extra failure groups.

Without this feature, voting files are stored in a normal redundancy disk group on Exadata racks with fewer than five storage servers. This makes Oracle Grid Infrastructure vulnerable to a double partner storage server failure that results in the loss of the voting file quorum, in turn resulting in a complete cluster and database outage. Refer to My Oracle Support note 1339373.1 for restarting the clusterware and databases in this scenario.

The iSCSI quorum disk implementation has high availability because the IP addresses on ib0 and ib1 are highly available using RDS. The multipathing feature ensures the iSCSI quorum disk implementation will work seamlessly if a more flexible or isolated internal network configuration is implemented in the future.

Each iSCSI device shown in the figure below corresponds to a particular path to the iSCSI target. Each path corresponds to an InfiniBand port on the database node. For each multipath quorum disk device in an active-active system, there are two iSCSI devices, one for ib0 and the other for ib1.
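
For example, you can inspect the multipath and iSCSI devices on a database node with standard operating system tools:

# multipath -ll
# iscsiadm -m session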

Figure 2-1 Multipath Device Connects to Both iSCSI Devices in an Active-Active System


The feature is applicable to bare metal Oracle RAC clusters as well as Oracle VM Oracle RAC clusters. For Oracle VM Oracle RAC clusters, the quorum disk devices reside in the Oracle RAC cluster nodes which are Oracle VM user domains as shown in the following figure.

Figure 2-2 Quorum Disk Devices on Oracle VM Oracle RAC Cluster


Note that for pkey-enabled environments, the interfaces used for discovering the targets should be the pkey interfaces used for the Oracle Clusterware communication. These interfaces are listed using the following command:

Grid_home/bin/oifcfg getif | grep cluster_interconnect | awk '{print $1}'

The Quorum Disk Manager utility (quorumdiskmgr) is used to create and manage all the necessary components including the iSCSI configuration, the iSCSI targets, the iSCSI LUNs, and the iSCSI devices for implementing this feature.

2.16.1.1 Software Requirements for Quorum Disk Manager

To use this feature, the following releases are required:

  • Oracle Exadata software release 12.1.2.3.0 and above

  • Patch 23200778 for all Database homes

  • Oracle Grid Infrastructure 12.1.0.2.160119 with patches 22722476 and 22682752, or Oracle Grid Infrastructure 12.1.0.2.160419 and above

    Note that for new deployments, OEDA installs the patches automatically.

2.16.1.2 quorumdiskmgr Reference

The quorum disk manager utility (quorumdiskmgr) runs on each database server to enable you to create and manage iSCSI quorum disks on database servers. You use quorumdiskmgr to create, list, alter, and delete iSCSI quorum disks on database servers. The utility is installed on database servers when they are shipped.

This reference section contains the following topics:

2.16.1.2.1 Syntax for the Quorum Disk Manager Utility

The quorum disk manager utility is a command-line tool. It has the following syntax:

quorumdiskmgr --verb --object [--options] 

verb is an action performed on an object. It is one of: alter, create, delete, list.

object is an object on which the command performs an action.

options extend the use of a command combination to include additional parameters for the command.

When using the quorumdiskmgr utility, the following rules apply:

  • Verbs, objects, and options are case-sensitive except where explicitly stated.

  • Use the double quote character around the value of an option that includes spaces or punctuation.

2.16.1.2.2 quorumdiskmgr Objects
Object Description

config

A quorum disk configuration includes the owner and group of the ASM instance to which the iSCSI quorum disks will be added, and the list of network interfaces through which local and remote iSCSI quorum disks will be discovered.

target

A target is an endpoint on each database server that waits for an iSCSI initiator to establish a session and provides the required I/O data transfer.

device

A device is an iSCSI device created by logging into a local or remote target.

2.16.1.2.3 Creating a Quorum Disk Configuration (--create --config)

The --create --config action creates a quorum disk configuration. The configuration must be created before any targets or devices can be created.

Syntax

quorumdiskmgr --create --config [--owner owner --group group] --network-iface-list network-iface-list

Parameters

The following table lists the parameters for the --create --config action:

Parameter Description

owner

Specifies the owner of the ASM instance to which the iSCSI quorum disks will be added. This is an optional parameter. The default value is grid.

group

Specifies the group of the ASM instance to which the iSCSI quorum disks will be added. This is an optional parameter. The default value is dba.

network-iface-list

Specifies the list of network interface names through which the local and remote targets will be discovered.

Example

quorumdiskmgr --create --config --owner=oracle --group=dba --network-iface-list="ib0, ib1"
2.16.1.2.4 Creating a Target (--create --target)

The --create --target action creates a target that can be accessed by database servers with an InfiniBand IP address in the specified InfiniBand IP address list and will be used to create devices that will be added to the specified ASM disk group.

After a target is created, its asm-disk-group, host-name, and size attributes cannot be changed.

Syntax

quorumdiskmgr --create --target --asm-disk-group asm_disk_group --visible-to infiniband_ip_list [--host-name host_name] [--size size]

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group to which the device created from the target will be added. The value of asm-disk-group is not case-sensitive.

visible-to

Specifies a list of InfiniBand IP addresses. Database servers with an InfiniBand IP address in the list will have access to the target.

host-name

Specifies the host name of the database server on which quorumdiskmgr runs. The total length of asm-disk-group and host-name cannot exceed 26 characters. If the host name is too long, a shorter host name can be specified as long as a different host name is specified for each database server in the quarter rack.

This is an optional parameter. The default value is the host name of the database server on which quorumdiskmgr runs. The value of host-name is not case-sensitive.

size

Specifies the size of the target. This is an optional parameter. The default value is 128 MB.

Example

quorumdiskmgr --create --target --size=128MB --asm-disk-group=datac1 --visible-to="192.168.10.45, 192.168.10.46" --host-name=db01
2.16.1.2.5 Creating a Device (--create --device)

The --create --device action creates devices by discovering and logging into targets on database servers with an InfiniBand IP address in the specified list of IP addresses.

The created devices will be automatically discovered by the ASM instance with the owner and group specified during configuration creation.

Syntax

quorumdiskmgr --create --device --target-ip-list target_ip_list

Parameters

Parameter Description

target-ip-list

Specifies a list of InfiniBand IP addresses. quorumdiskmgr will discover targets on database servers with an InfiniBand IP address in the list and log into the targets to create devices.

Example

quorumdiskmgr --create --device --target-ip-list="192.168.10.45, 192.168.10.46"
2.16.1.2.6 Listing Quorum Disk Configurations (--list --config)

The --list --config action lists the quorum disk configurations.

Syntax

quorumdiskmgr --list --config

Sample Output

Owner: grid
Group: dba
ifaces: exadata_ib1 exadata_ib0
2.16.1.2.7 Listing Targets (--list --target)

The --list --target action lists the attributes of targets, including target name, size, host name, ASM disk group name, the list of IP addresses (a visible-to IP address list) indicating which database servers have access to the target, and the list of IP addresses (a discovered-by IP address list) indicating which database servers have logged into the target.

If an ASM disk group name is specified, the action lists all local targets created for the specified ASM disk group. Otherwise, the action lists all local targets created for quorum disks.

Syntax

quorumdiskmgr --list --target [--asm-disk-group asm_disk_group]

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group. quorumdiskmgr displays all local targets for this ASM disk group. The value of asm-disk-group is not case-sensitive.

Sample Output

Name: iqn.2015-05.com.oracle:QD_DATAC1_DB01 
Size: 128 MB 
Host name: DB01 
ASM disk group name: DATAC1 
Visible to: 192.168.10.48, 192.168.10.49, 192.168.10.46, 192.168.10.47 
Discovered by: 192.168.10.47, 192.168.10.46
2.16.1.2.8 Listing Devices (--list --device)

The --list --device action lists the attributes of devices, including device path, size, host name and ASM disk group name.

If only an ASM disk group name is specified, the action lists all the devices that have been added to the ASM disk group.

If only a host name is specified, the action lists all the devices created from the targets on the host.

If both an ASM disk group name and a host name are specified, the action lists a single device created from the target on the host and that has been added to the ASM disk group.

If neither an ASM disk group name nor a host name is specified, the action lists all quorum disk devices.

Syntax

quorumdiskmgr --list --device [--asm-disk-group asm_disk_group] [--host-name host_name]

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group to which devices have been added. The value of asm-disk-group is not case-sensitive.

host-name

Specifies the host name of the database server from whose targets devices are created. The value of host-name is not case-sensitive.

Sample Output

Device path: /dev/exadata_quorum/QD_DATAC1_DB01 
Size: 128 MB 
Host name: DB01 
ASM disk group name: DATAC1 

Device path: /dev/exadata_quorum/QD_DATAC1_DB02 
Size: 128 MB 
Host name: DB02
ASM disk group name: DATAC1
2.16.1.2.9 Deleting Configurations (--delete --config)

The --delete --config action deletes quorum disk configurations. The configurations can only be deleted when there are no targets or devices present.

Syntax

quorumdiskmgr --delete --config

2.16.1.2.10 Deleting Targets (--delete --target)

The --delete --target action deletes the targets created for quorum disks on database servers.

If an ASM disk group name is specified, the action deletes all local targets created for the specified ASM disk group. Otherwise, the action deletes all local targets created for quorum disks.

Syntax

quorumdiskmgr --delete --target [--asm-disk-group asm_disk_group]

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group. Local targets created for this disk group will be deleted. The value of asm-disk-group is not case-sensitive.

Example

quorumdiskmgr --delete --target --asm-disk-group=datac1
2.16.1.2.11 Deleting Devices (--delete --device)

The --delete --device action deletes quorum disk devices.

If only an ASM disk group name is specified, the action deletes all the devices that have been added to the ASM disk group.

If only a host name is specified, the action deletes all the devices created from the targets on the host.

If both an ASM disk group name and a host name are specified, the action deletes a single device created from the target on the host and that has been added to the ASM disk group.

If neither an ASM disk group name nor a host name is specified, the action deletes all quorum disk devices.

Syntax

quorumdiskmgr --delete --device [--asm-disk-group asm_disk_group] [--host-name host_name]

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group whose device you want to delete. The value of asm-disk-group is not case-sensitive.

host-name

Specifies the host name of the database server. Devices created from targets on this host will be deleted. The value of host-name is not case-sensitive.

Example

quorumdiskmgr --delete --device --host-name=db01
2.16.1.2.12 Changing Owner and Group Values (--alter --config)

The --alter --config action changes the owner and group configurations.

Syntax

quorumdiskmgr --alter --config --owner owner --group group

Parameters

Parameter Description

owner

Specifies the new owner for the quorum disk configuration. This parameter is optional. If not specified, the owner is unchanged.

group

Specifies the new group for the quorum disk configuration. This parameter is optional. If not specified, the group is unchanged.

Example

quorumdiskmgr --alter --config --owner=grid --group=dba
2.16.1.2.13 Changing the InfiniBand IP Addresses (--alter --target)

The --alter --target action changes the InfiniBand IP addresses of the database servers that have access to the local target created for the specified ASM disk group.

Syntax

quorumdiskmgr --alter --target --asm-disk-group asm_disk_group --visible-to infiniband_ip_list

Parameters

Parameter Description

asm-disk-group

Specifies the ASM disk group to which the device created from the target will be added. The value of asm-disk-group is not case-sensitive.

visible-to

Specifies a list of InfiniBand IP addresses. Database servers with an InfiniBand IP address in the list will have access to the target.

Example

quorumdiskmgr --alter --target --asm-disk-group=datac1 --visible-to="192.168.10.45, 192.168.10.47"

2.16.2 Add Quorum Disks to Database Nodes

You can add quorum disks to database nodes on an Oracle Exadata rack that contains a high redundancy disk group with fewer than 5 storage servers.

The example in this section creates quorum disks for a quarter rack with two database nodes: db01 and db02.

This is an active-active system. On both db01 and db02 there are two InfiniBand ports: ib0 and ib1.

The network interfaces to be used for communication with the iSCSI devices can be found using the following command:

$ oifcfg getif | grep cluster_interconnect | awk '{print $1}'

The IP address of each interface can be found using the following command:

ip addr show interface_name
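
For example, a minimal sketch that combines the two commands above to print each cluster interconnect interface together with its IPv4 address (it assumes the Oracle Grid Infrastructure environment is set so that oifcfg is on the PATH):

$ for iface in $(oifcfg getif | grep cluster_interconnect | awk '{print $1}'); do
    echo "$iface: $(ip addr show $iface | awk '/inet / {print $2}')"
  done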

The InfiniBand IP addresses for this example are as follows:

On db01:

  • Network interface: ib0, IP address: 192.168.10.45
  • Network interface: ib1, IP address: 192.168.10.46

On db02:

  • Network interface: ib0, IP address: 192.168.10.47
  • Network interface: ib1, IP address: 192.168.10.48

The Oracle ASM disk group to which the quorum disks will be added is DATAC1. The Oracle ASM owner is grid, and the user group is dba.

Initially, the voting files reside on a normal redundancy disk group RECOC1:

$ Grid_home/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   21f5507a28934f77bf3b7ecf88b26c47 (o/192.168.76.187;192.168.76.188/RECOC1_CD_00_celadm12) [RECOC1]
 2. ONLINE   387f71ee81f14f38bfbdf0693451e328 (o/192.168.76.189;192.168.76.190/RECOC1_CD_00_celadm13) [RECOC1]
 3. ONLINE   6f7fab62e6054fb8bf167108cdbd2f64 (o/192.168.76.191;192.168.76.192/RECOC1_CD_00_celadm14) [RECOC1]
Located 3 voting disk(s).
  1. Log into db01 and db02 as root.
  2. Run the quorumdiskmgr command with the --create --config options to create quorum disk configurations on both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --create --config --owner=grid --group=dba --network-iface-list="ib0, ib1"
    
  3. Run the quorumdiskmgr command with the --list --config options to verify that the configurations have been successfully created on both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --config
    
    Your output should resemble one of the following:
    • For Oracle Exadata System Software release 18.x or earlier, the output should look like this:

      Owner: grid
      Group: dba
      ifaces: exadata_ib1 exadata_ib0
      
    • If you have upgraded to Oracle Exadata System Software release 19.1.0 (or later) from an earlier release, then the output should look like this:

      Owner: grid 
      Group: dba 
      ifaces: exadata_ib0 
      Initiator name: iqn.1988-12.com.oracle:da96db61f86a
    • If you have a system that was imaged with Oracle Exadata System Software release 19.1.0 or later (not upgraded), then the Initiator name in the output shown above is followed by the IP address of the first interface defined during the --create --config command.
  4. Run the quorumdiskmgr command with the --create --target options to create a target on both db01 and db02 for Oracle ASM disk group DATAC1 and make the target visible to both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --create --target --asm-disk-group=datac1 
    --visible-to="192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48"
    
  5. Run the quorumdiskmgr command with the --list --target options to verify the target has been successfully created on both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --target
    
    1. If you are running Oracle Exadata System Software release 18.x or earlier, then:

      On db01, the output should look like:

      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB01 
      Size: 128 MB 
      Host name: DB01
      ASM disk group name: DATAC1 
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48
      Discovered by:
      

      On db02, the output should be:

      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB02 
      Size: 128 MB 
      Host name: DB02
      ASM disk group name: DATAC1 
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48
      Discovered by:
      
    2. If you are running Oracle Exadata System Software release 19.x or later, then:

      On db01, the output should look like:

      
      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB01 
      Size: 128 MB 
      Host name: DB01
      ASM disk group name: DATAC1 
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47,
      192.168.10.48, iqn.1988-12.com.oracle:ee657eb81b53, 
      iqn.1988-12.com.oracle:db357ba82b24
          

      On db02, the output should be:

      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB02
      Size: 128 MB
      Host name: DB02
      ASM disk group name: DATAC1
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47, 
      192.168.10.48, iqn.1988-12.com.oracle:ee657eb81b53,
      iqn.1988-12.com.oracle:db357ba82b24

      The output shows both IP addresses and initiator names in the Visible to list only if the system was upgraded from a release earlier than Oracle Exadata System Software release 19.1.0. Otherwise, the Visible to list shows only IP addresses.

  6. Run the quorumdiskmgr command with the --create --device options to create devices on both db01 and db02 from targets on both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --create --device --target-ip-list="192.168.10.45, 192.168.10.46,
     192.168.10.47, 192.168.10.48"
    
  7. Run the quorumdiskmgr command with the --list --device options to verify the devices have been successfully created on both db01 and db02.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --device
    

    On both db01 and db02, the output should look like:

    Device path: /dev/exadata_quorum/QD_DATAC1_DB01 
    Size: 128 MB 
    Host name: DB01
    ASM disk group name: DATAC1 
    
    Device path: /dev/exadata_quorum/QD_DATAC1_DB02 
    Size: 128 MB 
    Host name: DB02
    ASM disk group name: DATAC1
    
  8. Switch to the grid user on either db01 or db02.
  9. Set up the Oracle ASM environment (see the sketch after this procedure).
  10. Alter the asm_diskstring initialization parameter to add /dev/exadata_quorum/* to the existing string.
    SQL> alter system set asm_diskstring='o/*/DATAC1_*','o/*/RECOC1_*','/dev/exadata_quorum/*' scope=both sid='*';
    
  11. Verify the two quorum disk devices have been automatically discovered by Oracle ASM.
    SQL> set linesize 200
    SQL> col path format a50
    SQL> select inst_id, label, path, mode_status, header_status
    from gv$asm_disk where path like '/dev/exadata_quorum/%';
    

    The output should look like:

    INST_ID LABEL          PATH                                MODE_STATUS HEADER_STATUS
    ------- -------------- ----------------------------------  ----------- ---------
          1 QD_DATAC1_DB01 /dev/exadata_quorum/QD_DATAC1_DB01  ONLINE      CANDIDATE
          1 QD_DATAC1_DB02 /dev/exadata_quorum/QD_DATAC1_DB02  ONLINE      CANDIDATE
          2 QD_DATAC1_DB01 /dev/exadata_quorum/QD_DATAC1_DB01  ONLINE      CANDIDATE
          2 QD_DATAC1_DB02 /dev/exadata_quorum/QD_DATAC1_DB02  ONLINE      CANDIDATE
    
  12. Add the two quorum disk devices to a high redundancy Oracle ASM disk group.

    If there is no high redundancy disk group, create a high redundancy disk group and include the two new quorum disks. For example:

    SQL> CREATE DISKGROUP DATAC1 HIGH REDUNDANCY QUORUM FAILGROUP db01 DISK '/dev/exadata_quorum/QD_DATAC1_DB01' 
    QUORUM FAILGROUP db02 DISK '/dev/exadata_quorum/QD_DATAC1_DB02' ...
    

    If a high redundancy disk group already exists, add the two new quorum disks. For example:

    SQL> ALTER DISKGROUP datac1 ADD QUORUM FAILGROUP db01 DISK '/dev/exadata_quorum/QD_DATAC1_DB01' 
    QUORUM FAILGROUP db02 DISK '/dev/exadata_quorum/QD_DATAC1_DB02';
    
  13. Relocate the existing voting files from the normal redundancy disk group to the high redundancy disk group.
    $ Grid_home/bin/crsctl replace votedisk +DATAC1
    
  14. Verify the voting disks have been successfully relocated to the high redundancy disk group and that five voting files exist.
    $ Grid_home/bin/crsctl query css votedisk
    

    The output should show 3 voting disks from storage servers and 2 voting disks from database nodes:

    ## STATE File Universal Id File Name Disk group
    -- ----- ----------------- --------- ---------
    1. ONLINE ca2f1b57873f4ff4bf1dfb78824f2912 (o/192.168.10.42/DATAC1_CD_09_celadm12) [DATAC1]
    2. ONLINE a8c3609a3dd44f53bf17c89429c6ebe6 (o/192.168.10.43/DATAC1_CD_09_celadm13) [DATAC1]
    3. ONLINE cafb7e95a5be4f00bf10bc094469cad9 (o/192.168.10.44/DATAC1_CD_09_celadm14) [DATAC1]
     4. ONLINE   4dca8fb7bd594f6ebf8321ac23e53434 (/dev/exadata_quorum/QD_DATAC1_DB01) [DATAC1]
     5. ONLINE   4948b73db0514f47bf94ee53b98fdb51 (/dev/exadata_quorum/QD_DATAC1_DB02) [DATAC1]
    Located 5 voting disk(s).
    
  15. Move the Oracle ASM password file and the Oracle ASM SPFILE to the high redundancy disk group.
    1. Move the Oracle ASM password file:

      i) Get the source Oracle ASM password file location.

      $ asmcmd pwget --asm
      

      ii) Move the Oracle ASM password file to the high redundancy disk group.

      $ asmcmd pwmove --asm full_path_of_source_file full_path_of_destination_file
      

      Example:

      asmcmd pwmove --asm +recoc1/ASM/PASSWORD/pwdasm.256.898960531 +datac1/asmpwdfile
      
    2. Move the Oracle ASM SPFILE.

      i) Get the Oracle ASM SPFILE in use:

      $ asmcmd spget
      

      ii) Copy the Oracle ASM SPFILE to the high redundancy disk group.

      $ asmcmd spcopy full_path_of_source_file full_path_of_destination_file
      

      iii) Modify the Oracle Grid Infrastructure configuration to use the relocated SPFILE upon next restart.

      $ asmcmd spset full_path_of_destination_file
      

      For Oracle RAC, run the preceding commands from only one cluster node.

      At this point, if downtime can be tolerated, restart Oracle Grid Infrastructure using the following commands:

      # Grid_home/bin/crsctl stop crs
      # Grid_home/bin/crsctl start crs
      

      If downtime is not permitted, then repeat step 15.b each time an initialization parameter in the Oracle ASM SPFILE needs to be modified, until Oracle Grid Infrastructure is restarted.

  16. Relocate the MGMTDB to the high redundancy disk group.

    Move the MGMTDB (if running) to the high redundancy disk group using How to Move/Recreate GI Management Repository to Different Shared Storage (Diskgroup, CFS or NFS etc) (Doc ID 1589394.1).

    Configure the MGMTDB to not use hugepages using the steps below:

    export ORACLE_SID=-MGMTDB
    export ORACLE_HOME=$GRID_HOME
    sqlplus "sys as sysdba"
    SQL> alter system set use_large_pages=false scope=spfile sid='*';
    
  17. These steps are optional.
    1. Restart Oracle Grid Infrastructure.
      # Grid_home/bin/crsctl stop crs
      # Grid_home/bin/crsctl start crs
    2. Convert the normal redundancy disk group to a high redundancy disk group.
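
For step 9 of this procedure, the following is a minimal sketch of setting up the Oracle ASM environment before running the SQL commands. The grid home path and the +ASM1 instance name are assumptions; substitute the values for your environment:

$ export ORACLE_SID=+ASM1
$ export ORACLE_HOME=/u01/app/19.0.0.0/grid
$ $ORACLE_HOME/bin/sqlplus / as sysasm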

2.16.3 Recreate Quorum Disks

In certain circumstances, you might need to recreate a quorum disk.

Some examples of when you might need to recreate a quorum disk are:
  • When recreating a guest domU

  • If you deleted the quorum disks without first dropping the quorum disks from the Oracle ASM disk group

  1. Force drop the lost quorum disk (a concrete example follows these steps).
    ALTER DISKGROUP dg_name DROP QUORUM DISK disk_name FORCE;
  2. Follow the instructions in "Add Quorum Disks to Database Nodes" to add a new quorum disk.
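
For example, using the disk group and quorum disk names from the earlier examples in this chapter, the force drop in step 1 might look like the following (adjust the names for your environment):

SQL> ALTER DISKGROUP datac1 DROP QUORUM DISK QD_DATAC1_DB01 FORCE;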

2.16.4 Use Cases

2.16.4.1 New Deployments on Oracle Exadata 12.1.2.3.0 or Later

For new deployments on Oracle Exadata release 12.1.2.3.0 and above, OEDA implements this feature by default when all of the following requirements are satisfied:

  • The system has at least two database nodes and fewer than five storage servers.

  • You are running OEDA release February 2016 or later.

  • You meet the software requirements listed in Software Requirements for Quorum Disk Manager.

  • Oracle Database version is 11.2.0.4 or later.

  • The system has at least one high redundancy disk group.

If the system has three storage servers in place, then two quorum disks will be created on the first two database nodes of the cluster picked by OEDA.

If the system has four storage servers in place, then one quorum disk will be created on the first database node picked by OEDA.

2.16.4.2 Upgrading to Oracle Exadata Release 12.1.2.3.0 or Later

If the target Exadata system has fewer than five storage servers, at least one high redundancy disk group, and two or more database nodes, you can implement this feature manually using quorumdiskmgr.

2.16.4.3 Downgrading to a Pre-12.1.2.3.0 Oracle Exadata Release

Oracle Exadata releases earlier than 12.1.2.3.0 do not support quorum disks. If the environment has a quorum disk implementation in place, then you must remove the quorum disk configuration before rolling back from a release that supports quorum disks (release 12.1.2.3.0 or later) to a pre-12.1.2.3.0 release.

To remove quorum disk configuration, perform these steps:

  1. Ensure there is at least one normal redundancy disk group in place. If not, create one.

  2. Relocate the voting files to a normal redundancy disk group:

    $GI_HOME/bin/crsctl replace votedisk +normal_redundancy_diskgroup
    
  3. Drop the quorum disks from ASM. Run the following command for each quorum disk:

    SQL> alter diskgroup diskgroup_name drop quorum disk quorum_disk_name force;
    

    Wait for the rebalance operation to complete. The rebalance is complete when v$asm_operation returns no rows for the disk group (see the example query after these steps).

  4. Delete the quorum devices. Run the following command from each database node that has quorum disks in place:

    /opt/oracle.SupportTools/quorumdiskmgr --delete --device [--asm-disk-group asm_disk_group] [--host-name host_name]
    
  5. Delete the targets. Run the following command from each database node that has quorum disks in place:

    /opt/oracle.SupportTools/quorumdiskmgr --delete --target [--asm-disk-group asm_disk_group]
    
  6. Delete the configuration. Run the following command from each database node that has quorum disks in place:

    /opt/oracle.SupportTools/quorumdiskmgr --delete --config
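
For step 3, a query such as the following can be used to confirm that the rebalance has finished; this is a generic sketch, and it returns no rows once no rebalance or other disk group operation is running:

SQL> SELECT group_number, operation, state, est_minutes FROM v$asm_operation;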
    
2.16.4.4 Changing Elastic Configurations
2.16.4.4.1 Adding a Database Node

If the existing Oracle RAC cluster has fewer than two database nodes and fewer than five storage servers, and the voting files are not stored in a high redundancy disk group, then Oracle recommends adding quorum disks to the database node(s) and relocating the voting files to a high redundancy disk group.

Note:

The requirements listed in "Software Requirements for Quorum Disk Manager" must be met.

If the existing Oracle RAC cluster already has quorum disks in place, the quorum disks need to be made visible to the newly added node before adding the node to the Oracle RAC cluster using the addnode.sh procedure. Follow these steps to make the quorum disks visible to the node being added:

  1. Log in to the two database nodes that contain the quorum devices and retrieve the quorum disk iSCSI target configuration.
    /opt/oracle.SupportTools/quorumdiskmgr --list --target

    The output of this command should be similar to the following (where the host name is db01 and the disk group name is DATA):

    Name: iqn.2015-05.com.oracle:QD_DATA_DB01
    Host name: DB01
    ASM disk group name: DATA
    Size: 128 MB
    Visible to: IP_address1, IP_address2, IP_address3, IP_address4... IP_address2n
    Discovered by: IP_address1, IP_address2, IP_address3, IP_address4
    

    IP_address1 through IP_address2n refer to the IP addresses of the InfiniBand interfaces of all the existing cluster nodes, where n is the number of nodes in the cluster.

  2. Log in to the two database nodes that contain the quorum devices and modify the target on each of them to make it visible to the node being added.
    /opt/oracle.SupportTools/quorumdiskmgr --alter --target --asm-disk-group asm_diskgroupname --visible-to 'comma_delimited_list_of_IP_addresses_from_visibleToList_in_step_2_above, IP_addressX, IP_addressY'

    IP_addressX and IP_addressY in the previous command refer to the IP addresses of the two InfiniBand interfaces of the node being added (a concrete example appears after this procedure).

  3. Run /opt/oracle.SupportTools/quorumdiskmgr --list --target on the two database nodes that contain the quorum devices and make sure the two IP addresses of the node being added appear in the Visible to list.
  4. Log in as the root user on the node being added.
  5. Run the quorumdiskmgr command with the --create --config option to create quorum disk configurations.
    # /opt/oracle.SupportTools/quorumdiskmgr --create --config --owner=grid --group=dba --network-iface-list="ib0, ib1"
  6. Run the quorumdiskmgr command with the --list --config option to verify that the configurations have been successfully created on the node.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --config
    Owner: grid
    Group: dba
    ifaces: exadata_ib1 exadata_ib0
    
  7. Run the quorumdiskmgr command with the --create --device option to create the quorum devices on the node being added pointing to the targets for the existing quorum devices.
    # /opt/oracle.SupportTools/quorumdiskmgr --create --device --target-ip-list='comma_delimited_list_of_IP_addresses_from_step_3_above'
  8. Run the quorumdiskmgr command with the --list --device options to verify the existing quorum devices have been successfully discovered and are visible on the node being added.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --device

    On the newly added node, the output should be similar to the following and should match the output on the existing cluster nodes:

    Device path: /dev/exadata_quorum/QD_DATA_DB01
    Size: 128 MB
    Host name: DB01
    ASM disk group name: DATA

    Device path: /dev/exadata_quorum/QD_DATA_DB02
    Size: 128 MB
    Host name: DB02
    ASM disk group name: DATA
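
As a concrete illustration of step 2, if the two InfiniBand interfaces of the node being added had the hypothetical addresses 192.168.10.49 and 192.168.10.50, the command run on each of the two database nodes that host quorum devices might look like the following (all six addresses are examples; the first four stand in for the existing Visible to list):

/opt/oracle.SupportTools/quorumdiskmgr --alter --target --asm-disk-group=data --visible-to='192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48, 192.168.10.49, 192.168.10.50'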
2.16.4.4.2 Removing a Database Node

If the database node being removed did not host a quorum disk, then no action is required.

If the database node being removed hosted a quorum disk containing a voting file and there are fewer than five storage servers in the Oracle RAC cluster, then a quorum disk must be created on a different database node before the database node is removed. Follow these steps:

  1. Create a quorum disk on a database node that does not currently host a quorum disk.

    1. Log into db01 and db02 as root.

    2. Run the quorumdiskmgr command with the --create --config options to create quorum disk configurations on both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --create --config --owner=grid
       --group=dba --network-iface-list="ib0, ib1"
      
    3. Run the quorumdiskmgr command with the --list --config options to verify that the configurations have been successfully created on both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --list --config
      

      The output should look like:

      Owner: grid
      Group: dba
      ifaces: exadata_ib1 exadata_ib0
      
    4. Run the quorumdiskmgr command with the --create --target options to create a target on both db01 and db02 for ASM disk group DATAC1 and make the target visible to both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --create --target
       --asm-disk-group=datac1
       --visible-to="192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48"
      
    5. Run the quorumdiskmgr command with the --list --target options to verify the target has been successfully created on both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --list --target
      

      On db01, the output should look like:

      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB01 
      Size: 128 MB 
      Host name: DB01
      ASM disk group name: DATAC1 
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48
      Discovered by:
      

      On db02, the output should be:

      Name: iqn.2015-05.com.oracle:QD_DATAC1_DB02 
      Size: 128 MB 
      Host name: DB02
      ASM disk group name: DATAC1 
      Visible to: 192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48
      Discovered by:
      
    6. Run the quorumdiskmgr command with the --create --device options to create devices on both db01 and db02 from targets on both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --create --device
       --target-ip-list="192.168.10.45, 192.168.10.46, 192.168.10.47, 192.168.10.48"
      
    7. Run the quorumdiskmgr command with the --list --device options to verify the devices have been successfully created on both db01 and db02.

      # /opt/oracle.SupportTools/quorumdiskmgr --list --device
      

      On both db01 and db02, the output should look like:

      Device path: /dev/exadata_quorum/QD_DATAC1_DB01 
      Size: 128 MB 
      Host name: DB01
      ASM disk group name: DATAC1 
      Device path: /dev/exadata_quorum/QD_DATAC1_DB02 
      Size: 128 MB 
      Host name: DB02
      ASM disk group name: DATAC1
      
    8. Add the two quorum disk devices to a high redundancy ASM disk group.

      If there is no high redundancy disk group, create a high redundancy disk group and include the two new quorum disks. For example:

      SQL> create diskgroup DATAC1 high redundancy quorum failgroup db01 disk '/dev/exadata_quorum/QD_DATAC1_DB01' quorum failgroup db02 disk '/dev/exadata_quorum/QD_DATAC1_DB02' ...
      

      If a high redundancy disk group already exists, add the two new quorum disks. For example:

      SQL> alter diskgroup datac1 add quorum failgroup db01 disk '/dev/exadata_quorum/QD_DATAC1_DB01' quorum failgroup db02 disk '/dev/exadata_quorum/QD_DATAC1_DB02';
      
  2. Once the database node is removed, its voting file is relocated automatically to the quorum disk added in step 1.

2.16.4.4.3 Adding an Oracle Exadata Storage Server and Expanding an Existing High Redundancy Disk Group

When you add a storage server to a cluster that uses quorum disks, Oracle recommends relocating a voting file from a database node to the newly added storage server.

  1. Add the Exadata storage server. See Adding a Cell Node for details.

    In the example below, the new storage server added is called "celadm04".

  2. After the storage server is added, verify the new fail group from v$asm_disk.

    SQL> select distinct failgroup from v$asm_disk;
    FAILGROUP
    ------------------------------
    ADM01
    ADM02
    CELADM01
    CELADM02
    CELADM03
    CELADM04
    
  3. Verify at least one database node has a quorum disk containing a voting file.

    $ crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
    --  -----    -----------------                --------- ---------
     1. ONLINE   834ee5a8f5054f12bf47210c51ecb8f4 (o/192.168.12.125;192.168.12.126/DATAC5_CD_00_celadm01) [DATAC5]
     2. ONLINE   f4af2213d9964f0bbfa30b2ba711b475 (o/192.168.12.127;192.168.12.128/DATAC5_CD_00_celadm02) [DATAC5]
     3. ONLINE   ed61778df2964f37bf1d53ea03cd7173 (o/192.168.12.129;192.168.12.130/DATAC5_CD_00_celadm03) [DATAC5]
     4. ONLINE   bfe1c3aa91334f16bf78ee7d33ad77e0 (/dev/exadata_quorum/QD_DATAC5_ADM01) [DATAC5]
     5. ONLINE   a3a56e7145694f75bf21751520b226ef (/dev/exadata_quorum/QD_DATAC5_ADM02) [DATAC5]
    Located 5 voting disk(s).
    

    The example above shows there are two quorum disks with voting files on two database nodes.

  4. Drop one of the quorum disks.

    SQL> alter diskgroup datac5 drop quorum disk QD_DATAC5_ADM01;
    

    The voting file on the dropped quorum disk is relocated automatically to the newly added storage server by Oracle Grid Infrastructure as part of the voting file refresh. You can verify this as follows:

    $ crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
    --  -----    -----------------                --------- ---------
     1. ONLINE   834ee5a8f5054f12bf47210c51ecb8f4 (o/192.168.12.125;192.168.12.126/DATAC5_CD_00_celadm01) [DATAC5]
     2. ONLINE   f4af2213d9964f0bbfa30b2ba711b475 (o/192.168.12.127;192.168.12.128/DATAC5_CD_00_celadm02) [DATAC5]
     3. ONLINE   ed61778df2964f37bf1d53ea03cd7173 (o/192.168.12.129;192.168.12.130/DATAC5_CD_00_celadm03) [DATAC5]
     4. ONLINE   a3a56e7145694f75bf21751520b226ef (/dev/exadata_quorum/QD_DATAC5_ADM02) [DATAC5]
     5. ONLINE   ab5aefd60cf84fe9bff6541b16e33787 (o/192.168.12.131;192.168.12.132/DATAC5_CD_00_celadm04) [DATAC5]
    
2.16.4.4.4 Removing an Oracle Exadata Storage Server

If removing a storage server reduces the number of storage servers used by the Oracle RAC cluster to fewer than five, and the voting files reside in a high redundancy disk group, then Oracle recommends adding quorum disks to the database nodes, if they are not already in place.

Prior to removing the storage server, add the quorum disks so that five copies of the voting files are available immediately after removing the storage server.

2.16.5 Reconfigure Quorum Disk After Restoring a Database Server

After restoring a database server, lvdisplay shows the quorum disk was not restored.

When you restore a database server, Exadata image rescue mode restores the layout of disks and file systems, with the exception of custom partitions, which include the quorum disks. These must be recreated after the database server is restored from backup.

The logical volumes created for quorum disks are in /dev/VGExaDb and have the name-prefix LVDbVd*.

  1. Using the configuration backed up under /etc/lvm/archive, make a logical volume (LV) for the quorum disk on every node.

    For example, you would use a command similar to the following, but using the values from the backup configuration information (see also the sketch after this procedure for locating those values).

    # lvcreate -L 128MB -n <LVName> VGExaDb
  2. Reboot all database servers.
  3. After the server restarts, verify the quorum disks were restored.
    # /opt/oracle.SupportTools/quorumdiskmgr --list --config
    Owner: grid
    Group: dba
    ifaces: exadata_ib1 exadata_ib0
    
    # /opt/oracle.SupportTools/quorumdiskmgr --list --target
    Name: iqn.2015-05.com.oracle:QD_DATAC1_DB01
    Host name: DB01
    ASM disk group name: DATAC1
    Size: 128 MB
    Visible to: 192.168.10.45, 192.168.10.46
    Discovered by: 192.168.10.45, 192.168.10.46
    
    # /opt/oracle.SupportTools/quorumdiskmgr --list --device
    Device path: /dev/exadata_quorum/QD_DATAC1_DB01
    Host name: DB01
    ASM disk group name: DATAC1
    Size: 128 MB
    
    Device path: /dev/exadata_quorum/QD_DATAC1_DB02
    Host name: DB02
    ASM disk group name: DATAC1
    Size: 128 MB
  4. Query the voting disks for the cluster to see if all voting disks are available.
    # crsctl query css votedisk
    ##  STATE    File Universal Id                
      File Name                               Disk group
    --  -----    -----------------                
      ------------------------------------    -----------
     1. ONLINE   ca2f1b57873f4ff4bf1dfb78824f2912 
      (o/192.168.10.42/DATAC1_CD_09_celadm12) [DATAC1]
     2. ONLINE   a8c3609a3dd44f53bf17c89429c6ebe6 
    (o/192.168.10.43/DATAC1_CD_09_celadm13)   [DATAC1]
     3. ONLINE   4948b73db0514f47bf94ee53b98fdb51  
    (/dev/exadata_quorum/QD_DATAC1_DB02) [DATAC1]
     4. ONLINE   cafb7e95a5be4f00bf10bc094469cad9  
    (o/192.168.10.44/DATAC1_CD_09_celadm14) [DATAC1]
    Located 4 voting disk(s).

    Notice that there is one voting disk missing, for the recovered database server (DB01). If you query V$ASM_DISK, you can see that the quorum disk was offlined by the recovery process.

    SQL> set line 200
     col LABEL for a20
     col path for a30
     col mode_status for a20
     col header_status for a30
     SELECT label, path, mode_status, header_status, mount_status 
     FROM v$asm_disk
     WHERE path LIKE '/dev/%';
    
    LABEL                PATH                           MODE_STATUS          
    HEADER_STATUS                  MOUNT_S
    -------------------- ------------------------------ --------------------
    ------------------------------ -------
    QD_DATAC1_DB01       /dev/exadata_quorum/QD_DATAC1_ ONLINE              
    CANDIDATE                      CLOSED
    
    QD_DATAC1_DB02       /dev/exadata_quorum/QD_DATAC1_ ONLINE              
    MEMBER                         CACHED
  5. Drop the unavailable quorum disk from the Oracle ASM disk group using the FORCE option.
    SQL> alter diskgroup DATAC1 drop quorum disk QD_DATAC1_DB01 force;
  6. Add the same quorum disk to the Oracle ASM disk group.
    SQL> alter diskgroup DATAC1 add quorum failgroup DB01 
    disk '/dev/exadata_quorum/QD_DATAC1_DB01';
  7. Requery V$ASM_DISK to verify both quorum disks are available.
    SQL> SELECT label, path, mode_status, header_status, mount_status 
     FROM v$asm_disk
     WHERE path LIKE '/dev/%';
    
    LABEL                PATH                           MODE_STATUS          
    HEADER_STATUS                  MOUNT_S
    -------------------- ------------------------------ --------------------
    ------------------------------ -------
    QD_DATAC1_DB01       /dev/exadata_quorum/QD_DATAC1_ ONLINE              
    MEMBER                         CACHED
    
    QD_DATAC1_DB02       /dev/exadata_quorum/QD_DATAC1_ ONLINE              
    MEMBER                         CACHED
  8. Query the voting disks for the cluster to verify all voting disks are now available.
    # crsctl query css votedisk
    ##  STATE    File Universal Id                
      File Name                               Disk group
    --  -----    -----------------                
      ------------------------------------    -----------
     1. ONLINE   ca2f1b57873f4ff4bf1dfb78824f2912 
      (o/192.168.10.42/DATAC1_CD_09_celadm12) [DATAC1]
     2. ONLINE   a8c3609a3dd44f53bf17c89429c6ebe6 
    (o/192.168.10.43/DATAC1_CD_09_celadm13)   [DATAC1]
     3. ONLINE   4948b73db0514f47bf94ee53b98fdb51  
    (/dev/exadata_quorum/QD_DATAC1_DB02) [DATAC1]
     4. ONLINE   cafb7e95a5be4f00bf10bc094469cad9  
    (o/192.168.10.44/DATAC1_CD_09_celadm14) [DATAC1]
     5. ONLINE   4dca8fb7bd594f6ebf8321ac23e53434  
    (/dev/exadata_quorum/QD_DATAC1_DB01) [DATAC1]
    Located 5 voting disk(s).
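
For step 1 of this procedure, the archived LVM metadata under /etc/lvm/archive is plain text, so a command like the following can help locate the archive files that record the original quorum logical volume name and size (LVDbVd is the name prefix noted above; the files returned will vary by system):

# grep -l LVDbVd /etc/lvm/archive/*.vg

Use the logical volume name and size recorded there in the lvcreate command shown in step 1.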

2.17 Using vmetrics

The vmetrics package enables you to display system statistics gathered by the vmetrics service. You can access the system statistics from dom0 or domU. The vmetrics service runs on dom0, collects the statistics, and pushes them to the xenstore. This allows the domUs to access the statistics.

System statistics collected by the vmetrics service are shown below, with sample values:

com.sap.host.host.VirtualizationVendor=Oracle Corporation;

com.sap.host.host.VirtProductInfo=Oracle VM 3;

com.sap.host.host.PagedInMemory=0;

com.sap.host.host.PagedOutMemory=0;

com.sap.host.host.PageRates=0;

com.sap.vm.vm.uuid=2b80522b-060d-47ee-8209-2ab65778eb7e;

com.sap.host.host.HostName=scac10adm01.example.com;

com.sap.host.host.HostSystemInfo=scac10adm01;

com.sap.host.host.NumberOfPhysicalCPUs=24;

com.sap.host.host.NumCPUs=4;

com.sap.host.host.TotalPhyMem=98295;

com.sap.host.host.UsedVirtualMemory=2577;

com.sap.host.host.MemoryAllocatedToVirtualServers=2577;

com.sap.host.host.FreeVirtualMemory=29788;

com.sap.host.host.FreePhysicalMemory=5212;

com.sap.host.host.TotalCPUTime=242507.220000;

com.sap.host.host.Time=1453150151;

com.sap.vm.vm.PhysicalMemoryAllocatedToVirtualSystem=8192;

com.sap.vm.vm.ResourceMemoryLimit=8192;

com.sap.vm.vm.TotalCPUTime=10160.1831404;

com.sap.vm.vm.ResourceProcessorLimit=4;

2.17.1 Installing and Starting the vmetrics Service

To install the vmetrics service, run the install.sh script as the root user on dom0:

[root@scac10adm01]# cd /opt/oracle.SupportTools/vmetrics
[root@scac10adm01]# ./install.sh

The install.sh script verifies that it is running on dom0, stops any vmetrics services currently running, copies the package files to /opt/oracle.vmetrics, and copies vmetrics.svc to /etc/init.d.

To start the vmetrics service on dom0, run the following command as the root user on dom0:

[root@scac10adm01 vmetrics]# service vmetrics.svc start

The commands to gather the statistics are run every 30 seconds.

2.17.2 Files in the vmetrics Package

The vmetrics package contains the following files:

File Description

install.sh

This file installs the package.

vm-dump-metrics

This script reads the statistics from the xenstore and displays them in XML format.

vmetrics

This Python script runs the system commands and uploads them to the xenstore. The system commands are listed in the vmetrics.conf file.

vmetrics.conf

This XML file specifies the metrics that the dom0 should push to the xenstore, and the system commands to run for each metric.

vmetrics.svc

The init.d file that makes vmetrics a Linux service.

2.17.3 Displaying the Statistics

Once the statistics have been pushed to the xenstore, you can view the statistics on dom0 and domU by running either of the following commands:

Note:

On domUs, ensure that the xenstoreprovider and ovmd packages are installed.

xenstoreprovider is the library which communicates with the ovmapi kernel infrastructure.

ovmd is a daemon that handles configuration and reconfiguration events and provides a mechanism to send/receive messages between the VM and the Oracle VM Manager.

The following command installs the necessary packages on Oracle Linux 5 and 6 to support the Oracle VM API.

# yum install ovmd xenstoreprovider
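
To confirm that both packages are present on the domU before querying the statistics, a quick check such as the following can be used (a sketch, assuming an RPM-based guest):

# rpm -q ovmd xenstoreprovider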
  • The /usr/sbin/ovmd -g vmhost command displays the statistics on one line. The sed command breaks up the line into multiple lines, one statistic per line. You need to run this command as the root user.

    [root@scac10db01vm04 ~]# /usr/sbin/ovmd -g vmhost |sed 's/; */;\n/g;s/:"/:"\n/g'
    com.sap.host.host.VirtualizationVendor=Oracle Corporation;
    com.sap.host.host.VirtProductInfo=Oracle VM 3;
    com.sap.host.host.PagedInMemory=0;
    com.sap.host.host.PagedOutMemory=0;
    com.sap.host.host.PageRates=0;
    com.sap.vm.vm.uuid=2b80522b-060d-47ee-8209-2ab65778eb7e;
    com.sap.host.host.HostName=scac10adm01.example.com;
    com.sap.host.host.HostSystemInfo=scac10adm01;
    com.sap.host.host.NumberOfPhysicalCPUs=24;
    com.sap.host.host.NumCPUs=4;
    ...
    
  • The vm-dump-metrics command displays the metrics in XML format.

    [root@scac10db01vm04 ~]# ./vm-dump-metrics
    <metrics>
    <metric type='real64' context='host'>
    <name>TotalCPUTime</name>
    <value>242773.600000</value>
    </metric>
    <metric type='uint64' context='host'>
    <name>PagedOutMemory</name>
    <value>0</value>
    </metric>
    ...
    

    Note that you have to copy the vm-dump-metrics command to the domUs from which you want to run the command.

2.17.4 Adding Metrics to vmetrics

You can add your own metric to be collected by the vmetrics service.

  1. In /opt/oracle.SupportTools/vmetrics/vmetrics.conf, add the new metric and the system commands to retrieve and parse that metric. For example:
    <metric type="uint32" context="host">
     <name>NumCPUs</name>
     <action>grep -c processor /proc/cpuinfo</action>
     <action2>xm list | grep '^Domain-0' |awk '{print $4}'</action2>
    </metric>
    

    In the <name> element, enter the name of the new metric.

    In the <action> and <action2> elements, specify the system command for the new metric. You only need to have <action2>, but you can use <action> as a fallback in case <action2> does not work on some systems.

    Note that any action that needs the name of the VM should use the dummy name scas07client07vm01. When vmetrics runs, it swaps out this dummy name for the actual domU names that are running in the dom0.

  2. In /opt/oracle.SupportTools/vmetrics/vmetrics, add the metric in the list gFieldsList. Prefix the metric name with "host" if the metric is about the host (dom0) or with "vm" if the metric is about the vm (domU). For example:

    Suppose the gFieldsList looks like this:

    gFieldsList = [ 'host.VirtualizationVendor',
        'host.VirtProductInfo',
        'host.PagedInMemory',
        'vm.ResourceProcessorLimit' ]
    

    If you are adding a new metric called "NumCPUs" (as shown in the example in step 1), and this metric is intended to tell the domU how many CPUs the dom0 has available, then gFieldsList would now look like:

     gFieldsList = [ 'host.VirtualizationVendor',
        'host.VirtProductInfo',
        'host.PagedInMemory',
        'vm.ResourceProcessorLimit',
        'host.NumCPUs']
    
  3. (optional) In /opt/oracle.SupportTools/vmetrics/vm-dump-metrics, add the new metric if you want the new metric to be included in the XML output.

    If you skip this step, you can view the new metric using the ovmd -g vmhost command.
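
For example, after the next collection interval you can filter the ovmd output on a domU for the new metric. This sketch reuses the ovmd and sed invocation shown earlier and assumes the NumCPUs metric from step 1:

# /usr/sbin/ovmd -g vmhost | sed 's/; */;\n/g' | grep NumCPUs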