5.20.2.3 Recovering a Management Domain and Its User Domains (Release 18.1 and X7 and Later)

You can recover a management domain from a snapshot-based backup when severe disaster conditions damage the management domain, or when the server hardware is replaced to such an extent that it amounts to new hardware.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the directory /export is exported as an NFS share, place the mybackup.tar.bz2 file in the /export directory.
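
    For example, a minimal sketch of the export on a Linux NFS server (recovery_target_ip is a placeholder for the recovery target's address, and the export options may differ in your environment):

    # echo '/export recovery_target_ip(ro,no_root_squash)' >> /etc/exports
    # exportfs -a
    # cp mybackup.tar.bz2 /export/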

  2. Restart the recovery target system using the diagnostics.iso file.
    See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
  3. Log in to the diagnostics shell as the root user.
    When prompted, enter the diagnostics shell.

    For example:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials 
    from Oracle support to login (reboot or power cycle to exit
    the shell),
    (r)estore system from NFS backup archive, 

    Type e to enter the diagnostics shell and log in as the root user.

    If prompted, log in to the system as the root user. If you are prompted for the root user password and do not have it, then contact Oracle Support Services.
  4. If required, use /opt/MegaRAID/storcli/storcli64 (or /opt/MegaRAID/MegaCli/MegaCli64 for releases earlier than Oracle Exadata System Software 19c) to configure the disk controller and set up the disks.
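    For example, a hedged storcli64 sketch (the controller number /c0, enclosure 252, slot range 0-7, and RAID level are placeholders; the required virtual drive layout depends on your server model):

    # /opt/MegaRAID/storcli/storcli64 /c0 show
    # /opt/MegaRAID/storcli/storcli64 /c0 add vd type=raid5 drives=252:0-7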
  5. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda3 --force
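
    To confirm that nothing remains, you can list the physical volumes and volume groups (an optional check, not part of the original sequence; neither command should report a VGExaDb entry):

    # lvm pvs
    # lvm vgs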
  6. Remove the existing partitions, then verify all partitions were removed.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) print 
    Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name     Flags
     1      32.8kB  537MB   537MB   ext4         primary  boot
     2      537MB   805MB   268MB   fat32        primary  boot
     3      805MB   4193GB  4192GB               primary  lvm
    
    (parted) rm 1
    [ 1730.498593]  sda: sda2 sda3 
    (parted) rm 2 
    [ 1736.203794]  sda: sda3
    
    (parted) rm 3 
    [ 1738.546845]  sda:
    (parted) print
     Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start  End  Size  File system  Name  Flags
    
    (parted) q 
    Information: You may need to update /etc/fstab.
  7. Create the three partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a running management domain (dom0) and store it in a variable:
      # end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
      # end_sector=$( expr $end_sector_logical - 34 )
      # echo $end_sector

      The values for the start and end sectors in the following commands were taken from an existing management domain. Because these values can change over time, check them against an existing dom0. For example, for an Oracle Exadata X7-2 database server with 8 hard disk drives, you might see the following:

      # parted -s /dev/sda unit s print
      Model: AVAGO MR9361-16i (scsi)
      Disk /dev/sda: 8189440000s
      Sector size (logical/physical): 512B/512B
      Partition Table: gpt
      
      Number  Start     End          Size         File system  Name     Flags
       1      64s       1048639s     1048576s     ext4         primary  boot
       2      1048640s  1572927s     524288s      fat32        primary  boot
       3      1572928s  8189439966s  8187867039s               primary  lvm
      

      Note:

      The s (sector) values in the following sub-steps are based on a system with 8 hard disk drives. If you have 4 hard disk drives, then view the partition table from the management domain on a running node and adjust the sector values accordingly.
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the EFI boot partition, /dev/sda2.
      # parted -s /dev/sda mkpart primary fat32 1048640s 1572927s set 2 boot on
    4. Create the partition that will hold the LVMs, /dev/sda3.
      # parted -s /dev/sda mkpart primary 1572928s 8189439966s set 3 lvm on
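
      As an optional sanity check, reprint the new partition table and compare the sector boundaries against the reference dom0:

      # parted -s /dev/sda unit s print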
  8. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create the file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda3
      # lvm vgcreate VGExaDb /dev/sda3
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for swap space, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys2
    5. Create the logical volume for the reserved partition, which is needed for creating snapshots.
      # lvm lvcreate -n LVDoNotRemoveOrUse -L 1G VGExaDb

      Note:

      Do not create any file system on this logical volume.
    6. Create the logical volume for the guest storage repository.
      # lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
      
    7. Create a file system on the /dev/sda1 partition, label it, and use tune2fs -l to verify the file system settings.
      # mkfs.ext4 /dev/sda1
      # e2label /dev/sda1 BOOT
      # tune2fs -l /dev/sda1
    8. Create a file system on the /dev/sda2 partition, and label it.
      # mkfs.vfat -v -c -F 32 -s 2 /dev/sda2
      # dosfslabel /dev/sda2 ESP
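
      Optionally, verify the logical volumes and file system labels before mounting (an illustrative check; sizes and output will vary):

      # lvm lvs -o lv_name,lv_size VGExaDb
      # e2label /dev/sda1
      # dosfslabel /dev/sda2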
  9. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt
    • /dev/sda1 on /mnt/boot
    • /dev/sda2 on /mnt/boot/efi

    The following example mounts the root (/) file system, and creates three mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext4
    # mkdir /mnt/boot/efi
    # mount /dev/sda2 /mnt/boot/efi -t vfat
    
  10. Bring up the network on eth0 and (if not using DHCP) assign the host's IP address and netmask to it.

    If you are using DHCP, then you do not need to manually configure the IP address for the host.

    # ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
    # ip link set up eth0
    # ip route add default via gateway_ip_address dev eth0
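
    To confirm that the NFS server is reachable before proceeding (a quick optional check, using the nfs_ip placeholder from step 1):

    # ping -c 3 nfs_ip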
  11. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  12. From the backup created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root (/) and boot file systems.
    # tar -pjxvf /root/mnt/mybackup.tar.bz2 -C /mnt
  13. Use the efibootmgr command to set the boot device.
    1. Disable and delete the Oracle Linux boot device. If you see the entry ExadataLinux_1, then remove this entry and recreate it.

      For example:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 000F,0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000D,000E
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000D* Oracle Linux
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      

      In this example, you would disable and remove Oracle Linux (Boot000D) and ExadataLinux_1 (Boot0000). Use commands similar to the following to disable and delete the boot devices:

      Disable 'Oracle Linux':
      # efibootmgr -b 000D -A
      Delete 'Oracle Linux':
      # efibootmgr -b 000D -B
      Disable old 'ExadataLinux_1':
      # efibootmgr -b 0000 -A
      Delete old 'ExadataLinux_1':
      # efibootmgr -b 0000 -B

    2. Recreate the boot entry for ExadataLinux_1 and then view the boot order entries.
      # efibootmgr -c -d /dev/sda -p 2 -l '\EFI\XEN\XEN.EFI' -L 'ExadataLinux_1'
      
      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      Boot0000* ExadataLinux_1

      In the output from the efibootmgr command, make note of the boot order number for ExadataLinux_1 and use that value in the following commands:

      # efibootmgr -b (entry number) -A
      # efibootmgr -b (entry number) -a

      For example, in the previous output, ExadataLinux_1 is listed as Boot0000. So you would use the following commands:

      # efibootmgr -b 0000 -A
      # efibootmgr -b 0000 -a
    3. Set the correct boot order.
      Set ExadataLinux_1 as the first boot device. The remaining devices should stay in the same boot order, except for USB:SUN, which should be last.
      # efibootmgr -o 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      

      The boot order should now look like the following:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
    4. Check the boot order using the ubiosconfig command.
      # ubiosconfig export all -x /tmp/ubiosconfig.xml
      Make sure the ExadataLinux_1 entry is the first child element of boot_order.
       <boot_order>
          <boot_device>
            <description>ExadataLinux_1</description>  
            <instance>1</instance>
          </boot_device>
          <boot_device>
            <description>NET0:PXE IP4 Intel(R) I210 Gigabit  Network
      Connection</description>
            <instance>1</instance>
          </boot_device>
      ...
  14. Check the restored /etc/fstab file (now located at /mnt/etc/fstab) and comment out any reference to /EXAVMIMAGES.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES.
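    For example, a minimal sed sketch that comments out every uncommented matching line (review the file afterward to confirm the result):

    # sed -i '/^[^#].*EXAVMIMAGES/ s/^/#/' /mnt/etc/fstab
    # grep EXAVMIMAGES /mnt/etc/fstab
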

  15. Detach the diagnostics.iso file.

    Using the ILOM Web interface, navigate to the Storage Devices dialog and click Disconnect.

    The Storage Devices dialog is the interface that you earlier used to attach the diagnostics.iso image. See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.

  16. Unmount the restored file systems and the NFS share, so that /dev/sda1 can be remounted on /boot when the system restarts.
    # umount /mnt/boot/efi
    # umount /mnt/boot
    # umount /mnt
    # umount /root/mnt
  17. Restart the system.
    # shutdown -r now

    This completes the restoration procedure for the management domain (dom0).

  18. Convert to Eighth Rack, if required.

    If the recovery is on an Oracle Exadata Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

  19. When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
  20. Mount the OCFS2 partition on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
  21. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages, which you commented out earlier.
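    For example, a minimal sed sketch that removes the leading comment character from the matching lines (verify the result before remounting or rebooting):

    # sed -i '/EXAVMIMAGES/ s/^#//' /etc/fstab
    # grep EXAVMIMAGES /etc/fstab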
  22. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  23. Restore the /EXAVMIMAGES file system.

    To restore all user domains, use this command:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  24. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
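
    After starting the domains, you can confirm that they are running with xm list (an optional check; domain names and states depend on your configuration):

    # xm list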

At this point all the user domains should come up along with Oracle Grid Infrastructure and the Oracle Database instances.