6.19.2.1.2 Recover the KVM Host on Exadata X9M-2

This procedure describes how to recover the KVM host on an Oracle Exadata X9M-2 database server.

  1. Boot the server and use the system BIOS menus to check the disk controller status. If required, configure the disk controller and set up the disks.
  2. Boot the server in diagnostic mode.
    See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
  3. Log in to the diagnostics shell as the root user.
    When prompted, enter the diagnostics shell.

    For example:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials 
    from Oracle support to login (reboot or power cycle to exit
    the shell),
    (r)estore system from NFS backup archive, 
    Type e to enter the diagnostics shell, then log in as the root user when prompted. If you do not have the root user password, contact Oracle Support Services.
  4. If /mnt/cell is mounted, unmount it.
    # umount /mnt/cell
  5. Confirm the md devices on the server.

    Confirm that the server contains the devices listed in the following example. If your server differs substantially, do not proceed; contact Oracle Support instead.

    # ls -al /dev/md*
    brw-rw---- 1 root disk   9, 126 Jul 15 06:59 /dev/md126
    brw-rw---- 1 root disk 259,   4 Jul 15 06:59 /dev/md126p1
    brw-rw---- 1 root disk 259,   5 Jul 15 06:59 /dev/md126p2
    brw-rw---- 1 root disk   9, 127 Jul 15 06:28 /dev/md127
    brw-rw---- 1 root disk   9,  25 Jul 15 06:28 /dev/md25
    
    /dev/md:
    total 0
    drwxr-xr-x  2 root root  140 Jul 15 06:59 .
    drwxr-xr-x 18 root root 3400 Jul 15 06:59 ..
    lrwxrwxrwx  1 root root    8 Jul 15 06:59 24_0 -> ../md126
    lrwxrwxrwx  1 root root   10 Jul 15 06:59 24_0p1 -> ../md126p1
    lrwxrwxrwx  1 root root   10 Jul 15 06:59 24_0p2 -> ../md126p2
    lrwxrwxrwx  1 root root    7 Jul 15 06:28 25 -> ../md25
    lrwxrwxrwx  1 root root    8 Jul 15 06:28 imsm0 -> ../md127
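The comparison against the example listing can be scripted. The following is a minimal sketch, not part of the documented procedure: the function name missing_md_devices is hypothetical, and the expected device list is copied from the example above. It reads device paths on standard input and prints every expected device that is absent, so empty output means that all expected devices were found.

```shell
# Hypothetical helper: report expected md devices that are absent from a listing.
# Reads device paths (one per line) on stdin; prints each missing device.
missing_md_devices() {
    listing=$(cat)
    for dev in /dev/md126 /dev/md126p1 /dev/md126p2 /dev/md127 /dev/md25; do
        # -x matches the whole line, -F treats the path as a fixed string
        printf '%s\n' "$listing" | grep -qxF "$dev" || printf '%s\n' "$dev"
    done
}

# Possible usage (assumption: ls and grep are available in the diagnostics shell):
#   ls -1d /dev/md[0-9]* | missing_md_devices
```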
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/md25 --force
  7. Remove the existing partitions, then verify all partitions were removed.
    1. Use the following command to remove the existing partitions:
      # for v_partition in $(parted -s /dev/md126 print|awk '/^ / {print $1}')
      do
        parted -s /dev/md126 rm ${v_partition}
      done
    2. Verify by running the following command:
      # parted  -s /dev/md126 unit s print

      The command output should not display any partitions.
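To see what the removal loop iterates over, the awk filter can be tried against sample text. This is only a sketch: the sample_parted_output function and the values it prints are made up for illustration and were not captured from a real server. The filter itself is the one used in the loop; it keeps lines that begin with a space and prints their first field, which is the partition number.

```shell
# Illustrative sample of parted print output (values are made up for this sketch).
sample_parted_output() {
    cat <<'EOF'
Model: Linux Software RAID Array (md)
Disk /dev/md126: 240GB
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  7738MB  7738MB  xfs          primary  boot
 2      7738MB  8007MB  268MB   fat32        primary  boot
EOF
}

# Same filter as in the removal loop: keep lines that begin with a space
# and print their first field, the partition number.
list_partition_numbers() {
    awk '/^ / {print $1}'
}

sample_parted_output | list_partition_numbers
```

For the sample text, the filter prints 1 and 2, which are the values that the loop would pass to parted rm.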

  8. Create the boot partition.
    1. Start an interactive session using the parted command.
      # parted /dev/md126
    2. Assign a disk label.
      (parted) mklabel gpt
    3. Set the unit size as sector.
      (parted) unit s
    4. Check the partition table by displaying the existing partitions.
      (parted) print
    5. Remove the partitions listed in the previous step.
      (parted) rm part#
    6. Create a new first partition.
      (parted) mkpart primary 64s 15114206s
    7. Make the new partition bootable.
      (parted) set 1 boot on
  9. Create the second primary (boot) partition.
    1. Create a second primary partition as a UEFI boot partition with fat32.
      (parted) mkpart primary fat32 15114207s 15638494s 
      (parted) set 2 boot on
    2. Write the information to disk, then quit.
      (parted) quit
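The sector ranges used for the two partitions can be converted to sizes as a sanity check. The following sketch assumes 512-byte logical sectors, which this procedure does not state; the function name sectors_to_mib is hypothetical.

```shell
# Convert an inclusive sector range to a size in MiB (rounded down).
# Assumption: 512-byte logical sectors.
sectors_to_mib() {
    start=$1
    end=$2
    echo $(( (end - start + 1) * 512 / 1048576 ))
}

sectors_to_mib 64 15114206          # first partition, prints 7379
sectors_to_mib 15114207 15638494    # second partition, prints 256
```

Under that assumption, the first partition is roughly 7.2 GiB and the second is exactly 256 MiB (524288 sectors). These figures agree with the efibootmgr example later in this procedure, where the second partition is reported as starting at sector 0xe69fdf (15114207) with a length of 0x80000 (524288) sectors.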
  10. Create the physical volume and volume group.
    # lvm pvcreate /dev/md25
    # lvm vgcreate VGExaDb /dev/md25

    If the physical volume or volume group already exists, remove and re-create them as follows:

    # lvm vgremove VGExaDb
    # lvm pvremove /dev/md25
    # lvm pvcreate /dev/md25
    # lvm vgcreate VGExaDb /dev/md25
  11. Create the logical volumes, then create and mount the file systems.
    1. Create the logical volumes.
      # lvm lvcreate -n LVDbSys1 -L15G VGExaDb -y
      # lvm lvcreate -n LVDbSwap1 -L16G VGExaDb -y
      # lvm lvcreate -n LVDbSys2 -L15G VGExaDb -y
      # lvm lvcreate -n LVDbHome -L4G VGExaDb -y
      # lvm lvcreate -n LVDbVar1 -L2G VGExaDb -y
      # lvm lvcreate -n LVDbVar2 -L2G VGExaDb -y
      # lvm lvcreate -n LVDbVarLog -L18G VGExaDb -y
      # lvm lvcreate -n LVDbVarLogAudit -L1G VGExaDb -y
      # lvm lvcreate -n LVDbTmp -L3G VGExaDb -y
      # lvm lvcreate -n LVDoNotRemoveOrUse -L2G VGExaDb -y
      # lvm lvcreate -n LVDbExaVMImages -L1500G VGExaDb -y
      # lvextend -l +98%FREE /dev/VGExaDb/LVDbExaVMImages
    2. Create the file systems.
      # mkfs.xfs -f /dev/VGExaDb/LVDbSys1
      # mkfs.xfs -f /dev/VGExaDb/LVDbSys2
      # mkfs.xfs -f /dev/VGExaDb/LVDbHome
      # mkfs.xfs -f /dev/VGExaDb/LVDbVar1
      # mkfs.xfs -f /dev/VGExaDb/LVDbVar2
      # mkfs.xfs -f /dev/VGExaDb/LVDbVarLog
      # mkfs.xfs -f /dev/VGExaDb/LVDbVarLogAudit
      # mkfs.xfs -f /dev/VGExaDb/LVDbTmp
      # mkfs.xfs -m crc=1 -m reflink=1 -f /dev/VGExaDb/LVDbExaVMImages
      # mkfs.xfs -f /dev/md126p1
      # mkfs.vfat -v -c -F 32 -s 2 /dev/md126p2
    3. Label the file systems.
      # xfs_admin -L DBSYS /dev/VGExaDb/LVDbSys1
      # xfs_admin -L HOME /dev/VGExaDb/LVDbHome
      # xfs_admin -L VAR /dev/VGExaDb/LVDbVar1
      # xfs_admin -L DIAG /dev/VGExaDb/LVDbVarLog
      # xfs_admin -L AUDIT /dev/VGExaDb/LVDbVarLogAudit
      # xfs_admin -L TMP /dev/VGExaDb/LVDbTmp
      # xfs_admin -L EXAVMIMAGES /dev/VGExaDb/LVDbExaVMImages
      # xfs_admin -L BOOT /dev/md126p1
      # dosfslabel /dev/md126p2 ESP
    4. Create mount points for all the partitions, and mount the respective partitions.

      For example, assuming that /mnt is used as the top level directory for the recovery operation, you could use the following commands to create the directories and mount the partitions:

      # mount -t xfs /dev/VGExaDb/LVDbSys1 /mnt
      # mkdir -p /mnt/home
      # mount -t xfs /dev/VGExaDb/LVDbHome /mnt/home
      # mkdir -p /mnt/var
      # mount -t xfs /dev/VGExaDb/LVDbVar1 /mnt/var
      # mkdir -p /mnt/var/log
      # mount -t xfs /dev/VGExaDb/LVDbVarLog /mnt/var/log
      # mkdir -p /mnt/var/log/audit
      # mount -t xfs /dev/VGExaDb/LVDbVarLogAudit /mnt/var/log/audit
      # mkdir -p /mnt/tmp
      # mount -t xfs /dev/VGExaDb/LVDbTmp /mnt/tmp
      # mkdir -p /mnt/EXAVMIMAGES
      # mount -t xfs /dev/VGExaDb/LVDbExaVMImages /mnt/EXAVMIMAGES
      # mkdir -p /mnt/boot
      # mount -t xfs /dev/md126p1 /mnt/boot
      # mkdir -p /mnt/boot/efi
      # mount -t vfat /dev/md126p2 /mnt/boot/efi
  12. Create the system swap space.

    For example:

    # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
  13. Bring up the network.
    # ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
    # ip link set up eth0
    # ip route add default via gateway_address dev eth0
  14. Mount the NFS server containing the backup.

    The following example assumes that the backup is located in the /export directory of the NFS server with IP address nfs_ip.

    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/export /root/mnt
  15. Restore the files from the backup.

    Assuming that the backup was created using the procedure in Backing up the KVM host Using Snapshot-Based Backup, you can restore the files by using the following command:

    # tar --acls --xattrs --xattrs-include='*' --format=pax -pjxvf /root/mnt/myKVMbackup.tar.bz2 -C /mnt
  16. Create the directory for the kdump service.
    # mkdir /mnt/EXAVMIMAGES/crashfiles
  17. Check the restored fstab file (at /mnt/etc/fstab), and comment out any line that references /EXAVMIMAGES.
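The fstab edit can be made by hand in an editor. As an alternative, the following is a minimal sketch of a filter that comments out the relevant lines. The function name comment_out_exavmimages is hypothetical, and the sketch assumes that sed is available in the diagnostics shell. Lines that are already comments are left unchanged.

```shell
# Hypothetical helper: comment out fstab lines that reference /EXAVMIMAGES.
# Reads fstab text on stdin and writes the edited text to stdout.
# Lines that already start with '#' are left unchanged.
comment_out_exavmimages() {
    sed '/^[[:space:]]*#/!s|^\(.*/EXAVMIMAGES.*\)$|#\1|'
}

# Possible usage, keeping a copy of the restored file:
#   cp /mnt/etc/fstab /mnt/etc/fstab.orig
#   comment_out_exavmimages < /mnt/etc/fstab.orig > /mnt/etc/fstab
```

Review the resulting file before continuing, because the filter matches any line that contains the string /EXAVMIMAGES.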
  18. Unmount the restored file systems.

    For example:

    # umount /mnt/tmp
    # umount /mnt/var/log/audit
    # umount /mnt/var/log
    # umount /mnt/var
    # umount /mnt/home
    # umount /mnt/EXAVMIMAGES
    # umount /mnt/boot/efi
    # umount /mnt/boot
    # umount /mnt
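The order of the umount commands matters: a nested mount point such as /mnt/var/log must be unmounted before its parent /mnt/var. The following sketch shows one way to derive a safe order from a list of mount points. The function name unmount_order is hypothetical. It relies on the fact that in byte order a parent path always sorts before the paths nested under it, so a reverse sort lists every nested path first.

```shell
# Print mount points in an order that is safe for unmounting.
# Reads mount points (one per line) on stdin.
# A reverse byte-order sort puts every nested path before its parent.
unmount_order() {
    LC_ALL=C sort -r
}

printf '%s\n' /mnt /mnt/boot /mnt/boot/efi /mnt/var /mnt/var/log | unmount_order

# Possible usage (assumption: awk and sort are available in the diagnostics shell):
#   awk '$2 ~ /^\/mnt(\/|$)/ {print $2}' /proc/mounts | unmount_order |
#   while read -r mp; do umount "$mp"; done
```

For the five sample paths, the function prints /mnt/var/log, /mnt/var, /mnt/boot/efi, /mnt/boot, and /mnt, in that order.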
  19. Check the boot devices and set the boot order.
    1. Check the available boot devices, and identify the boot device that is associated with RedHat Boot Manager (\EFI\REDHAT\SHIMX64.EFI).

      For example:

      # efibootmgr -v
      BootCurrent: 0019
      Timeout: 1 seconds
      BootOrder:
      0019,0000,0002,0010,0009,0017,000A,000B,0018,0005,0006,0007,0008,0013,0014,0015,0016,0003,0011,0004,0012,001A
      Boot0000* RedHat Boot Manager HD(2,GPT,eec54dfd-8928-4874-833d-5b0b9e914b99,0xe69fdf,0x80000)/File(\EFI\REDHAT\SHIMX64.EFI)
      Boot0002* NET0:PXE IPv4 Intel(R) I210 Gigabit  Network Connection /Pci(0x1c,0x4)/Pci(0x0,0x0)/MAC(0010e0fc6e94,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0003* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0A /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(b8cef622380a,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0004* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0B /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(b8cef622380b,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0005* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(3cfdfe915070,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0006* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(3cfdfe915071,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0007* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x2)/MAC(3cfdfe915072,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0008* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x3)/MAC(3cfdfe915073,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0009* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9C /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(b8cef644519c,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot000A* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(b8cef644519d,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot000B* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(b8cef644519d,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0010* NET0:PXE IPv4 Intel(R) I210 Gigabit  Network Connection /Pci(0x1c,0x4)/Pci(0x0,0x0)/MAC(0010e0fc6e94,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0011* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0A /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(b8cef622380a,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0012* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0B /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(b8cef622380b,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0013* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(3cfdfe915070,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0014* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(3cfdfe915071,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0015* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x2)/MAC(3cfdfe915072,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0016* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter /Pci(0x2,0x0)/Pci(0x0,0x3)/MAC(3cfdfe915073,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0017* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9C /Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(b8cef644519c,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0018* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D /Pci(0x2,0x0)/Pci(0x0,0x1)/MAC(b8cef644519d,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
      Boot0019* USB:SP:SUN Remote ISO CDROM1.01 /Pci(0x14,0x0)/USB(7,0)/USB(3,0)/CDROM(1,0x28,0x3100)..BO
      Boot001A* Oracle Linux (grubx64.efi) HD(2,GPT,eec54dfd-8928-4874-833d-5b0b9e914b99,0xe69fdf,0x80000)/File(\EFI\REDHAT\GRUBX64.EFI)..BO
      MirroredPercentageAbove4G: 0.00
      MirrorMemoryBelow4GB: false
    2. Configure the device that is associated with RedHat Boot Manager (\EFI\REDHAT\SHIMX64.EFI) to be first in the boot order.

      In this example, RedHat Boot Manager is associated with boot device 0000:

      # efibootmgr -o 0000
      BootCurrent: 0019
      Timeout: 1 seconds
      BootOrder: 0000
      Boot0000* RedHat Boot Manager
      Boot0002* NET0:PXE IPv4 Intel(R) I210 Gigabit  Network Connection
      Boot0003* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0A
      Boot0004* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0B
      Boot0005* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0006* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0007* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9C
      Boot000A* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D
      Boot000B* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D
      Boot0010* NET0:PXE IPv4 Intel(R) I210 Gigabit  Network Connection
      Boot0011* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0A
      Boot0012* PCIE5:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:22:38:0B
      Boot0013* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0014* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0015* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0016* PCIE3:PXE IPv4 Oracle Quad Port 10GBase-T Adapter
      Boot0017* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9C
      Boot0018* PCIE1:PXE IPv4 Mellanox Network Adapter - B8:CE:F6:44:51:9D
      Boot0019* USB:SP:SUN Remote ISO CDROM1.01
      Boot001A* Oracle Linux (grubx64.efi)
      MirroredPercentageAbove4G: 0.00
      MirrorMemoryBelow4GB: false
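Because the verbose listing is long, the boot number can also be picked out with a small filter instead of by eye. This is a sketch under assumptions, not part of the documented procedure: the function name find_shim_bootnum is hypothetical, and it assumes that each entry begins with a field of the form BootNNNN*, as in the example output above.

```shell
# Hypothetical helper: print the boot number of the first entry whose
# loader path contains SHIMX64.EFI. Reads `efibootmgr -v` output on stdin.
find_shim_bootnum() {
    awk 'toupper($0) ~ /SHIMX64\.EFI/ {
        num = $1
        sub(/^Boot/, "", num)   # strip the "Boot" prefix
        sub(/\*$/, "", num)     # strip the trailing asterisk
        print num
        exit
    }'
}

# Possible usage:
#   bootnum=$(efibootmgr -v | find_shim_bootnum)
#   [ -n "$bootnum" ] && efibootmgr -o "$bootnum"
```

With the example output shown in this step, the filter would print 0000. Confirm the value against the full listing before changing the boot order.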
  20. Restart the system.
    # reboot
  21. Disconnect the diagnostics.iso file.
    See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
  22. Log back in to the server as the root user.
  23. Run the imageinfo command and verify that the image status is success.

    For example:

    # imageinfo
    
    Kernel version: 4.14.35-2047.502.5.el7uek.x86_64 #2 SMP Wed Apr 14 15:08:41
    PDT 2021 x86_64
    Uptrack kernel version: 4.14.35-2047.503.1.el7uek.x86_64 #2 SMP Fri Apr 23
    15:20:41 PDT 2021 x86_64
    Image kernel version: 4.14.35-2047.502.5.el7uek
    Image version: 21.2.1.0.0.210608
    Image activated: 2021-07-12 14:58:03 +0900
    Image status: success
    Node type: COMPUTE
    System partition on device: /dev/mapper/VGExaDb-LVDbSys1
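The status check can be automated, for example in a post-recovery script. The following is a minimal sketch: the function name image_status_ok is hypothetical, and it assumes that the status line has exactly the form shown in the example output above.

```shell
# Hypothetical helper: succeed only if the imageinfo output on stdin
# contains the line "Image status: success".
image_status_ok() {
    status=$(awk -F': ' '/^Image status:/ {print $2; exit}')
    [ "$status" = "success" ]
}

# Possible usage:
#   imageinfo | image_status_ok && echo "image status verified"
```
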
The KVM host has been recovered.