5 Managing Oracle VM Domains on Oracle Exadata Database Machine

Oracle VM user domains on Oracle Exadata Database Machine are managed using the xm(1) command run from the management domain (also known as domain-0, or dom0). For a full list of Oracle VM administrative commands, run the xm help command.

Note:

The following xm subcommands are not supported on Oracle Exadata Database Machine:

mem-set
mem-max
migrate
restore
resume
save
suspend
sched-*
cpupool-*
tmem-*

Note:

Unless otherwise noted, all commands run in the following procedures are run as the root user.

5.1 Oracle VM and Oracle Exadata Database Machine

When deploying Oracle Exadata Database Machine, you can decide to implement Oracle VM on the database servers.

Oracle VM Server and one or more Oracle VM guests are installed on every database server. You can configure Oracle VM environments on your initial deployment using scripts created by Oracle Exadata Deployment Assistant (OEDA) or you can migrate an existing environment to Oracle VM.

5.1.1 About Oracle VM

Oracle VM enables you to deploy the Oracle Linux operating system and application software within a supported virtualization environment.

If you use Oracle VM on Oracle Exadata Database Machine, then it provides CPU, memory, operating system, and sysadmin isolation for your workloads. You can combine VMs with network and I/O prioritization to achieve full-stack isolation. For consolidation, you can create multiple trusted databases or pluggable databases in an Oracle VM, allowing resources to be shared more dynamically.

An Oracle VM environment consists of an Oracle VM Server, virtual machines, and resources. An Oracle VM Server is a managed virtualization environment providing a lightweight, secure, server platform which runs virtual machines, also known as domains.

Oracle VM Server is installed on a bare metal computer. The hypervisor present on each Oracle VM Server is an extremely small-footprint virtual machine manager and scheduler. It is designed so that it is the only fully privileged entity in the system. It controls only the most basic resources of the system, including CPU and memory usage, privilege checks, and hardware interrupts.

The hypervisor securely executes multiple virtual machines on one host computer. Each virtual machine runs in its own domain and has its own guest operating system. A primary management domain, dom0, an abbreviation for domain zero, also runs as a guest on top of the hypervisor. Dom0 has privileged access to the hardware and device drivers.

A user domain (domU) is an unprivileged domain that can access the InfiniBand HCA. DomU is started and managed on an Oracle VM Server by dom0. Because a user-domain operates independently of other domains, a configuration change applied to the virtual resources of a domU does not affect any other domains. A failure of the domU does not impact any other domains.

The terms "domain", "guest", and "virtual machine" are often used interchangeably, but they have subtle differences:

  • A domain is a configurable set of resources, including memory, virtual CPUs, network devices and disk devices, in which virtual machines run.
  • A domain or virtual machine is granted virtual resources and can be started, stopped and restarted independently of other domains or the host server itself.
  • A guest is a virtualized operating system running within a domain. Each guest operating system runs in its own unprivileged domain, called a user domain, abbreviated to domU.

Up to 8 guests can run on the same Oracle VM Server, each within its own domain. These domains are unprivileged domains that can access the InfiniBand HCA. Each domU is started alongside dom0 running on Oracle VM Server. Other domains never interact with dom0 directly. Their requirements are handled by the hypervisor itself. Dom0 only provides a means to administer the hypervisor.

You use Oracle Exadata Deployment Assistant (OEDA) to create and configure Oracle VMs on Oracle Exadata Database Machine.

5.1.2 Maximum Supported Virtual Machines on Oracle Exadata Database Machine

For Oracle Exadata Database Servers, the maximum number of supported virtual machines is eight.

For the software prerequisites, refer to My Oracle Support documents 888828.1 and 1270094.1.

5.1.3 Supported Operations in the Management Domain (dom0)

Manually modifying the dom0 can result in configuration issues for Oracle VM Server, which can degrade performance or cause a loss of service.

WARNING:

Oracle does not support any changes that are made to the dom0 beyond what is documented. Oracle does not support running any third party software within the dom0.

If you are in doubt whether an operation on the dom0 is supported, contact Oracle Support Services.

5.1.4 Oracle VM Resources

Two fundamental parts of the Oracle VM infrastructure – networking and storage – are configured outside of the Oracle VM.

Networking

When specifying the configuration details for your Oracle Exadata Rack using Oracle Exadata Deployment Assistant (OEDA), you provide input on how the required network IP addresses for Oracle VM environments should be created. The generated OEDA setup files are transferred to the Oracle Exadata Rack and used to create the network addresses.

Storage

Oracle VM always requires a location to store environment resources that are essential to the creation and management of virtual machines. These resources include ISO files (virtual DVD images), VM configuration files and VM virtual disks. The location of such a group of resources is called a storage repository.

On Oracle Exadata Database Machine, storage for the Oracle VMs is configured as OCFS2 (Oracle Cluster File System) storage.

If you need more storage space for Oracle VM, you can purchase a disk expansion kit. The additional disk space can be used to support more Oracle VM guests (up to a maximum of 8) by expanding /EXAVMIMAGES or to increase the size of the /u01 partition in each domU.


5.1.4.1 Storage Configuration for Management Domain

The management domain (dom0) contains the image repository and Oracle VM configuration files.

The management domain contains the following directories:

  • /EXAVMIMAGES, where the images used to create the guests are stored. The ZIP files in this directory contain the ISO files.
  • /conf
  • /GuestImages, where the files representing each user domain are stored.

The management domain exists on the physical disk /dev/sda. There are three disk partitions:

  • /dev/sda1 — Mounted as /boot.
  • /dev/sda2 — Used for swap.
  • /dev/sda3 — Used for the volume group VGExaDb.

The logical volumes created for the management domain are:

  • /dev/VGExaDb/LVDbSys2 — Used by dbnodeupdate.sh while performing a backup
  • /dev/VGExaDb/LVDbSys3 — Mounted as /
  • /dev/VGExaDb/LVDbSwap1 — Used for swap space
  • /dev/VGExaDb/LVDoNotRemoveOrUse — Used by dbnodeupdate.sh while performing a backup
  • /dev/VGExaDb/LVDbExaVMImages — Mounted as /EXAVMIMAGES

The /EXAVMIMAGES directory is where the configuration files for each virtual machine are located. The files are named using the format /EXAVMIMAGES/GuestImages/nodename/vm.cfg. Each virtual machine also has image files that point back to the ISO files in the image repository. The following files, except for pv1_vgexadb.img, are created with reflinks, an OCFS2 feature:

  • /EXAVMIMAGES/GuestImages/user-domain-name/System.img
  • /EXAVMIMAGES/GuestImages/user-domain-name/gridversion.img
  • /EXAVMIMAGES/GuestImages/user-domain-name/dbversion.img
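Reflinks allow OCFS2 to clone the large image files almost instantly, because the copy initially shares the source file's data blocks instead of duplicating them. A minimal sketch of the mechanism, using scratch files in /tmp rather than the real repository (`--reflink=auto` falls back to a plain copy on file systems without reflink support; the real images live on OCFS2):

```shell
# Reflink copy: clone a file by sharing data blocks instead of copying them.
# Illustrative sketch; paths are scratch files, not the Exadata repository.
echo "image payload" > /tmp/base.img
cp --reflink=auto /tmp/base.img /tmp/clone.img
cmp -s /tmp/base.img /tmp/clone.img && echo "clone matches source"
```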
5.1.4.2 Storage Configuration for User Domain

The user domain (domU) is a virtualized database node.

Each user domain has 4 virtual disks at the management domain (dom0) level, as shown in /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg. These 4 virtual disks are in turn symbolically linked to 4 files under /EXAVMIMAGES/GuestImages/user_domain_name, which are the real disk image files described below:

  • /dev/xvda, for the system image file System.img
  • /dev/xvdb, for the Oracle Grid Infrastructure image file, for example, grid12.1.0.2.2.img. This virtual disk is 50 GB in size and is mounted as /u01/app/version/grid.
  • /dev/xvdc, for the Oracle Database image file, for example, db12.1.0.2.2-3.img. This virtual disk is 50 GB in size and is mounted as /u01/app/oracle/product/version/dbhome_1.
  • /dev/xvdd, for the pv1_vgexadb.img image file
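The disk-to-image mapping above comes from the disk parameter in vm.cfg. A sketch that extracts the mapping from a sample file (the domain name, paths, and image names are illustrative):

```shell
# Sample vm.cfg disk parameter resembling a user domain's (paths illustrative).
cat > /tmp/sample_vm.cfg <<'EOF'
disk=['file:/EXAVMIMAGES/GuestImages/vm01/System.img,xvda,w',
'file:/EXAVMIMAGES/GuestImages/vm01/grid12.1.0.2.2.img,xvdb,w',
'file:/EXAVMIMAGES/GuestImages/vm01/db12.1.0.2.2-3.img,xvdc,w',
'file:/EXAVMIMAGES/GuestImages/vm01/pv1_vgexadb.img,xvdd,w']
EOF
# Print each image file together with the guest device it is presented as.
grep -o 'file:[^,]*,xvd[a-z]' /tmp/sample_vm.cfg
```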

The System.img (/dev/xvda) disk has 2 partitions created on pre-grub2 images and 3 partitions on grub2 images.

  • Pre-Grub2 image
    • Partition 1 — The boot partition (/boot) for the user domain (512 MB), represented as xvda1 in the user domain.
    • Partition 2 — Where the bios-grub is stored (24.5 GB), represented as xvda2 in the user domain.
  • Grub2 image
    • Partition 1 — The boot partition (/boot) for the user domain (512 MB), represented as xvda1 in the user domain.
    • Partition 2 — The EFI boot partition on Oracle Exadata Database Machine X7 and later systems.
    • Partition 3 — Where the bios-grub is stored (24.5 GB), represented as xvda3 in the user domain.

The pv1_vgexadb.img (/dev/xvdd) disk has 1 partition. The disk partition /dev/xvdd1 is 62 GB in size.

For pre-grub2 images, 2 physical volumes (PVs) are laid on top of the xvda2 and xvdd1 partitions. On grub2 images, 2 physical volumes (PVs) are laid on top of the xvda3 and xvdd1 partitions. A volume group (VGExaDb) of size 86.49 GB is laid on top of these physical volumes. This volume group contains the following logical volumes:

  • /dev/VGExaDb/LVDbSys1 (24 GB) — used for the root file system /. This logical volume is confined to the xvda2 partition (for pre-grub2 images) or the xvda3 partition (for grub2 images).
  • /dev/VGExaDb/LVDbSys2 (24 GB) — used for dbnodeupdate backups.
  • /dev/VGExaDb/LVDbOra1 (24 GB) — used for the /u01 file system, which holds the diagnostic_dest area.
  • /dev/VGExaDb/LVDbSwap1 (16 GB) — used for swap space.
  • /dev/VGExaDb/LVDbDoNotRemoveOrUse (1 GB) — a reserved logical volume used by dbnodeupdate.
  • /dev/VGExaDb/LVDbVdnodenamedgname (128 MB) — used for the quorum disks.

All but the first and last logical volumes in the list above are contained in the xvdd1 partition.

5.2 Migrating a Bare Metal Oracle RAC Cluster to an Oracle RAC Cluster in Oracle VM

Note:

This topic applies only to two-socket x86 servers. It does not apply to eight-socket servers such as Oracle Exadata Database Machine X5-8.

The migration of a bare metal Oracle RAC cluster to an Oracle RAC cluster in Oracle VM can be achieved in the following ways:

  • Migrate to Oracle RAC cluster in Oracle VM using the existing bare metal Oracle RAC cluster with zero downtime.

  • Migrate to Oracle RAC cluster in Oracle VM by creating a new Oracle RAC cluster in Oracle VM with minimal downtime.

  • Migrate to Oracle RAC cluster in Oracle VM using Oracle Data Guard with minimal downtime.

  • Migrate to Oracle RAC cluster in Oracle VM using Oracle Recovery Manager (RMAN) backup and restore with complete downtime.

The conversion of a bare metal Oracle RAC cluster to an Oracle RAC cluster in Oracle VM has the following implications:

  • Each of the database servers will be converted to an Oracle VM Server on which a management domain (dom0) is created along with one or more user domains, depending on the number of Oracle RAC clusters being deployed. Each user domain on a database server will belong to a particular Oracle RAC cluster.

  • As part of the conversion procedure, the bare metal Oracle RAC cluster will be converted to one Oracle RAC cluster in Oracle VM to start with. There will be one user domain per database server.

  • At the end of the conversion, the cell disk and grid disk configuration of the storage cells are the same as they were at the beginning of the conversion.

  • The management domain will use a small portion of the system resources on each database server. Typically a management domain uses 8 GB of memory and 4 virtual CPUs. This has to be taken into consideration while sizing the SGA of the databases running on the Oracle RAC cluster in Oracle VM.

  • Refer to My Oracle Support note 2099488.1 for the complete instructions.
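The dom0 footprint enters directly into memory planning for the user domains; a rough arithmetic sketch (all figures are illustrative except the 8 GB dom0 allocation quoted above):

```shell
TOTAL_GB=256    # illustrative physical memory of the database server
DOM0_GB=8       # typical management-domain allocation noted above
GUESTS=4        # illustrative number of user domains on this server
# Upper bound on memory per guest, before SGA and huge-page sizing.
PER_GUEST_GB=$(( (TOTAL_GB - DOM0_GB) / GUESTS ))
echo "${PER_GUEST_GB} GB per guest"
```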

5.3 Showing Running Domains

The following procedure describes how to show running domains:

  1. Connect to the management domain (domain zero, or dom0).
  2. Run the xm list command. The following is an example of the output:
    Example
    # xm list
    Name                         ID   Mem   VCPUs      State   Time(s)
    Domain-0                      0   8192     4       r-----  409812.7
    dm01db01vm01                  8   8192     2       -b---- 156610.6
    dm01db01vm02                  9   8192     2       -b---- 152169.8
    dm01db01vm03                 10  10240     4       -b---- 150225.9
    dm01db01vm04                 16  12288     8       -b---- 113519.3
    dm01db01vm05                 12  12288     8       -b---- 174101.6
    dm01db01vm06                 13  12288     8       -b---- 169115.9
    dm01db01vm07                 14   8192     4       -b---- 175573.0
    

5.4 Monitoring a User Domain Console

The following procedure describes how to monitor a user domain console:

  1. Connect as the root user to the management domain.
  2. Obtain the domain name using the xm list command.
  3. Use the following command to attach to the user domain console:
    # xm console DomainName
    

    In the preceding command, DomainName is the name of the domain.

  4. Press CTRL+] to disconnect from the console.

5.5 Monitoring Oracle VMs with Oracle Enterprise Manager

The Exadata plug-in for Oracle Enterprise Manager discovers, manages, and monitors virtualized Oracle Exadata Database Machine systems in conjunction with the Virtualization Infrastructure plug-in of Oracle Enterprise Manager.

With virtualized Exadata, one Exadata Database Machine target is created for each physical Database Server instead of one DB Machine target for each DB cluster deployed through Oracle Exadata Deployment Assistant (OEDA). Compute nodes, Exadata Storage Servers, InfiniBand switches, compute node ILOM, PDU, KVM, and Cisco switch targets are discovered by the Exadata plug-in. The physical server (physical Oracle Server target), Dom0 (Virtual Platform target), and DomU (virtual Oracle Server target) are discovered and monitored by the Virtualization Infrastructure (VI) plug-in.
  • Refer to Virtualized Exadata Database Machine in Oracle Enterprise Manager Exadata Management Getting Started Guide for instructions on how to discover Oracle VM domains on Oracle Exadata Database Machine.

5.6 Starting a User Domain

The following procedure describes how to start a user domain:

  1. Use the following command to start the user domain:
    # xm create /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
    Using config file "/EXAVMIMAGES/GuestImages/dm01db01vm04/vm.cfg".
    Started domain dm01db01vm04 (id=23)
    

    In the preceding command, DomainName is the name of the domain.

    Note:

    To see Oracle Linux boot messages during user domain startup, connect to the console during startup using the -c option. To disconnect from the console after startup is complete, press CTRL+].

5.7 Disabling User Domain Automatic Start

The following procedure describes how to disable a user domain from automatically starting when the management domain is started:

  1. Connect to the management domain.
  2. Remove the symbolic link to the user domain configuration file in the /etc/xen/auto directory using the following command:
    # rm /etc/xen/auto/DomainName.cfg
    

    In the preceding command, DomainName is the name of the domain.
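Conversely, automatic start is re-enabled by recreating the symbolic link. The sketch below uses scratch paths as stand-ins for /etc/xen/auto and the domain's vm.cfg so that it can run anywhere; on a real system the link points at /EXAVMIMAGES/GuestImages/DomainName/vm.cfg:

```shell
AUTO=/tmp/xen-auto-demo            # stand-in for /etc/xen/auto
CFG=/tmp/dm01db01vm01-vm.cfg       # stand-in for the domain's vm.cfg
mkdir -p "$AUTO"
touch "$CFG"
# Linking the configuration file into the auto directory enables autostart.
ln -sf "$CFG" "$AUTO/dm01db01vm01.cfg"
ls -l "$AUTO"
```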

5.8 Shutting Down a User Domain From Within the User Domain

The following procedure describes how to shut down a user domain from within a user domain:

  1. Connect as the root user to the user domain.
  2. Use the following command to shut down the domain:
    # shutdown -h now
    

5.9 Shutting Down a User Domain From Within the Management Domain

The following procedure describes how to shut down a user domain from within a management domain:

  1. Connect as the root user to the management domain.
  2. Use the following command to shut down the domain:
    # xm shutdown DomainName -w

    In the preceding command, DomainName is the name of the domain.

    Note:

    Use the -w option so that the xm command waits until the domain shutdown completes before returning. The xm shutdown command performs the same orderly shutdown as running shutdown -h now within the user domain.

    To shut down all user domains within the management domain, use the following command:

    # xm shutdown -a -w
    

    The following is an example of the output:

    Domain dm01db01vm04 terminated
    All domains terminated
    

5.10 Backing Up and Restoring Oracle Databases on Oracle VM User Domains

Backing up and restoring Oracle databases on Oracle VM user domains is the same as backing up and restoring Oracle databases on physical nodes.

  • When backing up Oracle databases on Oracle VM user domains you must set the following four parameters in the /etc/sysctl.conf file on the database nodes (user domains). If you are using Oracle Exadata storage to hold the backups, the parameters need to be set in the /etc/sysctl.conf file on the Exadata storage cells as well.
    net.core.rmem_default = 4194304
    net.core.wmem_default = 4194304
    net.core.rmem_max = 4194304
    net.core.wmem_max = 4194304
  • If you are using Exadata storage, each Oracle VM RAC cluster requires its own Oracle Automatic Storage Management (Oracle ASM) disk group to be designated as the fast recovery area (FRA) such as +RECO. Refer to the "Exadata Database Machine Backup and Restore Configuration and Operational Best Practices" white paper for details.
  • If you are using Oracle ZFS Storage Appliance, refer to the "Protecting Exadata Database Machine with the Oracle ZFS Storage Appliance: Configuration Best Practices" white paper for details.
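The four kernel parameters above can be staged in /etc/sysctl.conf and then loaded with sysctl -p. The sketch below writes to a scratch copy of the file so it is safe to run outside a database node; on the real nodes (and storage cells, where applicable) you would edit /etc/sysctl.conf itself as root:

```shell
CONF=/tmp/sysctl.conf.demo          # scratch stand-in for /etc/sysctl.conf
cat > "$CONF" <<'EOF'
net.core.rmem_default = 4194304
net.core.wmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
EOF
# On the real system you would then load the values with:
#   sysctl -p
grep -c '^net\.core\.' "$CONF"
```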

5.11 Modifying the Memory Allocated to a User Domain

The following procedure describes how to modify the memory allocated to a user domain:

Note:

If you are decreasing the amount of memory allocated to a user domain, you must first review and adjust the SGA size of databases running in the user domain and the corresponding huge pages operating system configuration. Failing to do so may result in a user domain that cannot start because too much memory is reserved for huge pages when the Linux operating system attempts to boot. See My Oracle Support note 361468.1 for details.

Note:

This operation requires a user domain restart. Modifying the memory allocation using the xm mem-set command is not supported.

  1. Connect to the management domain.
  2. When increasing the allocation, use the following command to determine the amount of free memory available:
    # xm info | grep free_memory
    

    Note:

    When assigning free memory to a user domain, approximately 1 to 2 percent of the free memory is used for metadata and control structures. Therefore, the maximum possible memory increase is 1 to 2 percent less than the free memory value.

  3. Shut down the user domain gracefully using the name obtained from the xm list command. Use the -w option so the xm command waits until the domain is shut down before returning.
    # xm shutdown DomainName -w
    

    In the preceding command, DomainName is the name of the domain.

  4. Create a backup copy of the /EXAVMIMAGES/GuestImages/DomainName/vm.cfg file.
  5. Edit the memory and maxmem settings in the /EXAVMIMAGES/GuestImages/DomainName/vm.cfg file using a text editor. The memory and maxmem settings must be identical values.

    Note:

    If the memory and maxmem parameters are not identical values, then InfiniBand network interfaces are not configured during user domain start, which prevents proper Oracle CRS and database startup.

  6. Use the following command to start the user domain:
    # xm create /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
    

    Note:

    To see Oracle Linux boot messages during user domain startup, connect to the console during startup using the -c option. To disconnect from the console after startup is complete, press CTRL+].
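Steps 4 and 5 of the procedure can be sketched as follows. The file path and memory values are illustrative, and the edit keeps memory and maxmem identical as the note requires:

```shell
CFG=/tmp/vm.cfg.mem-demo
# Illustrative vm.cfg fragment with the current memory settings (in MB).
printf 'memory=8192\nmaxmem=8192\n' > "$CFG"
cp "$CFG" "$CFG.backup"             # step 4: back up vm.cfg first
NEW_MB=16384
sed -i "s/^memory=.*/memory=$NEW_MB/; s/^maxmem=.*/maxmem=$NEW_MB/" "$CFG"
# Both values must match, or InfiniBand interfaces are not configured on start.
grep -E '^(memory|maxmem)=' "$CFG"
```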

5.12 Modifying the Number of Virtual CPUs Allocated to a User Domain

Note the following about modifying the number of virtual CPUs (vCPUs):

  • All actions to modify the number of vCPUs allocated to a user domain are performed in the management domain.

  • The number of vCPUs allowed for a user domain may be changed dynamically to a lower value or to a higher value, provided it does not exceed the setting of the maxvcpus parameter for the user domain.

  • It is possible to over-commit vCPUs such that the total number of vCPUs assigned to all domains exceeds the number of physical CPUs on the system. However, over-committing CPUs should be done only when competing workloads for oversubscribed resources are well understood and concurrent demand does not exceed physical capacity.

The following procedure describes how to modify the number of virtual CPUs allocated to a user domain:

  1. Determine the number of physical CPUs as follows:

    1. Use the following command in the management domain:

      # xm info | grep -A3 nr_cpus
      nr_cpus                : 24
      nr_nodes               : 2
      cores_per_socket       : 6
      threads_per_core       : 2
      

      In the output, note that the nr_nodes line refers to the number of sockets. The Exadata database server where the command was run has 2 sockets with 6 cores per socket and 2 threads per core, resulting in 24 physical CPU threads (2 sockets x 6 cores per socket = 12 cores; 12 cores x 2 threads per core = 24 CPU threads).

    2. Use the following command to determine the current setting of vCPUs configured and online for a user domain:

      # xm list DomainName -l | grep vcpus
      
          (vcpus 4)
          (online_vcpus 2)
      

      In the preceding command, DomainName is the name of the user domain. The output from the command indicates the maximum number of vCPUs for the user domain is 4, and the current number of online vCPUs is 2. This user domain may have the number of online vCPUs adjusted to any value not greater than the vcpus parameter while the user domain remains online. The user domain must be taken offline to increase the number of online vCPUs to a value higher than the vcpus parameter.

  2. Reduce or increase the number of vCPUs as follows:

    • To reduce the number of vCPUs:

      1. Determine the currently allocated number of vCPUs for the user domain using the following command:

        # xm list DomainName
        
      2. Reduce the currently allocated number of vCPUs using the following command:

        # xm vcpu-set DomainName vCPUs_preferred
        

        In the preceding command, vCPUs_preferred is the value of the preferred number of vCPUs.

    • To increase the number of vCPUs

      1. Determine the current settings of the vcpus parameter using the following command:

        # xm list DomainName -l | grep vcpus
            (vcpus 4)
            (online_vcpus 2)
        
      2. If the preferred number of vCPUs is less than or equal to the value of the vcpus parameter, then run the following command to increase the number of online vCPUs.

        # xm vcpu-set DomainName vCPUs_preferred
        

        In the preceding command, vCPUs_preferred is the value of the preferred number of vCPUs.

      3. If the preferred number of vCPUs is greater than the value of the vcpus parameter, then the user domain must be taken offline to increase the number of online vCPUs to a value higher than the vcpus parameter. Do the following:

        i. Shut down the user domain.

        ii. Create a backup copy of the /EXAVMIMAGES/GuestImages/DomainName/vm.cfg file.

        iii. Edit the /EXAVMIMAGES/GuestImages/DomainName/vm.cfg file to set the vcpus parameter to the desired number of vCPUs.

        Note: By default a user domain will online the number of vCPUs configured via the vcpus parameter. If you want a user domain to start with some vCPUs offline, then add the maxvcpus parameter to vm.cfg, setting it to the maximum number of vCPUs the user domain is permitted to have online. Set the vcpus parameter to the number of vCPUs to online when the user domain starts. For example, to start a user domain with 2 vCPUs online and to allow an additional 6 vCPUs to be added to the user domain while it remains online, use the following settings in vm.cfg:

        maxvcpus=8
        vcpus=2
        

        iv. Start the user domain.
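Steps i through iv reduce to the following sketch. The domain name, file path, and target value are illustrative, and the xm commands are shown as comments because they run only in dom0:

```shell
CFG=/tmp/vm.cfg.vcpu-demo
printf 'vcpus=2\n' > "$CFG"          # illustrative current setting
# xm shutdown dm01db01vm01 -w        # i.  shut down the user domain (dom0 only)
cp "$CFG" "$CFG.backup"              # ii. back up vm.cfg
sed -i 's/^vcpus=.*/vcpus=8/' "$CFG" # iii. set vcpus to the desired count
# xm create /EXAVMIMAGES/GuestImages/dm01db01vm01/vm.cfg   # iv. start the domain
grep '^vcpus=' "$CFG"
```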

5.13 Increasing the Disk Space in a User Domain

You can increase the size of Logical Volume Manager (LVM) partitions, swap space, and file systems in a user domain.

5.13.1 Adding a New LVM Disk to a User Domain

This procedure describes how to add a new LVM disk to a user domain to increase the amount of usable LVM disk space in a user domain. This procedure is done so that the size of a file system or swap LVM partition can be increased. This procedure is performed while the system remains online.

Note:

This procedure requires steps be run in the management domain (Domain-0), and in the user domain.

Run all steps in this procedure as the root user.

  1. In the management domain, verify the free disk space in /EXAVMIMAGES using the following command:
    # df -h /EXAVMIMAGES
    

    The following is an example of the output from the command:

    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             721G  111G  611G  16% /EXAVMIMAGES
    
  2. In the management domain, select a name for the new disk image, and verify that the name is not already used in the user domain.
    # ls -l /EXAVMIMAGES/GuestImages/DomainName/new_disk_image_name
    
    ls: /EXAVMIMAGES/GuestImages/DomainName/new_disk_image_name: No such file or \
    directory
    

    In the preceding command, DomainName is the name of the domain, and new_disk_image_name is the new disk image name.

  3. In the management domain, create a new disk image.
    # qemu-img create /EXAVMIMAGES/GuestImages/DomainName/new_disk_image_name size
    

    In the following example of the command, the new disk image name is pv2_vgexadb.img, and the image size is 10 GB.

    # qemu-img create /EXAVMIMAGES/GuestImages/DomainName/pv2_vgexadb.img 10G
    
  4. In the user domain, determine an available disk name. In the following example, disk names xvda through xvdd are used, and disk name xvde is unused.
    # lsblk -id
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    xvda 202:0    0  13G  0 disk 
    xvdb 202:16   0  20G  0 disk /u01/app/12.1.0.2/grid
    xvdc 202:32   0  20G  0 disk /u01/app/oracle/product/12.1.0.2/dbhome_1
    xvdd 202:48   0  41G  0 disk
    
  5. In the management domain, attach the new disk image to the user domain in read/write mode. In the following example, the new disk image is presented in the user domain as device /dev/xvde.
    # xm block-attach DomainName     \
    file:/EXAVMIMAGES/GuestImages/DomainName/new_disk_image_name /dev/xvde w
    
  6. In the user domain, verify the disk device is available. In the following example, disk name xvde is available in the user domain.
    # lsblk -id
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    xvda 202:0    0  13G  0 disk 
    xvdb 202:16   0  20G  0 disk /u01/app/12.1.0.2/grid
    xvdc 202:32   0  20G  0 disk /u01/app/oracle/product/12.1.0.2/dbhome_1
    xvdd 202:48   0  41G  0 disk 
    xvde 202:64   0  10G  0 disk
    
  7. In the user domain, partition the new disk device. In the following example, disk device /dev/xvde is partitioned.
    # parted /dev/xvde mklabel gpt
    # parted -s /dev/xvde mkpart primary 0 100%
    # parted -s /dev/xvde set 1 lvm on
    

    The parted mkpart command may report the following message. This message can be ignored:

    Warning: The resulting partition is not properly aligned for best performance.
    
  8. In the user domain, create an LVM physical volume on the new disk partition. In the following example, an LVM physical volume is created on disk partition /dev/xvde1.
    # pvcreate /dev/xvde1
    
  9. In the user domain, extend the volume group and verify the additional space in the volume group. In the following example, disk name xvde is now available in the user domain.
    # vgextend VGExaDb /dev/xvde1
    # vgdisplay -s
    
  10. In the management domain, make a backup of the user domain configuration file vm.cfg.
    # cp /EXAVMIMAGES/GuestImages/DomainName/vm.cfg   \
         /EXAVMIMAGES/GuestImages/DomainName/vm.cfg.backup
    
  11. In the management domain, obtain the UUID of the user domain using the following command:
    # grep ^uuid /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
    

    In the following example, the user domain UUID is 49ffddce4efe43f5910d0c61c87bba58.

    # grep ^uuid /EXAVMIMAGES/GuestImages/dm01db01vm01/vm.cfg
    uuid = '49ffddce4efe43f5910d0c61c87bba58'
    
  12. In the management domain, generate a UUID for the new disk image using the following command:
    # uuidgen | tr -d '-'
    

    In the following example, the new disk UUID is 0d56da6a5013428c97e73266f81c3404.

    # uuidgen | tr -d '-'
    0d56da6a5013428c97e73266f81c3404
    
  13. In the management domain, create a symbolic link under /OVS/Repositories that points to the new disk image using the following command:
    # ln -s /EXAVMIMAGES/GuestImages/DomainName/newDiskImage.img    \
     /OVS/Repositories/user_domain_uuid/VirtualDisks/new_disk_uuid.img
    

    In the following example, a symbolic link is created to the new disk image file pv2_vgexadb.img for user domain dm01db01vm01. The UUID for user domain dm01db01vm01 is 49ffddce4efe43f5910d0c61c87bba58. The UUID for the new disk image is 0d56da6a5013428c97e73266f81c3404.

    # ln -s /EXAVMIMAGES/GuestImages/dm01db01vm01/pv2_vgexadb.img \
    /OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/   \
    0d56da6a5013428c97e73266f81c3404.img
    
  14. In the management domain, append an entry for the new disk to the disk parameter in the user domain configuration file vm.cfg. This makes the new disk image attach automatically to the user domain during the next startup. The new entry matches the following format:
    'file:/OVS/Repositories/user_domain_uuid/VirtualDisks/new_disk_uuid.img,disk_device,w'
    

    The following is an example of an original disk parameter entry in the vm.cfg file:

    disk=['file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/  \
    76197586bc914d3d9fa9d4f092c95be2.img,xvda,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    78470933af6b4253b9ce27814ceddbbd.img,xvdb,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    20d5528f5f9e4fd8a96f151a13d2006b.img,xvdc,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    058af368db2c4f27971bbe1f19286681.img,xvdd,w']
    

    The following example shows an entry appended to the disk parameter for a new disk image that is accessible within the user domain as disk device /dev/xvde:

    disk=['file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/  \
    76197586bc914d3d9fa9d4f092c95be2.img,xvda,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    78470933af6b4253b9ce27814ceddbbd.img,xvdb,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    20d5528f5f9e4fd8a96f151a13d2006b.img,xvdc,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    058af368db2c4f27971bbe1f19286681.img,xvdd,w',                                 \
    'file:/OVS/Repositories/49ffddce4efe43f5910d0c61c87bba58/VirtualDisks/        \
    0d56da6a5013428c97e73266f81c3404.img,xvde,w']
    
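The disk entry string can also be assembled programmatically from the two UUIDs. The following sketch (variable names are illustrative; the UUID values are the ones used in this example) prints the string to append to the disk parameter:

```shell
# Build the vm.cfg disk entry from the repository UUID and the new
# disk image UUID used in this example. The format matches step 14:
# 'file:/OVS/Repositories/user_domain_uuid/VirtualDisks/new_disk_uuid.img,disk_device,w'
user_domain_uuid=49ffddce4efe43f5910d0c61c87bba58
new_disk_uuid=0d56da6a5013428c97e73266f81c3404
disk_device=xvde   # next free xvd* device in the user domain

entry="'file:/OVS/Repositories/${user_domain_uuid}/VirtualDisks/${new_disk_uuid}.img,${disk_device},w'"
echo "$entry"
```

The printed string is then appended by hand as the last element of the disk list in vm.cfg.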

5.13.2 Increasing the Size of the root File System

This procedure describes how to increase the size of the system partition and / (root) file system.

This procedure is performed while the file system remains online.

Note:

There are two system partitions, LVDbSys1 and LVDbSys2. One partition is active and mounted. The other partition is inactive and used as a backup location during upgrade. The size of both system partitions must be equal.

Keep at least 1 GB of free space in the VGExaDb volume group. The free space is used for the LVM snapshot created by the dbnodeupdate.sh utility during software maintenance. If you make snapshot-based backups of the / (root) and /u01 directories as described in "Creating a Snapshot-Based Backup of Oracle Linux Database Server," then keep at least 6 GB of free space in the VGExaDb volume group.

  1. Collect information about the current environment.
    1. Use the df command to identify the amount of free and used space in the root partition (/).
      # df -h /
      

      The following is an example of the output from the command:

      Filesystem            Size  Used Avail Use% Mounted on
      /dev/mapper/VGExaDb-LVDbSys1
                             12G  5.1G  6.2G  46% / 
      

      Note:

      The active root partition may be either LVDbSys1 or LVDbSys2, depending on previous maintenance activities.

    2. Use the lvs command to display the current logical volume configuration.
      # lvs -o lv_name,lv_path,vg_name,lv_size
      

      The following is an example of the output from the command:

      LV        Path                   VG      LSize 
      LVDbOra1  /dev/VGExaDb/LVDbOra1  VGExaDb 10.00g
      LVDbSwap1 /dev/VGExaDb/LVDbSwap1 VGExaDb  8.00g
      LVDbSys1  /dev/VGExaDb/LVDbSys1  VGExaDb 12.00g
      LVDbSys2  /dev/VGExaDb/LVDbSys2  VGExaDb 12.00g 
      
  2. Verify there is available space in the volume group VGExaDb using the vgdisplay command.
    # vgdisplay VGExaDb -s
    

    The following is an example of the output from the command:

    "VGExaDb" 53.49 GiB [42.00 GiB used / 11.49 GiB free]
    

    The volume group must contain enough free space to increase the size of both system partitions, and maintain at least 1 GB of free space for the LVM snapshot created by the dbnodeupdate.sh utility during upgrade. If there is not sufficient free space in the volume group, then add a new disk to LVM.
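    As a rough pre-check, the required free space can be computed before running lvextend: both system partitions grow by the same amount, plus the 1 GB snapshot reserve. A minimal sketch with illustrative numbers (not taken from the output above):

```shell
# Free-space pre-check: extending LVDbSys1 and LVDbSys2 by EXTEND_G GiB
# each requires 2 * EXTEND_G GiB, plus 1 GiB reserved for the LVM
# snapshot created by dbnodeupdate.sh. Values below are hypothetical.
EXTEND_G=10    # GiB to add to each system partition
FREE_G=23      # free GiB reported by vgdisplay VGExaDb -s

NEEDED=$((2 * EXTEND_G + 1))
if [ "$FREE_G" -ge "$NEEDED" ]; then
    echo "ok: ${FREE_G} GiB free covers ${NEEDED} GiB needed"
else
    echo "insufficient: need ${NEEDED} GiB, have ${FREE_G} GiB"
fi
```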

  3. Resize both LVDbSys1 and LVDbSys2 logical volumes using the lvextend command.
    # lvextend -L +size /dev/VGExaDb/LVDbSys1
    # lvextend -L +size /dev/VGExaDb/LVDbSys2
    

    In the preceding command, size is the amount of space to be added to the logical volume. The amount of space added to each system partition must be the same.

    The following example extends the logical volumes by 10 GB:

    # lvextend -L +10G /dev/VGExaDb/LVDbSys1
    # lvextend -L +10G /dev/VGExaDb/LVDbSys2
    
  4. Resize the file system within the logical volume using the resize2fs command.
    # resize2fs /dev/VGExaDb/LVDbSys1
    # resize2fs /dev/VGExaDb/LVDbSys2
    
  5. Verify the space was extended for the active system partition using the df command.
    # df -h /
    

5.13.3 Increasing the Size of the /u01 File System

This procedure describes how to increase the size of the /u01 file system.

This procedure is performed while the file system remains online.

Note:

Keep at least 1 GB of free space in the VGExaDb volume group. The free space is used for the LVM snapshot created by the dbnodeupdate.sh utility during software maintenance. If you make snapshot-based backups of the / (root) and /u01 directories as described in "Creating a Snapshot-Based Backup of Oracle Linux Database Server," then keep at least 6 GB of free space in the VGExaDb volume group.

  1. Collect information about the current environment.
    1. Use the df command to identify the amount of free and used space in the /u01 partition.
      # df -h /u01
      

      The following is an example of the output from the command:

      Filesystem            Size  Used Avail Use% Mounted on
      /dev/mapper/VGExaDb-LVDbOra1
                            9.9G  1.7G  7.8G  18% /u01
      
    2. Use the lvs command to display the current logical volume configuration used by the /u01 file system.
      # lvs -o lv_name,lv_path,vg_name,lv_size /dev/VGExaDb/LVDbOra1
      

      The following is an example of the output from the command:

      LV        Path                   VG      LSize 
      LVDbOra1  /dev/VGExaDb/LVDbOra1  VGExaDb 10.00g
      
  2. Verify there is available space in the volume group VGExaDb using the vgdisplay command.
    # vgdisplay VGExaDb -s
    

    The following is an example of the output from the command:

    "VGExaDb" 53.49 GiB [42.00 GiB used / 11.49 GiB free]
    

    If the output shows there is less than 1 GB of free space, then neither the logical volume nor file system should be extended. Maintain at least 1 GB of free space in the VGExaDb volume group for the LVM snapshot created by the dbnodeupdate.sh utility during an upgrade. If there is not sufficient free space in the volume group, then add a new disk to LVM.

  3. Resize the logical volume using the lvextend command.
    # lvextend -L +size /dev/VGExaDb/LVDbOra1
    

    In the preceding command, size is the amount of space to be added to the logical volume.

    The following example extends the logical volume by 10 GB:

    # lvextend -L +10G /dev/VGExaDb/LVDbOra1
    
  4. Resize the file system within the logical volume using the resize2fs command.
    # resize2fs /dev/VGExaDb/LVDbOra1
    
  5. Verify the space was extended using the df command.
    # df -h /u01
    

5.13.4 Increasing the Size of the Grid Infrastructure Home or Database Home File System

You can increase the size of the Oracle Grid Infrastructure or Oracle Database home file system in a user domain.

The Oracle Grid Infrastructure software home and the Oracle Database software home are created as separate disk image files in the management domain. The disk image files are located in the /EXAVMIMAGES/GuestImages/DomainName/ directory. The disk image files are attached to the user domain automatically during virtual machine startup, and mounted as separate, non-LVM file systems in the user domain.

  1. Connect to the user domain, and check the file system size using the df command, where $ORACLE_HOME is an environment variable that points to the Oracle Database home directory, for example, /u01/app/oracle/product/12.1.0.2/dbhome_1.
    # df -h $ORACLE_HOME
    

    The following is an example of the output from the command:

    Filesystem  Size  Used Avail Use% Mounted on
     /dev/xvdc    20G  6.5G   13G  35% /u01/app/oracle/product/12.1.0.2/dbhome_1
    
  2. Connect to the management domain, and then shut down the user domain using the xm command and specifying the name of the domain.
    # xm shutdown DomainName
    
  3. Create an OCFS2 reflink to serve as a backup of the disk image that will be increased, where version is the release number, for example, 12.1.0.2.1-3.
    # cd /EXAVMIMAGES/GuestImages/DomainName
    # reflink dbversion.img before_resize.dbversion.img
    
  4. Create an empty disk image using the qemu-img command, and append it to the database home disk image.

    The empty disk image size is the size to extend the file system. The last command removes the empty disk image after appending to the database home disk image.

    # qemu-img create emptyfile 10G
    # cat emptyfile >> dbversion.img
    # rm emptyfile
    
  5. Check the file system using the e2fsck command.
    # e2fsck -f dbversion.img
    
  6. Resize the file system using the resize2fs command.
    # resize2fs dbversion.img
    
  7. Start the user domain.
    # xm create /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
    
  8. Connect to the user domain, and verify the file system size was increased.
    # df -h $ORACLE_HOME
    

    The following is an example of the output from the command:

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvdc        30G  6.5G   22G  23% /u01/app/oracle/product/12.1.0.2/dbhome_1
    
  9. Connect to the management domain, and remove the backup image.

    Use a command similar to the following where back_up_image.img is the name of the backup image file:

    # cd /EXAVMIMAGES/GuestImages/DomainName
    # rm back_up_image.img
    

5.13.5 Increasing the Size of the Swap Area

This procedure describes how to increase the amount of swap configured in a user domain.

  1. Verify there is available space in the volume group VGExaDb using the vgdisplay command.
    # vgdisplay VGExaDb -s
    

    The following is an example of the output from the command:

    "VGExaDb" 53.49 GiB [42.00 GiB used / 11.49 GiB free]
    

    If the command shows that there is less than 1 GB of free space, then neither the logical volume nor file system should be extended. Maintain at least 1 GB of free space in the VGExaDb volume group for the LVM snapshot created by the dbnodeupdate.sh utility during an upgrade. If there is not sufficient free space in the volume group, then add a new disk to LVM.

  2. Create a new logical volume of the desired additional swap size using the lvcreate command.

    In the following example, a new 8 GB logical volume named LVDbSwap2 is created.

    # lvcreate -L 8G -n LVDbSwap2 VGExaDb
    
  3. Set up the new logical volume as a swap device with a unique label, such as SWAP2, using the mkswap command. The unique label is a device LABEL entry that is currently unused in the /etc/fstab file.
    # mkswap -L SWAP2 /dev/VGExaDb/LVDbSwap2
    
  4. Enable the new swap device using the swapon command.
    # swapon -L SWAP2
    
  5. Verify the new swap device is enabled using the swapon command.
    # swapon -s
    

    The following is an example of the output from the command:

    Filename         Type            Size      Used     Priority
    /dev/dm-3        partition       8388604   306108   -1
    /dev/dm-4        partition       8388604   0         -2
    
  6. Edit the /etc/fstab file to add the new swap device by copying the existing swap entry, and then changing the LABEL value in the new entry to the label used to create the new swap device. In the following example, the new swap device was added to the /etc/fstab file as LABEL=SWAP2.
    # cat /etc/fstab
    LABEL=DBSYS   /                       ext4    defaults        1 1
    LABEL=BOOT    /boot                   ext4    defaults,nodev        1 1
    tmpfs         /dev/shm                tmpfs   defaults,size=7998m 0 0
    devpts        /dev/pts                devpts  gid=5,mode=620  0 0
    sysfs         /sys                    sysfs   defaults        0 0
    proc          /proc                   proc    defaults        0 0
    LABEL=SWAP    swap                    swap    defaults        0 0
    LABEL=SWAP2   swap                    swap    defaults        0 0
    LABEL=DBORA   /u01                    ext4    defaults        1 1
    /dev/xvdb     /u01/app/12.1.0.2/grid  ext4    defaults        1 1
    /dev/xvdc       /u01/app/oracle/product/12.1.0.2/dbhome_1       ext4   defaults        1 1
    
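After both swap devices are configured, the total can be cross-checked by summing the Size column (in KB) of the swapon -s output. The sketch below runs that calculation against the sample output shown in step 5:

```shell
# Sum the Size column (KB) of swapon -s style output; the here-document
# mirrors the example output from step 5 (two 8 GB swap devices).
total=$(awk 'NR > 1 { sum += $3 } END { print sum }' <<'EOF'
Filename         Type            Size      Used     Priority
/dev/dm-3        partition       8388604   306108   -1
/dev/dm-4        partition       8388604   0        -2
EOF
)
echo "${total} KB total swap"
```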

5.14 Expanding /EXAVMIMAGES After Adding the Database Server Disk Expansion Kit

After you add a disk expansion kit to the database server, follow the appropriate procedure to add the additional space to the /EXAVMIMAGES file system.

5.14.1 Expanding /EXAVMIMAGES on Management Domain on Release 18.1.x or Later

If you are using Oracle Exadata System Software release 18c (18.1.0) or later, then use this procedure to expand the /EXAVMIMAGES file system on the management domain following the addition of a disk expansion kit.

During deployment, all available disk space on a database server will be allocated in the management domain (dom0) with the majority of the space allocated to /EXAVMIMAGES for user domain storage. The /EXAVMIMAGES file system is created on /dev/VGExaDb/LVDbExaVMImages.

In the example below, dm01db01 is the name of the management domain, and dm01db01vm01 is a user domain.

  1. Ensure reclaimdisks.sh has been run in the management domain by using the -check option.

    Note that the last line reads "Layout: DOM0". If reclaimdisks.sh was not run, it would read "Layout: DOM0 + Linux".

    [root@dm01db01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -check
    Model is ORACLE SERVER X6-2
    Number of LSI controllers: 1
    Physical disks found: 4 (252:0 252:1 252:2 252:3)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 4 (252:0 252:1 252:2 252:3)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 4 disks with no global and dedicated hot spare disks.
    Valid. Booted: DOM0. Layout: DOM0.
    
  2. Add the disk expansion kit to the database server.
    The kit consists of 4 additional hard drives to be installed in the 4 available slots. Remove the filler panels and install the drives. The drives may be installed in any order.
  3. Verify that the RAID reconstruction is complete by checking for the warning and clear messages in the alert history.

    This may take several hours to complete. The example below shows that it took approximately 7 hours. Once the clear message (message 1_2 below) is present, the reconstruction is completed and it is safe to proceed.

    [root@dm01db01 ~]# dbmcli -e list alerthistory
    
             1_1     2016-02-15T14:01:00-08:00       warning         "A disk
     expansion kit was installed. The additional physical drives were automatically
     added to the existing RAID5 configuration, and reconstruction of the
     corresponding virtual drive was automatically started."
    
             1_2     2016-02-15T21:01:01-08:00       clear           "Virtual drive
     reconstruction due to disk expansion was completed."
    
  4. Collect information about the current environment.
    [root@dm01db01 ~]# df -h /EXAVMIMAGES
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             1.6T   44G  1.5T   3% /EXAVMIMAGES
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94039.1
    dm01db01vm01.example.com                   4 16384     2     -b----   3597.3
    
  5. Stop all user domains by running the command xm shutdown -a -w from the management domain.

    After all user domains are shut down, only Domain-0 (the management domain) should be listed.

    [root@dm01db01 ~]# xm shutdown -a -w
    Domain dm01db01vm01.example.com terminated 
    All domains terminated
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94073.4
    
  6. Run parted to view the sector start and end values.

    Check the size of the disk against the end of the last partition listed in the table (partition number 2). If you see a request to fix the GPT, respond with F.

    [root@dm01db01 ~]# parted /dev/sda 
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s 
    (parted) print
    Warning: Not all of the space available to /dev/sda appears to be used, you can
    fix the GPT to use all of the space (an extra 4679680000 blocks) or continue
    with the current setting? Fix/Ignore? F  
    
    Model: LSI MR9361-8i (scsi) 
    Disk /dev/sda: 8189440000s 
    Sector size (logical/physical): 512B/512B 
    Partition Table: gpt 
    
    Number  Start       End           Size         File system  Name     Flags 
    1       64s         1046591s      1046528s     ext3         primary  boot 
    4       1046592s    1048639s      2048s                     primary  bios_grub
    2       1048640s    240132159s    239083520s                primary  lvm 
    
    (parted) q

    The partition table shown above lists partition 2 as ending at sector 240132159 and disk size as 8189440000 sectors. You will use these values in step 7.

  7. Create a new partition.
    The start sector is the end of the last existing partition from step 6 plus 1 sector (240132159+1=240132160). The end sector of the new partition is the size of the disk minus 34 sectors (8189440000-34=8189439966).
    [root@dm01db01 ~]# parted -s /dev/sda mkpart primary 240132160s 8189439966s 

    This command produces no output.
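The sector arithmetic in this step can be scripted from the step 6 values. The sketch below only prints the resulting parted command rather than running it:

```shell
# Sector arithmetic from step 6: the new partition starts one sector
# after the last existing partition and ends 34 sectors before the end
# of the disk (the backup GPT occupies the final 33 sectors).
disk_size=8189440000   # total disk size in sectors (from parted)
last_end=240132159     # end sector of the last existing partition

start=$((last_end + 1))
end=$((disk_size - 34))

echo "parted -s /dev/sda mkpart primary ${start}s ${end}s"
```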

  8. Set the LVM flag for the new partition (partition number 3).
    [root@dm01db01 ~]# parted -s /dev/sda set 3 lvm on
    Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or
     resource busy).  As a result, it may not reflect all of your changes until after reboot.
  9. Review the updated partition table.
    [root@dm01db01 ~]# parted -s /dev/sda unit s print
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 8189440000s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt 
    Number  Start        End          Size         File system  Name     Flags
    1       64s         1046591s      1046528s     ext4         primary  boot 
    4       1046592s    1048639s      2048s                     primary  bios_grub
    2       1048640s    240132159s    239083520s                primary  lvm 
    3       240132160s  8189439966s   7949307807s               primary  lvm
    
  10. Restart the Exadata server.
    [root@dm01db01 ~]# shutdown -r now
  11. Check the size of the disk against the end of the new partition.
    [root@dm01db01 ~]# parted -s /dev/sda unit s print
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 8189440000s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt 
    Number  Start        End          Size         File system  Name     Flags
    1       64s         1046591s      1046528s     ext4         primary  boot 
    4       1046592s    1048639s      2048s                     primary  bios_grub
    2       1048640s    240132159s    239083520s                primary  lvm 
    3       240132160s  8189439966s   7949307807s               primary  lvm
  12. Create an LVM physical volume (PV) on the newly created partition (/dev/sda3).
    [root@dm01db01 ~]# lvm pvcreate --force /dev/sda3
      Physical volume "/dev/sda3" successfully created
  13. Extend the LVM volume group VGExaDb to include the newly created partition.
    [root@dm01db01 ~]# lvm vgextend VGExaDb /dev/sda3
      Volume group "VGExaDb" successfully extended
  14. Dismount the /EXAVMIMAGES OCFS2 partition.
    [root@dm01db01 ~]# umount /EXAVMIMAGES/
  15. Extend the logical volume that contains the OCFS2 partition to include the rest of the free space.
    [root@dm01db01 ~]# lvm lvextend -l +100%FREE /dev/VGExaDb/LVDbExaVMImages
    Size of logical volume VGExaDb/LVDbExaVMImages changed from 1.55 TiB (406549 extents) to 
    3.73 TiB (977798 extents).  
    Logical volume LVDbExaVMImages successfully resized.
  16. Resize the OCFS2 file system to the rest of the logical volume.

    The tunefs.ocfs2 command typically runs very quickly and does not produce output.

    [root@dm01db01 ~]# tunefs.ocfs2 -S /dev/VGExaDb/LVDbExaVMImages
    
  17. Mount the OCFS2 partition and then view the file system disk space usage for this partition.
    [root@dm01db01 ~]# mount -a
    
    [root@dm01db01 ~]# ls -al /EXAVMIMAGES/
    total 4518924
    drwxr-xr-x  3 root root        3896 Jul 18 18:01 .
    drwxr-xr-x 26 root root        4096 Jul 24 14:50 ..
    drwxr-xr-x  2 root root        3896 Jul 18 17:51 lost+found
    -rw-r-----  1 root root 26843545600 Jul 18 18:01 System.first.boot.12.2.1.1.8.180510.1.img
    
    [root@dm01db01 ~]# df -h /EXAVMIMAGES/
    Filesystem                            Size  Used Avail Use% Mounted on
    /dev/mapper/VGExaDb-LVDbExaVMImages   3.8T  9.0G  3.8T   1% /EXAVMIMAGES
    
  18. Restart the user domains.

5.14.2 Expanding /EXAVMIMAGES on Management Domain on Release 12.2.x

If you are using Oracle Exadata System Software release 12.2.x, then use this procedure to expand the /EXAVMIMAGES file system on the management domain following the addition of a disk expansion kit.

During deployment, all available disk space on a database server will be allocated in the management domain (dom0) with the majority of the space allocated to /EXAVMIMAGES for user domain storage. The /EXAVMIMAGES file system is created on /dev/VGExaDb/LVDbExaVMImages.

In the example below, dm01db01 is the name of the management domain, and dm01db01vm01 is a user domain.

  1. Ensure reclaimdisks.sh has been run in the management domain by using the -check option.

    Note that the last line reads "Layout: DOM0". If reclaimdisks.sh was not run, it would read "Layout: DOM0 + Linux".

    [root@dm01db01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -check
    Model is ORACLE SERVER X5-2
    Number of LSI controllers: 1
    Physical disks found: 4 (252:0 252:1 252:2 252:3)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 4 (252:0 252:1 252:2 252:3)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 4 disks with no global and dedicated hot spare disks.
    Valid. Booted: DOM0. Layout: DOM0.
    
  2. Add the disk expansion kit to the database server.
    The kit consists of 4 additional hard drives to be installed in the 4 available slots. Remove the filler panels and install the drives. The drives may be installed in any order.
  3. Verify that the RAID reconstruction is complete by checking for the warning and clear messages in the alert history.

    This may take several hours to complete. The example below shows that it took approximately 7 hours. Once the clear message (message 1_2 below) is present, the reconstruction is completed and it is safe to proceed.

    [root@dm01db01 ~]# dbmcli -e list alerthistory
    
             1_1     2016-02-15T14:01:00-08:00       warning         "A disk
     expansion kit was installed. The additional physical drives were automatically
     added to the existing RAID5 configuration, and reconstruction of the
     corresponding virtual drive was automatically started."
    
             1_2     2016-02-15T21:01:01-08:00       clear           "Virtual drive
     reconstruction due to disk expansion was completed."
    
  4. Collect information about the current environment.
    [root@dm01db01 ~]# df -h /EXAVMIMAGES
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             1.6T   44G  1.5T   3% /EXAVMIMAGES
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94039.1
    dm01db01vm01.example.com                   4 16384     2     -b----   3597.3
    
  5. Stop all user domains by running the command xm shutdown -a -w from the management domain.

    After all user domains are shut down, only Domain-0 (the management domain) should be listed.

    [root@dm01db01 ~]# xm shutdown -a -w
    Domain dm01db01vm01.example.com terminated 
    All domains terminated
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94073.4
    
  6. Run parted to view the sector start and end values.

    Check the size of the disk against the end of the second partition. If you see a request to fix the GPT, respond with F.

    [root@dm01db01 ~]# parted /dev/sda 
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s 
    (parted) print
    Warning: Not all of the space available to /dev/sda appears to be used, you can
    fix the GPT to use all of the space (an extra 4679680000 blocks) or continue
    with the current setting? Fix/Ignore? F  
    
    Model: LSI MR9361-8i (scsi) 
    Disk /dev/sda: 8189440000s 
    Sector size (logical/physical): 512B/512B 
    Partition Table: gpt 
    
    Number  Start       End           Size         File system  Name     Flags 
    1       64s         1048639s      1048576s     ext3         primary  boot 
    2       1048640s    3509759966s   3508711327s               primary  lvm 
    
    (parted) q

    The partition table shown above lists partition 2 as ending at sector 3509759966 and the disk size as 8189440000 sectors. You will use these values in step 7.

  7. Create a third partition.
    The start sector is the end of the second partition from step 6 plus 1 sector (3509759966+1=3509759967). The end sector of the third partition is the size of the disk minus 34 (8189440000-34=8189439966).
    [root@dm01db01 ~]# parted -s /dev/sda mkpart primary 3509759967s 8189439966s 

    This command produces no output.
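As in the 18.1.x procedure, the start and end sectors can be derived from the step 6 values. The sketch below prints, rather than runs, the parted command:

```shell
# Derive the new partition bounds from the step 6 parted output:
# start is the end of partition 2 plus 1; end is the disk size minus
# 34 sectors (the backup GPT occupies the final 33 sectors).
disk_size=8189440000   # total disk size in sectors (from parted)
last_end=3509759966    # end sector of partition 2

start=$((last_end + 1))
end=$((disk_size - 34))

echo "parted -s /dev/sda mkpart primary ${start}s ${end}s"
```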

  8. Set the LVM flag for the third partition.
    [root@dm01db01 ~]# parted -s /dev/sda set 3 lvm on
    Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or
     resource busy).  As a result, it may not reflect all of your changes until after reboot.
  9. Review the updated partition table.
    [root@dm01db01 ~]# parted -s /dev/sda unit s print
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 8189440000s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt 
    Number  Start        End          Size         File system  Name     Flags
    1       64s          1048639s     1048576s     ext4         primary  boot 
    2       1048640s     3509759966s  3508711327s               primary  lvm 
    3       3509759967s  8189439966s  4679680000s               primary  lvm
  10. Reboot the Exadata server.
    [root@dm01db01 ~]# shutdown -r now
  11. Check the size of the disk against the end of the third partition.
    [root@dm01db01 ~]# parted -s /dev/sda unit s print
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 8189440000s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt 
    Number  Start        End          Size         File system  Name     Flags
    1       64s          1048639s     1048576s     ext4         primary  boot 
    2       1048640s     3509759966s  3508711327s               primary  lvm 
    3       3509759967s  8189439966s  4679680000s               primary  lvm
  12. Create an LVM physical volume (PV) on the newly created third partition.
    [root@dm01db01 ~]# lvm pvcreate --force /dev/sda3
      Physical volume "/dev/sda3" successfully created
  13. Extend the LVM volume group VGExaDb to include the newly created third partition.
    [root@dm01db01 ~]# lvm vgextend VGExaDb /dev/sda3
      Volume group "VGExaDb" successfully extended
  14. Dismount the /EXAVMIMAGES OCFS2 partition.
    [root@dm01db01 ~]# umount /EXAVMIMAGES/
  15. Extend the logical volume that contains the OCFS2 partition to include the rest of the free space.
    [root@dm01db01 ~]# lvm lvextend -l +100%FREE /dev/VGExaDb/LVDbExaVMImages
    Size of logical volume VGExaDb/LVDbExaVMImages changed from 1.55 TiB (406549 extents) to 
    3.73 TiB (977798 extents).  
    Logical volume LVDbExaVMImages successfully resized.
  16. Resize the OCFS2 file system to the rest of the logical volume.

    The tunefs.ocfs2 command typically runs very quickly and does not produce output.

    [root@dm01db01 ~]# tunefs.ocfs2 -S /dev/VGExaDb/LVDbExaVMImages
    
  17. Mount the OCFS2 partition and then view the file system disk space usage for this partition.
    [root@dm01db01 ~]# mount -a
    
    [root@dm01db01 ~]# ls -al /EXAVMIMAGES/
    total 4518924
    drwxr-xr-x  3 root root        3896 Jul 18 18:01 .
    drwxr-xr-x 26 root root        4096 Jul 24 14:50 ..
    drwxr-xr-x  2 root root        3896 Jul 18 17:51 lost+found
    -rw-r-----  1 root root 26843545600 Jul 18 18:01 System.first.boot.12.2.1.1.8.180510.1.img
    
    [root@dm01db01 ~]# df -h /EXAVMIMAGES/
    Filesystem                            Size  Used Avail Use% Mounted on
    /dev/mapper/VGExaDb-LVDbExaVMImages   3.8T  9.0G  3.8T   1% /EXAVMIMAGES
    
  18. Restart the user domains.

5.14.3 Expanding /EXAVMIMAGES on Management Domain on Releases Earlier than 12.2.x

If you are using a release of Oracle Exadata System Software that is release 12.1.x or earlier, then use this procedure to expand the /EXAVMIMAGES directory on the management domain following the addition of a disk expansion kit.

During deployment, all available disk space on a database server will be allocated in the management domain with the majority of the space allocated to /EXAVMIMAGES for user domain storage. The /EXAVMIMAGES file system is created on /dev/sda3.

In the example below, dm01db01 is the name of the management domain, and dm01db01vm01 is a user domain.

  1. Ensure reclaimdisks.sh has been run by using the -check option.

    Note that the last line reads "Layout: DOM0". If reclaimdisks.sh was not run, it would read "Layout: DOM0 + Linux".

    [root@dm01db01 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -check
    Model is ORACLE SERVER X5-2
    Number of LSI controllers: 1
    Physical disks found: 4 (252:0 252:1 252:2 252:3)
    Logical drives found: 1
    Linux logical drive: 0
    RAID Level for the Linux logical drive: 5
    Physical disks in the Linux logical drive: 4 (252:0 252:1 252:2 252:3)
    Dedicated Hot Spares for the Linux logical drive: 0
    Global Hot Spares: 0
    Valid. Disks configuration: RAID5 from 4 disks with no global and dedicated hot spare disks.
    Valid. Booted: DOM0. Layout: DOM0.
    
  2. Add the disk expansion kit to the database server.
    The kit consists of 4 additional hard drives to be installed in the 4 available slots. Remove the filler panels and install the drives. The drives may be installed in any order.
  3. Verify that the RAID reconstruction is complete by checking for the warning and clear messages in the alert history.

    This may take several hours to complete. The example below shows that it took approximately 7 hours. Once the clear message (message 1_2 below) is present, the reconstruction is completed and it is safe to proceed.

    [root@dm01db01 ~]# dbmcli -e list alerthistory
    
             1_1     2016-02-15T14:01:00-08:00       warning         "A disk
     expansion kit was installed. The additional physical drives were automatically
     added to the existing RAID5 configuration, and reconstruction of the
     corresponding virtual drive was automatically started."
    
             1_2     2016-02-15T21:01:01-08:00       clear           "Virtual drive
     reconstruction due to disk expansion was completed."
    
  4. Collect information about the current environment.
    [root@dm01db01 ~]# cat /proc/partitions |grep sda
       8        0 4094720000 sda
       8        1     524288 sda1
       8        2  119541760 sda2
       8        3 1634813903 sda3
    
    [root@dm01db01 ~]# df -h /EXAVMIMAGES
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             1.6T   44G  1.5T   3% /EXAVMIMAGES
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94039.1
    dm01db01vm01.example.com                   4 16384     2     -b----   3597.3
    
  5. Stop all user domain guests by running the command xm shutdown -a -w from the management domain.

    After all user domain guests are shut down, only Domain-0 (dom0) should be listed.

    [root@dm01db01 ~]# xm shutdown -a -w
    Domain dm01db01vm01.example.com terminated 
    All domains terminated
    
    [root@dm01db01 ~]# xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  8192     4     r-----  94073.4
    
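Verifying that only Domain-0 remains can be automated by counting the non-header rows in the `xm list` output. This sketch, with the hypothetical helper name `count_guests`, runs against sample output; on the system you would pipe a live `xm list` into it.

```shell
# Count running guests: skip the header row and Domain-0, count the rest.
count_guests() {
  awk 'NR > 1 && $1 != "Domain-0" { n++ } END { print n+0 }'
}

# Sample modeled on the output above (live use: xm list | count_guests):
sample='Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  8192     4     r-----  94073.4'

guests=$(printf '%s\n' "$sample" | count_guests)
echo "running guests: ${guests}"    # expect 0 before proceeding
```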
  6. Run parted to verify the partition size.

    If you see a request to fix the GPT, respond with F.

    [root@dm01db01 ~]# parted /dev/sda
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) print
    Warning: Not all of the space available to /dev/sda appears to be used, you can
    fix the GPT to use all of the space (an extra 4679680000 blocks) or continue
    with the current setting? Fix/Ignore? F  
    
    Model: LSI MR9361-8i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name     Flags
     1      32.8kB  537MB   537MB   ext3         primary  boot 
     2      537MB   123GB   122GB                primary  lvm  
     3      123GB   1797GB  1674GB               primary       
    
    (parted) q
    The partition table shown above lists partition 3 as 1674 GB. The size of the disk that contains this partition (/dev/sda) is 4193 GB.
  7. Run parted to view the sector start and end values.
    [root@dm01db01 ~]# parted -s /dev/sda unit s print
    Model: LSI MR9361-8i (scsi) 
    Disk /dev/sda: 8189440000s 
    Sector size (logical/physical): 512B/512B 
    Partition Table: gpt 
    
    Number  Start       End          Size         File system  Name     Flags 
    1       64s         1048639s     1048576s     ext3         primary  boot 
    2       1048640s    240132159s   239083520s                primary  lvm 
    3       240132160s  3509759965s  3269627806s               primary
    
    The partition table shown above lists partition 3 as starting at sector 240132160 and disk size as 8189440000. You will use these values in step 10.
  8. Dismount the /EXAVMIMAGES file system on the management domain.
    [root@dm01db01 ~]# umount /EXAVMIMAGES
  9. Remove partition 3.
    [root@dm01db01 ~]# parted -s /dev/sda rm 3
    

    This command produces no output.

  10. Re-create the partition, specifying the same starting sector and the new end-of-partition sector. Calculate the new end-of-partition sector by subtracting 34 from the disk size, for example: 8189440000 - 34 = 8189439966
    [root@dm01db01 ~]# parted -s /dev/sda mkpart primary 240132160s 8189439966s 
    

    You might encounter the following warning:

    Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda 
    (Device or resource busy).  As a result, it may not reflect all of your changes until after 
    reboot.

    If you encounter this error, restart the Exadata database server to apply the changes in the partition table.
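The end-sector arithmetic in this step can be sketched as a short calculation using the values collected in step 7. The snippet only echoes the resulting `parted` command rather than running it against a real disk.

```shell
# Worked example of the step 10 arithmetic, using the values from step 7.
DISK_SECTORS=8189440000    # disk size from: parted -s /dev/sda unit s print
START_SECTOR=240132160     # partition 3 start sector, unchanged
NEW_END=$((DISK_SECTORS - 34))   # subtract 34 sectors reserved for the backup GPT
echo "parted -s /dev/sda mkpart primary ${START_SECTOR}s ${NEW_END}s"
```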

  11. Mount the /EXAVMIMAGES partition again and view the file system disk space usage for this partition.
    [root@dm01db01 ~]# mount /EXAVMIMAGES
    
    [root@dm01db01 ~]# df -h /EXAVMIMAGES
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             1.6T   44G  1.5T   3% /EXAVMIMAGES
    

    Note that the size of the file system is still the same, 1.6 TB, as in step 4.

  12. Verify that the partition table as seen by the kernel shows the updated size for partition 3.

    The value reported for sda3 should now be larger than the value observed earlier in step 4.

    [root@dm01db01 ~]# cat /proc/partitions |grep sda
       8        0 4094720000 sda
       8        1     524288 sda1
       8        2  119541760 sda2
       8        3 3974653903 sda3
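The comparison against step 4 can be done with a one-line `awk` filter. The sketch below parses sample text shaped like the output above; on the system you would read `/proc/partitions` directly.

```shell
# Extract the sda3 block count and confirm it grew relative to step 4.
# Live use (assumption): awk '$4 == "sda3" {print $3}' /proc/partitions
sample='   8        0 4094720000 sda
   8        1     524288 sda1
   8        2  119541760 sda2
   8        3 3974653903 sda3'

sda3_blocks=$(printf '%s\n' "$sample" | awk '$4 == "sda3" {print $3}')
echo "sda3 is now ${sda3_blocks} blocks (was 1634813903 in step 4)"
```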
    
  13. Expand the file system.

    You can do this while the file system is mounted and processes are running. Note the updated file system size compared to the value in step 4. The tunefs.ocfs2 command typically runs quickly and normally produces no output.

    [root@dm01db01 ~]# tunefs.ocfs2 -S /dev/sda3
    
    [root@dm01db01 ~]# df -h /EXAVMIMAGES
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             3.8T   44G  3.7T   2% /EXAVMIMAGES
    
  14. Restart the user domains.

5.15 Creating Oracle VM Oracle RAC Clusters

This procedure creates Oracle VM Oracle RAC clusters using the Oracle Exadata Deployment Assistant (OEDA) configuration tool and deployment tool.

The requirements for adding an Oracle VM Oracle RAC cluster are as follows:

  • The system has already been deployed with one or more Oracle VM Oracle RAC clusters.

  • The system has available resources, such as memory, CPU, local disk space, and Oracle Exadata Storage Server disk space.

  • OEDA deployment files used for initial system configuration are available.

  1. Verify there are sufficient resources to add a new guest in the management domain (dom0).

    If you are creating an Oracle VM Oracle RAC cluster, then verify resources in every management domain (dom0) where you are creating a new guest.

  2. Use the following command to verify the Oracle Exadata Storage Server disk space:
    # dcli -l celladmin -g cell_group "cellcli -e 'list celldisk attributes name, \
     diskType, freeSpace where freeSpace>0'"
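A quick way to read the output is to total the reported free space per cell. The sketch below runs against sample text; the line format (cell prefix, celldisk name, disk type, free space with a trailing G) is an assumption modeled on typical dcli/cellcli output, so adjust the parsing to what your cells actually report.

```shell
# Sum the reported free space per cell from dcli-style output (format assumed).
sample='dm01cel01: CD_00_dm01cel01 HardDisk 528G
dm01cel01: CD_01_dm01cel01 HardDisk 528G
dm01cel02: CD_00_dm01cel02 HardDisk 264G'

free_summary=$(printf '%s\n' "$sample" |
  awk '{ sub(/:$/, "", $1); sub(/G$/, "", $4); free[$1] += $4 }
       END { for (c in free) printf "%s: %dG free\n", c, free[c] }' | sort)
echo "$free_summary"
```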
    
  3. Download the latest OEDA from My Oracle Support note 888828.1, and place it on a system capable of running a graphic-based program.

    By default, database servers in Oracle Exadata Database Machine contain only the packages required to run Oracle Database, and cannot run the OEDA configuration tool.

  4. Obtain the OEDA template files used to deploy the system.
  5. Run the OEDA configuration tool as follows:
    1. Click Import.
    2. Select and open the XML file that was used to deploy the system, named CustomerName-NamePrefix.xml.
    3. Click Next as needed to get to the Define Clusters page, and verify the IP address and host name information as you navigate the pages. If there have been no networking changes since the initial deployment, then no changes are needed.
    4. Increment the number of clusters on the Define Clusters page.
    5. Select the new cluster tab to edit the cluster information. Do not change any other clusters.
    6. Enter a unique cluster name for the cluster.
    7. Select the Oracle VM Server and CELL components for the new cluster, and then click Add.

      Note:

      The recommended practice for best performance and simplest administration is to select all cells.
    8. Click Next as needed to get to the new cluster page. Do not change any other clusters.
    9. Enter the information for the new cluster. Information includes the virtual guest size, disk group details, and database name. The database name must be unique for all databases that use the same Oracle Exadata Storage Servers.
    10. Click Next to get to the Review and Edit page, and verify the information for the new cluster.
    11. Click Next as needed to get to the Generate page.
    12. Click Next to generate the new configuration files.
    13. Select the destination directory for the configuration files.
    14. Click Save.

      Note:

      If the Oracle VM Defaults were altered for this new cluster, then configuration details for existing clusters are rewritten to match the new template settings. For example, if you previously deployed vm01 as SMALL with memory=8GB, and then change the SMALL template to memory=10GB for this new VM, then the new OEDA XML files show vm01 with memory=10GB even though there was no intent to change vm01.
    15. Click Installation Template on the Finish page to review the details of the new cluster.
    16. Click Finish to exit the configuration tool.
  6. Verify the XML file for the new cluster exists and has the name CustomerName-NamePrefix-ClusterName.xml in the destination folder.
  7. Obtain the deployment files for the Oracle Grid Infrastructure and Oracle Database releases selected, and place them in the OEDA WorkDir directory.
  8. Run the OEDA Deployment Tool using the -cf option to specify the XML file for the new cluster, and the -l option to list the steps using the following command:
    $ ./install.sh -cf    \
    ExadataConfigurations/CustomerName-NamePrefix-ClusterName.xml -l
    

    You should see output similar to the following:

    Initializing 
    ||||| 
    1. Validate Configuration File 
    2. Update Nodes for Eighth Rack 
    3. Create Virtual Machine 
    4. Create Users 
    5. Setup Cell Connectivity 
    6. Calibrate Cells 
    7. Create Cell Disks 
    8. Create Grid Disks 
    9. Configure Alerting 
    10. Install Cluster Software 
    11. Initialize Cluster Software 
    12. Install Database Software 
    13. Relink Database with RDS 
    14. Create ASM Diskgroups 
    15. Create Databases 
    16. Apply Security Fixes 
    17. Install Exachk 
    18. Create Installation Summary 
    19. Resecure Machine
  9. Skip the following steps when adding new Oracle VM clusters in an existing Oracle VM environment on Oracle Exadata Database Machine:
    • (For Eighth Rack systems only) 2. Update Nodes for Eighth Rack
    • 6. Calibrate Cells
    • 7. Create Cell Disks
    • 19. Resecure Machine

    Note:

    The step numbers change based on the selected hardware configuration. Use the step names to identify the correct steps on your system.

    For example, to execute step 1, run the following command:

    $ ./install.sh -cf \
    ExadataConfigurations/CustomerName-NamePrefix-ClusterName.xml -s 1
    To make OEDA run only a subset of the steps, you can specify a range, for example:
    $ ./install.sh -cf \
    ExadataConfigurations/CustomerName-NamePrefix-ClusterName.xml -r 3-5
  10. For all other systems, run all steps except for the Configure Alerting step using the XML file for the new cluster.

    To run an individual step, use a command similar to the following, which executes the first step:

    $ ./install.sh -cf \
    ExadataConfigurations/CustomerName-NamePrefix-ClusterName.xml -s 1
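Because the step numbers change with the hardware configuration, it can be safer to look a step up by name in the `-l` listing before passing a number to `-s`. This is a sketch only: `step_number` is a hypothetical helper, the listing is a saved subset of the output shown above, and `cluster.xml` is a placeholder file name.

```shell
# Look up a step number by name in saved `install.sh ... -l` output.
step_number() {
  # $1 = step name; stdin = the listing
  awk -v name="$1" 'index($0, name) { sub(/\..*$/, ""); print; exit }'
}

# Saved listing (subset of the output shown above):
listing='1. Validate Configuration File
2. Update Nodes for Eighth Rack
10. Install Cluster Software
11. Initialize Cluster Software'

n=$(printf '%s\n' "$listing" | step_number "Install Cluster Software")
echo "./install.sh -cf cluster.xml -s ${n}"
```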

5.16 Expanding an Oracle VM Oracle RAC Cluster on Exadata Using OEDACLI

You can expand an existing Oracle RAC cluster on Oracle VM by adding guest domains using the Oracle Exadata Deployment Assistant command-line interface (OEDACLI).

OEDACLI is the preferred method if you have a known, good version of the OEDA XML file for your cluster.

Note:

During the execution of this procedure, the existing Oracle RAC cluster nodes along with their database instances incur zero downtime.

Use cases for this procedure include:

  • You have an existing Oracle RAC cluster that uses only a subset of the database servers of an Oracle Exadata Rack, and now the nodes not being used by the cluster have become candidates for use.
  • You have an existing Oracle RAC cluster on Oracle Exadata Database Machine that was recently extended with additional database servers.
  • You have an existing Oracle RAC cluster that had a complete node failure and the node was removed and replaced with a newly re-imaged node.

Before performing the steps in this section, the new database servers should have been set up as detailed in Adding a New Database Server to the Cluster, including the following:

  • The new database server is installed and configured on the network, with a management domain (kvmhost) in place.
  • Download the latest Oracle Exadata Deployment Assistant (OEDA); ensure the version you download is the July 2019 release, or later.
  • You have an OEDA configuration XML file that accurately reflects the existing cluster configuration. You can validate the XML file by generating an installation template from it and comparing it to the current configuration. See the OEDACLI command SAVE FILES.
  • Review the OEDA Installation Template report for the current system configuration to obtain node names and IP addresses for existing nodes. You will need to have new host names and IP addresses for the new nodes being added. The new host names and IP addresses required are:
    • Administration host names and IP addresses (referred to as ADMINNET) for the management domain (kvmhost) and the user domains (guests).
    • Private host names and IP addresses (referred to as PRIVNET) for the management domain (kvmhost) and the user domains (guests).
    • Integrated Lights Out Manager (ILOM) host names and IP addresses for the management domain (kvmhost).
    • Client host names and IP addresses (referred to as CLIENTNET) for the user domains (guests).
    • Virtual IP (VIP) host names and IP addresses (referred to as VIPNET) for the user domains (guests).
    • Physical rack number and location of the new node in the rack (in terms of U number).
  • Each management domain or kvmhost has been imaged or patched to the same image in use on the existing database servers. The current system image must match the version of the /EXAVMIMAGES/ System.first.boot.*.img file on the new management domain (kvmhost) node.

    Note:

    The ~/dom0_group file referenced below is a text file that contains the host names of the management domains or kvmhosts for all existing and new nodes being added.

    Check that the image version is the same across all management domains or kvmhosts.

    dcli -g ~/dom0_group -l root "imageinfo -ver"
    
    exa01adm01: 19.2.0.0.0.190225
    exa01adm02: 19.2.0.0.0.190225
    exa01adm03: 19.2.0.0.0.190225

    If any image versions differ, you must upgrade the nodes as needed so that they match.

    Ensure that the System.first.boot version across all management domains or kvmhosts matches the image version retrieved in the previous step.

    dcli -g ~/dom0_group -l root "ls  -1 /EXAVMIMAGES/System.first.boot*.img" 
    exa01adm01:  /EXAVMIMAGES/System.first.boot.19.2.0.0.0.190225.img
    exa01adm02:  /EXAVMIMAGES/System.first.boot.19.2.0.0.0.190225.img
    exa01adm03:  /EXAVMIMAGES/System.first.boot.19.2.0.0.0.190225.img

    If any nodes are missing the System.first.boot.img file that corresponds to the current image, then obtain the required file. See the "Supplemental README note" for your Exadata release in My Oracle Support Doc ID 888828.1 and look for the patch file corresponding to this description: "DomU System.img OS image for V.V.0.0.0 VM creation on upgraded dom0s".

  • Place the klone.zip files (gi-klone*.zip and db-klone*.zip) in the /EXAVMIMAGES location on the freshly imaged management domain or kvmhost node you are adding to the cluster. These files can be found in the/EXAVMIMAGES directory on the management domain or kvmhost node from where the system was initially deployed.
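The image-version consistency checks described above can be condensed into a quick comparison of the dom0 image version against the System.first.boot file name. This sketch uses the example values from the listings above; on a live system you would derive them with `ver=$(imageinfo -ver)` and an `ls` of /EXAVMIMAGES.

```shell
# Confirm a System.first.boot image matching the dom0 image version exists.
ver="19.2.0.0.0.190225"                                     # example: imageinfo -ver
img="/EXAVMIMAGES/System.first.boot.19.2.0.0.0.190225.img"  # example: ls /EXAVMIMAGES

case "$img" in
  *"System.first.boot.${ver}.img")
    result="match: System.first.boot image present for ${ver}" ;;
  *)
    result="missing System.first.boot image for ${ver} - see My Oracle Support Doc ID 888828.1" ;;
esac
echo "$result"
```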

The steps here show how to add a new management domain or kvmhost node called exa01adm03 that will host a new user domain or guest called exa01adm03vm01. The steps show how to extend an existing Oracle RAC cluster onto the user domain (guest) using OEDACLI commands. The existing cluster has management domain (kvmhost) nodes named exa01adm01 and exa01adm02 and user domain (guest) nodes named exa01adm01vm01 and exa01adm02vm01.

  1. Add the management domain (kvmhost) information to the OEDA XML file using the CLONE COMPUTE command.

    In the examples below, the OEDA XML file is assumed to be in: unzipped_OEDA_location/ExadataConfigurations.

    OEDACLI> LOAD FILE NAME=exa01_original_deployment.xml 
    
    OEDACLI> CLONE COMPUTE SRCNAME=exa01adm01 TGTNAME=exa01adm03
    SET ADMINNET NAME=exa01adm03,IP=xx.xx.xx.xx
    SET PRIVNET NAME1=exa01adm03-priv1,IP1=xx.xx.xx.xx
    SET PRIVNET NAME2=exa01adm03-priv2,IP2=xx.xx.xx.xx
    SET ILOMNET NAME=exa01adm03-c,IP=xx.xx.xx.xx
    SET RACK NUM=NN,ULOC=XX
    
    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS FORCE
    OEDACLI> SAVE FILE NAME=exa01_plus_adm03_node.xml

    At this point we have a new XML file that has the new compute node management domain (kvmhost) in the configuration. This file will be used by the subsequent steps.

  2. Add the new guest information to the OEDA XML file using the CLONE GUEST command and deploy the guest.
    OEDACLI> LOAD FILE NAME=exa01_plus_adm03_node.xml 
    
    OEDACLI> CLONE GUEST SRCNAME=exa01adm01vm01 TGTNAME=exa01adm03vm01
    WHERE STEPNAME=CREATE_GUEST
    SET PARENT NAME=exa01adm03
    SET ADMINNET NAME=exa01adm03vm01,IP=xx.xx.xx.xx
    SET PRIVNET NAME1=exa01db03vm01-priv1,IP1=xx.xx.xx.xx
    SET PRIVNET NAME2=exa01db03vm01-priv2,IP2=xx.xx.xx.xx
    SET CLIENTNET NAME=exa01client03vm01,IP=xx.xx.xx.xx
    SET VIPNET NAME=exa01client03vm01-vip,IP=xx.xx.xx.xx
    
    
    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS

    If you prefer that OEDACLI run all steps automatically, omit the WHERE STEPNAME=CREATE_GUEST clause from the command above and skip step 3 below.

    At this point we have a guest created on our new compute node.

  3. Use OEDACLI to extend the cluster to the new guest.

    Note:

    Continue using the same XML file (exa01_plus_adm03_node.xml in this example). You will continue to update this file as you proceed through these steps. At the end of the procedure, this XML file accurately reflects the new state of the clusters.

    OEDACLI> CLONE GUEST TGTNAME=exa01adm03vm01 WHERE STEPNAME = CREATE_USERS

    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS
    
    OEDACLI> CLONE GUEST TGTNAME=exa01adm03vm01 WHERE STEPNAME = CELL_CONNECTIVITY

    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS
    
    OEDACLI> CLONE GUEST TGTNAME=exa01adm03vm01 WHERE STEPNAME = ADD_NODE

    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS
    
    OEDACLI> CLONE GUEST TGTNAME=exa01adm03vm01 WHERE STEPNAME = EXTEND_DBHOME

    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS
    
    OEDACLI> CLONE GUEST TGTNAME=exa01adm03vm01 WHERE STEPNAME = ADD_INSTANCE

    OEDACLI> SAVE ACTION
    OEDACLI> MERGE ACTIONS
    OEDACLI> DEPLOY ACTIONS

    OEDACLI prints out messages similar to the following as each step completes:

    Deploying Action ID : 39 CLONE GUEST TGTNAME=exa01adm03vm01 where STEPNAME = ADD_INSTANCE 
    Deploying CLONE GUEST 
    Cloning Guest 
    Cloning Guest  :  exa01adm03vm01.us.oracle.com_id 
    Adding new instance for database [dbm] on exa01adm03vm01.us.oracle.com 
    Setting up Huge Pages for Database..[dbm] 
    Adding instance dbm3 on host exa01adm03vm01.us.oracle.com 
    Successfully completed adding database instance on the new node [elapsed Time [Elapsed = 
    249561 mS [4.0  minutes] Fri Jun 28 13:35:52 PDT 2019]] 
    Done...
    Done
  4. Save the current state of the configuration and generate configuration information.
    OEDACLI> SAVE FILES LOCATION=/tmp/exa01_plus_adm03_config

    The above command writes all the configuration files to the directory /tmp/exa01_plus_adm03_config. Save a copy of these files in a safe place since they now reflect the changes made to your cluster.

  5. Gather an Oracle EXAchk report and examine it to ensure the cluster is in good health.

5.17 Creating a User Domain Without Oracle Grid Infrastructure and Oracle Database

A user domain can be created without Oracle Grid Infrastructure and Oracle Database installed on the system. The new user domain has the following characteristics:

  • Operating system image is Oracle Linux

  • Access to the management, client, and InfiniBand networks

  • No Oracle Grid Infrastructure and Oracle Database is installed

The following procedure creates a user domain without Oracle Grid Infrastructure and Oracle Database installed:

  1. Allocate new, unused IP addresses and host names for the new user domain. IP addresses and host names are needed for the management network, the client (SCAN) network, and the private InfiniBand network.

    Note:

    Ensure the intended InfiniBand network IP addresses are unused by using the ping command for each address. The ibhosts command cannot be used to determine all InfiniBand network IP addresses in use because it does not contain entries for user domains.

  2. If necessary, obtain an updated user domain (domU) system image file.

    The exadata.img.domu_maker command, which you run later in this procedure to create a user domain, requires the user domain (domU) system image file System.first.boot.version.img in /EXAVMIMAGES, where version matches the management domain Exadata software version reported by the imageinfo -ver command run in the management domain.

    For example, when exadata.img.domu_maker is run to create a new user domain and the management domain Exadata software version is 12.1.2.1.1.150316.2, the user domain (domU) system image file /EXAVMIMAGES/System.first.boot.12.1.2.1.1.150316.2.img must exist.

    # imageinfo -ver
    12.1.2.1.1.150316.2
    
    # ls -l /EXAVMIMAGES/System.first.boot.12.1.2.1.1.150316.2.img
    -rw-r--r-- 1 root root 13958643712 Mar 23 12:25 /EXAVMIMAGES/System.first.boot.12.1.2.1.1.150316.2.img
    

    If the user domain (domU) system image file does not exist, then it must be obtained from My Oracle Support and placed in /EXAVMIMAGES in the management domain. See My Oracle Support note 888828.1 for additional information.

  3. In the management domain, copy an existing XML configuration file from a deployed user domain to a new file name using the following command:

    # cp /EXAVMIMAGES/conf/existingDomainName-vm.xml /EXAVMIMAGES/conf/newDomainName-vm.xml
    

    In the preceding command, existingDomainName-vm.xml is the XML configuration file of the deployed user domain, and newDomainName-vm.xml is the name of the new file.

    In the following example, the configuration file for user domain "dm01db01vm01" is copied to nondbdomain-vm.xml.

    # cp /EXAVMIMAGES/conf/dm01db01vm01-vm.xml /EXAVMIMAGES/conf/nondbdomain-vm.xml
    
  4. In the management domain, edit the new XML file as follows:

    1. Change all <Hostname> tags to match the new host names for the respective networks.

    2. Change all <IP_address> tags to match the new IP addresses for the respective networks.

    3. Change the <virtualMachine> tag to contain the new host name.

    4. Change the <hostName> tag to contain the new host name.

    5. Delete the entire <disk id="disk_2"> and <disk id="disk_3"> elements, including all their sub-elements. That is, delete everything from each opening <disk> tag through its corresponding closing </disk> tag.
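The host-name substitutions in this step can be partly automated with `sed`. The sketch below is an illustration only: it builds a throwaway two-line sample file rather than touching a real configuration, and you should still review the edited XML by hand (and remove the disk elements in a text editor), since real files contain more tags than shown here.

```shell
# Illustrative sed pass over a sample of the copied configuration file.
old=dm01db01vm01
new=nondbdomain
cfg=/tmp/newDomainName-vm-sample.xml   # stand-in for /EXAVMIMAGES/conf/newDomainName-vm.xml

# Build a tiny sample with the tags named in this step.
printf '<virtualMachine>%s</virtualMachine>\n<hostName>%s</hostName>\n' "$old" "$old" > "$cfg"

# Replace the old host name inside the <virtualMachine> and <hostName> tags.
sed -i "s|<virtualMachine>${old}<|<virtualMachine>${new}<|; s|<hostName>${old}<|<hostName>${new}<|" "$cfg"
cat "$cfg"
```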

  5. In the management domain, allocate InfiniBand network GUIDs for the new user domain using the /opt/exadata_ovm/exadata.img.domu_maker command.

    # /opt/exadata_ovm/exadata.img.domu_maker allocate-guids \
         /EXAVMIMAGES/conf/newDomainName-vm.xml              \
         /EXAVMIMAGES/conf/final-newDomainName-vm.xml
    
  6. In the management domain, create the new user domain using the /opt/exadata_ovm/exadata.img.domu_maker command.

    # /opt/exadata_ovm/exadata.img.domu_maker start-domain \
         /EXAVMIMAGES/conf/final-newDomainName-vm.xml
    

5.18 Moving a User Domain to a Different Database Server

User domains can be moved to different database servers.

The target Oracle Exadata Database Server must meet the following requirements:

  • The target database server must have the same Oracle Exadata System Software release installed with Oracle VM.

  • The target database server must have the same network visibility.

  • The target database server must have access to the same Oracle Exadata Storage Servers.

  • The target database server must have sufficient free resources (CPU, memory, and local disk storage) to operate the user domain.

    • It is possible to over-commit virtual CPUs such that the total number of virtual CPUs assigned to all domains exceeds the number of physical CPUs on the system. Over-committing CPUs can be done only when the competing workloads for over-subscribed resources are well understood and the concurrent demand does not exceed physical capacity.

    • It is not possible to over-commit memory.

    • Copying disk images to the target database server may increase space allocation of the disk image files because the copied files are no longer able to benefit from the disk space savings gained by using OCFS2 reflinks.

  • The user domain name must not be already in use on the target database server.

The following procedure moves a user domain to a new database server in the same Oracle Exadata System Software configuration. All steps in this procedure are performed in the management domain.

  1. Shut down the user domain.
    # xm shutdown DomainName -w
    
  2. Copy the user domain disk image and configuration files to the target database server.

    In the following examples, replace DomainName with the name of the domain.

    # scp -r /EXAVMIMAGES/GuestImages/DomainName/ target:/EXAVMIMAGES/GuestImages
    
  3. Obtain the UUID of the user domain.
    # grep ^uuid /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
    

    An example of the user domain UUID is 49ffddce4efe43f5910d0c61c87bba58.
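The UUID can be captured into a shell variable for use in the tar command in the next step. This sketch parses a sample line shaped like the grep output; the exact quoting in vm.cfg may vary, so treat the sed pattern as an assumption and verify the extracted value.

```shell
# Extract the UUID value from a vm.cfg-style line (format assumed).
# Live use: grep ^uuid /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
sample_line="uuid = '49ffddce4efe43f5910d0c61c87bba58'"
uuid=$(printf '%s\n' "$sample_line" | sed "s/^uuid *= *'\(.*\)'.*/\1/")
echo "tar cpvf - /OVS/Repositories/${uuid}/ | ssh target_db_server 'tar xpvf - -C /'"
```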

  4. Using the UUID of the user domain, copy the user domain symbolic links from /OVS/Repositories to the target database server.
    # tar cpvf - /OVS/Repositories/UUID/ | ssh target_db_server "tar xpvf - -C /"
    
  5. Start the user domain on the target database server.
    # xm create /EXAVMIMAGES/GuestImages/DomainName/xm.cfg
    

5.19 Backing up the Management Domain and User Domains in an Oracle VM Deployment

In an Oracle VM deployment, you need to back up the management domain (dom0) and the user domains (domU).

5.19.1 Backing up the Management Domain dom0 Using Snapshot-Based Backup

This procedure describes how to take a snapshot-based backup of the management domain, dom0.

The logical volume /dev/VGExaDb/LVDoNotRemoveOrUse is a placeholder that ensures free space is always available to create a snapshot. If you run dbserver_backup.sh, then the script removes the placeholder logical volume, uses the freed space for the snapshot, and re-creates the logical volume after the snapshot is created. If you follow the manual procedure described here, then you must perform all of these tasks manually.

The values shown in the steps below are examples. All steps must be performed as the root user.

  1. Prepare a destination to hold the backup.

    The destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup tar file(s). For non-customized partitions, the space needed for holding the backup is around 60 GB.

    The following commands may be used to prepare the backup destination.

    # mkdir -p /remote_FS
    
    # mount -t nfs -o rw,intr,soft,proto=tcp,nolock ip_address:/nfs_location/ /remote_FS
    

    ip_address is the IP address of the NFS server, and nfs_location is the NFS location holding the backups.

  2. Take a snapshot-based backup of the file system hosting the / (root) directory.
    1. Check for the existence of the LVDoNotRemoveOrUse logical volume.

      If this volume is present, then remove the volume to make space for the snapshot. Execute the script below to check for the existence of the LVDoNotRemoveOrUse logical volume and remove it if present.

      lvm lvdisplay --ignorelockingfailure /dev/VGExaDb/LVDoNotRemoveOrUse
      if [ $? -eq 0 ]; then
        # LVDoNotRemoveOrUse logical volume exists.
        lvm lvremove -f /dev/VGExaDb/LVDoNotRemoveOrUse
        if [ $? -ne 0 ]; then
             echo "Unable to remove logical volume: LVDoNotRemoveOrUse. Unable to proceed with backup"
        fi
      fi

      If the LVDoNotRemoveOrUse logical volume does not exist, then investigate the reason and do not proceed with the steps below.

    2. Create a snapshot named LVDbSys3_snap for the file system hosting the / (root) directory.

      This example assumes LVDbSys3 is the active partition.

      # lvcreate -L1G -s -n LVDbSys3_snap /dev/VGExaDb/LVDbSys3
      
    3. Label the snapshot.
      # e2label /dev/VGExaDb/LVDbSys3_snap DBSYSOVS_SNAP
      
    4. Mount the snapshot.
      # mkdir /root/mnt
      
      # mount /dev/VGExaDb/LVDbSys3_snap /root/mnt -t ext4
      
    5. Change to the directory for the backup.
      # cd /root/mnt
      
    6. Create the backup file.
      # tar -pjcvf /remote_FS/mybackup.tar.bz2 * /boot > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
      
    7. Check the /tmp/backup_tar.stderr file for any significant errors.

      Errors about tar failing to archive open sockets, and other similar errors, can be ignored.

  3. Unmount the snapshot and remove the snapshot for the root directory.
    # cd /
    # umount /root/mnt
    # /bin/rmdir /root/mnt
    # lvremove /dev/VGExaDb/LVDbSys3_snap
  4. Unmount the NFS share.
    # umount /remote_FS
  5. Recreate the /dev/VGExaDb/LVDoNotRemoveOrUse logical volume.
    # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb

5.19.2 Backing up the User Domains

You can create a backup of all the user domains on a host, or of individual user domains.

There are three ways to back up the user domains:

  • Method 1: Back up all user domains in the storage repository using Oracle Cluster File System (OCFS) reflinks to get a consistent backup

    This method backs up the storage repository, which is the /EXAVMIMAGES OCFS2 file system. It provides a more robust and comprehensive backup than method 2 or 3. Method 3 provides a quicker and easier backup, especially in role-separated environments.

    Method 1 is best-suited for when a management domain (dom0) administrator is responsible for user domain backups.

  • Method 2: Back up individual user domains in the storage repository using Oracle Cluster File System (OCFS) reflinks to get a consistent backup.

    You select which user domains you want to back up from the /EXAVMIMAGES OCFS2 file system. The user domains are located in the /EXAVMIMAGES/GuestImages/user directories.

    Method 2 is best-suited for when a management domain (dom0) administrator is responsible for user domain backups.

  • Method 3: Back up a user domain using snapshot-based backup

    This method backs up a single user domain using snapshot-based backup from inside the user domain.

    Method 3 is ideal where a user domain administrator is responsible for the user domain backups.

5.19.2.1 Method 1: Back up All the User Domains

You can back up all the user domains by backing up the storage repository that is the /EXAVMIMAGES OCFS2 file system.

The backup destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup. The space needed for the backup is proportional to the number of Oracle VMs deployed on the system, up to a maximum space of about 1.6 TB.

This procedure assumes there are 15 or fewer user domains per management domain.

  1. Use the following script to prepare the backup destination and prepare the user domains for backup.
    ScriptStarttime=$(date +%s)
    printf "This script is going to remove the directory /EXAVMIMAGES/Backup.
    If that is not acceptable, exit the script by typing n, manually 
    remove /EXAVMIMAGES/Backup and come back to rerun the script. Otherwise, 
    press y to continue  :"
    read proceed 
    
    if [[ ${proceed} == "n" ]] || [[  ${proceed} == "N" ]]
    then
      exit 0
    fi 
    
    rm -rf /EXAVMIMAGES/Backup 
    
    ##  Create the Backup Directory 
    
    mkdirStartTime=$(date +%s)
    find /EXAVMIMAGES -type d|grep -v 'lost+found'|
    awk '{print "mkdir -p /EXAVMIMAGES/Backup"$1}'|sh
    mkdirEndTime=$(date +%s)
    mkdirTime=$(expr ${mkdirEndTime} - ${mkdirStartTime})
    echo "Backup Directory creation time :" ${mkdirTime}" seconds" 
    
    ##  Create reflinks for files not in /EXAVMIMAGES/GuestImages
    reflinkOthersStartTime=$(date +%s)
    find /EXAVMIMAGES/ -not -path "/EXAVMIMAGES/GuestImages/*" \
      -not -path "/EXAVMIMAGES/Backup/*" -type f|
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}'|sh
    reflinkOthersEndTime=$(date +%s)
    reflinkOthersTime=$(expr ${reflinkOthersEndTime} - ${reflinkOthersStartTime})
    echo "Reflink creation time for files other than in /EXAVMIMAGES/GuestImages :" ${reflinkOthersTime}" seconds" 
    
    ##  Pause the user domains
    for hostName in $(xm list|egrep -v '^Domain-0|^Name'|awk '{print $1}')
    do
    PauseStartTime=$(date +%s)
    xm pause ${hostName}
    PauseEndTime=$(date +%s)
    PauseTime=$(expr ${PauseEndTime} - ${PauseStartTime})
    echo "PauseTime for guest - ${hostName} :" ${PauseTime}" seconds" 
    
    ## Create reflinks for all the files in /EXAVMIMAGES/GuestImages
    relinkStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${hostName} -type f|
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}'|sh
    relinkEndTime=$(date +%s)
    reflinkTime=$(expr ${relinkEndTime} - ${relinkStartTime})
    echo "Reflink creation time for guest - ${hostName} :" ${reflinkTime}" seconds" 
    
    ## Unpause the user domains
    unPauseStartTime=$(date +%s)
    xm unpause ${hostName}
    unPauseEndTime=$(date +%s)
    unPauseTime=$(expr ${unPauseEndTime} - ${unPauseStartTime})
    echo "unPauseTime for guest - ${hostName} :" ${unPauseTime}" seconds"
    done 
    
    ScriptEndtime=$(date +%s) 
    ScriptRunTime=$(expr ${ScriptEndtime} - ${ScriptStarttime}) 
    echo ScriptRunTime ${ScriptRunTime}" seconds"
  2. Create a backup of the snapshot.

    Back up the reflink files in the /EXAVMIMAGES/Backup directory, which was created by the script in Step 1, to a remote location. For example:

    1. Create a tarball comprising all files under /EXAVMIMAGES/Backup.
    2. Copy the tarball to a remote location.

    This allows for restore operations if the management domain (Dom0) is permanently lost or damaged.
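The two sub-steps above can be sketched as a small helper. This is only an illustration: the function name and the archive path are hypothetical, not part of the standard procedure.

```shell
# Hypothetical helper: archive the reflink snapshot directory to a
# remote mount such as a writable NFS share. Run in the management domain.
backup_snapshot_dir() {
    src_root=$1    # directory containing Backup/, e.g. /EXAVMIMAGES
    archive=$2     # destination tarball, e.g. /remote_FS/exavmimages-backup.tar.bz2
    tar -pjcf "$archive" -C "$src_root" Backup
}

# Example (paths are illustrative):
# backup_snapshot_dir /EXAVMIMAGES /remote_FS/exavmimages-backup.tar.bz2
```

The -C option archives Backup by its relative path, so the restore side can extract it under any destination directory.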

  3. Remove the reflinks created by the script.
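Because all the reflink copies created by the script live under /EXAVMIMAGES/Backup, removing that directory removes them. For example:

```shell
# Remove the reflink snapshot copies once the archive is safely offloaded.
rm -rf /EXAVMIMAGES/Backup
```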
5.19.2.2 Method 2: Back up Individual User Domains

You can back up an individual user domain by backing up its specific folder in the /EXAVMIMAGES file system.

The backup destination should reside outside of the local machine, such as a writable NFS location, and be large enough to hold the backup. The space needed for the backup is proportional to the number of Oracle VMs deployed on the system, up to a maximum space of about 1.6 TB.

  1. Use the following script to prepare the backup destination and prepare the user domain for backup.
    ScriptStarttime=$(date +%s)
    printf "This script is going to remove the directory /EXAVMIMAGES/Backup.
    If that is not acceptable, exit the script by typing n, manually 
    remove /EXAVMIMAGES/Backup and come back to rerun the script. Otherwise, 
    press y to continue  :"
    read proceed 
    
    if [[ ${proceed} == "n" ]] || [[  ${proceed} == "N" ]]
    then
      exit 0
    fi 
    
    rm -rf /EXAVMIMAGES/Backup 
    
    printf "Enter the name of the user domain to be backed up :"
    read userDomainName
    
    ##  Create the Backup Directory 
    
    mkdirStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${userDomainName} -type d|grep -v 'lost+found'|
    awk '{print "mkdir -p /EXAVMIMAGES/Backup"$1}'|sh
    mkdirEndTime=$(date +%s)
    mkdirTime=$(expr ${mkdirEndTime} - ${mkdirStartTime})
    echo "Backup Directory creation time :" ${mkdirTime}" seconds" 
    
    ##  Pause the user domain
    PauseStartTime=$(date +%s)
    xm pause ${userDomainName}
    PauseEndTime=$(date +%s)
    PauseTime=$(expr ${PauseEndTime} - ${PauseStartTime})
    echo "PauseTime for guest - ${userDomainName} :" ${PauseTime}" seconds" 
    
    ## Create reflinks for all the files in /EXAVMIMAGES/GuestImages/${userDomainName}
    relinkStartTime=$(date +%s)
    find /EXAVMIMAGES/GuestImages/${userDomainName} -type f|
    awk '{print "reflink",$0,"/EXAVMIMAGES/Backup"$0}'|sh
    relinkEndTime=$(date +%s)
    reflinkTime=$(expr ${relinkEndTime} - ${relinkStartTime})
    echo "Reflink creation time for guest - ${userDomainName} :" ${reflinkTime}" seconds" 
    
    ## Unpause the user domain
    unPauseStartTime=$(date +%s)
    xm unpause ${userDomainName}
    unPauseEndTime=$(date +%s)
    unPauseTime=$(expr ${unPauseEndTime} - ${unPauseStartTime})
    echo "unPauseTime for guest - ${userDomainName} :" ${unPauseTime}" seconds"
    
    ScriptEndtime=$(date +%s) 
    ScriptRunTime=$(expr ${ScriptEndtime} - ${ScriptStarttime}) 
    echo ScriptRunTime ${ScriptRunTime}" seconds"
  2. Create a backup of the snapshot.

    Back up the reflink files in the /EXAVMIMAGES/Backup directory, which was created by the script in Step 1, to a remote location. For example:

    1. Create a tarball comprising all files under /EXAVMIMAGES/Backup.
    2. Copy the tarball to a remote location.

    This allows for restore operations if the management domain (Dom0) is permanently lost or damaged.

  3. Remove the reflinks created by the script.
5.19.2.3 Method 3: Back up a User Domain from Inside the User Domain

You can take a snapshot-based backup of a user domain from inside the user domain, which can then be used to restore the user domain to a workable state.

All steps are performed from inside the user domain.

Note:

Backing up a user domain from inside the user domain using LVM snapshots has limited use for recovery. Such a backup can be used only when the user domain is still bootable and allows login as the root user. In that case, the damage is limited to lost or corrupted files that can be restored from the tar backup after the user domain boots and the / (root) and boot partitions are mounted. If the damage is such that the user domain does not boot, then you need a backup taken using method 1 or method 2 above, and you must perform the recovery at the user domain level using the recovery procedure described below.

This procedure backs up the following:

  • LVDbSys1
  • LVDbOra1
  • /boot partition
  • Grid Infrastructure home
  • RDBMS home

All steps must be performed as the root user.

  1. Prepare a destination to hold the backup.

    In the following example, ip_address is the IP address of the NFS server, and nfs_location is the NFS location holding the backups.

    # mkdir -p /remote_FS
    
    # mount -t nfs -o rw,intr,soft,proto=tcp,nolock ip_address:/nfs_location/ /remote_FS
  2. Take a snapshot-based backup of the file systems containing / (root) and the /u01 directories, as follows:
    1. Create a snapshot named LVDbSys1_snap for the file system containing the root directory.
      The volume group must have at least 1 GB of free space for the command to succeed.
      # lvcreate -L1G -s -n LVDbSys1_snap /dev/VGExaDb/LVDbSys1
    2. Label the snapshot.
      # e2label /dev/VGExaDb/LVDbSys1_snap DBSYS_SNAP
    3. Mount the snapshot.
      # mkdir /root/mnt
      
      # mount /dev/VGExaDb/LVDbSys1_snap /root/mnt -t ext4
    4. Create a snapshot named u01_snap for the /u01 directory.
      # lvcreate -L256M -s -n u01_snap /dev/VGExaDb/LVDbOra1
    5. Label the snapshot.
      # e2label /dev/VGExaDb/u01_snap DBORA_SNAP
    6. Mount the snapshot.
      # mkdir -p /root/mnt/u01
      
      # mount /dev/VGExaDb/u01_snap /root/mnt/u01 -t ext4
    7. Change to the directory for the backup.
      # cd /root/mnt
    8. Create the backup file to back up the two snapshots taken above, the /boot partition, the Oracle Database home directory, and the Oracle Grid Infrastructure home directory.

      In the following example: Grid_home is the location of the Oracle Grid Infrastructure home, for example, /u01/app/18.1.0/grid; DB_home is the location of the Oracle Database home, for example, /u01/app/oracle/product/18.1.0/dbhome_1.

      # tar -pjcvf /remote_FS/mybackup.tar.bz2 * /boot Grid_home \
      DB_home > /tmp/backup_tar.stdout 2> /tmp/backup_tar.stderr
      
    9. Check the /tmp/backup_tar.stderr file for any significant errors.

      Errors about tar failing to archive open sockets, and other similar errors, can be ignored.
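As a sketch, the known-ignorable noise can be filtered out so that only potentially significant errors remain. The helper name and the exact filtered patterns below are illustrative assumptions (GNU tar reports skipped sockets with a "socket ignored" message):

```shell
# Hypothetical helper: print stderr lines that are not known-ignorable noise.
check_backup_errors() {
    # GNU tar reports skipped sockets as "...: socket ignored";
    # "file changed as we read it" is also common on live file systems.
    grep -v -e 'socket ignored' -e 'file changed as we read it' "$1" || true
}

# check_backup_errors /tmp/backup_tar.stderr
```

Any line the helper prints deserves a closer look before trusting the backup.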

  3. Unmount and remove the snapshots for the file system containing the root directories.
    # cd /
    # umount /root/mnt/u01
    # umount /root/mnt
    # /bin/rmdir /root/mnt
    # lvremove /dev/VGExaDb/u01_snap
    # lvremove /dev/VGExaDb/LVDbSys1_snap
  4. Unmount the NFS share.
    # umount /remote_FS
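Before relying on the backup, it can be worth confirming that the archive written to the NFS share is readable. A minimal sketch, assuming the archive path used in the earlier steps:

```shell
# Hypothetical check: a successful table-of-contents listing confirms the
# bzip2-compressed archive is intact and readable.
verify_backup_archive() {
    tar -tjf "$1" > /dev/null
}

# Run before unmounting the NFS share:
# verify_backup_archive /remote_FS/mybackup.tar.bz2 && echo "archive OK"
```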

5.20 Recovering an Oracle VM Deployment

You can recover an Oracle VM from a snapshot-based backup when severe disaster conditions damage the Oracle VM, or when the server hardware is replaced to such an extent that it amounts to new hardware.

For example, replacing all hard disks leaves no trace of original software on the system. This is similar to replacing the complete system as far as the software is concerned. In addition, it provides a method for disaster recovery of the database servers using an LVM snapshot-based backup taken when the database server was healthy before the disaster condition.

The recovery procedures described in this section do not include backup or recovery of storage servers or the data in an Oracle Database. Oracle recommends testing the backup and recovery procedures on a regular basis.

5.20.1 Overview of Snapshot-Based Recovery of Database Servers

The recovery of the Oracle VM consists of a series of tasks.

The recovery procedures use the diagnostics.iso image as a virtual CD-ROM to restart the Oracle VM in rescue mode using the Integrated Lights Out Manager (ILOM). At a high level, the steps are as follows:

  1. Re-create the following:

    • Boot partitions

    • Physical volumes

    • Volume groups

    • Logical volumes

    • File systems

    • Swap partition

  2. Activate the swap partition.

  3. Ensure the /boot partition is the active boot partition.

  4. Restore the data.

  5. Reconfigure GRUB.

  6. Restart the server.

5.20.2 Scenario 1: Recovering a Management Domain and Its User Domains from Backup

You can recover the management domain and all its user domains from a backup.

The following procedures step you through the recovery process. Choose one of the following procedures, based on the version of Oracle Exadata System Software that is installed on your system.

WARNING:

All existing data on the disks is lost during these procedures.
5.20.2.1 Recovering a Management Domain and Its User Domains (Releases Prior to 12.2.1.1.0)

You can recover a management domain from a snapshot-based backup when severe disaster conditions damage the dom0, or when the server hardware is replaced to such an extent that it amounts to new hardware.

To use this recovery method, it is assumed that you have previously completed the steps in Backing up the Management Domain dom0 Using Snapshot-Based Backup.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip that exports the directory /export, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the management domain to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control, and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during startup

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the management domain to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact Oracle Support Services.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda2 --force
  7. Remove the existing partitions and clean up the drive.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) rm 1 
    sda: sda2 sda3
    (parted) rm 2 
    sda: sda3
    (parted) rm 3 
    sda:
    (parted) q
    
    # dd if=/dev/zero of=/dev/sda bs=64M count=2
  8. Create the three partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving dom0 and store it in a variable:
      # end_sector=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the partition that will hold the LVMs, /dev/sda2.
      # parted -s /dev/sda mkpart primary 1048640s 240132159s set 2 lvm on
    4. Create the OCFS2 storage repository partition, /dev/sda3.
      # parted -s /dev/sda mkpart primary 240132160s ${end_sector}s set 3
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda2
      # lvm vgcreate VGExaDb /dev/sda2
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs.ext4 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs.ext4 /dev/VGExaDb/LVDbSys2
      
    5. Create the logical volume for the reserved partition.
      # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb

      Note:

      Do not create any file system on this logical volume.
    6. Create a file system on the /dev/sda1 partition, and label it.

      In the mkfs.ext3 command below, the -I 128 option is needed to set the inode size to 128.

      # mkfs.ext3 -I 128 /dev/sda1
      # tune2fs -c 0 -i 0 /dev/sda1
      # e2label /dev/sda1 BOOT
      
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top level directory, the mounted list of partitions may look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt

    • /dev/sda1 on /mnt/boot

    The following example mounts the root file system, and creates two mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext3
    
  11. Bring up the network on eth0 and assign the host's IP address and netmask to it.
    # ifconfig eth0 ip_address_for_eth0 netmask netmask_for_eth0 up
    # route add -net 0.0.0.0 netmask 0.0.0.0 gw gateway_ip_address
    
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Unmount the restored /dev/sda1 partition, and remount it on /boot.
    # umount /mnt/boot
    # mkdir /boot
    # mount /dev/sda1 /boot -t ext3
    
  15. Set up the grub boot loader using the command below:
    # grub --device-map=/boot/grub/device.map << DOM0_GRUB_INSTALL
    root (hd0,0)
    setup (hd0)
    quit
    DOM0_GRUB_INSTALL
    
  16. Unmount the /boot partition.
    # umount /boot
  17. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  18. Check the restored /etc/fstab file and remove any reference to /EXAVMIMAGES and /dev/sda3.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES or /dev/sda3.
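One way to comment out those lines is sketched below. The helper name is a hypothetical illustration; review the file afterward to confirm only the intended entries were changed.

```shell
# Hypothetical helper: prefix '#' to fstab lines that mention either
# the /EXAVMIMAGES mount point or the /dev/sda3 partition.
comment_fstab_entries() {
    fstab=$1
    sed -i -E '/(\/EXAVMIMAGES|\/dev\/sda3)/ s/^/#/' "$fstab"
}

# comment_fstab_entries /mnt/etc/fstab
```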

  19. Restart the system.
    # shutdown -r now

    This completes the restoration procedure for the management domain (dom0).

  20. Convert to eighth rack, if required.

    If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

  21. When the server comes back up, build an OCFS2 file system on the /dev/sda3 partition.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/sda3 --force
  22. Mount the OCFS2 partition /dev/sda3 on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/sda3 /EXAVMIMAGES
  23. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/sda3 that were commented out in step 18.
  24. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system which holds all the user domain images.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  25. Restore the /EXAVMIMAGES file system.

    To restore all user domains, use this command:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  26. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg

At this point all the user domains should come up along with Oracle Grid Infrastructure and the Oracle Database instances. The database instances should join the Oracle Real Application Clusters (Oracle RAC) cluster formed by the other surviving management domain nodes.
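Step 26 is repeated for every guest. As a sketch, the vm.cfg files can be enumerated and started in a loop; the helper name below is a hypothetical illustration:

```shell
# Hypothetical helper: list every guest vm.cfg under the storage repository.
list_domain_configs() {
    base=$1    # e.g. /EXAVMIMAGES
    for cfg in "$base"/GuestImages/*/vm.cfg; do
        [ -f "$cfg" ] && echo "$cfg"
    done
}

# Start each user domain from the management domain (run as root):
# list_domain_configs /EXAVMIMAGES | while read -r cfg; do xm create "$cfg"; done
```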

5.20.2.2 Recovering a Management Domain and Its User Domains (Releases 12.2.1.1.0 and Later)

You can recover a management domain from a snapshot-based backup when severe disaster conditions damage the management domain, or when the server hardware is replaced to such an extent that it amounts to new hardware.

To use this recovery method, it is assumed that you have previously completed the steps in Backing up the Management Domain dom0 Using Snapshot-Based Backup.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip that exports the directory /export, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the management domain to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control, and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during start up.

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the management domain to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact Oracle Support Services.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda2 --force
  7. Remove the existing partitions and clean up the drive.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) rm 1 
    [12064.253824] sda: sda2
    (parted) rm 2 
    [12070.579094] sda: 
    (parted) q
    
    # dd if=/dev/zero of=/dev/sda bs=64M count=2
  8. Create the two partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving dom0 and store it in a variable:
      # end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
      # end_sector=$( expr $end_sector_logical - 34 )
      

      The values for the start and end sectors in the commands below were taken from a surviving management domain. Because these values can change over time, check them on a surviving management domain using the following command:

      # parted -s /dev/sda unit S print
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the partition that will hold the LVMs, /dev/sda2.
      # parted -s /dev/sda mkpart primary 1048640s 3509759966s set 2 lvm on
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda2
      # lvm vgcreate VGExaDb /dev/sda2
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs -t ext4 -b 4096 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs -t ext4 -b 4096 /dev/VGExaDb/LVDbSys2
    5. Create the logical volume for the reserved partition.
      # lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb

      Note:

      Do not create any file system on this logical volume.
    6. Create the logical volume for the guest storage repository.
      # lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
      
    7. Create a file system on the /dev/sda1 partition, and label it.

      In the mkfs.ext3 command below, the -I 128 option is needed to set the inode size to 128.

      # mkfs.ext3 -I 128 /dev/sda1
      # tune2fs -c 0 -i 0 /dev/sda1
      # e2label /dev/sda1 BOOT
      
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt

    • /dev/sda1 on /mnt/boot

    The following example mounts the root file system, and creates two mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext3
    
  11. Bring up the network on eth0 and assign the host's IP address and netmask to it.
    # ifconfig eth0 ip_address_for_eth0 netmask netmask_for_eth0 up
    # route add -net 0.0.0.0 netmask 0.0.0.0 gw gateway_ip_address
    
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Unmount the restored /dev/sda1 partition, and remount it on /boot.
    # umount /mnt/boot
    # mkdir -p /boot
    # mount /dev/sda1 /boot -t ext3
    
  15. Set up the grub boot loader using the command below:
    # grub --device-map=/boot/grub/device.map << DOM0_GRUB_INSTALL
    root (hd0,0)
    setup (hd0)
    quit
    DOM0_GRUB_INSTALL
    
  16. Unmount the /boot partition.
    # umount /boot
  17. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  18. Check the restored /etc/fstab file and remove any reference to /EXAVMIMAGES.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES.

  19. Restart the system.
    # shutdown -r now

    This completes the restoration procedure for the management domain (dom0).

  20. Convert to Eighth Rack, if required.

    If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

  21. When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume, which was created in step 9.f.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
  22. Mount the OCFS2 partition on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
  23. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages that were commented out in step 18.
  24. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  25. Restore the /EXAVMIMAGES file system.

    To restore all user domains, use this command:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  26. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg

At this point all the user domains should come up along with Oracle Grid Infrastructure and the Oracle Database instances. The database instances should join the Oracle Real Application Clusters (Oracle RAC) cluster formed by the other surviving management domain nodes.

5.20.2.3 Recovering a Management Domain and Its User Domains (Release 18.1 and X7 and Later)

You can recover a management domain from a snapshot-based backup when severe disaster conditions damage the management domain, or when the server hardware is replaced to such an extent that it amounts to new hardware.

  1. Prepare an NFS server to host the backup archive mybackup.tar.bz2.

    The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip that exports the directory /export, put the mybackup.tar.bz2 file in the /export directory.

  2. Attach the /opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the management domain to be restored.
    The following example shows how to set up a virtual CD-ROM using the ILOM interface:
    1. Copy the diagnostics.iso file to a directory on the machine that will be using the ILOM interface.
    2. Log in to the ILOM web interface.
    3. In the Oracle ILOM web interface, click Remote Control, and then click Redirection.
    4. Select Use Video Redirection.
    5. After the console launches, click Storage in the KVMS menu.
    6. To add a storage image, such as a DVD image, to the Storage Devices dialog box, click Add.
    7. Open the diagnostics.iso file.
    8. To redirect storage media from the Storage Device dialog box, select the storage media and click Connect.

      After a connection to the device has been established, the label on the Connect button in the Storage Device dialog box changes to Disconnect.

    9. Select Host Control from the Host Management tab.
    10. Select CDROM as the next boot device from the list of values.
    11. Click Save.

      When the system is booted, the diagnostics.iso image is used.

  3. Restart the system from the ISO image file.

    You can restart the system using one of the following methods:

    • Choose the CD-ROM as the boot device during start up.

    • Preset the boot device by running the ipmitool command from any other machine that can reach the ILOM of the management domain to be restored:

      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis bootdev cdrom
      # ipmitool -H ILOM_ip_address_or_hostname -U root chassis power cycle
  4. Log in to the diagnostics shell as the root user.

    When the system displays the following:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or power cycle to exit the shell),
    (r)estore system from NFS backup archive,

    Type e to enter the diagnostics shell, and log in as the root user.

    Note:

    If you do not have the password for the root user, then contact Oracle Support Services.
  5. If required, use /opt/MegaRAID/MegaCli/MegaCli64 to configure the disk controller to set up the disks.
  6. Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
    # lvm vgremove VGExaDb --force
    # lvm pvremove /dev/sda3 --force
  7. Remove the existing partitions, then verify all partitions were removed.
    # parted
    GNU Parted 2.1
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) print 
    Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name     Flags
     1      32.8kB  537MB   537MB   ext4         primary  boot
     2      537MB   805MB   268MB   fat32        primary  boot
     3      805MB   4193GB  4192GB               primary  lvm
    
    (parted) rm 1
    [ 1730.498593]  sda: sda2 sda3 
    (parted) rm 2 
    [ 1736.203794]  sda: sda3
    
    (parted) rm 3 
    [ 1738.546845]  sda:
    (parted) print
     Model: AVAGO MR9361-16i (scsi)
    Disk /dev/sda: 4193GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start  End  Size  File system  Name  Flags
    
    (parted) q 
    Information: You may need to update /etc/fstab.
  8. Create the three partitions on /dev/sda.
    1. Get the end sector for the disk /dev/sda from a surviving management domain (dom0) and store it in a variable:
      # end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
      # end_sector=$( expr $end_sector_logical - 34 )
      # echo $end_sector

      The values for the start and end sectors in the following commands were taken from a surviving management domain. Because these values can change over time, check them against a surviving dom0 before use. For example, on an Oracle Exadata Database Machine X7-2 database server with 8 hard disk drives, you might see the following:

      # parted -s /dev/sda unit s print
      Model: AVAGO MR9361-16i (scsi)
      Disk /dev/sda: 8189440000s
      Sector size (logical/physical): 512B/512B
      Partition Table: gpt
      
      Number  Start     End          Size         File system  Name     Flags
       1      64s       1048639s     1048576s     ext4         primary  boot
       2      1048640s  1572927s     524288s      fat32        primary  boot
       3      1572928s  8189439966s  8187867039s               primary  lvm
      

      Note:

      The s (sector) values in the following sub-steps are based on a system with 8 hard disk drives. If you have 4 hard disk drives, then view the partition table from the management domain on a surviving node and adjust the sector values accordingly.
    2. Create the boot partition, /dev/sda1.
      # parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
    3. Create the EFI system partition, /dev/sda2.
      # parted -s /dev/sda mkpart primary fat32 1048640s 1572927s set 2 boot on
    4. Create the partition that will hold the LVMs, /dev/sda3.
      # parted -s /dev/sda mkpart primary 1572928s 8189439966s set 3 lvm on
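The sector arithmetic in this step can be verified offline. The following sketch embeds sample parted output (on a real node, capture it from a surviving dom0 with `parted -s /dev/sda unit s print`) and computes the usable end sector the same way as sub-step a, using awk instead of perl:

```shell
#!/bin/sh
# Sketch: compute the end sector for /dev/sda3 as in step 8.a.
# The parted output below is sample data from this procedure; on a real
# node capture it from a surviving dom0.
parted_output='Model: AVAGO MR9361-16i (scsi)
Disk /dev/sda: 8189440000s
Sector size (logical/physical): 512B/512B
Partition Table: gpt'

# Same extraction as the perl one-liner, written with awk.
end_sector_logical=$(printf '%s\n' "$parted_output" |
  awk '/^Disk/ {sub(/s$/, "", $3); print $3}')

# GPT reserves 33 sectors at the end of the disk for the backup partition
# table, so the last usable sector is 34 sectors back from the total.
end_sector=$(( end_sector_logical - 34 ))
echo "$end_sector"
```

For the 8-drive layout shown above, this prints 8189439966, matching the end sector of partition 3 in the sample partition table.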
  9. Use the /sbin/lvm command to re-create the logical volumes and mkfs to create the file systems.
    1. Create the physical volume and the volume group.
      # lvm pvcreate /dev/sda3
      # lvm vgcreate VGExaDb /dev/sda3
      
    2. Create the logical volume for the file system that will contain the / (root) directory and label it.
      # lvm lvcreate -n LVDbSys3 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys3
      # e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
      
    3. Create the logical volume for the swap directory, and label it.
      # lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
      # mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
      
    4. Create the logical volume for the backup partition, and build a file system on top of it.
      # lvm lvcreate -n LVDbSys2 -L30G VGExaDb
      # mkfs -t ext4 /dev/VGExaDb/LVDbSys2
    5. Create the logical volume for the guest storage repository.
      # lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
      
    6. Create a file system on the /dev/sda1 partition, and label it.
      # mkfs.ext4 /dev/sda1
      # e2label /dev/sda1 BOOT
      # tune2fs -l /dev/sda1
    7. Create a file system on the /dev/sda2 partition, and label it.
      # mkfs.vfat -v -c -F 32 -s 2 /dev/sda2
      # dosfslabel /dev/sda2 ESP
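As a compact recap of the volume layout created in this step, the following sketch drives the four lvcreate calls through one helper. It defaults to a dry run (DRYRUN=1) that only prints the commands, since the real commands require root and the VGExaDb volume group:

```shell
#!/bin/sh
# Sketch: the dom0 logical volumes from step 9, driven through one helper.
# DRYRUN=1 (the default here) prints each command instead of running it.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run lvm lvcreate -n LVDbSys3 -L30G VGExaDb               # / (root)
run lvm lvcreate -n LVDbSwap1 -L24G VGExaDb              # swap
run lvm lvcreate -n LVDbSys2 -L30G VGExaDb               # backup partition
run lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb  # guest repository
```

The mkfs, mkswap, and labeling commands from the sub-steps above still run against each volume afterward; the sketch only covers volume creation.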
  10. Create mount points for all the partitions, and mount the respective partitions.

    For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:

    • /dev/VGExaDb/LVDbSys3 on /mnt
    • /dev/sda1 on /mnt/boot
    • /dev/sda2 on /mnt/boot/efi

    The following example mounts the root (/) file system, and creates three mount points:

    # mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
    # mkdir /mnt/boot
    # mount /dev/sda1 /mnt/boot -t ext4
    # mkdir /mnt/boot/efi
    # mount /dev/sda2 /mnt/boot/efi -t vfat
    
  11. Bring up the network on eth0 and (if not using DHCP) assign the host's IP address and netmask to it.

    If you are using DHCP then you do not have to manually configure the IP address for the host.

    # ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
    # ip link set up eth0
    # ip route add default via gateway_ip_address dev eth0
  12. Mount the NFS server holding the backups.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  13. From the backup which was created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root directory (/) and the boot file system.
    # tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
  14. Use the efibootmgr command to set the boot device.
    1. Disable and delete the Oracle Linux boot device. If you see the entry ExadataLinux_1, then remove this entry and recreate it.

      For example:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 000F,0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000D,000E
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000D* Oracle Linux
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      

      In this example, you would disable and remove Oracle Linux (Boot000D) and ExadataLinux_1 (Boot0000). Use commands similar to the following to disable and delete the boot devices:

      Disable 'Oracle Linux':
      # efibootmgr -b 000D -A
      Delete 'Oracle Linux':
      # efibootmgr -b 000D -B
      Disable old 'ExadataLinux_1':
      # efibootmgr -b 0000 -A
      Delete old 'ExadataLinux_1':
      # efibootmgr -b 0000 -B

    2. Recreate the boot entry for ExadataLinux_1 and then view the boot order entries.
      # efibootmgr -c -d /dev/sda -p 2 -l '\EFI\XEN\XEN.EFI' -L 'ExadataLinux_1'
      
      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
      Boot0000* ExadataLinux_1

      In the output from the efibootmgr command, make note of the boot order number for ExadataLinux_1 and use that value in the following commands:

      # efibootmgr -b (entry number) -A
      # efibootmgr -b (entry number) -a

      For example, in the previous output shown in step 14.a, ExadataLinux_1 was listed as Boot0000. So you would use the following commands:

      # efibootmgr -b 0000 -A
      # efibootmgr -b 0000 -a
    3. Set the correct boot order.
      Set ExadataLinux_1 as the first boot device. The remaining devices should stay in the same boot order, except for USB:SUN, which should be last.
      # efibootmgr -o 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      

      The boot order should now look like the following:

      # efibootmgr
      BootCurrent: 000F
      Timeout: 1 seconds
      BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
      Boot0000* ExadataLinux_1
      Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit  Network Connection
      Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
      Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
      Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
      Boot000E* UEFI OS
      Boot000F* USB:SUN
    4. Check the boot order using the ubiosconfig command, and make sure the ExadataLinux_1 entry is the first child element of boot_order.
      # ubiosconfig export all -x /tmp/ubiosconfig.xml
       <boot_order>
          <boot_device>
            <description>ExadataLinux_1</description>  
            <instance>1</instance>
          </boot_device>
          <boot_device>
            <description>NET0:PXE IP4 Intel(R) I210 Gigabit  Network
      Connection</description>
            <instance>1</instance>
          </boot_device>
      ...
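The boot-order rule applied in sub-step c (ExadataLinux_1 first, USB:SUN last, everything else unchanged) can be derived mechanically from an efibootmgr listing. A sketch using an abbreviated, hypothetical listing:

```shell
#!/bin/sh
# Sketch: build the efibootmgr -o argument from a saved listing, putting
# ExadataLinux_1 first and USB:SUN last. The listing below is abbreviated
# sample data; on a real node use: efibootmgr > /tmp/bootlist
bootlist='BootOrder: 0001,0002,000E,000F,0000
Boot0000* ExadataLinux_1
Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot000E* UEFI OS
Boot000F* USB:SUN'

# Entry numbers are characters 5-8 of the Boot#### label.
linux=$(printf '%s\n' "$bootlist" | awk '/ExadataLinux_1/ {print substr($1,5,4)}')
usb=$(printf '%s\n' "$bootlist"   | awk '/USB:SUN/ {print substr($1,5,4)}')

# Keep the remaining entries in their existing relative order.
rest=$(printf '%s\n' "$bootlist" | awk -v a="$linux" -v b="$usb" \
  '/^BootOrder:/ {n=split($2,o,","); for(i=1;i<=n;i++) if(o[i]!=a && o[i]!=b) printf "%s,", o[i]}')

order="${linux},${rest}${usb}"
echo "$order"      # pass this to: efibootmgr -o "$order"
```

For the sample listing this prints 0000,0001,0002,000E,000F, which is the ordering rule of sub-step c applied to that listing.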
  15. Check the restored /etc/fstab file and comment out any reference to /EXAVMIMAGES.
    # cd /mnt/etc

    Comment out any line that references /EXAVMIMAGES.
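One way to comment out those lines is with a single sed command. The sketch below runs against a throwaway copy of fstab (with hypothetical contents); on the recovered system the target would be /mnt/etc/fstab:

```shell
#!/bin/sh
# Sketch: comment out every uncommented /EXAVMIMAGES line in an fstab.
# A scratch copy with hypothetical entries stands in for /mnt/etc/fstab.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/mapper/VGExaDb-LVDbSys3 / ext4 defaults 1 1
/dev/mapper/VGExaDb-LVDbExaVMImages /EXAVMIMAGES ocfs2 defaults 0 0
EOF

# Prefix '#' to any line that mentions /EXAVMIMAGES and is not already
# commented out; all other lines are left untouched.
sed -i '\%/EXAVMIMAGES% s%^[^#]%#&%' "$fstab"
grep EXAVMIMAGES "$fstab"
```

Keeping the line commented out (rather than deleting it) lets step 22 restore it with a simple uncomment after /EXAVMIMAGES is rebuilt.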

  16. Detach the diagnostics.iso file.

    This can be done by clicking Disconnect on the ILOM web interface console, where you clicked Connect in step 2.h to attach the DVD ISO image.

  17. Unmount the restored file systems, and unmount the NFS backup location.
    # umount /mnt/boot/efi
    # umount /mnt/boot
    # umount /mnt
    # umount /root/mnt
  18. Restart the system.
    # shutdown -r now

    This completes the restoration procedure for the management domain (dom0).

  19. Convert to Eighth Rack, if required.

    If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.

  20. When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume, which was created in step 9.e.
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
  21. Mount the OCFS2 partition on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
  22. In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages that were commented out in step 15.
  23. Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
    # mkdir -p /root/mnt
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  24. Restore the /EXAVMIMAGES file system.

    To restore all user domains, use this command:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES

    To restore a single user domain from the backup, use the following command instead:

    # tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
  25. Bring up each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
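If several user domains were restored, they can all be started in one pass by looping over the vm.cfg files. The sketch below uses a mock directory tree (with hypothetical domain names) and echoes the commands as a dry run; on a real dom0, drop the mock setup and the echo prefix:

```shell
#!/bin/sh
# Sketch: start every restored user domain by looping over vm.cfg files.
# A temporary mock GuestImages tree stands in for /EXAVMIMAGES/GuestImages
# so this stays a dry run.
base=$(mktemp -d)
mkdir -p "$base/GuestImages/dbm01vm01" "$base/GuestImages/dbm01vm02"
touch "$base/GuestImages/dbm01vm01/vm.cfg" "$base/GuestImages/dbm01vm02/vm.cfg"

for cfg in "$base"/GuestImages/*/vm.cfg; do
  echo xm create "$cfg"      # dry run; remove echo on the real system
done
```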

At this point all the user domains should come up along with Oracle Grid Infrastructure and the Oracle Database instances. The database instances should join the Oracle RAC cluster formed by the other surviving management domain nodes.

5.20.3 Scenario 2: Re-imaging the Management Domain and Restoring User Domains from Backups

This procedure re-images the management domain and reconstructs all the user domains.

The following procedure can be used when the management domain is damaged beyond repair and no backup exists for the management domain, but there is a backup available of the storage repository (/EXAVMIMAGES file system) housing all the user domains.

  1. Re-image the management domain with the image used in the other management domains in the rack using the procedure described in Re-Imaging the Oracle Exadata Database Server.
  2. Run the following commands:
    # /opt/oracle.SupportTools/switch_to_ovm.sh
    
    # /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
    
  3. If the recovery is on an Oracle Exadata Database Machine Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.
  4. Rebuild the OCFS2 file system on the /dev/sda3 partition.
    # umount /EXAVMIMAGES
    
    # mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/sda3 --force
    
  5. Mount the OCFS2 partition /dev/sda3 on /EXAVMIMAGES.
    # mount -t ocfs2 /dev/sda3 /EXAVMIMAGES
    
  6. Mount the backup NFS server to restore the /EXAVMIMAGES file system which holds the user domain images.
    # mkdir -p /remote_FS
    
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /remote_FS
    
  7. Restore the /EXAVMIMAGES file system.
    # tar -Spxvf /remote_FS/backup-of-exavmimages.tar -C /EXAVMIMAGES
    

    Note:

    The restore process restores the user domain specific files (files under /EXAVMIMAGES/GuestImages/user_domain/) as regular files, not as the OCFS2 reflinks they originally were when the user domains were created. Consequently, space usage in /EXAVMIMAGES may be higher after the restoration than it was at the time of the backup.
  8. Manually configure the network bridges.
    1. Determine the version of the ovmutils RPM.
      # rpm -qa|grep ovmutils
      
    2. If the version of the ovmutils RPM is earlier than 12.1.2.2.0, perform these steps:
      1. Back up /opt/exadata_ovm/exadata.img.domu_maker. You will need the backup copy later.

        # cp /opt/exadata_ovm/exadata.img.domu_maker /opt/exadata_ovm/exadata.img.domu_maker-orig
        
      2. Open the /opt/exadata_ovm/exadata.img.domu_maker file in a text editor such as vi, and search for g_do_not_set_bridge=yes. This string should be located a few lines below the case statement option network-discovery).

        Change the string to g_do_not_set_bridge=no.

        Save and exit /opt/exadata_ovm/exadata.img.domu_maker.

      3. Run /opt/exadata_ovm/exadata.img.domu_maker manually for every XML file in the /EXAVMIMAGES/conf directory.

        # cd /EXAVMIMAGES/conf
        # ls -1 | while read file; do /opt/exadata_ovm/exadata.img.domu_maker \
          network-discovery $file /tmp/netdisc-$file; done
        
      4. Restore /opt/exadata_ovm/exadata.img.domu_maker from the backup copy.

        # cp /opt/exadata_ovm/exadata.img.domu_maker-orig /opt/exadata_ovm/exadata.img.domu_maker
        
    3. If the version of the ovmutils RPM is 12.1.2.2.0 or later, then run the following command:
      # /opt/exadata_ovm/exadata.img.domu_maker add-bonded-bridge-dom0 vmbondeth0 eth4 eth5
      
  9. For each user domain directory in the /EXAVMIMAGES/GuestImages directory, perform the following steps:
    1. Get the UUID of the user domain.
      # grep ^uuid /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg | \
        awk -F"=" '{print $2}' | sed s/"'"//g | sed s/" "//g
      

      The command returns the uuid value, which is used in the commands below.

    2. Create a sub-directory for the UUID.
      # mkdir -p /OVS/Repositories/uuid 
    3. Create a symbolic link for the vm.cfg file for the user_domain_hostname in the new UUID directory.
      # ln -s /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg /OVS/Repositories/uuid/vm.cfg
    4. Configure autostart for the user_domain_hostname.
      # ln -s /OVS/Repositories/uuid/vm.cfg /etc/xen/auto/user_domain_hostname.cfg
    5. Create the VirtualDisks sub-directory in the UUID directory.
      # mkdir /OVS/Repositories/uuid/VirtualDisks
    6. Enter the VirtualDisks directory.
      # cd /OVS/Repositories/uuid/VirtualDisks
    7. Create four symbolic links in this directory using the four disk image names in the vm.cfg file, pointing to the four *.img files in /EXAVMIMAGES/GuestImages/user_domain_hostname directory.

      For example, the following is a sample disk entry in a sample vm.cfg file in a /OVS/Repositories/uuid directory:

      disk = ['file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/dfd641a1c6a84bd69643da704ff98594.img,xvda,w',
              'file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/d349fd420a1e49459118e6a6fcdbc2a4.img,xvdb,w',
              'file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/8ac470eeb8704aab9a8b3adedf1c3b04.img,xvdc,w',
              'file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/333e7ed2850a441ca4d2461044dd0f7c.img,xvdd,w']
      

      You can list the four *.img files in the /EXAVMIMAGES/GuestImages/user_domain_hostname directory:

      # ls /EXAVMIMAGES/GuestImages/user_domain_hostname/*.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/System.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/grid12.1.0.2.2.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/db12.1.0.2.2-3.img
      /EXAVMIMAGES/GuestImages/user_domain_hostname/pv1_vgexadb.img
      

      In this example, the following commands can be used to create the four symbolic links where dbm01db08vm01 is the user domain host name:

      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/System.img \
        $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg | \
        awk -F":" '{print $2}' | awk -F"," '{print $1}' | awk -F"/" '{print $6}')

      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/grid12.1.0.2.2.img \
        $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg | \
        awk -F":" '{print $3}' | awk -F"," '{print $1}' | awk -F"/" '{print $6}')

      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/db12.1.0.2.2-3.img \
        $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg | \
        awk -F":" '{print $4}' | awk -F"," '{print $1}' | awk -F"/" '{print $6}')

      # ln -s /EXAVMIMAGES/GuestImages/dbm01db08vm01/pv1_vgexadb.img \
        $(grep ^disk /EXAVMIMAGES/GuestImages/dbm01db08vm01/vm.cfg | \
        awk -F":" '{print $5}' | awk -F"," '{print $1}' | awk -F"/" '{print $6}')
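The per-domain wiring in this step can be sketched end-to-end: parse vm.cfg once for the UUID and the ordered disk image names, then emit the corresponding commands. The vm.cfg contents below are hypothetical, and echo keeps everything a dry run:

```shell
#!/bin/sh
# Sketch: derive the UUID and disk image order from a vm.cfg, then print
# the repository-wiring commands from step 9 without executing them.
vmcfg=$(mktemp)
cat > "$vmcfg" <<'EOF'
uuid = '6e7c7109c1bc4ebba279f84e595e0b27'
disk = ['file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/dfd641a1c6a84bd69643da704ff98594.img,xvda,w','file:/OVS/Repositories/6e7c7109c1bc4ebba279f84e595e0b27/VirtualDisks/d349fd420a1e49459118e6a6fcdbc2a4.img,xvdb,w']
EOF

# Sub-step a: the UUID is the quoted value on the uuid line.
uuid=$(awk -F"'" '/^uuid/ {print $2}' "$vmcfg")
echo mkdir -p "/OVS/Repositories/$uuid/VirtualDisks"
echo ln -s "$vmcfg" "/OVS/Repositories/$uuid/vm.cfg"

# Image names in xvda, xvdb, ... order, one per line; each would become a
# symbolic link inside the VirtualDisks directory.
grep '^disk' "$vmcfg" | tr "'" '\n' |
  awk -F/ '/\.img,/ {sub(/,.*/, "", $NF); print $NF}'
```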
      
  10. Restart each user domain.
    # xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
    

At this point all the user domains should start along with the Oracle Grid Infrastructure and the database instances. The node should join the Oracle RAC cluster formed by the other surviving management domain nodes.

5.20.4 Scenario 3: Restoring and Recovering User Domains from Snapshot Backups

Use this procedure to restore lost or damaged files of a user domain using a snapshot-based user domain backup taken from inside a user domain.

To use this procedure, the user domain backup must have been created using the procedure described in Method 3: Back up a User Domain from Inside the User Domain.

  1. Log in to the user domain as the root user.
  2. Mount the backup NFS server to restore the damaged or lost files.
    # mkdir -p /root/mnt
    
    # mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
    
  3. Extract the damaged or lost files from the backup to a staging area.

    Prepare a staging area to hold the extracted files. The backup LVM LVDbSys2 can be used for this:

    # mkdir /backup-LVM
    
    # mount /dev/mapper/VGExaDb-LVDbSys2 /backup-LVM
    
    # mkdir /backup-LVM/tmp_restore
    
    # tar -pjxvf /root/mnt/tar_file_name -C /backup-LVM/tmp_restore absolute_path_of_file_to_be_restored
    
  4. Restore the damaged or lost files from the temporary staging area as needed.
  5. Restart the user domain.
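The extract-to-staging flow in steps 3 and 4 can be rehearsed in a scratch directory. A sketch with a throwaway archive (paths and file contents are hypothetical; real backups are bzip2-compressed tar files on the NFS mount, extracted with the -j flag as shown above):

```shell
#!/bin/sh
# Sketch: restore one file from a tar backup into a staging area, then
# copy it back, mirroring steps 3-4. Everything happens under a scratch
# directory; real backups would live on the NFS mount instead.
work=$(mktemp -d)
mkdir -p "$work/etc" "$work/stage" "$work/restored"
echo "hostname=dbm01vm01" > "$work/etc/sysconfig"   # hypothetical lost file

# Take the "backup" (normally created long before the failure; the real
# procedure compresses with bzip2, i.e. tar -pjcvf).
tar -pcf "$work/backup.tar" -C "$work" etc/sysconfig

# Step 3: extract only the damaged file into the staging area.
tar -pxf "$work/backup.tar" -C "$work/stage" etc/sysconfig

# Step 4: copy the restored file back into place.
cp -p "$work/stage/etc/sysconfig" "$work/restored/sysconfig"
cat "$work/restored/sysconfig"
```

Passing the path of the file to extract (here etc/sysconfig) keeps the restore selective, exactly as the absolute_path_of_file_to_be_restored argument does in step 3.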

5.21 Removing an Oracle RAC Cluster Running in Oracle VM

You can remove all Oracle RAC nodes of an Oracle VM cluster, including the databases running within the cluster and all data stored on the Oracle Exadata Storage Server used by those databases.

To remove only a subset of the user domains of an Oracle VM cluster, refer to the next section.

There are two main steps to remove an Oracle VM cluster:

  • Remove the user domain files from the management domain.

  • Remove the unused Oracle Exadata grid disks.

Note:

If the Oracle Exadata Deployment Assistant XML configuration files are to be reused later, then they will not be synchronized, because the definition for the removed user domain still exists in the Oracle Exadata Deployment Assistant files.

  1. Run the following example script as the grid software owner on any user domain to be removed.

    The example shell script generates two scripts, list_griddisk.sh and drop_griddisk.sh, that are run later in this procedure. Do not run the generated scripts until instructed.

    #!/bin/bash
     
    # Run this script as the Grid Infrastructure software owner.
    #
    # This script identifies griddisks used by this cluster and the cells to
    # which they belong, then creates two shell scripts - the list script to
    # show the current status, and the drop script to drop the griddisks.
    #
    # In order for the drop script to succeed, the griddisks must not be in use,
    # meaning databases and CRS are down, and the list script returns no output.
    #
    # The generated scripts are designed to run via dcli -x
     
    ORACLE_SID=$(awk -F: '/^+ASM/{print $1}' /etc/oratab)
    ORAENV_ASK=NO . oraenv >/dev/null
     
    listGriddiskScript=list_griddisk.sh
    dropGriddiskScript=drop_griddisk.sh
     
    rm -f $listGriddiskScript $dropGriddiskScript
     
    gridDiskList=$(asmcmd lsdsk --suppressheader | awk -F'/' '{print $NF}')
    if [[ ${PIPESTATUS[0]} != 0 ]]; then echo "asmcmd failed - exiting"; exit 1; fi
     
    cellList=$(echo "$gridDiskList" | awk -F_ '{print $NF}' | sort -u)
     
    for cell in $cellList; do
      myGriddisks=$(echo "$gridDiskList" | grep ${cell}$ | tr '\n' ',')
      echo "[[ \$(hostname -s) == ${cell} ]] && cellcli -e 'LIST GRIDDISK \
            ${myGriddisks%,} attributes name, asmDiskGroupName, asmModeStatus \
            where asmModeStatus != UNKNOWN'" >> $listGriddiskScript
      echo >> $listGriddiskScript
    done
     
    chmod +x $listGriddiskScript
     
    echo
    echo "Run the following command to list griddisks in use by this cluster:"
    echo
    echo "# dcli -l celladmin -c ${cellList//$'\n'/,} -x $listGriddiskScript"
    echo
     
    for cell in $cellList; do
      myGriddisks=$(echo "$gridDiskList" | grep ${cell}$ | tr '\n' ',')
      echo "[[ \$(hostname -s) == ${cell} ]] && cellcli -e 'DROP GRIDDISK \
            ${myGriddisks%,}'" >> $dropGriddiskScript
      echo >> $dropGriddiskScript
    done
     
    chmod +x $dropGriddiskScript
     
    echo
    echo "Stop CRS on all nodes in this cluster, then run the following"
    echo "command to drop all griddisks used by this cluster:"
    echo
    echo "# dcli -l celladmin -c ${cellList//$'\n'/,} -x $dropGriddiskScript"
    echo
     
    exit 
    
  2. Shut down the databases and Oracle Grid Infrastructure in all user domains that will be removed:
     # Grid_home/bin/crsctl stop crs -f
    
  3. Run the list_griddisk.sh script generated earlier from any user domain that will be removed.

    Note:

    • Run the script using the dcli command to connect as the celladmin user to all Oracle Exadata Storage Servers in the configuration.

    • Before running the dcli command, set up a passwordless SSH connection between the grid software owner on the database server and the celladmin user on the cells. Otherwise, the command will keep prompting you to enter the password.

    The following is an example of the command:

    $ dcli -l celladmin -c dm01celadm01,dm01celadm02,dm01celadm03  \
    -x list_griddisk.sh
    

    The list_griddisk.sh script should not output any grid disks. Grid disks returned from the list_griddisk.sh script are considered still in use.

    Do not proceed until the list_griddisk.sh script returns empty output indicating no grid disks are in use. Verify that Oracle Grid Infrastructure and the databases are shut down on all user domains to be dropped.

  4. Run the drop_griddisk.sh script generated earlier from any user domain that you want to remove.

    Run the script using the dcli command to connect as the celladmin user to all Oracle Exadata Storage Servers in the configuration.

    $ dcli -l celladmin -c dm01celadm01,dm01celadm02,dm01celadm03 \
    -x drop_griddisk.sh
    
  5. Run the exadata.img.domu_maker command from the management domain of each user domain you want to remove.

    This command removes the user domain, where DomainName is the name of the user domain.

    # /opt/exadata_ovm/exadata.img.domu_maker remove-domain DomainName
    

    In the following example, the commands remove the two user domains for a two-node Oracle VM RAC cluster in which the user domain dm01db01vm04 runs on the management domain dm01db01, and the user domain dm01db02vm04 runs on the management domain dm01db02.

    [root@dm01db01 ~] # /opt/exadata_ovm/exadata.img.domu_maker \
    remove-domain dm01db01vm04
    [INFO] Start with command line: /opt/exadata_ovm/exadata.img.domu_maker \
     remove-domain dm01db01vm04
    [INFO] Shutting down DomU dm01db01vm04
    [INFO] Autostart link for dm01db01vm04 deleted from /etc/xen/auto
    [INFO] Deleted OVM repository /OVS/Repositories/7bfd49d6bd5a4b2db2e46e8234788067 for DomU dm01db01vm04
    [INFO] Deleted guest vm /EXAVMIMAGES/GuestImages/dm01db01vm04 for \
    DomU dm01db01vm04
     
    [root@dm01db02 ~]# /opt/exadata_ovm/exadata.img.domu_maker \
    remove-domain dm01db02vm04
    [INFO] Start with command line: /opt/exadata_ovm/exadata.img.domu_maker \
    remove-domain dm01db02vm04
    [INFO] Shutting down DomU dm01db02vm04
    [INFO] Autostart link for dm01db02vm04 deleted from /etc/xen/auto
    [INFO] Deleted OVM repository /OVS/Repositories/1d29719ff26a4a17aca99b2f89fd8032 for DomU dm01db02vm04
    [INFO] Deleted guest vm /EXAVMIMAGES/GuestImages/dm01db02vm04  \
    for DomU dm01db02vm04
    

5.22 Deleting a User Domain from an Oracle VM Oracle RAC Cluster

You can remove a single Oracle RAC node from an Oracle VM cluster.

The Oracle Exadata grid disks remain in use by the remaining nodes in the cluster, and must not be dropped.

Note:

If the Oracle Exadata Deployment Assistant XML configuration files are to be reused later, then they will not be synchronized, because the definition for the removed user domain still exists in the Oracle Exadata Deployment Assistant files.

  1. Delete the cluster node.
  2. Use the following command to shut down and remove the user domain, where DomainName is the name of the domain:
    # /opt/exadata_ovm/exadata.img.domu_maker remove-domain DomainName
    

    This command removes the user domain files from the management domain.

5.23 Implementing Tagged VLAN Interfaces

This topic describes the implementation of tagged VLAN interfaces in Oracle VM environments on Exadata.

Oracle databases running in Oracle VM guests on Oracle Exadata Database Machine are accessed through the client Ethernet network defined in the Oracle Exadata Deployment Assistant (OEDA) configuration tool. Client network configuration in both the management domain (dom0) and user domains (domU's) is done automatically when the OEDA installation tool creates the first user domain during initial deployment.

The following figure shows a default bonded client network configuration:

Figure 5-1 NIC Layout in an Oracle Virtual Environment

Description of Figure 5-1 follows
Description of "Figure 5-1 NIC Layout in an Oracle Virtual Environment"

The network has the following configuration:

  1. In the dom0, eth slave interfaces (for example, eth1 and eth2, or eth4 and eth5) that allow access to the domU client network defined in OEDA are discovered, configured, and brought up, but no IP is assigned.

  2. In the dom0, bondeth0 master interface is configured and brought up, but no IP is assigned.

  3. In the dom0, bridge interface vmbondeth0 is configured, but no IP is assigned.

  4. In the dom0, one virtual backend interface (vif) per domU that maps to that particular domU's bondeth0 interface is configured and brought up, but no IP is assigned. These vifs are configured on top of the bridge interface vmbondeth0, and the mapping between the dom0 vif interface and its corresponding user domain interface bondeth0 is defined in the user domain configuration file vm.cfg, located in /EXAVMIMAGES/GuestImages/user_domain_name.

For default installations, a single bondeth0 and a corresponding vmbondeth0 bridge interface is configured in the dom0 as described above. This bondeth0 interface is based on the default Access Virtual Local Area Network (Access VLAN). The ports on the switch used by the slave interfaces making up bondeth0 are configured for Access VLAN.

Using VLAN Tagging

If there is a need for virtual deployments on Exadata to access additional VLANs on the client network, such as enabling network isolation across user domains, then 802.1Q-based VLAN tagging is a solution. The following figure shows a client network configuration with VLAN tagging.

Figure 5-2 NIC Layout for Oracle Virtual Environments with VLAN Tagging

Description of Figure 5-2 follows
Description of "Figure 5-2 NIC Layout for Oracle Virtual Environments with VLAN Tagging"

For instructions on how to configure and use such additional VLAN tagged interfaces on the client network, see My Oracle Support note 2018550.1. The Access VLAN must remain configured and operational both before and after these instructions are followed; at no time should the Access VLAN be disabled.

5.24 Implementing InfiniBand Partitioning across Oracle VM Oracle RAC Clusters on Oracle Exadata Database Machine

For Oracle Real Application Clusters (Oracle RAC) clusters running in Oracle VM on Oracle Exadata Database Machine, you can isolate the network traffic of each Oracle RAC cluster on the InfiniBand network using custom InfiniBand partitioning with dedicated partition keys.

5.24.1 About InfiniBand Partitioning Across Oracle RAC Clusters Running in Oracle VM

An InfiniBand partition defines a group of InfiniBand nodes or members that are allowed to communicate with one another.

One of the key requirements of consolidated systems from a security standpoint is network isolation across the multiple environments within a consolidated system. For consolidations achieved using Oracle VM Oracle Real Application Clusters (Oracle RAC) clusters on Oracle Exadata, this means isolation across the different Oracle RAC clusters, such that the network traffic of one Oracle RAC cluster is not accessible to another Oracle RAC cluster. For the Ethernet networks, this is accomplished using VLAN tagging, as described in My Oracle Support DocID 2018550.1. For the InfiniBand network, this is accomplished using custom InfiniBand partitioning with dedicated partition keys.

With InfiniBand partitioning, partitions identified by unique partition keys are created and managed by the master subnet manager. Members are then assigned to these custom partitions. Members within a partition can only communicate among themselves, depending on the membership type, as explained in Appendix 1 of My Oracle Support Doc ID 2018550.1. A member of one partition cannot communicate with a member of a different partition, regardless of membership type. Accordingly, the Oracle VM Oracle RAC nodes of one particular cluster are assigned one dedicated partition for clusterware communication and one partition for communication with the storage cells. This way, the nodes of one Oracle RAC cluster cannot communicate with the nodes of another Oracle RAC cluster, which belong to a different partition. The nodes in each Oracle RAC cluster have different partition keys assigned to them.

By default, the InfiniBand subnet manager provides a single partition that is identified by the partition key 0x7FFF (limited membership) or 0xFFFF (full membership). In Oracle VM deployments on Oracle Exadata Database Machine where custom InfiniBand partitioning is not used, the partition key 0xFFFF is used across all the user domains.

Figure 5-3 Oracle VM Oracle RAC Clusters without InfiniBand Network Isolation Across Clusters


With non-default custom partitions in place for implementing isolation across the Oracle VM Oracle RAC clusters, the configuration changes to what is shown in the next image. New interfaces clib0, clib1 (for the cluster pkey) and stib0, stib1 (for the storage pkey) exist in each of the user domains (domU's).

There is no change to InfiniBand interfaces in the management domain (dom0).

Figure 5-4 Oracle VM Oracle RAC Clusters with InfiniBand Network Isolation Across Clusters Using InfiniBand Partitioning


5.24.2 Requirements for Implementing InfiniBand Partitioning across OVM RAC Clusters

Before configuring InfiniBand partitioning, ensure that:

  • You have configured OVM on your Exadata system.

  • All the user domains and storage cells are using the default partition key 0xFFFF.

  • You have set up passwordless secure shell (ssh) access for the root user from one of the management domains (dom0 node) to all the OVM RAC cluster nodes, storage cells, and InfiniBand switches.

  • InfiniBand switches are installed with firmware versions 2.0.4 or above.

  • You have an understanding of InfiniBand partitioning.
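The firmware prerequisite can be checked mechanically. The following is an illustrative sketch, not part of the Oracle-supplied tooling; it compares a reported version string (the value shown is hypothetical, not real switch output) against the 2.0.4 minimum using version-aware sorting:

```shell
# Sketch: check a reported InfiniBand switch firmware version against the
# 2.0.4 minimum. "2.1.3-4" is a hypothetical reported value, not real output.
version_ge() {
  # True (exit 0) if $1 is greater than or equal to $2 in version order.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

reported="2.1.3-4"
if version_ge "$reported" "2.0.4"; then
  echo "switch firmware meets the 2.0.4 minimum"
else
  echo "switch firmware must be upgraded first"
fi
```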

5.24.3 About InfiniBand Partitioning Network Configuration

Plan and allocate sets of IP addresses and netmasks for each Oracle VM RAC cluster that will be used by the cluster pkey interfaces and the storage pkey interfaces when InfiniBand partitioning gets implemented in the cluster.

Within an Oracle VM RAC cluster, the cluster pkey IP address and netmask should be on a separate subnet from the storage pkey IP address and netmask.

The tables below can be used as reference for one particular RAC cluster:

Table 5-1 Existing Configuration

Interface Name | IP Address     | Netmask
---------------|----------------|--------------
ib0            | 192.168.12.153 | 255.255.248.0
ib1            | 192.168.12.154 | 255.255.248.0

The following table shows the new IP addresses and netmasks required by the pkey interfaces while implementing InfiniBand Partitioning for that one Oracle RAC cluster.

Table 5-2 New IP Addresses and Netmasks Required by the pkey Interfaces

Interface Name | IP Address    | Netmask
---------------|---------------|--------------
clib0          | 192.168.112.1 | 255.255.248.0
clib1          | 192.168.112.2 | 255.255.248.0
stib0          | 192.168.114.1 | 255.255.240.0
stib1          | 192.168.114.2 | 255.255.240.0

5.24.4 Configuring InfiniBand Partitioning across Oracle VM RAC Clusters

The steps for configuring InfiniBand Partitioning across Oracle RAC clusters running in Oracle VM are described here.

In this procedure, the Oracle RAC clusters incur minimal downtime. The downtime occurs when the Oracle RAC cluster is restarted to use the new network interfaces.

Before you start this task, download and untar the file create_pkeys.tar. You can download this file from Implementing InfiniBand Partitioning across OVM RAC clusters on Exadata (My Oracle Support Doc ID 2075398.1). Download the file to one of the management domain (dom0) nodes. You will use this node to run all the scripts in this procedure; it is referred to as driver_dom0 throughout.

When you untar the file, you should get three files:

  • create_pkeys_on_switch.sh
  • run_create_pkeys.sh
  • create_pkey_files.sh
  1. Allocate IP addresses to be used by the pkey interfaces.

    Plan and allocate sets of IP addresses and netmasks for each Oracle VM RAC cluster that will be used by the cluster pkey interfaces and the storage pkey interfaces when InfiniBand partitioning gets implemented in the cluster.

    Refer to the topic About InfiniBand Partitioning Network Configuration for an example.

  2. On the InfiniBand switches, create a dedicated partition (cluster pkey) for each Oracle RAC cluster to be used by the clusterware and create one partition (storage pkey) to be used by all the Oracle VM RAC clusters and the storage cells for communication between the Oracle RAC cluster nodes and the storage cells.

    You assign a pkey to each partition as a simplified means of identifying the partition to the Subnet Manager. Pkeys are 15-bit integers. Values 0x0001 and 0x7fff are default partitions. Use values between 0x0002 and 0x7ffe for your pkeys.
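As a quick sanity check, the usable range can be validated with shell arithmetic. This is an illustrative sketch, not part of the Oracle-supplied scripts:

```shell
# Sketch: verify that a candidate pkey (hex string, no 0x prefix) falls in
# the usable range 0x0002..0x7ffe. 0x0001 and 0x7fff belong to the default
# partitions, and bit 15 is the membership bit rather than part of the pkey.
valid_pkey() {
  local p=$(( 16#$1 ))
  [ "$p" -ge 2 ] && [ "$p" -le $(( 16#7ffe )) ]
}

valid_pkey 2a00 && echo "2a00 is usable as a pkey"
valid_pkey 7fff || echo "7fff is reserved for the default partition"
```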

    1. Enable password-less ssh equivalence for the root user from the driver_dom0 management domain (dom0) node to all the switches on the InfiniBand fabric.

      Use a command similar to the following where ib_switch_list refers to a file that contains the list of all the InfiniBand switches on the fabric, with each switch name on a separate line.

      # dcli -g ib_switch_list -l root -k
    2. Run the script create_pkeys_on_switch.sh from driver_dom0 to create and configure the partition keys on the InfiniBand switches.

      Note:

      Each execution of the script create_pkeys_on_switch.sh creates exactly one partition. You must run the script once for each partition to be created. For example, an environment that contains two Oracle VM RAC clusters will have a total of three partitions: one storage partition and two cluster partitions (one per Oracle RAC cluster). In this example, you will need to run create_pkeys_on_switch.sh three times.

      You must run the script on only one node (driver_dom0). The script creates the partitions in all the switches provided as input during the execution of the script.

    3. After you finish running the script, verify the partitions were created on all the switches.
      # /usr/local/sbin/smpartition list active no-page

      The following example output shows the default partitions (0x0001 and 0x7fff), and an additional partition, 0x0004. The partition with pkey 0x0004 is configured for IPoIB and has two member ports that are assigned full membership of the partition.

      # Sun DCS IB partition config file
      #! version_number : 12
      Default=0x7fff, ipoib :
      ALL_CAS=full,
      ALL_SWITCHES=full,
      SELF=full;
      SUN_DCS=0x0001, ipoib :
      ALL_SWITCHES=full;
       = 0x0004,ipoib: 
      0x0021280001cf3787=full, 
      0x0021280001cf205b=full; 

      At this stage ensure that you have created all the required partitions.

  3. On the Oracle VM RAC nodes and on the storage cells, generate all the relevant network configuration files for the new IP over InfiniBand (IPoIB) interfaces.

    Each partition requires a new IPoIB network interface.

    This step makes the following changes on the Oracle RAC cluster nodes:

    • Modifies these files:

      • /etc/sysconfig/network-scripts/ifcfg-ib0
      • /etc/sysconfig/network-scripts/ifcfg-ib1
    • Removes these files:

      • /etc/sysconfig/network-scripts/rule-ib0
      • /etc/sysconfig/network-scripts/rule-ib1
      • /etc/sysconfig/network-scripts/route-ib0
      • /etc/sysconfig/network-scripts/route-ib1
    • Creates the following new files in /etc/sysconfig/network-scripts:

      • ifcfg-clib0, ifcfg-clib1
      • rule-clib0, rule-clib1
      • route-clib0, route-clib1
      • ifcfg-stib0, ifcfg-stib1
      • rule-stib0, rule-stib1
      • route-stib0, route-stib1
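For orientation, a pkey interface file created in this step might take the following general shape. This is a hypothetical illustration following the usual ifcfg conventions for IPoIB child interfaces, with values taken from the examples in this chapter; it is not the literal output of create_pkey_files.sh:

```
DEVICE=clib0
PHYSDEV=ib0
PKEY=yes
PKEY_ID=a000
IPADDR=192.168.112.1
NETMASK=255.255.248.0
BOOTPROTO=static
ONBOOT=yes
```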

    Note:

    If this step fails, before you rerun this step:

    • Restore all the files from /etc/sysconfig/network-scripts/backup-for-pkeys to /etc/sysconfig/network-scripts.
    • Remove the newly created files listed in this step.
    1. Make sure passwordless ssh is set up from the driver_dom0 node to all the Oracle RAC cluster nodes and the storage cells that need to be configured for partition keys.
    2. Make sure run_create_pkeys.sh and create_pkey_files.sh are executable and they are in the same directory on driver_dom0.
    3. Run run_create_pkeys.sh.

      For cluster nodes, run the script four times for each cluster node, using a node_type value of compute.

      The syntax for this script is:

      run_create_pkeys.sh node_name interface_name pkey_id 
      node_type pkey_ipaddr pkey_netmask pkey_interfaceType
      • node_name specifies the cluster node.
      • interface_name is either ib0 or ib1.
      • pkey_id specifies the pkey without the 0x prefix. The value used here is the cluster or storage partition key derived from the corresponding pkey_id value entered in step 2.
      • node_type is either compute or cell.
      • pkey_ipaddr specifies the IP address.
      • pkey_netmask specifies the netmask in CIDR format, for example, /21.
      • pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.

      Note:

      The pkey_ipaddr and pkey_netmask of the cluster pkey interface must be on a different subnet from the pkey_ipaddr and pkey_netmask of the storage pkey interface.

      You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.

      FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 \
        + $(echo "obase=2;ibase=16;$(echo $HexValue | tr [:lower:] [:upper:])" | bc))" \
        | bc | tr [:upper:] [:lower:])

      FinalHexValue is the value that will be entered in the command here and HexValue is the value entered in step 2 for pkey_id.
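For pkey_id values below 0x8000, the bc pipeline above simply sets the membership (top) bit of the 15-bit pkey. An equivalent sketch in plain bash arithmetic, using the example values from this section:

```shell
# Equivalent of the bc pipeline above for pkey_id values below 0x8000:
# the full-membership partition key is the pkey_id with bit 15 (the
# membership bit) set, i.e. OR with 0x8000. Hex in, hex out, no 0x prefix.
derive_full_pkey() {
  printf '%x\n' $(( 16#$1 | 16#8000 ))
}

derive_full_pkey 2000   # cluster pkey_id 2000 -> a000
derive_full_pkey 2a00   # storage pkey_id 2a00 -> aa00
```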

      The following table provides an example of the inputs for the four runs for a cluster node:

      Table 5-3 Four Runs for Cluster Nodes

      Run | Interface Name | pkey_id | node_type | pkey_ipaddress | pkey_netmask | pkey_interfaceType
      ----|----------------|---------|-----------|----------------|--------------|-------------------
      1   | ib0            | a000    | compute   | 192.168.12.153 | /21          | cluster
      2   | ib1            | a000    | compute   | 192.168.12.154 | /21          | cluster
      3   | ib0            | aa00    | compute   | 192.168.114.15 | /20          | storage
      4   | ib1            | aa00    | compute   | 192.168.114.16 | /20          | storage

      You use these values in each execution of the script, denoted by the Run column, as shown in this example, where vm-guest-1 is the name of the cluster node.

      # ./run_create_pkeys.sh vm-guest-1 ib0 a000 compute 192.168.12.153 /21 cluster
      

    At this stage all the required networking files listed at the beginning of this step have been created for the new pkey-enabled network interfaces on the Oracle VM RAC cluster nodes.

    Oracle Grid Infrastructure has also been modified to make use of the new network interfaces upon restart. The output of the command $GRID_HOME/bin/oifcfg getif should list clib0 and clib1 in the list of interfaces to be used for the cluster interconnect.

  4. Modify Oracle ASM and Oracle RAC CLUSTER_INTERCONNECTS parameter.
    1. Log in to each of the Oracle ASM instances in the Oracle RAC cluster using SQL*Plus as SYS, and run the following command:
      ALTER SYSTEM SET cluster_interconnects='<cluster_pkey_IP_address_of_ib0>:<cluster_pkey_IP_address_of_ib1>'
        scope=spfile sid='<name_of_current_ASM_instance>';

      For example:

      ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154'
        scope=spfile  sid='+ASM1';
    2. Log in to each of the database instances in the Oracle RAC cluster using SQL*Plus, and run the same command for the Oracle RAC instance:

      For example:

      ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154'
        scope=spfile  sid='RACDB1';
    3. Shut down and disable CRS auto-start on all the Oracle RAC cluster nodes.
      # Grid_home/bin/crsctl stop crs
      
      # Grid_home/bin/crsctl disable crs

    At this stage Oracle Grid Infrastructure, the Oracle ASM instances, and the Oracle Database instances have been modified to make use of the newly created network interfaces.

  5. Modify cellip.ora and cellinit.ora on all the cluster nodes (user domains).

    Perform these steps on any one database server node of the cluster (user domain for an Oracle VM RAC cluster).

    1. Make a backup of the cellip.ora and cellinit.ora files.
      # cd /etc/oracle/cell/network-config
      # cp cellip.ora cellip.ora-bak
      # cp cellinit.ora cellinit.ora-bak
    2. Modify the cellip.ora-bak file, replacing each existing IP address with the two storage pkey IP addresses of every storage cell that will be set up in step 7.
      Separate the two IP addresses with a semicolon (;).
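For example, an edited cellip.ora entry would take the following general shape, using the storage pkey addresses from Table 5-4; the original address shown is hypothetical:

```
Before:  cell="192.168.12.3"
After:   cell="192.168.114.1;192.168.114.2"
```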
    3. Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
    4. Replace the cellip.ora file on all the cluster nodes.

      Use the following commands to backup and then replace the cellip.ora file on all the cluster nodes. In this example cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        "/bin/cp /etc/oracle/cell/network-config/cellip.ora /etc/oracle/cell/network-config/cellip-orig.ora"

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        -f cellip.ora-bak -d /etc/oracle/cell/network-config/cellip.ora
      
    5. Manually edit the /etc/oracle/cell/network-config/cellinit.ora-bak file to replace the existing IP address and netmask with the two storage pkey IP addresses and the netmask of this cluster node, as used in step 3.
    6. Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
    7. Replace the cellinit.ora file on all the cluster nodes.

      The IP address and netmask were used in the third and fourth run of step 3.

      Use the following commands to backup and then replace the cellinit.ora file on all the cluster nodes. In this example cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        "/bin/cp /etc/oracle/cell/network-config/cellinit.ora /etc/oracle/cell/network-config/cellinit-orig.ora"

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        -f cellinit.ora-bak -d /etc/oracle/cell/network-config/cellinit.ora
      
  6. In the management domains (dom0s), modify the user domain configuration file for each user domain to use the partition key applicable to that user domain.

    Modify all the relevant vm.cfg files in the management domain. This step is applicable only for Oracle VM environments. Log in to all the management domains and manually edit /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg to include the partition keys created in step 2.

    For example, modify the line:

    ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xffff',]},{'pf':'40:00.0','port':'2','pkey':['0xffff',]},]

    to:

    ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xa000','0xaa00',]},{'pf':'40:00.0','port':'2','pkey':['0xa000','0xaa00',]},]

    In this example, 0xa000 is the cluster partition key derived from the cluster pkey_id value entered in step 2, and 0xaa00 is the storage partition key derived from the storage pkey_id value.

    You can use the following command to derive the partition key values to use in vm.cfg from the pkey_id values entered in step 2.

    FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 \
      + $(echo "obase=2;ibase=16;$(echo $HexValue | tr [:lower:] [:upper:])" | bc))" \
      | bc | tr [:upper:] [:lower:])

    FinalHexValue is the value that you enter in vm.cfg and HexValue is the value entered in step 2 for pkey_id.
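As an illustrative sketch (not an Oracle-supplied script), the edited ib_pkeys line can be assembled from the two derived keys; the pf and port values are taken from the example above:

```shell
# Sketch: build the vm.cfg ib_pkeys line from the derived cluster and
# storage partition keys (0xa000 and 0xaa00 in this section's example).
make_ib_pkeys_line() {
  printf "ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0x%s','0x%s',]},{'pf':'40:00.0','port':'2','pkey':['0x%s','0x%s',]},]\n" \
    "$1" "$2" "$1" "$2"
}

make_ib_pkeys_line a000 aa00
```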

    Note:

    If your environment has multiple Oracle VM RAC clusters, perform the next two steps (step 7 and step 8) only once, after steps 3 through 6 have been completed for all the Oracle VM RAC clusters.
  7. Modify the storage cells to use the newly created IPoIB interfaces.
    1. Make sure run_create_pkeys.sh and create_pkey_files.sh are available and that they are in the same directory on the same driver_dom0 node used in the previous steps.
    2. Make sure passwordless ssh is set up from the driver_dom0 node to all the storage cells that need to be configured for partition keys.
    3. Run run_create_pkeys.sh.

      For storage servers, you need to run the script twice for every storage server with a node_type value of cell.

      The syntax for this script is:

      run_create_pkeys.sh node_name interface_name pkey_id 
      node_type pkey_ipaddr pkey_netmask pkey_interfaceType
      • node_name specifies the storage server.
      • interface_name is either ib0 or ib1.
      • pkey_id specifies the pkey without the 0x prefix. The value used here is the storage partition key derived from the storage pkey_id value entered in step 2.
      • node_type is either compute or cell.
      • pkey_ipaddr specifies the IP address.
      • pkey_netmask specifies the netmask in CIDR format, for example, /21.
      • pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.

      You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.

      FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 \
        + $(echo "obase=2;ibase=16;$(echo $HexValue | tr [:lower:] [:upper:])" | bc))" \
        | bc | tr [:upper:] [:lower:])

      FinalHexValue is the value that will be entered in the command here and HexValue is the value entered in step 2 for pkey_id.

      The following table provides an example of the inputs for the two runs for a storage server:

      Table 5-4 Two Runs for Storage Servers

      Run | Interface Name | pkey_id | node_type | pkey_ipaddress | pkey_netmask | pkey_interfaceType
      ----|----------------|---------|-----------|----------------|--------------|-------------------
      1   | ib0            | aa00    | cell      | 192.168.114.1  | /20          | storage
      2   | ib1            | aa00    | cell      | 192.168.114.2  | /20          | storage

      You use these values in each execution of the script, denoted by the Run column, as shown in this example, where cell01 is the name of the storage server.

      # ./run_create_pkeys.sh cell01 ib0 aa00 cell 192.168.114.1 /20 storage
      

      Note:

      You can ignore the following messages from the script. The restart of the storage cells at the end of this task will take care of these issues.

      Network configuration altered. Please issue the following commands 
      as root to restart the network and open IB stack: 
        service openibd restart
        service network restart
      A restart of all services is required to put new network configuration into 
      effect. MS-CELLSRV communication may be hampered until restart.

    At this stage the storage servers (cells) have been modified to use the new network interfaces upon restart.

  8. Modify the /opt/oracle.cellos/cell.conf file on each storage server and restart the storage servers.
    1. Make a backup of the /opt/oracle.cellos/cell.conf file.
      # cd /opt/oracle.cellos
      # cp cell.conf cell.conf-prepkey
    2. Change the Pkey configuration lines in /opt/oracle.cellos/cell.conf.

      Change this line:

      <Pkeyconfigured>no</Pkeyconfigured>

      to:

      <Pkeyconfigured>yes</Pkeyconfigured>

      Change this line for the two private interfaces ib0 and ib1:

      <IP_enabled>yes</IP_enabled>

      to:

      <IP_enabled>no</IP_enabled>
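The two substitutions above can be scripted. The following hedged sketch applies them with sed to a small illustrative fragment; on a real cell.conf, restrict the IP_enabled edit to the ib0 and ib1 interface sections, since other interfaces may also carry that element:

```shell
# Sketch of this step applied to a minimal illustrative fragment (not a
# complete cell.conf). The blanket IP_enabled substitution is safe here
# only because the sample contains just the ib0/ib1 sections.
cat > /tmp/cell.conf.sample <<'EOF'
<Pkeyconfigured>no</Pkeyconfigured>
<Interface><Name>ib0</Name><IP_enabled>yes</IP_enabled></Interface>
<Interface><Name>ib1</Name><IP_enabled>yes</IP_enabled></Interface>
EOF

sed -i \
  -e 's|<Pkeyconfigured>no</Pkeyconfigured>|<Pkeyconfigured>yes</Pkeyconfigured>|' \
  -e 's|<IP_enabled>yes</IP_enabled>|<IP_enabled>no</IP_enabled>|' \
  /tmp/cell.conf.sample

grep -c '<IP_enabled>no</IP_enabled>' /tmp/cell.conf.sample   # prints 2
```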
    3. Make sure Oracle Grid Infrastructure is stopped on all Oracle VM RAC nodes.
    4. Restart all the storage cell servers.
      # shutdown -r now
    5. Verify that the new pkey-enabled network interfaces are in use.
      # cellcli -e list cell detail | egrep 'interconnect|ipaddress'

      The output should show the new pkey-enabled interfaces (stib0 and stib1) along with the new set of IP addresses.

  9. Restart the Oracle RAC clusters.
    1. Log in to the corresponding management domain of each of the user domain nodes.
    2. Run the following commands:
      # xm shutdown user_domain_name
      
      # xm create /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg
  10. Start and verify the Oracle Grid Infrastructure stack is fully started on all the cluster nodes.
    1. Start and enable auto-start of the Oracle Grid Infrastructure stack on all the Oracle RAC cluster nodes.
      # $GRID_HOME/bin/crsctl start crs
      
      # $GRID_HOME/bin/crsctl enable crs
    2. After Oracle Grid Infrastructure has started on all the nodes, verify the cluster_interconnects parameter is set to use the newly configured pkey interfaces.

      Log in to a database instance and run the following query:

      SQL> SELECT inst_id, value FROM gv$parameter
           WHERE name = 'cluster_interconnects';
    3. Remove the old cluster interconnect interfaces from the Oracle Cluster Registry (OCR).
      # Grid_home/bin/oifcfg delif -global ib0/<old subnet>
      
      # Grid_home/bin/oifcfg delif -global ib1/<old subnet>

5.24.5 Implementing InfiniBand Partitioning across OVM RAC Clusters: Setting up Limited Membership

The 12.1.0.2 October 2016 Database Bundle Patch introduces a security enhancement feature where the GUIDs of the database nodes can be assigned to the storage pkey with limited membership instead of full membership, as was the case prior to the 12.1.0.2 October 2016 Bundle Patch. This addresses a security concern where one RAC node from one RAC cluster could talk to a RAC node from another RAC cluster using the storage pkey interfaces.

Full Membership and Limited Membership

An InfiniBand partition defines a group of InfiniBand nodes that are allowed to communicate with one another. With InfiniBand partitioning, you define custom or unique partition keys that are managed by the master subnet manager, and assign members to the custom partition keys. Members with the same partition key can only communicate amongst themselves. A member of one partition key cannot communicate with a member that has a different partition key, regardless of membership type. The OVM RAC cluster nodes of one cluster are assigned one partition key for clusterware communication and another partition key for communication with storage cells. This way, the nodes of one RAC cluster will not be able to communicate with the nodes of another RAC cluster, which have a different partition key assigned to them. This is very similar conceptually to tagged VLANs in the Ethernet world.

Partition keys (pkeys) are 15-bit integers and have a value of 0x1 to 0x7FFF. An additional bit, the membership bit, identifies the membership of a member of the partition. Memberships can be:

  • Full: The membership bit is set to 1. Members with full membership can communicate with each other, as well as with members that have limited membership, within the same partition.

  • Limited: The membership bit is set to 0. Members with limited membership within a partition cannot communicate with each other. However, they can communicate with members that have full membership within the same partition.

Combined, the pkey and the membership bit form a 16-bit integer; the most significant bit is the membership bit.
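The membership rules can be sketched as a small predicate over the 16-bit encodings. This toy model (not Oracle code) captures the full/limited behavior described above:

```shell
# Toy model of partition membership: two members can communicate only if
# their 15-bit pkeys match and at least one of them has full membership
# (bit 15 set). 0xffff and 0x7fff are the default full/limited encodings.
can_communicate() {
  local a=$(( 16#$1 )) b=$(( 16#$2 ))
  [ $(( a & 16#7fff )) -eq $(( b & 16#7fff )) ] &&
    { [ $(( a & 16#8000 )) -ne 0 ] || [ $(( b & 16#8000 )) -ne 0 ]; }
}

can_communicate ffff 7fff && echo "full and limited members: yes"
can_communicate 7fff 7fff || echo "two limited members: no"
```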

By default, the InfiniBand subnet manager provides a single partition and it is identified by the partition key 0x7FFF (limited membership) or 0xFFFF (full membership).

An HCA port can participate in a maximum of 128 partitions. Each partition key provides a new IPoIB network interface; for example, InfiniBand port 1 with partition key 0xa001 results in a new network interface. These interfaces are given meaningful names through the ifcfg-<interface> file parameters.

An InfiniBand node can be a member of multiple partitions. When a packet arrives at a database node, the partition key (pkey) of the packet is matched with the Subnet Manager configuration. This validation prevents a database node from communicating with another database node outside of the partitions of which it is a member.

Every node on the InfiniBand fabric has a partition key table, visible in /sys/class/infiniband/mlx4_0/ports/[1-2]/pkeys. Every Queue Pair (QP) of the node has an index (pkey) associated with it that maps to an entry in that table. Whenever a packet is sent from the QP's send queue, the indexed pkey is attached to it. Whenever a packet is received on the QP's receive queue, the indexed pkey is compared with that of the incoming packet. If it does not match, the packet is silently discarded: the receiving channel adapter does not know it arrived, and the sending channel adapter receives no acknowledgement, so the packet simply manifests as lost. Only when the pkey of the incoming packet matches the indexed pkey of the QP's receive queue is the packet accepted and an acknowledgement sent to the sending channel adapter. In this way, members of the same partition can communicate only with each other, and not with hosts that do not have that pkey in their partition tables.

The steps below describe how to set up this enhancement on a pkey-enabled environment that has the 12.1.0.2 October 2016 Database Bundle Patch applied. There are two possible scenarios:

Case 1. Implementing the feature on a pkey-enabled environment in a rolling manner

In this case, you have already applied the 12.1.0.2 October 2016 Database Bundle Patch.

Perform the steps below on one node at a time.

  1. Shut down the Grid Infrastructure on the node.

    # $GI_HOME/bin/crsctl stop crs
  2. Determine the two port GUIDs of the dom0 (control domain) that manages this Oracle VM RAC user domain node.

    # /usr/sbin/ibstat | grep Port
  3. Log in as the root user to the InfiniBand switch where the master Subnet Manager (SM) is running.

  4. Run the commands below on the InfiniBand switch.

    # /usr/local/sbin/smpartition start
    
    # /usr/local/sbin/smpartition modify -n <storage pkey name> -port <Port GUID1 of the dom0 from step 2> -m limited
    
    # /usr/local/sbin/smpartition modify -n <storage pkey name> -port <Port GUID2 of the dom0 from step 2> -m limited
    
    # /usr/local/sbin/smpartition commit
  5. Modify the vm.cfg file for this OVM RAC user domain node in the dom0.

    1. Log in to the dom0 as the root user.

    2. Edit /EXAVMIMAGES/GuestImages/<user domain name>/vm.cfg and modify the partition keys as shown in the example below.

      Modify this line:

      ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':[ '0xclpkey','0x<stpkey>',]},{'pf':'40:00.0','port':'2','pkey':[ '0xclpkey','0x<stpkey>',]},]

      to this:

      ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':[ '0xclpkey','0x<mod_stpkey>',]},{'pf':'40:00.0','port':'2','pkey':[ '0xclpkey','0x<mod_stpkey>',]},]

      <mod_stpkey> is derived from <stpkey> using the formula below:

      mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr [:lower:] [:upper:])"|bc) - 1000000000000000)"|bc|tr [:upper:] [:lower:])

      Note that <stpkey> and <mod_stpkey> in the formula above are specified without the "0x" prefix.
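For the storage pkey values used in this chapter, the formula above simply clears the membership (top) bit of the full-membership key. An equivalent sketch in plain bash arithmetic:

```shell
# Equivalent of the bc formula above: the limited-membership key is the
# full-membership key with bit 15 (the membership bit) cleared, i.e.
# AND with 0x7fff. Hex in, hex out, no 0x prefix.
derive_limited_pkey() {
  printf '%x\n' $(( 16#$1 & 16#7fff ))
}

derive_limited_pkey aa00   # stpkey aa00 -> mod_stpkey 2a00
```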

  6. Modify the /etc/sysconfig/network-scripts/ifcfg-stib* files on the user domain RAC nodes.

    Edit the PKEY_ID in those files using the formula below:

    mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr [:lower:] [:upper:])"|bc) - 1000000000000000)"|bc|tr [:upper:] [:lower:])

    mod_stpkey is the new PKEY_ID, and stpkey is the old PKEY_ID.

    Note that <stpkey> and <mod_stpkey> in the formula above are specified without the "0x" prefix.

  7. Modify /opt/oracle.cellos/pkey.conf on the user domain RAC nodes.

    Edit the Pkey for the storage network pkey interfaces (stib*):

    Change:

    <Pkey>0xstpkey</Pkey>

    to:

    <Pkey>0xmod_stpkey</Pkey>

    mod_stpkey is derived from stpkey using the formula below:

    mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr [:lower:] [:upper:])"|bc) - 1000000000000000)"|bc|tr [:upper:] [:lower:])

    stpkey and mod_stpkey used in the formula above are specified without the "0x" prefix.

  8. Restart the OVM RAC user domain node.

    1. Log in to the dom0 as the root user.

    2. Run the following commands:

      # xm shutdown <user domain name>
      
      # xm create /EXAVMIMAGES/GuestImages/<user domain name>/vm.cfg
  9. Verify the Grid Infrastructure stack is fully up on the cluster node.

  10. Repeat the steps on the remaining cluster nodes, one node at a time.

Case 2. Implementing the feature on a pkey-enabled environment while you apply the 12.1.0.2 October 2016 Database Bundle Patch in a rolling manner

Perform the steps below on one node at a time.

  1. Apply the 12.1.0.2 October 2016 Database Bundle Patch on the cluster node.

  2. Run the steps 1 through 10 from Case 1 above on the node where the patch was applied.

  3. Move on to the next cluster node and repeat steps 1 and 2 above.

Note:

Once the dom0 GUIDs are converted to limited membership, deployment of any new cluster will have the October 2016 Database Bundle Patch as a prerequisite.

5.25 Running Oracle EXAchk in Oracle VM Environments

Oracle EXAchk version 12.1.0.2.2 and higher supports virtualization on Oracle Exadata Database Machine.

To perform the complete set of Oracle EXAchk audit checks in an Oracle Exadata Database Machine Oracle VM environment, Oracle EXAchk must be installed in and run from multiple locations, as follows:

  • From one management domain (dom0)

  • From one user domain (domU) in each Oracle VM Oracle Real Application Clusters (Oracle RAC) cluster

For example, an Oracle Exadata Database Machine Quarter Rack with 2 database servers containing 4 Oracle VM Oracle RAC clusters (2 nodes per cluster for a total of 8 domU's across both database servers) requires running Oracle EXAchk 5 separate times, as follows:

  1. Run Oracle EXAchk in the first user domain (domU) for the first cluster.

  2. Run Oracle EXAchk in the first user domain (domU) for the second cluster.

  3. Run Oracle EXAchk in the first user domain (domU) for the third cluster.

  4. Run Oracle EXAchk in the first user domain (domU) for the fourth cluster.

  5. Run Oracle EXAchk in the first management domain (dom0).

The audit checks performed by Oracle EXAchk are specified in the following table:

Table 5-5 Audit Checks Performed by Oracle EXAchk

Where to Install and Run Oracle EXAchk | Audit Checks Performed
---------------------------------------|-----------------------
Management domain (dom0)               | Hardware and operating system level checks for database servers (management domains), storage servers, RDMA Network Fabric, and RDMA Network Fabric switches
User domain (domU)                     | Operating system level checks for user domains, and checks for Oracle Grid Infrastructure and Oracle Database

Oracle EXAchk Command Line Options

Oracle EXAchk requires no special command line options. It automatically detects that it is running in an Oracle Exadata Database Machine Oracle VM environment, determines whether it is running in a management domain or a user domain, and performs the applicable audit checks. For example, in the simplest case, you can run Oracle EXAchk with no command line options:

./exachk

When Oracle EXAchk is run in the management domain, it performs audit checks on all database servers, storage servers, and RDMA Network Fabric switches accessible through the RDMA Network Fabric.

To run Oracle EXAchk on a subset of servers or switches, use the following command line options:

Table 5-6 Command Line Options for Oracle EXAchk

Option        | Description
--------------|-------------------------------------------------------------------
-clusternodes | Specifies a comma-separated list of database servers.
-cells        | Specifies a comma-separated list of storage servers.
-ibswitches   | Specifies a comma-separated list of RDMA Network Fabric switches.

For example, for an Oracle Exadata Database Machine Full Rack where only the first Quarter Rack is configured for virtualization, but all components are accessible through the RDMA Network Fabric, you can run a command similar to the following from the database server dm01adm01:

./exachk -clusternodes dm01adm01,dm01adm02
   -cells dm01celadm01,dm01celadm02,dm01celadm03
   -ibswitches dm01sw-ibs0,dm01sw-iba0,dm01sw-ibb0