The Oracle Private Cloud Appliance is designed for high availability at every level of its component stack: management, database, compute, storage, and network.
Management Node Failover
During the factory installation of an Oracle Private Cloud Appliance, the management nodes are configured as a cluster. The cluster relies on an OCFS2 file system, exported as an iSCSI LUN from the ZFS storage appliance, to perform the heartbeat function and to store a lock file that each management node attempts to take control of. The management node that controls the lock file automatically becomes the master, or active, node in the cluster.
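Conceptually, this election works like advisory file locking: whichever node holds an exclusive lock on the shared file acts as the active node. The minimal Python sketch below illustrates the idea with a hypothetical lock file path and POSIX flock; the appliance itself relies on the o2cb distributed lock manager on OCFS2 rather than code like this.

    import fcntl

    # Hypothetical path; the real lock file lives on the OCFS2 file system
    # exported as an iSCSI LUN from the ZFS storage appliance.
    LOCK_FILE = "/shared/ocfs2/.cluster_lock"

    def try_become_master(path=LOCK_FILE):
        """Try to take an exclusive, non-blocking lock on the shared file.

        Holding the lock corresponds to the master (active) role; failing
        to obtain it corresponds to the standby role.
        """
        handle = open(path, "w")
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return handle          # keep the handle open to hold the lock
        except BlockingIOError:    # another node already holds the lock
            handle.close()
            return None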
When the Oracle Private Cloud Appliance is first initialized, the o2cb service is started on each management node. This service is the default cluster stack for the OCFS2 file system. It includes a node manager that keeps track of the nodes in the cluster, a heartbeat agent to detect live nodes, a network agent for intra-cluster node communication, and a distributed lock manager to keep track of lock resources. All of these components run in-kernel.
Additionally, the ovca service is started on each management node. The management node that obtains control of the cluster lock is promoted to the master, or active, management node and runs the full complement of Oracle Private Cloud Appliance services. This process also configures the Virtual IP that is used to access the active management node, so that it is 'up' on the active management node and 'down' on the standby management node. As a result, when you connect to the Virtual IP address that you configured for the management nodes, you always reach the active management node.
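In networking terms, moving the Virtual IP 'up' or 'down' amounts to adding or removing the address on the management interface. The sketch below shows this with the standard Linux ip command; the address and interface name are placeholders, not the appliance's actual configuration.

    import subprocess

    VIP = "192.0.2.10/24"   # placeholder Virtual IP
    IFACE = "bond0"         # placeholder management interface name

    def vip_up():
        """Configure the Virtual IP on the active management node."""
        subprocess.run(["ip", "addr", "add", VIP, "dev", IFACE], check=True)

    def vip_down():
        """Remove the Virtual IP from the standby management node."""
        subprocess.run(["ip", "addr", "del", VIP, "dev", IFACE], check=True)

A production implementation would also announce the address move, for example with gratuitous ARP, so that peers on the subnet update their caches promptly.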
If the active management node fails, the cluster detects the failure and the lock is released. Because the standby management node constantly polls for control of the lock file, it detects when it has acquired the lock, and the ovca service then brings up all of the required Oracle Private Cloud Appliance services. As the standby management node is promoted to the active role, the Virtual IP is configured on the appropriate interface.
When the management node that failed comes back online, it no longer has control of the cluster lock file. It is automatically put into standby mode, and the Virtual IP is removed from the management interface. This means that one of the two management nodes in the rack is always available through the same IP address and is always correctly configured. The management node failover process takes up to 5 minutes to complete.
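Putting the pieces together, the behavior described above reduces to a polling loop on each management node: keep trying to take the cluster lock and, on success, bring up the Virtual IP and the appliance services. The sketch below reuses the hypothetical helpers from the earlier sketches; start_pca_services() is a pure placeholder for the work the ovca service performs.

    import time

    POLL_INTERVAL = 10  # seconds; illustrative, not the appliance's value

    def start_pca_services():
        """Placeholder for starting the full complement of services."""
        print("starting Oracle Private Cloud Appliance services")

    def run_management_node():
        lock = None
        while True:
            if lock is None:                # standby: keep polling
                lock = try_become_master()  # from the first sketch
                if lock is not None:        # promoted to the active role
                    vip_up()                # from the previous sketch
                    start_pca_services()
            time.sleep(POLL_INTERVAL)

Note that a node rejoining after a failure simply fails to acquire the lock and therefore remains in standby, which matches the behavior described above.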
Oracle VM Management Database Failover
The Oracle VM Manager database files are located on a shared file system exposed by the ZFS storage appliance. The active management node runs the MySQL database server, which accesses the database files on the shared storage. In the event that the active management node fails, the standby management node is promoted and the MySQL database server is started on the promoted node so that the service resumes as normal. Because the database files reside on shared storage, no data copy or replication step is needed: the contents are immediately available to the newly started MySQL database server.
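Because failover relies on shared storage rather than replication, database recovery is conceptually just 'start the server on the promoted node'. The sketch below assumes the mysqld systemd unit name and a service unit whose data directory points at the shared file system; both are assumptions, not the appliance's documented configuration.

    import subprocess

    def start_database_on_promotion():
        """Start the MySQL server on the newly promoted management node.

        The service unit is assumed to point the data directory at the
        shared file system, so no data copy or replication catch-up is
        needed: the server simply opens the existing database files.
        """
        subprocess.run(["systemctl", "start", "mysqld"], check=True)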
Compute Node Failover
High availability (HA) of compute nodes within the Oracle Private Cloud Appliance is enabled through the clustered server pool that is created automatically in Oracle VM Manager during the compute node provisioning process. Since the server pool is configured as a cluster using an underlying OCFS2 file system, HA-enabled virtual machines running on any compute node can be migrated and restarted automatically on an alternate compute node in the event of failure.
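The restart decision itself is made by Oracle VM Manager. The entirely hypothetical sketch below only models the shape of that decision, restarting HA-enabled virtual machines from a failed compute node on the least-loaded surviving node in the pool.

    from dataclasses import dataclass, field

    @dataclass
    class VirtualMachine:
        name: str
        ha_enabled: bool

    @dataclass
    class ComputeNode:
        name: str
        alive: bool = True
        load: float = 0.0
        vms: list = field(default_factory=list)

    def recover_vms(failed: ComputeNode, pool: list):
        """Restart HA-enabled VMs from a failed node elsewhere in the pool."""
        survivors = [n for n in pool if n is not failed and n.alive]
        for vm in failed.vms:
            if not vm.ha_enabled:
                continue                  # non-HA VMs are not restarted
            target = min(survivors, key=lambda n: n.load)
            target.vms.append(vm)         # restart on least-loaded node
            target.load += 1.0
        failed.vms.clear()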
The Oracle VM Concepts Guide provides good background information about the principles of high availability. Refer to the section “How does High Availability (HA) Work?”.
Storage Redundancy
Further redundancy is provided by the ZFS storage appliance, which hosts the appliance's storage. It is configured with RAID-1, providing integrated redundancy and strong protection against data loss. In addition, the storage appliance contains two storage heads, or controllers, that are interconnected in a clustered configuration. The pair of controllers operates in an active-passive configuration, meaning that continuation of service is guaranteed in the event that one storage head fails. The storage heads share a single IP address in the storage subnet, but each has its own management IP address for convenient maintenance access.
Network Redundancy
All of the customer-usable networking within the Oracle Private Cloud Appliance is configured for redundancy. Only the internal administrative Ethernet network, which is used for initialization and ILOM connectivity, is not redundant. There are two of each switch type to ensure that there is no single point of failure. Network cabling and interfaces are equally duplicated, and the switches are interconnected as described in Section 1.2.4, “Network Infrastructure”.