As a converged infrastructure solution, the Oracle Virtual Compute Appliance X3-2 aims to eliminate many of the intricacies of optimizing the system configuration. Hardware components are installed and cabled at the factory. Configuration settings and installation software are preloaded onto the system. Once the appliance is connected to the data center power source and public network, the provisioning process between the administrator pressing the power button of the first management node and the appliance reaching its Deployment Readiness state is entirely orchestrated by the master management node. This section explains what happens behind the scenes as the Oracle Virtual Compute Appliance X3-2 is initialized and all nodes are provisioned.
When power is applied to the first management node, it takes approximately five minutes for the server to boot. While the Oracle Linux 6 operating system is loading, an Apache web server is started, which serves a static welcome page the administrator can browse to from the workstation connected to the appliance management network.
The necessary Oracle Linux services are started as the server comes up to runlevel 3 (multi-user mode with networking). At this point, the management node executes a series of system health checks. It verifies that all expected infrastructure components are present on the appliance management network and in the correct predefined location, identified by the rack unit number and fixed IP address. Next, the management node probes the ZFS storage appliance for a management NFS export and a management iSCSI LUN with an OCFS2 file system. The storage and its access groups have been configured at the factory. If the health checks reveal no problems, the ocfs2 and o2cb services are started up automatically.
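The component verification described above can be sketched as a lookup against a predefined inventory. This is a minimal illustration, not the appliance's actual implementation; the component names, rack unit numbers, and addresses below are placeholders.

```python
# Hypothetical sketch of the management node's health checks: each expected
# infrastructure component is predefined by rack unit number and fixed IP on
# the appliance management network. All entries here are illustrative.
EXPECTED_COMPONENTS = {
    # rack unit -> (component name, fixed management IP)
    5:  ("zfs-storage-head", "192.168.4.1"),
    22: ("infiniband-switch", "192.168.4.202"),
}

def run_health_checks(discovered):
    """Return a list of problems; an empty list means the checks passed.

    `discovered` maps IP address -> component name, as observed on the
    appliance management network.
    """
    problems = []
    for unit, (name, ip) in EXPECTED_COMPONENTS.items():
        found = discovered.get(ip)
        if found is None:
            problems.append(f"rack unit {unit}: {name} missing at {ip}")
        elif found != name:
            problems.append(f"rack unit {unit}: expected {name}, found {found}")
    return problems
```

Only when the returned problem list is empty would initialization proceed to starting the cluster services.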
When the OCFS2 file system on the shared iSCSI LUN is ready, and the o2cb services have started successfully, the management nodes can join the cluster. In the meantime, the first management node has also started the second management node, which comes up with an identical configuration. Both management nodes eventually join the cluster, but the first management node takes an exclusive lock on the shared OCFS2 file system using Distributed Lock Management (DLM). The second management node remains in permanent standby and takes over the lock only if the first management node goes down or otherwise releases its lock.
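The master/standby behavior follows a familiar pattern: whichever node holds an exclusive lock on a shared resource is master. The sketch below illustrates the idea with a POSIX advisory file lock as a stand-in for the OCFS2 DLM lock; it is an analogy, not the appliance's actual mechanism.

```python
import fcntl
import os

def try_become_master(lock_path):
    """Attempt a non-blocking exclusive lock on a shared lock file.

    Loosely mirrors how the first management node takes an exclusive DLM
    lock on the shared OCFS2 file system: the lock holder is master, and
    the other node stays in standby until the lock is released.
    Returns the open file descriptor while master, or None if standby.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd          # this node is now master; keep the fd open
    except BlockingIOError:
        os.close(fd)
        return None        # another node holds the lock: remain standby
```

A standby node would retry (or block without LOCK_NB) so that it takes over as soon as the master releases the lock or fails.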
With mutual exclusion established between both members of the management cluster, the master management node continues to load the remaining Oracle Virtual Compute Appliance services, including tftp, dhcpd, Oracle VM Manager and the Oracle Virtual Compute Appliance databases. The virtual IP address of the management cluster is also brought online, and the Oracle Virtual Compute Appliance Dashboard is started within WebLogic. The static Apache web server now redirects to the Dashboard at the virtual IP, where the administrator can access a live view of the appliance rack component status.
Once the dhcpd service is started, the system state changes to Provision Readiness, which means it is ready to discover non-infrastructure components.
To discover compute nodes, the Node Manager on the master management node uses a DHCP server and the node database. The node database is a single-access MySQL database, located on the management NFS share, containing the state and configuration details of each node in the system, including MAC addresses, IP addresses and host names. The discovery process of a node begins with a DHCP request from its ILOM. All discovery and provisioning actions are asynchronous and occur in parallel. The DHCP server hands out pre-assigned IP addresses on the appliance management network (192.168.4.0/24). When the Node Manager has verified that a node has a valid service tag for use with Oracle Virtual Compute Appliance, it launches a series of provisioning tasks. All required software resources have been loaded onto the ZFS storage appliance at the factory.
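The discovery flow above can be sketched as a handler that answers an ILOM's DHCP request from the node database and gates provisioning on the service tag check. The database layout, MAC addresses, host names, and the tag validity test are all assumptions for illustration only.

```python
# Illustrative node database: maps an ILOM MAC address to the pre-assigned
# management IP and host name. Values are placeholders, not factory data.
NODE_DB = {
    "00:10:e0:aa:bb:01": ("192.168.4.101", "compute-node-07"),
    "00:10:e0:aa:bb:02": ("192.168.4.102", "compute-node-08"),
}

def handle_dhcp_request(mac, service_tag):
    """Sketch of the Node Manager's reaction to a DHCP request from an ILOM.

    Returns the pre-assigned lease plus a provisioning trigger, or None if
    the hardware is unknown or its service tag is not valid for the
    appliance. The "AK" prefix check is a hypothetical placeholder.
    """
    entry = NODE_DB.get(mac)
    if entry is None:
        return None                       # unknown hardware: ignore request
    ip, hostname = entry
    if not service_tag.startswith("AK"):  # placeholder validity check
        return None                       # not a valid appliance node
    return {"ip": ip, "hostname": hostname, "provision": True}
```

In the real appliance these actions run asynchronously and in parallel for all discovered nodes; the sketch shows only the per-request decision logic.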
The provisioning process is tracked in the node database by means of status changes. The next provisioning task can only be started if the node status indicates that the previous task has completed successfully. For each valid node, the Node Manager begins by building a PXE configuration and forces the node to boot using Oracle Virtual Compute Appliance runtime services. After the hardware RAID-1 configuration is applied, the node is restarted to perform a kickstart installation of Oracle VM Server. Crucial kernel modules and host drivers for InfiniBand and IO Director support are added to the installation. At the end of the installation process, the network configuration files are updated to allow all necessary network interfaces and bonds to be brought up.
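The status-driven task sequencing can be modeled as a simple state machine over the node database record: each task is eligible only when the status shows the previous task completed. The task names and status encoding below are illustrative, not the appliance's actual task list.

```python
# Hypothetical ordered task list; the appliance's real provisioning tasks
# (PXE configuration, RAID setup, Oracle VM Server kickstart, network
# configuration) are approximated here by placeholder names.
TASKS = ["pxe_config", "raid_setup", "ovm_server_install", "network_config"]

def advance(node):
    """Start the next provisioning task for `node` if it is eligible.

    `node` is a dict whose 'status' is 'new', '<task>_done', or
    '<task>_failed'. Returns the task started, or None if the node is
    finished or blocked by a failed task.
    """
    status = node["status"]
    if status.endswith("_failed"):
        return None                    # previous task failed: do not proceed
    if status == "new":
        next_index = 0
    else:
        done = status[: -len("_done")]
        next_index = TASKS.index(done) + 1
    if next_index >= len(TASKS):
        return None                    # all tasks complete
    task = TASKS[next_index]
    node["status"] = task + "_done"    # simulate successful completion
    return task
```

Because progress lives in the node database rather than in memory, the sequence survives management node failover: the new master simply resumes from the recorded status.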
Now that the private virtual interconnect (PVI) for the Oracle VM management network exists, the compute node is rebooted one last time to reconfigure the Oracle VM Agent to communicate over the PVI. At this point, the node is ready for Oracle VM Manager discovery.
When the Node Manager detects a fully installed compute node that is ready to join the Oracle VM environment, it issues the necessary Oracle VM CLI commands to add the new node to the Oracle VM server pool. With the discovery of the first node, the system also configures the clustered Oracle VM server pool with the appropriate networking, access to the shared storage, and a virtual IP. Oracle Virtual Compute Appliance X3-2 expects that all compute nodes in one rack belong to a single clustered server pool with High Availability (HA) and Distributed Resource Scheduling (DRS) enabled. When all compute nodes have joined the Oracle VM server pool, the appliance is in Ready state, meaning virtual machines (VMs) can be deployed.
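The pool-join step above could be orchestrated roughly as follows: the first node to finish provisioning triggers creation of the clustered server pool, and every node is then added to it. The command strings are placeholders standing in for the real Oracle VM CLI invocations, and the pool name is invented for the example.

```python
def join_server_pool(node_name, pool, issue):
    """Sketch of adding a provisioned compute node to the server pool.

    `pool` is a dict {'name': str, 'members': list}; `issue` is a callable
    that would send a command to the Oracle VM CLI (stubbed in tests).
    The command syntax here is a placeholder, not the actual CLI grammar.
    """
    commands = []
    if not pool["members"]:
        # First node discovered: create the clustered pool with HA and DRS.
        commands.append(
            f"create serverpool {pool['name']} cluster=yes ha=yes drs=yes"
        )
    commands.append(f"add server {node_name} to serverpool {pool['name']}")
    for cmd in commands:
        issue(cmd)
    pool["members"].append(node_name)
    return commands
```

Once every compute node appears in the pool's member list, the appliance would report the Ready state described above.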
When an expansion compute node is installed, its presence is detected based on the DHCP request from its ILOM. If the new server is identified as an Oracle Virtual Compute Appliance node, an entry is added to the node database in the "new" state, which triggers the initialization and provisioning process. New compute nodes are integrated seamlessly to expand the capacity of the running system, without the need for manual reconfiguration by an administrator.