As a converged infrastructure solution, the Oracle Private Cloud Appliance is built to eliminate many of the intricacies of optimizing the system configuration. Hardware components are installed and cabled at the factory. Configuration settings and installation software are preloaded onto the system. Once the appliance is connected to the data center power source and public network, the provisioning process between the administrator pressing the power button of the first management node and the appliance reaching its Deployment Readiness state is entirely orchestrated by the master management node. This section explains what happens as the Oracle PCA is initialized and all nodes are provisioned.
Boot Sequence and Health Checks
When power is applied to the first management node, it takes approximately five minutes for the server to boot. While the Oracle Linux 6 operating system is loading, an Apache web server is started, which serves a static welcome page the administrator can browse to from the workstation connected to the appliance management network.
The necessary Oracle Linux services are started as the server comes
up to runlevel 3 (multi-user mode with networking). At this
point, the management node executes a series of system health
checks. It verifies that all expected infrastructure components
are present on the appliance management network and in the
correct predefined location, identified by the rack unit number
and fixed IP address. Next, the management node probes the ZFS
storage appliance for a management NFS export and a management
iSCSI LUN with OCFS2 file system. The storage and its access
groups have been configured at the factory. If the health checks
reveal no problems, the ocfs2 and o2cb services are started up automatically.
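The inventory verification described above can be sketched as a simple probe loop: every expected component has a fixed rack-unit location and management IP, and services start only if all probes succeed. The component names, rack units, and IP addresses below are illustrative assumptions, not the appliance's real factory inventory, and probe() is a placeholder for an actual reachability check.

```python
# Sketch of the inventory-style health check: each expected component
# is probed at its predefined rack unit and fixed management IP.
EXPECTED_COMPONENTS = {
    # component name      (rack unit, fixed management IP) -- illustrative
    "zfs-storage-head-1": (1, "192.168.4.1"),
    "compute-node-1":     (5, "192.168.4.5"),
}

def probe(ip):
    """Placeholder for a real reachability probe (e.g. ping or IPMI)."""
    return True  # assume reachable in this sketch

def run_health_checks(expected=EXPECTED_COMPONENTS):
    """Return the list of components that failed their probe."""
    failures = []
    for name, (rack_unit, ip) in expected.items():
        if not probe(ip):
            failures.append((name, rack_unit, ip))
    return failures

# Cluster services are only started when no failures are reported.
if not run_health_checks():
    services_to_start = ["ocfs2", "o2cb"]
```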
Management Cluster Setup
When the OCFS2 file system on the shared iSCSI LUN is ready, and
the o2cb services have started successfully, the management nodes can
join the cluster. In the meantime, the first management node has
also started the second management node, which will come up with
an identical configuration. Both management nodes eventually
join the cluster, but the first management node will take an
exclusive lock on the shared OCFS2 file system using Distributed
Lock Management (DLM). The second management node remains in
permanent standby and takes over the lock only in case the first
management node goes down or otherwise releases its lock.
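The master/standby behavior follows a familiar pattern: both nodes attempt a non-blocking exclusive lock on a shared resource, one wins, and the other waits to take over. The sketch below illustrates the idea using a POSIX flock on a local file; the appliance itself uses OCFS2's Distributed Lock Management over the shared iSCSI LUN, so this is an analogy, not the actual mechanism.

```python
import fcntl
import os
import tempfile

def try_become_master(lock_path):
    """Attempt a non-blocking exclusive lock. Return the open fd if this
    caller becomes master, or None if it must remain in standby."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd       # lock acquired: this node is the master
    except BlockingIOError:
        os.close(fd)
        return None     # lock held elsewhere: stay standby, retry later

lock_file = os.path.join(tempfile.gettempdir(), "pca-master.lock")
master_fd = try_become_master(lock_file)   # first node wins the lock
standby = try_become_master(lock_file)     # second attempt is refused
```

A standby node would normally retry in a loop (or block on the lock) so it takes over automatically when the master releases it.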
With mutual exclusion established between both members of the management cluster, the master management node continues to load the remaining Oracle PCA services, including dhcpd, Oracle VM Manager and the Oracle PCA databases. The virtual IP address of the management cluster is also brought online, and the Oracle PCA Dashboard is started within WebLogic. The static Apache web server now redirects to the Dashboard at the virtual IP, where the administrator can access a live view of the appliance rack component status.
Once the dhcpd service is started, the system state changes to Provision Readiness, which means it is ready to discover non-infrastructure components.
Node Manager
To discover compute nodes, the Node Manager on the master
management node uses a DHCP server and the node database. The
node database is a BerkeleyDB type database, located on the
management NFS share, containing the state and configuration
details of each node in the system, including MAC addresses, IP
addresses and host names. The discovery process of a node begins
with a DHCP request from the ILOM. Most discovery and
provisioning actions are synchronous and occur sequentially,
while time-consuming installation and configuration processes
are launched in parallel and asynchronously. The DHCP server
hands out pre-assigned IP addresses on the appliance management
network (192.168.4.0/24). When the Node Manager has verified that a node has a valid
service tag for use with Oracle PCA, it launches a series of
provisioning tasks. All required software resources have been
loaded onto the ZFS storage appliance at the factory.
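The discovery bookkeeping can be sketched as follows. The real node database is a BerkeleyDB store on the management NFS share; here a plain dict keyed by the ILOM's MAC address stands in for it, and the service-tag validity check is a hypothetical placeholder.

```python
# Minimal sketch of node discovery: an ILOM DHCP request either creates
# a new node-database entry or updates an existing one, and a node with
# a valid service tag moves on to provisioning.
node_db = {}

VALID_SERVICE_TAG_PREFIX = "PCA"   # placeholder validity rule

def on_dhcp_request(mac, service_tag, ip, hostname):
    """Handle a DHCP request from a node's ILOM, as the Node Manager would."""
    if mac not in node_db:
        # Unknown node: record it with its pre-assigned address details.
        node_db[mac] = {"state": "new", "ip": ip, "hostname": hostname}
    if service_tag.startswith(VALID_SERVICE_TAG_PREFIX):
        node_db[mac]["state"] = "provisioning"   # launch provisioning tasks
    return node_db[mac]["state"]

state = on_dhcp_request("00:11:22:33:44:55", "PCA123", "192.168.4.10", "cn1")
```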
Provisioning Tasks
The provisioning process is tracked in the node database by means of status changes. The next provisioning task can only be started if the node status indicates that the previous task has completed successfully. For each valid node, the Node Manager begins by building a PXE configuration and forces the node to boot using Oracle PCA runtime services. After the hardware RAID-1 configuration is applied, the node is restarted to perform a kickstart installation of Oracle VM Server. Crucial kernel modules and host drivers for InfiniBand and IO Director (Fabric Interconnect) support are added to the installation. At the end of the installation process, the network configuration files are updated to allow all necessary network interfaces and bonds to be brought up.
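The status-gated sequencing described above can be sketched as a simple task chain: each step records a status transition in the node record, and the next step runs only after the previous one has completed successfully. The task names are illustrative, not the appliance's internal identifiers.

```python
# Sketch of status-gated provisioning: each task runs only after the
# node's recorded status shows the previous task completed, and any
# failure stops the chain.
TASKS = ["pxe_config", "raid_setup", "os_install", "network_config"]

def run_provisioning(node, tasks=TASKS):
    for task in tasks:
        if node.get("failed"):
            break                         # stop the chain on failure
        node["status"] = f"{task}_running"
        # ... the actual installation/configuration work happens here ...
        node["status"] = f"{task}_done"   # gate: next task may now start
    return node["status"]

node = {"status": "new"}
final = run_provisioning(node)  # ends with "network_config_done"
```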
Now that the PVI for the Oracle VM management network exists, the compute node is rebooted one last time to reconfigure the Oracle VM Agent to communicate over the PVI. At this point, the node is ready for Oracle VM Manager discovery.
Oracle VM Server Pool
When the Node Manager detects a fully installed compute node that is ready to join the Oracle VM environment, it issues the necessary Oracle VM CLI commands to add the new node to the Oracle VM server pool. With the discovery of the first node, the system also configures the clustered Oracle VM server pool with the appropriate networking, access to the shared storage, and a virtual IP. For every compute node added to Oracle VM Manager, the IPMI configuration is stored in order to enable convenient remote power-on/off.
Oracle PCA expects that all compute nodes in one rack belong to a single clustered server pool with High Availability (HA) and Distributed Resource Scheduling (DRS) enabled. When all compute nodes have joined the Oracle VM server pool, the appliance is in Ready state, meaning virtual machines (VMs) can be deployed.
Expansion Compute Nodes
When an expansion compute node is installed, its presence is detected based on the DHCP request from its ILOM. If the new server is identified as an Oracle PCA node, an entry is added in the node database with "new" state. This triggers the initialization and provisioning process. New compute nodes are integrated seamlessly to expand the capacity of the running system, without the need for manual reconfiguration by an administrator.
Synchronization Service
As part of the provisioning process, a number of configuration settings are applied, either globally or at individual component level. Some are visible to the administrator, and some are entirely internal to the system. Throughout the life cycle of the appliance, software updates, capacity extensions and configuration changes will occur at different points in time. For example, an expansion compute node may have different hardware, firmware and software compared to the servers already in use in the environment, and it comes with factory default settings that do not match those of the running system. A synchronization service, implemented on the management nodes, can set and maintain configurable parameters across heterogeneous sets of components within an Oracle PCA environment. It facilitates the integration of new system components in case of capacity expansion or servicing, and allows the administrator to streamline the process when manual intervention is required. The CLI provides an interface to the exposed functionality of the synchronization service.
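Conceptually, the synchronization service reconciles each component's settings against a desired baseline and applies only the differences. The sketch below shows that reconcile step; the setting names and values are hypothetical examples, not the appliance's actual parameters.

```python
# Sketch of baseline reconciliation: compare a component's current
# settings against the desired baseline and return the required changes.
BASELINE = {"ntp_server": "192.168.4.216", "ssh_root_login": "no"}

def sync_component(current_settings, baseline=BASELINE):
    """Return only the settings that must change to match the baseline."""
    return {key: want for key, want in baseline.items()
            if current_settings.get(key) != want}

# An expansion node arriving with factory defaults differs from the
# running system in one setting, so only that delta is applied.
new_node = {"ntp_server": "pool.ntp.org", "ssh_root_login": "no"}
delta = sync_component(new_node)
```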