2.8 High Availability, Load Balancing and Power Management

Oracle VM has built-in high-availability (HA) functionality. Although there is only one Oracle VM Manager in the environment, it distributes vital information across the servers it manages, so that after a failure the Oracle VM Manager and its infrastructure database can be rebuilt. For virtual machine HA, Oracle VM Servers can be clustered: if one server fails, its virtual machines can be automatically restarted on another server in the pool, because all virtual machine data resides on shared storage rather than on the Oracle VM Server itself. For predictable failures or scheduled maintenance, virtual machines can be moved to other members of the server pool using live migration.
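Because virtual machine data sits on shared storage, recovering from a server failure reduces to a placement problem: which surviving server has room to run each affected virtual machine. The following is a minimal sketch of that idea, not Oracle VM's actual restart logic. The `Server` and `VM` classes, the memory-only capacity check, and the "most free memory first" rule are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    # Hypothetical VM model: only memory is considered when placing it.
    name: str
    memory_mb: int


@dataclass
class Server:
    # Hypothetical model of a pool member; not an Oracle VM API.
    name: str
    free_memory_mb: int
    alive: bool = True
    vms: list = field(default_factory=list)


def restart_on_survivors(failed, pool):
    """Place each VM from a failed server on a surviving pool member.

    VM disk images are assumed to live on shared storage, so any survivor can
    start any VM; the only constraint modeled here is free memory (assumption).
    Returns {vm_name: new_server_name, or None if no survivor has room}.
    """
    survivors = [s for s in pool if s.alive and s is not failed]
    placements = {}
    for vm in failed.vms:
        fits = [s for s in survivors if s.free_memory_mb >= vm.memory_mb]
        if not fits:
            placements[vm.name] = None  # no capacity left; the VM stays down
            continue
        target = max(fits, key=lambda s: s.free_memory_mb)  # most headroom first
        target.free_memory_mb -= vm.memory_mb
        target.vms.append(vm)
        placements[vm.name] = target.name
    failed.vms = []
    return placements


if __name__ == "__main__":
    ovs1 = Server("ovs1", 0, alive=False, vms=[VM("db", 4096), VM("web", 2048)])
    ovs2 = Server("ovs2", 6144)
    ovs3 = Server("ovs3", 3072)
    print(restart_on_survivors(ovs1, [ovs1, ovs2, ovs3]))  # {'db': 'ovs2', 'web': 'ovs3'}
```

Note that nothing is copied between servers in this sketch; only the bookkeeping of which server runs which VM changes, which is exactly what shared storage makes possible.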

Oracle VM also supports HA networking and storage, but these must be configured by the system administrator outside Oracle VM Manager (for example, RAID and multipathing).

Clustered server pools also support two advanced management policies: Dynamic Power Management (DPM) and Dynamic Resource Scheduler (DRS). DPM optimizes the use of server pool members to conserve power. When DPM is enabled, the policy periodically looks for underutilized Oracle VM Servers and live-migrates their virtual machines to other servers in the pool. Once live migration is complete, the emptied server is shut down to conserve power. Conversely, if a server becomes overloaded, the policy looks for other servers to which it can offload virtual machines from the busy server. If no other powered-up Oracle VM Server is available, the policy starts a powered-down server using its Wake-On-LAN capability and begins live-migrating virtual machines to balance the overall load. All servers that participate in DPM must therefore have Wake-On-LAN enabled in the BIOS for the physical network interface that connects to the dedicated management network.

DRS uses the same underlying code as DPM. The difference is that DRS reacts only to servers that exceed their CPU and network usage thresholds, and moves virtual machines off those servers; unlike DPM, it does not empty and shut down underutilized servers. These thresholds are configurable in the DRS policy, which runs at a specified interval and monitors CPU and network usage over a sample time period. The policy compares the calculated average load with the threshold to determine whether migrations need to be performed.
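Since DRS and DPM share the same decision core (average the samples, compare against thresholds), both can be sketched as one planning function with a policy switch. This is a hedged illustration, not Oracle VM's implementation: the dictionary layout of `servers`, the action names, the DPM lower thresholds `cpu_min`/`net_min`, and the rule of consolidating at most one server per run are hypothetical assumptions.

```python
from statistics import mean


def average(samples):
    # Mean utilization (percent) over the sample period; no samples counts as idle.
    return mean(samples) if samples else 0.0


def plan_actions(servers, cpu_max, net_max, policy="DRS", cpu_min=0.0, net_min=0.0):
    """Plan the actions for one run of a DRS- or DPM-style policy.

    servers maps a server name to {"on": bool, "cpu": [...], "net": [...]},
    where "cpu" and "net" are utilization samples in percent (hypothetical layout).
    cpu_min/net_min are assumed DPM "underutilized" thresholds; with the
    defaults of 0.0 no server ever counts as underutilized.
    """
    on = [n for n, s in servers.items() if s["on"]]
    standby = [n for n, s in servers.items() if not s["on"]]
    load = {n: (average(servers[n]["cpu"]), average(servers[n]["net"])) for n in on}

    # Shared logic: a server is overloaded if either averaged load exceeds its threshold.
    overloaded = [n for n in on if load[n][0] > cpu_max or load[n][1] > net_max]
    has_target = len(overloaded) < len(on)  # some powered-on server has headroom
    actions = []
    for name in overloaded:
        if has_target:
            actions.append(("live_migrate_from", name))
        elif policy == "DPM" and standby:
            # Nowhere to move VMs: DPM wakes a powered-down server first.
            woken = standby.pop(0)
            actions.append(("wake_on_lan", woken))
            actions.append(("live_migrate_from", name))
            has_target = True
        # DRS with no spare powered-on server: nothing to do this run.

    # DPM only: empty the least-loaded underutilized server, then shut it down.
    if policy == "DPM" and not overloaded and len(on) > 1:
        idle = [n for n in on if load[n][0] < cpu_min and load[n][1] < net_min]
        if idle:
            victim = min(idle, key=lambda n: load[n])
            actions += [("live_migrate_from", victim), ("power_off", victim)]
    return actions
```

With a pool where `ovs1` averages above `cpu_max` and `ovs2` has headroom, both policies return a single `("live_migrate_from", "ovs1")` action; the policies only diverge when there is no spare powered-on server (DPM wakes one) or when a server is idle (DPM consolidates it).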
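Waking a powered-down server relies on the standard Wake-on-LAN "magic packet": six `0xFF` bytes followed by the target NIC's 48-bit MAC address repeated 16 times, commonly sent as a UDP broadcast. The sketch below shows that packet format only; it is not how Oracle VM Manager itself sends the packet. The function names, the default broadcast address, and the use of UDP port 9 (a common convention) are assumptions; the MAC should be that of the interface on the management network.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC address 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

The packet only has an effect if Wake-On-LAN is enabled in the BIOS for that interface and the broadcast reaches the management network segment, which is why the prerequisite above matters.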