8.21 Eliminating Time-Out Issues when Provisioning Compute Nodes

The provisioning process is an appliance level orchestration of many configuration operations that run at the level of Oracle VM Manager and the individual Oracle VM Servers or compute nodes. As the virtualized environment grows – meaning there are more virtual machines, storage paths and networks –, the time required to complete various discovery tasks increases exponentially.

The maximum task durations have been configured to reliably accommodate a standard base rack setup. At a given point, however, the complexity of the existing configuration, when replicated to a large number of compute nodes, increases the duration of tasks beyond their standard time-out. As a result, provisioning failures occur.

Because many provisioning tasks have been designed to use a common time-out mechanism, this problem cannot be resolved by simply increasing the global time-out. Doing so would decrease the overall performance of the system. To overcome this issue, additional code has been implemented to allow a finer-grained definition of time-outs through a number of settings in a system configuration file: /var/lib/ovca/ovca-system.conf.

If you run into time-out issues when provisioning additional compute nodes, it may be possible to resolve them by tweaking specific time-out settings in the configuration. Depending on which job failures occur, changing the storage_refresh_timeout, discover_server_timeout or other parameters could allow the provisioning operations to complete successfully. These changes would need to be applied on both management nodes.

Please contact your Oracle representative if your compute nodes fail to provision due to time-out issues. Oracle product specialists can analyse these failures for you and recommend new time-out parameters accordingly.