Performing Compute Node Operations
From the Rack Units list of the Service Web UI, an administrator can execute certain operations on hardware components. These operations can be accessed from the Actions menu, which is the button with three vertical dots on the right-hand side of each table row. In practice, only the View Details and Copy ID operations are available for all component types.
When compute nodes are in the discovery state or coming up, their status is 'Failed' until the hardware process transitions them to 'Ready to Provision'. This process typically takes under five minutes. If the failed state persists, use the Service CLI command list ComputeNode to determine the provisioning state of the compute nodes and take appropriate action.
For compute nodes, several other operations are available, either from the Actions menu or from the compute node detail page. Those operations are described in detail in this section, including the equivalent steps in the Service CLI.
Provisioning a Compute Node
Before a compute node can be used to host your compute instances, it must be provisioned by an administrator. The appliance software detects the compute nodes that are installed in the rack and cabled to the switches, meaning they appear in the Rack Units list as Ready to Provision. You can provision them from the Service Web UI or Service CLI.
Using the Service Web UI
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, click the host name of the compute node you want to provision.
The compute node detail page appears.
-
In the top-right corner of the page, click Controls and select the Provision command.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node you want to provision.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-20 08:53:56,681 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  29f68a0e-4744-4a92-9545-7c48fa365d0a   pcacn001   Ready to Provision   Unspecified
  7a0236f4-b00e-461d-93a0-b22673a18d9c   pcacn003   Ready to Provision   Unspecified
  dc8ae567-b07f-48e0-89bd-e57069c20010   pcacn002   Ready to Provision   Unspecified
-
Provision the compute node with this command:
PCA-ADMIN> provision id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Command: provision id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Status: Success
Time: 2021-08-20 11:35:40,152 UTC
JobId: ea93cac4-4430-4663-aafd-d70701593fb2
Use the job ID to check the status of your provision command.
PCA-ADMIN> show Job id=ea93cac4-4430-4663-aafd-d70701593fb2
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
Repeat the provision command for any other compute nodes you want to provision at this time.
-
Confirm that the compute nodes have been provisioned.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-20 11:38:29,509 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  29f68a0e-4744-4a92-9545-7c48fa365d0a   pcacn001   Provisioned          KVM
  7a0236f4-b00e-461d-93a0-b22673a18d9c   pcacn003   Provisioned          KVM
  dc8ae567-b07f-48e0-89bd-e57069c20010   pcacn002   Provisioned          KVM
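If you keep a text copy of the list ComputeNode output, a short script can pick out the IDs of the nodes that still need provisioning. The following Python sketch is illustrative only. It assumes that the output has been captured as text with one table row per line, that every data row starts with the node UUID, and that the provisioningType value is a single word. The function names are hypothetical and are not part of the Service CLI.

```python
import re

# Matches the UUID that starts every data row in the table.
UUID = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_compute_nodes(output):
    """Return one dict per data row of captured `list ComputeNode` output."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip prompts, headers, and separators: a data row has at least
        # four fields and begins with a UUID.
        if len(fields) < 4 or not UUID.match(fields[0]):
            continue
        nodes.append({
            "id": fields[0],
            "name": fields[1],
            # The state can contain spaces ("Ready to Provision"), so it is
            # everything between the name and the last field.
            "provisioningState": " ".join(fields[2:-1]),
            # Assumption: the provisioning type is a single word.
            "provisioningType": fields[-1],
        })
    return nodes


def ready_to_provision(output):
    """Return the IDs of nodes whose state is 'Ready to Provision'."""
    return [
        node["id"]
        for node in parse_compute_nodes(output)
        if node["provisioningState"] == "Ready to Provision"
    ]


SAMPLE = """\
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-20 08:53:56,681 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  29f68a0e-4744-4a92-9545-7c48fa365d0a   pcacn001   Ready to Provision   Unspecified
  7a0236f4-b00e-461d-93a0-b22673a18d9c   pcacn003   Ready to Provision   Unspecified
  dc8ae567-b07f-48e0-89bd-e57069c20010   pcacn002   Ready to Provision   Unspecified
"""
```

Calling ready_to_provision(SAMPLE) returns the three IDs in the order listed, each of which can then be passed to the provision command.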
Providing Platform Images
Platform images are provided during Private Cloud Appliance installation, and new platform images might be provided during appliance upgrade or patching operations.
During installation, upgrade, and patching, new platform images are placed on the management node in /nfs/shared_storage/oci_compute_images. During patching and upgrade, you can run commands to make these images available to Compute Enclave users. See the patchOCIimages command in "Patching Oracle Cloud Infrastructure Images" in the Oracle Private Cloud Appliance Patching Guide, and the upgradeOCIImages command in "Upgrading Oracle Cloud Infrastructure Images" in the Oracle Private Cloud Appliance Upgrade Guide.
The image import command described in Importing Platform Images also makes the images available to Compute Enclave users. Run the importPlatformImages command if images were not imported during patch or upgrade, or if you need to re-import images. You can also use this command to make custom images available to all Compute Enclave users after you put the image in /nfs/shared_storage/oci_compute_images on the management node.
During upgrade and patching, new versions of an image do not replace existing versions on the management node. If more than three versions of an image are available on the management node, only the newest three versions are shown when images are listed in the Compute Enclave. Older platform images are still available to users by specifying the image OCID.
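The "newest three versions" rule can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the appliance's implementation: the guide does not say how the Compute Enclave decides which versions are newest, so the sketch assumes each image is described by a hypothetical (image name, version) pair whose version string sorts chronologically. The sample names and versions are invented for the example.

```python
from collections import defaultdict


def newest_versions(images, keep=3):
    """Group (image_name, version) pairs and keep the newest `keep` versions.

    Assumption: version strings such as "2023.09.26" sort chronologically
    when compared as plain strings.
    """
    by_name = defaultdict(list)
    for name, version in images:
        by_name[name].append(version)
    return {
        name: sorted(versions, reverse=True)[:keep]
        for name, versions in by_name.items()
    }


# Hypothetical inventory: four versions of one image, one of another.
IMAGES = [
    ("Oracle-Linux-8", "2023.09.26"),
    ("Oracle-Linux-8", "2024.01.15"),
    ("Oracle-Linux-8", "2022.12.01"),
    ("Oracle-Linux-8", "2023.03.10"),
    ("Oracle-Linux-9", "2023.09.26"),
]
```

In this example the 2022.12.01 version of Oracle-Linux-8 would not be listed, but as described above it would remain usable by specifying its image OCID.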
Importing Platform Images
Run the importPlatformImages command to make all images that are in /nfs/shared_storage/oci_compute_images on the management node also available in all compartments in all tenancies in the Compute Enclave.
PCA-ADMIN> importPlatformImages
Command: importPlatformImages
Status: Running
Time: 2022-11-10 17:35:20,345 UTC
JobId: f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
Use the JobId to get more detailed information about the job. In the following example, no new images have been delivered:
PCA-ADMIN> show job id=f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
Command: show job id=f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
Status: Success
Time: 2022-11-10 17:35:36,023 UTC
Data:
  Id = f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
  Type = Job
  Done = true
  Name = OPERATION
  Progress Message = There are no new platform image files to import
  Run State = Succeeded
  Transcript = 2022-11-10 17:35:20.339 : Created job OPERATION
  Username = admin
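Many commands in this section return a JobId whose status you then inspect. If you save the show job output as text, the "Key = Value" lines can be read into a dictionary to check whether the job has finished successfully. The following Python sketch assumes one property per line in the form shown in the example; the function names are hypothetical and do not belong to the Service CLI.

```python
def parse_properties(output):
    """Collect `Key = Value` lines from captured Service CLI output."""
    properties = {}
    for line in output.splitlines():
        # Only lines with " = " (spaces on both sides) are properties;
        # this skips lines such as "Command: show job id=...".
        key, separator, value = line.partition(" = ")
        if separator:
            properties[key.strip()] = value.strip()
    return properties


def job_succeeded(output):
    """Return True when the job is done and its run state is Succeeded."""
    properties = parse_properties(output)
    return (
        properties.get("Done") == "true"
        and properties.get("Run State") == "Succeeded"
    )


SAMPLE = """\
PCA-ADMIN> show job id=f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
Command: show job id=f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
Status: Success
Time: 2022-11-10 17:35:36,023 UTC
Data:
  Id = f21b9d86-ccf2-4bd3-bab9-04dc3adb2966
  Type = Job
  Done = true
  Name = OPERATION
  Progress Message = There are no new platform image files to import
  Run State = Succeeded
  Transcript = 2022-11-10 17:35:20.339 : Created job OPERATION
  Username = admin
"""
```

The same helper can be applied to the output of other show commands in this section that use the "Key = Value" layout.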
Listing Platform Images
Use the listplatformImages command to list all platform images that have been imported from the management node.
PCA-ADMIN> listplatformImages
Data:
  id                        displayName                                lifecycleState
  --                        -----------                                --------------
  ocid1.image.unique_ID_1   uln-pca-Oracle-Linux-7.9-2023.09.26_0...   AVAILABLE
  ocid1.image.unique_ID_2   uln-pca-Oracle-Linux-8-2023.09.26_0.oci    AVAILABLE
  ocid1.image.unique_ID_3   uln-pca-Oracle-Linux-9-2023.09.26_0.oci    AVAILABLE
  ocid1.image.unique_ID_4   uln-pca-Oracle-Linux8-OKE-1.26.6-2024...   AVAILABLE
  ocid1.image.unique_ID_5   uln-pca-Oracle-Linux8-OKE-1.27.7-2024...   AVAILABLE
  ocid1.image.unique_ID_6   uln-pca-Oracle-Linux8-OKE-1.28.3-2024...   AVAILABLE
  ocid1.image.unique_ID_7   uln-pca-Oracle-Solaris-11-2023.10.16_...   AVAILABLE
Compute Enclave users see the same lifecycleState that listplatformImages shows. Shortly after running importPlatformImages, both listplatformImages and the Compute Enclave might show new images with lifecycleState IMPORTING. When the importPlatformImages job is complete, both listplatformImages and the Compute Enclave show the images as AVAILABLE.
If you delete a platform image as shown in Deleting Platform Images, both listplatformImages and the Compute Enclave show the image as DELETING or DELETED.
Deleting Platform Images
Use the following command to delete the specified platform image. The image shows as DELETING and then DELETED in listplatformImages output and in the Compute Enclave, and eventually is not listed at all.
However, the image file is not deleted from the management node, and running the importPlatformImages command re-imports the image so that the image is again available in all compartments.
PCA-ADMIN> deleteplatformImage imageId=ocid1.image.unique_ID_7
JobId: 401567c3-3662-46bb-89d2-b7ad1541fa2d
PCA-ADMIN> listplatformImages
Data:
  id                        displayName                                lifecycleState
  --                        -----------                                --------------
  ocid1.image.unique_ID_1   uln-pca-Oracle-Linux-7.9-2023.09.26_0...   AVAILABLE
  ocid1.image.unique_ID_2   uln-pca-Oracle-Linux-8-2023.09.26_0.oci    AVAILABLE
  [...]
  ocid1.image.unique_ID_7   uln-pca-Oracle-Solaris-11-2023.10.16_...   DELETED
Disabling Compute Node Provisioning
Several compute node operations can only be performed on condition that provisioning has been disabled. This section explains how to impose and release a provisioning lock.
Using the Service Web UI
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, click the host name of the compute node you want to make changes to.
The compute node detail page appears.
-
In the top-right corner of the page, click Controls and select the Provisioning Lock command.
When the confirmation window appears, click Lock to proceed.
After successful completion, the Compute Node Information tab shows Provisioning Locked = Yes.
-
To release the provisioning lock, click Controls and select the Provisioning Unlock command.
When the confirmation window appears, click Unlock to proceed.
After successful completion, the Compute Node Information tab shows Provisioning Locked = No.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node for which you want to disable provisioning operations.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-23 09:25:56,307 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  3e62bf25-a26c-407e-ab8b-df01a4ad98b6   pcacn002   Provisioned          KVM
  f7b8356b-052f-4911-babb-447e6ab9c78d   pcacn003   Provisioned          KVM
  4e06ebdf-faed-484e-996d-d77af786f123   pcacn001   Provisioned          KVM
-
Set a provisioning lock on the compute node.
PCA-ADMIN> provisioningLock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: provisioningLock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:29:46,568 UTC
JobId: 6ee78c8a-e227-4d31-a770-9b9c96085f3f
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=6ee78c8a-e227-4d31-a770-9b9c96085f3f
Command: show Job id=6ee78c8a-e227-4d31-a770-9b9c96085f3f
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
When the job has completed, confirm that the compute node is under provisioning lock.
PCA-ADMIN> show ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
[...]
  Provisioning State = Provisioned
[...]
  Provisioning Locked = true
  Maintenance Locked = false
All provisioning operations are now disabled until the lock is released.
-
To release the provisioning lock, use this command:
PCA-ADMIN> provisioningUnlock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: provisioningUnlock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:44:58,531 UTC
JobId: 523892e8-c2d4-403c-9620-2f3e94015b46
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=523892e8-c2d4-403c-9620-2f3e94015b46
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
When the job has completed, confirm that the provisioning lock has been released.
PCA-ADMIN> show ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
[...]
  Provisioning State = Provisioned
[...]
  Provisioning Locked = false
  Maintenance Locked = false
Locking a Compute Node for Maintenance
For maintenance operations, compute nodes must be placed in maintenance mode. This section explains how to impose and release a maintenance lock. Before you can lock a compute node for maintenance, you must disable provisioning first. Maintenance operations can only be performed if the compute node has no running compute instances.
Caution:
Depending on the high-availability configuration of the Compute service, automatic instance migrations can prevent you from successfully locking a compute node. See Configuring the Compute Service for High Availability. This situation is more likely to occur when available compute capacity is limited.
-
Instance recovery or migration operations after a compute node outage can cause a maintenance lock to fail. Compute nodes involved in instance migrations will reject the maintenance lock until the migrations are complete.
-
Displaced instances could be migrated back to their original fault domain when a compute node maintenance lock is released. A compute node from where a displaced instance is migrated back will reject the maintenance lock until the migration is complete.
-
Migrating an instance typically takes no more than 30 seconds. However, large instances and heavy workloads increase the time required.
-
In the event that an instance gets stuck in moving state and migration fails to complete, its host compute node cannot be locked for maintenance. Contact Oracle for assistance.
Using the Service Web UI
-
Ensure that provisioning has been disabled on the compute node.
-
Ensure that the compute node has no active instances. They must be migrated or shut down.
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, click the host name of the compute node that requires maintenance.
The compute node detail page appears.
-
In the top-right corner of the page, click Controls and select the Maintenance Lock command.
When the confirmation window appears, click Lock to proceed.
After successful completion, the Compute Node Information tab shows Maintenance Locked = Yes.
-
To release the maintenance lock, click Controls and select the Maintenance Unlock command.
When the confirmation window appears, click Unlock to proceed.
After successful completion, the Compute Node Information tab shows Maintenance Locked = No.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node that requires maintenance.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-23 09:25:56,307 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  3e62bf25-a26c-407e-ab8b-df01a4ad98b6   pcacn002   Provisioned          KVM
  f7b8356b-052f-4911-babb-447e6ab9c78d   pcacn003   Provisioned          KVM
  4e06ebdf-faed-484e-996d-d77af786f123   pcacn001   Provisioned          KVM
-
Ensure that provisioning has been disabled on the compute node.
-
Lock the compute node for maintenance.
PCA-ADMIN> maintenanceLock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: maintenanceLock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:56:05,443 UTC
JobId: e46f6603-2af2-4df4-a0db-b15156491f88
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=e46f6603-2af2-4df4-a0db-b15156491f88
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
When the job has completed, confirm that the compute node has been locked for maintenance.
PCA-ADMIN> show ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
[...]
  Provisioning State = Provisioned
[...]
  Provisioning Locked = true
  Maintenance Locked = true
The compute node is now ready for maintenance.
-
To release the maintenance lock, use this command:
PCA-ADMIN> maintenanceUnlock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: maintenanceUnlock id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 10:00:53,902 UTC
JobId: 625af20e-4b49-4201-879f-41d4405314c7
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=625af20e-4b49-4201-879f-41d4405314c7
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
When the job has completed, confirm that the maintenance lock has been released. The provisioning lock remains in place until you release it separately.
PCA-ADMIN> show ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
[...]
  Provisioning State = Provisioned
[...]
  Provisioning Locked = true
  Maintenance Locked = false
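The preconditions for a maintenance lock described in this section can be summarized as a simple checklist. The following Python sketch only restates those rules; it does not query the appliance. The function and parameter names are hypothetical, and you would supply the values yourself, for example from the show ComputeNode output and from your own view of the instances on the node.

```python
def maintenance_lock_blockers(provisioning_locked, running_instances,
                              migrations_in_progress):
    """Return the reasons a maintenance lock is expected to be rejected.

    An empty list means none of the documented preconditions is violated.
    """
    blockers = []
    if not provisioning_locked:
        # Provisioning must be disabled before a maintenance lock.
        blockers.append("provisioning is not locked")
    if running_instances > 0:
        # Instances must be migrated or shut down first.
        blockers.append("compute node still hosts running instances")
    if migrations_in_progress:
        # Nodes involved in migrations reject the lock until they finish.
        blockers.append("instance migrations are still in progress")
    return blockers
```

For example, maintenance_lock_blockers(True, 0, False) returns an empty list, whereas a node with two running instances and no provisioning lock yields two entries.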
Migrating Instances from a Compute Node
Some compute node operations, such as some maintenance operations, can only be performed if the compute node has no running compute instances. Administrators can migrate all running instances away from a compute node, also known as evacuating the compute node. If enough resources are available, running instances are live migrated to other compute nodes in the same fault domain.
Important:
Before you perform a compute node evacuation, check what the behavior will be for any instances that cannot be migrated to another compute node in the same fault domain.
See Viewing and Setting Compute Service Configuration to check whether strict fault domain enforcement is set.
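When strict fault domain enforcement is disabled (Strict FD is set to Disabled in the Service Web UI or Strict FD Enabled is false in the Service CLI), instances that cannot be migrated to another compute node in the same fault domain are migrated to a different fault domain.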
When strict fault domain enforcement is enabled (Strict FD is set to Enabled in the Service Web UI or Strict FD Enabled is true in the Service CLI), instances that cannot be migrated to another compute node in the same fault domain do not migrate; those instances are still running in the compute node that you are trying to evacuate.
Enable or disable strict fault domain enforcement to control whether instances that cannot migrate to other compute nodes in the same fault domain are migrated to a different fault domain or remain running on the same compute node after you attempt to evacuate it.
If the current fault domain is not able to accommodate some instances that need to be migrated, and strict fault domain enforcement is enabled, you can re-run the migrate operation with the force option specified. When the force option is specified, the Compute service will soft stop any instances that fail to migrate, allowing the evacuation to proceed.
Restart stopped instances. If instances were stopped by the Compute service (not manually stopped by an administrator) and you want them to be automatically restored to running when resources become available, check that the Auto Recovery property of the Compute service is enabled and the instance availability recovery action is set to RESTORE_INSTANCE. See Viewing and Setting Compute Service Configuration and Configuring the Recovery State for a Stopped Instance.
Instances can be stopped by the Compute service if the force option is used or if no fault domain can accommodate the instances. You can change the Auto Recovery setting at any time before or after the compute node evacuation completes to restart instances that were stopped by the Compute service. If the instance availability recovery action is set to STOP_INSTANCE, the instance remains stopped even though the Auto Recovery property is enabled. If the instance availability recovery action is later changed to RESTORE_INSTANCE, a subsequent Auto Recovery pass will restart the instance.
Return relocated instances. If instances are migrated to a different fault domain (displaced), and you want them returned to their selected fault domain (the fault domain that is specified in the instance configuration) when resources become available, check that the Auto Resolve property of the Compute service is enabled. See Viewing and Setting Compute Service Configuration and Compute Service Configuration Commands. You can set the Auto Resolve property at any time before or after the compute node evacuation completes to relocate any displaced instances.
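The evacuation outcomes described above depend on three things: where capacity is available, whether strict fault domain enforcement is enabled, and whether the force option is specified. The following Python sketch restates that logic for a single instance as a reading aid. It is a simplified model of the behavior described in this section, not the Compute service's implementation, and the function and parameter names are hypothetical.

```python
def evacuation_outcome(fits_same_fd, fits_other_fd, strict_fd, force):
    """Describe what happens to one running instance during evacuation."""
    if fits_same_fd:
        return "live migrated within the same fault domain"
    if strict_fd:
        # Strict enforcement: the instance never leaves its fault domain.
        if force:
            return "soft stopped by the Compute service"
        return "still running on the compute node being evacuated"
    if fits_other_fd:
        # The instance becomes displaced.
        return "migrated to a different fault domain"
    # No fault domain can accommodate the instance.
    return "stopped by the Compute service"
```

For example, with strict fault domain enforcement enabled and no capacity in the same fault domain, the result changes from "still running on the compute node being evacuated" to "soft stopped by the Compute service" when force is set to True.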
Use the following procedures to perform the migrate operation.
Compute Node Evacuation: Before You Begin
-
Check fault domain and compute node resources. See Viewing CPU and Memory Usage By Fault Domain. Based on this information, decide whether to do any of the following:
-
Terminate instances that are no longer needed.
-
Reconfigure some instances to use fewer resources. For example, specify a different shape.
-
Reconfigure some instances to specify a different fault domain.
-
Stop some instances while you perform the compute node evacuation.
-
Specify the force option on the migration operation to soft stop any instances that cannot be migrated. See the discussion above of instance availability recovery action and Auto Recovery configuration.
-
Disable provisioning on the compute node. See Disabling Compute Node Provisioning.
Using the Service Web UI
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, click the host name of the compute node that you want to evacuate.
The compute node details page appears.
-
In the top-right corner of the compute node details page, click Controls and select the Migrate All Vms command.
The Compute service migrates the running instances to other compute nodes.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node that you want to evacuate.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-23 09:25:56,307 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  3e62bf25-a26c-407e-ab8b-df01a4ad98b6   pcacn002   Provisioned          KVM
  f7b8356b-052f-4911-babb-447e6ab9c78d   pcacn003   Provisioned          KVM
  4e06ebdf-faed-484e-996d-d77af786f123   pcacn001   Provisioned          KVM
-
Use the migrateVm command to migrate all running compute instances off the compute node.
PCA-ADMIN> migrateVm id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Command: migrateVm id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Status: Running
Time: 2021-08-20 10:37:05,781 UTC
JobId: 6f1e94bc-7d5b-4002-ada9-7d4b504a2599
To soft stop any instances that fail to migrate, set the force option:
PCA-ADMIN> migrateVm id=cn_id force=true
The Compute service migrates the running instances to other compute nodes.
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=6f1e94bc-7d5b-4002-ada9-7d4b504a2599
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
Configuring the Compute Service for High Availability
Migrating Instances from a Compute Node describes how to evacuate a compute node for maintenance. In the case of an unplanned compute node outage, the Compute service attempts to evacuate the compute node or stop and restart the instances.
The following sections describe how you can set high availability configuration to control how the Compute service handles an unplanned outage.
Using Instance and Compute Service High Availability Configuration
The following sections describe how to use high availability configuration to manage outcomes for different types of compute node outages. Instance availability recovery action is the only high availability configuration that is set for each instance. All other high availability configuration is set on the Compute service and affects all instances.
The selected fault domain is the fault domain that is specified in the instance configuration. A displaced instance is in a fault domain that is not its selected fault domain.
Planned Maintenance Outage
See Migrating Instances from a Compute Node for information about using instance availability recovery action (set on each instance), and the Auto Recovery and Auto Resolve properties of the Compute service when performing a compute node evacuation.
Unplanned Outage Less Than Ten Minutes
After an unplanned outage of less than ten minutes, by default the Compute service attempts to restart instances that were running before the outage. Actual behavior depends on how the instances and the Compute service are configured. The following decision flow describes how you can control this behavior.
Do you want the Compute service to attempt to restart instances that were running prior to the outage? This is the default.
-
Yes. Check that Auto Recovery is enabled and the instance availability recovery action is set to RESTORE_INSTANCE. See Configuring the Recovery State for a Stopped Instance.
If some instances can no longer be accommodated in their selected fault domain, Auto Recovery will continue to poll and attempt to restart the instances. See also getForcedStoppedInstances.
If the instance availability recovery action is set to STOP_INSTANCE, the instance will remain stopped, even if Auto Recovery is enabled.
-
No. Disable Auto Recovery. Instances that had been running prior to the outage will remain stopped.
The instance availability recovery action setting and Auto Recovery setting can be changed at any time, and the changes will be effective at the next polling time.
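The interaction between the service-wide Auto Recovery setting and the per-instance recovery action can be illustrated as follows. This Python sketch is a reading aid based on the decision flow above, not a description of how the Compute service is implemented. The function name and the instance names in the example are hypothetical.

```python
def instances_to_restart(recovery_actions, auto_recovery_enabled):
    """Return the names of instances that Auto Recovery would try to restart.

    `recovery_actions` maps an instance name to its instance availability
    recovery action, either "RESTORE_INSTANCE" or "STOP_INSTANCE".
    """
    if not auto_recovery_enabled:
        # With Auto Recovery disabled, every instance stays stopped.
        return []
    return sorted(
        name
        for name, action in recovery_actions.items()
        if action == "RESTORE_INSTANCE"
    )


# Hypothetical instances that were running before the outage.
ACTIONS = {
    "web": "RESTORE_INSTANCE",
    "batch": "STOP_INSTANCE",
    "db": "RESTORE_INSTANCE",
}
```

With Auto Recovery enabled, only the instances "db" and "web" are candidates for restart; "batch" remains stopped because its recovery action is STOP_INSTANCE.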
Unplanned Outage More Than Ten Minutes
After an unplanned outage of more than ten minutes, by default the Compute service attempts to reboot migrate (cold migrate) instances off the compute node, and instances that cannot be accommodated on other compute nodes in the same fault domain are migrated to other fault domains. Actual behavior depends on how the Compute service is configured. The following decision flow describes how you can control this behavior.
Do you want running instances to be reboot migrated? Reboot migration is stopping and starting each running instance on a given compute node. See also "Compute Instance Availability" in "High Availability" in the Architecture and Design chapter of Oracle Private Cloud Appliance Concepts Guide.
-
Yes. Check that VM High Availability is enabled.
If some instances cannot be accommodated on another compute node in the same fault domain, do you want those instances to be reboot migrated to a different fault domain?
  -
  Yes. Check that Strict FD is disabled. Instances that cannot be accommodated in any fault domain remain stopped by the Compute service.
  After reboot migration, do you want instances that are running in a fault domain that is not their selected fault domain to be automatically live migrated to their selected fault domain when resources become available?
    -
    Yes. Check that Auto Resolve is enabled. See also getDisplacedInstances.
    -
    No. Disable Auto Resolve.
  -
  No. Enable Strict FD. Instances that were running prior to the outage and cannot be migrated to another compute node in the current fault domain remain stopped by the Compute service.
-
No. Disable VM High Availability. Instances that were running prior to the outage are stopped by the Compute service.
Do you want instances that were stopped by the Compute service to be automatically restored to running in their selected fault domain? If yes, check that Auto Recovery is enabled and the instance availability recovery action is set to RESTORE_INSTANCE. See Configuring the Recovery State for a Stopped Instance.
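The decision flow for an outage of more than ten minutes can also be written as a small function. The following Python sketch is a simplified model of the behavior described above for a single instance. It is not the Compute service's implementation, and the function and parameter names are hypothetical.

```python
def long_outage_outcome(vm_ha, strict_fd, auto_resolve,
                        fits_same_fd, fits_other_fd):
    """Describe what happens to one instance after a long unplanned outage."""
    if not vm_ha:
        # Reboot migration is disabled.
        return "stopped by the Compute service"
    if fits_same_fd:
        return "reboot migrated within the selected fault domain"
    if strict_fd or not fits_other_fd:
        # Either other fault domains are off limits or none has capacity.
        return "stopped by the Compute service"
    if auto_resolve:
        return ("reboot migrated to a different fault domain; "
                "live migrated back when resources become available")
    return "reboot migrated to a different fault domain; remains there"
```

Instances that end up stopped are then subject to the Auto Recovery and instance availability recovery action settings described in the preceding paragraph.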
Viewing and Setting Compute Service Configuration
For information about how these configuration settings work, see Compute Service Configuration Commands.
Using the Service Web UI
On the navigation menu, click FD Instances and then click Compute Service Detail.
The Compute Service Information page shows the current settings for Auto Recovery, Auto Resolve Displaced Instances, VM High Availability, and Strict FD. All of these settings are enabled by default except for Strict FD, which is disabled by default. By default, fault domain placement is not strictly enforced when the Compute service migrates instances.
Use the Controls menu on the Compute Service Information page to change the values of these configuration settings between Enabled and Disabled.
Using the Service CLI
Use the show computeservice command to show the current Compute service configuration settings. In the following example, the default values are set for the four high availability configuration settings: Auto Recovery Action Enabled, Auto-Resolve Displaced Instances Enabled, VM High Availability Enabled, and Strict FD Enabled. All of these settings are true by default except for Strict FD Enabled, which is false by default.
PCA-ADMIN> show computeservice
Command: show computeservice
Status: Success
Time: 2023-04-17 20:37:42,296 UTC
Data:
Id = unique_ID
Type = ComputeService
total CN cpu usage percent = 23.3
total CN memory usage percent = 16.2
Auto Recovery Action Enabled = true
Auto-Resolve Displaced Instances Enabled = true
VM High Availability Enabled = true
Strict FD Enabled = false
Name = Compute Service
Work State = Normal
To change these settings, use the commands in the following list. The showcustomcmds computeservice command lists all high availability configuration commands in the Compute service.
PCA-ADMIN> showcustomcmds computeservice
enableAutoRecoveryAction
disableAutoRecoveryAction
enableAutoResolveDisplacedInstances
disableAutoResolveDisplacedInstances
enableVmHighAvailability
disableVmHighAvailability
enableStrictFD
disableStrictFD
getForcedStoppedInstances
getDisplacedInstances
For example, to disable Auto Recovery Action Enabled, run the disableAutoRecoveryAction command. To enable strict fault domain enforcement, run the enableStrictFD command.
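To see which commands would bring the Compute service from its current settings to a desired configuration, you can compare the two sets of values. The following Python sketch is illustrative only. It pairs each setting shown by show computeservice with an enable/disable command by name; the guide states this pairing explicitly only for Auto Recovery and Strict FD, so the other two pairings are an assumption based on the command names. The function name is hypothetical.

```python
# Setting name as shown by `show computeservice` -> command name suffix.
# Assumption: the pairing follows the command names listed above.
TOGGLE_COMMANDS = {
    "Auto Recovery Action Enabled": "AutoRecoveryAction",
    "Auto-Resolve Displaced Instances Enabled": "AutoResolveDisplacedInstances",
    "VM High Availability Enabled": "VmHighAvailability",
    "Strict FD Enabled": "StrictFD",
}

# Default values as stated in this section.
DEFAULTS = {
    "Auto Recovery Action Enabled": True,
    "Auto-Resolve Displaced Instances Enabled": True,
    "VM High Availability Enabled": True,
    "Strict FD Enabled": False,
}


def commands_to_reach(current, desired):
    """Return the commands needed to change `current` into `desired`.

    Settings that are absent from `desired` are left unchanged.
    """
    commands = []
    for setting, suffix in TOGGLE_COMMANDS.items():
        if setting in desired and current.get(setting) != desired[setting]:
            prefix = "enable" if desired[setting] else "disable"
            commands.append(prefix + suffix)
    return commands
```

For example, commands_to_reach(DEFAULTS, {"Strict FD Enabled": True}) returns a list that contains only enableStrictFD.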
Compute Service Configuration Commands
This section describes the behavior of the high availability configuration settings in the Compute service. The Service CLI commands are used in the list in this section. To access the equivalent Service Web UI settings, click the navigation menu and click FD Instances. See Viewing and Setting Compute Service Configuration.
In these descriptions, the selected fault domain is the fault domain that is specified in the instance configuration. A displaced instance is in a fault domain that is not its selected fault domain.
-
enableAutoRecoveryAction
-
Enables the automatic restart of instances that were stopped by the Compute service. This is the default. If the instance availability recovery action is set to RESTORE_INSTANCE, this command causes instances that were stopped by the Compute service to be automatically restarted in their selected fault domain when resources are available. See also Configuring the Recovery State for a Stopped Instance and getForcedStoppedInstances.
Instances could have been stopped by the Compute service for the following reasons:
-
As a result of specifying the force option on a migrate all operation.
-
Because no fault domain can accommodate these instances.
-
As a result of a compute node outage.
You can set this Auto Recovery property at any time before or after an outage to restart instances that were stopped by the Compute service. If the instance availability recovery action is set to STOP_INSTANCE, the instance remains stopped even though the Auto Recovery property is enabled. If the instance availability recovery action is later changed to RESTORE_INSTANCE, a subsequent Auto Recovery pass will restart the instance.
-
disableAutoRecoveryAction
-
Disables the automatic restart of stopped instances. Instances that were stopped by the Compute service are not automatically restarted when resources are available.
-
enableAutoResolveDisplacedInstances
-
Enables the return of running instances to their selected fault domain. This is the default. If instances were moved to a different fault domain (displaced) during compute node evacuation, this command enables those instances to be automatically live migrated to their selected fault domain once sufficient resources are available in that fault domain. See also getDisplacedInstances.
You can set this Auto Resolve configuration at any time before or after an outage to relocate any displaced instances.
Instances that are stopped are not migrated.
-
disableAutoResolveDisplacedInstances
-
Disables the return of instances to their selected fault domain. Instances that were moved to a different fault domain during compute node evacuation remain in the fault domain to which they were moved.
-
enableVmHighAvailability
-
Enables High Availability (reboot migration) off of an unreachable compute node. This is the default.
-
disableVmHighAvailability
-
Disables reboot migration.
-
enableStrictFD
-
Enables strict fault domain enforcement. During compute node evacuation, any instance that cannot be moved to a different compute node in the same fault domain is stopped if the force option was specified. If the force option was not specified, the migrate operation fails.
-
disableStrictFD
-
Disables strict fault domain enforcement. This is the default. During compute node evacuation, any instance that cannot be moved to a different compute node in the same fault domain is moved to a different fault domain. This move to a different fault domain is temporary if the Auto Resolve property of the Compute service is enabled: If Auto Resolve is enabled, then when resources become available, the moved instances are live migrated back to their selected fault domain. See also getDisplacedInstances.
-
getForcedStoppedInstances
-
Lists all instances that were stopped via the use of the force option on the migrate operation or that were stopped by the Compute service because no fault domain can accommodate these instances.
PCA-ADMIN> getForcedStoppedInstances
Command: getForcedStoppedInstances
Status: Success
Time: 2023-04-17 20:53:51,410 UTC
Data:
  id                         displayName   compartmentId
  --                         -----------   -------------
  ocid1.instance.unique_ID   inst-name     ocid1.compartment.unique_ID
In the Service Web UI, click the navigation menu, click FD Instances, and then click Forced Stopped Instances. Use the Actions menu to copy the OCIDs.
-
getDisplacedInstances
-
Lists instances that are currently running in a fault domain that is not their selected fault domain. Instances that are not running are not shown.
In the following example, running instances are being migrated away from fault domain 1. One instance has been placed in fault domain 2 and one has been placed in fault domain 3.
PCA-ADMIN> getDisplacedInstances
Command: getDisplacedInstances
Status: Success
Time: 2023-04-18 23:20:41,484 UTC
Data:
  id                         displayName   compartmentId                 faultDomain      faultDomainSelected
  --                         -----------   -------------                 -----------      -------------------
  ocid1.instance.unique_ID   inst-name     ocid1.compartment.unique_ID   FAULT-DOMAIN-3   FAULT-DOMAIN-1
  ocid1.instance.unique_ID   inst-name     ocid1.compartment.unique_ID   FAULT-DOMAIN-2   FAULT-DOMAIN-1
In the Service Web UI, click the navigation menu, click FD Instances, and then click Displaced Instances. Use the Actions menu to copy the OCIDs.
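If you capture the getDisplacedInstances output in a file or script, a short parser can summarize where displaced instances currently run. The following Python sketch is illustrative only. It assumes one table row per line and no whitespace inside any column value; the function names parse_displaced and count_by_current_fd are hypothetical and are not part of the Service CLI.

```python
from collections import Counter

# Column names as they appear in the getDisplacedInstances table header.
FIELDS = ["id", "displayName", "compartmentId", "faultDomain", "faultDomainSelected"]


def parse_displaced(output):
    """Return one dict per data row of captured getDisplacedInstances output.

    Assumption: each row is on its own line and no column value contains
    whitespace (a display name with spaces would break this simple split).
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly five columns and begin with an instance OCID;
        # this skips the status lines, the header, and the dashed separator.
        if len(parts) == len(FIELDS) and parts[0].startswith("ocid1.instance."):
            rows.append(dict(zip(FIELDS, parts)))
    return rows


def count_by_current_fd(rows):
    """Count displaced instances per fault domain they currently run in."""
    return Counter(row["faultDomain"] for row in rows)


SAMPLE = """\
Data:
  id                         displayName   compartmentId                 faultDomain      faultDomainSelected
  --                         -----------   -------------                 -----------      -------------------
  ocid1.instance.unique_ID   inst-name     ocid1.compartment.unique_ID   FAULT-DOMAIN-3   FAULT-DOMAIN-1
  ocid1.instance.unique_ID   inst-name     ocid1.compartment.unique_ID   FAULT-DOMAIN-2   FAULT-DOMAIN-1
"""

rows = parse_displaced(SAMPLE)
for fault_domain, count in sorted(count_by_current_fd(rows).items()):
    print(fault_domain, count)
```

With the example output shown above, the sketch reports one displaced instance in each of fault domains 2 and 3.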
Configuring the Recovery State for a Stopped Instance
If the Compute service stopped an instance, you can configure how that stopped instance will be treated when resources are again available by setting the instance availability recovery action and the Auto Recovery property of the Compute service.
See the description of the enableAutoRecoveryAction command in Compute Service Configuration Commands for reasons that an instance can be stopped by the Compute service. See also the descriptions of disableAutoRecoveryAction and getForcedStoppedInstances.
During instance launch or in a subsequent instance update, set the instance recovery action in the instance availability configuration.
In the Compute Web UI, see the "Availability configuration" section in the dialog to create or edit an instance or create or edit an instance configuration. To restart instances that were stopped by the Compute service, check the box labeled "Restore instance lifecycle state after infrastructure maintenance". This is the default. To keep stopped instances stopped, uncheck the "Restore instance" box.
In the OCI CLI, use the --availability-config option or the availabilityConfig property in the compute instance launch or update command or the instance configuration create or update command. Set the recoveryAction to RESTORE_INSTANCE or STOP_INSTANCE. The default behavior is RESTORE_INSTANCE.
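The recovery action is passed as a small JSON object. The following Python sketch shows one way to build that value and reject anything other than the two documented settings. The function name availability_config is hypothetical; only the recoveryAction key and its two values come from this section.

```python
import json

# The two recovery actions named in this section.
VALID_RECOVERY_ACTIONS = ("RESTORE_INSTANCE", "STOP_INSTANCE")


def availability_config(recovery_action="RESTORE_INSTANCE"):
    """Return the JSON text for an availability configuration.

    Hypothetical helper: it only validates the value and serializes it.
    It does not call the OCI CLI.
    """
    if recovery_action not in VALID_RECOVERY_ACTIONS:
        raise ValueError(
            "recoveryAction must be one of " + ", ".join(VALID_RECOVERY_ACTIONS)
        )
    return json.dumps({"recoveryAction": recovery_action})


print(availability_config("STOP_INSTANCE"))
# prints {"recoveryAction": "STOP_INSTANCE"}
```

The resulting string is the kind of value you would supply to the --availability-config option.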
"availabilityConfig": {"recoveryAction": "STOP_INSTANCE"}
Enabling Strict Fault Domain Enforcement
To enable strict fault domain enforcement, do one of the following:
-
In the Service Web UI, click the navigation menu, click FD Instances, and click Compute Service Detail. On the Compute Service Information page, click the Controls menu, and click Enable Strict FD.
-
In the Service CLI, run the enableStrictFD command.
For more information about the effect of fault domain enforcement, see Compute Service Configuration Commands.
If the current fault domain does not have enough resources to accommodate all instances that need to be migrated, do the following:
-
If you are performing a planned compute node evacuation, specify the force option on the migration operation to stop the instances in their current fault domain.
-
Run the enableAutoRecoveryAction command or select Enable Auto Recovery in the Service Web UI.
-
Ensure that the instance availability recovery action for each instance is set to RESTORE_INSTANCE, which is the default. See Configuring the Recovery State for a Stopped Instance.
See the example in Migrating Instances from a Compute Node.
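The way strict fault domain enforcement interacts with the force option during evacuation can be summarized as a small decision function. This Python sketch only models the behavior described for enableStrictFD and disableStrictFD in Compute Service Configuration Commands. The names Outcome and evacuation_outcome are hypothetical, and the sketch does not cover the case where no fault domain at all can accommodate an instance.

```python
from enum import Enum


class Outcome(Enum):
    MIGRATED_IN_SELECTED_FD = "moved to another compute node in the same fault domain"
    DISPLACED = "moved to a different fault domain"
    FORCE_STOPPED = "instance is stopped"
    MIGRATE_FAILS = "migrate operation fails"


def evacuation_outcome(fits_in_selected_fd, strict_fd, force):
    """Model what happens to one instance during compute node evacuation.

    fits_in_selected_fd: another node in the same fault domain has room.
    strict_fd: strict fault domain enforcement is enabled.
    force: the force option was specified on the migrate operation.
    """
    if fits_in_selected_fd:
        return Outcome.MIGRATED_IN_SELECTED_FD
    if not strict_fd:
        # Default behavior: the instance is displaced to another fault domain.
        return Outcome.DISPLACED
    # Strict enforcement: the instance may not leave its fault domain.
    return Outcome.FORCE_STOPPED if force else Outcome.MIGRATE_FAILS


for strict in (False, True):
    for force in (False, True):
        result = evacuation_outcome(False, strict, force)
        print(f"strict={strict} force={force}: {result.value}")
```

Instances that end in the stopped state are the ones that the Auto Recovery property and the RESTORE_INSTANCE recovery action are meant to bring back once resources are available.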
Starting, Resetting or Stopping a Compute Node
The Service Enclave allows administrators to send start, reboot and shutdown signals to the compute nodes.
Using the Service Web UI
-
Make sure that the compute node is locked for maintenance.
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, locate the compute node you want to start, reset or stop.
-
Click the Actions menu (three vertical dots) and select the appropriate action: Start, Reset, or Stop.
-
When the confirmation window appears, click the appropriate action button to proceed.
A pop-up window appears for a few seconds to confirm that the compute node is starting, stopping, or restarting.
-
When the compute node is up and running again, release the maintenance and provisioning locks.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node that you want to start, reset or stop.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-23 09:25:56,307 UTC
Data:
  id                                     name       provisioningState   provisioningType
  --                                     ----       -----------------   ----------------
  3e62bf25-a26c-407e-ab8b-df01a4ad98b6   pcacn002   Provisioned         KVM
  f7b8356b-052f-4911-babb-447e6ab9c78d   pcacn003   Provisioned         KVM
  4e06ebdf-faed-484e-996d-d77af786f123   pcacn001   Provisioned         KVM
-
Make sure that the compute node is locked for maintenance.
-
Start, reset or stop the compute node using the corresponding command:
PCA-ADMIN> start ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: start ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:26:06,446 UTC
Data: Success
PCA-ADMIN> reset id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: reset id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:27:06,434 UTC
Data: Success
PCA-ADMIN> stop ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
Command: stop ComputeNode id=f7b8356b-052f-4911-babb-447e6ab9c78d
Status: Success
Time: 2021-08-23 09:31:38,271 UTC
Data: Success
-
When the compute node is up and running again, release the maintenance and provisioning locks.
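If you script these steps, the ID lookup and the command text can be derived from captured list ComputeNode output. The Python sketch below is illustrative only. It assumes the ID is a standard 36-character UUID immediately followed by the node name, and it reproduces the three command forms exactly as they appear in the examples above. The function names node_ids_by_name and power_command are hypothetical.

```python
import re

# A UUID followed by whitespace and the compute node name.
UUID_AND_NAME = re.compile(
    r"\b([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+(\S+)"
)

# Command forms copied from the Service CLI examples in this section.
COMMAND_FORMS = {
    "start": "start ComputeNode id={id}",
    "reset": "reset id={id}",
    "stop": "stop ComputeNode id={id}",
}


def node_ids_by_name(list_output):
    """Map compute node name to ID from captured 'list ComputeNode' output."""
    return {name: node_id for node_id, name in UUID_AND_NAME.findall(list_output)}


def power_command(action, node_id):
    """Return the Service CLI command text for start, reset, or stop."""
    if action not in COMMAND_FORMS:
        raise ValueError("action must be start, reset, or stop")
    return COMMAND_FORMS[action].format(id=node_id)


SAMPLE = """\
  id                                     name       provisioningState   provisioningType
  --                                     ----       -----------------   ----------------
  3e62bf25-a26c-407e-ab8b-df01a4ad98b6   pcacn002   Provisioned         KVM
  f7b8356b-052f-4911-babb-447e6ab9c78d   pcacn003   Provisioned         KVM
  4e06ebdf-faed-484e-996d-d77af786f123   pcacn001   Provisioned         KVM
"""

ids = node_ids_by_name(SAMPLE)
print(power_command("start", ids["pcacn003"]))
```

The sketch only produces command text. Locking the compute node for maintenance beforehand and releasing the locks afterward remain separate steps.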
Deprovisioning a Compute Node
If you need to take a compute node out of service, for example to replace a defective one, you must deprovision it first, so that its data is removed cleanly from the system databases.
Using the Service Web UI
-
In the navigation menu, click Rack Units.
-
In the Rack Units table, click the host name of the compute node you want to deprovision.
The compute node detail page appears.
-
In the top-right corner of the page, click Controls and select the Provisioning Lock command.
When the confirmation window appears, click Lock to proceed.
After successful completion, the Compute Node Information tab shows Provisioning Locked = Yes.
-
Make sure that no more compute instances are running on the compute node.
Click Controls and select the Migrate All Vms command. The system migrates the instances to other compute nodes.
-
To deprovision the compute node, click Controls and select the Deprovision command.
When the confirmation window appears, click Deprovision to proceed.
After successful completion, the Compute Node Information tab shows Provisioning State = Ready to Provision.
Using the Service CLI
-
Display the list of compute nodes.
Copy the ID of the compute node you want to deprovision.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-20 08:53:56,681 UTC
Data:
  id                                     name       provisioningState   provisioningType
  --                                     ----       -----------------   ----------------
  29f68a0e-4744-4a92-9545-7c48fa365d0a   pcacn001   Provisioned         KVM
  7a0236f4-b00e-461d-93a0-b22673a18d9c   pcacn003   Provisioned         KVM
  dc8ae567-b07f-48e0-89bd-e57069c20010   pcacn002   Provisioned         KVM
-
Set a provisioning lock on the compute node.
PCA-ADMIN> provisioningLock id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Command: provisioningLock id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Status: Success
Time: 2021-08-20 10:30:00,320 UTC
JobId: ed4a4646-6d73-41f9-9cb0-73ea35e0d766
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=ed4a4646-6d73-41f9-9cb0-73ea35e0d766
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
Confirm that the compute node is under provisioning lock.
PCA-ADMIN> show ComputeNode id=7a0236f4-b00e-461d-93a0-b22673a18d9c
[...]
  Provisioning Locked = true
-
Migrate all running compute instances off the compute node you want to deprovision.
PCA-ADMIN> migrateVm id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Command: migrateVm id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Status: Running
Time: 2021-08-20 10:37:05,781 UTC
JobId: 6f1e94bc-7d5b-4002-ada9-7d4b504a2599
Use the job ID to check the status of your command.
PCA-ADMIN> show Job id=6f1e94bc-7d5b-4002-ada9-7d4b504a2599
Command: show Job id=6f1e94bc-7d5b-4002-ada9-7d4b504a2599
Status: Success
Time: 2021-08-20 10:39:59,025 UTC
Data:
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
Deprovision the compute node with this command:
PCA-ADMIN> deprovision id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Command: deprovision id=7a0236f4-b00e-461d-93a0-b22673a18d9c
Status: Success
Time: 2021-08-20 11:30:43,793 UTC
JobId: 9868fdac-ddb6-4260-9ce1-c018cf2ddc8d
Use the job ID to check the status of your deprovision command.
PCA-ADMIN> show Job id=9868fdac-ddb6-4260-9ce1-c018cf2ddc8d
[...]
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
-
Confirm that the compute node has been deprovisioned.
PCA-ADMIN> list ComputeNode
Command: list ComputeNode
Status: Success
Time: 2021-08-20 08:53:56,681 UTC
Data:
  id                                     name       provisioningState    provisioningType
  --                                     ----       -----------------    ----------------
  29f68a0e-4744-4a92-9545-7c48fa365d0a   pcacn001   Provisioned          KVM
  7a0236f4-b00e-461d-93a0-b22673a18d9c   pcacn003   Ready to Provision   Unspecified
  dc8ae567-b07f-48e0-89bd-e57069c20010   pcacn002   Provisioned          KVM
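Each step in the CLI procedure above returns a job ID that you check with show Job before moving on. If you capture that output in a script, a check along the following lines can decide whether it is safe to continue. This Python sketch assumes the output contains one "Key = Value" pair per line; the function names parse_properties and job_succeeded are hypothetical.

```python
def parse_properties(output):
    """Collect 'Key = Value' lines from captured Service CLI output.

    Lines without ' = ' (for example 'Status: Success' or '[...]') are ignored.
    """
    props = {}
    for line in output.splitlines():
        key, separator, value = line.partition(" = ")
        if separator:
            props[key.strip()] = value.strip()
    return props


def job_succeeded(output):
    """Return True only if the job is done and its run state is Succeeded."""
    props = parse_properties(output)
    return props.get("Done") == "true" and props.get("Run State") == "Succeeded"


SAMPLE = """\
Command: show Job id=9868fdac-ddb6-4260-9ce1-c018cf2ddc8d
Status: Success
Data:
  Done = true
  Name = MODIFY_TYPE
  Run State = Succeeded
"""

print(job_succeeded(SAMPLE))
```

The same parse_properties helper could be applied to show ComputeNode output to confirm that Provisioning Locked is true before you migrate instances and deprovision the node.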