Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Replace a Faulty ESXi Host from your Oracle Cloud VMware Solution Cluster on Oracle Cloud Infrastructure
Introduction
This guide details how to replace a malfunctioning ESXi host within your Oracle Cloud VMware Solution cluster. The process starts in the Oracle Cloud Infrastructure (OCI) Console, and the remaining configuration steps are performed in vCenter and NSX Manager.
Objectives
- Replace a faulty ESXi host in an Oracle Cloud VMware Solution cluster. The process involves adding and deleting an ESXi host within your VMware environment, initiated through the OCI Console. For more information, see Add an ESXi Host to an Oracle Cloud VMware Solution Cluster and Delete a VMware ESXi Host from an Oracle Cloud VMware Solution Cluster.
Prerequisites
Before replacing a faulty ESXi host in your Oracle Cloud VMware Solution cluster, ensure you meet the following requirements:
-
Understanding Oracle Cloud VMware Solution: Familiarize yourself with Oracle Cloud VMware Solution and its functionalities. For more information, see Get Started with Oracle Cloud VMware Solution.
-
Software-Defined Data Center (SDDC): You will need an existing SDDC in your OCI Console containing at least one Unified Management Cluster. Host replacement involves creating a new host within your existing cluster.
-
Standard Shape Cluster Considerations: Standard shape clusters have a strict limit of 32 ESXi hosts, including any replacement hosts. This means careful planning is required to ensure you have space for replacements when needed. To avoid complications during host replacements, it is strongly recommended to maintain some spare capacity within your cluster. In simple terms, try not to utilize all 32 hosts for workloads. If your cluster is already at the maximum capacity of 32 ESXi hosts, consider removing one before proceeding with the replacement process.
-
Block Volume Maximum: A single OCI Block Volume can be attached to a maximum of 32 instances. If your block volumes are already attached to 32 ESXi hosts, adding another host for replacement becomes impossible.
-
-
Tenancy Allowlisting: To use the Replace Host functionality, submit a Support Request to allowlist your tenancy. This allowlisting has a time limit, so factor that into your planning.
-
Access to VMware Management Tools: Ensure you have valid credentials and access to vCenter Server, NSX Manager, and HCX Manager associated with your SDDC.
-
Administrative Privileges: Verify that you have the necessary administrative privileges within OCI, vCenter, NSX Manager, and HCX Manager. These permissions are required for managing SDDC resources, including hosts, clusters, network configurations, and datastores.
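The two capacity limits above can be checked before you start. The following is a minimal sketch in plain Python, not an OCI API call: the function name, the input shapes, and the sample volume name are assumptions made for illustration, and the counts would have to come from your own inventory.

```python
MAX_HOSTS_PER_STANDARD_CLUSTER = 32   # limit quoted in the prerequisites
MAX_INSTANCES_PER_BLOCK_VOLUME = 32   # limit quoted in the prerequisites


def replacement_blockers(host_count, volume_attachments):
    """Return human-readable reasons why a replacement host cannot be added.

    host_count: number of ESXi hosts currently in the cluster.
    volume_attachments: hypothetical mapping of block volume name to the
        number of instances the volume is attached to.
    """
    blockers = []
    if host_count >= MAX_HOSTS_PER_STANDARD_CLUSTER:
        blockers.append(
            f"cluster has {host_count} hosts; limit is "
            f"{MAX_HOSTS_PER_STANDARD_CLUSTER}"
        )
    for name, attached in sorted(volume_attachments.items()):
        if attached >= MAX_INSTANCES_PER_BLOCK_VOLUME:
            blockers.append(
                f"volume {name} has {attached} attachments; limit is "
                f"{MAX_INSTANCES_PER_BLOCK_VOLUME}"
            )
    return blockers


if __name__ == "__main__":
    # Illustrative numbers only.
    print(replacement_blockers(31, {"datastore-vol-1": 31}))
    print(replacement_blockers(32, {"datastore-vol-1": 32}))
```

An empty list means neither limit blocks the replacement; any entry points to capacity that must be freed first.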
Task 1: Initiate Host Replacement from OCI Console
In this task, we will initiate the host replacement process within the OCI Console for your Oracle Cloud VMware Solution cluster.
-
Log in to the OCI Console and navigate to the specific VMware SDDC that contains the faulty ESXi host requiring replacement.
-
Locate the cluster within your SDDC that contains the malfunctioning ESXi host. For this tutorial, let us assume the faulty host is Cls2-Standard3-1, which is located in the Cls2-Standard3 cluster.
-
Click the three dots displayed to the right of the faulty ESXi host and select Replace Host.
Note: Before proceeding, ensure your tenancy has been allowlisted through a Support Request. For more information, see the prerequisites section.
-
In the Replace Host window, enter the following information.
-
Release Name: Select a compatible release name from the drop-down menu. This version should align with the currently used build number in your vCenter cluster to ensure compatibility.
Note: The Replace Host workflow simplifies compatibility by displaying only minor versions within your SDDC major ESXi version (like ESXi 7 or 8) that have been officially made available by Oracle Cloud VMware Solution. This lets you choose the exact version that matches your existing setup.
Example: Imagine your SDDC was created with ESXi 8 Update 1c-build 22088125-1. The Release Name drop-down menu shows all ESXi 8 updates that Oracle Cloud VMware Solution has officially made available, from Update 1c-build 22088125-1 to the latest ESXi 8 version (for example, Update 2-build 22380479-1). Versions not offered by Oracle Cloud VMware Solution are not listed, which avoids compatibility issues.
-
Billing Grace Period: Review the information message regarding the replacement host creation. The process creates a new loaner host with a 24-hour grace period for billing purposes. This replacement host is billed after the grace period on an hourly basis. Once you complete the required configuration steps in your VMware environment (vCenter Server, NSX Manager and so on) and terminate the original faulty host, the billing is automatically switched to the new permanent host.
Note: Be aware that failing to terminate the original faulty host within 24 hours will result in charges for both hosts; the original host with its existing billing commitment and the new replacement host with its hourly billing.
-
-
Review the settings and potential billing implications and click Confirm to initiate the host replacement process.
-
(Optional) If you accidentally initiated the host replacement and want to cancel it, locate the warning banner displayed at the top of the cluster details page. Click Cancel Replacement to stop the process.
-
Navigate to the Work Requests section within your chosen cluster.
This section allows you to monitor the progress of the create ESXi host task, which is part of the replacement process.
After about 20 to 25 minutes, you should see the work request complete successfully.
-
Verify replacement status.
-
Cluster Details Page: Upon successful completion of the replacement process, the newly added host should display the Active state, and the original faulty host should be marked as Updating. A banner on the cluster details page highlights the need to terminate the faulty host within a specific time frame to prevent double billing.
-
Original Faulty Host: On the details page for the faulty host, a similar banner will appear, reminding you to terminate the host to avoid double billing.
-
New Replacement Host: Unlike the faulty host, the new replacement host will not have a Pricing Interval End Date. This value will be inherited from the faulty host once it is terminated. However, the replacement host does have a Grace Period End Date. If the faulty host remains unterminated after this date, you will be charged hourly for the replacement host.
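The billing rule above comes down to simple date arithmetic. The following is a minimal sketch assuming the 24-hour grace period starts when the replacement host is created; the function names and timestamps are illustrative, and the Grace Period End Date shown in the OCI Console remains the authoritative value.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # grace period stated in this tutorial


def grace_period_end(replacement_created_at):
    """Estimate when hourly billing for the replacement host would begin."""
    return replacement_created_at + GRACE_PERIOD


def time_left_before_double_billing(replacement_created_at, now):
    """Time remaining to terminate the faulty host; zero once the grace period is over."""
    remaining = grace_period_end(replacement_created_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    # Hypothetical creation time for the replacement host.
    created = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print("Estimated grace period end:", grace_period_end(created).isoformat())
    print("Time left right now:", time_left_before_double_billing(
        created, datetime.now(timezone.utc)))
```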
-
Task 2: Get ESXi Host Information and Default vCenter Password
In this task, we will gather essential details from the OCI Console, including the newly created ESXi host information and the vCenter default password.
-
Open the OCI Console, navigate to Compute and Instances. Identify and note down the host information.
-
From the list of instances, select the newly added ESXi host.
-
Note down the Private IPv4 address and Internal FQDN details for later use.
-
Get Attached Block Volumes iSCSI target server details.
-
Access iSCSI attachment information.
-
Access iSCSI target server details.
-
Note down the same iSCSI target information for all the attached block volumes.
-
-
-
Access the SDDC details page within the OCI Console. Locate and securely store the vCenter default password. You will need this password when adding the ESXi host to vCenter in a later task.
Note: Ensure you store the vCenter password securely. Avoid sharing it in plain text or storing it in unencrypted locations.
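Because the values gathered in this task are typed again in later tasks, it can help to record them in one place and check them for obvious typos. The following is a rough sketch; the class names, the checks, and every sample value (IP addresses, FQDN, IQN) are placeholders invented for illustration. The vCenter password is deliberately left out of the record.

```python
import ipaddress
from dataclasses import dataclass, field


def _is_ipv4(text):
    """Return True if text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


@dataclass(frozen=True)
class IscsiTarget:
    ip: str
    port: int
    iqn: str


@dataclass
class ReplacementHostNotes:
    private_ip: str
    fqdn: str
    iscsi_targets: list = field(default_factory=list)

    def problems(self):
        """Return a list of likely transcription errors in the recorded values."""
        issues = []
        if not _is_ipv4(self.private_ip):
            issues.append(f"private IP {self.private_ip!r} is not a valid IPv4 address")
        if "." not in self.fqdn.strip("."):
            issues.append(f"FQDN {self.fqdn!r} has no domain part")
        for target in self.iscsi_targets:
            if not _is_ipv4(target.ip):
                issues.append(f"iSCSI target IP {target.ip!r} is not a valid IPv4 address")
            if not 1 <= target.port <= 65535:
                issues.append(f"iSCSI target port {target.port} is out of range")
            if not target.iqn.startswith("iqn."):
                issues.append(f"iSCSI target IQN {target.iqn!r} does not start with 'iqn.'")
        return issues


if __name__ == "__main__":
    # Placeholder values, not taken from a real environment.
    notes = ReplacementHostNotes(
        private_ip="10.0.0.15",
        fqdn="esxi-new.example.internal",
        iscsi_targets=[IscsiTarget("169.254.2.2", 3260, "iqn.2015-12.com.oracleiaas:example")],
    )
    print(notes.problems())
```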
Task 3: Add and Configure the New ESXi Host in vCenter
In this task, we will add a newly created ESXi host to your vCenter cluster and configure its network settings.
-
To add the ESXi host to vCenter, open vCenter Server and locate the desired data center where you want to add the ESXi host. You can find this data center in the inventory pane.
-
Right-click on the chosen data center and select Add Host.
-
In the Add Host wizard, enter the following information.
-
Host Name or IP Address: Enter the FQDN for the new ESXi host noted in Task 2 and click Next.
-
Connection Settings: Enter the login credentials for the ESXi host. The username is root and the password is the default vCenter password obtained from the OCI Console SDDC details page. Click Next.
-
Host Summary: Review the summarized information about the host and click Next.
-
Host Lifecycle: Deselect Manage host with an image and click Next.
-
Assign License: Select an existing vSphere license from the available options to assign a license to the new ESXi host and click Next.
-
Lockdown Mode: Select Normal lockdown mode, which is the standard setting used with Oracle Cloud VMware Solution deployments. Adjust this setting if your environment requires it, then click Next.
-
VM Location: Keep the default settings for VM placement and click Next.
-
Review and Finish: Review all the configuration details one last time and click Finish to submit the task and add the ESXi host to your vCenter cluster.
-
-
Set ESXi host to maintenance mode.
Once the ESXi host is successfully added, right-click on it within the vCenter inventory and select Enter Maintenance Mode. This takes the host offline, allowing you to configure its network settings.
Validate that the host has successfully entered into maintenance mode.
-
Verify host status in NSX Manager (Optional).
In NSX Manager, the new ESXi host should be listed under Other Nodes and NSX Configuration status as Not Configured.
-
Add the ESXi host to the Distributed Switch.
-
Navigate to the Networking view within vCenter Server.
-
Select the Distributed Switch (DSwitch) associated with the cluster where the ESXi host will reside.
-
Right-click on the DSwitch or click Actions and select Add and Manage Hosts.
-
In the Add and Manage Hosts window, enter the following information.
-
Add Hosts: Select Add Hosts and click Next.
-
Select Hosts: Select the newly added ESXi host from the list and ensure it is currently in maintenance mode. Click Next.
-
Manage Physical Adapters: Select vmnic0 and vmnic1 from the drop-down menu.
-
Manage VMkernel Adapters: Assign each VMkernel adapter (vmk) to a specific port group as shown.
VMkernel Adapter    Port Group
vmk0                Management Networking
vmk1                vMotion
vmk2                vSAN
vmk3                Replication
vmk4                Provisioning
-
Migrate VM Networking: Keep the default values for migrating VM networking.
-
-
Review all configuration details and click Finish to submit the changes and add the ESXi host to the Distributed Switch.
-
-
Move the ESXi host to the vCenter cluster.
-
Once the network configuration is complete, you can move the ESXi host to the intended vCenter cluster. Right-click on the host and select Move to.
-
In the Move To window, select the cluster and click OK.
-
In the Move Host into Cluster window, keep the default selection Put all of this host’s virtual machines in the cluster’s root resource pool and click OK to complete the move.
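The VMkernel adapter assignments from the Distributed Switch step can be expressed as a lookup table and compared with what the host actually reports. The following is an illustrative sketch: the mapping copies the table in this task, while the function name and the `actual` dictionary are assumptions. Port group names in your environment may carry additional prefixes or suffixes.

```python
# Expected assignments, copied from the VMkernel adapter table in this task.
EXPECTED_VMK_PORT_GROUPS = {
    "vmk0": "Management Networking",
    "vmk1": "vMotion",
    "vmk2": "vSAN",
    "vmk3": "Replication",
    "vmk4": "Provisioning",
}


def vmk_mismatches(actual):
    """Return {adapter: (expected, found)} for every adapter that differs.

    actual: hypothetical mapping of adapter name to the port group observed
        on the new host; a missing adapter is reported with found=None.
    """
    mismatches = {}
    for adapter, expected in EXPECTED_VMK_PORT_GROUPS.items():
        found = actual.get(adapter)
        if found != expected:
            mismatches[adapter] = (expected, found)
    return mismatches


if __name__ == "__main__":
    observed = dict(EXPECTED_VMK_PORT_GROUPS)
    observed["vmk3"] = "vSAN"  # simulate a wrong assignment
    print(vmk_mismatches(observed))
```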
-
Task 4: Verify NSX Configuration
Within NSX Manager, you can now observe the configuration status of the newly added ESXi host. NSX automatically pushes the configuration to the host and integrates it into the cluster.
Monitor the NSX configuration until it completes. This process typically takes at least 5 minutes. The NSX Configuration status first changes to Success while the Node Status shows Unknown. After a few minutes, the Node Status changes to Down and then to Up.
Once the configuration finishes, verify that NSX Manager displays the NSX Configuration status as Success and the Node Status as Up. This confirms that the ESXi host has been successfully configured for NSX.
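The status progression described above lends itself to a simple polling loop. The following is a sketch only: `read_status` is a hypothetical callable that you would have to implement yourself (for example, by reading the values from NSX Manager), and the retry count and delay are arbitrary defaults.

```python
import time


def nsx_host_ready(config_state, node_status):
    """The host is ready when NSX Configuration is Success and Node Status is Up."""
    return config_state == "Success" and node_status == "Up"


def wait_until_ready(read_status, attempts=20, delay_seconds=30.0, sleep=time.sleep):
    """Poll read_status() until the host is ready or the attempts run out.

    read_status: hypothetical callable returning (config_state, node_status).
    Returns True if the host became ready, False otherwise.
    """
    for attempt in range(attempts):
        config_state, node_status = read_status()
        if nsx_host_ready(config_state, node_status):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Simulated progression matching the sequence described in this task.
    simulated = iter([("Success", "Unknown"), ("Success", "Down"), ("Success", "Up")])
    print(wait_until_ready(lambda: next(simulated), delay_seconds=0))
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.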
Task 5: Configure the Datastores
This task covers configuring datastores for your newly added ESXi host. The specific steps depend on whether you are using Virtual Machine File System (VMFS) Datastores backed by OCI Block Storage or vSAN datastore with Dense shaped instances.
Scenario 1: Configure Standard Shaped Instances (VMFS Datastores)
Follow these steps to configure VMFS datastores using OCI Block Storage.
-
Ensure all the OCI Block Volumes attached to the other ESXi hosts in the cluster are also attached to the newly added host.
-
Copy the iSCSI attachment information for all the block volumes you attached in step 1. You will need this information later.
-
Access iSCSI storage adapters.
-
In vCenter Server, select the newly added ESXi host.
-
Navigate to Configure and Storage Adapters.
-
-
Configure iSCSI target servers.
-
From the right-hand pane, select the iSCSI storage adapter.
-
Select the Dynamic Discovery tab and click Add to add an iSCSI target server.
-
-
Add all the iSCSI target server IPs you gathered in step 2.
-
Once all iSCSI servers are added, select the iSCSI adapter again and click Rescan Adapters to refresh the connection.
-
Verify block volume attachments. After the rescan completes, you should see all the block volumes attached as Oracle iSCSI disks.
-
Validate datastore availability from Datastores tab for the newly added host. You should see all the datastores mounted, matching the configuration of the other hosts in the cluster.
-
To confirm datastore presence, navigate to the Storage view and select the datastore cluster. Verify that the newly added host appears under the Hosts section.
-
Once all configurations are complete, remove the ESXi host from maintenance mode.
-
After exiting maintenance mode, confirm that your virtual environment remains stable and healthy as expected.
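The datastore validation step is a set comparison: every datastore mounted on the existing hosts should also be mounted on the new host. The following is a minimal sketch of that comparison; the function name and the host and datastore names are made up for illustration, and the inputs would come from the Datastores tab of each host.

```python
def missing_datastores(new_host_datastores, peer_host_datastores):
    """Return datastores mounted on any peer host but not on the new host.

    new_host_datastores: iterable of datastore names seen on the new host.
    peer_host_datastores: hypothetical mapping of peer host name to the
        datastore names mounted on that host.
    """
    expected = set()
    for names in peer_host_datastores.values():
        expected.update(names)
    return sorted(expected - set(new_host_datastores))


if __name__ == "__main__":
    peers = {
        "esxi-host-a": ["vmfs-ds-1", "vmfs-ds-2"],
        "esxi-host-b": ["vmfs-ds-1", "vmfs-ds-2"],
    }
    print(missing_datastores(["vmfs-ds-1"], peers))
```

An empty result suggests the new host matches its peers; any name returned usually points to a missing block volume attachment or iSCSI target.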
Scenario 2: Configure Dense Shaped Instances (vSAN Datastore)
Note: These steps are only applicable if you are using Dense shaped instances with vSAN.
Before configuring the vSAN datastore, ensure the ESXi host is out of maintenance mode. Monitor the exit task until it completes.
-
Access vSAN disk management.
-
Select Dense Cluster under the data center.
-
Navigate to Configure, vSAN and Disk Management.
-
-
To claim unused disks, click Claim Unused Disks to incorporate available disks into vSAN storage.
-
Configure vSAN disks: A vSAN cluster typically requires at least one high-performance cache disk and one or more capacity disks per host for data storage. Select the first disk as the cache and the remaining disks for capacity (usually 7 for Dense shapes). You can adjust this configuration based on your specific environment. Submit the task and wait for successful completion.
-
From the right-hand pane, confirm that all available disks on the host are listed and healthy.
-
To verify vSAN datastore capacity, navigate to the Storage view and select the vSAN datastore. The summary page should now reflect the increased total capacity due to the added capacity drives.
-
To confirm host status in vSAN, go to the Hosts tab within the datastore. You should see the newly added host listed with a Normal status.
-
Configure vSAN fault domain.
-
Each OCI availability domain has 3 fault domains, and vSAN fault domains should mirror these. Oracle Cloud VMware Solution provisioning usually distributes ESXi hosts across all fault domains for optimal balance. Because this host replaces the existing faulty host, the provisioning service deploys it within the same OCI fault domain. Place the new host in the same vSAN fault domain as the original host.
-
Under vSAN, click Fault Domains. Select the newly added host and move it to the same fault domain as the original host (for example, Fault-Domain-1).
-
-
Verify fault domain placement and confirm that the new host now resides within the desired fault domain.
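The disk claim and fault domain choices in this scenario follow two small rules: the first disk becomes the cache and the rest become capacity, and the new host joins the vSAN fault domain of the host it replaces. The following is a sketch of both rules; the function names, disk names, and host names are hypothetical.

```python
def plan_disk_claim(unused_disks):
    """Assign the first disk as cache and the remaining disks as capacity."""
    disks = list(unused_disks)
    if len(disks) < 2:
        raise ValueError("need at least one cache disk and one capacity disk")
    return {"cache": disks[0], "capacity": disks[1:]}


def matching_fault_domain(vsan_fault_domains, original_host):
    """Return the vSAN fault domain that currently contains the original host.

    vsan_fault_domains: hypothetical mapping of fault domain name to the
        hosts assigned to it.
    """
    for name, hosts in vsan_fault_domains.items():
        if original_host in hosts:
            return name
    raise LookupError(f"{original_host} is not in any vSAN fault domain")


if __name__ == "__main__":
    # Eight placeholder disks give one cache disk and seven capacity disks.
    print(plan_disk_claim([f"disk{i}" for i in range(8)]))
    domains = {"Fault-Domain-1": ["faulty-host"], "Fault-Domain-2": ["other-host"]}
    print(matching_fault_domain(domains, "faulty-host"))
```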
Task 6: Test the New ESXi Host
This task ensures the newly added ESXi host functions correctly by deploying or migrating a test virtual machine (VM) to it.
-
Deploy or migrate a test VM. You can either deploy a new test VM directly on the newly added ESXi host or migrate/clone an existing test VM from another host in the cluster to the new host.
-
Verify VM functionality. Once the VM is deployed or migrated, power it on and perform basic tests to confirm it works as expected. This could involve:
- Logging in to the VM operating system.
- Verifying network connectivity.
- Checking resource availability (CPU, memory and storage).
- Testing application functionality (if applicable).
If the test VM operates successfully on the new ESXi host, you can proceed with confidence that the host has been configured correctly.
Task 7: Remove the Faulty Host from vCenter and NSX Manager
In this task, we will remove the ESXi host from your vCenter cluster and NSX Manager.
-
Prepare the ESXi host for removal.
-
Log in to the vCenter Server and locate the ESXi host you want to retire.
-
If the host is already in a Disconnected state and you only want to remove it from vCenter, skip steps 2 to 5 and move to step 6 (Disconnect and remove host from vCenter Inventory).
-
Ensure all virtual machines on the target host are either powered off or migrated to the new host or other hosts within the cluster. A host with running VMs cannot enter maintenance mode.
-
-
To enter maintenance mode, right-click on the ESXi host and select Maintenance Mode and Enter Maintenance Mode.
Data Migration Options (Based on Host Type):
-
Standard Shapes: By default, powered off and suspended VMs are migrated to other hosts. Accept the defaults and submit the task.
-
Dense Shapes: In addition to the default migration, also select Full data migration from the vSAN data migration drop-down menu. This ensures complete data evacuation from the host.
Note: Click PRE-CHECK to validate the vSAN migration process before proceeding to maintenance mode.
-
-
Verify successful maintenance mode entry.
-
Standard Shapes: Due to minimal data movement, this should be quick.
-
Dense Shapes: vSAN data evacuation can take time depending on the environment. Monitor the progress.
Note: Ensure successful entry to maintenance mode before continuing to avoid data loss or downtime.
-
-
Move faulty host out of the cluster.
-
To isolate the host from the cluster, right-click on the host and click Move.
-
Select the datacenter.
-
Verify the faulty host is not in the vCenter cluster.
-
-
Monitor NSX configuration removal.
-
To monitor NSX configuration removal, log in to NSX Manager and observe the automatic removal of NSX configuration on the host.
-
Verify NSX configuration removal completion. In NSX Manager, confirm the host shows Not Configured under Other Nodes.
-
-
Disconnect and remove host from vCenter Inventory.
-
To disconnect the ESXi host, right-click on the host and click Connection, Disconnect in vCenter Server.
-
Verify the disconnected status: the host should now appear as Disconnected in vCenter Server.
-
To remove the host from inventory, right-click on the host and select Remove from Inventory. This permanently removes the host from your vCenter inventory (proceed with caution).
-
Verify the health of your environment in both vCenter Server and NSX Manager.
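The removal sequence in this task, including the shortcut for a host that is already Disconnected and the extra vSAN option for Dense shapes, can be summarized as a small planning function. The following is a sketch under those assumptions; the step titles paraphrase this task, and the function and parameter names are invented for illustration.

```python
REMOVAL_STEPS = [
    (1, "Prepare the ESXi host for removal"),
    (2, "Enter maintenance mode"),
    (3, "Verify successful maintenance mode entry"),
    (4, "Move the faulty host out of the cluster"),
    (5, "Monitor NSX configuration removal"),
    (6, "Disconnect and remove the host from the vCenter inventory"),
]


def removal_plan(host_state, dense_shape=False):
    """Return the (number, title) steps to run for the faulty host.

    host_state: connection state shown in vCenter, such as "Connected" or
        "Disconnected". A Disconnected host skips steps 2 to 5.
    dense_shape: True for Dense shapes, where vSAN data must be evacuated.
    """
    skipped = {2, 3, 4, 5} if host_state == "Disconnected" else set()
    plan = []
    for number, title in REMOVAL_STEPS:
        if number in skipped:
            continue
        if number == 2 and dense_shape:
            title += " (run PRE-CHECK, then select Full data migration)"
        plan.append((number, title))
    return plan


if __name__ == "__main__":
    for number, title in removal_plan("Connected", dense_shape=True):
        print(number, title)
```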
-
Task 8: Remove the Faulty Host in OCI Console
This task guides you through terminating the faulty ESXi host within the OCI Console.
-
Open the OCI Console and navigate to the cluster containing the ESXi host you want to remove.
-
Identify the host that you previously marked for replacement (indicated with Updating status).
-
To terminate the host, click Remove failed Host associated with the faulty host. This option appears in the top banner or in the host details section.
-
Now, the faulty host will change to Terminating state.
The OCI Console will start a task to delete the ESXi host. Monitor the progress of this task until it reaches successful completion.
Notice the pricing interval end times have switched between the hosts.
-
Once the termination task finishes successfully, the Replace Host activity is considered complete. Validate that the status of your SDDC is healthy and that the host count is the same as it was before you started the Replace Host activity.
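The final validation can be written down as three checks: the faulty host is gone, the replacement host is Active, and the host count matches the count before the activity. The following is a minimal sketch; the function name is an assumption, the host states would be read from the cluster details page, and all host names other than Cls2-Standard3-1 are hypothetical.

```python
def replacement_findings(host_count_before, hosts_after, faulty_host, replacement_host):
    """Return a list of findings; an empty list means the checks passed.

    hosts_after: hypothetical mapping of host name to lifecycle state as
        shown in the OCI Console after termination completes.
    """
    findings = []
    if faulty_host in hosts_after:
        findings.append(f"{faulty_host} is still listed ({hosts_after[faulty_host]})")
    state = hosts_after.get(replacement_host)
    if state != "Active":
        findings.append(f"{replacement_host} is {state!r}, expected 'Active'")
    if len(hosts_after) != host_count_before:
        findings.append(
            f"host count is {len(hosts_after)}, expected {host_count_before}"
        )
    return findings


if __name__ == "__main__":
    after = {
        "Cls2-Standard3-2": "Active",
        "Cls2-Standard3-3": "Active",
        "Cls2-Standard3-4": "Active",  # hypothetical replacement host name
    }
    print(replacement_findings(3, after, "Cls2-Standard3-1", "Cls2-Standard3-4"))
```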
For further configuration options tailored to your specific VMware environment, consult the relevant vCenter documentation. For any Oracle Cloud VMware Solution related questions, see Oracle Cloud VMware Solution.
Related Links
Acknowledgments
- Author - Praveen Kumar Pedda Vakkalam (Principal Solutions Architect)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
F96887-01
May 2024