Create the Stretched VMware vSAN Cluster
With all prerequisite configurations complete, you can now proceed to create the VMware vSAN stretched cluster. This step formalizes the connection between hosts across OCI Dedicated Region A and OCI Dedicated Region B, as well as the Witness node deployed in a third region.
You can use the Quickstart wizard or navigate directly to: Cluster, Configure, vSAN, Fault Domains and Stretched Cluster in the VMware vCenter UI.
Configure the following during this process:
- Assign OCI Dedicated Region A hosts to Fault Domain 1
- Assign OCI Dedicated Region B hosts to Fault Domain 2
- Specify the Witness Host (previously added) for quorum
For more details, see Stretched Cluster Requirements and VMware vSAN Stretched Cluster Guide.
After the stretched cluster is created:
- Run the vSAN Health Checks to validate cluster integrity.
- Address any networking-related errors (e.g., MTU mismatch or routing issues); a quick MTU check is sketched below.
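If the health check flags MTU or connectivity problems on the vSAN network, a quick way to confirm jumbo-frame reachability between fault domains is to run vmkping from an ESXi host over the vSAN VMkernel interface. This is only a minimal sketch; the vmk number (vmk1) and the remote host IP are placeholders for your environment.

    # From an ESXi host via SSH: identify the vSAN VMkernel interface and its IP
    esxcli network ip interface ipv4 get
    # Confirm the host has joined the stretched cluster
    esxcli vsan cluster get
    # Test reachability to a host in the other fault domain, then repeat with do-not-fragment
    # and an 8972-byte payload to verify the 9000-byte MTU end to end
    vmkping -I vmk1 10.1.1.12
    vmkping -I vmk1 -d -s 8972 10.1.1.12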
Note:
You might encounter stale vSAN objects on certain hosts from their original clusters. Refer to this guide to remove them: How to Delete Inaccessible Objects in vSAN Datastore; a brief clean-up sketch follows this note. When complete, the cluster should report a vSAN health score in the high 90s, indicating a successful stretched configuration.
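The referenced guide covers the clean-up in detail; the lines below are only a rough sketch of the approach it describes, run over SSH on the affected host. The object UUID is a placeholder and the objtool path can differ between ESXi builds, so follow the guide rather than this sketch for the exact procedure.

    # List vSAN objects on the host and note any reported as inaccessible
    esxcli vsan debug object list | less
    # Delete one inaccessible object by UUID (placeholder UUID shown)
    /usr/lib/vmware/osfs/bin/objtool delete -u <object-uuid> -f -v 10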
Configure NSX
With the VMware vSAN cluster stretched, update VMware NSX to support cross-site overlay networking. This step ensures ESXi hosts from both regions can communicate over NSX tunnels using their respective transport zones.
- Copy the NSX TEP IP Pool from the OCI Dedicated Region B NSX Manager to the OCI Dedicated Region A NSX Manager.
- To avoid IP conflicts with the management ESXi host still present in OCI Dedicated Region B, configure the new IP Pool in OCI Dedicated Region A to start from .10.
Example: In OCI Dedicated Region A NSX Manager, create a TEP Pool with a range of .10–.20 for OCI Dedicated Region B hosts to ensure no overlap with existing IPs (an API sketch follows this list).
- In the OCI Dedicated Region A NSX Manager, define a new Uplink Profile specifically for OCI Dedicated Region B hosts.
- Use the correct VLAN ID and ensure the uplink order matches the OCI Dedicated Region B configuration.
- Use OVERLAY-TZ and VLAN-TZ as the transport zones.
- During host preparation, assign the appropriate Uplink Profile depending on whether the host is from OCI Dedicated Region A or OCI Dedicated Region B.
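If you prefer to script the TEP pool rather than use the NSX Manager UI, the calls below sketch the equivalent NSX-T Policy API requests. The manager FQDN, pool and subnet IDs, and the CIDR/range values are hypothetical, and the endpoint paths and field names should be verified against the API guide for your NSX version.

    # Create or update the TEP pool for Region B hosts on the Region A NSX Manager
    curl -k -u admin -X PATCH \
      "https://nsx-manager-a.example.com/policy/api/v1/infra/ip-pools/TEP-Pool-Region-B" \
      -H "Content-Type: application/json" \
      -d '{"display_name": "TEP-Pool-Region-B"}'

    # Add a static subnet whose allocation range starts at .10 to avoid the management host TEP IPs
    curl -k -u admin -X PATCH \
      "https://nsx-manager-a.example.com/policy/api/v1/infra/ip-pools/TEP-Pool-Region-B/ip-subnets/range-1" \
      -H "Content-Type: application/json" \
      -d '{"resource_type": "IpAddressPoolStaticSubnet", "cidr": "192.168.20.0/24", "allocation_ranges": [{"start": "192.168.20.10", "end": "192.168.20.20"}]}'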
Note: In some scenarios, especially after a failover event, NSX tunnel interfaces may not come up correctly. To resolve this:
- Reboot the affected ESXi host, or
- Run services.sh restart via SSH on the host.
This ensures all NSX services start in the correct order and restore tunnel stability.
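Before or after restarting services, it can help to confirm basic TEP-to-TEP reachability from the affected host. The sketch below assumes the NSX TEP VMkernel interface is vmk10 and uses a placeholder remote TEP IP; Geneve requires an MTU of at least 1600, which the do-not-fragment test checks.

    # From the affected ESXi host via SSH: list VMkernel interfaces and find the TEP vmk (often vmk10/vmk11)
    esxcli network ip interface ipv4 get
    # Ping a TEP in the other region over the NSX (vxlan) netstack with a 1572-byte payload and DF set
    vmkping ++netstack=vxlan -I vmk10 -d -s 1572 192.168.20.11
    # If tunnels remain down, restart host services so NSX services come up in the correct order
    services.sh restart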
- Create four NSX overlay segments.
- Ensure these segments are visible and synchronized across all ESXi hosts in both sites.
- Optionally, configure DHCP settings for the new overlay segments.
- DNS settings have already been configured earlier in this guide and do not need to be repeated here.
- Deploy four VMs, placing one on each host across both regions.
- Assign each VM a static IP address within the respective segment range.
- Ping the segment gateways and ping between VMs to validate L3 overlay connectivity across the stretched environment, as shown in the sketch below.
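A simple validation pass from one of the test VMs might look like the following; the segment gateway and VM addresses are placeholders for whatever ranges you assigned to the four overlay segments.

    # From a test VM in OCI Dedicated Region A
    ping 10.10.1.1     # gateway of its own overlay segment
    ping 10.10.1.12    # VM on the same segment running on a Region B host
    ping 10.10.2.11    # VM on a different overlay segment (exercises Tier-1 routing)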
Enable External Connectivity for Overlay VMs
To allow VMware NSX overlay VMs to access external networks, configure NAT rules and routing for the relevant VLANs.
In both VCN-MGMT-Active and VCN-MGMT-Failover, update the NAT configuration for the NSX Edge Uplink 1 VLAN:
- Use the same external access IPs in both regions, matching those used during the OCI Dedicated Region A deployment.
- Confirm the IP used is the HA VIP for the NSX Edge nodes, visible in NSX Manager.
Also update external access rules for the vSphere VLANs:
- Configure NAT rules for vcenter-vip, nsxt-manager-vip, and hcx-manager-vip (if HCX is used) in both VCNs.
DNS Forwarding Support
Overlay VMs typically use a DNS forwarder (e.g., 192.168.253.253) defined in NSX-T. To route these DNS queries:
- Create a dedicated route table for the NAT Gateway.
- Define a static route: Destination: 10.x.x.x (overlay VM subnets), Target: NAT Gateway.
- DNS Forwarder IP: 192.168.253.253
This configuration must be replicated in both sites. Associate the new route table with the NAT Gateway for consistent behavior.
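The route table and its association with the NAT Gateway can also be created with the OCI CLI. The sketch below uses placeholder OCIDs and an example overlay CIDR; the route target OCID must be whichever network entity your design uses for the rule described above, and the flag names should be confirmed with the CLI help for your version.

    # Create the dedicated route table with a static route for the overlay VM subnets
    oci network route-table create \
      --compartment-id ocid1.compartment.oc1..example \
      --vcn-id ocid1.vcn.oc1..example \
      --display-name "rt-natgw-dns-forwarding" \
      --route-rules '[{"destination": "10.0.0.0/16", "destinationType": "CIDR_BLOCK", "networkEntityId": "ocid1.<target>.oc1..example"}]'

    # Associate the new route table with the NAT Gateway (repeat in the second region)
    oci network nat-gateway update \
      --nat-gateway-id ocid1.natgateway.oc1..example \
      --route-table-id ocid1.routetable.oc1..example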
Reassign ESXi Host VLANs to the Floating VCN
In the current setup, each ESXi host is provisioned with two physical NICs, each associated with a default set of VLANs configured via VNIC attachments to VCN-Primary (OCI Dedicated Region A) and VCN-Secondary (OCI Dedicated Region B). These VNICs are configured using the secondary CIDR block (172.45.0.0/16) attached to the respective VCNs.
The VNIC attachments now need to be reassigned to the floating management VCNs:
- VCN-MGMT-Active in OCI Dedicated Region A
- VCN-MGMT-Failover in OCI Dedicated Region B
Migrate VNICs to the Floating VCN
- Access ESXi Host Details: In the OCI Console, go to Compute, ESXi Hosts.
- Delete Existing VNIC Attachments: For each host, delete the VNICs associated with VLANs 201 and above from VCN-Primary or VCN-Secondary.
Note: This step is mandatory because a new VNIC cannot be created for the same VLAN while the old one exists.
- Recreate VNICs in the Floating VCN:
- Create a new VNIC for each VLAN in the corresponding floating VCN:
- Use VCN-MGMT-Active in OCI Dedicated Region A
- Use VCN-MGMT-Failover in OCI Dedicated Region B
- Select the VLAN tagged with the appropriate -NEW suffix to distinguish it from the original.
Repeat this process for both VNICs per host. We recommend a systematic approach: start with vnic0 and vnic1 for VLAN 201, complete the replacements, then proceed with the next VLAN.
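The per-host workflow can also be driven from the OCI CLI. The commands below are a sketch with placeholder OCIDs; in particular, confirm the attach-vnic flags (such as --vlan-id and --nic-index) against oci compute instance attach-vnic --help in your CLI version.

    # 1. List the VNIC attachments of the ESXi host and note the one for the VLAN being migrated
    oci compute vnic-attachment list \
      --compartment-id ocid1.compartment.oc1..example \
      --instance-id ocid1.instance.oc1..example

    # 2. Delete the old attachment (VLAN 201 and above in VCN-Primary / VCN-Secondary)
    oci compute instance detach-vnic --vnic-id ocid1.vnic.oc1..example

    # 3. Recreate the VNIC on the matching -NEW VLAN in the floating VCN
    oci compute instance attach-vnic \
      --instance-id ocid1.instance.oc1..example \
      --vlan-id ocid1.vlan.oc1..example \
      --nic-index 0   # 0 for vnic0, 1 for vnic1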
Special Considerations for Secondary Site Hosts
After migrating the VNICs for hosts in the Primary Site, repeat the process for all hosts in the Secondary Site. However, note one key detail:
- The vSphere management components in the Secondary Site were initially deployed on a temporary VLAN (e.g., VLAN-Stretched-Cls-Mgmt-vSphere-TEMP).
- This temporary VLAN can remain in place during the transition. It does not impact stretched vSAN functionality and offers fallback access to vCenter and NSX components if needed.
Retaining this temporary VLAN ensures uninterrupted management access during the VNIC and network migration workflow.
Connectivity Impact and Recovery
During VNIC updates, temporary loss of connectivity to vCenter, NSX Manager, or ESXi hosts is expected. To ensure recovery:
- Verify DRG Attachments: Confirm that the appropriate Management VCNs (both Active and Failover) are attached to their respective Dynamic Routing Gateways (DRGs).
- Update Route Tables:
- Update the default route table in each Management VCN to point to the DRG.
- Update Bastion subnet route tables to ensure management traffic is routed correctly between VCNs and across regions.
- Validate Access:
- After updating routes, access to all management interfaces from the Bastion host should be restored.
- If any resources remain unreachable, double-check NSG rules and route propagation between VCNs.
Post-VNIC Migration Clean-up
Once the VNIC migration is complete:
- Remove all unused VLANs from VCN-Primary and VCN-Secondary that belong to the 172.45.0.0/16 CIDR block.
- Detach the secondary CIDR (172.45.0.0/16) from VCN-Primary, since it is no longer in use. OCI will allow the CIDR detachment only when no active resources (VNICs, subnets, or VLANs) use it (see the sketch after this list).
- You may observe warning indicators on the SDDC resource page in the OCI Console. This is expected, as the Oracle Cloud VMware Solution service is no longer tracking components that were initially deployed in VCN-Primary.
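Detaching the secondary CIDR can be done from the OCI CLI once nothing references it; the command below is a sketch with a placeholder VCN OCID, assuming the remove-vcn-cidr operation available in current CLI releases.

    # Fails while any VLAN, subnet, or VNIC still uses 172.45.0.0/16
    oci network vcn remove-vcn-cidr \
      --vcn-id ocid1.vcn.oc1..example \
      --cidr-block 172.45.0.0/16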
Update Routing for New VCN Attachments
- Attach VCN-MGMT-Active to the DRG in OCI Dedicated Region A.
- Update route tables:
- For VCN-MGMT-Active: point the default route (0.0.0.0/0) to the DRG.
- For the Bastion subnet in VCN-Primary: update its route table to point to the DRG to ensure it can still access VMware vCenter and VMware NSX Manager.
After these changes are made, VMware vCenter and VMware NSX Manager in OCI Dedicated Region A should become reachable from the Bastion host, even though their underlying interfaces now reside in a different VCN.
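Both the DRG attachment and the route changes can be scripted; the sketch below uses placeholder OCIDs and overwrites the route table's rules with a single default route, so merge rather than replace if the table already carries other rules.

    # Attach VCN-MGMT-Active to the DRG in OCI Dedicated Region A
    oci network drg-attachment create \
      --drg-id ocid1.drg.oc1..example \
      --vcn-id ocid1.vcn.oc1..example

    # Point the VCN's default route at the DRG (replaces the existing rule list)
    oci network route-table update \
      --rt-id ocid1.routetable.oc1..example \
      --route-rules '[{"destination": "0.0.0.0/0", "destinationType": "CIDR_BLOCK", "networkEntityId": "ocid1.drg.oc1..example"}]' \
      --force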
Configure DRS Affinity Rules, HA, and VMware vSAN Storage Policy
Once the stretched cluster is fully operational and networking is stable across both sites, configure Distributed Resource Scheduler (DRS), High Availability (HA), and assign a site-aware VMware vSAN storage policy to the workload and management virtual machines (VMs).
These configurations ensure optimal placement of VMs across fault domains and enable automatic recovery during site failures.
Migrate VMs to the Stretched Cluster
Begin by migrating all management VMs and test workload VMs into the newly created stretched cluster:
- Use vMotion to move the VMs from their original site-specific clusters to the stretched cluster.
- If everything is configured correctly (networking, storage, port groups), VM migration should complete without any issue.
If default NSX DRS rules exist and are set to MUST, remove them. These can interfere with HA operations and prevent failover of NSX Edge nodes and NSX Manager VMs.
Create VM and Host Groups
Define affinity groups for workload placement:
- Create Host Groups:
- Group hosts belonging to the Primary Site.
- Group hosts belonging to the Secondary Site.
- Create VM Groups:
- Group Management VMs that must reside on hosts within each site (such as vCenter, NSX Managers, NSX Edge nodes, HCX Manager and others if applicable).
- Similarly, group all Workload VMs together (in our case all the test VMs).
Define VM/Host Affinity Rules
After groups are defined:
- Create VM-to-Host affinity rules to keep VMs located on hosts in their intended site.
- Use Should run on hosts in group rules (rather than Must run on hosts in group) to allow HA flexibility in failover scenarios.
- Create such rules for both the Management VM and Workload VM Groups.
This setup ensures that during normal operation, each site hosts its intended workloads, but allows automatic recovery in the event of a host or site failure.
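If you want to script the groups and rules instead of using the vSphere Client, govc (the vSphere CLI from the govmomi project) can create them. The cluster, host, VM, and group names below are hypothetical, and the flag names reflect recent govc releases, so verify them with govc cluster.rule.create -h before relying on them.

    # Connection details (placeholders)
    export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='***' GOVC_INSECURE=1

    # Host groups per site
    govc cluster.group.create -cluster Stretched-Cluster -name hosts-region-a -host esxi-a1 esxi-a2 esxi-a3
    govc cluster.group.create -cluster Stretched-Cluster -name hosts-region-b -host esxi-b1 esxi-b2 esxi-b3

    # Management VM group for Region A
    govc cluster.group.create -cluster Stretched-Cluster -name mgmt-vms-region-a -vm vcenter-vm nsx-mgr-a nsx-edge-a

    # "Should run on hosts in group" rule (omit -mandatory so HA can restart VMs on the other site)
    govc cluster.rule.create -cluster Stretched-Cluster -name mgmt-a-on-region-a-hosts -enable \
      -vm-host -vm-group mgmt-vms-region-a -host-affine-group hosts-region-a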
- Ensure HA is enabled at the cluster level after affinity rules are in place.
- The default Host Failure Response option, Restart VMs, ensures that VMs are restarted after unexpected failures, including full site outages.
Create and Apply a Stretched vSAN Storage Policy
To ensure data redundancy across both sites in a stretched configuration, define a new vSAN Storage-Based Policy Management (SBPM) policy. This policy will control how VM data is distributed across the fault domains and the witness site.
Configure the following placement rules within the policy:
- Storage Type: vSAN
- Site Disaster Tolerance: Site mirroring – stretched cluster
- Failures to Tolerate: No data redundancy
- Number of Disk Stripes per Object: 1
- IOPS Limit for Object: 0
Leave all other options at their default values.
Once the policy is created:
- Apply the policy to all test and management VMs within the stretched cluster.
- Navigate to Monitor, vSAN, Resyncing Objects to observe and track the resync process.
- After resync completes, verify object placement to confirm the policy is functioning as expected:
- One replica object is located on the Primary Site
- A second replica object is located on the Secondary Site
- The witness component resides in the remote Witness region
All VMs will initially appear as non-compliant. Select each VM or a group of VMs and manually assign the newly created stretched policy to bring them into compliance.
Additionally, navigate to Monitor, vSAN, Resyncing Objects and Virtual Objects. Once the resyncing process is complete, you should observe that each VM’s virtual objects are distributed correctly across the Primary Site, Secondary Site, and Witness node, validating full compliance with the stretched cluster design.