Create the Stretched VMware vSAN Cluster
With all prerequisite configurations complete, you can now proceed to create the VMware vSAN stretched cluster. This step formalizes the connection between hosts across OCI Dedicated Region A and OCI Dedicated Region B, as well as the Witness node deployed in a third region.
You can use the Quickstart wizard or navigate directly to: Cluster, Configure, vSAN, Fault Domains and Stretched Cluster in the VMware vCenter UI.
Configure the following during this process:
- Assign OCI Dedicated Region A hosts to Fault Domain 1
- Assign OCI Dedicated Region B hosts to Fault Domain 2
- Specify the Witness Host (previously added) for quorum
For more details, see Stretched Cluster Requirements and VMware vSAN Stretched Cluster Guide.
After the stretched cluster is created:
- Run the vSAN Health Checks to validate cluster integrity.
- Address any networking-related errors (e.g., MTU mismatch or routing issues); a quick MTU check is sketched below.
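If the health check flags MTU or connectivity problems on the vSAN network, a quick way to confirm jumbo-frame reachability between fault domains is to run vmkping from an ESXi host over the vSAN VMkernel interface. This is only a minimal sketch; the vmk number (vmk1) and the remote host IP are placeholders for your environment.

    # From an ESXi host via SSH: identify the vSAN VMkernel interface and its IP
    esxcli network ip interface ipv4 get
    # Confirm the host has joined the stretched cluster
    esxcli vsan cluster get
    # Test reachability to a host in the other fault domain, then repeat with do-not-fragment
    # and an 8972-byte payload to verify the 9000-byte MTU end to end
    vmkping -I vmk1 10.1.1.12
    vmkping -I vmk1 -d -s 8972 10.1.1.12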
Note:
You might encounter stale vSAN objects on certain hosts from their original clusters. Refer to this guide to remove them: How to Delete Inaccessible Objects in vSAN Datastore; a brief clean-up sketch follows this note. When complete, the cluster should report a vSAN health score in the high 90s, indicating a successful stretched configuration.
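The referenced guide covers the clean-up in detail; the lines below are only a rough sketch of the approach it describes, run over SSH on the affected host. The object UUID is a placeholder and the objtool path can differ between ESXi builds, so follow the guide rather than this sketch for the exact procedure.

    # List vSAN objects on the host and note any reported as inaccessible
    esxcli vsan debug object list | less
    # Delete one inaccessible object by UUID (placeholder UUID shown)
    /usr/lib/vmware/osfs/bin/objtool delete -u <object-uuid> -f -v 10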
Configure NSX
With the VMware vSAN cluster stretched, update VMware NSX to support cross-site overlay networking. This step ensures ESXi hosts from both regions can communicate over NSX tunnels using their respective transport zones.
- Copy the NSX TEP IP Pool from the OCI Dedicated Region B NSX Manager to the OCI Dedicated Region A NSX Manager.
- To avoid IP conflicts with the management ESXi host still present in OCI Dedicated Region B, configure the new IP Pool in OCI Dedicated Region A to start from .10.
Example: In OCI Dedicated Region A NSX Manager, create a TEP Pool with a range of .10–.20 for OCI Dedicated Region B hosts to ensure no overlap with existing IPs (an API sketch follows this list).
- In the OCI Dedicated Region A NSX Manager, define a new Uplink Profile specifically for OCI Dedicated Region B hosts.
- Use the correct VLAN ID and ensure the uplink order matches the OCI Dedicated Region B configuration.
- Use OVERLAY-TZ and VLAN-TZ as the transport zones.
- During host preparation, assign the appropriate Uplink Profile depending on whether the host is from OCI Dedicated Region A or OCI Dedicated Region B.
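If you prefer to script the TEP pool rather than use the NSX Manager UI, the calls below sketch the equivalent NSX-T Policy API requests. The manager FQDN, pool and subnet IDs, and the CIDR/range values are hypothetical, and the endpoint paths and field names should be verified against the API guide for your NSX version.

    # Create or update the TEP pool for Region B hosts on the Region A NSX Manager
    curl -k -u admin -X PATCH \
      "https://nsx-manager-a.example.com/policy/api/v1/infra/ip-pools/TEP-Pool-Region-B" \
      -H "Content-Type: application/json" \
      -d '{"display_name": "TEP-Pool-Region-B"}'

    # Add a static subnet whose allocation range starts at .10 to avoid the management host TEP IPs
    curl -k -u admin -X PATCH \
      "https://nsx-manager-a.example.com/policy/api/v1/infra/ip-pools/TEP-Pool-Region-B/ip-subnets/range-1" \
      -H "Content-Type: application/json" \
      -d '{"resource_type": "IpAddressPoolStaticSubnet", "cidr": "192.168.20.0/24", "allocation_ranges": [{"start": "192.168.20.10", "end": "192.168.20.20"}]}'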
Note: In some scenarios, especially after a failover event, NSX tunnel interfaces may not come up correctly. To resolve this:
- Reboot the affected ESXi host, or
- Run services.sh restart via SSH on the host.
This ensures all NSX services start in the correct order and restore tunnel stability.
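Before or after restarting services, it can help to confirm basic TEP-to-TEP reachability from the affected host. The sketch below assumes the NSX TEP VMkernel interface is vmk10 and uses a placeholder remote TEP IP; Geneve requires an MTU of at least 1600, which the do-not-fragment test checks.

    # From the affected ESXi host via SSH: list VMkernel interfaces and find the TEP vmk (often vmk10/vmk11)
    esxcli network ip interface ipv4 get
    # Ping a TEP in the other region over the NSX (vxlan) netstack with a 1572-byte payload and DF set
    vmkping ++netstack=vxlan -I vmk10 -d -s 1572 192.168.20.11
    # If tunnels remain down, restart host services so NSX services come up in the correct order
    services.sh restart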
- Create four NSX overlay segments.
- Ensure these segments are visible and synchronized across all ESXi hosts in both sites.
- Optionally, configure DHCP settings for the new overlay segments.
- DNS settings have already been configured earlier in this guide and do not need to be repeated here.
- Deploy four VMs, placing one on each host across both regions.
- Assign each VM a static IP address within the respective segment range.
- Ping the segment gateways and ping between VMs to validate L3 overlay connectivity across the stretched environment, as shown in the sketch below.
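A simple validation pass from one of the test VMs might look like the following; the segment gateway and VM addresses are placeholders for whatever ranges you assigned to the four overlay segments.

    # From a test VM in OCI Dedicated Region A
    ping 10.10.1.1     # gateway of its own overlay segment
    ping 10.10.1.12    # VM on the same segment running on a Region B host
    ping 10.10.2.11    # VM on a different overlay segment (exercises Tier-1 routing)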
Enable External Connectivity for Overlay VMs
To allow VMware NSX overlay VMs to access external networks, configure NAT rules and routing for the relevant VLANs.
In both VCN-MGMT-Active and VCN-MGMT-Failover, update the NAT configuration for the NSX Edge Uplink 1 VLAN:
- Use the same external access IPs in both regions, matching those used during the OCI Dedicated Region A deployment.
- Confirm the IP used is the HA VIP for the NSX Edge nodes, visible in NSX Manager.
Also update external access rules for the vSphere VLANs:
- Configure NAT rules for vcenter-vip, nsxt-manager-vip, and hcx-manager-vip (if HCX is used) in both VCNs.
DNS Forwarding Support
Overlay VMs typically use a DNS forwarder (e.g., 192.168.253.253) defined in NSX-T. To route these DNS queries:
- Create a dedicated route table for the NAT Gateway.
- Define a static route: Destination: 10.x.x.x (overlay VM subnets), Target: NAT Gateway.
- DNS Forwarder IP: 192.168.253.253
This configuration must be replicated in both sites. Associate the new route table with the NAT Gateway for consistent behavior.
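The route table and its association with the NAT Gateway can also be created with the OCI CLI. The sketch below uses placeholder OCIDs and an example overlay CIDR; the route target OCID must be whichever network entity your design uses for the rule described above, and the flag names should be confirmed with the CLI help for your version.

    # Create the dedicated route table with a static route for the overlay VM subnets
    oci network route-table create \
      --compartment-id ocid1.compartment.oc1..example \
      --vcn-id ocid1.vcn.oc1..example \
      --display-name "rt-natgw-dns-forwarding" \
      --route-rules '[{"destination": "10.0.0.0/16", "destinationType": "CIDR_BLOCK", "networkEntityId": "ocid1.<target>.oc1..example"}]'

    # Associate the new route table with the NAT Gateway (repeat in the second region)
    oci network nat-gateway update \
      --nat-gateway-id ocid1.natgateway.oc1..example \
      --route-table-id ocid1.routetable.oc1..example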
Reassign ESXi Host VLANs to the Floating VCN
In the current setup, each ESXi host is provisioned with two physical NICs, each associated with a default set of VLANs configured via VNIC attachments to VCN-Primary (OCI Dedicated Region A) and VCN-Secondary (OCI Dedicated Region B). These VNICs are configured using the secondary CIDR block (172.45.0.0/16) attached to the respective VCNs.
The VNIC attachments now need to be reassigned to the floating management VCNs:
- VCN-MGMT-Active in OCI Dedicated Region A
- VCN-MGMT-Failover in OCI Dedicated Region B
Migrate VNICs to the Floating VCN
- Access ESXi Host Details: In the OCI Console, go to Compute, ESXi Hosts.
- Delete Existing VNIC Attachments: For each host, delete the VNICs associated with VLANs 201 and above from VCN-Primary or VCN-Secondary.
Note: This step is mandatory because a new VNIC cannot be created for the same VLAN while the old one exists.
- Recreate VNICs in the Floating VCN:
- Create a new VNIC for each VLAN in the corresponding floating VCN:
- Use VCN-MGMT-Active in OCI Dedicated Region A
- Use VCN-MGMT-Failover in OCI Dedicated Region B
- Select the VLAN tagged with the appropriate -NEW suffix to distinguish it from the original.
Repeat this process for both VNICs per host. We recommend a systematic approach: start with vnic0 and vnic1 for VLAN 201, complete the replacements, then proceed with the next VLAN.
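The per-host workflow can also be driven from the OCI CLI. The commands below are a sketch with placeholder OCIDs; in particular, confirm the attach-vnic flags (such as --vlan-id and --nic-index) against oci compute instance attach-vnic --help in your CLI version.

    # 1. List the VNIC attachments of the ESXi host and note the one for the VLAN being migrated
    oci compute vnic-attachment list \
      --compartment-id ocid1.compartment.oc1..example \
      --instance-id ocid1.instance.oc1..example

    # 2. Delete the old attachment (VLAN 201 and above in VCN-Primary / VCN-Secondary)
    oci compute instance detach-vnic --vnic-id ocid1.vnic.oc1..example

    # 3. Recreate the VNIC on the matching -NEW VLAN in the floating VCN
    oci compute instance attach-vnic \
      --instance-id ocid1.instance.oc1..example \
      --vlan-id ocid1.vlan.oc1..example \
      --nic-index 0   # 0 for vnic0, 1 for vnic1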
Special Considerations for Secondary Site Hosts
After migrating the VNICs for hosts in the Primary Site, repeat the process for all hosts in the Secondary Site. However, note one key detail:
- The vSphere management components in the Secondary Site were initially deployed on a temporary VLAN (e.g., VLAN-Stretched-Cls-Mgmt-vSphere-TEMP).
- This temporary VLAN can remain in place during the transition. It does not impact stretched vSAN functionality and offers fallback access to vCenter and NSX components if needed.
Retaining this temporary VLAN ensures uninterrupted management access during the VNIC and network migration workflow.
Connectivity Impact and Recovery
During VNIC updates, temporary loss of connectivity to vCenter, NSX Manager, or ESXi hosts is expected. To ensure recovery:
- Verify DRG Attachments: Confirm that the appropriate Management VCNs (both Active and Failover) are attached to their respective Dynamic Routing Gateways (DRGs).
- Update Route Tables:
- Update the default route table in each Management VCN to point to the DRG.
- Update Bastion subnet route tables to ensure management traffic is routed correctly between VCNs and across regions.
- Validate Access:
- After updating routes, access to all management interfaces from the Bastion host should be restored.
- If any resources remain unreachable, double-check NSG rules and route propagation between VCNs.
Post-VNIC Migration Clean-up
Once the VNIC migration is complete:
- Remove all unused VLANs from VCN-Primary and VCN-Secondary that belong to the 172.45.0.0/16 CIDR block.
- Detach the secondary CIDR (172.45.0.0/16) from VCN-Primary, since it is no longer in use. OCI will allow the CIDR detachment only when no active resources (VNICs, subnets, or VLANs) use it (see the sketch after this list).
- You may observe warning indicators on the SDDC resource page in the OCI Console. This is expected, as the Oracle Cloud VMware Solution service is no longer tracking components that were initially deployed in VCN-Primary.
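Detaching the secondary CIDR can be done from the OCI CLI once nothing references it; the command below is a sketch with a placeholder VCN OCID, assuming the remove-vcn-cidr operation available in current CLI releases.

    # Fails while any VLAN, subnet, or VNIC still uses 172.45.0.0/16
    oci network vcn remove-vcn-cidr \
      --vcn-id ocid1.vcn.oc1..example \
      --cidr-block 172.45.0.0/16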
Update Routing for New VCN Attachments
- Attach VCN-MGMT-Active to the DRG in OCI Dedicated Region A.
- Update route tables:
- For VCN-MGMT-Active: point the default route (0.0.0.0/0) to the DRG.
- For the Bastion subnet in VCN-Primary: update its route table to point to the DRG to ensure it can still access VMware vCenter and VMware NSX Manager.
After these changes are made, VMware vCenter and VMware NSX Manager in OCI Dedicated Region A should become reachable from the Bastion host, even though their underlying interfaces now reside in a different VCN.
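Both the DRG attachment and the route changes can be scripted; the sketch below uses placeholder OCIDs and overwrites the route table's rules with a single default route, so merge rather than replace if the table already carries other rules.

    # Attach VCN-MGMT-Active to the DRG in OCI Dedicated Region A
    oci network drg-attachment create \
      --drg-id ocid1.drg.oc1..example \
      --vcn-id ocid1.vcn.oc1..example

    # Point the VCN's default route at the DRG (replaces the existing rule list)
    oci network route-table update \
      --rt-id ocid1.routetable.oc1..example \
      --route-rules '[{"destination": "0.0.0.0/0", "destinationType": "CIDR_BLOCK", "networkEntityId": "ocid1.drg.oc1..example"}]' \
      --force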
Configure DRS Affinity Rules, HA, and VMware vSAN Storage Policy
Once the stretched cluster is fully operational and networking is stable across both sites, configure Distributed Resource Scheduler (DRS), High Availability (HA), and assign a site-aware VMware vSAN storage policy to the workload and management virtual machines (VMs).
These configurations ensure optimal placement of VMs across fault domains and enable automatic recovery during site failures.
Migrate VMs to the Stretched Cluster
Begin by migrating all management VMs and test workload VMs into the newly created stretched cluster:
- Use vMotion to move the VMs from their original site-specific clusters to the stretched cluster.
- If everything is configured correctly (networking, storage, port groups), VM migration should complete without any issue.
If default NSX DRS rules exist and are set to MUST, remove them. These can interfere with HA operations and prevent failover of NSX Edge nodes and NSX Manager VMs.
Create VM and Host Groups
Define affinity groups for workload placement:
- Create Host Groups:
- Group hosts belonging to the Primary Site.
- Group hosts belonging to the Secondary Site.
- Create VM Groups:
- Group Management VMs that must reside on hosts within each site (such as vCenter, NSX Managers, NSX Edge nodes, HCX Manager and others if applicable).
- Similarly, group all Workload VMs together (in our case all the test VMs).
Define VM/Host Affinity Rules
After groups are defined:
- Create VM-to-Host affinity rules to keep VMs located on hosts in their intended site.
- Use Should run on hosts in group rules (rather than Must run on hosts in group) to allow HA flexibility in failover scenarios.
- Create such rules for both the Management VM and Workload VM Groups.
This setup ensures that during normal operation, each site hosts its intended workloads, but allows automatic recovery in the event of a host or site failure.
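If you want to script the groups and rules instead of using the vSphere Client, govc (the vSphere CLI from the govmomi project) can create them. The cluster, host, VM, and group names below are hypothetical, and the flag names reflect recent govc releases, so verify them with govc cluster.rule.create -h before relying on them.

    # Connection details (placeholders)
    export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='***' GOVC_INSECURE=1

    # Host groups per site
    govc cluster.group.create -cluster Stretched-Cluster -name hosts-region-a -host esxi-a1 esxi-a2 esxi-a3
    govc cluster.group.create -cluster Stretched-Cluster -name hosts-region-b -host esxi-b1 esxi-b2 esxi-b3

    # Management VM group for Region A
    govc cluster.group.create -cluster Stretched-Cluster -name mgmt-vms-region-a -vm vcenter-vm nsx-mgr-a nsx-edge-a

    # "Should run on hosts in group" rule (omit -mandatory so HA can restart VMs on the other site)
    govc cluster.rule.create -cluster Stretched-Cluster -name mgmt-a-on-region-a-hosts -enable \
      -vm-host -vm-group mgmt-vms-region-a -host-affine-group hosts-region-a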
- Ensure HA is enabled at the cluster level after affinity rules are in place.
- The default Host Failure Response option, Restart VMs, ensures that VMs are restarted after unexpected failures, including full site outages.
Create and Apply a Stretched vSAN Storage Policy
To ensure data redundancy across both sites in a stretched configuration, define a new vSAN Storage-Based Policy Management (SBPM) policy. This policy will control how VM data is distributed across the fault domains and the witness site.
Configure the following placement rules within the policy:
- Storage Type: vSAN
- Site Disaster Tolerance: Site mirroring – stretched cluster
- Failures to Tolerate: No data redundancy
- Number of Disk Stripes per Object: 1
- IOPS Limit for Object: 0
Leave all other options at their default values.
Once the policy is created:
- Apply the policy to all test and management VMs within the stretched cluster.
- Navigate to Monitor, vSAN, Resyncing Objects to observe and track the resync process.
- After resync completes, verify object placement to confirm the policy is functioning as expected:
- One replica object is located on the Primary Site
- A second replica object is located on the Secondary Site
- The witness component resides in the remote Witness region
All VMs will initially appear as non-compliant. Select each VM or a group of VMs and manually assign the newly created stretched policy to bring them into compliance.
Additionally, navigate to Monitor, vSAN, Resyncing Objects and Virtual Objects. Once the resyncing process is complete, you should observe that each VM’s virtual objects are distributed correctly across the Primary Site, Secondary Site, and Witness node, validating full compliance with the stretched cluster design.