Networking Issues

This section describes known issues and workarounds related to all aspects of appliance networking: the system's internal connectivity, the external uplinks to the data center, and the virtual networking for users' compute instances.

Possible Impact to BGP Links After Upgrading to 3.0.2-b1010555 or Higher

When running BGP in a mesh configuration, BGP links may show an IDLE state or fail to connect after an upgrade to 3.0.2-b1010555 or later. If you use BGP in a mesh configuration and are currently running a release prior to 3.0.2-b1010555, contact Oracle support, who can assist in updating and correcting your uplink configuration before the upgrade. If you have already upgraded to 3.0.2-b1010555 or later and BGP links show the IDLE state, contact Oracle support for post-upgrade assistance.

Bug: 36525352

Version: 3.0.2

DNS Zone Scope Cannot Be Set

When creating or updating a DNS zone, scope cannot be set. In command line output, the value of the scope property is null.
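
For illustration, the following OCI CLI sketch (the zone name and compartment OCID are placeholders) attempts to create a zone with a scope, but the scope property in the returned output remains null:

$ oci dns zone create --compartment-id <compartment_OCID> \
    --name example.pca.local --zone-type PRIMARY --scope PRIVATE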

Bug: 32998565

Version: 3.0.1

To Update a DNS Record the Command Must Include Existing Protected Records

When updating a DNS record, you are expected to include all existing protected records in the update command, even if your update does not affect them. This requirement is intended to prevent the existing protected records from being inadvertently deleted. However, the checks are so restrictive with regard to SOA records that certain updates are difficult to achieve.

Workaround: You can update existing records either by providing the SOA record as part of the command, or by scoping the update to a domain that does not include the SOA record. In practice, most record updates are performed at the domain level, below the zone apex, and are not affected by these restrictions.
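
As a sketch of the second approach, an update scoped to a domain below the zone apex does not touch the SOA record (assuming the OCI CLI; the names and address are placeholders):

$ oci dns record domain update --zone-name-or-id example.pca.local \
    --domain host1.example.pca.local \
    --items '[{"domain": "host1.example.pca.local", "rtype": "A", "rdata": "203.0.113.10", "ttl": 300}]'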

Bug: 33089111

Version: 3.0.1

Fix available: Please apply the latest patches to your system.

Create Route Table Fails With Confusing Error Message

When you create a route table but make a mistake in the route rule parameters, the API server may return a misleading error message. The message reads: "Route table target should be one of LPG, NAT gateway, Internet gateway, DRG attachment or Service gateway." In that list of possible targets, DRG attachment is incorrect: the target must be the dynamic routing gateway (DRG) itself, not its DRG attachment.

Workaround: Ignore the error message in question. When configuring route rules to send traffic through a dynamic routing gateway, specify the DRG as the target.
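
For example, a route rule that sends traffic through a dynamic routing gateway references the DRG OCID directly as the network entity (a sketch; the OCIDs are placeholders):

$ oci network route-table create --compartment-id <compartment_OCID> \
    --vcn-id <vcn_OCID> \
    --route-rules '[{"destination": "0.0.0.0/0", "destinationType": "CIDR_BLOCK", "networkEntityId": "<drg_OCID>"}]'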

Bug: 33570320

Version: 3.0.1

VCN Creation Uses Deprecated Parameter

When creating a VCN, you typically specify the CIDR range it covers. In the Compute Web UI, you simply enter this in the applicable field. However, the CLI provides two command parameters: --cidr-block, which is now deprecated, and --cidr-blocks, which is the new parameter meant to replace it. When using the OCI CLI with Private Cloud Appliance, you must use --cidr-block. The new parameter is not supported by the API server.

Workaround: Ignore any warning messages about the deprecated parameter. Use the --cidr-block parameter when specifying the CIDR range used by a VCN.
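
For example (the OCIDs, name, and CIDR are placeholders; the CLI may print a deprecation warning for --cidr-block, which can be ignored):

$ oci network vcn create --compartment-id <compartment_OCID> \
    --display-name myVCN --cidr-block 10.25.0.0/16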

Bug: 33620672

Version: 3.0.1

File Storage Traffic Blocked By Security Rules

To allow users to mount file systems on their instances, security rules must be configured in addition to those in the default security list, so that the necessary network traffic is allowed between mount targets and instances. Configuring file storage ports and protocols in Oracle Private Cloud Appliance is further complicated by the underlay network architecture, which can block file storage traffic unexpectedly unless the source and destination of security rules are set up in a very specific way.

Scenario A – If the mount target and instances using the file system service reside in the same subnet, create a security list and attach it to the subnet in addition to the default security list. The new security list must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source            Protocol     Source Ports            Destination Ports
------            --------     ------------            -----------------
<subnet CIDR>     TCP          All                     111, 389, 445, 4045,
                                                       2048-2050, 20048
<subnet CIDR>     UDP          All                     111, 389, 445, 2048,
                                                       4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination       Protocol     Source Ports            Destination Ports
-----------       --------     ------------            -----------------
<subnet CIDR>     TCP          111, 389, 445, 4045,    All
                               2048-2050, 20048
<subnet CIDR>     TCP          All                     111, 389, 445, 4045,
                                                       2048-2050, 20048
<subnet CIDR>     UDP          111, 389, 445,          All
                               4045, 20048
<subnet CIDR>     UDP          All                     111, 389, 445,
                                                       4045, 20048

Scenario B – If the mount target and instances using the file system service reside in different subnets, create a new security list for each subnet, and attach them to the respective subnet in addition to the default security list.

The new security list for the subnet containing the mount target must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source                        Protocol     Source Ports            Destination Ports
------                        --------     ------------            -----------------
<instances subnet CIDR>       TCP          All                     111, 389, 445, 4045,
                                                                   2048-2050, 20048
<instances subnet CIDR>       UDP          All                     111, 389, 445, 2048,
                                                                   4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination                   Protocol     Source Ports            Destination Ports
-----------                   --------     ------------            -----------------
<instances subnet CIDR>       TCP          111, 389, 445, 4045,    All
                                           2048-2050, 20048
<instances subnet CIDR>       UDP          111, 389, 445,          All
                                           4045, 20048

The new security list for the subnet containing the instances using the file system service must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source                        Protocol     Source Ports            Destination Ports
------                        --------     ------------            -----------------
<mount target subnet CIDR>    TCP          111, 389, 445, 4045,    All
                                           2048-2050, 20048
<mount target subnet CIDR>    UDP          111, 389, 445, 2048,    All
                                           4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination                   Protocol     Source Ports            Destination Ports
-----------                   --------     ------------            -----------------
<mount target subnet CIDR>    TCP          All                     111, 389, 445, 4045,
                                                                   2048-2050, 20048
<mount target subnet CIDR>    UDP          All                     111, 389, 445,
                                                                   4045, 20048

Workaround: Follow the guidelines provided here to configure ingress and egress rules that enable file system service traffic. If the unmodified default security list is already attached, the proposed egress rules do not need to be added, because there already is a default stateful security rule that allows all egress traffic (destination: 0.0.0.0/0, protocol: all).
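
As an illustration of how such rules translate to the OCI CLI, the following sketch creates a security list with one TCP and one UDP ingress rule for scenario A; a complete configuration needs one rule per listed port or port range and protocol. The OCIDs, display name, and CIDR are placeholders, and the empty egress list assumes the default security list's allow-all egress rule is in effect:

$ oci network security-list create --compartment-id <compartment_OCID> \
    --vcn-id <vcn_OCID> --display-name fss-seclist \
    --ingress-security-rules '[
      {"source": "<subnet CIDR>", "protocol": "6", "isStateless": false,
       "tcpOptions": {"destinationPortRange": {"min": 2048, "max": 2050}}},
      {"source": "<subnet CIDR>", "protocol": "17", "isStateless": false,
       "udpOptions": {"destinationPortRange": {"min": 2048, "max": 2048}}}
    ]' \
    --egress-security-rules '[]'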

Bug: 33680750

Version: 3.0.1

Stateful and Stateless Security Rules Cannot Be Combined

The appliance allows you to configure a combination of stateful and stateless security rules in your tenancy. The access control lists generated from those security rules are correct, but they may be interpreted incorrectly in the virtual underlay network. As a result, certain traffic may be blocked or allowed inadvertently. Therefore, it is recommended that you use either stateful or stateless security rules, not a combination of both.

Workaround: This behavior is expected; it is not considered a bug. Whenever possible, create security rules that are either all stateful or all stateless.

Note:

If you have a specific need, you can combine stateful and stateless rules. However, the stateless rules must be symmetrical: you cannot use a stateless egress rule and a stateful ingress rule for the same traffic flow.
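
For example, a symmetrical stateless pair for SSH traffic into a subnet looks as follows; the egress rule matches the return traffic by its source port (a sketch in security rule JSON; the CIDR is a placeholder):

Ingress: {"source": "<client CIDR>", "protocol": "6", "isStateless": true,
          "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}}}
Egress:  {"destination": "<client CIDR>", "protocol": "6", "isStateless": true,
          "tcpOptions": {"sourcePortRange": {"min": 22, "max": 22}}}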

Bug: 33744232

Version: 3.0.1

Routing Failure With Public IPs Configured as CIDR During System Initialization

When you complete the initial setup procedure on the appliance (see "Complete the Initial Setup" in the chapter Configuring Oracle Private Cloud Appliance of the Oracle Private Cloud Appliance Installation Guide), one of the final steps is to define the data center IP addresses that will be assigned as public IPs to your cloud resources. If you selected BGP-based dynamic routing, the public IPs may not be advertised correctly when defined as one or more CIDRs, and thus may not be reachable from outside the appliance.

Workaround: To ensure that your cloud resources' public IPs can be reached from outside the appliance, specify all IP addresses individually with a /32 netmask. For example, instead of entering 192.168.100.0/24, submit a comma-separated list: 192.168.100.1/32,192.168.100.2/32,192.168.100.3/32,192.168.100.4/32, and so on.

Bug: 33765256

Version: 3.0.1

Fix available: Please apply the latest patches to your system.

Admin Network Cannot Be Used for Service Web UI Access

The purpose of the (optional) Administration network is to provide system administrators separate access to the Service Web UI. The current implementation of the Administration network is incomplete and cannot provide the correct access.

Workaround: None available. At this point, do not configure the Admin Network during initial configuration.

Bug: 34087174, 34038203

Version: 3.0.1

Network Configuration Fails During Initial Installation Procedure

After physical installation of the appliance rack, the system must be initialized and integrated into your data center environment before it is ready for use. This procedure is documented in the chapter titled "Configuring Oracle Private Cloud Appliance" of the Oracle Private Cloud Appliance Installation Guide. If the network configuration part of this procedure fails, for example due to issues with message transport or service pods, or errors returned by the switches, there are locks in place that must be rolled back manually before the operation can be retried.

Workaround: None available. Please contact Oracle for assistance.

If possible, confirm the state of the network configuration from the Service CLI.

PCA-ADMIN> show networkConfig                                                                                  
Data:
[...]
  Network Config Lifecycle State = FAILED

Bug: 34788596

Version: 3.0.2

External Certificates Not Allowed

At this time, Oracle Private Cloud Appliance does not allow the use of external CA-signed certificates.

Workaround: Please contact Oracle support for a workaround.

Bug: 33025681

Version: 3.0.2

DNS Entries on Oracle Linux 8 Instances Incorrect After Upgrade to Release 3.0.2

After the appliance software is upgraded to Release 3.0.2, the name resolution settings in the compute instance operating system are not automatically updated. Up-to-date network parameters are obtained when the instance's DHCP leases are renewed. Until then, due to the way Oracle Linux 8 responds to DNS server messages, it can fail to resolve short host names even though queries with FQDNs are successful. Oracle Linux 7 instances are not affected by this issue.

Workaround: Restart the DHCP client service (dhclient) on the command line of your Oracle Linux 8 instances. Rebooting the instance also resolves the issue.
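
A minimal sketch, assuming the leases are managed directly by dhclient (on instances where NetworkManager controls DHCP, rebooting is the simpler option):

$ sudo dhclient -r    # release the current DHCP leases
$ sudo dhclient       # request new leases, including updated DNS options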

Bug: 34918899

Version: 3.0.2

Network Load Balancer Does Not Report Detailed Backend Health Status

Users of Oracle Cloud Infrastructure might be familiar with the detailed health statuses it provides for the backend servers of network load balancers. When a backend server is not entirely healthy, the health check status indicates the nature of the problem, for example: connection failure, time-out, regex mismatch, I/O error, or invalid status code. Due to the specific load balancer implementation in Oracle Private Cloud Appliance, the Network Load Balancer service can only report whether a backend server is healthy (OK) or unhealthy (CRITICAL).

Workaround: There is no workaround. Backend health checks cannot provide extra status information.
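
For example, querying a backend's health with the OCI CLI (a sketch; the names and OCID are placeholders) returns only the coarse status, OK or CRITICAL, in the status field:

$ oci nlb backend-health get --network-load-balancer-id <nlb_OCID> \
    --backend-set-name myBackendSet --backend-name 10.0.0.5:8080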

Bug: 35993214

Version: 3.0.2

Route Table Stuck in Provisioning State Failure

When you update a route table that is associated as an attachment to a Dynamic Routing Gateway (DRG) so that it targets a Local Peering Gateway (LPG), this known issue can leave the route table stuck in the provisioning state. The log shows an error similar to the following:

{
  "timestamp": "2023-06-28T15:30:58.635+0000",
  "rid": "7FCCBAEBA62848878983FDA3098EE4DB/330fc100-b86f-4137-a1f9-2437a512b8e8/7b003c97-11c4-4e4a-8c9a-11861532db0d",
  "process": 1,
  "ocid": null,
  "levelname": "ERROR",
  "src_lineno": 481,
  "src_pathname": "/usr/lib/python3.6/site-packages/pcanwctl/framework.py",
  "message": "Exception on function call: update_route_table, error: (404, 'NotAuthorizedOrNotFound', 'No Subnet was found'), start exception rollback",
  "tag": "pca-nwctl.log"
}

Workaround: Delete and recreate the route table to avoid the error in the update routine.
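
A sketch of the workaround with the OCI CLI (the OCIDs and CIDR are placeholders; after recreating the table, associate it again wherever the old one was used):

$ oci network route-table delete --rt-id <route_table_OCID>
$ oci network route-table create --compartment-id <compartment_OCID> \
    --vcn-id <vcn_OCID> \
    --route-rules '[{"destination": "<peer VCN CIDR>", "destinationType": "CIDR_BLOCK", "networkEntityId": "<lpg_OCID>"}]'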

Bug: 35547644

Version: 3.0.2

Updating Route Table Using Terraform Fails Because DRG Is Not Attached

When deploying network resources with Terraform, a route table update can fail because an expected Dynamic Routing Gateway (DRG) attachment appears not to exist. Although the DRG is attached to the VCN, the attach operation has not fully completed when the command to update the route table is issued. The quick succession of commands through Terraform can expose this timing issue, but it is highly unlikely to occur as a result of manual user actions.

Workaround: Assuming the route table update failure is the result of a timing issue, repeating the route table update command is expected to succeed. Reapply the Terraform configuration or update the route table manually.

Bug: 36297777

Version: 3.0.2

Failure Executing Terraform Destroy Due to Route Table in Provisioning State

When you run a terraform destroy operation, it might fail because a route table object is still in 'provisioning' state instead of 'available'. This typically occurs when many updates are made to a route table in a short amount of time, resulting in commands taking longer to complete than expected by the Terraform provider. Strictly speaking, this is not a bug but rather a timing issue.

Workaround: Assuming the failure is the result of a timing issue, no route table or other resource is permanently stuck in 'provisioning' state. Repeating the terraform destroy command is expected to successfully remove the remaining objects. If necessary, increase the wait times for specific resources in your Terraform settings.
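
A minimal retry, on the assumption that the failure is transient:

$ terraform destroy    # repeating the command removes the objects that were still provisioning

If retries keep failing, a timeouts block on the affected resources, where the provider supports it, gives Terraform longer to observe the 'available' state.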

Bug: 36352218

Version: 3.0.2

When Configuring BGP Authentication the Password Is a Required Parameter

When the appliance uplinks to the data center network are configured for dynamic routing, two Autonomous Systems, meaning the spine switches on the appliance side and the ToR switches on the data center side, are set up as BGP (Border Gateway Protocol) peers. The sessions between the BGP peers can be protected with password-based authentication. BGP authentication can be enabled for the data network as well as the optional separate administration network.

You can set the BGP password using the setDay0DynamicRoutingParameters command in the Service CLI. Two command parameters must be provided for each network.

  • data network: bgpAuthentication=True and bgpPassword=<mypassword>

  • administration network: adminBgpAuthentication=True and adminBgpPassword=<mypassword>

However, the CLI accepts the command even if you set BGP authentication to "true" without providing a password. This has no adverse effects, but BGP authentication remains disabled.

Workaround: When you enable BGP authentication on the data network, and the administration network if present, make sure you also specify the BGP password as part of the command parameters.
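
For example, enabling authentication on both networks in a single Service CLI command (a sketch based on the parameters listed above; include the admin parameters only if an administration network is configured):

PCA-ADMIN> setDay0DynamicRoutingParameters bgpAuthentication=True bgpPassword=<mypassword> 
adminBgpAuthentication=True adminBgpPassword=<mypassword>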

Bug: 35737959

Version: 3.0.2

Uplink VRRP Mesh Configuration Sets Second Switch IP Incorrectly

When you try to configure the appliance data and administration network uplinks in mesh topology with VRRP (Virtual Router Redundancy Protocol), the command results in a CLI error. The problem occurs when the spine switches' second IP address is configured: the switch interprets the parameters as overlapping network settings and rejects them.

The following example shows an administration network with a 4-port mesh uplink topology. The same behavior applies to data network uplinks.

PCA-ADMIN> edit networkConfig enableAdminNetwork=True adminportcount=4 admintopology=MESH adminportspeed=10 
adminspine1Ip=10.1.1.97,10.1.1.98 adminspine2Ip=10.1.1.101,10.1.1.102 adminSpineVip=10.1.1.105 [...]

PCA-ADMIN> show networkconfig
Data: 
  [...]
Error:
UpdateFirstBootHandler: {'http_status_code': 500, 'code': 'InternalServerError', 'message': 'SwitchCliError on 100.96.2.20: 
overlapping network for ipv4 address: 10.1.1.98/28 on po46, 10.1.1.97/28 already configured on po45\\n for cmd [...]

Workaround: There is no workaround. This specific uplink configuration cannot be applied at this time.

Bug: 36063880

Version: 3.0.2

Real Application Cluster (RAC) Environment Loses Access to Oracle Exadata Database Instances During Appliance Upgrade

Oracle Real Application Cluster (RAC) environments are typically deployed with application servers running as compute instances on the Private Cloud Appliance, and database clusters on a directly connected Oracle Exadata system. When the appliance software is upgraded or patched, the RAC listener services can experience failures, making the database clusters inaccessible on the Exadata network.

If the RAC listener service is down, the Exadata database node is inaccessible for applications making new connections. However, existing sessions are not interrupted as long as another database node remains online.

Workaround: In most cases the Oracle Clusterware Agent recovers the listener service automatically after the outage caused by the appliance software upgrade. If not, a manual restart of the listener service is required to restore database access on a node. Use the lsnrctl utility from the Grid home as the Grid user.
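
For example, as the grid user, with the Grid home binaries on the PATH (LISTENER is the default listener name; yours may differ):

$ lsnrctl status LISTENER    # check whether the listener is running
$ lsnrctl start LISTENER     # restart the listener if it is down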

Bug: 36446341

Version: 3.0.2