Networking Issues

This section describes known issues and workarounds related to all aspects of the appliance networking, meaning the system's internal connectivity, the external uplinks to the data center, and the virtual networking for users' compute instances.

Possible Impact to BGP Links After Upgrading to 3.0.2-b1081557 or Higher

When running BGP in a mesh configuration, you may experience a situation where BGP links show an IDLE state or never connect after upgrading to 3.0.2-b1010555 or later. If you are using BGP in a mesh configuration and are currently running a release earlier than 3.0.2-b1010555, contact Oracle Support, who can assist in updating and correcting your uplink configuration before the upgrade. If you have already upgraded to 3.0.2-b1010555 or later and see BGP links in the IDLE state, contact Oracle Support for post-upgrade assistance.

Bug: 36525352

Version: 3.0.2

DNS Zone Scope Cannot Be Set

When creating or updating a DNS zone, scope cannot be set. In command line output, the value of the scope property is null.

Bug: 32998565

Version: 3.0.1

To Update a DNS Record the Command Must Include Existing Protected Records

When updating a DNS record, you are expected to include all existing protected records in the update command, even if your update does not affect them. This requirement is intended to prevent the existing protected records from being inadvertently deleted. However, the checks are so restrictive with regard to SOA records that certain updates are difficult to achieve.

Workaround: It is possible to update existing records by either providing the SOA record as part of the command, or by setting the domain to not include the SOA domain. In practice, most record updates occur at a higher level and are not affected by these restrictions.
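Scripted updates can apply this workaround by carrying the protected records along with every change. The following is a minimal sketch, not the actual appliance API: record items are plain dicts with illustrative field names.

```python
# Sketch: build a DNS record update payload that preserves protected records.
# Record items are plain dicts; the field names are illustrative, not the
# exact appliance API.

PROTECTED_TYPES = {"SOA", "NS"}  # record types the service refuses to drop

def build_update_payload(existing_records, new_records):
    """Carry existing protected records along so an update never
    implicitly deletes them."""
    protected = [r for r in existing_records if r["rtype"] in PROTECTED_TYPES]
    return protected + new_records

existing = [
    {"domain": "example.com", "rtype": "SOA",
     "rdata": "ns1.example.com. hostmaster.example.com. 1 3600 600 604800 1800"},
    {"domain": "example.com", "rtype": "A", "rdata": "192.0.2.10"},
]
new = [{"domain": "www.example.com", "rtype": "A", "rdata": "192.0.2.20"}]

payload = build_update_payload(existing, new)  # SOA record rides along
```

Merging before submitting keeps the SOA record in every update, so the restrictive checks are satisfied without manual bookkeeping.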

Bug: 33089111

Version: 3.0.1

Fix available: Please apply the latest patches to your system.

Create Route Table Fails With Confusing Error Message

When you create a route table but make a mistake in the route rule parameters, the API server might return a misleading error message. That specific message reads: "Route table target should be one of LPG, NAT gateway, Internet gateway, DRG attachment or Service gateway." In that list of possible targets, DRG attachment is incorrect: the dynamic routing gateway itself should be specified as a target, not its DRG attachment.

Workaround: Ignore the error message in question. When configuring route rules to send traffic through a dynamic routing gateway, specify the DRG as the target.

Bug: 33570320

Version: 3.0.1

Networking Service Not Responding While Within Resource Limits

The configuration limits for networking resources in appliance software version 3.0.2-b1185392 or earlier allow setups that exceed the capabilities of the Networking Service architecture. In particular, the theoretical maximum of 80 VCNs with 40 subnets each generates considerably more data flows than the underlay network can manage.

Consequently, even though your configuration is within the resource limits, the Networking Service might show signs of excessive load. Its service pods might run out of memory and crash, causing internal server errors when performing networking operations, even when requesting a list of resources.

Workaround: The internal server errors are transient, and the Networking Service is expected to resume normal operation. If this problem persists, ask Oracle for assistance.

To avoid the issue, stay within the updated resource limits documented in the release notes chapter Service Limits. These stricter limits are enforced by the appliance software as of version 3.0.2-b1261765.

Bug: 36906198

Version: 3.0.2

File Storage Traffic Blocked By Security Rules

To allow users to mount file systems on their instances, security rules must be configured in addition to those in the default security list, so that the necessary network traffic is allowed between mount targets and instances. Configuring file storage ports and protocols on Oracle Private Cloud Appliance is further complicated by the underlay network architecture, which can block file storage traffic unexpectedly unless the source and destination of the security rules are set up in a very specific way.

Scenario A – If the mount target and instances using the file system service reside in the same subnet, create a security list and attach it to the subnet in addition to the default security list. The new security list must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source            Protocol     Source Ports            Destination Ports
------            --------     ------------            -----------------
<subnet CIDR>     TCP          All                     111, 389, 445, 4045,
                                                       2048-2050, 20048
<subnet CIDR>     UDP          All                     111, 389, 445, 2048,
                                                       4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination       Protocol     Source Ports            Destination Ports
-----------       --------     ------------            -----------------
<subnet CIDR>     TCP          111, 389, 445, 4045,    All
                               2048-2050, 20048
<subnet CIDR>     TCP          All                     111, 389, 445, 4045,
                                                       2048-2050, 20048
<subnet CIDR>     UDP          111, 389, 445,          All
                               4045, 20048
<subnet CIDR>     UDP          All                     111, 389, 445,
                                                       4045, 20048

Scenario B – If the mount target and instances using the file system service reside in different subnets, create a new security list for each subnet, and attach them to the respective subnet in addition to the default security list.

The new security list for the subnet containing the mount target must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source                        Protocol     Source Ports            Destination Ports
------                        --------     ------------            -----------------
<instances subnet CIDR>       TCP          All                     111, 389, 445, 4045,
                                                                   2048-2050, 20048
<instances subnet CIDR>       UDP          All                     111, 389, 445, 2048,
                                                                   4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination                   Protocol     Source Ports            Destination Ports
-----------                   --------     ------------            -----------------
<instances subnet CIDR>       TCP          111, 389, 445, 4045,    All
                                           2048-2050, 20048
<instances subnet CIDR>       UDP          111, 389, 445,          All
                                           4045, 20048

The new security list for the subnet containing the instances using the file system service must contain the following stateful rules:

+++ Ingress Rules ++++++++++++++++++++

Source                        Protocol     Source Ports            Destination Ports
------                        --------     ------------            -----------------
<mount target subnet CIDR>    TCP          111, 389, 445, 4045,    All
                                           2048-2050, 20048
<mount target subnet CIDR>    UDP          111, 389, 445, 2048,    All
                                           4045, 20048

+++ Egress Rules ++++++++++++++++++++

Destination                   Protocol     Source Ports            Destination Ports
-----------                   --------     ------------            -----------------
<mount target subnet CIDR>    TCP          All                     111, 389, 445, 4045,
                                                                   2048-2050, 20048
<mount target subnet CIDR>    UDP          All                     111, 389, 445,
                                                                   4045, 20048

Workaround: Follow the guidelines provided here to configure ingress and egress rules that enable file system service traffic. If the unmodified default security list is already attached, the proposed egress rules do not need to be added, because there already is a default stateful security rule that allows all egress traffic (destination: 0.0.0.0/0, protocol: all).
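When scripting security list creation, the port matrix above can be generated rather than typed by hand. The following is a minimal sketch for the Scenario A ingress rules: plain dicts stand in for SDK rule objects, and the TCP/UDP port sets are taken from the tables above.

```python
# Sketch: build the stateful ingress rules for file storage traffic
# (Scenario A, same subnet). Plain dicts stand in for SDK rule objects.

TCP_PORTS = [111, 389, 445, 4045, (2048, 2050), 20048]
UDP_PORTS = [111, 389, 445, 2048, 4045, 20048]

def port_ranges(ports):
    """Normalize single ports and (min, max) tuples to range dicts."""
    return [{"min": p[0], "max": p[1]} if isinstance(p, tuple)
            else {"min": p, "max": p} for p in ports]

def ingress_rules(subnet_cidr):
    """One stateful ingress rule per protocol, open to the given CIDR."""
    return [
        {"source": subnet_cidr, "protocol": proto, "stateless": False,
         "destination_port_ranges": port_ranges(ports)}
        for proto, ports in (("TCP", TCP_PORTS), ("UDP", UDP_PORTS))
    ]

rules = ingress_rules("10.0.1.0/24")
```

For Scenario B, the same helper can be called with the instances subnet CIDR or the mount target subnet CIDR as the source, matching the respective tables.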

Bug: 33680750

Version: 3.0.1

Stateful and Stateless Security Rules Cannot Be Combined

The appliance allows you to configure a combination of stateful and stateless security rules in your tenancy. The access control lists generated from those security rules are correct, but they may be interpreted incorrectly in the virtual underlay network. As a result, certain traffic may be blocked or allowed inadvertently. Therefore, it is recommended that you use either stateful or stateless security rules, not a combination of both.

Workaround: This behavior is expected; it is not considered a bug. Whenever possible, create security rules that are either all stateful or all stateless.

Note:

If you have a specific need, you can combine stateful and stateless rules, but the stateless rules must be symmetrical: you cannot have a stateless egress rule and a stateful ingress rule for the same flow.
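A rule set can be screened for this asymmetry before it is applied. The following is a minimal sketch under simplifying assumptions: rules are hypothetical dicts, and a flow is reduced to its protocol and CIDR.

```python
# Sketch: detect flows that mix a stateless rule in one direction with a
# stateful rule in the other. Rules are illustrative dicts, not SDK objects.

def asymmetric_flows(rules):
    """Return flow keys (protocol, cidr) whose ingress and egress rules
    disagree on statefulness."""
    seen = {}  # (protocol, cidr, direction) -> stateless flag
    for r in rules:
        seen[(r["protocol"], r["cidr"], r["direction"])] = r["stateless"]
    bad = set()
    for (proto, cidr, direction), stateless in seen.items():
        opposite = "ingress" if direction == "egress" else "egress"
        other = seen.get((proto, cidr, opposite))
        if other is not None and other != stateless:
            bad.add((proto, cidr))
    return bad

rules = [
    {"protocol": "TCP", "cidr": "10.0.0.0/24", "direction": "egress",  "stateless": True},
    {"protocol": "TCP", "cidr": "10.0.0.0/24", "direction": "ingress", "stateless": False},
]
print(asymmetric_flows(rules))  # {('TCP', '10.0.0.0/24')}
```

Any flow reported by the check should have both directions made stateless, or both made stateful, before the rules are applied.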

Bug: 33744232

Version: 3.0.1

Routing Failure With Public IPs Configured as CIDR During System Initialization

When you complete the initial setup procedure on the appliance (see "Complete the Initial Setup" in the chapter Configuring Oracle Private Cloud Appliance of the Oracle Private Cloud Appliance Installation Guide), one of the final steps is to define the data center IP addresses that will be assigned as public IPs to your cloud resources. If you selected BGP-based dynamic routing, the public IPs may not be advertised correctly when defined as one or more CIDRs, and thus may not be reachable from outside the appliance.

Workaround: To ensure that your cloud resources' public IPs can be reached from outside the appliance, specify all IP addresses individually with a /32 netmask. For example, instead of entering 192.168.100.0/24, submit a comma-separated list: 192.168.100.1/32,192.168.100.2/32,192.168.100.3/32,192.168.100.4/32, and so on.
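The comma-separated /32 list can be generated from the intended CIDR with the Python standard library. This sketch assumes you want every usable host address in the block; adjust the selection if your public IP pool differs.

```python
# Expand a CIDR block into the comma-separated list of individual /32
# addresses expected by the initial setup procedure.
import ipaddress

def cidr_to_host_list(cidr):
    """Return each usable host address in the block as an individual /32."""
    net = ipaddress.ip_network(cidr)
    return ",".join(f"{ip}/32" for ip in net.hosts())

print(cidr_to_host_list("192.168.100.0/30"))
# 192.168.100.1/32,192.168.100.2/32
```

For a /24 block, this produces the 254 host entries from .1 through .254, matching the example above.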

Bug: 33765256

Version: 3.0.1

Fix available: Please apply the latest patches to your system.

Admin Network Cannot Be Used for Service Web UI Access

The purpose of the (optional) Administration network is to provide system administrators separate access to the Service Web UI. The current implementation of the Administration network is incomplete and cannot provide the correct access.

Workaround: None available. At this point, do not configure the Admin Network during initial configuration.

Bug: 34087174, 34038203

Version: 3.0.1

Network Configuration Fails During Initial Installation Procedure

After physical installation of the appliance rack, the system must be initialized and integrated into your data center environment before it is ready for use. This procedure is documented in the chapter titled "Configuring Oracle Private Cloud Appliance" of the Oracle Private Cloud Appliance Installation Guide. If the network configuration part of this procedure fails – for example due to issues with message transport or service pods, or errors returned by the switches – there are locks in place that need to be rolled back manually before the operation can be retried.

Workaround: None available. Please contact Oracle for assistance.

If possible, confirm the state of the network configuration from the Service CLI.

PCA-ADMIN> show networkConfig                                                                                  
Data:
[...]
  Network Config Lifecycle State = FAILED

Bug: 34788596

Version: 3.0.2

External Certificates Not Allowed

At this time, Oracle Private Cloud Appliance does not allow the use of external CA-signed certificates.

Workaround: Please contact Oracle support for a workaround.

Bug: 33025681

Version: 3.0.2

DNS Entries on Oracle Linux 8 Instances Incorrect After Upgrade to Release 3.0.2

After the appliance software is upgraded to Release 3.0.2, the name resolution settings in the compute instance operating system are not automatically updated. Up-to-date network parameters are obtained when the instance's DHCP leases are renewed. Until then, due to the way Oracle Linux 8 responds to DNS server messages, it can fail to resolve short host names even though queries with FQDNs succeed. Oracle Linux 7 instances are not affected by this issue.

Workaround: Restart the DHCP client service (dhclient) on the command line of your Oracle Linux 8 instances. Rebooting the instance also resolves the issue.

Bug: 34918899

Version: 3.0.2

Network Load Balancer Does Not Report Detailed Backend Health Status

Users of Oracle Cloud Infrastructure might be familiar with the detailed health statuses it provides for backend servers of network load balancers. In case a backend server is not entirely healthy, the health check status provides an indication of the problem, for example: connection failure, time-out, regex mismatch, I/O error, invalid status code. Due to the specific load balancer implementation in Oracle Private Cloud Appliance, the Network Load Balancer service can only report whether a backend server is healthy (OK) or unhealthy (CRITICAL).

Workaround: There is no workaround. Backend health checks cannot provide extra status information.

Bug: 35993214

Version: 3.0.2

Load Balancer Backend Health Check Configuration Locked Due to Unsupported Character

On systems running appliance software version 3.0.2-b1325160 or earlier, a load balancer can be configured to parse regular expressions (regex) in health status responses sent by the backend servers. When configuring the response-body-regex parameter, it is possible to include the forward slash ('/') character. However, this character results in an invalid JSON configuration file, which prevents you from making further configuration changes or removing the invalid character.

This issue also blocks the appliance software upgrade to version 3.0.2-b1392231.

Workaround: Do not use the forward slash ('/') character in a response body regex. Otherwise, the entire load balancer setup will need to be deleted.
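Automation that configures load balancers can enforce this restriction up front. A minimal sketch of such a pre-flight check:

```python
# Sketch: reject a response-body-regex containing the forward slash, which
# corrupts the load balancer JSON configuration on affected builds.

def validate_response_body_regex(pattern):
    """Return the pattern unchanged, or raise if it contains '/'."""
    if "/" in pattern:
        raise ValueError("response-body-regex must not contain '/' on "
                         "appliance software 3.0.2-b1325160 or earlier")
    return pattern

validate_response_body_regex("^OK$")            # accepted
# validate_response_body_regex("^/health OK$")  # would raise ValueError
```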

Bug: 37795379

Version: 3.0.2

Load Balancer Functional Changes After Appliance Software Upgrade

With version 3.0.2-b1392231 of the Private Cloud Appliance software, load balancers are migrated to a new background implementation. As a result, a few features are either different or no longer available. An existing configuration that is no longer supported in the new implementation can negatively affect the appliance software upgrade.

Response body regex parsing

If you have a load balancer configured with regular expression (regex) parsing of backend responses for health status information, that will no longer work after the upgrade. Health status reporting is limited to response codes.

Workaround: Before upgrading the appliance software to version 3.0.2-b1392231, unconfigure the optional regex setting (--response-body-regex) for the response from the backend servers.

Bug: 37629014

Cipher suites

In the new load balancer implementation, weaker cipher suites have been removed. Going forward, SSL/TLS connections can be secured with these cipher suites:

AES128-GCM-SHA256, AES256-GCM-SHA384, 
ECDHE-ECDSA-AES128-GCM-SHA256, ECDHE-ECDSA-AES256-GCM-SHA384, 
ECDHE-RSA-AES128-GCM-SHA256, ECDHE-RSA-AES256-GCM-SHA384, 
AES128-SHA, AES256-SHA, DES-CBC3-SHA, 
ECDHE-ECDSA-AES128-SHA, ECDHE-ECDSA-AES256-SHA, 
ECDHE-RSA-AES128-SHA, ECDHE-RSA-AES256-SHA, 
PSK-AES128-CBC-SHA, PSK-AES256-CBC-SHA

Workaround: Before upgrading the appliance software to version 3.0.2-b1392231, ensure that your load balancers use only cipher suites that remain available after the upgrade. If necessary, change the existing load balancer configurations.

Bug: 37461876
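Existing listener configurations can be screened against the supported set before the upgrade. The following sketch mirrors the cipher suite list above; the configured list passed in is an illustration.

```python
# Sketch: flag configured cipher suites that the new load balancer
# implementation no longer accepts. SUPPORTED mirrors the list above.

SUPPORTED = {
    "AES128-GCM-SHA256", "AES256-GCM-SHA384",
    "ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-RSA-AES256-GCM-SHA384",
    "AES128-SHA", "AES256-SHA", "DES-CBC3-SHA",
    "ECDHE-ECDSA-AES128-SHA", "ECDHE-ECDSA-AES256-SHA",
    "ECDHE-RSA-AES128-SHA", "ECDHE-RSA-AES256-SHA",
    "PSK-AES128-CBC-SHA", "PSK-AES256-CBC-SHA",
}

def unsupported_ciphers(configured):
    """Return the configured cipher suites that are removed after upgrade."""
    return sorted(set(configured) - SUPPORTED)

print(unsupported_ciphers(["AES128-SHA", "RC4-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]))
# ['RC4-SHA']
```

Any suite reported by the check must be replaced in the load balancer configuration before upgrading.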

Cookie-based session persistence

For existing load balancers, session persistence between clients and backend servers can be enabled using either application cookies or load balancer cookies. These are no longer supported after upgrade.

Workaround: Before upgrading the appliance software to version 3.0.2-b1392231, unconfigure cookie-based session persistence. Alternatively, load balancer cookies can be preserved on condition that the load balancing policy is set to IP hash before upgrade.

Bug: 37473362

Server order preference

The SSL parameter to prioritize server ciphers over client ciphers is not supported.

Connectivity to All Instances Lost after Appliance Upgrade

In large multitenant network configurations using BGP (Border Gateway Protocol) with layer 3 virtualization, route leaking allows routes to be shared in a controlled way between routing tables that are otherwise isolated through virtual routing and forwarding (VRF). However, after upgrading to appliance software version 3.0.2-b1261765, the number of routes that can be leaked between VRFs is restricted to 1000. If the existing route map before the upgrade is larger, certain routes are dropped and connectivity is lost. For example, traffic to and from compute instances might be blocked because the required routes are no longer available.

Workaround: Before upgrading the appliance software to version 3.0.2-b1261765, ensure that the number of routes imported from one VRF to another does not exceed 1000. If pruning the route list is not sufficient, a default route can be configured. If needed, Oracle can provide guidance for this particular upgrade scenario.

To check VRF information for BGP routes advertised between the appliance and the data center, the following command can be used on the spine switches:

# show ip route summary vrf default
IP Route Table for VRF "default"
Total number of routes: 1626
Total number of paths:  3192
Unicast paths:
Best paths per protocol:      Backup paths per protocol:
  urib_internal  : 16           static         : 1
  am             : 11           bgp-65410      : 2
  local          : 3
  direct         : 3
  static         : 11
  broadcast      : 9
  nat            : 5
  hsrp           : 1
  bgp-65410      : 3130
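The route count can be pulled from that output and compared against the 1000-route leak limit as a rough indicator. A minimal sketch that parses the "Total number of routes" line from the summary shown above:

```python
# Sketch: extract the route count from "show ip route summary" output and
# warn when it exceeds the post-upgrade VRF route-leak limit.
import re

ROUTE_LEAK_LIMIT = 1000

def total_routes(summary_output):
    """Return the 'Total number of routes' value from the switch output."""
    m = re.search(r"Total number of routes:\s*(\d+)", summary_output)
    if m is None:
        raise ValueError("route count not found in output")
    return int(m.group(1))

sample = """IP Route Table for VRF "default"
Total number of routes: 1626
Total number of paths:  3192
"""
count = total_routes(sample)
print(count > ROUTE_LEAK_LIMIT)  # True: prune routes or configure a default route
```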

Bug: 37628459

Version: 3.0.2

Route Table Stuck in Provisioning State Failure

When updating a route table that is associated as an attachment to a Dynamic Routing Gateway (DRG) to have a Local Peering Gateway (LPG) as a target, this known issue can leave the route table stuck in the provisioning state:

{
  "timestamp": "2023-06-28T15:30:58.635+0000",
  "rid": "7FCCBAEBA62848878983FDA3098EE4DB/330fc100-b86f-4137-a1f9-2437a512b8e8/7b003c97-11c4-4e4a-8c9a-11861532db0d",
  "process": 1,
  "ocid": null,
  "levelname": "ERROR",
  "src_lineno": 481,
  "src_pathname": "/usr/lib/python3.6/site-packages/pcanwctl/framework.py",
  "message": "Exception on function call: update_route_table, error: (404, 'NotAuthorizedOrNotFound', 'No Subnet was found'), start exception rollback",
  "tag": "pca-nwctl.log"
}

Workaround: Delete and recreate the route table to avoid the error in the update routine.

Bug: 35547644

Version: 3.0.2

Updating Route Table Using Terraform Fails Because DRG Is Not Attached

When deploying network resources with Terraform, a route table update can fail because an expected Dynamic Routing Gateway (DRG) attachment appears not to exist. Although the DRG is attached to the VCN, the attachment operation has not fully completed when the command to update the route table is issued. The quick succession of commands through Terraform can expose this timing issue; it is highly unlikely to occur as a result of human user actions.

Workaround: Assuming the route table update failure is the result of a timing issue, repeating the route table update command is expected to succeed. Reapply the Terraform configuration or update the route table manually.
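When scripting outside Terraform, the same advice applies: retry the update after a short delay. A generic sketch, where the update callable and the error type are stand-ins for whatever client call performs the route table update:

```python
# Sketch: retry an update that fails transiently (for example, because the
# DRG attachment is not visible yet). `update` is any callable.
import time

def retry_update(update, attempts=5, delay=10, retriable=(RuntimeError,)):
    """Call `update` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return update()
        except retriable:
            if attempt == attempts:
                raise
            time.sleep(delay)

calls = {"n": 0}
def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("DRG attachment not found")  # transient failure
    return "updated"

print(retry_update(flaky_update, delay=0))  # updated
```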

Bug: 36297777

Version: 3.0.2

Failure Executing Terraform Destroy Due to Route Table in Provisioning State

When you run a terraform destroy operation, it might fail because a route table object is still in 'provisioning' state instead of 'available'. This typically occurs when many updates are made to a route table in a short amount of time, resulting in commands taking longer to complete than expected by the Terraform provider. Strictly speaking, this is not a bug but rather a timing issue.

Workaround: Assuming the failure is the result of a timing issue, no route table or other resource is permanently stuck in 'provisioning' state. Repeating the terraform destroy command is expected to successfully remove the remaining objects. If necessary, increase the wait times for specific resources in your Terraform settings.

Bug: 36352218

Version: 3.0.2

When Configuring BGP Authentication the Password Is a Required Parameter

When the appliance uplinks to the data center network are configured for dynamic routing, two Autonomous Systems – meaning the spine switches on the appliance side, and the ToR switches on the data center side – are set up as BGP (Border Gateway Protocol) peers. The sessions between the BGP peers can be protected with password-based authentication. BGP authentication can be enabled for the data network as well as the optional separate administration network.

You can set the BGP password using the setDay0DynamicRoutingParameters command in the Service CLI. Two command parameters must be provided for each network.

  • data network: bgpAuthentication=True and bgpPassword=<mypassword>

  • administration network: adminBgpAuthentication=True and adminBgpPassword=<mypassword>

However, the CLI command is accepted even if you enable BGP authentication without providing a password. This has no adverse effects, but BGP authentication remains disabled.

Workaround: When you enable BGP authentication on the data network, and the administration network if present, make sure you also specify the BGP password as part of the command parameters.
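A parameter check before submitting the command catches this silent misconfiguration. A minimal sketch, with the parameters as a plain dict whose keys mirror the Service CLI parameter names above:

```python
# Sketch: ensure a BGP password accompanies each enabled authentication flag
# before calling setDay0DynamicRoutingParameters.

PAIRS = [("bgpAuthentication", "bgpPassword"),
         ("adminBgpAuthentication", "adminBgpPassword")]

def check_bgp_auth(params):
    """Raise if authentication is enabled without the matching password."""
    for flag, password in PAIRS:
        if params.get(flag) and not params.get(password):
            raise ValueError(f"{flag}=True requires {password}")
    return params

check_bgp_auth({"bgpAuthentication": True, "bgpPassword": "s3cret"})  # accepted
# check_bgp_auth({"bgpAuthentication": True})  # would raise ValueError
```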

Bug: 35737959

Version: 3.0.2

Uplink VRRP Mesh Configuration Sets Second Switch IP Incorrectly

When you try to configure the appliance data and administration network uplinks in mesh topology with VRRP (Virtual Router Redundancy Protocol), the command results in a CLI error. The problem occurs when the spine switches' second IP address is configured: the switch interprets the parameters as overlapping network settings and rejects them.

The following example shows an administration network with a 4-port mesh uplink topology. The same behavior applies to data network uplinks.

PCA-ADMIN> edit networkConfig enableAdminNetwork=True adminportcount=4 admintopology=MESH adminportspeed=10 
adminspine1Ip=10.1.1.97,10.1.1.98 adminspine2Ip=10.1.1.101,10.1.1.102 adminSpineVip=10.1.1.105 [...]

PCA-ADMIN> show networkconfig
Data: 
  [...]
Error:
UpdateFirstBootHandler: {'http_status_code': 500, 'code': 'InternalServerError', 'message': 'SwitchCliError on 100.96.2.20: 
overlapping network for ipv4 address: 10.1.1.98/28 on po46, 10.1.1.97/28 already configured on po45\\n for cmd [...]

Workaround: There is no workaround. This specific uplink configuration cannot be applied at this time.

Bug: 36063880

Version: 3.0.2

Failure Disabling the Segregated Administration Network

After upgrading the appliance software to 3.0.2-b1261765, disabling the separate administration network results in an error because the spine switch configuration cannot be updated. This is caused by a missing parameter in the update request that is sent to the spine switches.

Workaround: If you need to disable the segregated administration network on an appliance where this error occurs, change the BGP topology setting as shown below, after the appliance software upgrade or after you have enabled the separate admin network.

PCA-ADMIN> edit networkconfig adminBgpTopology=triangle

Bug: 37618380

Version: 3.0.2

Editing Rack Network Configuration Fails When Spine Switch Settings Are Not Changed

Changing the rack network configuration without including any parameter that triggers a spine switch configuration update results in a failure, because the internal command contains empty parameters. The network remains fully functional, but the configuration changes are not applied.

Workaround: To update certain network configuration parameters – such as DNS settings, NTP settings, and public IP addresses – add another change to the network configuration update command. After a successful update, revert the additional change. In the following example, we include a change to the adminmtu parameter when updating the DNS settings, then revert it back to its original value.

PCA-ADMIN> edit networkconfig dnsip1=10.202.192.16 dnsip2=192.0.2.32 adminmtu=9200
  JobId: 1c9bbfca-7566-4e06-a6e3-8185c512cf92
  Data: created job for edit network

PCA-ADMIN> edit networkconfig adminmtu=9216

Bug: 37311414

Version: 3.0.2

Instances Unreachable From Flex Network with Static Gateway After Upgrade

After upgrading the appliance software to 3.0.2-b1392231, any Flex network that uses a static gateway stops functioning correctly. In addition, a new Flex network with a static gateway created after the upgrade does not allow network traffic.

Workaround: A security change in this release interferes with Flex network static gateway traffic. To re-enable this network access, follow the directions in Oracle Support Document 3093595.1 ([PCA 3.x] Flex Networks with Gateways will break after upgrade to 3.0.2-b1392231).

Bug: 38113437

Version: 3.0.2