8.4.3.3 Patchmgr Syntax for RoCE Network Fabric Switches

You can use patchmgr to update software for RoCE Network Fabric switches.

Prerequisites

Run patchmgr on the "driving system", which can be an Oracle Exadata database server or a non-Exadata system running Oracle Linux. This lets you run patchmgr from a central server to update multiple Oracle Exadata systems.

Note:

Prior to Oracle Exadata System Software release 19.3.9, you must run patchmgr as a non-root user for patching RoCE Network Fabric switches.

Patchmgr Syntax for RoCE Network Fabric Switches


./patchmgr --roceswitches [roceswitch_list_file] 
    { --upgrade [--roceswitch-precheck] [--unkey] [--force] |
      --downgrade [--roceswitch-precheck] [--unkey] [--force] |
      --apply-config [--unkey] [--force] |
      --verify-config [ --newswitchlist new_list_file ] [--unkey] }
    [ -log_dir { absolute_path_to_log_directory | AUTO } ]

Main Arguments

Argument Description
--roceswitches [roceswitch_list_file]

Specifies that patchmgr is acting on the RoCE Network Fabric switches.

If specified, the switch list file identifies the RoCE Network Fabric switches.

In its simplest form, the file has one switch host name or IP address on each line. In this case, each switch is assumed to be a leaf switch in a single-rack Exadata environment. For example:

rack1sw-rocea0
rack1sw-roceb0

To specify the configuration type for each switch, append a colon (:) and tag to each switch host name or IP address in the switch list file. The following tags are supported:

  • leaf - Identifies a leaf switch in a single rack system. This configuration type is assumed if no tag is specified.
  • mspine - Identifies a spine switch. Note that one spine switch configuration supports all spine switches on single and multi-rack systems, with and without Exadata Secure RDMA Fabric Isolation.
  • mleaf - Identifies a leaf switch in a multi-rack X8M system.
  • sfleaf - Identifies a leaf switch in a single rack system that is enabled to support Exadata Secure RDMA Fabric Isolation.
  • msfleaf - Identifies a leaf switch in a multi-rack X8M system that is enabled to support Exadata Secure RDMA Fabric Isolation.
  • leaf23 - Identifies a leaf switch in a single rack system that is configured with 23 host ports. This configuration is required only for 8-socket systems (X8M-8 and later) with 3 database servers and 11 storage servers.
  • mleaf23 - Identifies a leaf switch in a multi-rack system that is configured with 23 host ports. This configuration is required only for 8-socket X8M-8 systems with 3 database servers and 11 storage servers.
  • mleaf_u14 - Identifies a leaf switch in a multi-rack system that is configured with 14 inter-switch links. This is the typical multi-rack leaf switch configuration for X9M and later model systems.
  • msfleaf_u14 - Identifies a leaf switch in a multi-rack system that is enabled to support Exadata Secure RDMA Fabric Isolation and is configured with 14 inter-switch links. This configuration is required for X9M and later model systems with Secure Fabric enabled.
  • mleaf23_u13 - Identifies a leaf switch in a multi-rack system that is configured with 23 host ports and 13 inter-switch links. This configuration is required only for 8-socket X9M-8 systems with 3 database servers and 11 storage servers.

For example:

rack1sw-rocea0:leaf
rack1sw-roceb0:leaf
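
Because a mistyped tag would select the wrong configuration type, it can be useful to check a switch list file against the supported tags before passing it to patchmgr. The following is a minimal sketch and is not part of patchmgr: the function name check_switch_tags is hypothetical, and the sketch assumes that each entry uses a host name or IPv4 address, so that the first colon on a line starts the tag.

```shell
# Hypothetical helper (not part of patchmgr): report entries in a switch
# list file whose tag is not one of the supported tags listed above.
# Assumes host names or IPv4 addresses, so the first colon starts the tag.
check_switch_tags() {
    awk -F: '
        BEGIN {
            n = split("leaf mspine mleaf sfleaf msfleaf leaf23 mleaf23 mleaf_u14 msfleaf_u14 mleaf23_u13", t, " ")
            for (i = 1; i <= n; i++) ok[t[i]] = 1
        }
        /^[ \t]*$/ { next }                   # skip blank lines
        NF == 1    { next }                   # untagged entry: treated as leaf
        {
            tag = $2
            sub(/[ \t]+$/, "", tag)           # ignore trailing whitespace
            sub(/\.[0-9]+$/, "", tag)         # drop an optional .loopback_octet
            if (!(tag in ok)) {
                printf "line %d: unknown tag \"%s\"\n", NR, tag
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_switch_tags switches.lst` prints one line for each unrecognized tag and returns a non-zero status if any are found.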

For multi-rack configurations only, also specify a unique loopback octet for each switch.

The loopback octet is the last octet of the switch loopback address, which uniquely identifies a switch.

To specify the loopback octet for each switch, append a period (.) and numeric loopback octet value to each tagged switch entry in the switch list file.

Caution:

Every switch in a multi-rack configuration must have a unique loopback octet. If multiple switches use the same loopback octet, the RoCE Network Fabric cannot function correctly, resulting in a system outage.
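
Given the consequences of a duplicated loopback octet, a quick check of the switch list file before running patchmgr may be worthwhile. The following is a minimal sketch and is not part of patchmgr: the function name find_duplicate_octets is hypothetical, and the sketch assumes entries of the form host:tag.octet as described above.

```shell
# Hypothetical helper (not part of patchmgr): list loopback octets that
# appear on more than one entry in a switch list file.
# Assumes entries of the form host:tag.octet.
find_duplicate_octets() {
    awk '
        {
            sub(/[ \t]+$/, "")                 # ignore trailing whitespace
            if (match($0, /:[a-z0-9_]+\.[0-9]+$/)) {
                octet = $0
                sub(/^.*\./, "", octet)        # keep the text after the last period
                count[octet]++
                hosts[octet] = hosts[octet] " " substr($0, 1, RSTART - 1)
            }
        }
        END {
            for (o in count)
                if (count[o] > 1) {
                    print "duplicate loopback octet " o ":" hosts[o]
                    dup = 1
                }
            exit dup + 0
        }
    ' "$1"
}
```

The function returns a non-zero status when at least one octet is shared by two or more entries. Entries without a loopback octet are ignored.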

For the leaf switches, start with 101 as the first loopback octet value and increment as follows:

  • 101 - Rack 1 lower leaf switch (rack1sw-rocea0 in the following example)

  • 102 - Rack 1 upper leaf switch (rack1sw-roceb0 in the following example)

  • 103 - Rack 2 lower leaf switch (rack2sw-rocea0 in the following example)

  • 104 - Rack 2 upper leaf switch (rack2sw-roceb0 in the following example)

  • 105 - Rack 3 lower leaf switch

  • 106 - Rack 3 upper leaf switch, and so on.

For the spine switches, start with 201 as the first loopback octet value and increment as follows:

  • 201 - Rack 1 spine switch (rack1sw-roces0 in the following example)

  • 202 - Rack 2 spine switch (rack2sw-roces0 in the following example)

  • 203 - Rack 3 spine switch

  • 204 - Rack 4 spine switch, and so on.

For example, the switch list file for a 2-rack Exadata X9M system might contain:

rack1sw-rocea0:mleaf_u14.101
rack1sw-roceb0:mleaf_u14.102
rack1sw-roces0:mspine.201
rack2sw-rocea0:mleaf_u14.103
rack2sw-roceb0:mleaf_u14.104
rack2sw-roces0:mspine.202
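
The numbering scheme above can be expressed as a short generator. The following is a sketch only and is not part of patchmgr: the function name gen_switch_list is hypothetical, and the host name pattern and the mleaf_u14 tag are assumptions copied from the 2-rack X9M example, so substitute the names and tags that apply to your system.

```shell
# Hypothetical helper (not part of patchmgr): print switch list entries for
# the given number of racks, using the loopback octet numbering described
# above (leaf switches from 101, spine switches from 201).
# The rackNsw-roce* host names and the mleaf_u14 tag are assumptions taken
# from the 2-rack X9M example.
gen_switch_list() (
    racks=$1
    r=1
    while [ "$r" -le "$racks" ]; do
        echo "rack${r}sw-rocea0:mleaf_u14.$((100 + 2 * r - 1))"   # lower leaf
        echo "rack${r}sw-roceb0:mleaf_u14.$((100 + 2 * r))"       # upper leaf
        echo "rack${r}sw-roces0:mspine.$((200 + r))"              # spine
        r=$((r + 1))
    done
)
```

Running `gen_switch_list 2` reproduces the six entries shown in the preceding example.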

If no file name is provided, then the command acts on all RoCE Network Fabric switches discovered from the host that is running patchmgr.

--upgrade

Upgrade the firmware on the RoCE Network Fabric switches.

If required, this option also installs or upgrades the client software component that propagates switch alerts to Oracle Auto Service Request (ASR).

Note: Commencing with the August 2022 patchmgr release, patchmgr performs an additional series of checks on the RoCE Network Fabric. The checks occur immediately before any firmware upgrade and also during prerequisite checking using the --roceswitch-precheck option. These checks mitigate the risks of failure associated with unexpected problems in the RoCE Network Fabric. For example, if one of the RoCE Network Fabric ports on a storage server is down, the storage server would become unavailable if the switch connected to the only operational port is taken offline for an upgrade. If any check fails, patchmgr reports the problem and ends immediately. In this case, you must correct the problem with the RoCE Network Fabric before you can perform the upgrade.

--downgrade

Downgrade the firmware on the RoCE Network Fabric switches.

Note: Commencing with the August 2022 patchmgr release, patchmgr performs an additional series of checks on the RoCE Network Fabric. The checks occur immediately before any firmware downgrade and also during prerequisite checking using the --roceswitch-precheck option. These checks mitigate the risks of failure associated with unexpected problems in the RoCE Network Fabric. For example, if one of the RoCE Network Fabric ports on a storage server is down, the storage server would become unavailable if the switch connected to the only operational port is taken offline for a downgrade. If any check fails, patchmgr reports the problem and ends immediately. In this case, you must correct the problem with the RoCE Network Fabric before you can perform the downgrade.

--apply-config

Applies the golden configuration template to each switch.

This option relies on tags in the switch list file, which specify the configuration type for each switch.

--verify-config [ --newswitchlist new_list_file ]

Verify each switch configuration against the golden configuration.

Verification is performed automatically when using --upgrade or --downgrade. You can use this option to perform verification as a standalone operation.

If specified, the --newswitchlist option generates a new switch list file with entries that match the current configuration of each switch.
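
As an illustration of the syntax, standalone verification with a regenerated switch list might be wrapped as follows. This is a sketch under stated assumptions: the function name verify_and_regenerate and the PATCHMGR variable are hypothetical conveniences, and both file names are supplied by the caller.

```shell
# Path to patchmgr; set PATCHMGR if it is not in the current directory.
PATCHMGR="${PATCHMGR:-./patchmgr}"

# Hypothetical wrapper: verify the switches named in list file $1 and ask
# patchmgr to write a regenerated switch list to file $2.
verify_and_regenerate() {
    "$PATCHMGR" --roceswitches "$1" --verify-config --newswitchlist "$2"
}
```

For example, `verify_and_regenerate switches.lst switches.new.lst` runs the verification and leaves the regenerated list in switches.new.lst, which you can then review against the original file.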

Supported Options

The following options are supported for RoCE Network Fabric switch configuration and firmware update:

Table 8-5 Patchmgr Options for RoCE Network Fabric Switches

Option Description
--roceswitch-precheck

Performs switch firmware upgrade or downgrade simulation on the RoCE Network Fabric switches in the list file but does not perform the actual install. Use this option with --upgrade or --downgrade.

--unkey

In conjunction with --upgrade or --downgrade, this option removes the configuration settings that enable passwordless SSH access to the RoCE Network Fabric switch.

--force

In conjunction with --upgrade or --downgrade, this option proceeds with the upgrade or downgrade even if the switch is already on the target firmware version or the RoCE Network Fabric switch is experiencing non-critical failures.

In conjunction with --apply-config, this option bypasses the check that determines if the current switch configuration matches the configuration type specified in the switch list file.

-log_dir { absolute_path_to_log_directory | AUTO }

When running patchmgr as a non-root user, use -log_dir to specify the absolute path to the log directory or use the keyword AUTO. If you specify AUTO, then patchmgr generates and sets the path to the log directory based on the directory patchmgr is launched from and the content of the nodes list file.

Note: Specifying -log_dir enables multiple patchmgr invocations and is required when running patchmgr as a non-root user.

Usage Notes

  • Starting with Oracle Exadata System Software release 19.3, the options are prefixed with --. Prior to this release, the options were prefixed with -.

Example 8-9 Using patchmgr to Upgrade Firmware on RoCE Network Fabric Switches

This example runs the upgrade prerequisite checks on all detected switches, then upgrades the switches.

$ ./patchmgr --roceswitches --upgrade --roceswitch-precheck

$ ./patchmgr --roceswitches --upgrade
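
The two commands can also be chained so that the upgrade starts only when the prerequisite checks pass. The following is a sketch only: the function name precheck_then_upgrade and the PATCHMGR variable are hypothetical, and the sketch assumes that patchmgr returns a non-zero exit status when a check fails, which this section does not state explicitly.

```shell
# Path to patchmgr; set PATCHMGR if it is not in the current directory.
PATCHMGR="${PATCHMGR:-./patchmgr}"

# Hypothetical wrapper: run the prerequisite checks, then start the upgrade
# only if the precheck command exits with status 0 (assumed to mean success).
# An optional switch list file can be passed as $1; without it, patchmgr
# acts on all discovered switches.
precheck_then_upgrade() {
    "$PATCHMGR" --roceswitches ${1:+"$1"} --upgrade --roceswitch-precheck || {
        echo "precheck failed; upgrade not started" >&2
        return 1
    }
    "$PATCHMGR" --roceswitches ${1:+"$1"} --upgrade
}
```

Review the patchmgr output and logs after the precheck in any case; an exit status alone does not replace reading the reported problems.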

Example 8-10 Using patchmgr to Apply Golden Configurations to RoCE Network Fabric Switches

This example applies golden configuration settings to RoCE Network Fabric switches as specified in the switches.lst file. Logs for the operation are written to /tmp/switchlogs.

$ cat switches.lst
switch456-rocea0:leaf
switch456-roceb0:leaf

$ ./patchmgr --roceswitches switches.lst --apply-config --log_dir /tmp/switchlogs