4 Maintaining Other Oracle Exadata Components

Besides the database and storage servers, Oracle Exadata contains other components, such as power distribution units, ILOM, and network switches.

Note:

  • All procedures in this chapter are applicable to Oracle Exadata and Oracle Exadata Storage Expansion Rack.
  • For ease of reading, the name "Oracle Exadata Rack" is used when information refers to both Oracle Exadata and Oracle Exadata Storage Expansion Rack.

4.1 Replacing a Power Distribution Unit

Power distribution units (PDUs) can be replaced while Oracle Exadata Rack is online. PDU-A is on the left, and PDU-B is on the right when viewing the rack from the rear.

4.1.1 Reviewing the PDU Replacement Guidelines

Before replacing a PDU, review the following guidelines to ensure the procedure is safe and does not disrupt availability.

  • Unlatching the RDMA Network Fabric cables while removing or inserting PDU-A may cause a loss of service due to nodes being removed from the cluster. This could cause the rack to be unavailable. Care should be taken when handling the RDMA Network Fabric cables, which are normally latched securely. Do not place excessive tension on the RDMA Network Fabric cables by pulling them.

  • Unhooking the wrong power feeds causes the rack to shut down. Trace the power cables running from the PDU that will be replaced to the power source, and only unplug those feeds.

  • Allow time to unpack and repack the PDU replacement parts. Note how the power cords are coiled in the packaging so the failed unit can be repacked the same way.

  • Removal of the side panel lessens the amount of time needed to replace the PDU. However, it is not necessary to remove the side panel to replace the PDU.

  • Use of a cordless drill or power screwdriver lessens the amount of time needed to replace the PDU. Allow more time for the replacement if using the hand wrench tool provided with the replacement rack. If using a screwdriver, then ensure that there are Torx T30 and T25 bits.

  • It may be necessary to remove the server cable arms to move the power cables. If that is the case, then twist the plug connection and flex the cable arm connector to avoid having to unclip the cable arm. If it is necessary to unclip the cable arm, then support the cables with one hand, remove the power cord, and then clip the cable arm. Do not leave the cable arm hanging.

  • When removing the T30 screws from the L-bracket, do not remove the T25 screws or nuts that attach the PDU to the bracket until the PDU is out of the rack.

4.1.2 Replacing a PDU

This procedure describes how to replace a power distribution unit (PDU).

  1. Use the PDU monitor as follows to identify its network settings, if it is not the reason for the PDU replacement:
    1. Press the reset button for 20 seconds until it starts to count from 5 to 0. While it is counting down, release the button, and then press it once.
    2. Record the network settings, firmware version, and so on, displayed on the LCD screen as the monitor restarts.

      Note:

      If the PDU monitor is not working, then retrieve the network settings by connecting to the PDU over the network, or from the network administrator.
  2. Turn off all the PDU breakers.
  3. Unplug the PDU power plugs from the AC outlets.

    Note:

    • If the power cords use overhead routing, then put the power plugs in a location where they will not fall or hit anyone.
    • If the rack is on a raised floor, then move the power cords out through the floor cutout. It may be necessary to maneuver the rack over the cutout in order to move the power cords out.
  4. Perform the following steps for a PDU-B replacement when there is no side panel access and the rack does not have an InfiniBand cable harness:

    Note:

    Do not unstrap any cables attached to the cable arms.
    1. Unscrew the T25 screws holding the square cable arms to the rack.
    2. Move the RDMA Network Fabric cables to the middle, out of the way.
  5. Unplug all power cables going from the servers and switches to the PDU. Keep the power cables together in group bundles.
  6. Remove the T30 screws from the top and bottom of the L-bracket, and note where the screws go.
  7. Note where the PDU sits in the rack frame. It is usually 1 inch back from the rack frame to allow access to the breaker switches.
  8. Angle and maneuver the PDU out of the rack.
  9. Hold the PDU or lay it down, if there is enough room, while maneuvering the AC power cords through the rack. It may be necessary to cut the cable ties that hold the AC cord flush with the bottom side of the PDU.
  10. Pull the cords as near to the bottom or top of the rack as possible where there is more room between the servers to get the outlet plug through the routing hole.
  11. Remove the smaller Torx T25 screws, and loosen the nut on the top and bottom to remove the PDU from the L-bracket. The nut does not have to be removed.
  12. Attach the L-bracket to the new PDU.
  13. Lay the new PDU next to the rack.
  14. Route the AC cords through the rack, and to where the outlets are.

    Note:

    Do not cable tie the AC cord to the new PDU at this time.
  15. Place the new PDU in the rack by angling and maneuvering it until the L-brackets sit on the top and bottom rails.
  16. Line up the holes and slots so that the PDU sits about 1 inch back from the rack frame.
  17. Attach the power cords using the labels on the cords as a guide.
    For example, G5-0 indicates PDU group 5 outlet 0 on the PDU.
  18. Attach the InfiniBand cable holders if they were removed in step 4.

    Oracle recommends screwing the holders in by hand at first to avoid stripping the screws.

  19. Attach the AC power cords to the outlets.
  20. Turn on the breakers.
  21. Cable and program the PDU monitor for the network, as needed.
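
The cord labels mentioned in step 17 can be decoded mechanically when checking a cabling plan. The following is a minimal sketch, assuming every label follows the G<group>-<outlet> pattern of the G5-0 example. The function name parse_pdu_label is hypothetical and is not part of any Oracle tooling.

```shell
#!/usr/bin/env bash
# parse_pdu_label: hypothetical helper (not an Oracle tool).
# Assumes labels look like G<group>-<outlet>, as in the G5-0 example.
parse_pdu_label() {
    local label="$1"
    if [[ $label =~ ^G([0-9]+)-([0-9]+)$ ]]; then
        echo "group ${BASH_REMATCH[1]} outlet ${BASH_REMATCH[2]}"
    else
        # Anything else is reported rather than guessed at.
        echo "unrecognized label: $label" >&2
        return 1
    fi
}
```

For example, parse_pdu_label G5-0 prints "group 5 outlet 0".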

4.2 Resetting a Non-Responsive ILOM

When Oracle Exadata System Software detects that the ILOM is unresponsive, it automatically resets the ILOM Service Processor.

Also, as a proactive measure, the ILOM is reset automatically every 90 days. To help predict the next automatic reset, you can retrieve the ILOM up-time by querying the ILOM directly or using ipmitool. For example:

  • Using ILOM:

    -> show /SP/clock uptime
    
    /SP/clock
    Properties:
    uptime = 54 days, 15:41:51
  • Using ipmitool:

    # ipmitool sunoem getval /SP/clock/uptime
    Target Value: 54 days, 15:41:51
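
To estimate when the next automatic reset is due, the up-time can be subtracted from the 90-day interval. The following is a minimal sketch, assuming the up-time is reported in the "N days, HH:MM:SS" form shown above. The function name days_until_ilom_reset is hypothetical.

```shell
#!/usr/bin/env bash
# days_until_ilom_reset: hypothetical helper (not an Oracle tool).
# Accepts a line containing an up-time such as "54 days, 15:41:51" and prints
# the whole days left before the 90-day automatic reset.
days_until_ilom_reset() {
    local line="$1" days=0
    # Assumption: an up-time shorter than one day carries no "N days," prefix.
    if [[ $line =~ ([0-9]+)\ days?, ]]; then
        # Force base 10 so a value such as "08" is not read as octal.
        days=$(( 10#${BASH_REMATCH[1]} ))
    fi
    local remaining=$(( 90 - days ))
    if (( remaining < 0 )); then
        remaining=0
    fi
    echo "$remaining"
}
```

For example, passing the ipmitool output line "Target Value: 54 days, 15:41:51" prints 36.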

You can also manually reset the ILOM Service Processor using any of the methods described in the following sections.

See Also:

Oracle Integrated Lights Out Manager (ILOM) Documentation at http://www.oracle.com/goto/ilom/docs

4.2.1 Resetting the ILOM Using SSH

The following procedure describes how to reset the ILOM by connecting to it using SSH:

  1. Connect to the ILOM using SSH from another machine.
  2. Enter the following command at the ILOM prompt:
    reset /SP
    

4.2.2 Resetting the ILOM Using the ILOM Remote Console

If it is not possible to connect to the ILOM using SSH, then log in to the ILOM remote console. The following procedure describes how to reset the ILOM using the remote console.

  1. Log in to the ILOM remote console.
  2. Select Reset SP from the Maintenance tab.
  3. Click Reset SP.

4.2.3 Resetting the ILOM Using IPMItool

If you could not connect to the ILOM using SSH or the remote console, then log in to the local host or another host on the ILOM network, and use IPMItool. The following procedure describes how to reset the ILOM using IPMItool:

  1. Log in to the local host or another host on the ILOM network.
  2. Run the following IPMItool command:
    • Using local host:

      $ ipmitool mc reset cold
      Sent cold reset command to MC
      
    • Using another host:

      $ ipmitool -H ILOM_host_name -U ILOM_user mc reset cold
      Sent cold reset command to MC
      

      In the preceding command, ILOM_host_name is the host name being used, and ILOM_user is the user name for the ILOM.

4.2.4 Resetting the ILOM Using the SP Reset Pin on Oracle Exadata Database Machine X2-2 Servers and Exadata Storage Servers

If you could not connect to the ILOM using SSH, the remote console, or IPMItool on the Oracle Exadata Database Machine X2-2 server or Exadata Storage Server, then press the SP reset pin. The following procedure describes how to reset the ILOM using the SP reset pin.

  1. Obtain a small, non-conductive stick.
  2. Go to the rear of the rack.
  3. Locate the SP reset pin opening. The SP reset pin opening is the first opening to the right of the NET MGT port.
  4. Insert the stick into the opening and press the pin.

4.2.5 Removing the SP from Sun Fire X4800 Oracle Database Servers and Sun Server X2-8 Oracle Database Servers

If you could not reset the ILOM on the Sun Fire X4800 Oracle Database Server or Sun Server X2-8 Oracle Database Server using SSH, the remote console, or IPMItool, then remove the service processor (SP) from the server, and put it back.

Messages are displayed at the operating system level. These messages can be ignored. The fans will speed up because there is no fan control.

4.2.6 Unplugging the ILOM Power Supply

If you could not reset the ILOM using the preceding options, then unplug the power supply, and then plug it back in. This action power cycles the server as well as the ILOM.

4.3 Configuring Service Processor and ILOM Network Settings

The following procedure describes how to configure the service processor (SP) and ILOM network settings:

  1. Log in to the SP as the root user using SSH.
  2. Use the version command to check the SP/ILOM firmware release. The following is an example of the output from the command:
    -> version
    SP firmware 3.2.4.10
    SP firmware build number: 93199
    SP firmware date: Sat Oct  4 18:42:56 EDT 2014
    SP filesystem version: 0.2.10
    

    Note:

    You can use ipmitool to log in to the server SP. This is useful when the SP/ILOM is not accessible from the management network. The following command connects to the SP:

    # ipmitool sunoem cli
    Connected. Use ^D to exit.
    -> version
    SP firmware 3.2.4.10
    SP firmware build number: 93199
    SP firmware date: Sat Oct  4 18:42:56 EDT 2014
    SP filesystem version: 0.2.10
    
  3. Configure the DNS server settings using the set command as follows:
    cd /SP/clients/dns/  
        /SP/clients/dns
    show
         /SP/clients/dns
            Targets:
            Properties:
                auto_dns = enabled
                nameserver = 0.0.0.0
                retries = 1
                searchpath =
                timeout = 5
            Commands:
                cd
                set
                show
    set nameserver=192.68.0.2
    set searchpath=yourdomain.com
    
  4. Configure the NTP server settings using the set command as follows.
    cd /SP/clients/ntp/server/1/
    /SP/clients/ntp/server/1
    show
     /SP/clients/ntp/server/1
        Targets:
        Properties:
            address = 0.0.0.0
        Commands:
            cd
            set
            show
    set address=192.68.0.1

    Note:

    Two NTP servers can be configured. Set the first NTP server using the set command, and then use the path /SP/clients/ntp/server/2 to configure the second server.

  5. Use the set command to configure the network settings as follows:
    cd /SP/network
       /SP/network
    show
       /SP/network
        Targets:
            interconnect
            ipv6
            test
        Properties:
            commitpending = (Cannot show property)
            dhcp_clientid = none
            dhcp_server_ip = none
            ipaddress = 0.0.0.0
            ipdiscovery = dhcp
            ipgateway = 0.0.0.0
            ipnetmask = 0.0.0.0
            managementport = MGMT
            pendingipaddress = 0.0.0.0
            pendingipdiscovery = dhcp
            pendingipgateway = 0.0.0.0
            pendingipnetmask = 0.0.0.0
            pendingmanagementport = MGMT
            pendingvlan_id = (none)
            state = enabled
            vlan_id = (none)
        Commands:
            cd
            set
            show
    
  6. Configure the corresponding pending settings (pendingipaddress, pendingipdiscovery, pendingipgateway, pendingipnetmask, and pendingvlan_id), and then commit the pending settings using the following command:
    set commitpending=true
    
  7. Disconnect from the command line interface after the network configuration is complete.

    Note:

    Use ^D to exit the session when using ipmitool.
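
The pending settings in steps 5 and 6 can be prepared ahead of time and pasted at the ILOM prompt. The following is a minimal sketch that only prints ILOM CLI commands and does not connect to the SP. The function name ilom_static_network_cmds is hypothetical, the property names are taken from the show /SP/network output above, and the value static for pendingipdiscovery is an assumption for a statically addressed SP.

```shell
#!/usr/bin/env bash
# ilom_static_network_cmds: hypothetical helper that only prints ILOM CLI
# commands; it does not connect to the SP.
# Arguments: IP address, netmask, gateway (all site-specific values).
ilom_static_network_cmds() {
    local ip="$1" netmask="$2" gateway="$3"
    if [[ -z $ip || -z $netmask || -z $gateway ]]; then
        echo "usage: ilom_static_network_cmds IP NETMASK GATEWAY" >&2
        return 1
    fi
    # Assumption: "static" is the discovery mode wanted for a fixed address.
    cat <<EOF
cd /SP/network
set pendingipdiscovery=static
set pendingipaddress=$ip
set pendingipnetmask=$netmask
set pendingipgateway=$gateway
set commitpending=true
EOF
}
```

For example, ilom_static_network_cmds 192.0.2.10 255.255.255.0 192.0.2.1 prints six commands ending with set commitpending=true. The addresses here are placeholders.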

4.4 Verifying and Modifying the Link Speed on the Client Network Ports for X7 and Later Systems

You can configure 10 GbE connections or 25 GbE connections on the client network on Oracle Exadata X7 and later database servers.

Note:

You should configure the client network ports using Oracle Exadata Deployment Assistant (OEDA) during system deployment. See Using Oracle Exadata Deployment Assistant.

The following steps may be necessary to configure a client access port if the OEDA deployment was not performed or was performed incorrectly. You can also use these steps to change the client network from 10 GbE to 25 GbE, or from 25 GbE to 10 GbE.

  1. List the network interfaces on the system by using the following command:
    # ip link show
  2. To view details about a specific network interface, use the ethtool command and specify the network interface.

    For example:

    # ethtool ethx
  3. For each network interface (designated by ethx) that does not have the link detected, run the following commands:
    • For 10GbE network interfaces:
      # ifdown ethx
      # ethtool -s ethx speed 10000 duplex full autoneg off
      # ifup ethx
      # ethtool ethx

      For 10 Gb/s, you must use SFP+ transceivers; SFP28 transceivers do not support 10 Gb/s traffic.

    • For 25GbE network interfaces:
      # ifdown ethx
      # ethtool -s ethx speed 25000 duplex full autoneg off
      # ifup ethx
      # ethtool ethx
  4. Confirm that the output from the ethtool command shows yes for Link detected.
            Link detected: yes
  5. Edit the appropriate files in /etc/sysconfig/network-scripts, where x is the number associated with the network interface.
    1. Locate the /etc/sysconfig/network-scripts/ifcfg-ethx file. Add the following lines, if they are not already present in the file:
      • For 10 GbE network interfaces:

        ONBOOT=YES
        ETHTOOL_OPTS="speed 10000 duplex full autoneg off"
      • For 25 GbE network interfaces:

        ONBOOT=YES
        ETHTOOL_OPTS="speed 25000 duplex full autoneg off"
    2. Repeat the previous step for all network interfaces that do not have the ETHTOOL_OPTS setting in the associated ifcfg-ethx file and are connected to 10 GbE or 25 GbE switches.

    The network interface should now show the link as detected. These changes are persistent, and do not need to be repeated after a server reboot.

  6. Check the ILOM on each compute node to validate that the LAN on Motherboard (LOM) is properly configured to detect the 25 GbE transceiver.
    show /HOST/network
      /HOST/network
         Targets:
    
         Properties:
             active_media = none
             auto_media_detection = enabled
             current_active_media = (none)
    
         Commands:
             cd
             set
             show

    If the NIC is not working, change the active_media and current_active_media to the proper values:

    • For 25 GbE transceivers (Fiber or Copper) these parameters should be set to SFP28
    • For 10 GbE network using RJ-45 ended CAT6 cables, these parameters should be set to RJ45
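
The "add the lines if they are not already present" edit in step 5 can be scripted so that it is safe to repeat. The following is a minimal sketch, assuming the file ends with a newline and that only 10000 and 25000 are valid speeds. The function name ensure_ethtool_opts is hypothetical.

```shell
#!/usr/bin/env bash
# ensure_ethtool_opts: hypothetical helper (not an Oracle tool).
# Appends ONBOOT and ETHTOOL_OPTS lines to an ifcfg file only when missing.
# Arguments: path to the ifcfg-ethx file, link speed in Mb/s (10000 or 25000).
ensure_ethtool_opts() {
    local file="$1" speed="$2"
    case "$speed" in
        10000|25000) ;;
        *) echo "unsupported speed: $speed" >&2; return 1 ;;
    esac
    # Existing lines are left untouched, matching "if not already present".
    grep -q '^ONBOOT=' "$file" || echo 'ONBOOT=YES' >> "$file"
    grep -q '^ETHTOOL_OPTS=' "$file" ||
        echo "ETHTOOL_OPTS=\"speed $speed duplex full autoneg off\"" >> "$file"
}
```

For example, ensure_ethtool_opts /etc/sysconfig/network-scripts/ifcfg-eth3 25000 could be run for each affected interface. The interface name eth3 is a placeholder. Because existing lines are never rewritten, an interface that already has a different ETHTOOL_OPTS value still needs a manual edit.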

4.5 Verify the Link Speed on Network Ports

Ensure you are using the correct link speed for Oracle Exadata Database Machine X7-2 compute nodes.

On Oracle Exadata Database Machine X7-2 compute nodes you might experience issues when connecting to 10GbE switches. These issues include links not being detected or being unable to connect to the gateway.

Resolving 10GbE Network Speed Configuration on Client Network Ports

  1. Log in as the root user.
  2. Use the cat command to review the /proc/net/bonding/bondeth0 file.
  3. For each 10GbE network interface (designated by ethx) that does not have the link detected, run the following commands:
    # ifdown ethx
    # ethtool -s ethx speed 10000 duplex full autoneg off
    # ifup ethx
    # ethtool ethx
  4. Confirm that the output from the ethtool command shows yes for Link detected.
            Link detected: yes
  5. Edit the appropriate files in /etc/sysconfig/network-scripts, where x is the number associated with the network interface.
    1. Locate the /etc/sysconfig/network-scripts/ifcfg-ethx file. Add the following line, if it is not already present in the file:
      ETHTOOL_OPTS="speed 10000 duplex full autoneg off"
    2. Repeat the previous step for all network interfaces that do not have the ETHTOOL_OPTS setting in the associated ifcfg-ethx file and are connected to 10GbE switches.

    The network interface should now show the link as detected. These changes are persistent, and do not need to be repeated after a server reboot.
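
The check in step 4 can be automated by filtering the ethtool output. The following is a minimal sketch, assuming the output contains a "Link detected:" line as shown above. The function name link_detected is hypothetical.

```shell
#!/usr/bin/env bash
# link_detected: hypothetical helper (not an Oracle tool).
# Reads ethtool output on standard input and succeeds only if it
# contains a "Link detected: yes" line.
link_detected() {
    grep -Eq '^[[:space:]]*Link detected:[[:space:]]*yes[[:space:]]*$'
}
```

For example, ethtool eth3 | link_detected || echo "eth3: no link" reports an interface that still has no link. The interface name eth3 is a placeholder.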

4.6 Changing from 1 GbE Connections to 10 GbE Connections

1 GbE network connections can be changed to 10 GbE connections.

This procedure applies to Oracle Exadata models X6 and earlier.

When changing the connections, note the following:

  • To prevent a single point of failure for a bonded 10 GbE interface on Oracle Exadata X2-8, use different ports on the Network Express Modules (NEMs) on the two cards, such as NEM0 NET1 and NEM1 NET0.

  • The 10 GbE interfaces are identified as eth4 and eth5 on Sun Fire X4170 M2 Oracle Database Servers, and as eth8 through eth15 on Sun Fire X4800 Oracle Database Servers. Oracle recommends using the following on Oracle Exadata X2-8:

    • BONDETH0 using interfaces eth9 and eth15
    • 10 GbE NEM0(left)/NET1
    • 10 GbE NEM1(right)/NET3
  • Oracle Clusterware is shut down, and the database server is restarted during the procedure.

4.6.1 Task 1: Verify ping Functionality

Verify the functionality of the ping command before making any changes, using the following commands. By verifying the ping command before any changes, you know what the results should be after changing the interfaces. Similar commands can be used to check other servers that connect to Oracle Exadata Database Machine.

# grep "^nameserver" /etc/resolv.conf
nameserver ip_address_1
nameserver ip_address_2

# ping -c 2 ip_address_1
PING ip_address_1 (ip_address_1) 56(84) bytes of data.
64 bytes from ip_address_1: icmp_seq=1 ttl=57 time=1.12 ms
64 bytes from ip_address_1: icmp_seq=2 ttl=57 time=1.05 ms
 
--- ip_address_1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.054/1.087/1.120/0.033 ms

If the test is not successful, showing 100% packet loss, then you should expect similar results when this same verification is run in "Task 4: Verify the 10 GbE Interfaces". If the test is successful, showing 0% packet loss, then you must see similar results after changing to the 10 GbE connections.
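
To compare the results before and after the change, the packet-loss figure can be extracted from the ping summary line. The following is a minimal sketch, assuming the summary line has the "N% packet loss" form shown above. The function name ping_packet_loss is hypothetical.

```shell
#!/usr/bin/env bash
# ping_packet_loss: hypothetical helper (not an Oracle tool).
# Reads ping output on standard input and prints the packet-loss percentage
# from the summary line, without the percent sign.
ping_packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

For example, ping -c 2 ip_address_1 | ping_packet_loss prints 0 for the sample output above. Recording the value now makes it easy to compare against the value obtained in Task 4.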

4.6.2 Task 2: Back up the Current Interface Files

The following procedure describes how to back up the current interface files:

  1. Log in as the root user.
  2. Create hidden directories for the current and new 10 GbE files in the /etc/sysconfig/network-scripts directory similar to the following:
    # cd /etc/sysconfig/network-scripts
    # mkdir .Pre_10GigE_Settings
    # mkdir .Post_10GigE_Settings
    

    Note:

    Linux startup scripts search for files that begin with ifcfg, and assume files beginning with ifcfg are used for network setup. Placing the backup files in hidden directories prevents them from being used to set up the network interface.

  3. Identify the connected 10 GbE interfaces using the following command. Run the command for each 10 GbE interface.
    # ethtool interface
    

    In the preceding command, interface is the 10 GbE interface. The interfaces are eth4 and eth5 for Sun Fire X4170 M2 Oracle Database Servers, and eth8 through eth15 for Sun Fire X4800 Oracle Database Servers.

    The following is an example of the output from the command. The speed should be 10000Mb/s, Link detected should be yes, and Duplex should be full.

    # ethtool eth9
    Settings for eth9:
            Supported ports: [ FIBRE ]
            Supported link modes:  1000baseT/Full 
                                   10000baseT/Full 
            Supports auto-negotiation: No
            Advertised link modes:  1000baseT/Full 
                                    10000baseT/Full 
            Advertised auto-negotiation: No
            Speed: 10000Mb/s
            Duplex: Full
            Port: FIBRE
            PHYAD: 0
            Transceiver: external
            Auto-negotiation: on
            Supports Wake-on: umbg
            Wake-on: umbg
            Current message level: 0x00000007 (7)
            Link detected: yes
    
  4. Verify the current bonded interface using the following command. An example of the output from the command is also shown.
    # grep -i bondeth0 ifcfg-eth*
    
    ifcfg-eth1:MASTER=bondeth0
    ifcfg-eth2:MASTER=bondeth0
    
  5. Copy the 1 GbE interface files to the .Pre_10GigE_Settings directory using a command similar to the following:
    # cp -p ifcfg-eth1 ifcfg-eth2 ./.Pre_10GigE_Settings/.
    
  6. Copy the 10 GbE interface files to the .Pre_10GigE_Settings directory using a command similar to the following:
    # cp -p ifcfg-eth9 ifcfg-eth15 ./.Pre_10GigE_Settings/.
    
  7. Copy the files from the .Pre_10GigE_Settings directory to the .Post_10GigE_Settings directory using a command similar to the following:
    # cp -p ./.Pre_10GigE_Settings/* ./.Post_10GigE_Settings/.
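
The copy steps above can be combined into one repeatable function. The following is a minimal sketch, assuming the directory names used in this task. The function name backup_ifcfg_files is hypothetical, and the interface file names passed to it depend on the server model.

```shell
#!/usr/bin/env bash
# backup_ifcfg_files: hypothetical helper (not an Oracle tool).
# Arguments: network-scripts directory, then one or more ifcfg file names.
# Copies each file into .Pre_10GigE_Settings and then into
# .Post_10GigE_Settings, preserving mode and timestamps with cp -p.
backup_ifcfg_files() {
    local dir="$1"; shift
    local pre="$dir/.Pre_10GigE_Settings" post="$dir/.Post_10GigE_Settings"
    mkdir -p "$pre" "$post" || return 1
    local f
    for f in "$@"; do
        # Stop early instead of producing a partial backup.
        if [[ ! -f $dir/$f ]]; then
            echo "missing file: $dir/$f" >&2
            return 1
        fi
        cp -p "$dir/$f" "$pre/" && cp -p "$pre/$f" "$post/" || return 1
    done
}
```

For example, on an Oracle Exadata X2-8 system the call might be backup_ifcfg_files /etc/sysconfig/network-scripts ifcfg-eth1 ifcfg-eth2 ifcfg-eth9 ifcfg-eth15.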

4.6.3 Task 3: Edit the 10 GbE Interface Settings

The following procedure describes how to edit the ifcfg configuration files:

  1. Edit the ifcfg configuration files as shown in the following table. The files must be edited in the .Post_10GigE_Settings directory.
    For each file, the settings are shown before and after modification.

    ifcfg-eth1

    Before modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth1
    USERCTL=no
    ONBOOT=yes
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:21:28:44:d2:5e
    MASTER=bondeth0
    SLAVE=yes

    After modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth1
    USERCTL=no
    ONBOOT=no
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:21:28:44:d2:5e

    ifcfg-eth2

    Before modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth2
    USERCTL=no
    ONBOOT=yes
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:21:28:44:d2:f2
    MASTER=bondeth0
    SLAVE=yes

    After modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth2
    USERCTL=no
    ONBOOT=no
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:21:28:44:d2:f2

    ifcfg-eth4 on Oracle Exadata Database Machine X2-2

    ifcfg-eth9 on Oracle Exadata Database Machine X2-8 Full Rack

    Before modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth_interface
    ONBOOT=no
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:1b:21:66:4b:c0

    After modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth_interface
    USERCTL=no
    ONBOOT=yes
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:1b:21:66:4b:c0
    MASTER=bondeth0
    SLAVE=yes

    In the preceding syntax, eth_interface is eth4 for Oracle Exadata Database Machine X2-2, or eth9 for Oracle Exadata Database Machine X2-8 Full Rack.

    ifcfg-eth5 on Oracle Exadata Database Machine X2-2

    ifcfg-eth15 on Oracle Exadata Database Machine X2-8 Full Rack

    Before modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth_interface2
    ONBOOT=no
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:1b:21:66:4b:c1

    After modification:

    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth_interface2
    USERCTL=no
    ONBOOT=yes
    BOOTPROTO=none
    HOTPLUG=no
    IPV6INIT=no
    MASTER=bondeth0
    SLAVE=yes
    HWADDR=00:1b:21:66:4b:c1

    In the preceding syntax, eth_interface2 is eth5 for Oracle Exadata Database Machine X2-2, or eth15 for Oracle Exadata Database Machine X2-8 Full Rack.

  2. Copy the edited files to the /etc/sysconfig/network-scripts directory using the following command:
    # cp -fp /etc/sysconfig/network-scripts/.Post_10GigE_Settings/ifcfg-eth* \
      /etc/sysconfig/network-scripts/.
    
  3. Restart the database server using the console.
  4. Monitor the boot sequence to ensure no errors occurred during bondeth0 initialization.
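
The edits in step 1 follow two patterns: the 1 GbE interfaces are released from the bond, and the 10 GbE interfaces join it. The following is a minimal sketch of both patterns as text filters, assuming the input already contains an ONBOOT line and that the bond is named bondeth0. The function names release_from_bond and join_bond are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filters (not Oracle tools). Each reads an ifcfg file on
# standard input and writes the edited version to standard output.

# release_from_bond: set ONBOOT=no and drop the bonding lines,
# as done for the 1 GbE interfaces.
release_from_bond() {
    sed -e 's/^ONBOOT=.*/ONBOOT=no/' -e '/^MASTER=/d' -e '/^SLAVE=/d'
}

# join_bond: set ONBOOT=yes, remove any old USERCTL and bonding lines,
# then append fresh ones, as done for the 10 GbE interfaces.
join_bond() {
    sed -e 's/^ONBOOT=.*/ONBOOT=yes/' \
        -e '/^USERCTL=/d' -e '/^MASTER=/d' -e '/^SLAVE=/d'
    printf '%s\n' 'USERCTL=no' 'MASTER=bondeth0' 'SLAVE=yes'
}
```

For example, join_bond < ifcfg-eth9 > ifcfg-eth9.new writes the edited settings to a new file for review. The appended lines appear at the end of the file rather than in the order shown in the table; the set of key and value pairs is the same.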

4.6.4 Task 4: Verify the 10 GbE Interfaces

The following procedure describes how to verify the 10 GbE interfaces:

  1. Log in as the root user.
  2. Use the cat command to review the /proc/net/bonding/bondeth0 file. The following is an example of the command and output from the command:
    # cat /proc/net/bonding/bondeth0
    
    Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)
     
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth9
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 5000
    Down Delay (ms): 5000
     
    Slave Interface: eth9
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:1b:21:66:4b:c0
     
    Slave Interface: eth15
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:1b:21:66:4b:c1
    

    In the output, verify the slave interfaces are correct, and the MII statuses for the slave interface are up.

  3. Use the netstat -nr command to check the routing table. The routing table should not have changed. The following is an example of the command and output:
    # netstat -nr
    
    Kernel IP routing table
    Destination   Gateway      Genmask         Flags   MSS Window  irtt Iface
    scan_subnet 0.0.0.0        255.255.255.0   U         0 0          0 bondeth0
    192.168.80.0  0.0.0.0      255.255.254.0   U         0 0          0 bondib0
    192.168.80.0  0.0.0.0      255.255.254.0   U         0 0          0 bondib1
    192.168.80.0  0.0.0.0      255.255.254.0   U         0 0          0 bondib2
    192.168.80.0  0.0.0.0      255.255.254.0   U         0 0          0 bondib3
    mgmt_subnet 0.0.0.0        255.255.254.0   U         0 0          0 eth0
    0.0.0.0       scan_gw      0.0.0.0         UG        0 0          0 bondeth0
    
  4. Use the following commands to check the default gateway. The gateway is the SCAN network gateway, and should use bondeth0 on the 10 GbE interfaces.
    # grep GATEWAY /etc/sysconfig/network
    GATEWAY=gw_address
    GATEWAYDEV=bondeth0 
    
    # ping -c 2 gw_address
    PING gw_address (gw_address) 56(84) bytes of data.
    64 bytes from gw_address: icmp_seq=1 ttl=57 time=1.12 ms
    64 bytes from gw_address: icmp_seq=2 ttl=57 time=1.05 ms
    
    --- gw_address ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 1.054/1.087/1.120/0.033 ms
    

    In the preceding commands and output, gw_address is the IP address of the default gateway.

  5. If the name servers were responding to the ping command in "Task 1: Verify ping Functionality", then use the following commands to check the name servers. Similar commands can be used to check other servers that connect to Oracle Exadata Database Machine.
    # grep "^nameserver" /etc/resolv.conf
    nameserver ip_address_1
    nameserver ip_address_2
    
    # ping -c 2 ip_address_1
    PING ip_address_1 (ip_address_1) 56(84) bytes of data.
    64 bytes from ip_address_1: icmp_seq=1 ttl=57 time=1.12 ms
    64 bytes from ip_address_1: icmp_seq=2 ttl=57 time=1.05 ms
     
    --- ip_address_1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 1.054/1.087/1.120/0.033 ms
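
The slave interface check in step 2 can be condensed to one line per interface. The following is a minimal sketch, assuming the /proc/net/bonding/bondeth0 layout shown in the example output. The function name bond_slave_status is hypothetical.

```shell
#!/usr/bin/env bash
# bond_slave_status: hypothetical helper (not an Oracle tool).
# Reads /proc/net/bonding output on standard input and prints one
# "<interface> <MII status>" line for each slave interface.
bond_slave_status() {
    awk '
        # Remember the slave name until its own MII Status line appears.
        /^[[:space:]]*Slave Interface:/ { slave = $3; next }
        # The bond-level MII Status line is skipped because no slave is set yet.
        /^[[:space:]]*MII Status:/ && slave != "" { print slave, $3; slave = "" }
    '
}
```

For example, bond_slave_status < /proc/net/bonding/bondeth0 prints "eth9 up" and "eth15 up" for the sample output above.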
    

4.7 Maintaining the RoCE Network Fabric

The RoCE Network Fabric connects the database servers and Exadata Storage Servers through the bonded interface to the RoCE Network Fabric switches in the rack.

4.7.1 Backing Up Settings on the Cisco Nexus 9336C-FX2 RoCE Network Fabric Switch

The following procedure describes how to back up the Cisco Nexus 9336C-FX2 RoCE Network Fabric switch settings. A backup is recommended after the switch is initially configured, and again after every configuration change.

  1. Access the switch using SSH, and log in as the admin user.
  2. Review the current configuration.
    switch# show running-config
  3. Copy the current configuration to a file.

    Copy the current configuration to a file on a database server or storage server, using the following format:

    
    switch# copy running-config tftp://hostname/directory_name/switch_name-start-config.back

    You can use any of the supported transport schemes on the RoCE Network Fabric switch: tftp, ftp, scp, or sftp. The hostname is the address or name of the remote server, and the directory_name is the path to the directory that contains the file on the remote server. The hostname, directory_name, and file name are case sensitive.

  4. Exit from the session.
    switch# exit
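
When several switches are backed up, the copy command for each one can be generated from the format in step 3. The following is a minimal sketch that only prints the command to type at the switch prompt and does not connect to the switch. The function name switch_backup_cmd is hypothetical, and the host and directory values are site-specific.

```shell
#!/usr/bin/env bash
# switch_backup_cmd: hypothetical helper (not an Oracle or Cisco tool).
# Arguments: transport scheme, remote host, remote directory, switch name.
# Prints the copy command in the format shown in step 3.
switch_backup_cmd() {
    local scheme="$1" host="$2" dir="$3" switch="$4"
    case "$scheme" in
        tftp|ftp|scp|sftp) ;;
        *) echo "unsupported scheme: $scheme" >&2; return 1 ;;
    esac
    # Strip one leading and one trailing slash so the path joins cleanly.
    dir="${dir#/}"
    dir="${dir%/}"
    echo "copy running-config $scheme://$host/$dir/${switch}-start-config.back"
}
```

For example, switch_backup_cmd tftp dbm01 backups rack1sw-rocea0 prints copy running-config tftp://dbm01/backups/rack1sw-rocea0-start-config.back. The names dbm01 and backups are placeholders.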

4.7.2 Applying Golden Configuration Settings on Cisco Nexus 9336C-FX2 RoCE Network Fabric Switches

The golden configuration settings are normally applied on the RoCE Network Fabric switches during initial deployment.

Caution:

  • Take care when performing this procedure, as misconfiguration of the RoCE Network Fabric will likely cause a system outage.

  • Do not apply the golden configuration settings to an active switch in the RoCE Network Fabric. Applying the golden configuration settings to an active switch may disrupt the RoCE Network Fabric and cause a system outage.

Starting with Oracle Exadata System Software release 20.1.0, you can use the following procedure to apply the golden configuration settings on the RoCE Network Fabric switches. For earlier releases, see Restoring Settings on a Cisco Nexus 9336C-FX2 RoCE Network Fabric Switch.

The following procedure applies the latest golden configuration settings to one or more switches in an Exadata single rack or multi-rack configuration. The switches must be powered on along with one server that has SSH access to the switches for accessing the switch configuration files.

  1. Ensure you have a backup of the current switch configuration for each switch.
  2. Log in to a server that has SSH access to the switch, and contains the latest RDMA Network Fabric patch ZIP file.

    To find the available RDMA Network Fabric patches, search for 'RDMA network switch' in My Oracle Support document 888828.1. Download and use the latest patch for your Oracle Exadata System Software release.

  3. Unzip the RDMA Network Fabric patch ZIP file and change directories to the location of the patchmgr utility.
  4. Create a switch list file to drive the configuration of the RoCE Network Fabric switches.
    1. Create a file that contains the host name or IP address of the switches that you want to configure. Place each switch on a separate line.
      For example, create a file named switches.lst, which contains the host name of each switch on separate lines. On a single rack system, with only two leaf switches, the file might contain switch host name entries like:
      rack1sw-rocea0
      rack1sw-roceb0
    2. Tag each line to specify the configuration type for each switch.

      To specify the configuration type for each switch, append a colon (:) and tag to each switch host name or IP address in the switch list file. The following tags are supported:

      • leaf - Identifies a leaf switch in a single rack system. This configuration type is assumed if no tag is specified.
      • mspine - Identifies a spine switch. Note that one spine switch configuration supports all spine switches on single and multi-rack systems, with and without Exadata Secure RDMA Fabric Isolation.
      • mleaf - Identifies a leaf switch in a multi-rack X8M system.
      • sfleaf - Identifies a leaf switch in a single rack system that is enabled to support Exadata Secure RDMA Fabric Isolation.
      • msfleaf - Identifies a leaf switch in a multi-rack X8M system that is enabled to support Exadata Secure RDMA Fabric Isolation.
      • leaf23 - Identifies a leaf switch in a single rack system that is configured with 23 host ports. This configuration is required only for 8-socket systems (X8M-8 and later) with 3 database servers and 11 storage servers.
      • mleaf23 - Identifies a leaf switch in a multi-rack system that is configured with 23 host ports. This configuration is required only for 8-socket X8M-8 systems with 3 database servers and 11 storage servers.
      • mleaf_u14 - Identifies a leaf switch in a multi-rack system that is configured with 14 inter-switch links. This is the typical multi-rack leaf switch configuration for X9M and later model systems.
      • msfleaf_u14 - Identifies a leaf switch in a multi-rack system that is enabled to support Exadata Secure RDMA Fabric Isolation and is configured with 14 inter-switch links. This configuration is required for X9M and later model systems with Secure Fabric enabled.
      • mleaf23_u13 - Identifies a leaf switch in a multi-rack system that is configured with 23 host ports and 13 inter-switch links. This configuration is required only for 8-socket X9M-8 systems with three database servers and 11 storage servers.
      For example:
      rack1sw-rocea0:leaf
      rack1sw-roceb0:leaf
    3. For multi-rack configurations only, specify a unique loopback octet for each switch.

      The loopback octet is the last octet of the switch loopback address, which uniquely identifies a switch.

      To specify the loopback octet for each switch, append a period (.) and numeric loopback octet value to each entry in the switch list file.

      Caution:

      Every switch in a multi-rack configuration must have a unique loopback octet. If multiple switches use the same loopback octet, the RoCE Network Fabric cannot function correctly, resulting in a system outage.

      For the leaf switches, start with 101 as the first loopback octet value and increment as follows:

      • 101 - Rack 1 lower leaf switch (rack1sw-rocea0 in the following example)

      • 102 - Rack 1 upper leaf switch (rack1sw-roceb0 in the following example)

      • 103 - Rack 2 lower leaf switch (rack2sw-rocea0 in the following example)

      • 104 - Rack 2 upper leaf switch (rack2sw-roceb0 in the following example)

      • 105 - Rack 3 lower leaf switch

      • 106 - Rack 3 upper leaf switch, and so on.

      For the spine switches, start with 201 as the first loopback octet value and increment as follows:

      • 201 - Rack 1 spine switch (rack1sw-roces0 in the following example)

      • 202 - Rack 2 spine switch (rack2sw-roces0 in the following example)

      • 203 - Rack 3 spine switch

      • 204 - Rack 4 spine switch, and so on.

      For example, the switch list file for a 2-rack Exadata X9M system might contain:
      rack1sw-rocea0:mleaf_u14.101
      rack1sw-roceb0:mleaf_u14.102
      rack1sw-roces0:mspine.201
      rack2sw-rocea0:mleaf_u14.103
      rack2sw-roceb0:mleaf_u14.104
      rack2sw-roces0:mspine.202
      Or, if you were adding a 5th rack to an existing 4-rack Exadata X9M system, the switch list file might contain:
      rack5sw-rocea0:mleaf_u14.109
      rack5sw-roceb0:mleaf_u14.110
      rack5sw-roces0:mspine.205
  5. Use patchmgr to apply the latest golden configuration settings to the RoCE Network Fabric switches in the switch list file.
    For example:
    # ./patchmgr --roceswitches switches.lst --apply-config --log_dir log-directory
  6. Use patchmgr to verify the configuration of the RoCE Network Fabric switches in the switch list file.
    For example:
    # ./patchmgr --roceswitches switches.lst --verify-config --log_dir log-directory
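
The switch list format described in step 4 can be sanity-checked before it is passed to patchmgr. The following Python sketch is illustrative only and is not part of patchmgr. The function names (expected_loopback_octets, parse_entry, validate_switch_list) are hypothetical, and the parser assumes that the loopback octet, when present, always follows a tag.

```python
# Tags listed in step 4 of the procedure.
SUPPORTED_TAGS = {
    "leaf", "mspine", "mleaf", "sfleaf", "msfleaf", "leaf23",
    "mleaf23", "mleaf_u14", "msfleaf_u14", "mleaf23_u13",
}


def expected_loopback_octets(rack):
    """Return the loopback octets suggested for a 1-based rack number."""
    lower_leaf = 101 + 2 * (rack - 1)
    return {
        "lower_leaf": lower_leaf,
        "upper_leaf": lower_leaf + 1,
        "spine": 200 + rack,
    }


def parse_entry(line):
    """Split 'host[:tag[.octet]]' into (host, tag, octet).

    The tag defaults to 'leaf' when it is omitted. Assumption: an octet
    only appears after a tag. Raises ValueError if the octet is not numeric.
    """
    host, sep, rest = line.strip().partition(":")
    if not sep:
        return host, "leaf", None
    tag, dot, octet = rest.partition(".")
    return host, tag, int(octet) if dot else None


def validate_switch_list(lines):
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    seen = {}  # loopback octet -> first host that used it
    for line in lines:
        if not line.strip():
            continue
        host, tag, octet = parse_entry(line)
        if tag not in SUPPORTED_TAGS:
            problems.append(f"{host}: unsupported tag '{tag}'")
        if octet is not None:
            if octet in seen:
                problems.append(
                    f"{host}: loopback octet {octet} already used by {seen[octet]}"
                )
            else:
                seen[octet] = host
    return problems
```

With this sketch, the 2-rack example shown in step 4 produces no problems, while two entries that share a loopback octet are reported.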

4.7.3 Restoring Settings on a Cisco Nexus 9336C-FX2 RoCE Network Fabric Switch

You can restore the Cisco Nexus 9336C-FX2 RoCE Network Fabric switch settings from a backup.

The following procedure outlines how to restore Cisco Nexus 9336C-FX2 RoCE Network Fabric switch settings from a backup. This procedure can also be used to restore golden configuration settings on a switch prior to Oracle Exadata System Software release 20.1.0.

Note:

Oracle Exadata System Software release 20.1.0 contains a simplified and enhanced procedure for applying golden configuration settings on the RoCE Network Fabric switches. See Applying Golden Configuration Settings on Cisco Nexus 9336C-FX2 RoCE Network Fabric Switches.

  1. Access the switch using SSH, and log in as the admin user with the admin password.
  2. Delete any existing backup configuration (or golden configuration) file on the switch for the configuration you are restoring.

    The golden configuration files are:

    • Single rack leaf (leaf): roce_leaf_switch.cfg
    • Multi-rack spine (mspine): roce_spine_switch_multi.cfg
    • Multi-rack leaf (mleaf): roce_leaf_switch_multi.cfg
    • Single rack leaf with Secure Fabric support (sfleaf): roce_sf_leaf_switch.cfg
    • Multi-rack leaf with Secure Fabric support (msfleaf): roce_sf_leaf_switch_multi.cfg
    • Single rack leaf configured with 23 host ports (leaf23): roce_leaf_switch_23hosts.cfg
    • Multi-rack leaf configured with 23 host ports (mleaf23): roce_leaf_switch_23hosts_multi.cfg
    • Multi-rack leaf configured with 14 inter-switch links (mleaf_u14): roce_leaf_switch_14uplinks_multi.cfg
    • Multi-rack leaf configured with 14 inter-switch links and with Secure Fabric support (msfleaf_u14): roce_sf_leaf_switch_14uplinks_multi.cfg
    • Multi-rack leaf configured with 23 host ports and 13 inter-switch links (mleaf23_u13): roce_leaf_switch_23hosts_13uplinks_multi.cfg

    Note:

    If you do not remove the file you are replacing, then when you attempt to overwrite the file you will get a 'permission denied' error.

    For example:

    rack3sw-rocea0# delete bootflash:roce_leaf_switch.cfg
    Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
    rack3sw-rocea0# 
  3. Copy the backup configuration file (or golden configuration file) to the switch.

    For example:

    [root@server_hostname ~]# scp roce_leaf_switch.cfg admin@100.104.10.21:/
    User Access Verification
    Password:
    roce_leaf_switch.cfg 100% 23KB 23.5KB/s 00:00

    Note:

    You can use any of the supported transport schemes on the RoCE Network Fabric switch: tftp, ftp, scp or sftp.

    If you are restoring a golden configuration file (instead of restoring a backup configuration file), you can restore the appropriate golden configuration file based on your system configuration and type of switch. The files are located within the patchmgr switch bundle in the roce_switch_templates/ directory.

  4. Apply the backup configuration (or golden configuration) file.

    Choose one of the following.

    1. If you are restoring a backup configuration file, apply the backup configuration using the following commands.

      In the following example, the backup configuration file being restored is running-config.bak. Adjust the command to suit your backup file name.

      rack3sw-rocea0# copy bootflash:running-config.bak startup-config
      rack3sw-rocea0# reload
    2. If you are applying a golden configuration file, use the following commands.

      In the following example, the golden configuration file being applied is roce_leaf_switch.cfg. Adjust the command to suit the golden configuration file that you want to apply.

      rack3sw-rocea0# run-script bootflash:roce_leaf_switch.cfg | grep 'none'
      rack3sw-rocea0# copy running-config startup-config

      Note:

      The run-script command may take 1-2 minutes on a single-rack switch and 3-4 minutes on a multi-rack switch.
  5. Exit from the session.
    rack3sw-rocea0# exit
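
When restoring a golden configuration, the file name depends on the configuration type of the switch. The following Python sketch collects the file names listed in step 2 and builds the corresponding commands from steps 2 through 4. It is an illustration only: the function name golden_config_steps and its parameters are hypothetical, and the commands are returned as text rather than executed.

```python
# Configuration type -> golden configuration file, as listed in step 2.
GOLDEN_CONFIG_FILES = {
    "leaf": "roce_leaf_switch.cfg",
    "mspine": "roce_spine_switch_multi.cfg",
    "mleaf": "roce_leaf_switch_multi.cfg",
    "sfleaf": "roce_sf_leaf_switch.cfg",
    "msfleaf": "roce_sf_leaf_switch_multi.cfg",
    "leaf23": "roce_leaf_switch_23hosts.cfg",
    "mleaf23": "roce_leaf_switch_23hosts_multi.cfg",
    "mleaf_u14": "roce_leaf_switch_14uplinks_multi.cfg",
    "msfleaf_u14": "roce_sf_leaf_switch_14uplinks_multi.cfg",
    "mleaf23_u13": "roce_leaf_switch_23hosts_13uplinks_multi.cfg",
}


def golden_config_steps(tag, switch_address):
    """Return (where, command) pairs in the order used by the procedure.

    'where' is "switch" for switch CLI commands and "server" for commands
    run on the server that holds the golden configuration file.
    Raises KeyError for an unknown configuration type.
    """
    cfg = GOLDEN_CONFIG_FILES[tag]
    return [
        ("switch", f"delete bootflash:{cfg}"),
        ("server", f"scp {cfg} admin@{switch_address}:/"),
        ("switch", f"run-script bootflash:{cfg} | grep 'none'"),
        ("switch", "copy running-config startup-config"),
    ]
```

For example, golden_config_steps("leaf", "100.104.10.21") yields the same delete, scp, run-script, and copy commands that appear in the examples of this procedure.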

4.7.4 Using Access VLANs with Cisco Nexus 9336C-FX2 RoCE Network Fabric Switches

You can change the switchport access vlan ID setting to implement server-level isolation across the RoCE Network Fabric.

By default, Oracle Exadata uses Access VLAN ID 3888 for all RoCE Network Fabric private network traffic, on the server re0 and re1 interfaces.

If you change the default Access VLAN ID setting on the Cisco Nexus 9336C-FX2 RoCE Network Fabric switches, the corresponding server can no longer communicate with other database servers or storage servers using the default setting. The range of valid Access VLAN IDs is 2744-3967.

Use the following procedure to implement different Access VLANs on the RoCE Network Fabric.

Note:

It is recommended that you shut down the servers while modifying the switch configuration to avoid any outages. If you are changing the Access VLAN ID for a subset of the servers in the rack, then only the affected servers need to be shut down prior to modifying the switch.
  1. Identify the RoCE Network Fabric switch host name and port that is connected to the re0 interface of the server that you are configuring.
    [root@dbm01adm02 ~]# /opt/oracle.SupportTools/ibdiagtools/utils/lldp_cap.py re0 \
    | egrep 'SWITCH_PORT:|SWITCH_NAME:'
    SWITCH_PORT: Ethernet1/21
    SWITCH_PORT_DESCRIPTION: adm02
  2. Log in to the lower leaf switch using the SWITCH_NAME identified in the previous step.
    # ssh admin@dbm01sw-rocea0
    User Access Verification
    Password: *******
    
  3. Check the configuration for the switch port.

    To specify the interface, use the SWITCH_PORT identified in Step 1.

    dbm01sw-rocea0# show running-config interface ethernet 1/21
    !Running configuration last done at: Wed Nov 13 10:34:58 2019
    !Time: Wed Nov 13 14:55:48 2019
    
    version 7.0(3)I7(6) Bios:version 05.33 
    
    interface Ethernet1/21
      description adm02
      switchport access vlan 3888
      priority-flow-control mode on
      spanning-tree port type edge
      spanning-tree bpduguard enable
      mtu 2300
      speed 100000
      duplex full
      no negotiate auto
      service-policy type qos input QOS_MARKING no-stats  
  4. Create the new VLAN ID on the switch.

    This configuration is only required once on each switch. However, it is harmless to repeat the VLAN configuration on a switch.

    For example, to create a new VLAN ID with the value 3889:

    dbm01sw-rocea0# configure terminal 
    Enter configuration commands, one per line. End with CNTL/Z.
    dbm01sw-rocea0(config)# vlan 3889
    dbm01sw-rocea0(config-vlan)# exit
    dbm01sw-rocea0(config)# exit
    dbm01sw-rocea0# 
  5. Modify the switch port configuration to change the switchport access vlan setting.

    Specify the same interface as in the previous steps. Then, remove the old VLAN ID (for example, 3888), add the new VLAN ID (for example, 3889), and exit configuration mode.

    dbm01sw-rocea0# configure terminal 
    Enter configuration commands, one per line. End with CNTL/Z.
    dbm01sw-rocea0(config)# interface ethernet 1/21
    dbm01sw-rocea0(config-if)# no switchport access vlan 3888
    dbm01sw-rocea0(config-if)# switchport access vlan 3889
    dbm01sw-rocea0(config-if)# exit
    dbm01sw-rocea0(config)# exit
    dbm01sw-rocea0# 
  6. Verify that the switch interface is using the new VLAN ID.

    Specify the same interface as in the previous steps.

    dbm01sw-rocea0# show running-config interface ethernet 1/21
     
    !Command: show running-config interface Ethernet1/21
    !Running configuration last done at: Wed Nov 20 23:53:38 2019
    !Time: Wed Nov 20 23:53:45 2019
     
    version 7.0(3)I7(6) Bios:version 05.33 
     
    interface Ethernet1/21
      description adm02
      switchport access vlan 3889
      priority-flow-control mode on
      spanning-tree port type edge
      spanning-tree bpduguard enable
      mtu 2300
      speed 100000
      duplex full
      no negotiate auto
      service-policy type qos input QOS_MARKING no-stats
  7. Save the configuration.
    dbm01sw-rocea0# copy running-config startup-config 
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.
  8. Repeat Steps 1 to 7 for all of the database servers (bare metal servers or KVM hosts) and all of the storage servers that you want to change to the new VLAN ID.
  9. Repeat Steps 1 to 8 for the re1 interfaces, which are connected to the upper leaf switch.
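
Because the same command sequence is repeated for every server port, it can help to generate it and to check the result mechanically. The following Python sketch builds the switch CLI lines from steps 4, 5, and 7, and extracts the access VLAN from the output of show running-config interface for step 6. It is a sketch under stated assumptions: the function names are hypothetical, and nothing here connects to a switch.

```python
DEFAULT_ACCESS_VLAN = 3888
VALID_ACCESS_VLANS = range(2744, 3968)  # 2744-3967 inclusive


def access_vlan_commands(interface, new_vlan, old_vlan=DEFAULT_ACCESS_VLAN):
    """Return the switch CLI lines from steps 4, 5, and 7 for one port.

    'interface' is the port as typed on the switch, for example
    "ethernet 1/21". Raises ValueError if new_vlan is out of range.
    """
    if new_vlan not in VALID_ACCESS_VLANS:
        raise ValueError(f"Access VLAN ID {new_vlan} is outside 2744-3967")
    return [
        # Step 4: create the VLAN ID on the switch.
        "configure terminal",
        f"vlan {new_vlan}",
        "exit",
        "exit",
        # Step 5: move the port from the old VLAN ID to the new one.
        "configure terminal",
        f"interface {interface}",
        f"no switchport access vlan {old_vlan}",
        f"switchport access vlan {new_vlan}",
        "exit",
        "exit",
        # Step 7: save the configuration.
        "copy running-config startup-config",
    ]


def parse_access_vlan(running_config):
    """Return the access VLAN ID found in the interface output, or None."""
    for line in running_config.splitlines():
        words = line.split()
        if len(words) == 4 and words[:3] == ["switchport", "access", "vlan"]:
            return int(words[3])
    return None
```

After applying the commands, comparing parse_access_vlan(output) with the intended VLAN ID gives a simple check for step 6.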

4.7.5 Replacing a Failed RoCE Network Fabric Switch

This procedure describes how to replace a failed RoCE Network Fabric switch.

This procedure depends on having a backup of the configuration for the failed switch.

  1. Power off both power supplies on the switch by removing the power plugs.
  2. Verify each cable is labeled, and then disconnect each cable from the switch.

    All RoCE Network Fabric cables should have labels at both ends indicating their locations. If there are any cables that do not have labels, then add a label before disconnecting the cable.

    You will use these labels to cable the replacement switch.

  3. Remove the switch from the rack.
    1. Extend the rack cabinet's anti-tilt bar.
    2. Attach an antistatic wrist strap.
    3. Remove the switch from the rack.
  4. Install the new switch in the rack.
    After you have placed the new switch in the correct position in the rack, if there is no further equipment being serviced in the rack, then you can retract the anti-tilt bar on the rack cabinet.
  5. Power on the switch by plugging in the power plugs.
  6. Restore the switch settings using the backup, as described in Restoring Settings on a Cisco Nexus 9336C-FX2 RoCE Network Fabric Switch.
  7. Connect the cables to the new switch.
    Use the labels on each cable to ensure that you connect each cable to the correct port on the new switch.
  8. Complete the steps in Verifying the RoCE Network Fabric Configuration.

4.7.6 Verifying the RoCE Network Fabric Configuration

This procedure describes how to verify the RoCE Network Fabric configuration.

  1. Verify the proper oracle-rdma-release software versions are being used on the database servers.
    [root@dbm01adm08 ~]# rpm -qa |grep oracle-rdma-release
    oracle-rdma-release-0.11.0-1.el7ora.x86_64

    The oracle-rdma-release software and adapter firmware versions are automatically maintained on the Oracle Exadata storage servers.

  2. Check the adapter firmware versions on the database servers.

    Use the CheckHWnFWProfile script to check firmware versions for the RDMA Network Fabric adapters.

    # /opt/oracle.SupportTools/CheckHWnFWProfile -action list
  3. Visually check all the RDMA Network Fabric cable connections within the rack.
    The port lights should be on, and the LEDs should be on. Do not press each connector to verify connectivity.
  4. Complete the steps described in My Oracle Support Doc ID 2587717.1.
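
When checking many database servers, the package string returned by rpm in step 1 can be split into its parts and compared against the version you expect. The following Python sketch is one way to do that. The function names are hypothetical, and the expected version must be supplied by you from the documentation for your release; this sketch does not know which version is correct.

```python
def parse_rpm_name(package):
    """Split 'name-version-release.arch' into its four parts."""
    nvr, arch = package.strip().rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def servers_with_other_versions(packages_by_server, expected_version):
    """Return the sorted server names whose package version differs.

    packages_by_server maps a server name to the single line printed by
    'rpm -qa | grep oracle-rdma-release' on that server.
    """
    return sorted(
        server
        for server, package in packages_by_server.items()
        if parse_rpm_name(package)["version"] != expected_version
    )
```

For the output shown in step 1, parse_rpm_name reports the name oracle-rdma-release, version 0.11.0, release 1.el7ora, and architecture x86_64.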

4.7.7 Verifying RoCE Network Fabric Operation

Verify the RoCE Network Fabric is operating properly after making modifications to the underlying hardware.

If hardware maintenance has taken place with any component in the RoCE Network Fabric, including replacing an RDMA Network Fabric Adapter on a server, a switch, or a cable, or if the operation of the RoCE Network Fabric is suspected to be substandard, then verify the RoCE Network Fabric is operating properly. The following procedure describes how to verify network operation:

  1. Complete the steps in Verifying the RoCE Network Fabric Configuration.
  2. Prepare for infinicheck.

    You may need to run the following commands before you can use the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.

    • If required, use the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric. For example:

      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -s
    • You can use the -z option to clear the files that were created during the last run of the infinicheck command. For example:

      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -z

    In the previous commands, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.

  3. Run the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.

    On a properly configured system, you can run the infinicheck command on any database server with minimal arguments. For example:

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck

    By default, the infinicheck command performs a group of configuration and connectivity checks on the RoCE Network Fabric. You can use the -p option to run the optional performance tests. Or, use the -a option to perform all checks, including the performance tests. For example:

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -a

    Note:

    System performance may be impacted when the infinicheck command performs performance stress tests. Consequently, only run the infinicheck performance tests when required and preferably when there is no workload on the system.

    You can also specify the servers in your system explicitly by using the -g option to specify the database servers and the -c option to specify the storage servers. For example:

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips

    In the previous example, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.

    Instead of listing the database servers and storage servers in input files, you can supply a comma-separated list of IP addresses on the command line.

    The following example displays typical terminal output from the infinicheck command.

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips
                            INFINICHECK
                    [Network Connectivity, Configuration and Performance]
    
                        #### FABRIC TYPE TESTS ####
    
    System type identified: RoCE
    Verifying User Equivalence of user=root from all DBs to all CELLs.
    
                    #### RoCE CONFIGURATION TESTS ####
            Checking for presence of RoCE devices on all DBs and CELLs
    [SUCCESS].... RoCE devices on all DBs and CELLs look good
            Checking for RoCE Policy Routing settings on all DBs and CELLs
    [SUCCESS].... RoCE Policy Routing settings look good
            Checking for RoCE DSCP ToS mapping on all DBs and CELLs
    [SUCCESS].... RoCE DSCP ToS settings look good
            Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
    [SUCCESS].... RoCE PFC and DSCP settings look good
            Checking for RoCE interface MTU settings. Expected value : 2300
    [SUCCESS].... RoCE interface MTU settings look good
            Verifying switch advertised DSCP on all DBs and CELLs ports ( ~ 2 min )
    [SUCCESS].... Advertised DSCP settings from RoCE switch looks good
    
                        #### CONNECTIVITY TESTS ####
                        [COMPUTE NODES -> STORAGE CELLS]
                               (60 seconds approx.)
                       (Will walk through QoS values: 0-6)
    [SUCCESS]..............Results OK
    [SUCCESS]....... All can talk to all storage cells
                        [COMPUTE NODES -> COMPUTE NODES]
                               (60 seconds approx.)
                       (Will walk through QoS values: 0-6)
    [SUCCESS]..............Results OK
    [SUCCESS]....... All hosts can talk to all other nodes
            Verifying Subnet Masks on all nodes
    [SUCCESS] ......... Subnet Masks is same across the network
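
If you capture the infinicheck output to a file, the status lines can be summarized with a short script. The following Python sketch counts the [SUCCESS] lines and collects any line that starts with a different bracketed, uppercase status. Only [SUCCESS] appears in the sample output above, so treating any other status as needing attention is an assumption, and the function name summarize_infinicheck is hypothetical.

```python
import re

# A status line starts with a bracketed uppercase word, followed by
# filler dots or spaces and then the message text.
STATUS_LINE = re.compile(r"^\s*\[([A-Z]+)\][\s.]*(.*)$")


def summarize_infinicheck(output):
    """Return (success_count, other).

    'other' lists (status, message) pairs for status lines whose status
    is not SUCCESS. Section banners such as
    '[COMPUTE NODES -> STORAGE CELLS]' do not match and are ignored.
    """
    success_count = 0
    other = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line)
        if not match:
            continue
        status, message = match.group(1), match.group(2).strip()
        if status == "SUCCESS":
            success_count += 1
        else:
            other.append((status, message))
    return success_count, other
```

An empty second element in the result means that every status line in the captured output was marked [SUCCESS].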

4.7.8 Upgrading the Switch Firmware for RoCE Network Fabric

The patchmgr utility is used to upgrade the RoCE Network Fabric switches.

The switch firmware is upgraded in a rolling manner. patchmgr upgrades the switches in the order they are listed in the supplied file, for example roceswitch.lst.

4.7.9 Downgrading the RoCE Network Fabric Switch Software

The patchmgr utility is used to downgrade the RoCE Network Fabric switches.

The switch firmware is downgraded in a rolling manner. patchmgr downgrades the switches in the order they are listed in the supplied file, for example roceswitch.lst.

Refer to Downgrading RoCE Network Fabric Switch Firmware for the instructions.

4.8 Maintaining the InfiniBand Network Fabric Network

The InfiniBand Network Fabric network connects the database servers and Exadata Storage Servers through the bonded interface to the InfiniBand Network Fabric switches in the rack.

4.8.1 Backing Up and Restoring InfiniBand Switch Settings

The procedure for backing up and restoring InfiniBand switch settings depends on the firmware on the switch.

The InfiniBand firmware release 1.1.3-2 or later has Integrated Lights Out Manager (ILOM) which provides backup and restore capability. The InfiniBand firmware release 1.0.1 does not have ILOM. You can either upgrade to the latest available firmware release and then use the procedure in Backing Up Settings on a Switch with 2.1.3-4 Firmware, or you can manually perform the backup and restore of individual files.

See Also:

Oracle Integrated Lights Out Manager (ILOM) Documentation at http://www.oracle.com/goto/ilom/docs
4.8.1.1 Backing Up Settings on a Switch with 2.1.3-4 Firmware

The following procedure describes how to back up a switch with 2.1.3-4 firmware. The backup only needs to be done once after the switch has been initially configured with the right settings.

  1. Navigate to the switch ILOM URL in a browser. For example: http://dbm002-i1.us.example.com.
  2. Log in as the ilom-admin user.
  3. Select the Maintenance tab.
  4. Select the Backup/Restore tab.
  5. Select the Backup operation and the Browser method.
  6. Enter a passphrase. This is used to encrypt sensitive information, such as user passwords, in the backup.
  7. Click Run, and save the resulting XML file in a secure location.
  8. Log in to the Sun Datacenter InfiniBand Switch 36 switch as the root user.
  9. Use the scp command to copy the following files:
    • root SSH keys: /root/.ssh/authorized_keys
    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys
    • host file: /etc/hosts
    • openSM settings: /etc/opensm/opensm.conf
  10. Save the output from the version command.
4.8.1.2 Backing Up Settings on a Switch with 1.1.3-2 Firmware

The following procedure describes how to back up a switch with 1.1.3-2 firmware. The backup only needs to be done once after the switch has been initially configured with the right settings.

  1. Navigate to the switch ILOM URL in a browser. For example: http://dbm002-i1.us.example.com.
  2. Log in as the ilom-admin user.
  3. Select the Maintenance tab.
  4. Select the Backup/Restore tab.
  5. Select the Backup operation and the Browser method.
  6. Enter a passphrase. This is used to encrypt sensitive information, such as user passwords, in the backup.
  7. Click Run, and save the resulting XML file in a secure location.
  8. Log in to the Sun Datacenter InfiniBand Switch 36 switch as the root user.
  9. Use the scp command to copy the following files:
    • Network configuration: /etc/sysconfig/network-scripts/ifcfg-eth0

    • DNS information: /etc/resolv.conf

    • NTP information: /etc/ntp.conf

    • Time zone information: /etc/localtime

    • openSM settings: /etc/opensm/opensm.conf

    • Host name: /etc/sysconfig/network

    • root SSH keys: /root/.ssh/authorized_keys

    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys

  10. Run the hostname command, and then save the output. This is done in case the host name is not set in the /etc/sysconfig/network file.
  11. Save the passwords for the root and nm2user accounts.
  12. Run the nm2version command, and then save the output.
4.8.1.3 Backing Up Settings on a Switch with 1.0.1 Firmware

The following procedure describes how to back up the settings on a switch with 1.0.1 firmware:

  1. Log in to the switch as the root user. If you do not have the password for the root user, then contact Oracle Support Services.
  2. Make copies of the following files:
    • Network configuration: /etc/sysconfig/network-scripts/ifcfg-eth0

    • DNS information: /etc/resolv.conf

    • NTP information: /etc/ntp.conf

    • Time zone information: /etc/localtime

    • openSM settings: /etc/opensm/opensm.conf

    • Host name: /etc/sysconfig/network

    • root SSH keys: /root/.ssh/authorized_keys

    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys

  3. Run the hostname command and save the output, in case the host name is not set in the /etc/sysconfig/network file.
  4. Save the passwords for the root and nm2user accounts.
  5. Run the nm2version command and save the output.
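
The three backup procedures differ mainly in the list of files to copy. The following Python sketch gathers those lists by firmware release and builds one scp command per file, to be run on the switch. It rests on several assumptions: the function name, the copy target, and the destination directory are hypothetical; using scp for 1.0.1 firmware is a choice made here, since that procedure only says to make copies; and the nm2user key file may not exist on a given switch, in which case its command fails and can be ignored.

```python
import posixpath

SSH_KEY_FILES = [
    "/root/.ssh/authorized_keys",
    "/home/nm2user/.ssh/authorized_keys",  # only if it exists
]

LEGACY_FILES = [
    "/etc/sysconfig/network-scripts/ifcfg-eth0",
    "/etc/resolv.conf",
    "/etc/ntp.conf",
    "/etc/localtime",
    "/etc/opensm/opensm.conf",
    "/etc/sysconfig/network",
] + SSH_KEY_FILES

# Firmware release -> files named in the matching backup procedure.
BACKUP_FILES = {
    "2.1.3-4": SSH_KEY_FILES + ["/etc/hosts", "/etc/opensm/opensm.conf"],
    "1.1.3-2": LEGACY_FILES,
    "1.0.1": LEGACY_FILES,
}


def scp_backup_commands(firmware, target, dest_dir):
    """Build scp commands that push each file from the switch to target.

    'target' is a user@host string for the machine that stores the backup.
    Each remote path is flattened into a file name so that the two
    authorized_keys files do not overwrite each other.
    """
    commands = []
    for path in BACKUP_FILES[firmware]:
        local_name = path.strip("/").replace("/", "_")
        destination = posixpath.join(dest_dir, local_name)
        commands.append(f"scp {path} {target}:{destination}")
    return commands
```

The same file lists can be used in reverse when restoring the files to a replacement switch.
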
4.8.1.4 Restoring Settings on a Switch with 2.1.3-4 Firmware

The following procedure describes how to restore the settings on a switch with 2.1.3-4 firmware:

  1. Run the version command, and ensure that the switch is at the right firmware level. If not, then upgrade the switch to the correct firmware level.
  2. Navigate to the switch ILOM URL in a browser. For example: http://dbm002-i1.us.example.com.
  3. Log in as the ilom-admin user.
  4. Select the Maintenance tab.
  5. Select the Backup/Restore tab.
  6. Select the Restore operation and the Browser method.
  7. Click Browse, and select the XML file that contains the switch configuration backup.
  8. Enter the passphrase that was used during the backup.
  9. Click Run to restore the configuration.
  10. Log in to the Sun Datacenter InfiniBand Switch 36 switch as the root user.
  11. Restore the following files from the backup:
    • root SSH keys: /root/.ssh/authorized_keys
    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys
    • host file: /etc/hosts
    • openSM settings: /etc/opensm/opensm.conf
  12. Restart openSM from the switch CLI using the following commands:
    disablesm
    enablesm
    
  13. Log in as the root user.
  14. Restart the switch.
4.8.1.5 Restoring Settings on a Switch with 1.1.3-2 Firmware

The following procedure describes how to restore the settings on a switch with 1.1.3-2 firmware:

  1. Run the version command, and ensure that the switch is at the right firmware level. If not, then upgrade the switch to the correct firmware level.
  2. Navigate to the switch ILOM URL in a browser. For example: http://dbm002-i1.us.example.com.
  3. Log in as the ilom-admin user.
  4. Select the Maintenance tab.
  5. Select the Backup/Restore tab.
  6. Select the Restore operation and the Browser method.
  7. Click Browse, and select the XML file that contains the switch configuration backup.
  8. Type in the passphrase that was used during the backup.
  9. Click Run to restore the configuration.
  10. Log in to the Sun Datacenter InfiniBand Switch 36 switch as the root user.
  11. Restore the following files from the backup:
    • Network configuration: /etc/sysconfig/network-scripts/ifcfg-eth0

    • DNS information: /etc/resolv.conf

    • NTP information: /etc/ntp.conf

    • Time zone information: /etc/localtime

    • openSM settings: /etc/opensm/opensm.conf

    • Host name: /etc/sysconfig/network

    • root SSH keys: /root/.ssh/authorized_keys

    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys

  12. Restore the host name by adding the following line to the /etc/sysconfig/network file, if it is not already in the file:
    HOSTNAME=switch_host_name
    
  13. Restore the passwords of the root and nm2user users using the passwd command.
  14. Run the following commands in the order shown to restart the services and openSM:
    service network restart 
    service ntpd restart 
    disablesm 
    enablesm 
    
  15. Log in as the root user.
  16. Restart the switch.
4.8.1.6 Restoring Settings on a Switch with 1.0.1 Firmware

The following procedure describes how to restore the settings to a switch with 1.0.1 firmware:

  1. Log in to the switch as the root user. If you do not have the password for the root user, then contact Oracle Support Services.
  2. Ensure that the switch is at the right firmware level. If not, then upgrade the switch to the correct firmware level.
  3. Restore the following files from the backup:
    • Network configuration: /etc/sysconfig/network-scripts/ifcfg-eth0

    • DNS information: /etc/resolv.conf

    • NTP information: /etc/ntp.conf

    • Time zone information: /etc/localtime

    • openSM settings: /etc/opensm/opensm.conf

    • Host name: /etc/sysconfig/network

    • root SSH keys: /root/.ssh/authorized_keys

    • nm2user SSH keys (if it exists): /home/nm2user/.ssh/authorized_keys

  4. Restore the host name by adding a HOSTNAME=switch_host_name line to the /etc/sysconfig/network file, if not already present.
  5. Restore the passwords of the root and nm2user users using the passwd command.
  6. Run the following commands in the order shown to restart the services and openSM:
    service network restart 
    service ntpd restart 
    disablesm 
    enablesm 
    
  7. Log in as the root user.
  8. Restart the switch.

4.8.2 Verifying the InfiniBand Network Fabric Configuration

This procedure describes how to verify the InfiniBand Network Fabric configuration.

  1. Verify the proper OpenFabrics Enterprise Distribution (OFED) software and HCA firmware versions are being used on the database servers.

    The OFED software and HCA firmware versions are automatically maintained on the Exadata storage servers.

  2. Verify the InfiniBand Network Fabric topology using the following command from a database server or Exadata Storage Server:
    # /opt/oracle.SupportTools/ibdiagtools/verify-topology
    

    If any errors occur, then contact Oracle Support Services.

4.8.3 Using the verify-topology Utility

The verify-topology utility can be used to identify various network connection problems.

The problems you can diagnose using verify-topology include:

  • Missing InfiniBand Network Fabric cable
  • Missing InfiniBand Network Fabric connection
  • Incorrectly-seated cable
  • Cable connected to the wrong endpoint

The utility is available in the ibdiagtools directory on all servers. To view the options for the verify-topology utility, use the following command:

./verify-topology -h

[ DB Machine Infiniband Cabling Topology Verification Tool ]
Usage: ./verify-topology 
    [-v|--verbose]
    [-r|--reuse (cached maps)]
    [-m|--mapfile]
    [-ibn|--ibnetdiscover (specify location of ibnetdiscover output)]
    [-ibh|--ibhosts (specify location of ibhosts output)]
    [-ibs|--ibswitches (specify location of ibswitches output)]
    [-t|--topology [torus | fattree | halfrack] default is fattree]

Example 4-1 Using verify-topology to Identify Cables Seated Incorrectly

The following example shows the output of the verify-topology utility. In the example, the errors show that the cables are connected incorrectly. Both cables from the server are going to the same InfiniBand Network Fabric switch. If the switch fails, then the server loses connectivity to the InfiniBand Network Fabric.

[ DB Machine Infiniband Cabling Topology Verification Tool ]

Bad link:Switch 0x21283a8371a0a0 Port 11A - Sun Port 11B
        Reason : 2.5 Gbps Speed found. Could be 10 Gbps
        Possible cause : Cable isn't fully seated in

Bad link:Switch 0x21283a89eba0a0 Port 11B - Sun Port 11A
        Reason : 2.5 Gbps Speed found. Could be 10 Gbps
        Possible cause : Cable isn't fully seated in

Is every external switch connected to every internal switch..........[SUCCESS]
Are any external switches connected to each other....................[SUCCESS]
Are any hosts connected to spine switch..............................[SUCCESS]
Check if all hosts have 2 CAs to different switches..................[ERROR]
Node trnA-db01 has 1 endpoints. (Should be 2)
Port 2 of this node is not connected to any switch

--------fattree End Point Cabling verification failed-----

Leaf switch check: cardinality and even distribution.................[ERROR]

Internal QDR Switch 0x21283a8371a0a0 has fewer than 4 compute nodes
It has only 3 links belonging to compute nodes
Check if each rack has a valid internal ring.........................[SUCCESS]
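
When the utility is run on several servers, it can help to pull out only the failing checks. The following Python sketch is an illustration and is not part of ibdiagtools. It assumes the output layout shown in Example 4-1 (check lines ending in [SUCCESS] or [ERROR], and lines starting with "Bad link:"); the function name summarize_topology_output is hypothetical.

```python
import re

# Matches check lines such as "Are any hosts connected to spine switch....[SUCCESS]".
CHECK_LINE = re.compile(r"^(?P<name>.*?)\.{2,}\s*\[(?P<status>SUCCESS|ERROR)\]\s*$")


def summarize_topology_output(text):
    """Return (failed_checks, bad_links) found in verify-topology output.

    Assumes the layout shown in Example 4-1; other releases may differ.
    """
    failed_checks = []
    bad_links = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Bad link:"):
            bad_links.append(line[len("Bad link:"):].strip())
            continue
        match = CHECK_LINE.match(line)
        if match and match.group("status") == "ERROR":
            failed_checks.append(match.group("name").strip())
    return failed_checks, bad_links


# A few lines taken from Example 4-1.
sample = """\
Bad link:Switch 0x21283a8371a0a0 Port 11A - Sun Port 11B
Are any hosts connected to spine switch..............................[SUCCESS]
Check if all hosts have 2 CAs to different switches..................[ERROR]
Leaf switch check: cardinality and even distribution.................[ERROR]
"""

failed, links = summarize_topology_output(sample)
print(failed)
print(links)
```

The sketch only filters text. The detail lines that follow a failed check, such as "Node trnA-db01 has 1 endpoints", still need to be read in the full output.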

4.8.4 Verifying InfiniBand Network Fabric Operation

Verify the InfiniBand Network Fabric network is operating properly after making modifications to the underlying hardware.

If hardware maintenance has been performed on any component in the InfiniBand Network Fabric, including replacing an InfiniBand HCA on a server, an InfiniBand Network Fabric switch, or an InfiniBand Network Fabric cable, or if the InfiniBand Network Fabric is suspected of performing below expectations, then verify that the fabric is operating properly. The following procedure describes how to verify network operation:

Note:

The following procedure can be used any time the InfiniBand Network Fabric is performing below expectations.
  1. Complete the steps in Verifying the InfiniBand Network Fabric Configuration.
  2. Run the ibdiagnet command to verify the InfiniBand Network Fabric operation.
    # ibdiagnet -c 1000

    All errors reported by this command should be investigated. This command generates a small amount of network traffic, and may be run while normal workload is running.

  3. Run the ibqueryerrors.pl command to report on switch port error counters and port configuration information.
    #  ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait,VL15Dropped

    Errors such as LinkDowned, RcvSwRelayErrors, XmtDiscards, and XmtWait are ignored when using the preceding command.

    Note:

    • The InfiniBand Network Fabric counters are cumulative, and the errors may have occurred at any time in the past. If errors are reported, then Oracle recommends clearing the InfiniBand Network Fabric counters using the ibclearcounters command. After running the command, let the system run for a few minutes under load, and then run the ibqueryerrors.pl command again.

    • Some counters, such as SymbolErrors or RcvErrors, can increment when servers are rebooted. Small values for these counters that are less than the LinkDowned counter are generally not a problem. The LinkDowned counter indicates the number of times the port has gone down, usually for valid reasons such as a reboot, and is not usually an error indicator by itself.

    • Any links reporting high, persistent errors, especially SymbolErrors, LinkRecovers, RcvErrors, or LinkIntegrityErrors, may indicate a bad or loose cable or port.

    • If there are persistent, high InfiniBand Network Fabric error counters, then investigate and correct the problem.

  4. If there is no load running on any portion of the InfiniBand Network Fabric, such as no databases running, then run the infinicheck command to perform full InfiniBand Network Fabric configuration, connectivity and performance evaluation.

    Note:

    This command evaluates full network maximum throughput and should not be run when there is workload running on any system on the InfiniBand Network Fabric.

    This command relies on a fully-configured system. The first command clears the files that were created during the last run of the infinicheck command.

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -z 
    
    # /opt/oracle.SupportTools/ibdiagtools/infinicheck

    The following is an example of the output from the command:

    Verifying User Equivalance of user=root to all hosts.
    (If it isn't setup correctly, an authentication prompt will appear to push keys
     to all the nodes)
     
     Verifying User Equivalance of user=root to all cells.
    (If it isn't setup correctly, an authentication prompt will appear to push keys
     to all the nodes)
     
     
                        ####  CONNECTIVITY TESTS  ####
                        [COMPUTE NODES -> STORAGE CELLS]
                               (30 seconds approx.)
    [SUCCESS]..............Connectivity verified
     
    [SUCCESS]....... All hosts can talk to all storage cells
     
            Verifying Subnet Masks on Hosts and Cells
    [SUCCESS] ......... Subnet Masks is same across the network
     
            Checking for bad links in the fabric
    [SUCCESS].......... No bad fabric links found
     
                        [COMPUTE NODES -> COMPUTE NODES]
                               (30 seconds approx.)
    [SUCCESS]..............Connectivity verified
     
    [SUCCESS]....... All hosts can talk to all other nodes
     
     
                        ####  PERFORMANCE TESTS  ####
     
                        [(1) Every COMPUTE NODE to its STORAGE CELL]
                              (15 seconds approx.)
    [SUCCESS]........ Network Bandwidth looks OK.
    .......... To view only performance results run ./infinicheck -d -p
     
                        [(2) Every COMPUTE NODE to another COMPUTE NODE]
                              (10 seconds approx.)
    [SUCCESS]........ Network Bandwidth looks OK.
    ...... To view only performance results run ./infinicheck -d -p
     
                        [(3) Every COMPUTE NODE to ALL STORAGE CELLS]
                      (45 seconds approx.) (looking for SymbolErrors)
     
    [SUCCESS]....... No port errors found
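
The guidance on error counters in step 3 can be expressed as a simple triage rule. The following Python sketch is only an illustration: the dictionary of counter values, the function name counters_to_investigate, and the threshold parameter are assumptions, and the source gives no numeric limit for what counts as a high value.

```python
# Counters the text says can rise harmlessly when servers reboot.
REBOOT_SENSITIVE = ("SymbolErrors", "RcvErrors")
# Other counters the text names as signs of a bad or loose cable or port.
OTHER_WATCHED = ("LinkRecovers", "LinkIntegrityErrors")


def counters_to_investigate(counters, threshold=0):
    """Return counter names worth a closer look for one port.

    counters: dict of counter name -> value, read after clearing the
    counters and letting the system run under load.
    threshold: hypothetical cutoff chosen by the caller (not from the source).
    """
    link_downed = counters.get("LinkDowned", 0)
    flagged = []
    for name in REBOOT_SENSITIVE:
        value = counters.get(name, 0)
        # Values below the LinkDowned count are generally not a problem.
        if value > threshold and value >= link_downed:
            flagged.append(name)
    for name in OTHER_WATCHED:
        if counters.get(name, 0) > threshold:
            flagged.append(name)
    return flagged


after_reboot = {"LinkDowned": 3, "SymbolErrors": 2, "RcvErrors": 1}
suspect_port = {"LinkDowned": 1, "SymbolErrors": 250, "LinkRecovers": 4}
print(counters_to_investigate(after_reboot))
print(counters_to_investigate(suspect_port))
```

A port flagged by this rule is only a candidate for inspection. Whether the errors are persistent still has to be confirmed by clearing the counters and checking again.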

4.8.5 Understanding the Network Subnet Manager Master

The Subnet Manager manages all operational characteristics of the InfiniBand Network Fabric network.

The Subnet Manager performs the following tasks:

  • Discover the network topology
  • Assign a local identifier to all ports connected to the network
  • Calculate and program switch forwarding tables
  • Monitor changes in the fabric

The InfiniBand Network Fabric network can have more than one Subnet Manager, but only one Subnet Manager is active at a time. The active Subnet Manager is the Master Subnet Manager. The other Subnet Managers are the Standby Subnet Managers. If a Master Subnet Manager is shut down or fails, then a Standby Subnet Manager automatically becomes the Master Subnet Manager.

Each Subnet Manager has a priority that can be configured. When there is more than one Subnet Manager on the InfiniBand Network Fabric network, the Subnet Manager with the highest priority becomes the Master Subnet Manager. On Oracle Exadata, the Subnet Managers on leaf switches should be configured as priority 5, and the Subnet Managers on spine switches should be configured as priority 8.

The following guidelines determine where Subnet Managers run on Oracle Exadata:

  • Only run Subnet Managers on the RDMA Network Fabric switches specified for use in your Oracle Engineered System. Running Subnet Manager on any other device is not supported.

  • In Exadata-only configurations, when the InfiniBand Network Fabric network consists of one, two, or three racks cabled together, all switches should run Subnet Manager. The Master Subnet Manager should be run on a spine switch. If the network has only leaf switches, as in Oracle Exadata Quarter Racks, then Subnet Manager Master runs on a leaf switch. When the InfiniBand Network Fabric network consists of four or more racks cabled together, then only spine switches should run Subnet Manager. The leaf switches should disable Subnet Manager.

  • For multi-rack configurations that use different types of racks, such as Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud, see My Oracle Support Doc ID 1682501.1.
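
The placement and priority rules for Exadata-only configurations can be summarized in a short Python sketch. It is an illustration under stated assumptions: the switch names and the functions subnet_manager_plan and master_candidates are hypothetical, and mixed-rack configurations are not covered.

```python
LEAF_PRIORITY = 5   # priority the text gives for leaf switches
SPINE_PRIORITY = 8  # priority the text gives for spine switches


def subnet_manager_plan(rack_count, switches):
    """Map each switch name to its Subnet Manager priority, or None if disabled.

    switches: dict of switch name -> "leaf" or "spine" (names are hypothetical).
    """
    plan = {}
    for name, role in switches.items():
        if role == "spine":
            plan[name] = SPINE_PRIORITY
        elif rack_count <= 3:
            plan[name] = LEAF_PRIORITY  # one to three racks: all switches run SM
        else:
            plan[name] = None  # four or more racks: leaf switches disable SM
    return plan


def master_candidates(plan):
    """Return the switches that hold the highest configured priority."""
    enabled = {name: prio for name, prio in plan.items() if prio is not None}
    if not enabled:
        return []
    top = max(enabled.values())
    return sorted(name for name, prio in enabled.items() if prio == top)


two_racks = subnet_manager_plan(2, {"leaf1": "leaf", "leaf2": "leaf", "spine1": "spine"})
print(two_racks)
print(master_candidates(two_racks))
```

With only leaf switches, as in a Quarter Rack, every leaf switch has the same priority, so master_candidates returns all of them.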

See Also:

Sun Datacenter InfiniBand Switch 36 Firmware Version 2.1 Documentation at http://docs.oracle.com/cd/E36265_01/index.html

4.8.6 Upgrading the Switch Firmware for InfiniBand Network Fabric

The patchmgr utility is used to upgrade and downgrade the InfiniBand Network Fabric switches. The minimum switch firmware release that can use the patchmgr utility is release 1.3.3-2. If the switch firmware is at an earlier release, then it is necessary to upgrade the firmware to release 1.3.3-2 using the instructions in My Oracle Support note 888828.1.

4.8.7 Downgrading the InfiniBand Network Fabric Switch Software

Use patchmgr to downgrade the switch firmware.

4.9 Modifying the InfiniBand Network Fabric Configuration

You can change how the InfiniBand Network Fabric network is configured by changing the IP addresses or host names, or by implementing partitioning.

4.9.1 Configuring InfiniBand Partitioning

Configuring InfiniBand partitioning is described in Implementing InfiniBand Partitioning across Oracle VM Oracle RAC Clusters on Oracle Exadata. You can use InfiniBand partitioning with or without Oracle VM.

4.9.2 Changing InfiniBand IP Addresses and Host Names

It may be necessary to change the InfiniBand network information on an existing Oracle Exadata Rack. The change may be needed to support a media server with multiple InfiniBand cards, or to keep InfiniBand traffic on a distinct InfiniBand network, such as when production, test, and QA environments share the same rack.

All InfiniBand addresses must be in the same subnet, with a minimum subnet mask of 255.255.240.0 (or /20). The subnet mask chosen should be wide enough to accommodate possible future expansion of the Oracle Exadata Rack and InfiniBand network.
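
Before assigning new addresses, it can be useful to confirm that every planned address falls in one subnet and to see how many hosts a given mask allows. The following Python sketch uses the standard ipaddress module; the function names and the sample addresses are illustrative assumptions.

```python
import ipaddress


def addresses_outside_subnet(addresses, network):
    """Return the addresses that do not belong to the given network."""
    net = ipaddress.ip_network(network)
    return [addr for addr in addresses if ipaddress.ip_address(addr) not in net]


def usable_host_count(network):
    """Number of host addresses in the network (excludes network and broadcast)."""
    return ipaddress.ip_network(network).num_addresses - 2


planned = ["192.168.10.1", "192.168.10.8", "192.168.20.8"]
print(addresses_outside_subnet(planned, "192.168.8.0/255.255.248.0"))
print(usable_host_count("192.168.0.0/255.255.240.0"))
```

In this example, 192.168.20.8 is reported because the subnet 192.168.8.0 with mask 255.255.248.0 ends at 192.168.15.255.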

Note:

It is not recommended to use SDP over InfiniBand on Exadata Database Machine.

4.9.3 Changing InfiniBand Network Information

This procedure describes how to change the InfiniBand network information.

The procedure described in this section is based on the following assumptions:

  • All changes should be done as the ilom-admin user using the Integrated Lights Out Manager (ILOM) interface.

  • Channel bonding is used for the client access network, such that the NET1 and NET2 interfaces are bonded to create BONDETH0. If channel bonding is not used, then replace BONDETH0 with NET1 in the procedure.

  • On Oracle Exadata X4-2 and later hardware, as of Oracle Exadata System Software release 11.2.3.3.0, the name used for InfiniBand bonding changed from BONDIB0 to IB0 and IB1. These interfaces are changed the same way as the ifcfg-bondib0 interface.

  • As of Oracle Exadata System Software release 11.2.2.1.0, the names used for bonding changed. The names are BONDIB0 for the InfiniBand bonding and BONDETH0 for Ethernet bonding. In earlier releases, the names were BOND0 and BOND1, respectively.

  • The procedure uses the dcli utility and the root user. This significantly reduces the overall time to complete the procedure by running the commands in parallel on the database servers.

  • The dcli utility requires SSH user-equivalence. If SSH user-equivalence is not configured, then some commands must be run explicitly on each database server.

  • The database group file, dbs_group, must exist and be located in the /root directory.

  • Ensure recent backups of the Oracle Cluster Registry (OCR) exist before changing the InfiniBand network information. OCR backups are located in the $Grid_home/cdata/cluster-name directory, where Grid_home represents the location of your Oracle Grid Infrastructure software installation.

  • Starting with Oracle Grid Infrastructure 11g release 2 (11.2), the private network configuration is stored in the Grid Plug and Play (GPNP) profile as well as the OCR. If the GPNP definition is not correct, then Oracle Clusterware CRS does not start. Take a backup of the GPNP profile on all nodes before changing the InfiniBand network information using the following commands:

    $ cd $Grid_home/gpnp/hostname/profiles/peer/
    $ cp -p profile.xml profile.xml.bk
    
  1. Determine if the CLUSTER_INTERCONNECTS parameter is used in the Oracle Database and Oracle ASM instances.
    SQL> SELECT inst_id, name, value FROM gv$parameter
         WHERE name = 'cluster_interconnects';

    If the CLUSTER_INTERCONNECTS parameter is set in OCR, then no value is returned. If the CLUSTER_INTERCONNECTS parameter is defined in the server parameter file (SPFILE), then the query returns an IP address for each instance, and each address must be changed to the new IP address.

    The following is an example of the commands to change the IP addresses for the Oracle ASM instances. In the example, the IP address 192.168.10.1 is the new IP address assigned to BONDIB0 on the server where the +ASM1 instance runs, 192.168.10.2 is the IP address for BONDIB0 on the server where the +ASM2 instance runs, and so on.

    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.1' SCOPE=SPFILE SID='+ASM1';
    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.2' SCOPE=SPFILE SID='+ASM2';
    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.3' SCOPE=SPFILE SID='+ASM3';
    ...

    Use a similar command to change the IP addresses for each Oracle Database instance that was returned.
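
On clusters with many instances, the statements can be generated rather than typed. The following Python sketch is an illustration only: it assumes instances are numbered consecutively from 1 with a common SID prefix, and the function name cluster_interconnect_statements is hypothetical. Review the generated statements before running them.

```python
def cluster_interconnect_statements(new_ips, sid_prefix="+ASM"):
    """Build one ALTER SYSTEM statement per instance.

    new_ips: new InfiniBand addresses ordered by instance number (1, 2, ...).
    sid_prefix: SID prefix; "+ASM" matches the Oracle ASM example in the text.
    """
    statements = []
    for number, ip in enumerate(new_ips, start=1):
        statements.append(
            "ALTER SYSTEM SET CLUSTER_INTERCONNECTS='{ip}' "
            "SCOPE=SPFILE SID='{sid}{n}';".format(ip=ip, sid=sid_prefix, n=number)
        )
    return statements


for statement in cluster_interconnect_statements(
    ["192.168.10.1", "192.168.10.2", "192.168.10.3"]
):
    print(statement)
```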

  2. Verify the assignment of the new InfiniBand network information for all servers.
    Verification should include the InfiniBand IP addresses, netmask, broadcast, and network IP information.
  3. Shut down all cluster-managed services on each database server as the oracle user.
    $ srvctl stop home -o db_home -s state_filename -n node_name
    

    In the preceding command, db_home is the full directory name for the Oracle Database home directory, state_filename is the path name where you want the state file to be written, and node_name is the name of the database server. The following is an example of the command:

    $ srvctl stop home -o /u01/app/oracle/product/11.2.0.3/dbhome_1 -s \
    /tmp/dm02db01_dbhome -n dm02db01
    

    In the preceding example, /u01/app/oracle/product/11.2.0.3/dbhome_1 is the Oracle Database home directory, /tmp/dm02db01_dbhome is the state file name, and dm02db01 is the name of the database server.

  4. Modify the cluster interconnect interface to use the BONDIB0 interface on the first database server.

    Note:

    At this point, only Oracle Clusterware, Oracle Clusterware CRS, and Oracle ASM instances are started.
    1. Log in as the oracle user.
    2. Set $ORACLE_HOME to the Oracle Grid Infrastructure home.
    3. Set the base for the ORACLE_SID environment variable.
      The ORACLE_HOME environment variable must be set to the Oracle Grid Infrastructure home.
      $ ORACLE_SID=+ASM1
      
    4. List the available cluster interfaces.
      $ oifcfg iflist
      

      The following is an example of the output:

      bondeth0 10.128.174.160
      bondeth1 10.128.176.0
      eth0 10.128.174.128
      ib0 192.168.160.0
      ib0 169.254.0.0
      ib1 192.168.160.0
      ib1 169.254.128.0
      
    5. List the currently-assigned cluster interfaces.
      $ oifcfg getif
      

      The following is an example of the output:

      bondeth0 10.204.76.0 global public
      ib0 192.168.16.0 global cluster_interconnect,asm
      ib1 192.168.16.0 global cluster_interconnect,asm
      
    6. Assign the new subnet to the ib0 and ib1 interfaces as global cluster interconnect interfaces.
      $ oifcfg setif -global ib0/192.168.8.0:cluster_interconnect
      $ oifcfg setif -global ib1/192.168.8.0:cluster_interconnect
    7. List the current interfaces.
      $ oifcfg getif
      

      The following is an example of the output:

      bondeth0 10.128.174.160 global public
      ib0 192.168.8.0 global cluster_interconnect
      ib1 192.168.8.0 global cluster_interconnect
      

      The old private interface is removed at a later time.

  5. Shut down Oracle Clusterware and Oracle Clusterware CRS on each database server.
    1. Log in as the root user.
    2. Shut down Oracle Clusterware CRS on each database server using the following command:
      # Grid_home/grid/bin/crsctl stop crs -f
      
    3. Disable automatic Oracle Clusterware CRS restart on each database server.
      # Grid_home/grid/bin/crsctl disable crs
      
  6. Change the InfiniBand IP addresses on each Oracle Exadata Storage Server.
    1. Log in as the root user.
    2. Shut down the cell services.
      # cellcli -e alter cell shutdown services all
        Stopping the RS, CELLSRV, and MS services...  The SHUTDOWN of services was successful.
    3. Run the ipconf command.

      The following is an example of the prompts and responses for the ipconf command. Changes are applied after the prompt for basic Integrated Lights Out Manager (ILOM) settings.

      # ipconf
      
      Logging started to /var/log/cellos/ipconf.log
      Interface ib0 is Linked.  hca: mlx4_0
      Interface ib1 is Linked.  hca: mlx4_0
      Interface eth0 is Linked.  driver/mac: ixgbe/00:00:00:00:cd:01
      Interface eth1 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:02
      Interface eth2 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:03
      Interface eth3 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:04
       
      Network interfaces
      Name     State      IP address      Netmask         Gateway         Net type     Hostname
      ib0      Linked
      ib1      Linked
      eth0     Linked
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      Warning. Some network interface(s) are disconnected. Check cables and switches and retry
      Do you want to retry (y/n) [y]: n
       
      The current nameserver(s): 192.0.2.10 192.0.2.12 192.0.2.13
      Do you want to change it (y/n) [n]:
      The current timezone: America/Los_Angeles
      Do you want to change it (y/n) [n]:
      The current NTP server(s): 192.0.2.06 192.0.2.12 192.0.2.13
      Do you want to change it (y/n) [n]:
       
      Network interfaces
      Name     State           IP address    Netmask        Gateway       Net type            Hostname
      eth0     Linked       192.0.2.151  255.255.252.0 192.0.2.15    Management   myg.example.com
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      bondib0  ib0,ib1      192.168.13.101 255.255.252.0  Private             myg-priv.example.com
      Select interface name to configure or press Enter to continue: bondib0
      Selected interface. bondib0
      IP address or none [192.168.13.101]: 192.168.10.3
      Netmask [255.255.252.0]:255.255.248.0
      Fully qualified hostname or none [myg-priv.example.com]:
      Continue configuring or re-configuring interfaces? (y/n) [y]: n
       
      Select canonical hostname from the list below
      1: myg.example.com
      2: myg-priv.example.com 
      Canonical fully qualified domain name [1]:
       
      Select default gateway interface from the list below
      1: eth0
      Default gateway interface [1]:
       
      Canonical hostname: myg.example.com
      Nameservers: 192.0.2.10 192.0.2.12 192.0.2.13
      Timezone: America/Los_Angeles
      NTP servers: 192.0.2.06 192.0.2.12 192.0.2.13
      Default gateway device: eth0
      Network interfaces
      Name     State      IP address      Netmask         Gateway         Net type     Hostname
      eth0     Linked     192.0.2.151   255.255.252.0 192.0.2.15     Management   myg.example.com
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      bondib0  ib0,ib1    192.168.10.3    255.255.248.0                   Private      myg-priv.example.com
      Is this correct (y/n) [y]:
       
      Do you want to configure basic ILOM settings (y/n) [y]: n
      
      Starting the RS services...
      Getting the state of RS services...  running
       
      Starting MS services...
      The STARTUP of MS services was successful.
      A restart of all services is required to put new network configuration into
      effect. MS-CELLSRV communication may be hampered until restart.
      Cell myg successfully altered
       
      Stopping the RS, CELLSRV, and MS services...
      The SHUTDOWN of services was successful.
      ipaddress1=192.168.10.3/21
      
    4. Restart the Oracle Exadata Storage Server.
      # shutdown -r now
  7. Restart the cell services.
    # cellcli -e alter cell restart services all
    
  8. Verify the newly-assigned InfiniBand address on Oracle Exadata Storage Server.
    # cellcli -e list cell detail | grep ipaddress1
    

    The following is an example of the output:

    ipaddress1: 192.168.10.3/21
    
  9. Change the InfiniBand IP addresses on each database server.
    1. Log in as the root user.
    2. Change to the /etc/sysconfig/network-scripts directory.
    3. Copy the ifcfg-bondib0 file.

      The copied file name must not start with ifcfg.

      # cp ifcfg-bondib0 orig_ifcfg-bondib0
      
    4. Edit the ifcfg-bondib0 file to update the IPADDR, NETMASK, NETWORK and BROADCAST fields.

      Example of original ifcfg-bondib0 file:

      #### DO NOT REMOVE THESE LINES ####
      #### %GENERATED BY CELL% ####
      DEVICE=bondib0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=yes
      IPADDR=192.168.20.8
      NETMASK=255.255.248.0
      NETWORK=192.168.16.0
      BROADCAST=192.168.23.255
      BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
      IPV6INIT=no
      MTU=65520
      

      Example of updated ifcfg-bondib0 file:

      #### DO NOT REMOVE THESE LINES ####
      #### %GENERATED BY CELL% ####
      DEVICE=bondib0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=yes
      IPADDR=192.168.10.8
      NETMASK=255.255.248.0
      NETWORK=192.168.8.0
      BROADCAST=192.168.15.255
      BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
      IPV6INIT=no
      MTU=65520
      

      Note:

      The MTU size for the InfiniBand interfaces on the database servers should be set as follows:

      • For Oracle Exadata System Software release 11.2.3.3 and later, set the MTU size to 7000.

      • For Oracle Exadata System Software releases earlier than release 11.2.3.3, set the MTU size to 65520 to ensure a high transfer rate to external devices using TCP/IP over InfiniBand such as media servers or NFS servers.

    5. Restart the database server.
      # shutdown -r now
      
    6. Verify the InfiniBand IP address information.
      # ifconfig -a
      

      The following is an example of the BONDIB0 information. It shows the updated InfiniBand network information:

      inet addr:192.168.10.8 Bcast:192.168.15.255 Mask:255.255.248.0
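
The NETWORK and BROADCAST values in the ifcfg-bondib0 file follow from the IP address and netmask, so they can be computed instead of worked out by hand. The following Python sketch uses the standard ipaddress module; the function name ifcfg_network_fields is hypothetical.

```python
import ipaddress


def ifcfg_network_fields(ipaddr, netmask):
    """Derive the four address fields of an ifcfg file from IP and netmask."""
    iface = ipaddress.ip_interface("{}/{}".format(ipaddr, netmask))
    net = iface.network
    return {
        "IPADDR": str(iface.ip),
        "NETMASK": str(net.netmask),
        "NETWORK": str(net.network_address),
        "BROADCAST": str(net.broadcast_address),
    }


# Values from the updated ifcfg-bondib0 example.
for key, value in ifcfg_network_fields("192.168.10.8", "255.255.248.0").items():
    print("{}={}".format(key, value))
```

For the address and netmask in the updated example file, the sketch yields NETWORK=192.168.8.0 and BROADCAST=192.168.15.255, which matches the example.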
      
  10. Update the cellinit.ora and cellip.ora files on each database server.

    Note:

    Do not edit the cellinit.ora or cellip.ora files when the database or Oracle ASM instances are running. To make changes to the files, perform a procedure similar to the following:

      1. Create a copy of the file.

        cp cellinit.ora cellinit.new
      2. Edit the cellinit.new file with a text editor.

      3. Replace the old cellinit.ora file with the updated cellinit.new file.

        mv cellinit.new cellinit.ora
    1. Log in as the root user.
    2. Change to the /etc/oracle/cell/network-config directory.
    3. Make a backup copy of the cellip.ora file.
      # cp cellip.ora orig_cellip.ora
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used. The following is an example of the dcli command:

      # dcli -l root -g /root/dbs_group "cp cellip.ora orig_cellip.ora"
    4. Make a backup copy of the cellinit.ora file.

      The following is an example of the command:

      # cp cellinit.ora orig_cellinit.ora
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used. The following is an example of the dcli command:

      # dcli -l root -g /root/dbs_group "cp cellinit.ora \
      orig_cellinit.ora"
    5. Change the InfiniBand IP addresses in the cellip.ora file.

      Example of original file:

      cell="192.168.20.1"
      cell="192.168.20.2"
      cell="192.168.20.3"
      cell="192.168.20.4"
      cell="192.168.20.5"
      cell="192.168.20.6"
      cell="192.168.20.7"
      

      Example of updated file:

      cell="192.168.10.1"
      cell="192.168.10.2"
      cell="192.168.10.3"
      cell="192.168.10.4"
      cell="192.168.10.5"
      cell="192.168.10.6"
      cell="192.168.10.7"
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used to copy the updated file from the first database server to the other database servers. The following is an example of using the dcli command:

      # dcli -l root -g /root/dbs_group -f \
      /etc/oracle/cell/network-config/cellip.ora 
      
      # dcli -l root -g /root/dbs_group "mv /root/cellip.ora \
      /etc/oracle/cell/network-config/"
    6. Change the InfiniBand IP addresses in the cellinit.ora file.

      The file is updated with the subnet ID and its subnet mask.

      Example of original file:

      ipaddress="192.168.20.8/21"
      

      Example of updated file:

      ipaddress="192.168.10.8/21"
      

      Update the cellinit.ora file on each database server. The contents of the file are specific to the database server. The dcli utility cannot be used for this step.

    7. Run the ALTER DBSERVER command on each database server to update the /etc/oracle/cell/network-config/cellinit.ora file.
      # dbmcli -e alter dbserver interconnect1 = "ib0"
      # dbmcli -e alter dbserver interconnect2 = "ib1"
      # dbmcli -e alter dbserver interconnect3 = "ib2"
      # dbmcli -e alter dbserver interconnect4 = "ib3"
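
The lines written to cellip.ora and cellinit.ora have a fixed shape, so they can be produced from the new addresses. The following Python sketch is an illustration: the function names are hypothetical, and it only formats text in the style of the example files shown above. It does not write or replace any file.

```python
import ipaddress


def cellip_lines(cell_ips):
    """Return one cell="..." line per storage server address."""
    # ip_address() validates each value before it is formatted.
    return ['cell="{}"'.format(str(ipaddress.ip_address(ip))) for ip in cell_ips]


def cellinit_line(ipaddr, netmask):
    """Return the ipaddress="..." line with the netmask as a prefix length."""
    iface = ipaddress.ip_interface("{}/{}".format(ipaddr, netmask))
    return 'ipaddress="{}"'.format(iface.with_prefixlen)


print("\n".join(cellip_lines(["192.168.10.1", "192.168.10.2", "192.168.10.3"])))
print(cellinit_line("192.168.10.8", "255.255.248.0"))
```

The netmask 255.255.248.0 corresponds to the /21 suffix used in the cellinit.ora example.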
      
  11. Update the /etc/hosts file on each database server and Oracle Exadata Storage Server to use the new InfiniBand IP addresses.
    1. Log in as the root user.
    2. Make a backup copy of the /etc/hosts file.
      # cp /etc/hosts /etc/orig_hosts
      
    3. Change the InfiniBand IP addresses for the database servers and Oracle Exadata Storage Servers in the file.
  12. Start Oracle Clusterware as the root user on each server.
    # Grid_home/grid/bin/crsctl start crs
    
  13. Verify the cluster interconnect is using the RDS protocol on each database server by examining the Oracle ASM alert.log.
    The log is in the directory /u01/app/oracle/diag/asm/+asm/+ASM1/trace. An entry similar to the following should be listed for the most recent Oracle ASM restart:
    CELL interconnect IPC version: Oracle RDS/IP (generic)
    

    For Oracle Exadata System Software releases 11.2.0.2 and later, the following command can be used to verify cluster interconnect. The command is run as the oracle user on each database server.

    $ORACLE_HOME/bin/skgxpinfo
    

    The output from the command should be rds.

    If the instance is not using the RDS protocol over InfiniBand, then relink the Oracle software using the following steps:

    Note:

    Do not use the relink all command to relink the Oracle software.
    1. As the oracle user, shut down any processes using Oracle software.
    2. If you are relinking the Oracle Grid Infrastructure home, then as the root user, run one of the following commands. Do not perform this step if you are not relinking the Oracle Grid Infrastructure home.
      • For Oracle Grid Infrastructure release 12.2.0.1 or higher:

        # Grid_home/crs/install/rootcrs.sh -unlock
      • For Oracle Grid Infrastructure release 12.1.0.1 or 12.1.0.2:

        # Grid_home/crs/install/rootcrs.pl -unlock
        
    3. As the oracle user, change to the $ORACLE_HOME/rdbms/lib directory.
    4. As the oracle user, run the following command:
      $ make -f ins_rdbms.mk ipc_rds ioracle
      
    5. If you are relinking the Oracle Grid Infrastructure home, then as the root user, run one of the following commands. Do not perform this step if you are not relinking the Oracle Grid Infrastructure home.
      • For Oracle Grid Infrastructure release 12.2.0.1 or higher:

        # Grid_home/crs/install/rootcrs.sh -lock
        # Grid_home/bin/crsctl start crs
      • For Oracle Grid Infrastructure release 12.1.0.1 or 12.1.0.2:

        # Grid_home/crs/install/rootcrs.pl -patch
        
  14. Start all cluster-managed services using the SRVCTL utility.
    1. Log in as the oracle user.
    2. Start the database using the following command, where Oracle_home is your Oracle home directory:
      $ srvctl start home -o Oracle_home \
      -s /tmp/dm02db01_dbhome -n dm02db01
      
    3. Verify the database instances are running.
      $ srvctl status database -d dbm
      
  15. Verify the Oracle ASM and database instances are using the new network settings.
    1. Log in to an Oracle ASM and database instance using SQL*Plus.
    2. Query the cluster interconnect information.
      SQL> SELECT inst_id, name, value FROM gv$parameter
           WHERE name = 'cluster_interconnects';
  16. Delete the old private network.
    $ oifcfg delif -global bondib0/192.168.16.0
    
  17. Verify that the old interface is not present.
    $ oifcfg getif
    bondeth0  10.204.76.0  global public
    bondib0   192.168.8.0  global cluster_interconnect
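
The check in this step can also be done by parsing the command output. The following Python sketch assumes the four-column layout of the oifcfg getif examples in this procedure (interface, subnet, scope, roles); the function names are hypothetical.

```python
def parse_getif(output):
    """Parse 'oifcfg getif' output into (interface, subnet, scope, roles) tuples."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        name, subnet, scope = fields[0], fields[1], fields[2]
        roles = fields[3].split(",")
        entries.append((name, subnet, scope, roles))
    return entries


def subnet_still_registered(output, old_subnet):
    """Return True if any interface is still registered with old_subnet."""
    return any(subnet == old_subnet for _, subnet, _, _ in parse_getif(output))


after_delete = """\
bondeth0  10.204.76.0  global public
bondib0   192.168.8.0  global cluster_interconnect
"""
print(subnet_still_registered(after_delete, "192.168.16.0"))
```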
    
  18. Enable Oracle Clusterware CRS automatic restart on each database server.
    1. Log in as the root user.
    2. Enable Oracle Clusterware CRS.
      # Grid_home/grid/bin/crsctl enable crs
      

      Note:

      To use the dcli utility to enable Oracle Clusterware CRS, run the following command:

      # dcli -l root -g dbs_group "Grid_home/grid/bin/crsctl \
      enable crs"
  19. Perform a full restart of Oracle Clusterware on all nodes.
  20. Perform a health check of Oracle Exadata Rack using the steps described in My Oracle Support Doc ID 1070954.1.

    Note:

    The Oracle EXAchk utility collects data for key software, hardware, and firmware releases, and configuration best practices for Oracle Exadata Rack.

    Oracle recommends you periodically review the current data for key components of Oracle Exadata Rack, and compare them to the supported release levels, and recommended best practices.

    Oracle EXAchk is not a database, network, or SQL performance analysis tool. It is not a continuous monitoring utility, and does not duplicate other monitoring or alerting tools, such as ILOM, or Oracle Enterprise Manager Cloud Control.

  21. Verify the private network configuration using the clusterware verification utility, cluvfy.

4.10 Configuring Network Routing on Database Servers

The tasks for network routing are for boot-time routing or real-time routing.

4.10.1 About Network Routing on Database Servers

There are three logical network interfaces configured on the database servers.

The interface names are:

  • Management network: eth0
  • Client access network: bond1 or bondeth0
  • RDMA Network Fabric network: bond0, bondib0, or ib0 and ib1, or re0 and re1

Note:

The tasks in this section are for Oracle Exadata Database Servers that were configured prior to Oracle Exadata System Software release 11.2.3.2.1.

Starting with Oracle Exadata System Software release 11.2.2.3.0, connections that come in on the management network have their responses sent out on the management network interface, and connections on the client access network have their responses sent out on the client access network interface.

For Oracle Exadata System Software releases earlier than release 11.2.2.3.0, outbound traffic that is not destined for an IP address on the management or private InfiniBand network is sent out on the client access network by default. The tasks in this section modify the routing so that traffic that comes in on the management network has its responses sent out on the management network. Similarly, traffic that comes in on the client network has its responses sent out on the client network.

The tasks for network routing are for boot-time routing or real-time routing. The following apply to both types of routing:

  • These tasks are for database servers running a release earlier than Oracle Exadata System Software release 11.2.2.3.0.

  • The following sample IP addresses, netmasks, and gateways are used in the tasks:

    • Management network has IP address 10.149.49.12, netmask 255.255.252.0 (network 10.149.48.0/22), and gateway 10.149.48.1.

    • Client access network has IP address 10.204.78.15, netmask 255.255.255.0 (network 10.204.78.0/24), and gateway 10.204.78.1.

Note:

If the database server has additional networks configured, then files should be set up for the additional networks.
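The network values in the sample addresses follow from the IP address and netmask. The following is a minimal sketch of that arithmetic in POSIX shell. The function name network_cidr is hypothetical and is not part of any Oracle Exadata utility. The sketch assumes well-formed dotted-quad input and a contiguous netmask, and it performs no validation.

```shell
# Hypothetical helper: derive "network/prefix" from an IPv4 address and netmask.
# The body runs in a subshell, so the IFS change does not leak to the caller.
network_cidr() (
    # Split both dotted quads into eight positional parameters.
    IFS=.
    set -- $1 $2
    # Count the set bits in the four netmask octets to get the prefix length.
    bits=0
    for octet in "$5" "$6" "$7" "$8"; do
        while [ "$octet" -gt 0 ]; do
            bits=$(( bits + (octet & 1) ))
            octet=$(( octet >> 1 ))
        done
    done
    # AND each address octet with the matching netmask octet.
    printf '%d.%d.%d.%d/%d\n' \
        "$(( $1 & $5 ))" "$(( $2 & $6 ))" "$(( $3 & $7 ))" "$(( $4 & $8 ))" "$bits"
)

network_cidr 10.149.49.12 255.255.252.0
network_cidr 10.204.78.15 255.255.255.0
```

With the sample management and client access values, the sketch prints 10.149.48.0/22 and 10.204.78.0/24, which are the networks used in the routing tasks.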

4.10.2 Task 1: Configure for Boot-Time Routing

To configure network routing for boot-time routing, rule and routing files must be created for each database server. The rule and routing files must be located in the /etc/sysconfig/network-scripts directory on each database server. For each Ethernet interface on the management network that has a configured IP address, the database server must have route-ethn and rule-ethn files. For each bonded Ethernet interface, the database server must have route-bondethn and rule-bondethn files. The following are examples of the content in the files:

File: /etc/sysconfig/network-scripts/rule-eth0
Content:

from 10.149.49.12 table 220
to 10.149.49.12 table 220

File: /etc/sysconfig/network-scripts/route-eth0
Content:

10.149.48.0/22 dev eth0 table 220
default via 10.149.48.1 dev eth0 table 220

File: /etc/sysconfig/network-scripts/rule-bondeth0
Content:

from 10.204.78.0/24 table 210
to 10.204.78.0/24 table 210

File: /etc/sysconfig/network-scripts/route-bondeth0
Content:

10.204.78.0/24 dev bondeth0 table 210
default via 10.204.78.1 dev bondeth0 table 210
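If a database server has several interfaces, the file content can be generated rather than typed by hand. The following POSIX shell sketch prints the content for one rule file and one route file. The function names rule_file_content and route_file_content are hypothetical, and the sketch only writes to standard output. Redirecting the output into /etc/sysconfig/network-scripts is left to the administrator.

```shell
# Hypothetical helper: print rule file content.
# Arguments: source selector (address or network), routing table number.
rule_file_content() {
    printf 'from %s table %s\n' "$1" "$2"
    printf 'to %s table %s\n' "$1" "$2"
}

# Hypothetical helper: print route file content.
# Arguments: network, gateway, interface, routing table number.
route_file_content() {
    printf '%s dev %s table %s\n' "$1" "$3" "$4"
    printf 'default via %s dev %s table %s\n' "$2" "$3" "$4"
}

# Sample values from this section for the management interface.
rule_file_content 10.149.49.12 220
route_file_content 10.149.48.0/22 10.149.48.1 eth0 220
```

The same two functions cover the bonded client interface when called with 10.204.78.0/24, 10.204.78.1, bondeth0, and table 210.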

4.10.3 Task 2: Configure for Real-Time Routing

To configure the rules on a running system, use the /sbin/ip command to create the same configuration that is applied at startup. The following commands result in the same configuration as the boot-time files:

/sbin/ip rule add from 10.149.49.12 table 220
/sbin/ip rule add to 10.149.49.12 table 220
/sbin/ip route add 10.149.48.0/22 dev eth0 table 220
/sbin/ip route add default via 10.149.48.1 dev eth0 table 220

/sbin/ip rule add from 10.204.78.0/24 table 210
/sbin/ip rule add to 10.204.78.0/24 table 210
/sbin/ip route add 10.204.78.0/24 dev bondeth0 table 210
/sbin/ip route add default via 10.204.78.1 dev bondeth0 table 210

Oracle recommends restarting the database server after running the commands to validate that the boot-time configuration is correct.
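Each line in a boot-time file corresponds to one /sbin/ip command, so the real-time commands can be derived from the files. The following POSIX shell sketch shows that mapping as a dry run. The function name boot_file_to_commands is hypothetical. It reads file content on standard input and prints commands without running them.

```shell
# Hypothetical helper: turn rule or route file lines into /sbin/ip commands.
# Argument: "rule" or "route". Input: file content on standard input.
boot_file_to_commands() (
    kind=$1
    while IFS= read -r line; do
        # Skip blank lines.
        [ -n "$line" ] || continue
        printf '/sbin/ip %s add %s\n' "$kind" "$line"
    done
)

# Example with the rule-eth0 content from Task 1.
printf 'from 10.149.49.12 table 220\nto 10.149.49.12 table 220\n' |
    boot_file_to_commands rule
```

On a database server, the input could instead come from a file, for example boot_file_to_commands route < /etc/sysconfig/network-scripts/route-eth0. Review the printed commands before running them.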

4.10.4 Task 3: Verify Network Routing Rules and Routes

Use the following command to verify the network routing rules. The command output shows all the rules on the system.

# /sbin/ip rule list
0:      from all lookup 255 
32762:  from all to 10.204.78.0/24 lookup 210 
32763:  from 10.204.78.0/24 lookup 210 
32764:  from all to 10.149.49.12 lookup 220 
32765:  from 10.149.49.12 lookup 220 
32766:  from all lookup main 
32767:  from all lookup default 

The default routing table is not changed because two new routing tables are created during the preceding tasks. The new routing tables are used when the rules dictate their use. The following commands show how to check the default and new routing tables:

  • To check the default routing table, run the command without a table name. The following is an example of the command and output.

    # /sbin/ip route list
    10.204.78.0/24 dev bondeth0  proto kernel  scope link  src 10.204.78.15
    192.168.10.0/24 dev bondib0  proto kernel  scope link  src 192.168.10.8 
    10.149.48.0/22 dev eth0  proto kernel  scope link  src 10.149.49.12 
    default via 10.204.78.1 dev bondeth0
    
  • To check the supplemental tables, include the table name in the command. The following is an example of the commands and output.

    # /sbin/ip route list table 220
    10.149.48.0/22 dev eth0  scope link 
    default via 10.149.48.1 dev eth0 
    # /sbin/ip route list table 210
    10.204.78.0/24 dev bondeth0  scope link 
    default via 10.204.78.1 dev bondeth0
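The rule check can also be scripted. The following POSIX shell sketch reads the output of /sbin/ip rule list on standard input and prints every expected table number that no rule refers to. The function name missing_tables is hypothetical, and the sketch assumes the output format shown in the preceding example.

```shell
# Hypothetical helper: print each table number given as an argument that does
# not appear as a "lookup" target in the rule list read from standard input.
missing_tables() (
    rules=$(cat)
    for table in "$@"; do
        # Match the table number at the end of a line, allowing trailing spaces.
        printf '%s\n' "$rules" |
            grep -Eq "lookup ${table}[[:space:]]*\$" || printf '%s\n' "$table"
    done
)

# On a database server (assumption: run as the root user):
#   /sbin/ip rule list | missing_tables 210 220
```

No output means that every listed table is referenced by at least one rule.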

4.10.5 Removing Network Routing Configuration for Troubleshooting

The network routing configuration can be removed to configure or troubleshoot Oracle Exadata Database Machine. Use the following commands to remove the rules and routes:

/sbin/ip route del default via 10.149.48.1 dev eth0 table 220
/sbin/ip route del 10.149.48.0/22 dev eth0 table 220
/sbin/ip rule del to 10.149.49.12 table 220
/sbin/ip rule del from 10.149.49.12 table 220

/sbin/ip route del default via 10.204.78.1 dev bondeth0 table 210
/sbin/ip route del 10.204.78.0/24 dev bondeth0 table 210
/sbin/ip rule del to 10.204.78.0/24 table 210
/sbin/ip rule del from 10.204.78.0/24 table 210
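The removal commands are the add commands from Task 2 with add replaced by del, applied in reverse order. The following POSIX shell sketch derives them mechanically. The function name undo_commands is hypothetical. It reads add commands on standard input and only prints the matching del commands.

```shell
# Hypothetical helper: convert "/sbin/ip ... add ..." lines into the matching
# "del" lines and print them in reverse order.
undo_commands() {
    sed 's/ add / del /' |
        awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
}

# Example with the management network commands from Task 2.
undo_commands <<'EOF'
/sbin/ip rule add from 10.149.49.12 table 220
/sbin/ip rule add to 10.149.49.12 table 220
/sbin/ip route add 10.149.48.0/22 dev eth0 table 220
/sbin/ip route add default via 10.149.48.1 dev eth0 table 220
EOF
```

For this input, the sketch prints the four del commands for table 220 in the order shown above.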

4.10.6 Returning to Default Routing

To return to the default network routing, delete the supplemental files from the /etc/sysconfig/network-scripts directory, and then restart the server.

The following is an example of the commands to remove the files, and restart the server:

/bin/rm -f /etc/sysconfig/network-scripts/rule-eth0
/bin/rm -f /etc/sysconfig/network-scripts/route-eth0
/bin/rm -f /etc/sysconfig/network-scripts/rule-bondeth0
/bin/rm -f /etc/sysconfig/network-scripts/route-bondeth0
shutdown -r now

4.11 Changing the DNS Servers

The configuration settings for the Domain Name System (DNS) servers can be changed after initial setup.

All servers and switches in Oracle Exadata should reference the same DNS servers. All domains that Oracle Exadata references should be resolvable through each individual DNS server.

The following topics contain the tasks and procedures for setting the Oracle Exadata servers and switches to the same DNS servers. Oracle recommends changing the servers one at a time.

4.11.1 Change the DNS Server Address on the Database Server

This procedure describes how to change the DNS server address on the database servers.

  1. If you are using Oracle Exadata System Software 20.1.0 or later, use ipconf with the -update and -dns options to modify the DNS settings.
    1. Log in to the database server as the root user.
    2. Check to make sure there are no configuration issues with the new settings.

      Use the following command, where IP_addr_list is a comma-separated list of IP addresses for the DNS servers. If you also want to check the DNS servers for ILOM, then include the -ilom-dns parameter and replace ILOM_DNS_list with a comma-separated list of up to 3 IP addresses for the DNS servers. Including the -dry parameter means the settings are checked, but not applied.

      # ipconf -update -dns IP_addr_list [-ilom-dns ILOM_DNS_list] -dry
    3. Update the DNS settings using the following ipconf command:

      In the following command, IP_addr_list is a comma-separated list of IP addresses for the DNS servers. If you also want to change the DNS servers for ILOM, then include the -ilom-dns parameter and replace ILOM_DNS_list with a comma-separated list of up to 3 IP addresses for the DNS servers. Include the -force parameter to force the update, bypassing all checks.

      # ipconf -update -dns IP_addr_list [-ilom-dns ILOM_DNS_list] [-force]

      Here is an example of the command and its output.

      [root@dbm03adm02]# ipconf -update -dns 10.31.138.25,10.231.225.65
      [Info]: ipconf command line: ipconf -update -dns 10.31.138.25,10.231.225.65
      Logging started to /var/log/cellos/ipconf.log
      [Info]: Updating dns/ntp
      [Info]: Backup existing cell configuration file /opt/oracle.cellos/cell.conf 
      to /var/log/exadatatmp/cell.conf_2020_01_13-17_59_44
      [Info]: Custom changes have been detected in /etc/resolv.conf
      [Info]: Original file /etc/resolv.conf will be saved in /etc/resolv.conf.backupbyExadata
      [Done]: Update cell configuration file /opt/oracle.cellos/cell.conf OK
      
    4. Repeat these steps for each database server.
  2. If you are using Oracle Exadata System Software 19.3.x or earlier, then use the following steps to modify the DNS servers.
    1. Log in to the database server as the root user.
    2. Edit the /etc/resolv.conf file.
      Set the DNS server and domain name using an editor such as vi. There should be a nameserver line for each DNS server.
      search        example.com
      nameserver    10.7.7.3
      
    3. Set the DNS server in the server ILOM.
      ipmitool sunoem cli 'set /SP/clients/dns nameserver=dns_ip'
      

      In the preceding command, dns_ip is the IP address of the DNS server. If there is more than one DNS server, then enter a comma-separated list such as set /SP/clients/dns nameserver=dns_ip1,dns_ip2,dns_ip3.

    4. Repeat these steps for each database server.
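For the manual procedure, the /etc/resolv.conf content has one search line followed by one nameserver line for each DNS server. The following POSIX shell sketch prints that content for review. The function name resolv_conf_content is hypothetical, and the sketch does not modify any file.

```shell
# Hypothetical helper: print resolv.conf content.
# Arguments: search domain, then one or more DNS server IP addresses.
resolv_conf_content() (
    domain=$1
    shift
    printf 'search %s\n' "$domain"
    for dns_ip in "$@"; do
        printf 'nameserver %s\n' "$dns_ip"
    done
)

resolv_conf_content example.com 10.7.7.3
```

Passing more addresses after the domain adds one nameserver line for each address.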

4.11.2 Change the DNS Server on Oracle Exadata Storage Server

You can set or change the DNS server on each Oracle Exadata Storage Server.

  1. Log in to the Oracle Exadata Storage Server as the root user.
  2. Use the ipconf utility to change the DNS settings.
    1. Check to ensure there are no configuration issues with the new settings.

      Use the following command, where IP_addr_list is a comma-separated list of IP addresses for the DNS servers. If you also want to check the DNS servers for ILOM, then include the -ilom-dns parameter and replace ILOM_DNS_list with a comma-separated list of up to 3 IP addresses for the DNS servers. Including the -dry parameter means the settings are checked, but not applied.

      Note:

      If you use host names for the DNS servers instead of IP addresses, then the cellwall service will fail when restarted. Use only IP addresses when defining NTP and DNS servers.
      # ipconf -update -dns IP_addr_list [-ilom-dns ILOM_DNS_list] -dry
    2. Update the DNS settings using the ipconf command:

      In the following command, IP_addr_list is a comma-separated list of IP addresses for the DNS servers. If you also want to change the DNS servers for ILOM, then include the -ilom-dns parameter and replace ILOM_DNS_list with a comma-separated list of up to 3 IP addresses for the DNS servers. Include the -force parameter to force the update, bypassing all checks.

      # ipconf -update -dns IP_addr_list [-ilom-dns ILOM_DNS_list] [-force]

      Here is an example of the command and its output.

      [root@dbm03celadm06]# ipconf -update -dns 10.31.138.25,10.231.225.65
      [Info]: ipconf command line: ipconf -update -dns 10.31.138.25,10.231.225.65
      Logging started to /var/log/cellos/ipconf.log
      [Info]: Updating dns/ntp
      [Info]: Backup existing cell configuration file /opt/oracle.cellos/cell.conf 
      to /var/log/exadatatmp/cell.conf_2020_01_13-17_59_44
      [Info]: Custom changes have been detected in /etc/resolv.conf
      [Info]: Original file /etc/resolv.conf will be saved in /etc/resolv.conf.backupbyExadata
      [Done]: Update cell configuration file /opt/oracle.cellos/cell.conf OK
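
Because host names in the server list cause the cellwall service to fail, it can help to check a list before passing it to ipconf. The following POSIX shell sketch is an independent pre-check, not a description of how ipconf validates its input. The function names is_ipv4 and valid_ip_list are hypothetical. The second argument is the maximum number of entries, for example 3 for an ILOM DNS list.

```shell
# Hypothetical helper: succeed only if the argument is a dotted-quad IPv4
# address with every octet in the range 0 to 255.
is_ipv4() (
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || exit 1
    IFS=.
    for octet in $1; do
        [ "$octet" -le 255 ] || exit 1
    done
)

# Hypothetical helper: succeed only if the comma-separated list contains
# nothing but IPv4 addresses and has no more than the given number of entries.
valid_ip_list() (
    list=$1
    max=$2
    # Reject empty lists and stray commas before splitting.
    case $list in
        ''|,*|*,|*,,*) exit 1 ;;
    esac
    IFS=,
    count=0
    for addr in $list; do
        is_ipv4 "$addr" || exit 1
        count=$(( count + 1 ))
    done
    [ "$count" -le "$max" ]
)

valid_ip_list 10.31.138.25,10.231.225.65 3 && echo 'list is acceptable'
```

A list that contains a host name, such as dns1.example.com, makes valid_ip_list return a nonzero status.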
      

4.11.3 Change the DNS Server Address on the Cisco RoCE Network Fabric Switches

This procedure describes how to change the DNS server address on the Cisco RoCE Network Fabric switches.

  1. Access the switch using SSH, and log in as the admin user with the administrator password.

    Note:

    If SSH has not been configured, then use Telnet to access the switch as the admin user.
  2. Review the current configuration.
    Switch# show running-config
    ...
    ip domain-name  example.com
    ip name-server 192.0.2.2 198.51.100.4 203.0.113.2 use-vrf management
    ...
  3. Erase the current DNS server information.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no ip name-server 192.0.2.2 use-vrf management
    Switch(config)# no ip name-server 198.51.100.4 use-vrf management
    Switch(config)# no ip name-server 203.0.113.2 use-vrf management
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.

    Note:

    Each current DNS IP address to be changed needs to be erased. Invalid IP addresses must also be erased.
  4. Set the domain name and then configure up to three DNS servers, as shown in the following example:
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# ip domain-name example.com
    Switch(config)# ip name-server 192.0.2.3 use-vrf management
    Switch(config)# ip name-server 198.51.100.5 use-vrf management
    Switch(config)# ip name-server 203.0.113.1 use-vrf management
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.
  5. Verify the changes.
    Switch# show running-config

    The command output should include entries for the new DNS servers.

    For example:

    
    !Command: show running-config
    ...
    ip domain-name example.com
    ip name-server 192.0.2.3 198.51.100.5 203.0.113.1 use-vrf management
    ...
  6. Exit the session.
    Switch# exit
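
When several switches need the same change, the configuration-mode lines can be prepared in advance and pasted into each session. The following POSIX shell sketch prints those lines. The function name name_server_commands is hypothetical. The first argument is an optional suffix such as use-vrf management, and the old and new server lists are space-separated. The sketch does not connect to a switch.

```shell
# Hypothetical helper: print the configuration-mode lines that remove the old
# DNS servers and add the new ones.
# Arguments: suffix (may be empty), old server list, new server list.
name_server_commands() (
    suffix=$1
    old_list=$2
    new_list=$3
    for ip in $old_list; do
        printf 'no ip name-server %s%s\n' "$ip" "${suffix:+ $suffix}"
    done
    for ip in $new_list; do
        printf 'ip name-server %s%s\n' "$ip" "${suffix:+ $suffix}"
    done
)

name_server_commands 'use-vrf management' \
    '192.0.2.2 198.51.100.4 203.0.113.2' \
    '192.0.2.3 198.51.100.5 203.0.113.1'
```

Calling the function with an empty first argument produces lines without the use-vrf suffix. The surrounding configure terminal, end, and save commands are still entered as described in the procedure.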

4.11.4 Change the DNS Server Address on the Cisco 9300 Series Management Network Switch

This procedure describes how to change the DNS server address on the Cisco 9300 Series Management Network Switch.

  1. Access the switch using SSH, and log in as the admin user with the administrator password.

    Note:

    If SSH has not been configured, then use Telnet to access the switch as the admin user.
  2. Review the current configuration.
    Switch# show running-config
    ...
    ip domain-name  example.com
    ip name-server 192.0.2.2 198.51.100.4 203.0.113.2
    ...
  3. Erase the current DNS server information.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no ip name-server 192.0.2.2
    Switch(config)# no ip name-server 198.51.100.4
    Switch(config)# no ip name-server 203.0.113.2
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.

    Note:

    Each current DNS IP address to be changed needs to be erased. Invalid IP addresses must also be erased.
  4. Set the domain name and then configure up to three DNS servers, as shown in the following example:
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# ip domain-name example.com
    Switch(config)# ip name-server 192.0.2.3
    Switch(config)# ip name-server 198.51.100.5
    Switch(config)# ip name-server 203.0.113.1
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.
  5. Verify the changes.
    Switch# show running-config

    The command output should include entries for the new DNS servers.

    For example:

    
    !Command: show running-config
    ...
    ip domain-name example.com
    ip name-server 192.0.2.3 198.51.100.5 203.0.113.1
    ...
  6. Exit the session.
    Switch# exit

4.11.5 Change the DNS Server Address on the Cisco 4948 Ethernet Switch

This procedure describes how to change the DNS server address on the Cisco 4948 Ethernet switch.

  1. Access the switch using one of the following methods, based on the firmware release:
    • Firmware release 12.2 or later:

      Access the switch using SSH, and log in as the admin user with the administrator password.

      Note:

      If SSH has not been configured, then use Telnet to access the switch as the admin user.
    • Firmware earlier than release 12.2:

      Access the switch using Telnet, and log in as the administrator using the administrative password.

  2. Change to enable mode.
    Switch> enable
    
    When prompted for a password, use the administrator password.
  3. Review the current configuration.
    Switch# show running-config
    
  4. Erase the current DNS server information.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no ip name-server 192.0.2.2
    Switch(config)# no ip name-server 198.51.100.4
    Switch(config)# no ip name-server 203.0.113.2
    Switch(config)# end
    Switch# write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes [OK ]
    

    Note:

    Each current DNS IP address to be changed needs to be erased. Invalid IP addresses must also be erased.
  5. Set the domain name and then configure up to three DNS servers, as shown in the following example:
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# ip domain-name example.com
    Switch(config)# ip name-server 192.0.2.3
    Switch(config)# ip name-server 198.51.100.5
    Switch(config)# ip name-server 203.0.113.1
    Switch(config)# end
    Switch# write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes [OK ]
    
  6. Verify the changes.
    Switch# show running-config
    

    The command output should include entries for the new DNS servers.

    For example:

    Building configuration...
    ...
    ip domain-name example.com
    ip name-server 192.0.2.3
    ip name-server 198.51.100.5
    ip name-server 203.0.113.1
    ...
    
  7. Save the configuration.
    Switch# copy running-config startup-config
    Destination filename [startup-config]? 
    Building configuration...
    Compressed configuration from 14343 bytes to 3986 bytes[OK]
    
  8. Exit the session.
    Switch# exit
    

4.11.6 Change the DNS Server Address on the InfiniBand Network Fabric Switch

All configuration procedures should be done as the ilom-admin user using the Integrated Lights Out Manager (ILOM) interface. Use one of the following procedures to change the DNS server, depending on firmware release:

  1. If your switch is using firmware 2.0.4 or later:
    1. Log in to the InfiniBand Network Fabric switch as the ilom-admin user.
    2. Set the DNS address using one of the following options:
      • Using the ILOM web interface:

        Select the Configuration tab and set the DNS server addresses.

      • Using the command line interface, set the DNS server using the following command:

        set /SP/clients/dns nameserver=dns_ip
        

        In the preceding command, dns_ip is the new IP address of the DNS server. If there is more than one DNS server, then enter a comma-separated list such as set /SP/clients/dns nameserver=dns_ip1,dns_ip2,dns_ip3.

  2. If your switch is using firmware earlier than 2.0.4:
    1. Log in to the InfiniBand Network Fabric switch as the root user.
    2. Edit the /etc/resolv.conf file.
      Set the DNS server and domain name using an editor such as vi. There should be a line for each DNS server.
    3. Save the file.

4.11.7 Change the DNS Server on the KVM Switch

This procedure describes how to change the DNS server configuration using the KVM switch.

Note:

  • The KVM switch is only available in Oracle Exadata Database Machine X2-2 racks and Oracle Exadata Storage Expansion Racks with Oracle Exadata Storage Server with Sun Fire X4270 M2 Servers.

  • The KVM switch does not support NTP.

  1. Log in to the KVM switch. You can log in directly on the KVM switch or access the switch using the host name or IP address over the Internet.
  2. Select Appliance from Unit View.
  3. Select DNS from Appliance Settings.
  4. Select DNS Configuration.
  5. Enter the DNS configuration. The following configuration options are available:
    • DNS Mode (Manual, DHCP, DHCPv6)

    • DNS Server Addresses (Primary, Secondary, Tertiary)

  6. Click Save.

4.12 Changing the NTP Servers

The configuration settings for the Network Time Protocol (NTP) servers can be changed after initial setup.

All servers and switches in Oracle Exadata should reference the same NTP servers so that the servers are synchronized to the same time.

The following topics contain the tasks and procedures for setting the Oracle Exadata servers and switches to the same NTP server addresses. Oracle recommends changing the servers one at a time.

Note:

  • These procedures assume that there is not a large time discrepancy between the two NTP servers. Before updating the NTP servers, use the ntpq -p command to check that the system is healthy.

  • Up to two NTP servers can be configured for use with Oracle Exadata.
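
One simple health indicator in ntpq -p output is the presence of a selected synchronization source, which ntpq marks with an asterisk in the first column. The following POSIX shell sketch extracts that peer and its offset in milliseconds. The function name selected_ntp_peer is hypothetical, and the sample lines in the test input are invented for illustration only. The sketch assumes the standard ten-column ntpq -p layout.

```shell
# Hypothetical helper: read "ntpq -p" output on standard input, print the
# selected peer and its offset, and fail if no peer is selected.
selected_ntp_peer() {
    awk '
        /^\*/ { sub(/^\*/, "", $1); print $1, $9; found = 1 }
        END   { if (!found) exit 1 }
    '
}

# On a server that runs ntpd (assumption: ntpq is installed):
#   ntpq -p | selected_ntp_peer || echo 'no selected NTP peer'
```

A nonzero exit status means that no line starts with an asterisk, which suggests that the system is not currently synchronized to any server.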

4.12.1 Set the NTP Server Address on the Database Servers

You can set or change the Network Time Protocol (NTP) server address on the database server of Oracle Exadata.

  1. If you are using Oracle Exadata System Software 20.1.0 or later, use ipconf to set or change the NTP server.
    1. Log in to the database server as the root user.
    2. Check to make sure there are no configuration issues with the new settings.

      Use the following command, where IP_addr_list is a comma-separated list of IP addresses for the NTP servers. If you are also modifying the NTP servers for ILOM, then include the -ilom-ntp parameter and replace ILOM_NTP_list with a comma-separated list of up to 2 IP addresses for the NTP servers. Including the -dry parameter means the settings are checked, but not applied.

      # ipconf -update -ntp IP_addr_list [-ilom-ntp ILOM_NTP_list] -dry
    3. Update the NTP settings using the following ipconf command:

      In the following command, IP_addr_list is a comma-separated list of IP addresses for the NTP servers. If you also want to change the NTP servers for ILOM, then include the -ilom-ntp parameter and replace ILOM_NTP_list with a comma-separated list of up to 2 IP addresses for the NTP servers. Include the -force parameter to force the update, bypassing all checks.

      # ipconf -update -ntp IP_addr_list [-ilom-ntp ILOM_NTP_list] [-force]

      If the timestamp obtained from the new NTP server differs from the current time known to the system by more than 1 second (time step), then the command fails and does not update the NTP settings. You can use the -force option with the command line to override this check.

      Here is an example of the command and its output.

      [root@dbm03adm02 oracle.cellos]# ipconf -update -ntp 10.31.138.20,10.31.16.1 
      -ilom-ntp 10.31.138.20,10.31.16.1
      [Info]: ipconf command line: ipconf -update -ntp 10.31.138.20,10.31.16.1 
      -ilom-ntp 10.31.138.20,10.31.16.1
      Logging started to /var/log/cellos/ipconf.log
      [Info]: Updating dns/ntp
      [Warning]: ntpd service is not running
      [Info]: Backup existing cell configuration file /opt/oracle.cellos/cell.conf to 
      /var/log/exadatatmp/cell.conf_2020_01_13-17_54_56
      [Info]: Restart ntpd service
      Shutting down ntpd:                                        [  OK  ]
      Starting ntpd:                                             [  OK  ]
      [Done]: Update cell configuration file /opt/oracle.cellos/cell.conf OK
      
    4. Repeat these steps for each database server.
  2. If the database server is running Oracle Linux 7 with Oracle Exadata System Software 19.3.x or earlier, then follow these instructions:
    1. Stop the time synchronization service on the database server.
      # systemctl stop chronyd
      
    2. Update the /etc/chrony.conf file with the IP address of the new NTP server.
    3. Start the time synchronization service on the database server.
      # systemctl start chronyd
      
    4. Repeat substeps 1 through 3 for each database server.
  3. If the database server operating system is Oracle Linux 5 or 6:
    1. Stop the NTP services on the database server.
      # service ntpd stop
      
    2. Update the /etc/ntp.conf file with the IP address of the new NTP server.
    3. Start the NTP services on the database server.
      # service ntpd start
      
    4. Repeat substeps 1 through 3 for each database server.
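
For the manual procedures, updating the configuration file amounts to replacing its server lines. The following POSIX shell sketch shows one way to produce the new content. The function name replace_ntp_servers is hypothetical. It reads the existing file content on standard input and writes the result to standard output, so the original file is not changed. It handles only server directives and leaves any other directive, such as pool, untouched.

```shell
# Hypothetical helper: drop existing "server" lines and append one "server"
# line for each IP address given as an argument.
replace_ntp_servers() (
    # Keep every line that is not a "server" directive. grep exits nonzero
    # when it prints nothing, so ignore its status.
    grep -v '^server[[:space:]]' || true
    for ntp_ip in "$@"; do
        printf 'server %s\n' "$ntp_ip"
    done
)

# Example (assumption: review the result before replacing the real file):
#   replace_ntp_servers 10.7.7.1 10.9.9.1 < /etc/chrony.conf > /tmp/chrony.conf.new
```

The same sketch applies to /etc/ntp.conf on Oracle Linux 5 or 6, because both files use server lines.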

4.12.2 Change the NTP Server on Oracle Exadata Storage Server

You can set or change the Network Time Protocol (NTP) server on each Oracle Exadata Storage Server.

  1. Log in to the cell as the root user.
  2. Use the ipconf utility to change the NTP settings.

    Note:

    Oracle Exadata System Software releases 23.1.2 and 22.1.11 (released in May 2023) contain an update to ipconf with improved handling of time variations resulting from changing NTP servers. Oracle recommends using this update to avoid previous issues with NTP server changes.

    1. Check to ensure there are no configuration issues with the new settings.

      Use the following command, where IP_addr_list is a comma-separated list of IP addresses for the NTP servers. If you are also modifying the NTP servers for ILOM, then include the -ilom-ntp parameter and replace ILOM_NTP_list with a comma-separated list of up to 2 IP addresses for the NTP servers. Including the -dry parameter means the settings are checked, but not applied.

      Note:

      If you use host names for the NTP servers instead of IP addresses, then the cellwall service will fail when restarted. Use only IP addresses when defining NTP and DNS servers.
      # ipconf -update -ntp IP_addr_list [-ilom-ntp ILOM_NTP_list] -dry
    2. Update the NTP settings using the ipconf command:

      In the following command, IP_addr_list is a comma-separated list of IP addresses for the NTP servers. If you also want to change the NTP servers for ILOM, then include the -ilom-ntp parameter and replace ILOM_NTP_list with a comma-separated list of up to 2 IP addresses for the NTP servers. Include the -force parameter to force the update, bypassing all checks.

      # ipconf -update -ntp IP_addr_list [-ilom-ntp ILOM_NTP_list] [-force]

      If the timestamp obtained from the new NTP server differs from the current time known to the system by more than 1 second (time step), then the command fails and does not update the NTP settings. You can use the -force option with the command line to override this check.

      Here is an example of the command and its output.

      [root@dbm03adm02 oracle.cellos]# ipconf -update -ntp 10.31.138.20,10.31.16.1 
      -ilom-ntp 10.31.138.20,10.31.16.1
      [Info]: ipconf command line: ipconf -update -ntp 10.31.138.20,10.31.16.1 
      -ilom-ntp 10.31.138.20,10.31.16.1
      Logging started to /var/log/cellos/ipconf.log
      [Info]: Updating dns/ntp
      [Warning]: ntpd service is not running
      [Info]: Backup existing cell configuration file /opt/oracle.cellos/cell.conf to 
      /var/log/exadatatmp/cell.conf_2020_01_13-17_54_56
      [Info]: Restart ntpd service
      Shutting down ntpd:                                        [  OK  ]
      Starting ntpd:                                             [  OK  ]
      [Done]: Update cell configuration file /opt/oracle.cellos/cell.conf OK
      

4.12.3 Set the NTP Server Address on the Cisco RoCE Network Fabric Switches

This procedure describes how to change the Network Time Protocol (NTP) server address on the Cisco RoCE Network Fabric switches.

  1. Access the switch using SSH, and log in as the admin user with the administrator password.

    Note:

    If SSH has not been configured, then use Telnet to access the switch as the admin user.
  2. Review the current configuration.
    Switch# show running-config

    The command output includes entries for the current NTP servers.

    For example:

    ...
    ntp server 10.10.10.1 prefer use-vrf management
    ntp server 10.8.8.1 use-vrf management
    ...
  3. Erase the current NTP server configuration.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no ntp server 10.10.10.1
    Switch(config)# no ntp server 10.8.8.1
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.

    Note:

    Each current NTP IP address being changed needs to be erased. Invalid IP addresses must also be erased.
  4. Configure up to two NTP servers.

    In this example, the new NTP server IP addresses are 10.7.7.1 and 10.9.9.1.

    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# feature ntp
    Switch(config)# ntp server 10.7.7.1 prefer use-vrf management
    Switch(config)# ntp server 10.9.9.1 use-vrf management
    Switch(config)# clock protocol ntp
    Switch(config)# end
  5. Verify the changes.
    Switch# show running-config

    The command output should include entries for the new NTP servers.

    For example:

    ...
    ntp server 10.7.7.1 prefer use-vrf management
    ntp server 10.9.9.1 use-vrf management
    ...
  6. Save the configuration.
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.
  7. Exit from the session.
    Switch# exit
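
The ntp server lines for a switch can be prepared in the same way before the session is opened. The following POSIX shell sketch prints them, marking the first server as preferred. The function name ntp_server_commands is hypothetical. The first argument is an optional suffix such as use-vrf management, followed by one or two server addresses. The sketch does not connect to a switch.

```shell
# Hypothetical helper: print "ntp server" configuration lines.
# Arguments: suffix (may be empty), then one or two NTP server IP addresses.
ntp_server_commands() (
    suffix=$1
    shift
    # The procedures in this section configure at most two NTP servers.
    if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
        echo 'expected one or two NTP server addresses' >&2
        exit 1
    fi
    # Only the first server carries the "prefer" keyword.
    prefer=' prefer'
    for ntp_ip in "$@"; do
        printf 'ntp server %s%s%s\n' "$ntp_ip" "$prefer" "${suffix:+ $suffix}"
        prefer=''
    done
)

ntp_server_commands 'use-vrf management' 10.7.7.1 10.9.9.1
```

With an empty first argument, the function prints the lines without a use-vrf suffix, which matches the form used on the management network switch.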

4.12.4 Set the NTP Server Address on the Cisco 9300 Series Management Network Switch

This procedure describes how to change the Network Time Protocol (NTP) server address on the Cisco 9300 Series Management Network Switch.

  1. Access the switch using SSH, and log in as the admin user with the administrator password.

    Note:

    If SSH has not been configured, then use Telnet to access the switch as the admin user.
  2. Review the current configuration.
    Switch# show running-config

    The command output includes entries for the current NTP servers.

    For example:

    ...
    ntp server 10.10.10.1 prefer use-vrf default
    ntp server 10.8.8.1 use-vrf default
    ...
  3. Erase the current NTP server configuration.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no ntp server 10.10.10.1
    Switch(config)# no ntp server 10.8.8.1
    Switch(config)# end
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.

    Note:

    Each current NTP IP address being changed needs to be erased. Invalid IP addresses must also be erased.
  4. Configure up to two NTP servers.

    In this example, the new NTP server IP addresses are 10.7.7.1 and 10.9.9.1.

    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# feature ntp
    Switch(config)# ntp server 10.7.7.1 prefer
    Switch(config)# ntp server 10.9.9.1
    Switch(config)# clock protocol ntp
    Switch(config)# end
  5. Verify the changes.
    Switch# show running-config

    The command output should include entries for the new NTP servers.

    For example:

    ...
    ntp server 10.7.7.1 prefer use-vrf default
    ntp server 10.9.9.1 use-vrf default
    ...
  6. Save the configuration.
    Switch# copy running-config startup-config
    [########################################] 100%
    Copy complete, now saving to disk (please wait)...
    Copy complete.
  7. Exit from the session.
    Switch# exit

4.12.5 Set the NTP Server Address on the Cisco 4948 Ethernet Switch

You can set or change the Network Time Protocol (NTP) server on the Cisco 4948 Ethernet switch.

  1. Access the switch using one of the following methods, based on the firmware version:
    1. Firmware versions earlier than version 12.2: Access the switch using Telnet, and log in as the administrator using the administrative password.
    2. Firmware version 12.2 or later: Access the switch using SSH, and log in as the admin user with the administrator password.

      Note:

      If SSH has not been configured, then use Telnet to access the switch as the admin user.
  2. Change to enable mode. When prompted for a password, use the administrator password.
    Switch> enable
  3. Review the current configuration.
    Switch# show running-config
  4. Erase the current NTP server configuration.

    In this example, the current IP addresses are 10.10.10.1 and 10.8.8.1.

    Switch# configure terminal
    Enter configuration commands, one per line.  End with CNTL/Z.
    Switch(config)# no ntp server 10.10.10.1
    Switch(config)# no ntp server 10.8.8.1
    Switch(config)# end
    Switch# write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes [OK ]

    Note:

    Each current NTP IP address being changed needs to be erased. Invalid IP addresses must also be erased.
  5. Configure up to two NTP servers.
    Switch# configure terminal
    Enter configuration commands, one per line.  End with CNTL/Z.
    Switch(config)# ntp server 10.7.7.1 prefer
    Switch(config)# ntp server 10.9.9.1
    Switch(config)# end
    Switch# write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes [OK ]
  6. Verify the changes.
    Switch# show running-config

    The command output should include entries for the new NTP servers.

    For example:

    
    Building configuration...
    ...
    ntp server 10.7.7.1 prefer
    ntp server 10.9.9.1
    ...
  7. Save the configuration.
    Switch# copy running-config startup-config
    Destination filename [startup-config]? 
    Building configuration...
    Compressed configuration from 14343 bytes to 3986 bytes[OK]
  8. Exit from the session.
    Switch# exit
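
When the running configuration is long, the ntp server entries are easy to overlook. The following is a minimal shell sketch, not part of the switch software, that lists the addresses found on the ntp server lines of a captured show running-config output. The function name list_ntp_servers and the capture file name are illustrative assumptions, and the sketch is meant to run on an administration host rather than on the switch.

```shell
# Print the address from every "ntp server <address> ..." line read on
# standard input, one address per line. Leading whitespace is ignored.
list_ntp_servers() {
    awk '$1 == "ntp" && $2 == "server" { print $3 }'
}

# Example (running-config.txt is a hypothetical file holding the saved output):
#   list_ntp_servers < running-config.txt
```

If the list does not contain exactly the new addresses, repeat the erase and configure steps before saving the configuration.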

4.12.6 Set the NTP Server Address on the InfiniBand Network Fabric Switch

You can set or change the Network Time Protocol (NTP) server address on the InfiniBand Network Fabric switch.

Note:

Do not manually edit the files on the InfiniBand Network Fabric switches.
  1. Log in as the ilom-admin user.
  2. Set the date, time zone, and NTP server using one of the following methods:
    • Using the Configuration page on the Integrated Lights Out Manager (ILOM) graphical interface.

    • Manually, using the following commands:

      set /SP/clock timezone=preferred_tz
      set /SP/clients/ntp/server/1 address=ntp_ip1
      set /SP/clients/ntp/server/2 address=ntp_ip2
      set /SP/clock usentpserver=enabled 
      

      In the preceding commands, preferred_tz is the preferred time zone, and ntp_ip1 and ntp_ip2 are the NTP server IP addresses. It is not necessary to configure both NTP servers, but at least one should be configured.
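
As a sketch of how the placeholders in the preceding commands are filled in, the following shell function prints the command sequence for a given time zone and one or two NTP servers. The function name ilom_ntp_commands is hypothetical. The function only prints text for review and does not connect to the switch, so the output still has to be entered in the ILOM session.

```shell
# Print the ILOM commands for a time zone and one or two NTP servers.
# $1 = preferred time zone, $2 = first NTP server, $3 = optional second server.
ilom_ntp_commands() {
    ilom_tz=$1
    ilom_ntp1=$2
    ilom_ntp2=${3:-}
    printf 'set /SP/clock timezone=%s\n' "$ilom_tz"
    printf 'set /SP/clients/ntp/server/1 address=%s\n' "$ilom_ntp1"
    # The second server is optional, so emit its command only when given.
    if [ -n "$ilom_ntp2" ]; then
        printf 'set /SP/clients/ntp/server/2 address=%s\n' "$ilom_ntp2"
    fi
    printf 'set /SP/clock usentpserver=enabled\n'
}

# Example, reusing the addresses from the earlier switch examples:
#   ilom_ntp_commands America/New_York 10.7.7.1 10.9.9.1
```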

4.13 Changing the Time Zone Settings

You can change the time zones on Oracle Exadata after initial configuration and deployment.

The following components need to be modified when changing the time zone settings:

  • Storage servers
  • Database servers
  • RDMA Network Fabric switches
  • Ethernet switch

Note:

Cell services and Oracle Clusterware services must be stopped before changing the time zone settings.

4.13.1 Change Time Zone Settings on Storage Servers

Use these steps to change the time zone setting on storage servers.

Complete the setting changes to all storage servers before changing the settings on the database servers.

  1. Log in as the root user on the database server node.
  2. Stop the Oracle Clusterware stack on all nodes.

    Use a command similar to the following, where Grid_home is the location of your Oracle Grid Infrastructure software installation.

    # Grid_home/bin/crsctl stop crs
  3. Log in as the root user on the storage server.
  4. Stop the processes on the storage server.
    # cellcli -e alter cell shutdown services all
    
  5. Run the ipconf script.
    # /opt/oracle.cellos/ipconf
    
    1. Proceed through the script prompts until you get to the time zone prompts. Do not change any other settings.

      Each country is identified by a number. After you select a country, a second numbered list shows the time zones within that country. The following is an example of the time zone prompts for changing the time zone from Antarctica to the United States. The number for the United States is 230.

      The current timezone: Antarctica/McMurdo
      Do you want to change it (y/n) [n]: y
       
      Setting up local time...
       
      1) Andorra
      2) United Arab Emirates
      3) Afghanistan
      .
      .
      .
      15) Aruba
      16) Aaland Islands
      Select country by number, [n]ext, [l]ast: 230
      
      Selected country: United States (US). Now choose a zone
       
      1) America/New_York
      2) America/Detroit
      3) America/Kentucky/Louisville
      .
      .
      .
      15) America/North_Dakota/New_Salem
      16) America/Denver
      Select zone by number, [n]ext: 1
      
      Selected timezone: America/New_York
      Is this correct (y/n) [y]:
      
    2. Proceed through the rest of the script prompts, but do not change any other values. Do not change ILOM settings.

    After responding to all change requests, the script generates new files.

    Info. Run /opt/oracle.cellos/validations/init.d/saveconfig
    /opt/oracle.cellos/validations/init.d/saveconfig started at 2017_05_12_10_28
    Copy cell configs from /opt/oracle/cell/cellsrv/deploy/config to /opt/oracle.cellos/iso/lastGoodConfig/cell/cellsrv/deploy/config
    [INFO] Copying ssh host settings from //etc/ssh to /opt/oracle.cellos/iso/lastGoodConfig/etc/ssh ...
    uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
  6. Verify the time zone changes have been propagated to the following files. Examples of the changes are shown for the files.
    • /opt/oracle.cellos/cell.conf:

      $VAR1 = {
                'Hostname' => 'xdserver.example.com',
                'Ntp servers' => [
                                   '10.141.138.1'
                                 ],
                'Timezone' => 'America/New_York',
      
    • /etc/sysconfig/clock:

      ZONE="America/New_York"
      UTC=false
      ARC=false
      #ZONE="Antarctica/McMurdo"
      #ZONE="America/New_York"
      #ZONE="America/Los_Angeles"
      

      The uncommented ZONE value (line not preceded by #) is the current setting.

    • /etc/localtime:

      Run the strings /etc/localtime command to verify the change. The last line shows the time zone.

      ~^Ip
      EST5EDT,M3.2.0,M11.1.0
      
  7. Restart the storage server.
  8. Use the date command to see the current time zone. The following is an example of the output from the command:
    # date
    Tue Jan 29 12:44:21 EST 2019
    
  9. Review the $ADR_BASE/diag/asm/cell/host_name/alert.log file. The time that processes were restarted should match the current and correct time.
  10. Repeat steps 3 through 9 on each cell.
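
The file checks in step 6 can be scripted when many cells are involved. The following is a minimal sketch that assumes the two files use the layouts shown in the examples above: a double-quoted ZONE value in /etc/sysconfig/clock and a single-quoted 'Timezone' entry in cell.conf. The function names are hypothetical.

```shell
# Print the uncommented ZONE value from a clock file.
active_zone() {
    awk -F'"' '/^ZONE=/ { print $2 }' "${1:-/etc/sysconfig/clock}"
}

# Print the value of the 'Timezone' entry from a cell.conf file.
cell_conf_zone() {
    awk -F"'" '$2 == "Timezone" { print $4 }' "${1:-/opt/oracle.cellos/cell.conf}"
}

# Succeed only when both files name the same, non-empty time zone.
zones_agree() {
    clock_zone=$(active_zone "${1:-/etc/sysconfig/clock}")
    conf_zone=$(cell_conf_zone "${2:-/opt/oracle.cellos/cell.conf}")
    [ -n "$clock_zone" ] && [ "$clock_zone" = "$conf_zone" ]
}

# Example, run as root on a storage server:
#   zones_agree && echo "time zone files agree: $(active_zone)"
```

The sketch does not inspect /etc/localtime, so the strings check described in step 6 is still needed.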

4.13.2 Change Time Zone Settings on the Database Servers

After modifying the time zone setting on the storage cells, you can change the time zone setting on the database servers.

Before starting this procedure, you should have already stopped the Oracle Clusterware stack and modified the time zone on the storage cells, as described in Change Time Zone Settings on Storage Servers.

  1. As the root user, copy the file /etc/localtime from any of the storage cells to the database server.
    # scp root@cell_name:/etc/localtime /etc/localtime
  2. Copy the file /etc/sysconfig/clock from any of the storage cells to the database server.
    # scp root@cell_name:/etc/sysconfig/clock /etc/sysconfig/clock
    
  3. Change the Oracle Clusterware settings to prevent the CRS stack from starting automatically after restarting the database server.
    # Grid_home/bin/crsctl disable crs
    
  4. Reboot the database server.
    # shutdown -r now
  5. Verify the date has been changed on the database server.

    Use the date command to verify the time zone change.

    # date
    Tue Jan 29 13:08:46 EST 2019
  6. Change the Oracle Clusterware settings to automatically restart the CRS stack after restarting the database server.
    # Grid_home/bin/crsctl enable crs
    
  7. Start the CRS stack on the database server.
    # Grid_home/bin/crsctl start crs
    

4.13.3 Change Time Zone Settings on the InfiniBand Network Fabric Switches

You can change the time zone setting on the InfiniBand Network Fabric switches.

  1. Connect to the InfiniBand Network Fabric switch using SSH as the ilom-admin user.
    ssh -l ilom-admin switch_hostname
  2. Use the version command to check the version of the switch software.

    The following is an example of the output from the command:

    $ version
    SUN DCS 36p version: 2.2.2-7
    Build time: Aug 26 2016 10:00:25
    SP board info:
    Manufacturing Date: 2012.06.23
    Serial Number: "ABCDE1234"
    Hardware Revision: 0x0007
    Firmware Revision: 0x0000
    BIOS version: SUN0R100
    BIOS date: 06/22/2010
  3. Administer the switch as follows, depending on the software version:
    • If the software version is 1.1.3-2 or later, then administration of the switch is done using ILOM as follows:

      1. Log in to ILOM using the web address http://switch_alias.

      2. Select the Configuration tab. Or, in the left-side navigation, select ILOM Administration, then Date and Time.

      3. Select the Clock tab.

      4. Click the Enabled check box next to Synchronize Time Using NTP.

      5. In the Server 1 text box, enter the correct IP address for the primary NTP server.

      6. In the Server 2 text box, type the IP address of the secondary NTP server you want to use.

      7. Click Save.

    • If the software version is earlier than 1.1.3-2, then log in to the switch using SSH as follows:

      1. Log in to the switch using the following command:

        # ssh -l root {switch_ip | switch_name}
        
      2. Stop the ntpd daemon using the following command:

        # service ntpd stop
        
      3. Save a copy of the /etc/localtime file using the following command:

        # cp /etc/localtime /etc/localtime.backup
        
      4. Identify the file in the /usr/share/zoneinfo directory for the time zone. The following is an example for the United States:

        # cd /usr/share/zoneinfo/US
        # ls
        Alaska  Aleutian  Arizona  Central  Eastern  East-Indiana  Hawaii 
        Indiana-Starke  Michigan  Mountain  Pacific  Samoa
        
      5. Copy the appropriate file to /etc/localtime. The following is an example of the command:

        # cp /usr/share/zoneinfo/US/Eastern /etc/localtime
        
      6. Manually set the current date and time to values near the current time.

      7. Set the date using the date command with the MMddHHmmCCyy format (month, day, hour, minute, century, year), and then synchronize the time with the NTP server for the new time zone. The following is an example of the commands:

        # date 013110452013
        # ntpd -q -g
        
      8. Validate the date using the following command:

        # date
        
      9. Restart the ntpd daemon using the following command:

        # service ntpd start
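
The date command in the SSH procedure expects the time as a 12-digit MMddHHmmCCyy string, for example 013110452013 for 10:45 on January 31, 2013. As a sketch, the string can be generated on a reference host whose clock is known to be accurate and then typed on the switch. The function name date_set_arg is hypothetical, and the zone name passed to it is assumed to exist in the zoneinfo database of the reference host.

```shell
# Print the current time in the given time zone as MMddHHmmCCyy,
# the form accepted by "date" when setting the clock.
# $1 = zoneinfo name, such as US/Eastern.
date_set_arg() {
    TZ="$1" date +%m%d%H%M%Y
}

# Example:
#   date_set_arg US/Eastern
```

Because ntpd -q -g corrects the clock afterward, the manually entered value only needs to be close to the current time.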
        

4.13.4 Change Time Zone Settings on the Cisco RoCE Network Fabric Switches

You can change the time zone setting on the Cisco RoCE Network Fabric Switches.

  1. Access the switch using SSH, and log in as the admin user.
  2. Use the configure terminal command to begin configuration.
  3. Set the clock using the following commands:
    1. To change the time zone:
      clock timezone zone hours_offset minutes_offset
      

      In the preceding syntax:

      • zone is the name of the time zone to be displayed when standard time is in effect. The default time zone is UTC.

      • hours_offset is the hours offset from UTC.

      • minutes_offset is the minutes offset from UTC.

    2. To set summer time (daylight saving time) in areas where it starts and ends on a particular day of the week each year, use the following command:
      clock summer-time zone recurring [week day month  hh:mm week day month  hh:mm[offset]]
      

      In the preceding syntax, the values of week day month hh:mm are listed twice, once for the starting time and again for the ending time.

      • recurring specifies that summer time starts and ends on the specified days every year. Summer time is disabled by default. If you specify clock summer-time zone recurring without any other parameters, the summer time rules default to the United States rules.

      • week is the week of the month, from 1 to 5. The first occurrence of week is the start date and the second occurrence is the end date.

      • day is the day of the week, such as Sunday or Monday. The first occurrence of day is the start date and the second occurrence is the end date.

      • month is the month, such as January or June. The first occurrence of month is the start date and the second occurrence is the end date.

      • hh:mm is the time in 24-hour format in hours and minutes, such as 15:42.

      • offset is the number of minutes to add during summer time. The default is 60.

Example 4-2 Setting the Time Zone on the RoCE Network Fabric Switch

The following is an example of setting the time zone to US Eastern time with summer time enabled:

dbm0sw-rocea0#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
dbm0sw-rocea0(config)#clock timezone EST -5 0
dbm0sw-rocea0(config)#clock summer-time EDT recurring
dbm0sw-rocea0(config)#end
dbm0sw-rocea0#copy running-config startup-config
Building configuration...
Compressed configuration from 6421 bytes to 2041 bytes[OK]
dbm0sw-rocea0#show clock
12:03:43.516 EDT Wed May 12 2012
dbm0sw-rocea0#
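
To show how the syntax elements map onto the commands in the example, the following sketch prints the clock commands for a standard-time zone name, an hours offset, a minutes offset, and an optional summer-time zone name. The function name clock_tz_commands is hypothetical. It uses the recurring form without explicit dates, so the United States summer time rules apply as described above. The output is intended to be reviewed and then entered in configuration mode.

```shell
# Print the switch clock commands for a time zone.
# $1 = zone name, $2 = hours offset, $3 = minutes offset,
# $4 = optional summer-time zone name.
clock_tz_commands() {
    tz_std=$1
    tz_hours=$2
    tz_minutes=$3
    tz_dst=${4:-}
    printf 'clock timezone %s %s %s\n' "$tz_std" "$tz_hours" "$tz_minutes"
    # Summer time is disabled by default, so only emit it when requested.
    if [ -n "$tz_dst" ]; then
        printf 'clock summer-time %s recurring\n' "$tz_dst"
    fi
}

# Example, matching the US Eastern settings shown above:
#   clock_tz_commands EST -5 0 EDT
# Example for a zone with a 30-minute offset and no summer time:
#   clock_tz_commands IST 5 30
```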

4.13.5 Change Time Zone Settings on the Cisco Management Network Switch

You can change the time zone setting on the Ethernet switch.

  1. Use Telnet to connect to the Ethernet switch.
  2. Use the enable command to enter privileged mode.
  3. Use the configure terminal command to begin configuration.
  4. Set the clock using the following commands:
    1. To change the time zone:
      clock timezone zone hours_offset minutes_offset
      

      In the preceding syntax:

      • zone is the name of the time zone to be displayed when standard time is in effect. The default time zone is UTC.

      • hours_offset is the hours offset from UTC.

      • minutes_offset is the minutes offset from UTC.

    2. To set summer time (daylight saving time) in areas where it starts and ends on a particular day of the week each year, use the following command:
      clock summer-time zone recurring [week day month  hh:mm week day month  hh:mm[offset]]
      

      In the preceding syntax, the values of week day month hh:mm are listed twice, once for the starting time and again for the ending time.

      • recurring specifies that summer time starts and ends on the specified days every year. Summer time is disabled by default. If you specify clock summer-time zone recurring without any other parameters, the summer time rules default to the United States rules.

      • week is the week of the month, from 1 to 5. The first occurrence of week is the start date and the second occurrence is the end date.

      • day is the day of the week, such as Sunday or Monday. The first occurrence of day is the start date and the second occurrence is the end date.

      • month is the month, such as January or June. The first occurrence of month is the start date and the second occurrence is the end date.

      • hh:mm is the time in 24-hour format in hours and minutes, such as 15:42.

      • offset is the number of minutes to add during summer time. The default is 60.

Example 4-3 Setting the Time Zone on the Ethernet Switch

The following is an example of setting the time zone to US Eastern time with summer time enabled:

$ telnet dbmcisco-ip
Connected to switch name
Escape character is '^]'.

User Access Verification

Password: 
dbmcisco-ip>enable
Password: 
dbmcisco-ip#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
dbmcisco-ip(config)#clock timezone EST -5 0
dbmcisco-ip(config)#clock summer-time EDT recurring
dbmcisco-ip(config)#end
dbmcisco-ip#write memory
Building configuration...
Compressed configuration from 6421 bytes to 2041 bytes[OK]
dbmcisco-ip#show clock
12:03:43.516 EDT Wed May 12 2012
dbmcisco-ip#

4.14 Managing the KVM Switch

The KVM switch is only available in Oracle Exadata Database Machine X2-2 racks and Oracle Exadata Storage Expansion Racks with Exadata Storage Server with Sun Fire X4270 M2 Servers.

4.14.1 Configuring the KVM Switch

This procedure describes how to configure the KVM (Keyboard, Video, Mouse) switch.

Configure the switch while all connected components are powered off.

  1. Pull the KVM tray out from the front of the rack, and open it using the handle.

  2. Touch the touch pad.

  3. Toggle between the host and KVM interface by pressing the Ctrl key on the left side twice, similar to a double-click on a mouse.

  4. Select Target Devices from the Unit View of the user interface. The number of sessions shown should be 22 for Oracle Exadata Database Machine Full Rack, 11 for Oracle Exadata Database Machine Half Rack, and 5 for Oracle Exadata Database Machine Quarter Rack. The number of sessions should be 18 for Oracle Exadata Storage Expansion Full Rack, 9 for Oracle Exadata Storage Expansion Half Rack, and 4 for Oracle Exadata Storage Expansion Quarter Rack.

    Note:

    If not all sessions are shown, then select IQ Adaptors from the Ports heading. Click the table heading, and then Port, to sort the sessions by port number. Note any missing items. The sessions are numbered from the bottom of the rack to the top.

  5. Return to the Target Devices screen.

  6. Select Local from User Accounts.

  7. Click Admin under Users.

  8. Set a password for the Admin account. Do not modify any other parameters.

  9. Click Save.

  10. Select Network from Appliance Settings. The Network Information screen appears.

  11. Select IPv4 or IPv6.

  12. Enter the values for Address, Subnet, Gateway, and the IP addresses of the DNS servers.

  13. Click Save.

  14. Connect the KVM LAN1 Ethernet port to the management network.

  15. Verify the port has been configured correctly by checking the MAC address on the Network Information screen. The address should match the label next to the LAN1/LAN2 ports on the rear of the KVM switch.

  16. Select Overview from Appliance.

  17. Enter a name for the KVM switch.

  18. Click Save.

  19. Restart the KVM switch by selecting Reboot under Overview.

  20. Examine the firmware version of the switch by selecting Versions from Appliance Settings. There are two version numbers shown, Application and Boot, as shown in the following:

    Required version is:
    Application 1.2.10.15038
    Boot  1.6.15020
    

    Note:

    The recommended firmware version is 1.2.8 or later.

    If the firmware is 1.2.3 or earlier, then it can be upgraded from a network browser. If it is version 1.2.3 or later, then it can be upgraded from the local keyboard using a flash drive plugged in to the KVM USB port. To upgrade the firmware, do the following:

    1. Select Overview from Appliance.

    2. Select Upgrade Firmware from the Tools list.

    3. Select the method to upgrade.

    4. Click Upgrade.

    5. Confirm the firmware version.
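
Dotted firmware versions have to be compared field by field as numbers, because a plain text comparison ranks 1.2.10 below 1.2.8. The following is a minimal sketch of such a comparison for checking an Application version against the recommended 1.2.8. The function name version_at_least is hypothetical, and the version string is assumed to be copied by hand from the Versions screen.

```shell
# Succeed (exit status 0) when dotted version $1 is greater than or
# equal to dotted version $2. Missing trailing fields count as 0.
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example:
#   version_at_least 1.2.10.15038 1.2.8 && echo "firmware meets the recommendation"
```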

See Also:

Avocent Web site for information about KVM switch Management Information Base (MIB) at https://www.vertivco.com/en-us/support/software-download/it-management/avocent-mergepoint-unity-switches-software-downloads/

4.14.2 Configuring the KVM Switch to Access a Server

The following procedure describes how to configure the KVM switch to access the servers:

Note:

The KVM switch is only available in Oracle Exadata Database Machine X2-2 racks and Oracle Exadata Storage Expansion Racks with Exadata Storage Server with Sun Fire X4270 M2 Servers.

  1. Select Target Devices from Unit View.
  2. Power on the server. The power button is on the front panel. If the button seems stuck, then use a small tool to loosen the button.
  3. Click the system name in the Name column using the left mouse button.
  4. Click Overview, and overwrite the name with the Oracle standard naming format of customer prefix, node type, and number. For example, trnacel03 has the prefix trna and is storage cell 3 from the bottom of the rack; trnadb02 has the prefix trna and is database server 2 from the bottom of the rack.
  5. Press Save.
  6. Repeat steps 2 through 5 for each server in the rack. Each server boots up through BIOS, and boots the operating system with the default factory IP configuration.

4.14.3 Accessing a Server Using the KVM Switch

The following procedure describes how to access a server using the KVM switch:

Note:

The KVM switch is only available in Oracle Exadata Database Machine X2-2 racks and Oracle Exadata Storage Expansion Racks with Exadata Storage Server with Sun Fire X4270 M2 Servers.

  1. Select Target Devices from Unit View.
  2. Click the system name in the Name column using the left mouse button.
  3. Click the KVM session.

4.15 LED Status Descriptions

The LEDs on the Oracle Exadata Rack components help you identify the component that needs servicing.

4.15.1 Sun Datacenter InfiniBand Switch 36 Switch LEDs

Table 4-1 describes the color codes of the LEDs on Sun Datacenter InfiniBand Switch 36 switches.

Table 4-1 Sun Datacenter InfiniBand Switch 36 Switch LED Status Descriptions

Component LED Status

Sun Datacenter InfiniBand Switch 36 chassis

  • Locator LED is white: It flashes when identifying itself. It is on when there is no function, and off when disabled.

  • Attention LED is amber: There is a fault condition. It flashes when there is no function.

  • OK LED is green: Switch is functioning correctly. It flashes when there is no function.

Sun Datacenter InfiniBand Switch 36 link status

Link LED is green: It is on when link is established. It is off when link is down, and it flashes when there are symbol errors.

Sun Datacenter InfiniBand Switch 36 network management ports

  • Link speed LED: Green indicates 1000BASE-T. Amber indicates 100BASE-T. Off indicates no link. Flashing indicates no function.

  • Activity LED: Flashing indicates packet activity. On indicates no function. Off indicates no activity.

Sun Datacenter InfiniBand Switch 36 power supply

  • OK LED is green: Indicates 12 VDC is supplied. Flashing indicates no function.

  • Attention LED is amber: There is a fault, and 12 VDC shut down. Flashing indicates no function.

  • AC LED is green: AC power is present and good. Flashing indicates no function.

4.15.2 Cisco Nexus 9336C-FX2 Switch LEDs

This topic describes the color codes of the LEDs on Cisco Nexus 9336C-FX2 switches.

Table 4-2 Cisco Nexus 9336C-FX2 Switch LED Status Descriptions

Component LED Status Location

Switch chassis Beacon (BCN) LED

  • Flashing blue: Operator activated LED to identify the switch.
  • Off: Default status, switch is not being identified.

The BCN LED is located on the left side of the front of the switch.

Switch chassis Status (STS) LED

  • Flashing amber: Switch is booting up.
  • Amber or Red: Temperature of the switch has exceeded the minor alarm threshold.
  • Off: The switch has no power.

The STS LED is located on the left side of the front of the switch.

Switch chassis Environment (ENV) LED

  • Green: The fans and power supplies are functioning correctly.
  • Amber: At least one power supply or fan has stopped functioning.

The ENV LED is located on the left side of the front of the switch.

Cisco Nexus 9336C-FX2 network ports

  • Green: The port admin state is enabled, Small form-factor pluggable (SFP) transceiver is present, and the interface is connected.
  • Amber: The port admin state is disabled, the SFP is absent, or both.
  • Off: The port admin state is enabled, SFP is present, but the interface is not connected.

The port LEDs appear as triangles pointing up or down to the nearest port.

Switch chassis lane link

  • One or more are lit: The lane is being checked.
  • None: All four lanes are being checked.

The Lane Link LEDs indicate which of the break out lanes are being checked.

Power supply OK and Fault LEDs

  • OK-Green, Fault-Off: Power supply is functioning correctly.
  • OK-Flashing Green, Fault-Off: Power supply is connected to a power source but not outputting power to the switch. The power supply might not be installed in the chassis.
  • OK-Off, Fault-Off: Power supply is not receiving power or is shut down.
  • OK-Green, Fault-Flashing Amber: Power supply warning, indicating possible high voltage, high power, low voltage, power supply warning condition, or power supply fan not operating correctly.
  • OK-Green, Fault-Amber: Power supply failure.

The power supply LEDs are located on the left front portion of the power supply. Combinations of states indicated by the OK and Fault LEDs indicate the status for the module.

Fan Module STS LED

  • Green: Fan module is operating correctly.
  • Red: Fan module is not operational.
  • Off: Fan module does not have power.

The fan module LED is located below the air holes on the front of the module.
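
Because the power supply status in Table 4-2 depends on the combination of the OK and Fault LEDs, it can be treated as a lookup keyed on both states. The following sketch encodes the five listed combinations. The function name power_supply_status and the lowercase state tokens (green, flashing-green, amber, flashing-amber, off) are illustrative conventions, not values reported by the switch.

```shell
# Map the observed OK and Fault LED states to the status from Table 4-2.
# $1 = OK LED state, $2 = Fault LED state.
power_supply_status() {
    case "$1/$2" in
        green/off)            echo "Power supply is functioning correctly" ;;
        flashing-green/off)   echo "Connected to a power source but not outputting power to the switch" ;;
        off/off)              echo "Power supply is not receiving power or is shut down" ;;
        green/flashing-amber) echo "Power supply warning" ;;
        green/amber)          echo "Power supply failure" ;;
        *)                    echo "Combination not listed in Table 4-2"; return 1 ;;
    esac
}

# Example:
#   power_supply_status green flashing-amber
```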