8 Configuring a Recovery Appliance Rack

This chapter describes how to configure the hardware components of a Recovery Appliance rack.

Note:

The procedures in this chapter use the files generated by Oracle Exadata Deployment Assistant. You must run this utility before doing the procedures in this chapter.

Supporting Auto Service Request

Auto Service Request is an optional component of Recovery Appliance. To configure Recovery Appliance for Auto Service Request, ASR Manager must be installed first.

Prerequisites for Using Auto Service Request

Verify that Auto Service Request was selected for use in Oracle Exadata Deployment Assistant. Recovery Appliance cannot also be used with Oracle Advanced Support Gateway or Oracle Platinum Gateway.

You must know the IP address and the root password of the ASR Manager host.

Checking an Existing ASR Manager Installation

If ASR Manager is already operating at the site, then verify that it is version 4.5 or higher. Otherwise, you must upgrade it.

To obtain the version number of ASR Manager:

  • On a Linux system:

    # rpm -qa | grep SUNWswasr
    SUNWswasr-2.7-1
    
  • On a Solaris system:

    # pkginfo -l SUNWswasr
    PKGINST: SUNWswasr
    NAME: SASM ASR Plugin
    CATEGORY: application
    ARCH: all
    VERSION: 2.6
    BASEDIR: /
    VENDOR: Sun Microsystems, Inc.
         .
         .
         .
    

The output from the previous examples indicates that ASR Manager must be updated to 4.5 or higher.
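
The version comparison above can be sketched in Python. This is a minimal illustration, not an Oracle tool: it assumes the package output has one of the two shapes shown in the examples (`SUNWswasr-2.7-1` from rpm, or a `VERSION:` line from pkginfo), and the function names are hypothetical.

```python
import re

MINIMUM_VERSION = (4, 5)  # minimum ASR Manager version stated in this chapter


def parse_asr_version(text):
    """Return the ASR Manager version as a tuple of ints, or None if absent.

    Assumes one of the two output shapes shown in the examples:
      rpm:     SUNWswasr-2.7-1
      pkginfo: VERSION: 2.6
    """
    match = re.search(r"SUNWswasr-(\d+(?:\.\d+)*)", text)
    if match is None:
        match = re.search(r"VERSION:\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(text):
    """Return True when the reported version is lower than 4.5."""
    version = parse_asr_version(text)
    if version is None:
        raise ValueError("no ASR Manager version found in output")
    # Tuples compare numerically, so (4, 10) is correctly newer than (4, 5).
    return version < MINIMUM_VERSION


print(needs_upgrade("SUNWswasr-2.7-1"))
```

Comparing integer tuples rather than strings avoids treating a version such as 4.10 as older than 4.5.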

Installing ASR Manager

If ASR Manager is not already installed, then follow the instructions in Setting Up Auto Service Request. After you register ASR Manager with the Oracle ASR back end, return to these instructions for configuring Recovery Appliance.

Installing the Tape Hardware

Oracle Secure Backup tape backup is an option to Recovery Appliance. You must install the QLogic QLE8362 fiber cards and transceivers on site; they are not factory installed.

The QLogic fiber cards are shipped from Oracle as ride-alongs with the rack. The transceivers are shipped directly from the supplier.

To install the tape networking hardware:

  1. Insert a fiber card into PCIe Slot 1 of each compute server.
  2. Verify that the cards are seated properly and align with the adjacent cards.
  3. Install the transceivers in the SAN switch or tape library, and cable them to the fiber cards.

    If the SAN switch and tape library are not installed yet, then contact your supplier.

See Also:

My Oracle Support Doc ID 1592317.1 for full instructions about replacing a PCIe card

Verifying the Network Configuration Prior to Configuring the Rack

Use the checkip.sh script to ensure there are no IP address conflicts between the existing network and your new ZDLRA Rack.

The checkip.sh script performs a pre-installation check to verify that the IP addresses and host names that you specified in Oracle Exadata Deployment Assistant (OEDA) are defined in the DNS, that the NTP servers and gateways are available, and that private addresses are not pingable. Running this script before the hardware arrives helps to avoid additional delays that would be caused by misconfigured network services, such as Domain Name System (DNS) and NTP.

The checkip.sh script is created in a format that matches the operating system of the client on which you ran OEDA. Because this script is run before the engineered system rack has arrived, you typically do not run this script on an engineered system server, but on a client. The client must have access to the same network where the engineered system will be deployed. The script is also available in the ZIP file generated by OEDA.

  1. On the client where OEDA was run, copy the checkip.sh script generated by OEDA and the XML file CustomerName_hostname.xml to the same directory (one directory level up) as the OEDA config.sh script.
  2. Run the checkip.sh script on the client machine or existing server.

    Use a command similar to the following, where configuration_file is the name of the configuration generated by the Oracle Exadata Deployment Assistant for the rack being installed.

    # ./checkip.sh -cf configuration_file
    If the command is run from a Microsoft Windows machine, then the command is checkip.cmd.

    If this engineered system rack is an addition for an existing installation, then run the checkip.sh script from an existing engineered system server. This enables the script to identify in-use IP addresses in the fabric. Not identifying existing IP addresses may cause IP collisions after installation of the new engineered system rack. To create a checkip.sh that can run on an existing server, you must run OEDA on a server or client that uses the same operating system as the existing engineered system server. OEDA supports IPv6 addresses.

    The output from the script is a file that contains status messages such as GOOD or ERROR.

If there are conflicts that you are unable to resolve, then work with your assigned Oracle representative to correct the problems.
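
A long output file is easier to review if the ERROR messages are pulled out first. The following Python sketch shows one way to do that. The exact layout of the checkip.sh output file is not described in this chapter, so the sketch assumes one status message per line containing the word GOOD or ERROR; the function name and the sample lines are hypothetical.

```python
def summarize_checkip(lines):
    """Count GOOD and ERROR status lines and collect the ERROR lines.

    Assumption: each status message occupies one line and contains
    the word GOOD or ERROR somewhere in it.
    """
    summary = {"GOOD": 0, "ERROR": 0, "errors": []}
    for line in lines:
        text = line.strip()
        if "ERROR" in text:
            summary["ERROR"] += 1
            summary["errors"].append(text)
        elif "GOOD" in text:
            summary["GOOD"] += 1
    return summary


# Hypothetical sample lines, for illustration only.
sample = [
    "GOOD : first hypothetical check",
    "ERROR : second hypothetical check",
]
result = summarize_checkip(sample)
print(result["GOOD"], result["ERROR"])
```

To process a real file, pass an open file object, for example `summarize_checkip(open(path))`, where `path` is the location of the output file.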

Configuring the RDMA Network Fabric Switch

You must perform an initial configuration of the RDMA Network Fabric switch.

Configuring the InfiniBand Switches

The two Sun Datacenter InfiniBand Switch 36 leaf switches are identified in Recovery Appliance as iba and ibb. Complete these configuration procedures for both switches:

Configuring an InfiniBand Switch

The default identifier for leaf switch 1 in U20 is iba, and for leaf switch 2 in U22 is ibb.

To configure a Sun Datacenter InfiniBand Switch 36 switch:

  1. For a one-rack installation, unplug the InfiniBand cable from Port 8B on the InfiniBand leaf switches. Use hook-and-loop tape to hang it out of the way.

    This cable is preinstalled for a connection to the spine switch in a multirack installation. However, in a one-rack installation, the monitoring software might show it as a down link.

  2. Connect a serial cable between your laptop and the InfiniBand switch USB serial adapter. Use these terminal settings, as needed:
    TERM=vt100; export TERM
    
  3. Log in to Oracle ILOM:
    localhost: ilom-admin
    password: welcome1
    
  4. Ensure that the firmware version is 2.1.5-1 or later:
    -> version
    

    If the switch has a lower version than 2.1.5-1, then contact Oracle Support Services.

  5. Set the switch host name, without the domain name. The following example assigns the name ra1sw-iba to the first leaf switch (iba):
    -> set /SP hostname=ra1sw-iba
    -> show /SP hostname
    /SP
    Properties:
    hostname = ra1sw-iba
    

    See the Installation Template for the name of the switch.

  6. Set the DNS server and domain names. In the following syntax, IP_addresses can have up to three IP addresses, separated by commas, in the preferred search order.
    -> set /SP/clients/dns auto_dns=enabled
    -> set /SP/clients/dns nameserver=IP_addresses
    -> set /SP/clients/dns searchpath=domain_name
    
  7. Verify the settings:
    -> show /SP/clients/dns
    /SP/clients/dns
    Targets:
    Properties:
    auto_dns = enabled
    nameserver = 10.196.23.245, 138.2.202.15
    retries = 1
    searchpath = example.com
    timeout = 5
         .
         .
         .
    
  8. Configure the switch management network settings. In the following commands, pending_ip, pending_gw, and pending_nm are IP addresses defined by the network administrator:
    -> cd /SP/network
    -> set pendingipaddress=pending_ip
    -> set pendingipgateway=pending_gw
    -> set pendingipnetmask=pending_nm
    -> set pendingipdiscovery=static
    -> set commitpending=true
    
  9. Verify the settings:
    -> show
    /SP/network
    Targets:
    test
    Properties:
    commitpending = (Cannot show property)
    dhcp_server_ip = none
    ipaddress = 10.196.16.152
    ipdiscovery = static
    ipgateway = 10.196.23.254
    ipnetmask = 255.255.248.0
    macaddress = 00:E0:4B:38:77:7E
    pendingipaddress = 10.196.16.152
    pendingipdiscovery = static
    pendingipgateway = 10.196.23.254
    pendingipnetmask = 255.255.248.0
    state = enabled
         .
         .
         .
    
  10. If any of the values are wrong, repeat the set pendingipparameter command, and then the commitpending=true command.
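
Before committing the pending values in step 8, it can help to confirm that the address, gateway, and netmask are mutually consistent. The following Python sketch uses the standard `ipaddress` module for that check. It is an illustration only; the function name is hypothetical, and the sample values are the ones shown in the verification output above.

```python
import ipaddress


def check_management_network(pending_ip, pending_gw, pending_nm):
    """Return a list of problems found in the pending network values."""
    problems = []
    # strict=False derives the network from a host address plus netmask.
    network = ipaddress.ip_network(f"{pending_ip}/{pending_nm}", strict=False)
    ip = ipaddress.ip_address(pending_ip)
    gateway = ipaddress.ip_address(pending_gw)
    if gateway not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    if ip == gateway:
        problems.append("address and gateway are identical")
    if ip in (network.network_address, network.broadcast_address):
        problems.append(f"{ip} is the network or broadcast address of {network}")
    return problems


# Values taken from the sample output in step 9.
print(check_management_network("10.196.16.152", "10.196.23.254", "255.255.248.0"))
```

An empty list means that no inconsistency was found. A malformed address or netmask raises `ValueError`.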

Setting the Time on an InfiniBand Switch

To set the time on an InfiniBand switch:

  1. Set the time zone, using the value shown in the Installation Template. The following commands display the current setting, change the time zone, and verify the new setting:
    -> show /SP/clock 
    -> set /SP/clock timezone=zone identifier
    -> show /SP/clock
    

    The Oracle Exadata Deployment Assistant generates the Installation Template. See Using Oracle Exadata Deployment Assistant.

  2. Set the SP clock to the current time. Use the time format MMddHHmmCCyy, indicating the month, day, hour, minute, century, and year. The following commands display the current setting, change the time, and verify the new setting:
    -> show /SP/clock
    -> set /SP/clock datetime=MMddHHmmCCyy
    -> show /SP/clock
    
  3. Configure NTP. The following commands configure both the primary (1) and the secondary (2) NTP servers:
    -> set /SP/clients/ntp/server/1 address=IP_address
    -> set /SP/clients/ntp/server/2 address=IP_address
    -> set /SP/clock usentpserver=enabled
    

    Note:

    If the network does not use NTP, then configure the first compute server (U16) as an NTP server before you install the software in Installing the Recovery Appliance Software.

  4. Verify the IP address of the primary NTP server:
    -> show /SP/clients/ntp/server/1
    /SP/clients/ntp/server/1
       Targets:
    
       Properties:
          address = 10.204.74.2
    
       Commands:
          cd
          set
          show
    
  5. Verify the IP address of the secondary NTP server:
    -> show /SP/clients/ntp/server/2
    /SP/clients/ntp/server/2
       Targets:
    
       Properties:
          address = 10.196.16.1
         .
         .
         .
    
  6. Verify the time:
    -> show /SP/clock
    /SP/clock
       Targets:
    
       Properties:
          datetime = Mon Nov 04 11:53:19 2013
          timezone = EST (US/Eastern)
          usentpserver = enabled
         .
         .
         .
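
The MMddHHmmCCyy value used in step 2 is easy to mistype. As a minimal sketch, the following Python code builds it from a `datetime` object. The helper names are hypothetical; only the property path and the time format come from the procedure above.

```python
from datetime import datetime


def ilom_datetime(moment):
    """Format a datetime as MMddHHmmCCyy (month, day, hour, minute, 4-digit year)."""
    return moment.strftime("%m%d%H%M%Y")


def set_clock_command(moment):
    """Build the ILOM command line that sets the SP clock to the given time."""
    return f"set /SP/clock datetime={ilom_datetime(moment)}"


# The date and time shown in the sample output of step 6.
print(set_clock_command(datetime(2013, 11, 4, 11, 53)))
```

Seconds are not part of the format, so they are dropped from the value.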

Setting the Serial Number on a Spine Switch

In a multirack configuration, set the rack master serial number in the ILOM of the spine switch. Skip this procedure when configuring the leaf switches.

To set the serial number on the spine switch:

  1. Set the system identifier to 40 characters or fewer:
    -> set /SP system_identifier="Oracle ZDLRA X5 serial_number"
    

    An invalid property value error indicates too many characters.

  2. Verify that the value is set:
    -> show /SP system_identifier
         /SP
           Properties:
             system_identifier = Oracle ZDLRA X5 AK012345678
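
The 40-character limit in step 1 can be checked before the command is typed. The following Python sketch builds the command and rejects an identifier that is too long. The function name and its parameters are hypothetical; the identifier layout follows the example above.

```python
MAX_IDENTIFIER_LENGTH = 40  # limit stated in step 1


def system_identifier_command(model, serial_number):
    """Build the ILOM command that sets the spine switch system identifier."""
    identifier = f"Oracle ZDLRA {model} {serial_number}"
    if len(identifier) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(
            f"identifier has {len(identifier)} characters; "
            f"the limit is {MAX_IDENTIFIER_LENGTH}"
        )
    return f'set /SP system_identifier="{identifier}"'


# Serial number from the sample output above.
print(system_identifier_command("X5", "AK012345678"))
```

Checking the length up front avoids the invalid property value error that ILOM reports for an identifier that is too long.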

Checking the Health of an InfiniBand Switch

To check the health of an InfiniBand switch:

  1. Open the fabric management shell:

    -> show /SYS/Fabric_Mgmt
    NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
    User can execute switch diagnosis, SM Configuration and IB
    monitoring commands in the shell. To view the list of commands,
    use "help" at rsh prompt.
    Use exit command at rsh prompt to revert back to
    ILOM shell.
    FabMan@hostname->
    

    The prompt changes from -> to FabMan@hostname->

  2. Check the general health of the switch:

    FabMan@ra1sw-iba-> showunhealthy
    OK - No unhealthy sensors
    
  3. Check the general environment.

    FabMan@ra1sw-iba-> env_test
    NM2 Environment test started:
    Starting Voltage test:
    Voltage ECB OK
    Measured 3.3V Main = 3.28 V
    Measured 3.3V Standby = 3.42 V
    Measured 12V = 12.06 V
         .
         .
         .
    

    The report should show that fans 1, 2, and 3 are present, and fans 0 and 4 are not present. All OK and Passed results indicate that the environment is normal.

  4. Determine the current InfiniBand subnet manager priority of the switch. Leaf switches must have an smpriority of 5, and spine switches must have an smpriority of 8. The sample output shown here indicates the correct priority for a leaf switch.

    FabMan@ra1sw-iba-> setsmpriority list
    Current SM settings:
    smpriority 5
    controlled_handover TRUE
    subnet_prefix 0xfe80000000000000
    
  5. If the priority setting is incorrect, then reset it:

    1. Disable the subnet manager:

      FabMan@ra1sw-iba->disablesm
      Stopping partitiond daemon.             [ OK ]
      Stopping IB Subnet Manager..            [ OK ]
      
    2. Reset the priority. This example sets the priority on a leaf switch:

      FabMan@ra1sw-iba->setsmpriority 5
      Current SM settings:
      smpriority 5
      controlled_handover TRUE
      subnet_prefix 0xfe80000000000000
      
    3. Restart the subnet manager:

      FabMan@ra1sw-iba->enablesm
      Starting IB Subnet Manager.             [ OK ]
      Starting partitiond daemon.             [ OK ]
      
  6. Log out of the Fabric Management shell and the Oracle ILOM shell:

    FabMan@ra1sw-iba-> exit
    -> exit
    
  7. Log in to Linux as root and restart the switch:

    localhost: root
    password: welcome1
    [root@localhost ~]# reboot
    
  8. Disconnect your laptop from the InfiniBand switch.

  9. Repeat these procedures for the second InfiniBand leaf switch.
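
When session logs are captured, the priority check in step 4 can be automated. The following Python sketch parses the output of `setsmpriority list` and compares it with the value required for the switch role. It assumes the output has the "name value" layout shown in the sample; the function names are hypothetical.

```python
EXPECTED_PRIORITY = {"leaf": 5, "spine": 8}  # values required by step 4

SAMPLE_OUTPUT = """Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
"""


def parse_sm_settings(output):
    """Collect 'name value' pairs; lines with another shape are ignored."""
    settings = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def priority_is_correct(output, role):
    """Return True when smpriority matches the value required for the role."""
    settings = parse_sm_settings(output)
    return int(settings["smpriority"]) == EXPECTED_PRIORITY[role]


print(priority_is_correct(SAMPLE_OUTPUT, "leaf"))
```

A `False` result corresponds to the case in step 5, where the priority must be reset.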

Setting a Spine Switch as the Subnet Manager Master

The InfiniBand switch located in rack unit 1 (U1) is the spine switch. Recovery Appliance has a spine switch only when it is connected to another Recovery Appliance. It is not included as a basic component of the rack.

Perform these steps after the racks are cabled together.

The spine switch is the Subnet Manager Master for the InfiniBand subnet. The Subnet Manager Master has priority 8.

To verify the priority setting of the spine switch:

  1. Log in to the spine switch as the root user.

  2. Run the setsmpriority list command.

    The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:

    1. Use the disablesm command to stop the Subnet Manager.

    2. Use the setsmpriority 8 command to set the priority to 8.

    3. Use the enablesm command to restart the Subnet Manager.

The other two InfiniBand switches are the leaf switches. The leaf switches are located in rack units 20 and 22 (U20 and U22). They are the Standby Subnet Managers with a priority of 5. You can verify the status using the preceding procedure, substituting a value of 5 in the setsmpriority command shown in step 2.

To determine the Subnet Manager Master:

  1. Log in as the root user on any InfiniBand switch.

  2. Display the location of the Subnet Manager Master.

    # getmaster
    20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36
    QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0
    

    The preceding output shows the proper configuration. The Subnet Manager Master is running on spine switch ra01sw-ib1.example.com.
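
The host name of the current master can be pulled out of the `getmaster` output with a short script. The following Python sketch assumes the output layout shown in the sample above, where the host name sits between the words "switch" and "enhanced port"; the function name is hypothetical.

```python
import re

SAMPLE_OUTPUT = (
    "20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 "
    "ports 36 Sun DCS 36\n"
    "QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0\n"
)


def master_switch_name(getmaster_output):
    """Return the host name of the Subnet Manager Master, or None."""
    # Collapse line breaks so that wrapped output is treated as one line.
    text = " ".join(getmaster_output.split())
    match = re.search(r"switch (\S+) enhanced port", text)
    return match.group(1) if match else None


print(master_switch_name(SAMPLE_OUTPUT))
```

Comparing the returned name with the host name of the spine switch shows whether the master needs to be relocated.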

If the spine switch is not the Subnet Manager Master, then reset the Subnet Manager Master:

  1. Use the getmaster command to identify the current location of the Subnet Manager Master.

  2. Log in as the root user on the leaf switch that is the Subnet Manager Master.

  3. Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.

    See Also:

    "Disable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at

    http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61813698.html#z40003f12047367

  4. Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.

  5. Enable Subnet Manager on the leaf switches that were disabled during this procedure.

    See Also:

    "Enable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at

    http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61707660.html#z40003f12047359

Note:

If the InfiniBand network consists of four or more racks cabled together, then only the spine switches run Subnet Manager. Disable the Subnet Manager on the leaf switches.

Configuring the Cisco Nexus 9336C-FX2 Switch

The RoCE Network Fabric switch supplied with the engineered system rack is minimally configured during installation.

During initial system configuration, you can reset and configure the switch.

  1. Connect from the RoCE Network Fabric switch serial console to a laptop or similar device using the available RJ45 cable.
  2. Ensure the terminal session is recorded on the laptop by logging the output.
    The output can be used as a reference that the switch has been configured correctly.
  3. Power on the switch.
  4. Log in as the admin user.
    User Access Verification
    dbm0sw-rocea0 login: admin
    Password: ********
    

    Note:

    If you do not have the password for the admin user, then contact Oracle Support Services.
  5. Erase the existing configuration.
    dbm0sw-rocea0# write erase
    
    Warning: This command will erase the startup-configuration.
    
    Do you wish to proceed anyway? (y/n)  [n] y
  6. Restart the system so you can perform the automated setup.
    dbm0sw-rocea0# reload
    
    This command will reboot the system. (y/n)?  [n] y
    
    2017 Aug 31 01:09:00 dbm0sw-rocea0 %$ VDC-1 %$ %PLATFORM-2-PFM_SYSTEM_RESET: Manual system restart from Command Line Interface
    
    
    CISCO SWITCH Ver7.59
    Device detected on 0:1:2 after 0 msecs  
    ...
  7. Switch to normal setup and, when asked if you want to enforce secure password standard, enter no, then enter a new password for the admin user.
    Running S93thirdparty-script...
    
    Populating conf files for hybrid sysmgr ...
    Starting hybrid sysmgr ...
    inserting /isan/lib/modules/klm_cisco_nb.o ... done
    
    Abort Auto Provisioning and continue with normal setup ? (yes/no) [n]: yes
    
             ---- System Admin Account Setup ----
    
    Do you want to enforce secure password standard (yes/no) [y]: no
    
      Enter the password for "admin": 
      Confirm the password for "admin": 
    
  8. When the Basic System Configuration Dialog appears, choose to enter the basic configuration dialog.
            ---- Basic System Configuration Dialog VDC: 1 ----
    
    This setup utility will guide you through the basic configuration of
    the system. Setup configures only enough connectivity for management
    of the system.
    
    Please register Cisco Nexus9000 Family devices promptly with your
    supplier. Failure to register may affect response times for initial
    service calls. Nexus9000 devices must be registered to receive 
    entitled support services.
    
    Press Enter at anytime to skip a dialog. Use ctrl-c at anytime
    to skip the remaining dialogs.
    
    Would you like to enter the basic configuration dialog (yes/no): yes
    
  9. In the basic configuration, you can use the default inputs until asked to enter the switch name.

    In this example, the switch has a name of test123sw-rocea0.

      Create another login account (yes/no) [n]: 
      Configure read-only SNMP community string (yes/no) [n]: 
      Configure read-write SNMP community string (yes/no) [n]: 
      Enter the switch name : test123sw-rocea0
    
    
  10. Respond yes when asked to continue with out-of-band (mgmt0) management configuration, and specify appropriate network addresses when prompted.
    Continue with Out-of-band (mgmt0) management configuration? (yes/no) [y]: yes
         Mgmt0 IPv4 address : 100.104.10.21
         Mgmt0 IPv4 netmask : 255.255.248.0
      Configure the default gateway? (yes/no) [y]:
         IPv4 address of the default gateway : 100.104.10.1
  11. Respond yes when asked to configure advanced IP options.
    Configure advanced IP options? (yes/no) [n]: yes
  12. Respond yes when asked to configure static route (this can be changed later).
    Configure static route? (yes/no) [n]: yes
  13. Enter the destination prefix and mask, and other values as prompted.
       Destination prefix : 10.100.100.0
    
       Destination prefix mask : 255.255.255.0
    
       Next hop IPv4 address : 10.100.100.1
    
  14. Configure the DNS IPv4 addresses.
    Configure the DNS IPv4 address? (yes/no) [n]: yes
       DNS IP address: 10.100.100.2
  15. Skip configuring the default domain name (this will be configured later).
    Configure the default domain name? (yes/no) [n]: no
    
  16. Accept the default responses until asked to configure SSH and the NTP server.
    Enable the telnet service? (yes/no) [n]: no
    Enable the ssh service? (yes/no) [y]: yes
       Type of ssh key you would like to generate (dsa/rsa) [rsa]: rsa
       Number of rsa key bits <1024-2048> [1024]: 1024
     
    Configure the NTP server? (yes/no) [n]: yes
         NTP server IPv4 address : 10.100.100.3
  17. Accept the default responses until asked to specify the CoPP system profile. Enter strict.
     Configure default interface layer (L3/L2) [L2]: 
     Configure default switchport interface state (shut/noshut) [noshut]: 
     Configure CoPP system profile (strict/moderate/lenient/dense) [strict]: strict
  18. After reviewing the configuration, save the configuration.
    The following configuration will be applied:
       no password strength-check
       switchname test123sw-rocea0
       ip route 100.104.8.0 255.255.248.0 100.104.10.1
       vrf context management
       ip route 0.0.0.0/0 100.104.10.1
       exit
        no feature telnet
        ssh key rsa 1024 force
        feature ssh
        ntp server 100.104.10.1
        system default switchport
        no system default switchport shutdown
        copp profile strict
       interface mgmt0
       ip address 100.104.10.21 255.255.248.0
       no shutdown
    
    Would you like to edit the configuration? (yes/no) [n]: 
    
    Use this configuration and save it? (yes/no) [y]: yes
    
    [########################################] 100%
    Copy complete.
  19. Enable the scp server feature on the switch.
    test123sw-rocea0# feature scp-server
  20. Save the running configuration to flash.
    test123sw-rocea0# copy running-config startup-config
    [########################################] 100%
    Copy complete.
    
  21. Apply the golden configuration on the switch.
    1. Delete the configuration file on the switch for the target configuration.

      Note:

      If you do not remove the file you are replacing, then when you attempt to overwrite the file you will get a 'permission denied' error.

      Log in to the switch, enter configuration mode, then run a command similar to the following:

      test123sw-rocea0# delete bootflash:roce_leaf_switch.cfg
      Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
      test123sw-rocea0# 
    2. Log in to a server that has SSH access to the switch, and contains the latest RDMA Network Fabric patch ZIP file.

      To find the available RDMA Network Fabric patches, search for 'RDMA network switch' in My Oracle Support document 888828.1. Download and use the latest patch for your Oracle Exadata System Software release.

    3. Unzip the RDMA Network Fabric patch ZIP file and change directories to the location of the patchmgr utility.
    4. Locate the golden configuration files in the RDMA Network Fabric patch bundle.

      The files are located within the roce_switch_templates directory.

      The golden configuration files are as follows:

      • Single rack leaf: roce_leaf_switch.cfg
      • Multi-rack leaf: roce_leaf_switch_multi.cfg
      • Multi-rack spine: roce_spine_switch_multi.cfg
      • Single rack leaf with Secure Fabric support: roce_sf_leaf_switch.cfg
      • Multi-rack leaf with Secure Fabric support: roce_sf_leaf_switch_multi.cfg
      • Single rack leaf configured with 23 host ports: roce_leaf_switch_23hosts.cfg
      • Multi-rack leaf configured with 23 host ports: roce_leaf_switch_23hosts_multi.cfg
      • Multi-rack leaf configured with 14 inter-switch links: roce_leaf_switch_14uplinks_multi.cfg
      • Multi-rack leaf configured with 14 inter-switch links and with Secure Fabric support: roce_sf_leaf_switch_14uplinks_multi.cfg
      • Multi-rack leaf configured with 23 host ports and 13 inter-switch links: roce_leaf_switch_23hosts_13uplinks_multi.cfg
    5. Copy the golden configuration file to the switch.

      In the following example, 100.104.10.21 represents the IP address of the switch you are configuring.

      # scp roce_leaf_switch.cfg admin@100.104.10.21:/
      User Access Verification
      Password:
      roce_leaf_switch.cfg 100% 23KB 23.5KB/s 00:00
    6. Apply the golden configuration file on the switch.
      Use the run-script command while connected directly to the switch.
      test123sw-rocea0# run-script bootflash:roce_leaf_switch.cfg | grep 'none'

      Note:

      This command can take 1-2 minutes on a single-rack switch and 3-4 minutes on a multi-rack switch.
    7. Verify the switch configuration.
      Use the patchmgr utility on the server that has SSH access to the switch, and contains the latest RDMA Network Fabric patch bundle.

      In the following command, roceswitch.lst is a file that contains the switch host name or IP address.

      # ./patchmgr --roceswitches roceswitch.lst --verify-config
  22. Back up the switch configuration.

    Follow the steps in Backing Up Settings on the ROCE Switch, in Oracle Exadata Database Machine Maintenance Guide.

  23. Optional: Set the clock, using the same procedure as in Setting the Clock on the Cisco 93108-1G or 9348 Ethernet Switch.
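
The choice of golden configuration file in step 21 can be expressed as a lookup table, which makes it harder to copy the wrong file to a switch. The following Python sketch is an illustration only. The file names come from the list in step 21; the role names, option labels, and function name are hypothetical.

```python
# Keys: (role, multi_rack, options). File names are taken from step 21;
# the option labels are hypothetical and follow the file names.
GOLDEN_CONFIGS = {
    ("leaf", False, frozenset()): "roce_leaf_switch.cfg",
    ("leaf", True, frozenset()): "roce_leaf_switch_multi.cfg",
    ("spine", True, frozenset()): "roce_spine_switch_multi.cfg",
    ("leaf", False, frozenset({"secure_fabric"})): "roce_sf_leaf_switch.cfg",
    ("leaf", True, frozenset({"secure_fabric"})): "roce_sf_leaf_switch_multi.cfg",
    ("leaf", False, frozenset({"23hosts"})): "roce_leaf_switch_23hosts.cfg",
    ("leaf", True, frozenset({"23hosts"})): "roce_leaf_switch_23hosts_multi.cfg",
    ("leaf", True, frozenset({"14uplinks"})): "roce_leaf_switch_14uplinks_multi.cfg",
    ("leaf", True, frozenset({"secure_fabric", "14uplinks"})):
        "roce_sf_leaf_switch_14uplinks_multi.cfg",
    ("leaf", True, frozenset({"23hosts", "13uplinks"})):
        "roce_leaf_switch_23hosts_13uplinks_multi.cfg",
}


def golden_config(role, multi_rack, options=()):
    """Return the golden configuration file name for a switch."""
    key = (role, multi_rack, frozenset(options))
    try:
        return GOLDEN_CONFIGS[key]
    except KeyError:
        raise ValueError(f"no golden configuration listed for {key}") from None


print(golden_config("leaf", False))
```

A combination that is not in the list, such as a single-rack spine, raises `ValueError` instead of returning a guess.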

Configuring the Cisco Ethernet Switch

The Cisco Catalyst 4948 Ethernet switch supplied with Recovery Appliance has IPBASEK9-MZ firmware. The switch is minimally configured during installation. These procedures configure the Cisco Ethernet switch into one large virtual LAN.

Configuring the Cisco Catalyst 4948 Ethernet Switch

The Cisco Catalyst 4948 Ethernet switch supplied with ZDLRA Rack is minimally configured during installation.

The minimal configuration disables IP routing, and sets the following:

  • Host name
  • IP address setup
  • Subnet mask
  • Default gateway
  • Domain name
  • Name server
  • NTP server
  • Time
  • Time zone

Before configuring the switch, note the following:

  • The Cisco Ethernet switch should not be connected until the running configuration has been verified, and any necessary changes have been made by the network administrator.

  • The Cisco Ethernet switch should not be connected to the customer network until the IP addresses on all components have been configured in ZDLRA Rack. This is to prevent any duplicate IP address conflicts which are possible due to the default addresses set in the components when shipped.

Note that the Cisco 4948E-F switch supports multiple uplinks to the customer network by utilizing ports 49 - 52. This is a more complicated switch setup due to the redundant connectivity, and should be performed by the customer's network administrator.

The following procedure describes how to configure the Cisco Ethernet switch. Configuration should be done with the network administrator.

  1. Connect a serial cable from the Cisco switch console to a laptop or similar device. An Oracle supplied rollover cable is pre-installed on the Cisco serial console port. Obtain the appropriate adapter and connect it at the end of the rollover cable. An Oracle P/N 530-3100 RJ45-DB9 adapter as used on ILOM ports will also work, connected at the end of the network cable.

  2. Ensure the terminal session is recorded on the laptop by logging the output. The output can be used as a reference that the switch has been configured correctly. The default serial port speed is 9600 baud, 8 bits, no parity, 1 stop bit, and no handshake.

    Switch con0 is now available
    Press RETURN to get started.
    
  3. Change to the enable mode.

    Switch> enable
    Password: ******
    Switch# 

    Note:

    If you do not have the password, then contact Oracle Support Services.
  4. Check the current version on the switch.

    Switch# show version 
    Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500e-
    IPBASEK9-M), Version 15.2(3)E2, RELEASE SOFTWARE (fc1)
    Technical Support: http://www.cisco.com/techsupport
    Copyright (c) 1986-2014 by Cisco Systems, Inc.
    Compiled Tue 11-Mar-14 18:28 by prod_rel_team
    
    ROM: 12.2(44r)SG12
    zdlra1sw-ip uptime is 1 minute
    System returned to ROM by reload
    System image file is "bootflash:cat4500e-ipbasek9-mz.152-3.E2.bin"
    Hobgoblin Revision 22, Fortooine Revision 1.40
    ...
    
    Configuration register is 0x2102
    
    Switch#

    The version of the Cisco 4948E-F switch firmware purchased and shipped by Oracle with Recovery Appliance X6 is IPBASEK9-MZ, which includes telnet and ssh support. Currently the full release version string is cat4500e-ipbasek9-mz.152-3.E2.bin.

  5. Configure the network for a single VLAN. The following example assumes you are using IPv4 addressing.

    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# interface vlan 1
    Switch(config-if)# ip address 10.7.7.34 255.255.255.0
    Switch(config-if)# end
    Switch# *Sep 15 14:12:06.309:%SYS-5-CONFIG_I:Configured from console by console
    Switch# write memory
    Building configuration...
    Compressed configuration from 2474 bytes to 1066 bytes [OK ]
    
  6. If IP routing is required on the switch, then leave the IP routing setting as the default, and configure the default gateway. Replace 10.7.7.1 with the IP address of the gateway for the installation:

    Switch#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)#ip route 0.0.0.0 0.0.0.0 10.7.7.1
    Switch(config)#end
    *Sep 15 14:13:26.013:%SYS-5-CONFIG_I:Configured from console by console
    Switch#write memory
    Building configuration...
    Compressed configuration from 2502 bytes to 1085 bytes [OK ]
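  7. Set the host name of the switch.

    This example sets the name to ra1sw-ip:

    Switch#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)#hostname ra1sw-ip
    ra1sw-ip(config)#end
    ra1sw-ip#write memory

    The system host name is used as the prompt name.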

  8. Configure up to three DNS servers. Replace the domain name and IP addresses used in this example with the values for the installation:

    ra1sw-ip#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)#ip domain-name example.com
    ra1sw-ip(config)#ip name-server 10.7.7.3
    ra1sw-ip(config)#ip name-server 198.51.100.5 
    ra1sw-ip(config)#ip name-server 10.8.160.1
    ra1sw-ip(config)#end 
    *Sep 15 14:26:37.045:%SYS-5-CONFIG_I:Configured from console by console
    ra1sw-ip#write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes [OK ]
    
    

    If you do not have DNS service available, you must still set the domain-name so that you can configure the SSH keys.

  9. (Optional) Set the password.

    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# enable password password
    ra1sw-ip(config)# enable secret password 
    ra1sw-ip(config)# end
    ra1sw-ip# write memory 
    *Sep 15 14:25:05.893:%SYS-5-CONFIG_I:Configured from console by console
    Building configuration...
    Compressed configuration from 2502 bytes to 1085 bytes [OK ]
    
  10. Verify telnet access is disabled. Telnet is not secure, and should not be enabled unless there is a compelling reason. To enable telnet, set a password. To disable it, remove the password.

    ra1sw-ip#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)#line vty 0 15
    ra1sw-ip(config-line)#login
    % Login disabled on line 1, until 'password' is set
    % Login disabled on line 2, until 'password' is set
     ...
    % Login disabled on line 16, until 'password' is set
    ra1sw-ip(config-line)#end
    

    If the login command returns output as shown above, then telnet access has been disabled. If instead you get a prompt, then telnet access is not yet disabled so should be disabled now.

    ra1sw-ip(config-line)#no password
    ra1sw-ip(config-line)#end
    ra1sw-ip#write memory 
    Building configuration...
    Compressed configuration from 3786 bytes to 1468 bytes [OK ]
    
  11. Configure secure shell (SSH) on the Ethernet switch:

    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# crypto key generate rsa
    % You already have RSA keys defined named ra1sw-ip.example.com.
    % Do you really want to replace them? [yes/no]: yes
    Choose the size of the key modulus in the range of 360 to 2048 for
    your General Purpose Keys. Choosing a key modulus greater than 512
    may take a few minutes.
    How many bits in the modulus [512]: 768
    
    % Generating 768 bit RSA keys, keys will be non-exportable...[OK]
    ra1sw-ip(config)# username admin password 0 welcome1
    ra1sw-ip(config)# line vty 0 15
    ra1sw-ip(config-line)# transport input ssh
    ra1sw-ip(config-line)# exit
    ra1sw-ip(config)# aaa new-model
    
    ra1sw-ip(config)# ip ssh time-out 60
    ra1sw-ip(config)# ip ssh authentication-retries 3
    ra1sw-ip(config)# ip ssh version 2
    ra1sw-ip(config)# end
    *Sep 15 14:26:37.045: %SYS-5-CONFIG_I: Configured from console by console
    ra1sw-ip# write memory
    Building configuration...
    Compressed configuration from 2603 bytes to 1158 bytes[OK]
  12. Set the clock and time zone. The switch keeps internal time in Coordinated Universal Time (UTC) format.

    • To use UTC, use the following command in global configuration mode:

      no clock timezone
      
    • To use a time zone, use the following command:

      clock timezone zone hours-offset [minutes-offset]
      

      In the preceding command, zone is the time zone to display when standard time is in effect, hours-offset is the hours offset from UTC, and minutes-offset is the minutes offset from UTC.

    • Daylight saving time (or summer time) is disabled by default. To set summer time hours, use the following command:

      clock summer-time zone recurring [week day month hh:mm week day month hh:mm [offset]]
      

      In the preceding command, zone is the time zone to be displayed when summer time is in effect (EDT, for example), week is the week of the month (1 to 5 or last), day is the day of the week (Sunday, Monday, ...), month is the month (January, February, ...), hh:mm is the hours and minutes in 24-hour format, and offset is the number of minutes to add during summer time. The default offset is 60 minutes.

    • To manually set the clock to any time, use the following command, where the time specified is relative to the configured time zone:

      clock set hh:mm:ss month day year
      

      In the preceding command, hh:mm:ss is the time in 24-hour format, month is the name of the month, day is the day by date in the month, and year is the 4-digit year.

    The ordering of commands is important when setting the local time and time zone. For example, to set the local time to US Eastern time:

    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# clock timezone EST -5 
    ra1sw-ip(config)# clock summer-time EDT recurring
    ra1sw-ip(config)# end
    ra1sw-ip# clock set 21:00:00 August 09 2018
    ra1sw-ip# write memory
    Building configuration...
    Compressed configuration from 3784 bytes to 1465 bytes [OK ]
    ra1sw-ip# show clock
    21:00:06.643 EST Mon Aug 9 2018
    
  13. After setting the local time zone, you can configure up to two NTP servers. Replace the IP addresses used in this example with the values for the installation:

    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# ntp server 10.7.7.32 prefer
    ra1sw-ip(config)# ntp server 198.51.100.19
    ra1sw-ip(config)# end
    *Sep 15 14:51:08.665: %SYS-5-CONFIG_I: Configured from console by console
    ra1sw-ip# write memory
    Building configuration...
    Compressed configuration from 2654 bytes to 1163 bytes [OK ]
    ra1sw-ip# show ntp status
    <output will vary per network>
         .
    ra1sw-ip# show clock
    21:00:23.175 EST Mon Aug 9 2018
    

    The switch clock synchronizes with the NTP server after you connect the Cisco switch to the network and it can reach the NTP server.

    A symbol that precedes the show clock display indicates the state of the time:

    • * (asterisk): Not authoritative
    • . (period): Authoritative, but NTP is not synchronized
    • (blank space): Authoritative
  14. Verify the Ethernet configuration using the following command:

    ra1sw-ip# show running-config
    Building configuration...
    Current configuration : 3923 bytes
    !
    version 15.2
    no service pad
    service timestamps debug datetime msec
    service timestamps log datetime msec
    no service password-encryption
    service compress-config
         .
         .
         .
    
    

    Note:

    If any setting is incorrect, then repeat the appropriate step. To erase a setting, enter no in front of the same command. For example, to erase the default gateway, use the following commands:

    ra1sw-ip#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# no ip default-gateway 10.7.7.1
    ra1sw-ip(config)# end
    ra1sw-ip#
    *Sep 15 14:13:26.013: %SYS-5-CONFIG_I: Configured from console by console
    ra1sw-ip# write memory
    Building configuration...
    Compressed configuration from 2502 bytes to 1085 bytes[OK]
    
  15. Save the current configuration.

    ra1sw-ip#copy running-config startup-config
    Destination filename [startup-config]?
    Building configuration...
    Compressed configuration from 2654 bytes to 1189 bytes[OK]
  16. Exit from the session using the following command:

    ra1sw-ip# exit
    
    ra1sw-ip con0 is now available
    
    Press RETURN to get started.
  17. Disconnect the cable from the Cisco console.

    The Cisco switch must not be connected to the management network at this stage. The switch will be connected later after Oracle has configured the systems with the necessary IP addresses and you have worked with the field service engineer to make any additional changes necessary for connecting to the network.

  18. To check the Cisco switch, attach a laptop computer to port 48, and ping the IP address of the internal management network to check the configuration.

    Do not connect the switch to the management network.
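
The symbols that can precede the show clock display (step 13) can also be checked when you review the recorded console log. The following Python sketch is illustrative only and is not part of the switch software. The function name clock_authority is hypothetical, and the sketch assumes that you pass it a single line of show clock output exactly as it was captured.

```python
def clock_authority(show_clock_line: str) -> str:
    """Classify one line of `show clock` output by its leading symbol."""
    # Surrounding whitespace is dropped; a line with no symbol is authoritative.
    line = show_clock_line.strip()
    if line.startswith("*"):
        return "not authoritative"
    if line.startswith("."):
        return "authoritative, NTP not synchronized"
    return "authoritative"


# Sample line taken from the example output in step 13.
print(clock_authority("21:00:23.175 EST Mon Aug 9 2018"))
```

A result other than "authoritative" suggests that the clock was not set manually or that the switch has not yet reached an NTP server.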

Configuring the Cisco Nexus 93108-1G or 9348 Ethernet Switch

The Cisco Nexus 93108-1G or 9348 Ethernet switch supplied with ZDLRA Rack is minimally configured during installation.

Note that the Cisco Nexus 93108-1G or 9348 switch supports multiple uplinks to the customer network by utilizing the QSFP+ ports. This is a more complicated switch setup due to the redundant connectivity, and should be performed by the customer's network administrator.

Whether you are configuring the switch for the first time, or configuring a replacement switch, use the following procedures:

Performing the Initial Switch Configuration for the Cisco Nexus 93108-1G or 9348 Ethernet Switch

During the initial configuration, you reset the switch and use the Basic System Configuration Dialog to configure the switch.

Before configuring the switch, note the following:

  • The Cisco Ethernet switch should not be connected until the running configuration has been verified, and any necessary changes have been made by the network administrator.

  • The Cisco Ethernet switch should not be connected to the customer network until the IP addresses on all components have been configured in ZDLRA Rack. This is to prevent any duplicate IP address conflicts which are possible due to the default addresses set in the components when shipped.

Configuration should be done with the network administrator.

  1. Connect from the Cisco switch serial console to a laptop or similar device using the available RJ45 cable.
  2. Ensure the terminal session is recorded on the laptop by logging the output.
    The output can be used as a reference that the switch has been configured correctly. The default serial port speed is 9600 baud, 8 bits, no parity, 1 stop bit, and no handshake.
  3. Power on the switch.
  4. Log in as the admin user.
    User Access Verification
    exadatax7-adm0 login: admin
    Password: ********
    

    Note:

    If you do not have the password for the admin user, then contact Oracle Support Services.
  5. Erase the existing configuration.
    exadatax7-adm0# write erase
    
    Warning: This command will erase the startup-configuration.
    
    Do you wish to proceed anyway? (y/n)  [n] y
  6. Restart the system so you can perform the automated setup.
    exadatax7-adm0# reload
    
    This command will reboot the system. (y/n)?  [n] y
    
    2017 Aug 31 01:09:00 exadatax7-adm0 %$ VDC-1 %$ %PLATFORM-2-PFM_SYSTEM_RESET: Manual system restart from Command Line Interface
    
    
    CISCO SWITCH Ver7.59
    Device detected on 0:1:2 after 0 msecs  
    ...
  7. Switch to normal setup and, when asked if you want to enforce secure password standard, enter no, then enter a new password for the admin user.
    Running S93thirdparty-script...
    
    Populating conf files for hybrid sysmgr ...
    Starting hybrid sysmgr ...
    inserting /isan/lib/modules/klm_cisco_nb.o ... done
    
    Abort Auto Provisioning and continue with normal setup ? (yes/no) [n]: yes
    
             ---- System Admin Account Setup ----
    
    Do you want to enforce secure password standard (yes/no) [y]: no
    
      Enter the password for "admin": 
      Confirm the password for "admin": 
    
  8. When the Basic System Configuration Dialog appears, choose to enter the basic configuration dialog.
            ---- Basic System Configuration Dialog VDC: 1 ----
    
    This setup utility will guide you through the basic configuration of
    the system. Setup configures only enough connectivity for management
    of the system.
    
    Please register Cisco Nexus9000 Family devices promptly with your
    supplier. Failure to register may affect response times for initial
    service calls. Nexus9000 devices must be registered to receive 
    entitled support services.
    
    Press Enter at anytime to skip a dialog. Use ctrl-c at anytime
    to skip the remaining dialogs.
    
    Would you like to enter the basic configuration dialog (yes/no): yes
    
  9. In the basic configuration, you can use the default inputs until asked to enter the switch name.

    In this example, the switch has a name of test123sw-adm0.

      Create another login account (yes/no) [n]: 
      Configure read-only SNMP community string (yes/no) [n]: 
      Configure read-write SNMP community string (yes/no) [n]: 
      Enter the switch name : test123sw-adm0
    
    
  10. Respond no when asked to configure Out-of-band management configuration.
    Continue with Out-of-band (mgmt0) management configuration? (yes/no) [y]: no
  11. Respond yes when asked to configure advanced IP options.
    Configure advanced IP options? (yes/no) [n]: yes
  12. Respond no when asked to configure static route (this will be configured later).
    Configure static route? (yes/no) [n]: no
  13. Enter the destination prefix and mask, and other values as prompted.
       Destination prefix : 10.100.100.0
    
       Destination prefix mask : 255.255.255.0
    
       Next hop IPv4 address : 10.100.100.1
    
  14. Skip configuring the DNS IPv4 addresses (this will be configured later).
    Configure the DNS IPv4 address? (yes/no) [n]: no
    
  15. Skip configuring the default domain name (this will be configured later).
    Configure the default domain name? (yes/no) [n]: no
    
  16. Accept the default responses until asked to configure SSH and the NTP server.
    Enable the telnet service? (yes/no) [n]: no
    Enable the ssh service? (yes/no) [y]: yes
       Type of ssh key you would like to generate (dsa/rsa) [rsa]: rsa
       Number of rsa key bits <1024-2048> [1024]: 1024
     
    Configure the ntp server? (yes/no) [n]: yes
         NTP server IPv4 address : 10.100.100.3
  17. Accept the default responses until asked to specify the CoPP system profile. Enter lenient.
     Configure default interface layer (L3/L2) [L2]: 
     Configure default switchport interface state (shut/noshut) [noshut]: 
     Configure CoPP system profile (strict/moderate/lenient/dense) [strict]: lenient
  18. After reviewing the configuration, save the configuration.
    The following configuration will be applied:
       no password strength-check
       switchname test123sw-adm0
      ...
    
    Would you like to edit the configuration? (yes/no) [n]: 
    
    Use this configuration and save it? (yes/no) [y]: yes
    
    [########################################] 100%
    Copy complete.
  19. Add the VLAN 1 IP address.
    test123sw-adm0(config)# feature interface-vlan
    test123sw-adm0(config)# interface vlan 1
    test123sw-adm0(config-if)# ip address 10.100.100.110/24
    test123sw-adm0(config-if)# no shutdown
    test123sw-adm0(config-if)# exit
  20. Set the spanning tree port type for ports 1-47.
    test123sw-adm0(config)# interface E1/1-47
    test123sw-adm0(config-if)# spanning-tree port type edge
    test123sw-adm0(config-if)# exit
  21. Set switchport on all 48 ports, set port 48 to a network port (instead of a host port), and add the default route.
    test123sw-adm0(config)# interface E1/1-48
    test123sw-adm0(config-if)# switchport
    test123sw-adm0(config-if)# exit
    test123sw-adm0(config)# interface E1/48
    test123sw-adm0(config-if)# spanning-tree port type network
    test123sw-adm0(config-if)# ip route 0.0.0.0/0 10.100.100.1
  22. Configure the DNS information.
    test123sw-adm0(config)# ip domain-name example.com
    test123sw-adm0(config)# ip name-server 10.100.100.2
    test123sw-adm0(config)# exit
  23. Save the current configuration.
    test123sw-adm0# copy running-config startup-config
    [########################################] 100%
    Copy complete.
    
  24. Optional: Set the clock, as described in the next topic.
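
Before you enter the VLAN 1 address and the default route, it can help to confirm that the next hop is on the same subnet as the VLAN 1 interface. The following Python sketch shows one way to do that on the laptop, using only the standard ipaddress module. The function name next_hop_on_subnet is hypothetical, and the addresses are the example values from steps 19 and 21, not values for your installation.

```python
import ipaddress


def next_hop_on_subnet(interface_cidr: str, next_hop: str) -> bool:
    """Return True if next_hop lies in the subnet of the interface address."""
    # ip_interface accepts an address with a prefix length, such as 10.100.100.110/24.
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(next_hop) in network


# Example values from this procedure (VLAN 1 address and default route next hop).
print(next_hop_on_subnet("10.100.100.110/24", "10.100.100.1"))
```

If the function returns False, the switch cannot reach the next hop directly from VLAN 1, so recheck the address and prefix length with the network administrator.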

Setting the Clock on the Cisco 93108-1G or 9348 Ethernet Switch

After you have performed the initial configuration, you can adjust the time used by the switch.

  1. Log in as the admin user.
  2. View the current time.
    test123sw-adm0(config)# show clock
    20:44:52.986 UTC Thu Aug 31 2017
    Time source is NTP
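  3. Set the time zone appropriately. In this example, PST is the zone name, -8 is the hours offset from UTC, and 0 is the minutes offset.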
    test123sw-adm0(config)# clock timezone PST -8 0
    
  4. View the modified time.
    test123sw-adm0(config)# show clock
    12:46:22.692 PST Thu Aug 31 2017
    Time source is NTP
  5. Save the configuration.
    test123sw-adm0# copy running-config startup-config 
    [########################################] 100%
    Copy complete.
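
To predict what show clock should display after you set the time zone, you can apply the same offset to a UTC time on the laptop. The following Python sketch is a minimal illustration. The function name to_switch_local is hypothetical, and the sketch assumes that a minutes offset carries the same sign as the hours offset.

```python
from datetime import datetime, timedelta, timezone


def to_switch_local(utc_time: datetime, hours_offset: int, minutes_offset: int = 0) -> datetime:
    """Convert an aware UTC time to the zone given by a clock timezone offset."""
    # Assumption: the minutes offset takes the sign of the hours offset.
    sign = -1 if hours_offset < 0 else 1
    offset = timedelta(hours=hours_offset, minutes=sign * minutes_offset)
    return utc_time.astimezone(timezone(offset))


# UTC time from step 2 of this procedure, converted with the PST offset (-8 0).
utc = datetime(2017, 8, 31, 20, 44, 52, tzinfo=timezone.utc)
print(to_switch_local(utc, -8, 0).strftime("%H:%M:%S"))  # prints 12:44:52
```

The example output in step 4 shows a slightly later time because the second show clock command was entered about a minute and a half after the first.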

Disabling Spanning Tree on the Ethernet Switch

Spanning tree is enabled by default on Cisco switches. If you add a switch with spanning tree enabled to the network, then you might cause network problems. As a precaution, you can disable spanning tree on the uplink port VLAN before connecting the switch to the network. Alternatively, you can turn on spanning tree protocol with specific protocol settings either before or after connecting to the network.

To disable spanning tree on the uplink port VLAN:

  1. Disable spanning tree on the uplink port VLAN:
    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# no spanning-tree vlan 1
    ra1sw-ip(config)# end
    ra1sw-ip# write memory
    Building configuration...
    Compressed configuration from 2654 bytes to 1163 bytes[OK]
    
  2. Verify that spanning tree is disabled:
    ra1sw-ip# show spanning-tree vlan 1
    Spanning tree instance(s) for vlan 1 does not exist.
    

To re-enable spanning tree protocol with the default protocol settings:

  • Use the commands shown in this example:

    ra1sw-ip# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    ra1sw-ip(config)# spanning-tree vlan 1
    ra1sw-ip(config)# end
    ra1sw-ip# write memory

See Also:

Cisco Switch Configuration Guide for information about enabling spanning tree protocol with the specific protocol settings required by the data center Ethernet network

Configuring the Power Distribution Units

The power distribution units (PDUs) are configured with static IP addresses to connect to the network for monitoring.

Assigning Network Addresses to the PDUs

To configure the PDU network addresses:

  1. Use an RS-232 cable to connect your laptop to the SER MGT port of the PDU metering unit.
  2. Configure your laptop's terminal emulator to use these settings:
    • 9600 baud

    • 8 bit

    • 1 stop bit

    • No parity bit

    • No flow control

  3. Log in to the PDU metering unit as the admin user with password welcome1.

    Change this password after configuring the network.

  4. Enter the network settings for the IP address, subnet mask, and default gateway:
    pducli -> set net_ipv4_dhcp=Off
    set OK
    pducli -> set net_ipv4_ipaddr=ip_address
    set OK
    pducli -> set net_ipv4_subnet=subnet_mask
    set OK
    pducli -> set net_ipv4_gateway=default_gateway
    set OK
    
  5. (Optional) Configure the PDU with the DNS server IP addresses:
    pducli -> set net_ipv4_dns1=domain_name_1
    set OK
    pducli -> set net_ipv4_dns2=domain_name_2
    set OK
    
  6. Reset the PDU metering unit:
    pducli -> reset=yes
    set OK
    
  7. Remove the RS-232 cable from the SER MGT port.
  8. Repeat these steps for the second PDU metering unit.
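
Because the same four settings are entered on each PDU metering unit, you might prepare the command lines in advance. The following Python sketch builds the set commands shown in step 4 and rejects a default gateway that is outside the PDU subnet. The function name pdu_network_commands is hypothetical, the addresses are placeholder values from a documentation range, and the sketch only prints text; it does not connect to the PDU.

```python
import ipaddress


def pdu_network_commands(ip_address: str, subnet_mask: str, gateway: str) -> list:
    """Build the pducli set commands for a static IPv4 configuration."""
    # strict=False lets a host address be combined with a dotted netmask.
    network = ipaddress.ip_network(f"{ip_address}/{subnet_mask}", strict=False)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not in {network}")
    return [
        "set net_ipv4_dhcp=Off",
        f"set net_ipv4_ipaddr={ip_address}",
        f"set net_ipv4_subnet={subnet_mask}",
        f"set net_ipv4_gateway={gateway}",
    ]


# Placeholder addresses; replace them with the values for the installation.
for command in pdu_network_commands("192.0.2.10", "255.255.255.0", "192.0.2.1"):
    print(command)
```

Enter the printed lines at the pducli prompt one at a time, and confirm that each one returns set OK.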

Configuring the PDU System Time Settings

To configure the PDU system time settings:

  1. Connect your laptop to the Ethernet switch.
  2. Open a browser and connect to the PDU, using its IP address:
    https://pdu_ip_address
    

    Accept the security note. The Metering Overview page is displayed.

  3. Click Net Configuration in the upper left, and log in as user admin with the password welcome1.
  4. Select the System Time tab.
  5. Configure Manual Settings with the current date and time, and then click Submit.
  6. Configure NTP Server Settings, and then click Submit:
    • Select the Enable option.

    • Enter an NTP server IP address, which is listed on the Installation Template.

    • Select Time Zone from the menu.

  7. Select the PDU Information tab.
  8. Enter these values, and then click Submit:
    • Name: PDU host name, provided in the Installation Template; for example, ra5sw-pdua0

    • Product Identifier (case sensitive): ZDLRA X5

    • Rack Serial Number: Serial number similar to AK12345678

    • Location (optional): Site identifier

  9. On the Metering Overview page, select Module Info.
  10. Confirm that the firmware version is 2.01 or higher. If it is not, then upgrade the firmware after you finish this procedure.
  11. Click Logout to log out of the PDU.
  12. Repeat these steps for the second PDU metering unit.
  13. Disconnect the PDU metering units from the Cisco Ethernet switch, and connect them to the data center management network.
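
The firmware check in step 10 compares the displayed version with 2.01. The following Python sketch shows that comparison done numerically, component by component, instead of as text. The function name firmware_is_current is hypothetical, and the sketch assumes that the Module Info page reports the version as numbers separated by periods.

```python
def firmware_is_current(version: str, minimum: str = "2.01") -> bool:
    """Return True if version is numerically at or above minimum."""
    def parse(text: str) -> tuple:
        # "2.01" becomes (2, 1), so comparisons are numeric rather than textual.
        return tuple(int(part) for part in text.split("."))

    return parse(version) >= parse(minimum)


print(firmware_is_current("1.06"))  # prints False
print(firmware_is_current("2.10"))  # prints True
```

A False result means that the PDU firmware should be upgraded after you finish this procedure.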

Upgrading the PDU Firmware

If the PDU firmware is out of date, with a version earlier than 2.01, then download and install the current version.

To upgrade the PDU firmware:

  1. Download the current firmware for Enhanced PDUs from My Oracle Support to your laptop.
  2. Unzip the downloaded file on your laptop.
  3. Open a browser and connect to the PDU, using its IP address:
    https://pdu_ip_address
    

    Accept the security note. The Metering Overview page is displayed.

  4. Click Net Configuration in the upper left, and log in as user admin with the password welcome1.
  5. Select the Firmware Update tab.
  6. Click Browse, and select MKAPP_V2.x.DL from the unzipped, downloaded firmware files on your laptop.
  7. Click Submit to update the firmware.

    The PDU reboots automatically when the update is complete.

  8. Reconnect your browser to the PDU.
  9. Click Module Info on the Metering Overview page, and confirm that the firmware was updated successfully.
  10. Click Logout to log out of the PDU.
  11. Repeat these steps for the second PDU metering unit.

Checking the Health of the Compute Servers

To check the two compute servers in U16 and U17:

  1. Power on both compute servers if they are not already on, and wait while they initialize the BIOS and load the Linux operating system.

  2. Use a serial cable to connect your laptop to the first compute server's serial MGT port.

  3. Configure your laptop's terminal emulator to use these settings:

    • 9600 baud

    • 8 bit

    • 1 stop bit

    • No parity bit

    • No handshake

    • No flow control

  4. Log in as the root user with the welcome1 password.

    • On the first compute server (which is connected to your laptop), open the Oracle ILOM console, and then log in:

      -> start /SP/console
      
    • On the second compute server, use SSH to log in. The default factory IP address is 192.168.1.109.

  5. Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.

    # ipmitool sunoem cli "show /System" | grep serial
         serial_number = AK12345678
         component_serial_number = 1234NM567H
    
  6. Verify that the model and rack serial numbers are set correctly:

    # ipmitool sunoem cli "show /System" | grep model
         model = ZDLRA X5
    # ipmitool sunoem cli "show /System" | grep ident
         system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK12345678
    
  7. Verify that the management network is working:

    # ethtool eth0 | grep det
    Link detected: yes
    
  8. Verify that the ILOM management network is working:

    # ipmitool sunoem cli 'show /SP/network' | grep ipadd
    ipaddress = 192.168.1.108
    pendingipaddress = 192.168.1.108
    
  9. Verify that Oracle ILOM can detect the optional QLogic PCIe cards, if they are installed:

    # ipmitool sunoem cli "show /System/PCI_Devices/Add-on/Device_1"
    Connected. Use ^D to exit.
    -> show /System/PCI_Devices/Add-on/Device_1
      /System/PCI_Devices/Add-on/Device_1
      Targets:
    
      Properties:
        part_number = 7101674
        description = Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA,
                      Qlogic
        location = PCIE1 (PCIe Slot 1)
        pci_vendor_id = 0x1077
        pci_device_id = 0x2031
        pci_subvendor_id = 0x1077
        pci_subdevice_id = 0x024d
    
      Commands:
        cd
        show
    
    -> Session closed
    Disconnected
    

    See "Installing the Tape Hardware" for information about the QLogic PCIe cards.

  10. Verify that all memory is present (256 GB):

    # grep MemTotal /proc/meminfo
    MemTotal: 264232892 kB

    The value might vary slightly, depending on the BIOS version. However, if the value is smaller, then use the Oracle ILOM event logs to identify the faulty memory.

  11. Verify that the four disks are visible, online, and numbered from slot 0 to slot 3:

    # cd /opt/MegaRAID/MegaCli/
    # ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
    Slot Number: 0
    Firmware state: Online, Spun Up
    Slot Number: 1
    Firmware state: Online, Spun Up
    Slot Number: 2
    Firmware state: Online, Spun Up
    Slot Number: 3
    Firmware state: Online, Spun Up
    
  12. Verify that the hardware logical volume is set up correctly. Look for Virtual Disk 0 as RAID5 with four drives and no hot spares:

    [root@db01 ~]# cd /opt/MegaRAID/MegaCli
    [root@db01 MegaCli]# ./MegaCli64 -LdInfo -lAll -a0
    Adapter 0 -- Virtual Drive Information:
    Virtual Drive: 0 (Target Id: 0)
    Name :DBSYS
    RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
    Size : 1.633 TB
    Physical Sector Size: 512
    Logical Sector Size : 512
    VD has Emulated PD : No
    Parity Size : 557.861 GB
    State : Optimal
    Strip Size : 1.0 MB
    Number Of Drives : 4
    Span Depth : 1
         .
         .
         .
    
  13. Verify that the hardware profile is operating correctly:

    # /opt/oracle.SupportTools/CheckHWnFWProfile
    [SUCCESS] The hardware and firmware matches supported profile for
    server=ORACLE_SERVER_X5-2
    

    The previous output shows correct operations. However, the following response indicates a problem that you must correct before continuing:

    [WARNING] The hardware and firmware are not supported. See details below
    [InfinibandHCAPCIeSlotWidth]
    Requires:
    x8
    Found:
    x4
    [WARNING] The hardware and firmware are not supported. See details above
    

    Use the --help argument to review the available options, such as obtaining more detailed output.

  14. When connected to the first compute server only:

    1. Verify the IP address of the first compute server:

      # ifconfig eth0
      eth0 Link encap:Ethernet HWaddr 00:10:E0:3C:EA:B0
           inet addr:172.16.2.44 Bcast:172.16.2.255 Mask:255.255.255.0
           inet6 addr: fe80::210:e0ff:fe3c:eab0/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
           RX packets:7470193 errors:0 dropped:0 overruns:0 frame:0
           TX packets:4318201 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:872195171 (831.7 MiB) TX bytes:2444529519 (2.2 GiB)
      
    2. Verify the IP address of the second compute server:

      # ibhosts
      Ca : 0x0010e0000159c61c ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
      Ca : 0x0010e000015a46f0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
      Ca : 0x0010e0000159d96c ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
      Ca : 0x0010e0000159c51c ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
      Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
  15. Disconnect from the server:

    • First compute server: exit

    • Second compute server: logout

  16. Repeat these steps for the second compute server.
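
The disk check in step 11 can be applied to saved MegaCli64 output instead of being read by eye. The following Python sketch parses the filtered Slot Number and Firmware state lines and lists any slot that is missing or not online. The function name offline_slots is hypothetical, and the sketch assumes that the text has the same two-line layout as the example output in step 11.

```python
def offline_slots(pdlist_output: str, expected_slots: int = 4) -> list:
    """Return slot numbers that are missing or not 'Online, Spun Up'."""
    states = {}
    slot = None
    for line in pdlist_output.splitlines():
        line = line.strip()
        if line.startswith("Slot Number:"):
            slot = int(line.split(":", 1)[1])
        elif line.startswith("Firmware state:") and slot is not None:
            states[slot] = line.split(":", 1)[1].strip()
    # Slots are expected to be numbered from 0 to expected_slots - 1.
    return [n for n in range(expected_slots) if states.get(n) != "Online, Spun Up"]


# Sample text copied from the example output in step 11.
sample = """\
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
"""
print(offline_slots(sample))  # prints []
```

An empty list means that all four disks are visible and online. The same function can be used for a storage server by passing expected_slots=12.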

Checking the Health of the Storage Servers

Recovery Appliance X5 and later racks have 3 to 18 storage servers, and a Recovery Appliance X4 rack has 3 to 14 storage servers. Begin at the bottom of the rack and check each server.

To check a storage server:

  1. Power on all storage servers if they are not already on, and wait while the servers initialize the BIOS and load the Linux operating system.
  2. Use SSH to connect your laptop to the first storage server. Use its factory IP address.
  3. Log in as the root user with the welcome1 password.

    The terminal emulation settings are the same as for the compute servers. See "Checking the Health of the Compute Servers".

  4. Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.
    # ipmitool sunoem cli "show /System" | grep serial
         serial_number = AK01234567
         component_serial_number = 1234NM5678
    
  5. Verify that the model and rack serial numbers are set correctly:
    # ipmitool sunoem cli "show /System" | grep model
         model = ZDLRA X5
    # ipmitool sunoem cli "show /System" | grep ident
         system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK01234567
    
  6. Verify that the management network is working:
    # ethtool eth0 | grep det
    Link detected: yes
    
  7. Verify that the ILOM management network is working:
    # ipmitool sunoem cli 'show /SP/network' | grep ipadd
    ipaddress = 192.168.1.101
    pendingipaddress = 192.168.1.101
    
  8. Verify that all memory is present. X5 has 96 GB, while X8 has 384 GB:
    # grep MemTotal /proc/meminfo
    MemTotal: 98757064 kB

    If the value is smaller, then use the Oracle ILOM event logs to identify the faulty memory.

  9. Verify that the hardware profile is operating correctly:
    # /opt/oracle.SupportTools/CheckHWnFWProfile
    [SUCCESS] The hardware and firmware matches supported profile for
    server=ORACLE_SERVER_X5-2L_EXADATA_HIGHCAPACITY
    

    The previous output shows correct operations. However, the following response indicates a problem that you must correct before continuing:

    [WARNING] The hardware and firmware are not supported. See details below
    [InfinibandHCAPCIeSlotWidth]
    Requires:
    x8
    Found:
    x4
    [WARNING] The hardware and firmware are not supported. See details above
    

    Use the --help argument to review the available options, such as obtaining more detailed output.

  10. Verify that 12 disks are visible, online, and numbered from slot 0 to slot 11:
    # cd /opt/MegaRAID/MegaCli
    # ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state" 
    Slot Number: 0
    Firmware state: Online, Spun Up
    Slot Number: 1
    Firmware state: Online, Spun Up
         .
         .
         .
    
  11. Verify that there are four NVMe logical devices:
    # ls -l /dev | grep nvme | grep brw
    brw-rw---- 1 root disk 259, 0 Nov 12 19:10 nvme0n1
    brw-rw---- 1 root disk 259, 1 Nov 12 19:10 nvme1n1
    brw-rw---- 1 root disk 259, 2 Nov 12 19:10 nvme2n1
    brw-rw---- 1 root disk 259, 3 Nov 12 19:10 nvme3n1
    
  12. Confirm the healthy status of the AIC card:
    # nvmecli --identify --all | grep -i indicator
    Health Indicator      : Healthy
    Health Indicator      : Healthy
    Health Indicator      : Healthy
    Health Indicator      : Healthy
    
  13. Verify that the boot order is USB (Oracle Unigen), RAID, and PXE:
    # ubiosconfig export all > /tmp/bios.xml
    # grep -m1 -A20 boot_order /tmp/bios.xml
    <boot_order>
      <boot_device>
        <description>USB:USBIN0:ORACLE SSM UNIGEN-UFD PMAP</description>
        <instance>1</instance>
      </boot_device>
      <boot_device>
        <description>RAID:PCIE6:(Bus 50 Dev 00)PCI RAID Adapter</description>
        <instance>1</instance>
      </boot_device>
      <boot_device>
        <description>PXE:NET0:IBA XE Slot 3A00 v2320</description>
        <instance>1</instance>
      </boot_device>
      <boot_device>
        <description>PXE:NET1:IBA XE Slot 4001 v2196</description>
        <instance>1</instance>
      </boot_device>
    
  14. If the boot order is wrong, then restart the server and fix the order in the BIOS setup:
    # ipmitool chassis bootdev bios
    # shutdown -r now
    
  15. Exit or log out of SSH.
  16. Repeat these steps for the next storage server until you have checked all of them.
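
The boot order check in step 13 can be repeated quickly across many storage servers by testing the grep output with a script. The following Python sketch extracts the device type from each description element and confirms that the types appear in the order USB, RAID, PXE. The function names are hypothetical, and the sketch assumes that it receives only the boot_order portion of the exported file, as in the example output, and that each description begins with the device type followed by a colon.

```python
import re

EXPECTED_ORDER = ["USB", "RAID", "PXE"]


def boot_device_types(bios_xml: str) -> list:
    """Return the device type prefix of each boot entry, in listed order."""
    descriptions = re.findall(r"<description>([^<]*)</description>", bios_xml)
    return [text.split(":", 1)[0] for text in descriptions]


def boot_order_is_expected(bios_xml: str) -> bool:
    """Return True if the device types appear as USB, then RAID, then PXE."""
    types = boot_device_types(bios_xml)
    # Collapse consecutive repeats so that several PXE entries count once.
    collapsed = [t for i, t in enumerate(types) if i == 0 or t != types[i - 1]]
    return collapsed == EXPECTED_ORDER


# Descriptions copied from the example output in step 13.
sample = """\
<boot_order>
  <boot_device>
    <description>USB:USBIN0:ORACLE SSM UNIGEN-UFD PMAP</description>
  </boot_device>
  <boot_device>
    <description>RAID:PCIE6:(Bus 50 Dev 00)PCI RAID Adapter</description>
  </boot_device>
  <boot_device>
    <description>PXE:NET0:IBA XE Slot 3A00 v2320</description>
  </boot_device>
  <boot_device>
    <description>PXE:NET1:IBA XE Slot 4001 v2196</description>
  </boot_device>
"""
print(boot_order_is_expected(sample))  # prints True
```

A False result corresponds to the case in step 14, where the order must be corrected in the BIOS setup.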

Verifying the RoCE Network Fabric Configuration

This procedure describes how to verify the RoCE Network Fabric configuration.

  1. Verify that the correct version of the oracle-rdma-release software is installed on the database servers.
    [root@dbm01adm08 ~]# rpm -qa |grep oracle-rdma-release
    oracle-rdma-release-0.11.0-1.el7ora.x86_64

    The oracle-rdma-release software and adapter firmware versions are automatically maintained on the Recovery Appliance storage servers.

  2. Check the adapter firmware versions on the database servers.

    Use the CheckHWnFWProfile script to check firmware versions for the RDMA Network Fabric adapters.

    # /opt/oracle.SupportTools/CheckHWnFWProfile -action list
  3. Visually check all the RDMA Network Fabric cable connections within the rack.
    The port lights should be on, and the LEDs should be on. Do not press each connector to verify connectivity.
  4. Complete the steps described in My Oracle Support Doc ID 2587717.1.
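
When there are several database servers, the package check in step 1 can be compared across hosts. The following is a minimal sketch, not an Oracle-supplied procedure: it assumes that one "hostname package" pair per line has been collected beforehand, and the function name rdma_versions_consistent is hypothetical. It only checks that all hosts report the same package; it does not determine which version is the correct one.

```shell
#!/bin/bash
# Sketch: read "hostname package" pairs on stdin and report whether every
# host has the same oracle-rdma-release package.
# rdma_versions_consistent is a hypothetical helper name.
rdma_versions_consistent() {
    local distinct
    # Count the distinct package strings found in column 2.
    distinct=$(awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "consistent"
    else
        echo "mismatch"
        return 1
    fi
}

# Example collection loop (not run here; the host names are placeholders):
# for h in dbm01adm07 dbm01adm08; do
#     echo "$h $(ssh root@"$h" rpm -q oracle-rdma-release)"
# done | rdma_versions_consistent
```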

Verifying the InfiniBand Network Fabric Network

This procedure describes how to verify the InfiniBand Network Fabric network.

  1. Visually check all the RDMA Network Fabric cable connections within the rack. The port lights should be on, and the LEDs should be on. Do not press each connector to verify connectivity.

  2. Log in as the root user on any component in the rack.

  3. Verify the InfiniBand Network Fabric topology using the following commands:

    # cd /opt/oracle.SupportTools/ibdiagtools
    # ./verify-topology [-t rack_size]

    The following example shows the output when the network components are correct.

    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
    Leaf switch check:cardinality and even distribution..............[SUCCESS ]
    Check if each rack has an valid internal ring....................[SUCCESS ]
    

    In the preceding command, rack_size is the size of the rack. The -t rack_size option is needed if the rack is Recovery Appliance Half Rack or Recovery Appliance Quarter Rack. Use the value halfrack or quarterrack, if needed.

    The following example shows the output when there is a bad RDMA Network Fabric switch-to-cable connection:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
    Leaf switch check:cardinality and even distribution..............[SUCCESS ]
    Check if each rack has an valid internal ring....................[ERROR ]
    
    Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them.
    They should have at least 7 links between them
    

    The following example shows the output when there is a bad RDMA Network Fabric cable on a database server:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[ERROR ]
    Node db01 has 1 endpoints.(Should be 2)
    Port 2 of this node is not connected to any switch
    --------fattree End Point Cabling verification failed-----
    Leaf switch check:cardinality and even distribution..............[ERROR ]
    Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes
    It has only 3 links belonging to compute nodes                  [SUCCESS ]
    Check if each rack has an valid internal ring...................[SUCCESS ]
    

    The following example shows the output when there is a bad connection on the switch and the system:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[ERROR ]
    
    Node burxdb01 has 1 endpoints.(Should be 2) 
    Port 2 of this node is not connected to any switch
    --------fattree End Point Cabling verifation failed-----
    Leaf switch check:cardinality and even distribution..............[ERROR ]
    Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes 
    It has only 3 links belonging to compute nodes...................[SUCCESS ]
    Check if each rack has an valid internal ring....................[ERROR ]
    
    Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them
    They should have at least 7 links between them
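
Because every check in the verify-topology output ends in either `[SUCCESS ]` or `[ERROR ]`, the failed checks can be picked out mechanically. The following is a minimal sketch that relies only on the output layout shown in the preceding examples; the function name failed_topology_checks is hypothetical.

```shell
#!/bin/bash
# Sketch: read verify-topology output on stdin, print the name of each
# check that ended in [ERROR ], and return nonzero if any check failed.
# failed_topology_checks is a hypothetical helper name.
failed_topology_checks() {
    local failures
    # Keep only lines tagged [ERROR ], stripping the dot leader and the tag.
    failures=$(sed -n 's/\.\{2,\}\[ERROR ][[:space:]]*$//p')
    if [ -n "$failures" ]; then
        printf '%s\n' "$failures"
        return 1
    fi
}
```

For example, `./verify-topology | failed_topology_checks` would list only the failed checks. The detail lines that follow each error, such as the switch GUIDs and link counts, still need to be read in the full output.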

Setting the Subnet Manager Master on Oracle Exadata Database Machine Full Rack and Oracle Exadata Database Machine Half Rack

Recovery Appliance X3-2 systems and Recovery Appliance X2-2 systems have three Sun Datacenter InfiniBand Switch 36 switches. Starting with Recovery Appliance X4-2, Recovery Appliance systems have two Sun Datacenter InfiniBand Switch 36 switches.

Note:

This procedure does not apply to Recovery Appliance X8M racks with RoCE Network Fabric.

The switch located in rack unit 1 (U1) is referred to as the spine switch. The other two switches are referred to as the leaf switches. The location of the leaf switches is as follows:

  • Recovery Appliance Two-Socket Systems (X3-2 and later): rack unit 20 (U20) and rack unit 22 (U22)

  • Recovery Appliance X2-2 racks: rack unit 20 (U20) and rack unit 24 (U24)

  • Recovery Appliance Eight-Socket Systems (X2-8 and later) Full Racks: rack unit 21 (U21) and rack unit 23 (U23)

The spine switch is the Subnet Manager Master for the InfiniBand Network Fabric subnet. The Subnet Manager Master has priority 8, and can be verified using the following procedure:

  1. Log in to the spine switch as the root user.

  2. Run the setsmpriority list command.

    The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:

    1. Use the disablesm command to stop the Subnet Manager.

    2. Use the setsmpriority 8 command to set the priority to 8.

    3. Use the enablesm command to restart the Subnet Manager.

The leaf switches are the Standby Subnet Managers, with a priority of 5. To verify this, follow the preceding procedure on each leaf switch, substituting a value of 5 for 8 in the setsmpriority command.
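
The priority check can also be scripted. The following is a minimal sketch under a stated assumption: that the output of setsmpriority list contains a line of the form "smpriority N". Confirm this against the actual output on your switch before relying on it. The function names smpriority_value and check_smpriority are hypothetical.

```shell
#!/bin/bash
# Sketch: extract the smpriority value from `setsmpriority list` output
# supplied on stdin.
# ASSUMPTION: the output contains a line of the form "smpriority <number>".
smpriority_value() {
    awk '$1 == "smpriority" { print $2; exit }'
}

# Compare the reported priority with the expected one (8 for the spine
# switch, 5 for a leaf switch). check_smpriority is a hypothetical helper.
check_smpriority() {
    local expected="$1" actual
    actual=$(smpriority_value)
    if [ "$actual" = "$expected" ]; then
        echo "smpriority is $actual (as expected)"
    else
        echo "smpriority is '${actual}', expected $expected"
        return 1
    fi
}
```

On the spine switch, the check could be run as `setsmpriority list | check_smpriority 8`, and on a leaf switch as `setsmpriority list | check_smpriority 5`.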

Note:

Recovery Appliance Half Rack with Sun Fire X4170 Oracle Database Servers includes two Sun Datacenter InfiniBand Switch 36 switches, which are set to priority 5.

To determine the Subnet Manager Master, log in as the root user on any InfiniBand Network Fabric switch, and run the getmaster command. The location of the Subnet Manager Master is displayed. The following is an example of the output from the getmaster command:

# getmaster
20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36
QDR switch dm01sw-ib1.example.com enhanced port 0 lid 1 lmc 0

The preceding output shows the proper configuration. The Subnet Manager Master is running on spine switch dm01sw-ib1.example.com.

If the spine switch is not the Subnet Manager Master, then do the following procedure to set the Subnet Manager Master:

  1. Use the getmaster command to identify the current location of the Subnet Manager Master.

  2. Log in as the root user on the leaf switch that is the Subnet Manager Master.

  3. Disable Subnet Manager on the switch by using the disablesm command. The Subnet Manager Master relocates to another switch.

  4. Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.

  5. Use the enablesm command to enable Subnet Manager on each leaf switch where it was disabled during this procedure.
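
The getmaster check used in steps 1 and 4 can be automated by extracting the switch name from the output. The following is a minimal sketch that relies on the "QDR switch hostname" wording in the example getmaster output shown earlier; the function names master_switch_name and spine_is_master are hypothetical.

```shell
#!/bin/bash
# Sketch: print the host name of the switch that getmaster reports as the
# Subnet Manager Master. Relies on the "QDR switch <hostname>" wording in
# the example output. master_switch_name is a hypothetical helper name.
master_switch_name() {
    # Join wrapped lines, then capture the word that follows "QDR switch".
    tr '\n' ' ' | sed -n 's/.*QDR switch \([^ ]*\).*/\1/p'
}

# Succeeds only when the reported master matches the expected spine switch.
# spine_is_master is a hypothetical helper name.
spine_is_master() {
    local spine="$1" master
    master=$(master_switch_name)
    [ "$master" = "$spine" ]
}
```

For example, `getmaster | spine_is_master dm01sw-ib1.example.com` returns a zero status when the spine switch from the earlier example is the master. A nonzero status indicates that steps 2 and 3 need to be repeated.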

Note:

  • If the InfiniBand Network Fabric network consists of four or more racks cabled together, then only the spine switches should run Subnet Manager. The leaf switches should have Subnet Manager disabled on them.
  • Recovery Appliance Half Racks with Sun Fire X4170 Oracle Database Servers and Recovery Appliance Quarter Racks each have two Sun Datacenter InfiniBand Switch 36 switches, both set to priority 5. The master is the switch with the lowest GUID.

See Also: