8 Configuring a Recovery Appliance Rack
This chapter describes how to configure the hardware components of a Recovery Appliance rack.
Note:
The procedures in this chapter use the files generated by Oracle Exadata Deployment Assistant. You must run this utility before doing the procedures in this chapter.
Supporting Auto Service Request
Auto Service Request is an optional component of Recovery Appliance. To configure Recovery Appliance for Auto Service Request, ASR Manager must be installed first.
Prerequisites for Using Auto Service Request
Verify that Auto Service Request was selected for use in Oracle Exadata Deployment Assistant. Recovery Appliance cannot also be used with Oracle Advanced Support Gateway or Oracle Platinum Gateway.
You must know the IP address and the root password of the ASR Manager host.
Checking an Existing ASR Manager Installation
If ASR Manager is already operating at the site, then verify that it is version 4.5 or higher. Otherwise, you must upgrade it.
To obtain the version number of ASR Manager:
-
On a Linux system:
# rpm -qa | grep SUNWswasr
SUNWswasr-2.7-1
-
On a Solaris system:
# pkginfo -l SUNWswasr
PKGINST: SUNWswasr
NAME: SASM ASR Plugin
CATEGORY: application
ARCH: all
VERSION: 2.6
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
. . .
Both examples report a version lower than 4.5 (2.7 on Linux and 2.6 on Solaris), which indicates that ASR Manager must be upgraded to version 4.5 or higher.
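The version comparison described above can be sketched in Python. This is a minimal illustration, not part of ASR Manager: the function names are hypothetical, and the parsing assumes the package string looks like the samples shown in this section (for example, SUNWswasr-2.7-1 or VERSION: 2.6).

```python
import re

# Minimum ASR Manager version required by this chapter.
MIN_ASR_VERSION = (4, 5)


def parse_asr_version(package_text):
    """Extract (major, minor) from rpm or pkginfo output (hypothetical helper).

    Assumes text such as 'SUNWswasr-2.7-1' (Linux) or 'VERSION: 2.6' (Solaris).
    """
    match = re.search(r"(?:SUNWswasr-|VERSION:\s*)(\d+)\.(\d+)", package_text)
    if match is None:
        raise ValueError(f"no ASR Manager version found in {package_text!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(package_text, minimum=MIN_ASR_VERSION):
    """Return True when the installed version is lower than the minimum."""
    # Tuples compare element by element, so (4, 10) is correctly newer than (4, 5).
    return parse_asr_version(package_text) < minimum


if __name__ == "__main__":
    for text in ("SUNWswasr-2.7-1", "VERSION: 2.6"):
        print(text, "-> upgrade required:", needs_upgrade(text))
```

Comparing integer tuples instead of strings avoids treating version 4.10 as older than 4.5.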
Installing ASR Manager
If ASR Manager is not already installed, then follow the instructions in Setting Up Auto Service Request. After you register ASR Manager with the Oracle ASR back end, return to these instructions for configuring Recovery Appliance.
Installing the Tape Hardware
Oracle Secure Backup tape backup is an option for Recovery Appliance. You must install the QLogic ZLE8362 fiber cards and transceivers on site; they are not factory installed.
The QLogic fiber cards are shipped from Oracle as ride-alongs with the rack. The transceivers are shipped directly from the supplier.
To install the tape networking hardware:
See Also:
My Oracle Support Doc ID 1592317.1 for full instructions about replacing a PCIe card
Verifying the Network Configuration Prior to Configuring the Rack
Use the checkip.sh script to ensure there are no IP address conflicts between the existing network and your new ZDLRA Rack.
The checkip.sh script performs a pre-installation check to verify that the IP addresses and host names that you specified in Oracle Exadata Deployment Assistant (OEDA) are defined in the DNS, that the NTP servers and gateways are available, and that private addresses are not pingable. Running this script before the hardware arrives helps to avoid additional delays that would be caused by misconfigured network services, such as Domain Name System (DNS) and NTP.
The checkip.sh script is created in a format that matches the operating system of the client on which you ran OEDA. Because this script is run before the engineered system rack has arrived, you typically do not run this script on an engineered system server, but on a client. The client must have access to the same network where the engineered system will be deployed. The script is also available in the ZIP file generated by OEDA.
If there are conflicts that you are unable to resolve, then work with your assigned Oracle representative to correct the problems.
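To illustrate the kind of DNS check described above, the following Python sketch compares planned host names against what a resolver returns. It is not the checkip.sh implementation. The function name, the host names, and the addresses are hypothetical, and the resolver is injectable so that the logic can be exercised without network access.

```python
import socket


def find_dns_mismatches(expected, resolve=socket.gethostbyname):
    """Return {host: problem} for names that do not resolve as planned.

    expected maps host name -> planned IPv4 address (hypothetical input format).
    resolve is any callable that maps a host name to an address string.
    """
    problems = {}
    for host, planned_ip in expected.items():
        try:
            actual_ip = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[host] = f"lookup failed: {exc}"
            continue
        if actual_ip != planned_ip:
            problems[host] = f"resolves to {actual_ip}, expected {planned_ip}"
    return problems


if __name__ == "__main__":
    # Hypothetical names and addresses; replace with the values entered in OEDA.
    planned = {"ra01db01.example.com": "10.7.7.10"}
    for host, problem in find_dns_mismatches(planned).items():
        print(f"{host}: {problem}")
```

An empty result means every planned name resolved to its planned address. Any other result should be resolved before the rack arrives.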
Configuring the RDMA Network Fabric Switch
You must perform an initial configuration of the RDMA Network Fabric switch.
Configuring the InfiniBand Switches
The two Sun Datacenter InfiniBand Switch 36 leaf switches are identified in Recovery Appliance as iba and ibb. Complete these configuration procedures for both switches:
Configuring an InfiniBand Switch
The default identifier for leaf switch 1 in U20 is iba, and for leaf switch 2 in U22 is ibb.
To configure a Sun Datacenter InfiniBand Switch 36 switch:
Setting the Serial Number on a Spine Switch
In a multirack configuration, set the rack master serial number in the ILOM of the spine switch. Skip this procedure when configuring the leaf switches.
To set the serial number on the spine switch:
Checking the Health of an InfiniBand Switch
To check the health of an InfiniBand switch:
-
Open the fabric management shell:
-> show /SYS/Fabric_Mgmt
NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
User can execute switch diagnosis, SM Configuration and IB monitoring commands in the shell.
To view the list of commands, use "help" at rsh prompt.
Use exit command at rsh prompt to revert back to ILOM shell.
FabMan@hostname->
The prompt changes from -> to FabMan@hostname->.
-
Check the general health of the switch:
FabMan@ra1sw-iba-> showunhealthy
OK - No unhealthy sensors
-
Check the general environment.
FabMan@ra1sw-iba-> env_test
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.42 V
Measured 12V = 12.06 V
. . .
The report should show that fans 1, 2, and 3 are present, and fans 0 and 4 are not present. All OK and Passed results indicate that the environment is normal.
-
Determine the current InfiniBand subnet manager priority of the switch. Leaf switches must have an smpriority of 5, and spine switches must have an smpriority of 8. The sample output shown here indicates the correct priority for a leaf switch.
FabMan@ra1sw-iba-> setsmpriority list
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
-
If the priority setting is incorrect, then reset it:
-
Disable the subnet manager:
FabMan@ra1sw-iba-> disablesm
Stopping partitiond daemon. [ OK ]
Stopping IB Subnet Manager.. [ OK ]
-
Reset the priority. This example sets the priority on a leaf switch:
FabMan@ra1sw-iba-> setsmpriority 5
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
-
Restart the subnet manager:
FabMan@ra1sw-iba-> enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond daemon. [ OK ]
-
-
Log out of the Fabric Management shell and the Oracle ILOM shell:
FabMan@ra1sw-iba-> exit
-> exit
-
Log in to Linux as root and restart the switch:
localhost: root
password: welcome1
[root@localhost ~]# reboot
-
Disconnect your laptop from the InfiniBand switch.
-
Repeat these procedures for the second InfiniBand leaf switch.
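The priority check in the preceding procedure can be expressed as a short Python sketch. It assumes the output of setsmpriority list has the form shown in the sample above. The function names are hypothetical, and the expected values (5 for a leaf switch, 8 for a spine switch) come from this section.

```python
import re

# Expected subnet manager priorities, as stated in this section.
EXPECTED_PRIORITY = {"leaf": 5, "spine": 8}


def parse_smpriority(output):
    """Return the smpriority value from 'setsmpriority list' output."""
    # \b keeps the pattern from matching inside the echoed 'setsmpriority' command.
    match = re.search(r"\bsmpriority\s+(\d+)", output)
    if match is None:
        raise ValueError("smpriority not found in output")
    return int(match.group(1))


def priority_is_correct(output, role):
    """Return True when the reported priority matches the switch role."""
    return parse_smpriority(output) == EXPECTED_PRIORITY[role]


sample = """Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000"""

if __name__ == "__main__":
    print("leaf priority correct:", priority_is_correct(sample, "leaf"))
    print("spine priority correct:", priority_is_correct(sample, "spine"))
```

If the check fails, follow the disablesm, setsmpriority, and enablesm sequence from the procedure.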
Setting a Spine Switch as the Subnet Manager Master
The InfiniBand switch located in rack unit 1 (U1) is the spine switch. Recovery Appliance has a spine switch only when it is connected to another Recovery Appliance. It is not included as a basic component of the rack.
Perform these steps after the racks are cabled together.
The spine switch is the Subnet Manager Master for the InfiniBand subnet. The Subnet Manager Master has priority 8.
To verify the priority setting of the spine switch:
-
Log in to the spine switch as the root user.
-
Run the setsmpriority list command.
The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:
-
Use the disablesm command to stop the Subnet Manager.
-
Use the setsmpriority 8 command to set the priority to 8.
-
Use the enablesm command to restart the Subnet Manager.
-
The other two InfiniBand switches are the leaf switches. The leaf switches are located in rack units 20 and 22 (U20 and U22). They are the Standby Subnet Managers with a priority of 5. You can verify the status using the preceding procedure, substituting a value of 5 in the command shown in step 2.b.
To determine the Subnet Manager Master:
-
Log in as the root user on any InfiniBand switch.
-
Display the location of the Subnet Manager Master.
# getmaster
20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36 QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0
The preceding output shows the proper configuration. The Subnet Manager Master is running on spine switch ra01sw-ib1.example.com.
If the spine switch is not the Subnet Manager Master, then reset the Subnet Manager Master:
-
Use the getmaster command to identify the current location of the Subnet Manager Master.
-
Log in as the root user on the leaf switch that is the Subnet Manager Master.
-
Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.
See Also:
"Disable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61813698.html#z40003f12047367
-
Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.
-
Enable Subnet Manager on the leaf switches that were disabled during this procedure.
See Also:
"Enable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61707660.html#z40003f12047359
Note:
If the InfiniBand network consists of four or more racks cabled together, then only the spine switches run Subnet Manager. Disable the Subnet Manager on the leaf switches.
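As a minimal sketch of the getmaster check, the following Python code extracts the switch GUID and host name from the output and compares the host name with the expected spine switch. The output format is assumed from the single sample in this section, and the function names are hypothetical.

```python
import re

# Pattern assumed from the sample getmaster output shown in this section.
MASTER_LINE = re.compile(
    r"OpenSM Master on Switch\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+\d+\s+"
    r"Sun DCS 36 QDR switch\s+(\S+)"
)


def parse_getmaster(output):
    """Return (guid, host_name) of the current Subnet Manager Master."""
    match = MASTER_LINE.search(output)
    if match is None:
        raise ValueError("Subnet Manager Master not found in output")
    return match.group(1), match.group(2)


def master_is_spine(output, spine_host):
    """Return True when the master runs on the named spine switch."""
    _, host = parse_getmaster(output)
    return host == spine_host


sample = (
    "20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 "
    "Sun DCS 36 QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0"
)

if __name__ == "__main__":
    print(parse_getmaster(sample))
    print("master is spine:", master_is_spine(sample, "ra01sw-ib1.example.com"))
```

When the check reports that the master is not the spine switch, use the relocation procedure above.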
Configuring the Cisco Ethernet Switch
The Cisco Catalyst 4948 Ethernet switch supplied with Recovery Appliance has IPBASEK9-MZ firmware. The switch is minimally configured during installation. These procedures configure the Cisco Ethernet switch into one large virtual LAN.
Configuring the Cisco Catalyst 4948 Ethernet Switch
The Cisco Catalyst 4948 Ethernet switch supplied with ZDLRA Rack is minimally configured during installation.
The minimal configuration disables IP routing, and sets the following:
- Host name
- IP address setup
- Subnet mask
- Default gateway
- Domain name
- Name server
- NTP server
- Time
- Time zone
Before configuring the switch, note the following:
-
The Cisco Ethernet switch should not be connected until the running configuration has been verified, and any necessary changes have been made by the network administrator.
-
The Cisco Ethernet switch should not be connected to the customer network until the IP addresses on all components have been configured in ZDLRA Rack. This is to prevent any duplicate IP address conflicts which are possible due to the default addresses set in the components when shipped.
Note that the Cisco 4948E-F switch supports multiple uplinks to the customer network by utilizing ports 49 - 52. This is a more complicated switch setup due to the redundant connectivity, and should be performed by the customer's network administrator.
The following procedure describes how to configure the Cisco Ethernet switch. Configuration should be done with the network administrator.
-
Connect a serial cable from the Cisco switch console to a laptop or similar device. An Oracle supplied rollover cable is pre-installed on the Cisco serial console port. Obtain the appropriate adapter and connect it at the end of the rollover cable. An Oracle P/N 530-3100 RJ45-DB9 adapter as used on ILOM ports will also work, connected at the end of the network cable.
-
Ensure the terminal session is recorded on the laptop by logging the output. The output can be used as a reference that the switch has been configured correctly. The default serial port speed is 9600 baud, 8 bits, no parity, 1 stop bit, and no handshake.
Switch con0 is now available
Press RETURN to get started.
-
Change to the enable mode.
Switch> enable
Password: ******
Switch#
Note:
If you do not have the password, then contact Oracle Support Services.
-
Check the current version on the switch.
Switch# show version
Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500e-IPBASEK9-M), Version 15.2(3)E2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2014 by Cisco Systems, Inc.
Compiled Tue 11-Mar-14 18:28 by prod_rel_team
ROM: 12.2(44r)SG12
zdlra1sw-ip uptime is 1 minute
System returned to ROM by reload
System image file is "bootflash:cat4500e-ipbasek9-mz.152-3.E2.bin"
Hobgoblin Revision 22, Fortooine Revision 1.40
...
Configuration register is 0x2102
Switch#
The version of the Cisco 4948E-F switch firmware purchased and shipped by Oracle with Recovery Appliance X6 is IPBASEK9-MZ, which includes telnet and ssh support. Currently the full release version string is cat4500e-ipbasek9-mz.152-3.E2.bin.
-
Configure the network for a single VLAN. The following example assumes you are using IPv4 addressing.
Switch# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Switch(config)# interface vlan 1
Switch(config-if)# ip address 10.7.7.34 255.255.255.0
Switch(config-if)# end
Switch#
*Sep 15 14:12:06.309:%SYS-5-CONFIG_I:Configured from console by console
Switch# write memory
Building configuration...
Compressed configuration from 2474 bytes to 1066 bytes [OK ]
-
If IP routing is required on the switch, then leave the IP routing setting as the default, and configure the default gateway. Replace 10.7.7.1 with the IP address of the gateway for the installation:
Switch# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Switch(config)# ip route 0.0.0.0 0.0.0.0 10.7.7.1
Switch(config)# end
*Sep 15 14:13:26.013:%SYS-5-CONFIG_I:Configured from console by console
Switch# write memory
Building configuration...
Compressed configuration from 2502 bytes to 1085 bytes [OK ]
-
Set the host name of the switch.
This example sets the name to ra1sw-ip:
The system host name is used as the prompt name.
-
Configure up to three DNS servers. Replace the domain name and IP addresses used in this example with the values for the installation:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# ip domain-name example.com
ra1sw-ip(config)# ip name-server 10.7.7.3
ra1sw-ip(config)# ip name-server 198.51.100.5
ra1sw-ip(config)# ip name-server 10.8.160.1
ra1sw-ip(config)# end
*Sep 15 14:26:37.045:%SYS-5-CONFIG_I:Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2603 bytes to 1158 bytes [OK ]
If you do not have DNS service available, you must still set the domain-name so that you can configure the SSH keys.
-
(Optional) Set the password.
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# enable password password
ra1sw-ip(config)# enable secret password
ra1sw-ip(config)# end
ra1sw-ip# write memory
*Sep 15 14:25:05.893:%SYS-5-CONFIG_I:Configured from console by console
Building configuration...
Compressed configuration from 2502 bytes to 1085 bytes [OK ]
-
Verify telnet access is disabled. Telnet is not secure, and should not be enabled unless there is a compelling reason. To enable telnet, set a password. To disable it, remove the password.
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# line vty 0 15
ra1sw-ip(config-line)# login
% Login disabled on line 1, until 'password' is set
% Login disabled on line 2, until 'password' is set
...
% Login disabled on line 16, until 'password' is set
ra1sw-ip(config-line)# end
If the login command returns output as shown above, then telnet access has been disabled. If you get a prompt instead, then telnet access is not yet disabled and should be disabled now:
ra1sw-ip(config-line)# no password
ra1sw-ip(config-line)# end
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 3786 bytes to 1468 bytes [OK ]
-
To configure a secure shell (SSH) on the Ethernet switch:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# crypto key generate rsa
% You already have RSA keys defined named ra1sw-ip.example.com.
% Do you really want to replace them? [yes/no]: yes
Choose the size of the key modulus in the range of 360 to 2048 for your General Purpose Keys. Choosing a key modulus greater than 512 may take a few minutes.
How many bits in the modulus [512]: 768
% Generating 768 bit RSA keys, keys will be non-exportable...[OK]
ra1sw-ip(config)# username admin password 0 welcome1
ra1sw-ip(config)# line vty 0 15
ra1sw-ip(config-line)# transport input ssh
ra1sw-ip(config-line)# exit
ra1sw-ip(config)# aaa new-model
ra1sw-ip(config)# ip ssh time-out 60
ra1sw-ip(config)# ip ssh authentication-retries 3
ra1sw-ip(config)# ip ssh version 2
ra1sw-ip(config)# end
*Sep 15 14:26:37.045: %SYS-5-CONFIG_I: Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2603 bytes to 1158 bytes[OK]
-
Set the clock and time zone. The switch keeps internal time in Coordinated Universal Time (UTC) format.
-
To use UTC, use the following command in global configuration mode:
no clock timezone
-
To use a time zone, use the following command:
clock timezone zone hours-offset [minutes-offset]
In the preceding command, zone is the time zone to display when standard time is in effect, hours-offset is the hours offset from UTC, and minutes-offset is the minutes offset from UTC.
-
Daylight saving time (or summer time) is disabled by default. To set summer time hours, use the following command:
clock summer-time zone recurring [week day month hh:mm week day month hh:mm [offset]]
In the preceding command, zone is the time zone to be displayed when summer time is in effect (EDT, for example), week is the week of the month (1 to 5 or last), day is the day of the week (Sunday, Monday, ...), month is the month (January, February, ...), hh:mm is the hours and minutes in 24-hour format, and offset is the number of minutes to add during summer time. The default offset is 60 minutes.
-
To manually set the clock to any time use the following command, where the time specified is relative to the configured time zone:
clock set hh:mm:ss month day year
In the preceding command, hh:mm:ss is the time in 24-hour format, day is the day by date in the month, month is the name of the month, and year is the 4-digit year.
The ordering of commands is important when setting the local time and time zone. For example, to set the local time to US Eastern time:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# clock timezone EST -5
ra1sw-ip(config)# clock summer-time EDT recurring
ra1sw-ip(config)# end
ra1sw-ip# clock set 21:00:00 August 09 2018
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 3784 bytes to 1465 bytes [OK ]
ra1sw-ip# show clock
21:00:06.643 EST Mon Aug 9 2018
-
-
After setting the local time zone, you can configure up to two NTP servers. Replace the IP addresses used in this example with the values for the installation:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# ntp server 10.7.7.32 prefer
ra1sw-ip(config)# ntp server 198.51.100.19
ra1sw-ip(config)# end
*Sep 15 14:51:08.665:%SYS-5-CONFIG_I:Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2654 bytes to 1163 bytes [OK ]
ra1sw-ip# show ntp status
<output will vary per network>
.
ra1sw-ip# show clock
21:00:23.175 EST Mon Aug 9 2018
The NTP server is synchronized to local time when you connect the Cisco switch to the network and it has access to NTP.
Symbols that precede the show clock display indicate that the time is the following:
- * (asterisk): Not authoritative.
- . (period): Authoritative, but NTP is not synchronized.
-
Verify the Ethernet configuration using the following command:
ra1sw-ip# show running-config
Building configuration...
Current configuration : 3923 bytes
!
version 15.2
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
service compress-config
. . .
Note:
If any setting is incorrect, then repeat the appropriate step. To erase a setting, enter no in front of the same command. For example, to erase the default gateway, use the following commands:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# no ip default-gateway 10.7.7.1
ra1sw-ip(config)# end
ra1sw-ip#
*Sep 15 14:13:26.013: %SYS-5-CONFIG_I: Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2502 bytes to 1085 bytes[OK]
-
Save the current configuration.
ra1sw-ip# copy running-config startup-config
Destination filename [startup-config]?
Building configuration...
Compressed configuration from 2654 bytes to 1189 bytes[OK]
-
Exit from the session using the following command:
ra1sw-ip# exit
ra1sw-ip con0 is now available
Press RETURN to get started.
-
Disconnect the cable from the Cisco console.
The Cisco switch must not be connected to the management network at this stage. The switch will be connected later after Oracle has configured the systems with the necessary IP addresses and you have worked with the field service engineer to make any additional changes necessary for connecting to the network.
-
To check the Cisco switch, attach a laptop computer to port 48, and ping the IP address of the internal management network to check the configuration.
Do not connect the switch to the management network.
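The DNS and NTP steps in the preceding procedure follow a fixed pattern, so the command lines can be generated from a list of addresses. The following Python sketch is only an illustration: the helper names are hypothetical, the limits (up to three DNS servers, up to two NTP servers) come from this section, and the generated lines mirror the ip domain-name, ip name-server, and ntp server commands shown in the examples.

```python
import ipaddress


def _validated(addresses, limit, kind):
    """Check the server count and address syntax (hypothetical helper)."""
    if len(addresses) > limit:
        raise ValueError(f"at most {limit} {kind} servers are supported, got {len(addresses)}")
    # ipaddress.ip_address raises ValueError for a malformed address.
    return [str(ipaddress.ip_address(address)) for address in addresses]


def dns_commands(domain, name_servers=()):
    """Build the DNS commands; the domain name is set even without servers."""
    servers = _validated(list(name_servers), 3, "DNS")
    return [f"ip domain-name {domain}"] + [f"ip name-server {s}" for s in servers]


def ntp_commands(ntp_servers):
    """Build the NTP commands; the first server is marked as preferred."""
    servers = _validated(list(ntp_servers), 2, "NTP")
    return [
        f"ntp server {s} prefer" if index == 0 else f"ntp server {s}"
        for index, s in enumerate(servers)
    ]


if __name__ == "__main__":
    # Addresses taken from the examples in this section; replace with site values.
    commands = dns_commands("example.com", ["10.7.7.3", "198.51.100.5", "10.8.160.1"])
    commands += ntp_commands(["10.7.7.32", "198.51.100.19"])
    print("\n".join(commands))
```

The generated lines are intended to be reviewed by the network administrator and entered in configuration mode, not applied automatically.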
Configuring the Cisco Nexus 93108-1G or 9348 Ethernet Switch
The Cisco Nexus 93108-1G or 9348 Ethernet switch supplied with ZDLRA Rack is minimally configured during installation.
Note that the Cisco Nexus 93108-1G or 9348 switch supports multiple uplinks to the customer network by utilizing the QSFP+ ports. This is a more complicated switch setup due to the redundant connectivity, and should be performed by the customer's network administrator.
Whether you are configuring the switch for the first time, or configuring a replacement switch, use the following procedures:
Performing the Initial Switch Configuration for the Cisco Nexus 93108-1G or 9348 Ethernet Switch
During the initial configuration, you reset the switch and use the Basic System Configuration Dialog to configure the switch.
Before configuring the switch, note the following:
-
The Cisco Ethernet switch should not be connected until the running configuration has been verified, and any necessary changes have been made by the network administrator.
-
The Cisco Ethernet switch should not be connected to the customer network until the IP addresses on all components have been configured in ZDLRA Rack. This is to prevent any duplicate IP address conflicts which are possible due to the default addresses set in the components when shipped.
Configuration should be done with the network administrator.
Disabling Spanning Tree on the Ethernet Switch
Spanning tree is enabled by default on Cisco switches. If you add a switch with spanning tree enabled to the network, then you might cause network problems. As a precaution, you can disable spanning tree from the uplink port VLAN before connecting the switch to the network. Alternatively, you can turn on spanning tree protocol with specific protocol settings either before or after connecting to the network.
To disable spanning tree on the uplink port VLAN:
To re-enable spanning tree protocol with the default protocol settings:
-
Use the commands shown in this example:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# spanning-tree vlan 1
ra1sw-ip(config)# end
ra1sw-ip# write memory
See Also:
Cisco Switch Configuration Guide to enable spanning tree protocol with the specific protocol settings required by the data center Ethernet network
Configuring the Power Distribution Units
The power distribution units (PDUs) are configured with static IP addresses to connect to the network for monitoring.
Checking the Health of the Compute Servers
To check the two compute servers in U16 and U17:
-
Power on both compute servers if they are not up already, and wait while they initialize the BIOS and load the Linux operating system.
-
Use a serial cable to connect your laptop to the first compute server's serial MGT port.
-
Configure your laptop's terminal emulator to use these settings:
-
9600 baud
-
8 bit
-
1 stop bit
-
No parity bit
-
No handshake
-
No flow control
-
-
Log in as the root user with the welcome1 password.
-
On the first compute server (which is connected to your laptop), open the Oracle ILOM console, and then log in:
-> start /SP/console
-
On the second compute server, use SSH to log in. The default factory IP address is 192.168.1.109.
-
-
Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.
# ipmitool sunoem cli "show /System" | grep serial
serial_number = AK12345678
component_serial_number = 1234NM567H
-
Verify that the model and rack serial numbers are set correctly:
# ipmitool sunoem cli "show /System" | grep model
model = ZDLRA X5
# ipmitool sunoem cli "show /System" | grep ident
system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK12345678
-
Verify that the management network is working:
# ethtool eth0 | grep det
Link detected: yes
-
Verify that the ILOM management network is working:
# ipmitool sunoem cli 'show /SP/network' | grep ipadd
ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108
-
Verify that Oracle ILOM can detect the optional QLogic PCIe cards, if they are installed:
# ipmitool sunoem cli "show /System/PCI_Devices/Add-on/Device_1"
Connected. Use ^D to exit.
-> show /System/PCI_Devices/Add-on/Device_1
/System/PCI_Devices/Add-on/Device_1
Targets:
Properties:
part_number = 7101674
description = Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Qlogic
location = PCIE1 (PCIe Slot 1)
pci_vendor_id = 0x1077
pci_device_id = 0x2031
pci_subvendor_id = 0x1077
pci_subdevice_id = 0x024d
Commands:
cd
show
-> Session closed
Disconnected
See "Installing the Tape Hardware" for information about the QLogic PCIe cards.
-
Verify that all memory is present (256 GB):
# grep MemTotal /proc/meminfo
MemTotal: 264232892 kB
The value might vary slightly, depending on the BIOS version. However, if the value is smaller, then use the Oracle ILOM event logs to identify the faulty memory.
-
Verify that the four disks are visible, online, and numbered from slot 0 to slot 3:
# cd /opt/MegaRAID/MegaCli/
# ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
-
Verify that the hardware logical volume is set up correctly. Look for Virtual Disk 0 as RAID5 with four drives and no hot spares:
[root@db01 ~]# cd /opt/MegaRAID/MegaCli
[root@db01 MegaCli]# ./MegaCli64 -LdInfo -lAll -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :DBSYS
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 1.633 TB
Physical Sector Size: 512
Logical Sector Size : 512
VD has Emulated PD : No
Parity Size : 557.861 GB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 4
Span Depth : 1
. . .
-
Verify that the hardware profile is operating correctly:
# /opt/oracle.SupportTools/CheckHWnFWProfile
[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2
The previous output shows correct operations. However, the following response indicates a problem that you must correct before continuing:
[WARNING] The hardware and firmware are not supported. See details below
[InfinibandHCAPCIeSlotWidth]
Requires: x8
Found: x4
[WARNING] The hardware and firmware are not supported. See details above
Use the --help argument to review the available options, such as obtaining more detailed output.
-
When connected to the first compute server only:
-
Verify the IP address of the first compute server:
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:10:E0:3C:EA:B0
inet addr:172.16.2.44 Bcast:172.16.2.255 Mask:255.255.255.0
inet6 addr: fe80::210:e0ff:fe3c:eab0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7470193 errors:0 dropped:0 overruns:0 frame:0
TX packets:4318201 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:872195171 (831.7 MiB) TX bytes:2444529519 (2.2 GiB)
-
Verify the IP address of the second compute server:
# ibhosts
Ca : 0x0010e0000159c61c ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
Ca : 0x0010e000015a46f0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
Ca : 0x0010e0000159d96c ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
Ca : 0x0010e0000159c51c ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
-
-
Disconnect from the server:
-
First compute server:
exit
-
Second compute server:
logout
-
-
Repeat these steps for the second compute server.
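Two of the preceding checks, memory size and disk state, lend themselves to a scripted comparison. The following Python sketch assumes output with one field per line, as in the samples above. The function names are hypothetical, and the 5 percent tolerance is an assumption that stands in for "the value might vary slightly".

```python
import re

KIB_PER_GIB = 1024 * 1024


def memory_ok(meminfo_text, expected_gib=256, tolerance=0.05):
    """Return True when MemTotal is within the tolerance of the expected size.

    The 5 percent default tolerance is an assumption, not an Oracle-stated value.
    """
    match = re.search(r"MemTotal:\s*(\d+)\s*kB", meminfo_text)
    if match is None:
        raise ValueError("MemTotal not found")
    total_kib = int(match.group(1))
    return total_kib >= expected_gib * KIB_PER_GIB * (1 - tolerance)


def disk_states(pdlist_text):
    """Map slot number -> firmware state from MegaCli64 -Pdlist output."""
    slots = [int(s) for s in re.findall(r"Slot Number:\s*(\d+)", pdlist_text)]
    states = [s.strip() for s in re.findall(r"Firmware state:\s*(.+)", pdlist_text)]
    return dict(zip(slots, states))


def disks_ok(pdlist_text, expected_slots=(0, 1, 2, 3)):
    """Return True when every expected slot is present and online."""
    states = disk_states(pdlist_text)
    all_online = all(state.startswith("Online") for state in states.values())
    return sorted(states) == list(expected_slots) and all_online


sample_disks = """Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up"""

if __name__ == "__main__":
    print("memory ok:", memory_ok("MemTotal: 264232892 kB"))
    print("disks ok:", disks_ok(sample_disks))
```

A failed memory check points to the Oracle ILOM event logs, as described in the memory step. A failed disk check means that a slot is missing or a drive is not online.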
Checking the Health of the Storage Servers
Recovery Appliance X5 and later racks have three to 18 storage servers, and a Recovery Appliance X4 rack has three to 14 storage servers. Begin at the bottom of the rack and check each server.
To check a storage server:
Verifying the RoCE Network Fabric Configuration
This procedure describes how to verify the RoCE Network Fabric configuration.
Verifying the InfiniBand Network Fabric Network
This procedure describes how to verify the InfiniBand Network Fabric network.
-
Visually check all the RDMA Network Fabric cable connections within the rack. The port lights should be on, and the LEDs should be on. Do not press each connector to verify connectivity.
-
Log in as the root user on any component in the rack.
-
Verify the InfiniBand Network Fabric topology using the following commands:
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify-topology [-t rack_size]
The following example shows the output when the network components are correct.
[DB Machine Infiniband Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch......[SUCCESS ]
Are any external switches connected to each other................[SUCCESS ]
Are any hosts connected to spine switch..........................[SUCCESS ]
Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
Leaf switch check:cardinality and even distribution..............[SUCCESS ]
Check if each rack has an valid internal ring....................[SUCCESS ]
In the preceding command, rack_size is the size of the rack. The -t rack_size option is needed if the rack is Recovery Appliance Half Rack or Recovery Appliance Quarter Rack. Use the value halfrack or quarterrack, if needed.
The following example shows the output when there is a bad RDMA Network Fabric switch to cable connection:
# ./verify-topology
[DB Machine Infiniband Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch......[SUCCESS ]
Are any external switches connected to each other................[SUCCESS ]
Are any hosts connected to spine switch..........................[SUCCESS ]
Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
Leaf switch check:cardinality and even distribution..............[SUCCESS ]
Check if each rack has an valid internal ring....................[ERROR ]
Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them.
They should have at least 7 links between them
The following example shows the output when there is a bad RDMA Network Fabric cable on a database server:
# ./verify-topology
[DB Machine Infiniband Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch......[SUCCESS ]
Are any external switches connected to each other................[SUCCESS ]
Are any hosts connected to spine switch..........................[SUCCESS ]
Check if all hosts have 2 CAs to different switches..............[ERROR ]
Node db01 has 1 endpoints.(Should be 2)
Port 2 of this node is not connected to any switch
--------fattree End Point Cabling verification failed-----
Leaf switch check:cardinality and even distribution..............[ERROR ]
Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes
It has only 3 links belonging to compute nodes [SUCCESS ]
Check if each rack has an valid internal ring...................[SUCCESS ]
The following example shows the output when there is a bad connection on the switch and the system:
# ./verify-topology
[DB Machine Infiniband Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch......[SUCCESS ]
Are any external switches connected to each other................[SUCCESS ]
Are any hosts connected to spine switch..........................[SUCCESS ]
Check if all hosts have 2 CAs to different switches..............[ERROR ]
Node burxdb01 has 1 endpoints.(Should be 2)
Port 2 of this node is not connected to any switch
--------fattree End Point Cabling verifation failed-----
Leaf switch check:cardinality and even distribution..............[ERROR ]
Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes
It has only 3 links belonging to compute nodes...................[SUCCESS ]
Check if each rack has an valid internal ring....................[ERROR ]
Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them
They should have at least 7 links between them
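When the utility output is saved or piped, it can be convenient to list only the failed checks. The following is a minimal sketch and not part of the Oracle tooling: the function name report_topology_errors is hypothetical, and it assumes only the [SUCCESS ] and [ERROR ] tag format shown in the preceding examples.

```shell
#!/bin/sh
# Hypothetical helper: reads verify-topology output on stdin, prints each
# line that carries an [ERROR ] tag, and returns non-zero if any were found.
# Assumes the "[ERROR ]" tag format shown in the examples above.
report_topology_errors() {
    awk '
        /\[ERROR *\]/ { print; failed = 1 }
        END { exit failed ? 1 : 0 }
    '
}
```

For example, piping the utility into the function (./verify-topology | report_topology_errors) would print only the failing check lines. The detail lines that follow each failed check are not printed, so the full output is still needed to diagnose the cabling problem.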
Setting the Subnet Manager Master on Oracle Exadata Database Machine Full Rack and Oracle Exadata Database Machine Half Rack
Recovery Appliance X3-2 systems and Recovery Appliance X2-2 systems have three Sun Datacenter InfiniBand Switch 36 switches. Starting with Recovery Appliance X4-2, Recovery Appliance systems have two Sun Datacenter InfiniBand Switch 36 switches.
Note:
This procedure does not apply to Recovery Appliance X8M racks with RoCE Network Fabric.
The switch located in rack unit 1 (U1) is referred to as the spine switch. The other two switches are referred to as the leaf switches. The location of the leaf switches is as follows:
- Recovery Appliance Two-Socket Systems (X3-2 and later): rack unit 20 (U20) and rack unit 22 (U22)
- Recovery Appliance X2-2 racks: rack unit 20 (U20) and rack unit 24 (U24)
- Recovery Appliance Eight-Socket Systems (X2-8 and later) Full Racks: rack unit 21 (U21) and rack unit 23 (U23)
The spine switch is the Subnet Manager Master for the InfiniBand Network Fabric subnet. The Subnet Manager Master has priority 8, and can be verified using the following procedure:
1. Log in to the spine switch as the root user.
2. Run the setsmpriority list command.
   The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:
   a. Use the disablesm command to stop the Subnet Manager.
   b. Use the setsmpriority 8 command to set the priority to 8.
   c. Use the enablesm command to restart the Subnet Manager.
The leaf switches are the Standby Subnet Managers, with a priority of 5. To verify this, use the preceding procedure on each leaf switch, substituting a value of 5 for 8 in the setsmpriority command.
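The decision in the preceding procedure can be sketched as a small helper. This is an illustration only: the function name plan_smpriority_fix is hypothetical, and it prints the switch commands named in the procedure (disablesm, setsmpriority, enablesm) without running them. The current value is assumed to have been read by hand from the setsmpriority list output.

```shell
#!/bin/sh
# Hypothetical helper: given the smpriority value reported by
# "setsmpriority list" and the expected value (8 for the spine switch,
# 5 for a leaf switch), print the commands from the procedure above
# that would correct it. It only prints; it does not run anything.
plan_smpriority_fix() {
    current=$1
    expected=$2
    if [ "$current" -eq "$expected" ]; then
        echo "smpriority is already $expected; no change needed"
        return 0
    fi
    echo "disablesm"
    echo "setsmpriority $expected"
    echo "enablesm"
}
```

For a spine switch that reports a priority of 5, plan_smpriority_fix 5 8 prints the three corrective commands in order. Printing rather than executing keeps the operator in control of when the Subnet Manager is stopped.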
Note:
Recovery Appliance Half Rack with Sun Fire X4170 Oracle Database Servers includes two Sun Datacenter InfiniBand Switch 36 switches, which are set to priority 5.
To determine the Subnet Manager Master, log in as the root user on any InfiniBand Network Fabric switch, and run the getmaster command. The location of the Subnet Manager Master is displayed. The following is an example of the output from the getmaster command:
# getmaster
20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36
QDR switch dm01sw-ib1.example.com enhanced port 0 lid 1 lmc 0
The preceding output shows the proper configuration. The Subnet Manager Master is running on spine switch dm01sw-ib1.example.com.
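The check against the getmaster output can also be scripted. The following is a minimal sketch that assumes the output format of the preceding example, in which the host name follows the words "QDR switch". The function names master_switch_name and spine_is_master are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads getmaster output on stdin and prints the host
# name that follows the words "QDR switch". Assumes the output format of
# the example above; the output may wrap, so lines are joined first.
master_switch_name() {
    paste -s -d ' ' - | sed -n 's/.*QDR switch \([^ ]*\).*/\1/p'
}

# Hypothetical check: succeeds only if the reported master is the spine
# switch whose host name is passed as the first argument.
spine_is_master() {
    [ "$(master_switch_name)" = "$1" ]
}
```

With the example output, getmaster | spine_is_master dm01sw-ib1.example.com would succeed. If the format of the getmaster output differs on a given firmware release, the sed expression would need to be adjusted.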
If the spine switch is not the Subnet Manager Master, then do the following procedure to set the Subnet Manager Master:
1. Use the getmaster command to identify the current location of the Subnet Manager Master.
2. Log in as the root user on the leaf switch that is the Subnet Manager Master.
3. Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.
4. Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not the Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.
5. Enable Subnet Manager on the leaf switches that were disabled during this procedure.
Note:
- If the InfiniBand Network Fabric consists of four or more racks cabled together, then only the spine switches should run Subnet Manager. The leaf switches should have Subnet Manager disabled on them.
- Recovery Appliance Half Racks with Sun Fire X4170 Oracle Database Servers and Recovery Appliance Quarter Racks have two Sun Datacenter InfiniBand Switch 36 switches, and both are set to priority 5. The master is the one with the lowest GUID.
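The selection rule in the preceding note can be illustrated with a short sketch: the highest priority wins, and the lowest GUID breaks a tie. This is an illustration and not a switch command. The function name expected_master and the input layout are hypothetical, and GUIDs are assumed to be written as lowercase hexadecimal strings of equal length.

```shell
#!/bin/sh
# Hypothetical illustration, not a switch command: reads lines of the form
#   <priority> <guid> <name>
# and prints the name of the switch expected to become master, taking the
# highest priority first and the lowest GUID among equal priorities.
# Assumes GUIDs are written as same-length lowercase hex strings, so that
# plain text ordering matches numeric ordering.
expected_master() {
    LC_ALL=C sort -k1,1nr -k2,2 | head -n 1 | awk '{ print $3 }'
}
```

Given two switches at priority 5, the function prints the name listed with the lower GUID. The getmaster command remains the authoritative way to confirm which switch is actually the master.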
See Also:
- "Enable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide
- "Disable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide
- Oracle Exadata Database Machine System Overview for hardware component information
- Cabling tables in Oracle Exadata Database Machine System Overview