10 Configuring a Recovery Appliance Rack
This chapter describes how to configure the hardware components of a Recovery Appliance rack. It contains the following sections:
Note:
The procedures in this chapter use the files generated by Oracle Exadata Deployment Assistant. You must run this utility before doing the procedures in this chapter.
Supporting Auto Service Request
Auto Service Request is an optional component of Recovery Appliance. To configure Recovery Appliance for Auto Service Request, ASR Manager must be installed first.
Prerequisites for Using Auto Service Request
Verify that Auto Service Request was selected for use in Oracle Exadata Deployment Assistant. A Recovery Appliance that uses Auto Service Request cannot also use Oracle Advanced Support Gateway or Oracle Platinum Gateway.
You must know the IP address and the root password of the ASR Manager host.
Checking an Existing ASR Manager Installation
If ASR Manager is already operating at the site, then verify that it is version 4.5 or higher. Otherwise, you must upgrade it.
To obtain the version number of ASR Manager:
-
On a Linux system:
# rpm -qa | grep SUNWswasr
SUNWswasr-2.7-1
-
On a Solaris system:
# pkginfo -l SUNWswasr
PKGINST: SUNWswasr
NAME: SASM ASR Plugin
CATEGORY: application
ARCH: all
VERSION: 2.6
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
. . .
The output from the previous examples indicates that ASR Manager must be upgraded to version 4.5 or higher.
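If you check ASR Manager at several sites, you can script the version comparison. The following is a minimal sketch, not an Oracle-supplied tool. The function names are hypothetical. The sketch assumes that GNU sort with the -V option is available. It also assumes that the SUNWswasr package version reflects the ASR Manager version, as the preceding examples imply.

```shell
#!/usr/bin/env bash
# Sketch only: checks whether an installed ASR Manager meets the 4.5 minimum.
# Function names are hypothetical. Requires GNU sort for the -V option.

MIN_ASR_VERSION="4.5"

# Turn rpm output such as "SUNWswasr-2.7-1" into the version "2.7".
asr_version_from_rpm() {
  local v="${1#SUNWswasr-}"      # drop the package-name prefix
  printf '%s\n' "${v%%-*}"       # drop the trailing release field
}

# Succeed if version $1 is equal to or higher than MIN_ASR_VERSION.
# An empty argument (package not installed) fails.
version_at_least() {
  local lowest
  # sort -V orders version strings; the lower version is printed first.
  lowest=$(printf '%s\n%s\n' "$1" "$MIN_ASR_VERSION" | sort -V | head -n 1)
  [ -n "$1" ] && [ "$lowest" = "$MIN_ASR_VERSION" ]
}

# Example use on Linux:
#   version_at_least "$(asr_version_from_rpm "$(rpm -qa | grep SUNWswasr)")"
# Example use on Solaris (reads the VERSION field of pkginfo -l):
#   version_at_least "$(pkginfo -l SUNWswasr | awk '$1 == "VERSION:" {print $2}')"
```

The script treats a missing package the same as an old one, so either result leads you to the installation or upgrade instructions.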
Installing ASR Manager
If ASR Manager is not already installed, then follow the instructions in Setting Up Auto Service Request. After you register ASR Manager with the Oracle ASR back end, return to these instructions for configuring Recovery Appliance.
Installing the Tape Hardware
Oracle Secure Backup tape backup is an option for Recovery Appliance. You must install the QLogic QLE8362 fiber cards and transceivers on site; they are not factory installed.
The QLogic fiber cards are shipped from Oracle as ride-alongs with the rack. The transceivers are shipped directly from the supplier.
To install the tape networking hardware:
See Also:
My Oracle Support Doc ID 1592317.1 for full instructions about replacing a PCIe card
Connecting Recovery Appliance to Your Networks
Before configuring the individual devices in the Recovery Appliance rack, ensure that there are no IP address conflicts between the factory settings for the rack and the existing network.
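One way to look for conflicts is to probe each factory-default address from a host that is already on the existing network. The following sketch makes several assumptions:

- It runs on a Linux host with the iputils ping command.
- The addresses 192.168.1.108 and 192.168.1.109 appear in the compute server examples later in this chapter. The complete list of factory addresses for your rack is not given here, so FACTORY_IPS is only a hypothetical example.
- The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: probes addresses from a host on the existing network before the
# rack is connected. A reply means the address is already in use.
# The address list is a hypothetical example; build the real list from the
# factory settings for your rack.

FACTORY_IPS="192.168.1.108 192.168.1.109"

# One ICMP echo request with a 1-second timeout (Linux iputils ping options).
probe_ip() {
  ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# Print every address that answers; succeed only if none answered.
find_conflicts() {
  local ip conflicts=0
  for ip in "$@"; do
    if probe_ip "$ip"; then
      echo "conflict: $ip"
      conflicts=$((conflicts + 1))
    fi
  done
  [ "$conflicts" -eq 0 ]
}

# Example use (word splitting of FACTORY_IPS is intended):
#   find_conflicts $FACTORY_IPS && echo "No replies from factory addresses"
```

A missing reply does not prove that an address is free, because hosts and firewalls can drop ICMP. Confirm the results with the network administrator.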
To prepare the Recovery Appliance rack for configuration:
See Also:
Configuring the InfiniBand Switches
The two Sun Datacenter InfiniBand Switch 36 leaf switches are identified in Recovery Appliance as iba and ibb. Complete these configuration procedures for both switches:
Configuring an InfiniBand Switch
The default identifier for leaf switch 1 in U20 is iba, and for leaf switch 2 in U22 is ibb.
To configure a Sun Datacenter InfiniBand Switch 36 switch:
Setting the Serial Number on a Spine Switch
In a multirack configuration, set the rack master serial number in the ILOM of the spine switch. Skip this procedure when configuring the leaf switches.
To set the serial number on the spine switch:
Checking the Health of an InfiniBand Switch
To check the health of an InfiniBand switch:
-
Open the fabric management shell:
-> show /SYS/Fabric_Mgmt
NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
User can execute switch diagnosis, SM Configuration and IB
monitoring commands in the shell. To view the list of commands,
use "help" at rsh prompt.
Use exit command at rsh prompt to revert back to
ILOM shell.
FabMan@hostname->
The prompt changes from -> to FabMan@hostname->
-
Check the general health of the switch:
FabMan@ra1sw-iba-> showunhealthy
OK - No unhealthy sensors
-
Check the general environment.
FabMan@ra1sw-iba-> env_test
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.42 V
Measured 12V = 12.06 V
. . .
The report should show that fans 1, 2, and 3 are present, and fans 0 and 4 are not present. All OK and Passed results indicate that the environment is normal.
-
Determine the current InfiniBand subnet manager priority of the switch. Leaf switches must have an smpriority of 5, and spine switches must have an smpriority of 8. The sample output shown here indicates the correct priority for a leaf switch.
FabMan@ra1sw-iba-> setsmpriority list
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
-
If the priority setting is incorrect, then reset it:
-
Disable the subnet manager:
FabMan@ra1sw-iba-> disablesm
Stopping partitiond daemon. [ OK ]
Stopping IB Subnet Manager.. [ OK ]
-
Reset the priority. This example sets the priority on a leaf switch:
FabMan@ra1sw-iba-> setsmpriority 5
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
-
Restart the subnet manager:
FabMan@ra1sw-iba-> enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond daemon. [ OK ]
-
-
Log out of the Fabric Management shell and the Oracle ILOM shell:
FabMan@ra1sw-iba-> exit
-> exit
-
Log in to Linux as root and restart the switch:
localhost: root
password: welcome1
[root@localhost ~]# reboot
-
Disconnect your laptop from the InfiniBand switch.
-
Repeat these procedures for the second InfiniBand leaf switch.
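If you save the FabMan session output on your laptop, you can apply the priority and sensor checks the same way to each switch. The following is a minimal sketch that parses text in the format of the sample output in this procedure. It assumes you capture that output with your terminal emulator. It does not run on the switch. The expected strings come only from the samples shown, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: checks text captured from the FabMan shell (for example, saved
# from your terminal emulator's log) against the values in this procedure.
# Function names are hypothetical.

# Print the value that follows the "smpriority" keyword.
smpriority_from_output() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i < NF; i++)
      if ($i == "smpriority") { print $(i + 1); exit }
  }'
}

# Leaf switches use priority 5; spine switches use priority 8.
expected_smpriority() {
  case "$1" in
    leaf)  echo 5 ;;
    spine) echo 8 ;;
    *)     return 1 ;;
  esac
}

# $1 = leaf or spine, $2 = captured "setsmpriority list" output.
smpriority_ok() {
  local want got
  want=$(expected_smpriority "$1") || return 1
  got=$(smpriority_from_output "$2")
  [ "$got" = "$want" ]
}

# $1 = captured "showunhealthy" output.
switch_healthy() {
  printf '%s\n' "$1" | grep -q 'OK - No unhealthy sensors'
}

# Example use with a hypothetical capture file:
#   smpriority_ok leaf "$(cat iba-smpriority.txt)" || echo "Reset the priority"
```

The role is passed explicitly so that the same function also verifies a spine switch, which must report priority 8.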
Setting a Spine Switch as the Subnet Manager Master
The InfiniBand switch located in rack unit 1 (U1) is the spine switch. Recovery Appliance has a spine switch only when it is connected to another Recovery Appliance. It is not included as a basic component of the rack.
Perform these steps after the racks are cabled together.
The spine switch is the Subnet Manager Master for the InfiniBand subnet. The Subnet Manager Master has priority 8.
To verify the priority setting of the spine switch:
-
Log in to the spine switch as the root user.
-
Run the setsmpriority list command. The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:
-
Use the disablesm command to stop the Subnet Manager.
-
Use the setsmpriority 8 command to set the priority to 8.
-
Use the enablesm command to restart the Subnet Manager.
-
The other two InfiniBand switches are the leaf switches. The leaf switches are located in rack units 20 and 22 (U20 and U22). They are the Standby Subnet Managers with a priority of 5. You can verify their status using the preceding procedure, substituting a value of 5 for 8 in the setsmpriority command.
To determine the Subnet Manager Master:
-
Log in as the root user on any InfiniBand switch.
-
Display the location of the Subnet Manager Master.
# getmaster
20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36 QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0
The preceding output shows the proper configuration. The Subnet Manager Master is running on the spine switch ra01sw-ib1.example.com.
If the spine switch is not the Subnet Manager Master, then reset the Subnet Manager Master:
-
Use the getmaster command to identify the current location of the Subnet Manager Master.
-
Log in as the root user on the leaf switch that is the Subnet Manager Master.
-
Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.
See Also:
"Disable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61813698.html#z40003f12047367
-
Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not the Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.
-
Enable Subnet Manager on the leaf switches that were disabled during this procedure.
See Also:
"Enable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61707660.html#z40003f12047359
Note:
If the InfiniBand network consists of four or more racks cabled together, then only the spine switches run Subnet Manager. Disable the Subnet Manager on the leaf switches.
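The check in this section comes down to comparing the host name in the getmaster output with the name of the spine switch. The following minimal sketch performs that comparison. It assumes the single-line output format shown in the example above, in which the host name follows the lowercase word "switch". The function names are hypothetical. This document does not state whether the switch's root shell provides bash and awk. If it does not, run the script on a workstation against saved getmaster output.

```shell
#!/usr/bin/env bash
# Sketch only: reads getmaster output and reports which switch is the
# Subnet Manager Master. Assumes the one-line format of the sample output,
# where the host name follows the lowercase word "switch".
# Function names are hypothetical.

# Print the host name of the switch running the OpenSM master.
sm_master_host() {
  printf '%s\n' "$1" | awk '/OpenSM Master on Switch/ {
    for (i = 1; i < NF; i++)
      if ($i == "switch") { print $(i + 1); exit }
  }'
}

# $1 = expected spine switch host name, $2 = getmaster output.
spine_is_master() {
  [ "$(sm_master_host "$2")" = "$1" ]
}

# Example use as root, with the spine host name from the sample output:
#   spine_is_master ra01sw-ib1.example.com "$(getmaster)" || echo "Reset the master"
```

Because awk matches the exact word "switch", the uppercase "Switch" earlier on the line is skipped.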
Configuring the Cisco Ethernet Switch
The Cisco Catalyst 4948 Ethernet switch supplied with Recovery Appliance has IPBASEK9-MZ firmware. The switch is minimally configured during installation. These procedures configure the Cisco Ethernet switch into one large virtual LAN.
The Cisco Ethernet switch configuration consists of these topics and procedures:
Scope of the Configuration
The minimal configuration disables IP routing and sets the following:
-
Host name
-
IP address
-
Subnet mask
-
Default gateway
-
Domain name
-
Name server
-
NTP server
-
Time
-
Time zone
Prerequisites for Configuring the Ethernet Switch
To avoid disrupting the customer network, observe these prerequisites:
-
Do not connect the Cisco Ethernet switch until the network administrator has verified the running configuration and has made any necessary changes.
-
Do not connect the Cisco Ethernet switch until the IP addresses on all components of Recovery Appliance are configured. This sequence prevents any duplicate IP address conflicts, which might occur with the factory settings.
-
Configure the Cisco Ethernet switch with the network administrator.
Configuring the Ethernet Switch on the Data Center Network
The following procedure describes how to configure the Cisco Ethernet switch. Configuration should be done with the network administrator.
Disabling Telnet Connections
Telnet is not secure and should not be enabled unless there is a compelling reason. To enable Telnet, set a password. To disable it, remove the password.
To disable Telnet connections:
Configuring SSH on the Ethernet Switch
To configure a secure shell (SSH) on the Ethernet switch:
-
Enter the commands shown in this example:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# crypto key generate rsa
% You already have RSA keys defined named ra1sw-ip.example.com.
% Do you really want to replace them? [yes/no]: yes
Choose the size of the key modulus in the range of 360 to 2048 for your
General Purpose Keys. Choosing a key modulus greater than 512 may take
a few minutes.
How many bits in the modulus [512]: 768
% Generating 768 bit RSA keys, keys will be non-exportable...[OK]
ra1sw-ip(config)# username admin password 0 welcome1
ra1sw-ip(config)# line vty 0 15
ra1sw-ip(config-line)# transport input ssh
ra1sw-ip(config-line)# exit
ra1sw-ip(config)# aaa new-model
ra1sw-ip(config)# ip ssh time-out 60
ra1sw-ip(config)# ip ssh authentication-retries 3
ra1sw-ip(config)# ip ssh version 2
ra1sw-ip(config)# end
*Sep 15 14:26:37.045: %SYS-5-CONFIG_I: Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2603 bytes to 1158 bytes[OK]
Setting the Clock and Time Zone on the Ethernet Switch
To set the time on the Cisco Ethernet switch:
Disabling Spanning Tree on the Ethernet Switch
Spanning tree is enabled by default on Cisco switches. Adding a switch with spanning tree enabled to the network might cause network problems. As a precaution, you can disable spanning tree on the uplink port VLAN before connecting the switch to the network. Alternatively, you can enable spanning tree protocol with specific protocol settings either before or after connecting to the network.
To disable spanning tree on the uplink port VLAN:
To re-enable spanning tree protocol with the default protocol settings:
-
Use the commands shown in this example:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# spanning-tree vlan 1
ra1sw-ip(config)# end
ra1sw-ip# write memory
See Also:
Cisco Switch Configuration Guide to enable spanning tree protocol with the specific protocol settings required by the data center Ethernet network
Configuring the Power Distribution Units
The power distribution units (PDUs) are configured with static IP addresses to connect to the network for monitoring.
Checking the Health of the Compute Servers
To check the two compute servers in U16 and U17:
-
Power on both compute servers if they are not already on, and wait while they initialize the BIOS and load the Linux operating system.
-
Use a serial cable to connect your laptop to the first compute server's serial MGT port.
-
Configure your laptop's terminal emulator to use these settings:
-
9600 baud
-
8 bit
-
1 stop bit
-
No parity bit
-
No handshake
-
No flow control
-
-
Log in as the root user with the welcome1 password.
-
On the first compute server (which is connected to your laptop), open the Oracle ILOM console, and then log in:
-> start /SP/console
-
On the second compute server, use SSH to log in. The default factory IP address is 192.168.1.109.
-
-
Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.
# ipmitool sunoem cli "show /System" | grep serial
serial_number = AK12345678
component_serial_number = 1234NM567H
-
Verify that the model and rack serial numbers are set correctly:
# ipmitool sunoem cli "show /System" | grep model
model = ZDLRA X5
# ipmitool sunoem cli "show /System" | grep ident
system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK12345678
-
Verify that the management network is working:
# ethtool eth0 | grep det
Link detected: yes
-
Verify that the ILOM management network is working:
# ipmitool sunoem cli 'show /SP/network' | grep ipadd
ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108
-
Verify that Oracle ILOM can detect the optional QLogic PCIe cards, if they are installed:
# ipmitool sunoem cli "show /System/PCI_Devices/Add-on/Device_1"
Connected. Use ^D to exit.
-> show /System/PCI_Devices/Add-on/Device_1
 /System/PCI_Devices/Add-on/Device_1
    Targets:
    Properties:
        part_number = 7101674
        description = Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Qlogic
        location = PCIE1 (PCIe Slot 1)
        pci_vendor_id = 0x1077
        pci_device_id = 0x2031
        pci_subvendor_id = 0x1077
        pci_subdevice_id = 0x024d
    Commands:
        cd
        show
-> Session closed
Disconnected
See "Installing the Tape Hardware" for information about the QLogic PCIe cards.
-
Verify that all memory is present (256 GB):
# grep MemTotal /proc/meminfo
MemTotal: 264232892 kB
The value might vary slightly, depending on the BIOS version. However, if the value is smaller, then use the Oracle ILOM event logs to identify the faulty memory.
-
Verify that the four disks are visible, online, and numbered from slot 0 to slot 3:
# cd /opt/MegaRAID/MegaCli/
# ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
-
Verify that the hardware logical volume is set up correctly. Look for Virtual Disk 0 as RAID5 with four drives and no hot spares:
[root@db01 ~]# cd /opt/MegaRAID/MegaCli
[root@db01 MegaCli]# ./MegaCli64 -LdInfo -lAll -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :DBSYS
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 1.633 TB
Physical Sector Size: 512
Logical Sector Size : 512
VD has Emulated PD : No
Parity Size : 557.861 GB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 4
Span Depth : 1
. . .
-
Verify that the hardware profile is operating correctly:
# /opt/oracle.SupportTools/CheckHWnFWProfile
[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2
The previous output shows correct operation. However, the following response indicates a problem that you must correct before continuing:
[WARNING] The hardware and firmware are not supported. See details below
[InfinibandHCAPCIeSlotWidth]
Requires: x8
Found: x4
[WARNING] The hardware and firmware are not supported. See details above
Use the --help argument to review the available options, such as obtaining more detailed output.
-
When connected to the first compute server only:
-
Verify the IP address of the first compute server:
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:10:E0:3C:EA:B0
          inet addr:172.16.2.44  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::210:e0ff:fe3c:eab0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7470193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4318201 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:872195171 (831.7 MiB)  TX bytes:2444529519 (2.2 GiB)
-
Verify the IP address of the second compute server:
# ibhosts
Ca : 0x0010e0000159c61c ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
Ca : 0x0010e000015a46f0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
Ca : 0x0010e0000159d96c ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
Ca : 0x0010e0000159c51c ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
-
-
Disconnect from the server:
-
First compute server:
exit
-
Second compute server:
logout
-
-
Repeat these steps for the second compute server.
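You can collect the pass or fail decisions for the memory, disk, and hardware profile checks in one script, so that both compute servers are judged the same way. The following is a minimal sketch, not an Oracle-supplied tool. The function names and the 3% memory tolerance are assumptions. The parsing relies only on the output formats shown in the preceding steps.

```shell
#!/usr/bin/env bash
# Sketch only: evaluates captured output from three of the compute server
# checks. The 256 GB and four-disk targets come from this procedure; the
# 3% memory tolerance is a hypothetical allowance for BIOS differences.

EXPECTED_MEM_KB=$((256 * 1024 * 1024))   # 256 GB in kB, as /proc/meminfo reports it
MEM_TOLERANCE_PCT=3
EXPECTED_DISKS=4

# $1 = /proc/meminfo text. Succeed if MemTotal is within the tolerance.
memory_ok() {
  local total min
  total=$(printf '%s\n' "$1" | awk '$1 == "MemTotal:" { print $2 }')
  [ -n "$total" ] || return 1
  min=$((EXPECTED_MEM_KB * (100 - MEM_TOLERANCE_PCT) / 100))
  [ "$total" -ge "$min" ]
}

# $1 = MegaCli64 -Pdlist output filtered to Slot and Firmware state lines.
# Succeed if slots 0 to EXPECTED_DISKS-1 are listed and all drives are online.
disks_ok() {
  local online slots want
  online=$(printf '%s\n' "$1" | grep -c 'Firmware state: Online')
  slots=$(printf '%s\n' "$1" | awk -F': ' '/^Slot Number/ { printf "%s ", $2 }')
  want=$(seq 0 $((EXPECTED_DISKS - 1)) | tr '\n' ' ')
  [ "$online" -eq "$EXPECTED_DISKS" ] && [ "$slots" = "$want" ]
}

# $1 = CheckHWnFWProfile output. Succeed only on a [SUCCESS] result.
profile_ok() {
  printf '%s\n' "$1" | grep -q '^\[SUCCESS\]'
}

# Example use on a compute server:
#   memory_ok "$(cat /proc/meminfo)"
#   disks_ok "$(/opt/MegaRAID/MegaCli/MegaCli64 -Pdlist -a0 | grep 'Slot\|Firmware state')"
#   profile_ok "$(/opt/oracle.SupportTools/CheckHWnFWProfile)"
```

A failing memory check still requires the Oracle ILOM event logs to identify the faulty memory. The script only flags that the total is low.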
Checking the Health of the Storage Servers
A Recovery Appliance X5 rack has three to 18 storage servers, and a Recovery Appliance X4 rack has three to 14 storage servers. Begin at the bottom of the rack and check each server.
To check a storage server:
Configuring the IP Addresses of the Servers
Before installing Recovery Appliance software, you must run a script to configure the compute and storage servers with proper IP addresses. Otherwise, the install.sh script will fail when it tries to configure the networks.
Follow the instructions in My Oracle Support Doc ID 1953915.1 to configure the IP addresses.