This chapter describes how to configure the hardware components of a Recovery Appliance rack. It contains the following sections:
Note:
The procedures in this chapter use the files generated by Oracle Exadata Deployment Assistant. You must run this utility before performing the procedures in this chapter.
Auto Service Request is an optional component of Recovery Appliance. To configure Recovery Appliance for Auto Service Request, ASR Manager must be installed first.
Verify that Auto Service Request was selected for use in Oracle Exadata Deployment Assistant. Recovery Appliance cannot be used simultaneously with Oracle Advanced Support Gateway or Oracle Platinum Gateway.
You must know the IP address and the root password of the ASR Manager host.
If ASR Manager is already operating at the site, then verify that it is version 4.5 or higher. If it is an earlier version, then you must upgrade it.
To obtain the version number of ASR Manager:
On a Linux system:
# rpm -qa | grep SUNWswasr
SUNWswasr-2.7-1
On a Solaris system:
# pkginfo -l SUNWswasr
PKGINST: SUNWswasr
NAME: SASM ASR Plugin
CATEGORY: application
ARCH: all
VERSION: 2.6
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
.
.
.
The output from the previous examples indicates that ASR Manager must be updated to 4.5 or higher.
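A version string from the rpm or pkginfo output can be compared against the 4.5 minimum mechanically. The following is an illustrative sketch, not part of the appliance tooling; the version_ok helper and the sample version strings are assumptions for demonstration.

```shell
# Illustrative helper (not an Oracle tool): succeed when an ASR Manager
# version string meets the 4.5 minimum.
version_ok() {
  ver="${1%%-*}"          # strip any release suffix, e.g. "2.7-1" -> "2.7"
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  if [ "$major" -gt 4 ]; then
    return 0
  elif [ "$major" -eq 4 ] && [ "$minor" -ge 5 ]; then
    return 0
  fi
  return 1
}

version_ok 2.7-1 && echo "OK" || echo "upgrade required"
version_ok 4.5   && echo "OK" || echo "upgrade required"
```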
If ASR Manager is not already installed, then follow the instructions in Setting Up Auto Service Request. After you register ASR Manager with the Oracle ASR back end, return to these instructions for configuring Recovery Appliance.
Oracle Secure Backup tape backup is an option for Recovery Appliance. You must install the QLogic ZLE8362 fiber cards and transceivers on site; they are not factory installed.
The QLogic fiber cards are shipped from Oracle as ride-alongs with the rack. The transceivers are shipped directly from the supplier.
To install the tape networking hardware:
See Also:
My Oracle Support Doc ID 1592317.1 for full instructions about replacing a PCIe card
Before configuring the individual devices in the Recovery Appliance rack, ensure that there are no IP address conflicts between the factory settings for the rack and the existing network.
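The conflict check can be rehearsed offline by comparing the planned site addresses against the rack's factory defaults before anything is cabled. Both address lists below are illustrative assumptions; substitute the actual assignments generated by Oracle Exadata Deployment Assistant.

```shell
# Illustrative sketch: flag planned site addresses that collide with the
# rack's factory defaults. Both lists are assumed example values.
factory_ips="192.168.1.108 192.168.1.109"
planned_ips="10.20.30.40 192.168.1.109"

for ip in $factory_ips; do
  case " $planned_ips " in
    *" $ip "*) echo "conflict: $ip" ;;
  esac
done
```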
To prepare the Recovery Appliance rack for configuration:
See Also:
The two Sun Datacenter InfiniBand Switch 36 leaf switches are identified in Recovery Appliance as iba and ibb. Complete these configuration procedures for both switches:
The default identifier for leaf switch 1 in U20 is iba, and for leaf switch 2 in U22 is ibb.
To configure a Sun Datacenter InfiniBand Switch 36 switch:
In a multirack configuration, set the rack master serial number in the ILOM of the spine switch. Skip this procedure when configuring the leaf switches.
To set the serial number on the spine switch:
To check the health of an InfiniBand switch:
Open the fabric management shell:
-> show /SYS/Fabric_Mgmt
NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
User can execute switch diagnosis, SM Configuration and IB monitoring commands in the shell.
To view the list of commands, use "help" at rsh prompt.
Use exit command at rsh prompt to revert back to ILOM shell.
FabMan@hostname->
The prompt changes from -> to FabMan@hostname->.
Check the general health of the switch:
FabMan@ra1sw-iba-> showunhealthy
OK - No unhealthy sensors
Check the general environment:
FabMan@ra1sw-iba-> env_test
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.42 V
Measured 12V = 12.06 V
.
.
.
The report should show that fans 1, 2, and 3 are present, and fans 0 and 4 are not present. All OK and Passed results indicate that the environment is normal.
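When the env_test output is captured to a file for review, a simple scan can flag anything other than clean results. The transcript below is an abbreviated, assumed sample, and the failure keywords are an assumption about the report wording; this is a sketch, not part of the switch firmware.

```shell
# Illustrative: scan a saved env_test transcript for failure keywords.
# The sample transcript and the keyword list are assumptions.
cat > /tmp/env_test.log <<'EOF'
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Fan 1 present OK
Environment test PASSED
EOF

if grep -Eiq 'fail(ed)?|error|not ok' /tmp/env_test.log; then
  echo "review required"
else
  echo "environment clean"
fi
```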
Determine the current InfiniBand subnet manager priority of the switch. Leaf switches must have an smpriority of 5, and spine switches must have an smpriority of 8. The sample output shown here indicates the correct priority for a leaf switch.
FabMan@ra1sw-iba-> setsmpriority list
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
If the priority setting is incorrect, then reset it:
Disable the subnet manager:
FabMan@ra1sw-iba->disablesm
Stopping partitiond daemon. [ OK ]
Stopping IB Subnet Manager.. [ OK ]
Reset the priority. This example sets the priority on a leaf switch:
FabMan@ra1sw-iba->setsmpriority 5
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
Restart the subnet manager:
FabMan@ra1sw-iba->enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond daemon. [ OK ]
Log out of the Fabric Management shell and the Oracle ILOM shell:
FabMan@ra1sw-iba-> exit
-> exit
Log in to Linux as root and restart the switch:
localhost: root
password: welcome1
[root@localhost ~]# reboot
Disconnect your laptop from the InfiniBand switch.
Repeat these procedures for the second InfiniBand leaf switch.
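The smpriority check in the preceding procedure can be scripted against captured "setsmpriority list" output, with 5 expected on a leaf switch and 8 on the spine switch. The parsing below is an illustrative sketch over the sample output shown earlier, not an Oracle-supplied tool.

```shell
# Illustrative: extract smpriority from captured "setsmpriority list" output.
out="Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000"

pri=$(printf '%s\n' "$out" | awk '$1 == "smpriority" {print $2}')
if [ "$pri" = 5 ]; then
  echo "leaf priority correct"
else
  echo "reset required (found smpriority=$pri)"
fi
```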
The InfiniBand switch located in rack unit 1 (U1) is the spine switch. Recovery Appliance has a spine switch only when it is connected to another Recovery Appliance. It is not included as a basic component of the rack.
Perform these steps after the racks are cabled together.
The spine switch is the Subnet Manager Master for the InfiniBand subnet. The Subnet Manager Master has priority 8.
To verify the priority setting of the spine switch:
Log in to the spine switch as the root user.
Run the setsmpriority list command.
The command should show that smpriority has a value of 8. If smpriority has a different value, then do the following:
Use the disablesm command to stop the Subnet Manager.
Use the setsmpriority 8 command to set the priority to 8.
Use the enablesm command to restart the Subnet Manager.
The other two InfiniBand switches are the leaf switches. The leaf switches are located in rack units 20 and 22 (U20 and U22). They are the Standby Subnet Managers with a priority of 5. You can verify the status using the preceding procedure, substituting a value of 5 in the command shown in step 22.b.
To determine the Subnet Manager Master:
Log in as the root user on any InfiniBand switch.
Display the location of the Subnet Manager Master.
# getmaster
20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36
QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0
The preceding output shows the proper configuration. The Subnet Manager Master is running on spine switch ra01sw-ib1.example.com.
If the spine switch is not the Subnet Manager Master, then reset the Subnet Manager Master:
Use the getmaster command to identify the current location of the Subnet Manager Master.
Log in as the root user on the leaf switch that is the Subnet Manager Master.
Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.
See Also:
"Disable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61813698.html#z40003f12047367
Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not the Subnet Manager Master, then repeat steps 2 and 3 until the spine switch is the Subnet Manager Master.
Enable Subnet Manager on the leaf switches that were disabled during this procedure.
See Also:
"Enable the Subnet Manager" in Sun Datacenter InfiniBand Switch 36 User's Guide at
http://docs.oracle.com/cd/E19197-01/835-0784-05/z4001de61707660.html#z40003f12047359
Note:
If the InfiniBand network consists of four or more racks cabled together, then only the spine switches run Subnet Manager. Disable the Subnet Manager on the leaf switches.
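The getmaster verification earlier in this section can also be scripted against a captured output line. The hostname and the sample line below come from the example output; the pattern match itself is an illustrative sketch.

```shell
# Illustrative: confirm the Subnet Manager Master is on the spine switch
# by matching the expected spine hostname against a captured getmaster line.
master_line='OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36 QDR switch ra01sw-ib1.example.com enhanced port 0 lid 1 lmc 0'
spine_host='ra01sw-ib1.example.com'    # spine hostname from the sample output

case "$master_line" in
  *"$spine_host"*) echo "spine is Subnet Manager Master" ;;
  *)               echo "relocation needed" ;;
esac
```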
The Cisco Catalyst 4948 Ethernet switch supplied with Recovery Appliance has IPBASEK9-MZ firmware. The switch is minimally configured during installation. These procedures configure the Cisco Ethernet switch into one large virtual LAN.
The Cisco Ethernet switch configuration consists of these topics and procedures:
The minimal configuration disables IP routing, and sets the following:
Host name
IP address
Subnet mask
Default gateway
Domain name
Name server
NTP server
Time
Time zone
To avoid disrupting the customer network, observe these prerequisites:
Do not connect the Cisco Ethernet switch until the network administrator has verified the running configuration and has made any necessary changes.
Do not connect the Cisco Ethernet switch until the IP addresses on all components of Recovery Appliance are configured. This sequence prevents any duplicate IP address conflicts, which might occur with the factory settings.
Configure the Cisco Ethernet switch with the network administrator.
The following procedure describes how to configure the Cisco Ethernet switch. Configuration should be done with the network administrator.
Telnet is not secure, and should not be enabled unless there is a compelling reason. To enable telnet, set a password. To disable it, remove the password.
To disable Telnet connections:
To configure a secure shell (SSH) on the Ethernet switch:
Enter the commands shown in this example:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# crypto key generate rsa
% You already have RSA keys defined named ra1sw-ip.example.com.
% Do you really want to replace them? [yes/no]: yes
Choose the size of the key modulus in the range of 360 to 2048 for your General Purpose Keys.
Choosing a key modulus greater than 512 may take a few minutes.
How many bits in the modulus [512]: 768
% Generating 768 bit RSA keys, keys will be non-exportable...[OK]
ra1sw-ip(config)# username admin password 0 welcome1
ra1sw-ip(config)# line vty 0 15
ra1sw-ip(config-line)# transport input ssh
ra1sw-ip(config-line)# exit
ra1sw-ip(config)# aaa new-model
ra1sw-ip(config)# ip ssh time-out 60
ra1sw-ip(config)# ip ssh authentication-retries 3
ra1sw-ip(config)# ip ssh version 2
ra1sw-ip(config)# end
*Sep 15 14:26:37.045: %SYS-5-CONFIG_I: Configured from console by console
ra1sw-ip# write memory
Building configuration...
Compressed configuration from 2603 bytes to 1158 bytes[OK]
To set the time on the Cisco Ethernet switch:
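The exact steps depend on the site's NTP server and time zone. A typical Cisco IOS sequence looks like the following sketch; the NTP server address, time zone, and timestamp are placeholders, and the commands should be confirmed with the network administrator before use.

```
ra1sw-ip# configure terminal
ra1sw-ip(config)# ntp server 10.10.10.1
ra1sw-ip(config)# clock timezone PST -8
ra1sw-ip(config)# end
ra1sw-ip# clock set 14:00:00 15 September 2015
ra1sw-ip# write memory
```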
Spanning tree is enabled by default on Cisco switches. If you add a switch with spanning tree enabled to the network, then you might cause network problems. As a precaution, you can disable spanning tree from the uplink port VLAN before connecting the switch to the network. Alternatively, you can turn on spanning tree protocol with specific protocol settings either before or after connecting to the network.
To disable spanning tree on the uplink port VLAN:
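A sketch of the corresponding IOS commands, mirroring the re-enable example in this section with the negated form; confirm with the network administrator before applying:

```
ra1sw-ip# configure terminal
ra1sw-ip(config)# no spanning-tree vlan 1
ra1sw-ip(config)# end
ra1sw-ip# write memory
```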
To re-enable spanning tree protocol with the default protocol settings:
Use the commands shown in this example:
ra1sw-ip# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
ra1sw-ip(config)# spanning-tree vlan 1
ra1sw-ip(config)# end
ra1sw-ip# write memory
See Also:
Cisco Switch Configuration Guide to enable spanning tree protocol with the specific protocol settings required by the data center Ethernet network
The power distribution units (PDUs) are configured with static IP addresses to connect to the network for monitoring.
To check the two compute servers in U16 and U17:
Power on both compute servers if they are not already running, and wait while they initialize the BIOS and load the Linux operating system.
Use a serial cable to connect your laptop to the first compute server's serial MGT port.
Configure your laptop's terminal emulator to use these settings:
9600 baud
8 bit
1 stop bit
No parity bit
No handshake
No flow control
Log in as the root user with the welcome1 password.
On the first compute server (which is connected to your laptop), open the Oracle ILOM console, and then log in:
-> start /SP/console
On the second compute server, use SSH to log in. The default factory IP address is 192.168.1.109.
Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.
# ipmitool sunoem cli "show /System" | grep serial
serial_number = AK12345678
component_serial_number = 1234NM567H
Verify that the model and rack serial numbers are set correctly:
# ipmitool sunoem cli "show /System" | grep model
model = ZDLRA X5
# ipmitool sunoem cli "show /System" | grep ident
system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK12345678
Verify that the management network is working:
# ethtool eth0 | grep det
Link detected: yes
Verify that the ILOM management network is working:
# ipmitool sunoem cli 'show /SP/network' | grep ipadd
ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108
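The ILOM output can also be checked for a pending address change that has not yet taken effect, by comparing the two values. The parsing below is an illustrative sketch over the sample output, not an Oracle-supplied check.

```shell
# Illustrative: compare active and pending ILOM addresses from the sample output.
out="ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108"

active=$(printf '%s\n' "$out" | awk '$1 == "ipaddress" {print $3}')
pending=$(printf '%s\n' "$out" | awk '$1 == "pendingipaddress" {print $3}')

if [ "$active" = "$pending" ]; then
  echo "no pending address change"
else
  echo "pending change to $pending"
fi
```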
Verify that Oracle ILOM can detect the optional QLogic PCIe cards, if they are installed:
# ipmitool sunoem cli "show /System/PCI_Devices/Add-on/Device_1"
Connected. Use ^D to exit.
-> show /System/PCI_Devices/Add-on/Device_1
/System/PCI_Devices/Add-on/Device_1
Targets:
Properties:
part_number = 7101674
description = Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA,
Qlogic
location = PCIE1 (PCIe Slot 1)
pci_vendor_id = 0x1077
pci_device_id = 0x2031
pci_subvendor_id = 0x1077
pci_subdevice_id = 0x024d
Commands:
cd
show
-> Session closed
Disconnected
See "Installing the Tape Hardware" for information about the QLogic PCIe cards.
Verify that all memory is present (256 GB):
# grep MemTotal /proc/meminfo
MemTotal: 264232892 kB
The value might vary slightly, depending on the BIOS version. However, if the value is smaller, then use the Oracle ILOM event logs to identify the faulty memory.
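The MemTotal figure can be compared against a tolerance somewhat below the nominal 256 GB to allow for BIOS-reserved memory. The 2% tolerance used here is an assumption for illustration, not an Oracle-specified threshold.

```shell
# Illustrative: flag MemTotal well below the nominal 256 GB (values in kB).
memtotal_kb=264232892                       # sample value from /proc/meminfo
nominal_kb=$((256 * 1024 * 1024))           # 268435456 kB
threshold_kb=$((nominal_kb * 98 / 100))     # assumed 2% BIOS tolerance

if [ "$memtotal_kb" -ge "$threshold_kb" ]; then
  echo "memory OK"
else
  echo "memory low: check Oracle ILOM event logs"
fi
```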
Verify that the four disks are visible, online, and numbered from slot 0 to slot 3:
# cd /opt/MegaRAID/MegaCli/
# ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
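The disk states can be verified mechanically by counting the Online entries in captured MegaCli output. The captured text below repeats the sample output; the counting logic is an illustrative sketch.

```shell
# Illustrative: count disks reporting Online in captured MegaCli output.
out="Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up"

online=$(printf '%s\n' "$out" | grep -c '^Firmware state: Online')
if [ "$online" -eq 4 ]; then
  echo "all 4 disks online"
else
  echo "only $online disks online: investigate"
fi
```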
Verify that the hardware logical volume is set up correctly. Look for Virtual Disk 0 as RAID5 with four drives and no hot spares:
[root@db01 ~]# cd /opt/MegaRAID/MegaCli
[root@db01 MegaCli]# ./MegaCli64 -LdInfo -lAll -a0

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :DBSYS
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 1.633 TB
Physical Sector Size: 512
Logical Sector Size : 512
VD has Emulated PD  : No
Parity Size         : 557.861 GB
State               : Optimal
Strip Size          : 1.0 MB
Number Of Drives    : 4
Span Depth          : 1
.
.
.
Verify that the hardware profile is operating correctly:
# /opt/oracle.SupportTools/CheckHWnFWProfile
[SUCCESS] The hardware and firmware matches supported profile for
server=ORACLE_SERVER_X5-2
The previous output shows correct operation. However, the following response indicates a problem that you must correct before continuing:
[WARNING] The hardware and firmware are not supported. See details below

[InfinibandHCAPCIeSlotWidth]
Requires: x8
Found: x4

[WARNING] The hardware and firmware are not supported. See details above
Use the --help argument to review the available options, such as obtaining more detailed output.
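When scripting this check across several servers, the leading token of the tool's output distinguishes success from warnings. The sample strings come from the output shown above; the dispatch logic is an illustrative sketch, not part of CheckHWnFWProfile.

```shell
# Illustrative: branch on the CheckHWnFWProfile result prefix.
result='[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2'

case "$result" in
  '[SUCCESS]'*) echo "profile OK" ;;
  '[WARNING]'*) echo "correct the reported items before continuing" ;;
  *)            echo "unexpected output: $result" ;;
esac
```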
When connected to the first compute server only:
Verify the IP address of the first compute server:
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:10:E0:3C:EA:B0
          inet addr:172.16.2.44  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::210:e0ff:fe3c:eab0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7470193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4318201 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:872195171 (831.7 MiB)  TX bytes:2444529519 (2.2 GiB)
Verify the IP address of the second compute server:
# ibhosts
Ca : 0x0010e0000159c61c ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
Ca : 0x0010e000015a46f0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
Ca : 0x0010e0000159d96c ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
Ca : 0x0010e0000159c51c ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
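The IP address of each node can be pulled out of an ibhosts entry mechanically. The line below is copied from the sample output; the sed expression is an illustrative sketch.

```shell
# Illustrative: extract the first IP address from an ibhosts entry.
line='Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"'
ip=$(printf '%s\n' "$line" | sed 's/.*elasticNode \([0-9.]*\),.*/\1/')
echo "$ip"
```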
Disconnect from the server:
First compute server: exit
Second compute server: logout
Repeat these steps for the second compute server.
A Recovery Appliance X5 rack has 3 to 18 storage servers, and a Recovery Appliance X4 rack has 3 to 14 storage servers. Begin at the bottom of the rack and check each server.
To check a storage server:
Before installing Recovery Appliance software, you must run a script to configure the compute and storage servers with proper IP addresses. Otherwise, the install.sh script will fail when it tries to configure the networks.
Follow the instructions in My Oracle Support Doc ID 1953915.1 to configure the IP addresses.