Checking the Health of the Compute Servers

To check the two compute servers in U16 and U17:

  1. Power on both compute servers if they are not already up, and wait while they initialize the BIOS and load the Linux operating system.

  2. Use a serial cable to connect your laptop to the first compute server's serial MGT port.

  3. Configure your laptop's terminal emulator to use these settings:

    • 9600 baud

    • 8 data bits

    • 1 stop bit

    • No parity bit

    • No handshake

    • No flow control

  4. Log in as the root user with the welcome1 password.

    • On the first compute server (which is connected to your laptop), open the Oracle ILOM console, and then log in:

      -> start /SP/console
      
    • On the second compute server, use SSH to log in. The default factory IP address is 192.168.1.109.

  5. Verify that the rack master and host serial numbers are set correctly. The first number must match the rack serial number, and the second number must match the SysSN label on the front panel of the server.

    # ipmitool sunoem cli "show /System" | grep serial
         serial_number = AK12345678
         component_serial_number = 1234NM567H
    
  6. Verify that the model and rack serial numbers are set correctly:

    # ipmitool sunoem cli "show /System" | grep model
         model = ZDLRA X5
    # ipmitool sunoem cli "show /System" | grep ident
         system_identifier = Oracle Zero Data Loss Recovery Appliance X5 AK12345678
    
  7. Verify that the management network is working:

    # ethtool eth0 | grep det
    Link detected: yes
    
  8. Verify that the ILOM management network is working:

    # ipmitool sunoem cli 'show /SP/network' | grep ipadd
    ipaddress = 192.168.1.108
    pendingipaddress = 192.168.1.108
    
  9. Verify that Oracle ILOM can detect the optional QLogic PCIe cards, if they are installed:

    # ipmitool sunoem cli "show /System/PCI_Devices/Add-on/Device_1"
    Connected. Use ^D to exit.
    -> show /System/PCI_Devices/Add-on/Device_1
      /System/PCI_Devices/Add-on/Device_1
      Targets:
    
      Properties:
        part_number = 7101674
        description = Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA,
                      Qlogic
        location = PCIE1 (PCIe Slot 1)
        pci_vendor_id = 0x1077
        pci_device_id = 0x2031
        pci_subvendor_id = 0x1077
        pci_subdevice_id = 0x024d
    
      Commands:
        cd
        show
    
    -> Session closed
    Disconnected
    

    See "Installing the Tape Hardware" for information about the QLogic PCIe cards.

  10. Verify that all memory is present (256 GB):

    # grep MemTotal /proc/meminfo
    MemTotal: 264232892 kB

    The value might vary slightly, depending on the BIOS version. However, if the value is significantly smaller, use the Oracle ILOM event logs to identify the faulty memory.

  11. Verify that the four disks are visible, online, and numbered from slot 0 to slot 3:

    # cd /opt/MegaRAID/MegaCli/
    # ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
    Slot Number: 0
    Firmware state: Online, Spun Up
    Slot Number: 1
    Firmware state: Online, Spun Up
    Slot Number: 2
    Firmware state: Online, Spun Up
    Slot Number: 3
    Firmware state: Online, Spun Up
    
  12. Verify that the hardware logical volume is set up correctly. Look for Virtual Drive 0 configured as RAID 5 (shown as "RAID Level : Primary-5") with four drives and no hot spares:

    [root@db01 ~]# cd /opt/MegaRAID/MegaCli
    [root@db01 MegaCli]# ./MegaCli64 -LdInfo -lAll -a0
    Adapter 0 -- Virtual Drive Information:
    Virtual Drive: 0 (Target Id: 0)
    Name :DBSYS
    RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
    Size : 1.633 TB
    Physical Sector Size: 512
    Logical Sector Size : 512
    VD has Emulated PD : No
    Parity Size : 557.861 GB
    State : Optimal
    Strip Size : 1.0 MB
    Number Of Drives : 4
    Span Depth : 1
         .
         .
         .
    
  13. Verify that the hardware profile is operating correctly:

    # /opt/oracle.SupportTools/CheckHWnFWProfile
    [SUCCESS] The hardware and firmware matches supported profile for
    server=ORACLE_SERVER_X5-2
    

    The previous output shows that the hardware and firmware match a supported profile. However, the following response indicates a problem that you must correct before continuing:

    [WARNING] The hardware and firmware are not supported. See details below
    [InfinibandHCAPCIeSlotWidth]
    Requires:
    x8
    Found:
    x4
    [WARNING] The hardware and firmware are not supported. See details above
    

    Use the --help argument to review the available options, such as obtaining more detailed output.

  14. When connected to the first compute server only:

    1. Verify the IP address of the first compute server:

      # ifconfig eth0
      eth0 Link encap:Ethernet HWaddr 00:10:E0:3C:EA:B0
           inet addr:172.16.2.44 Bcast:172.16.2.255 Mask:255.255.255.0
           inet6 addr: fe80::210:e0ff:fe3c:eab0/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
           RX packets:7470193 errors:0 dropped:0 overruns:0 frame:0
           TX packets:4318201 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:872195171 (831.7 MiB) TX bytes:2444529519 (2.2 GiB)
      
    2. Verify the IP address of the second compute server:

      # ibhosts
      Ca : 0x0010e0000159c61c ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
      Ca : 0x0010e000015a46f0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
      Ca : 0x0010e0000159d96c ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
      Ca : 0x0010e0000159c51c ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
      Ca : 0x0010e000015a5710 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"

  15. Disconnect from the server:

    • First compute server: exit

    • Second compute server: logout

  16. Repeat these steps for the second compute server.
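Several of the checks above (steps 5, 10, 11, and 13) compare command output against expected values by eye. The following is a minimal sketch of how those comparisons might be scripted. The function names (ilom_prop, mem_ok, disks_ok, profile_ok) and the 3 percent memory tolerance used in the example are assumptions for illustration, not part of the appliance software. Each function reads command output on standard input, so the parsing can be tried against saved output before it is used on a server.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the appliance). Each one parses
# text on stdin in the formats shown in the sample listings above.

# Step 5: print the value of one "key = value" property from ILOM output.
# $1 = exact property name, for example serial_number
ilom_prop() {
    awk -v key="$1" '$1 == key && $2 == "=" { sub(/^[^=]*= */, ""); print; exit }'
}

# Step 10: succeed if MemTotal is no more than $2 percent below $1 GB.
# The tolerance is an assumption; the procedure only says "vary slightly".
mem_ok() {
    awk -v gb="$1" -v pct="$2" '
        $1 == "MemTotal:" { found = 1; ok = ($2 + 0 >= gb * 1048576 * (100 - pct) / 100) }
        END { exit ((found && ok) ? 0 : 1) }'
}

# Step 11: succeed if slots 0 through 3 appear in order and all are Online.
disks_ok() {
    awk '
        $1 == "Slot" && $2 == "Number:"    { if ($3 + 0 != n + 0) bad = 1; n++ }
        $1 == "Firmware" && $2 == "state:" { if ($3 != "Online,") bad = 1; states++ }
        END { exit ((n == 4 && states == 4 && !bad) ? 0 : 1) }'
}

# Step 13: succeed only if a [SUCCESS] line is present and no [WARNING] line.
profile_ok() {
    awk '
        index($0, "[SUCCESS]") == 1 { s = 1 }
        index($0, "[WARNING]") == 1 { w = 1 }
        END { exit ((s && !w) ? 0 : 1) }'
}
```

With these functions defined in the root shell, the commands from the steps above could be piped into them, for example:

    # ipmitool sunoem cli "show /System" | ilom_prop serial_number
    # grep MemTotal /proc/meminfo | mem_ok 256 3 && echo "memory OK"
    # ./MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state" | disks_ok && echo "disks OK"
    # /opt/oracle.SupportTools/CheckHWnFWProfile | profile_ok && echo "profile OK"

Matching the property name exactly in ilom_prop avoids the ambiguity of grep serial, which returns both serial_number and component_serial_number.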