2 Troubleshooting and Diagnostics

This section includes maintenance-related information and procedures that you can use to diagnose, troubleshoot, and repair hardware component faults on Exadata Server X10M.

For more information about server diagnostics and troubleshooting, refer to the Oracle x86 Servers Diagnostics and Troubleshooting Guide at Oracle x86 Servers Administration, Diagnostics, and Applications Documentation.

Diagnosing Server Component Hardware Faults

This section describes the general process for identifying and addressing server hardware faults.

When a server hardware fault event occurs, the system lights the Fault-Service Required LED and captures the event in the Oracle ILOM event log. If you set up notifications through Oracle ILOM, you also receive an alert through the notification method you choose. When you become aware of a hardware fault, address it immediately. For details, refer to Oracle ILOM Documentation.

Use the following process to address a hardware fault.

  1. Identify the server subsystem containing the fault.

    See Server Status Indicator LEDs.

    You can use Oracle ILOM to identify a failed component. See Accessing Oracle ILOM.

  2. Review Exadata Server X10M Product Information and Known Issues for late-breaking information about the server, including hardware-related known issues. Refer to Oracle AMD-Based Cloud Servers Product Notes.

  3. Prepare the server for service using Oracle ILOM.

    If you determine that the hardware fault requires service (physical access to the server), use Oracle ILOM to take the server offline, activate the Locate button/LED, and, if necessary, power off the server. See Accessing Oracle ILOM and Preparing for Service.

  4. Prepare the service workspace.

    Before servicing the server, prepare the workspace, ensuring electrostatic discharge (ESD) protection for the server and components. See Preparing for Service.

  5. Service the components.

    To service replaceable components, see the removal, installation, and replacement procedures in this document.

    Note:

    Server components must be replaced by Oracle Service personnel. Contact Oracle Service.
  6. Clear the fault in Oracle ILOM.

    Depending on the component, you might need to clear the fault in Oracle ILOM. Generally, faults on components that have a FRU ID clear automatically. For details, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Oracle ILOM Documentation.

Troubleshoot Hardware Faults Using Oracle ILOM CLI

Use this procedure to troubleshoot hardware faults using the Oracle ILOM command-line interface (CLI) and, if necessary, prepare the server for service. This procedure uses the basic troubleshooting steps described in Diagnosing Server Component Hardware Faults.

  1. Open a terminal and, using a secure method such as Secure Shell (SSH), log in to the SP with a user name that has administrator privileges and the SP IP address or host name. For example:

    ssh username@hostname

  2. When prompted, enter the password.
  3. At the Oracle ILOM prompt (->), enter the command to show any faults. For example:
    -> show faulty
    Target                   | Property | Value
    -------------------------+----------+------------------------------------
    /SP/faultmgmt/0          | fru      | /SYS/MB/P0
    /SP/faultmgmt/0/faults/0 | class    | fault.cpu.cache.uncorrectable.error

    In the above example, the displayed fault shows that Processor 0 encountered an uncorrectable cache error.

  4. To get more information, enter the command to view Open Problems:
    -> show System/Open_Problems
    
    Open Problems (1)
    Date/Time                 Subsystems          Component
    ------------------------  ------------------  ------------
    Tue May 16 18:00:39 2023  Processor, Last Level Cache, P0(CPU 0)
            A non-recoverable cache failure was detected by the device while
            performing a command. (Probability:100,
            UUID:f9c9d6d6-5c42-6f7d-c2c0-857962de2ce5,
            Resource:/SYS/MB/P0, Part Number:N/A, Serial Number:N/A,
            Reference Document:http://support.oracle.com/msg/ISTOR-1234-5H)

    The Open Problems listing provides detailed information, such as the time the event occurred, the component and subsystem name, and a description of the issue. It also includes a link to an Oracle Knowledge Base article that includes possible problem resolution steps.

    Tip:

    The System Log provides a chronological list of all the system events and faults that have occurred since the log was last reset and includes additional information, such as severity levels and error counts. To access the System Log, use the System/Log target in the Oracle ILOM CLI.
  5. Before accessing the physical server, review Known Issues for information related to the issue or the component.

    The Oracle AMD-Based Cloud Servers Product Notes contain up-to-date information about the server, including hardware-related issues. In addition to checking the product notes, follow the link to the Oracle Knowledge Base article listed in the Open Problems output.

  6. To prepare the server for service, see Preparing for Service.
  7. Service the component.

    After servicing the component, you might need to clear the fault in Oracle ILOM. For more information, refer to the service procedures for the component. See Monitoring Component Health and Faults Using Oracle ILOM and Oracle ILOM Documentation.
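When you collect fault data from several servers, it can help to pull the key fields out of the text that Oracle ILOM prints. The following minimal Python sketch parses output that you have already captured from the show faulty command and from an Open Problems entry. It assumes the pipe-delimited Target | Property | Value layout and the UUID:, Resource:, and Reference Document: labels shown in the examples in this procedure. The function names are hypothetical and are not part of Oracle ILOM, and the sketch does not connect to the SP.

```python
import re
from collections import defaultdict


def parse_show_faulty(output: str) -> dict:
    """Group the Property/Value rows of captured 'show faulty' output by target."""
    faults = defaultdict(dict)
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Keep only 'Target | Property | Value' data rows; skip the command echo,
        # the dashed separator, the header row, and wrapped continuation rows.
        if len(cells) != 3 or cells[0] in ("", "Target"):
            continue
        target, prop, value = cells
        faults[target][prop] = value
    return dict(faults)


def summarize_faults(faults: dict) -> dict:
    """Map each faulty FRU to the fault classes listed under its fault target."""
    summary = {}
    for target, props in faults.items():
        if "fru" not in props:
            continue
        prefix = target + "/faults/"
        summary[props["fru"]] = [
            child["class"]
            for child_target, child in faults.items()
            if child_target.startswith(prefix) and "class" in child
        ]
    return summary


def parse_open_problem_details(text: str) -> dict:
    """Extract the UUID, Resource, and Reference Document fields of an Open Problems entry."""
    details = {}
    for key in ("UUID", "Resource", "Reference Document"):
        # Each value runs until the next comma, closing parenthesis, or whitespace.
        match = re.search(re.escape(key) + r":\s*([^,)\s]+)", text)
        if match:
            details[key] = match.group(1)
    return details


if __name__ == "__main__":
    captured = """-> show faulty
Target | Property | Value
-------------------------+----------+-------------------
/SP/faultmgmt/0 | fru | /SYS/MB/P0
/SP/faultmgmt/0/faults/0 | class | fault.cpu.cache.uncorrectable.error
"""
    print(summarize_faults(parse_show_faulty(captured)))
    # prints {'/SYS/MB/P0': ['fault.cpu.cache.uncorrectable.error']}
```

Pairing each /SP/faultmgmt/N target with the classes under /SP/faultmgmt/N/faults/ follows the two-level layout of the example output. If a long target name wraps onto a second line, the sketch skips the continuation row rather than guessing how to join it.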

Troubleshoot Hardware Faults Using Oracle ILOM Web Interface

Use this procedure to troubleshoot hardware faults using the Oracle ILOM web interface and, if necessary, prepare the server for service. This procedure uses the basic troubleshooting steps described in Diagnosing Server Component Hardware Faults.

Note:

This procedure provides one basic approach to troubleshooting hardware faults using the Oracle ILOM web interface. You can also perform the procedure using the Oracle ILOM command-line interface (CLI). For more information about the Oracle ILOM web interface and CLI, refer to Oracle ILOM Documentation.
  1. Log in to the server SP Oracle ILOM web interface.

    Open a browser and point it to the IP address of the server SP. On the Login screen, enter a user name with administrator privileges and its password. The Summary Information page appears. The Status section of the Summary Information page provides information about the server subsystems, including:

    • Processors
    • Memory
    • Power
    • Cooling
    • Storage
    • Networking
    • PCI_Devices
    • Firmware
  2. In the Status section of the Oracle ILOM Summary Information page, identify the server subsystem that requires service.
    An image showing Oracle ILOM web interface.

    For example, if a hardware component in the subsystem is in a fault state, the Status column notes the status as Service Required.

  3. To identify the faulty component, click the component in the Status section.

    The Oracle ILOM page showing the faulty component appears.

  4. To get more information, click the Open Problems link.

    The Open Problems page provides detailed information, such as the time the event occurred, the component and subsystem name, and a description of the issue. It also includes a link to an Oracle Knowledge Base article.

    Tip:

    The System Log provides a chronological list of all the system events and faults that occurred since the log was last reset and includes additional information, such as severity levels and error counts. The System Log also includes information on the devices not reported in the Status section. To access the System Log, in the left panel, click System Log.
  5. Before going to the server, review Product Information and Known Issues for late-breaking information about the server, including hardware-related known issues and information related to the issue or the component.

    Refer to Oracle AMD-Based Cloud Servers Product Notes.

  6. Prepare the server for service.

    See Preparing for Service.

  7. Service the component.

    To service replaceable components, see the removal, installation, and replacement procedures in this document. After servicing the component, you might need to clear the fault in Oracle ILOM. For more information, refer to the service procedure for the component. For details, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Oracle ILOM Documentation.

  8. Return the Server to Operation.

Troubleshoot Power Issues

If your server does not power on, use the information in the following table to troubleshoot the issue.

Table 2-1 Server Power Issues

AC Power Connection

Description: The AC power cords are the direct connection between the server power supplies and the power sources. The server power supplies need separate, stable AC circuits.

Insufficient voltage levels or fluctuations in power can cause server power problems. The power supplies operate at a particular voltage and within an acceptable range of voltage fluctuations. Refer to Electrical Requirements.

Action: Verify that both AC power cords are connected to the server. Verify that the correct power is present at the outlets, and monitor the power to verify that it is within the acceptable range.

Verify proper connection and operation by checking the power supply (PS) indicator panels, which are located on the power supplies at the back of the server. A lit green AC OK indicator indicates a properly functioning power supply. An amber AC OK indicator indicates that the AC power to the power supply is insufficient.

Prevention: Use the AC power cord Velcro retaining clips, and position the cords to minimize the risk of accidental disconnection. Ensure that the AC circuits that supply power to the server are stable and not overburdened.

Power Supplies (PS)

Description: The server power supplies (PS0, PS1) provide the necessary server voltages from the AC power outlets. If the power supplies are inoperable, unplugged, or disengaged from the internal connectors, the server cannot power on.

Note: Use the Velcro straps on the back of the server to secure the power cord connectors to the back of the power supplies. The Velcro retaining straps minimize the risk of accidental disconnection.

Action: Verify that the AC cables are connected to both power supplies. Verify that the power supplies are operational (the PS indicator panel must have a lit green AC OK indicator).

Ensure that the power supply is properly installed. A power supply that is not fully engaged with its internal connector does not have power applied and does not have a lit green AC OK indicator.

Prevention: When installing a power supply, ensure that it is fully seated and engaged with its connector inside the power supply bay. A properly installed power supply has a lit green AC OK indicator.

When a power supply fails, replace it immediately. To provide redundancy, the server has two power supplies. This redundant configuration prevents server downtime, or an unexpected shutdown, due to a failed power supply.

Redundancy allows the server to continue to operate if one of the power supplies fails. However, when the server is powered by a single power supply, the redundancy no longer exists, and the risk of downtime or an unexpected shutdown increases.

Top Cover

Description: The server top cover maintains the air pressure inside the server, prevents accidental exposure to hazardous voltages, and protects internal components from physical and environmental damage.

Action: Do not operate the server without the top cover installed unless you are hot-plugging a fan module; in that case, complete the operation and replace the cover within 60 seconds. See Servicing Fan Modules and Install the Server Top Cover.

Prevention: Be careful to avoid bending or otherwise warping the top cover.

Troubleshoot System Cooling Issues

Maintaining the proper internal operating temperature of the server is crucial to the health of the server. To prevent server shutdown and damage to components, you need to address overtemperature and hardware-related issues as soon as they occur. If your server has a temperature-related fault, use the information in the following table to troubleshoot the issue.

Table 2-2 Server Cooling Issues

External Ambient Temperature Too High

Description: The server fans pull cool air into the server from its external environment. If the ambient temperature is too high, the internal temperature of the server and its components increases, which can cause poor performance and component failure.

Action: Verify the ambient temperature of the server space against the environmental specifications for the server. If the temperature is not within the required operating range, remedy the situation immediately.

Prevention: Periodically verify the ambient temperature of the server space to ensure that it is within the required range, especially if you made any changes to the server space (for example, added servers). The temperature must be consistent and stable.

Airflow Blockage

Description: The server cooling system uses fans to pull cool air in from the server front intake vents and exhaust warm air out the server back panel vents.

If the front or back vents are blocked, the airflow through the server is disrupted and the cooling system fails to function properly, causing the server internal temperature to rise.

Action: Inspect the server front and back panel vents for blockage from dust or debris. Inspect the server interior for improperly installed components or cables that can block the flow of air through the server.

Prevention: Periodically inspect and clean the server vents using an ESD-certified vacuum cleaner.

Ensure that all components, such as cards, cables, fans, air baffles, and dividers, are properly installed. Never operate the server without the top cover installed.

Cooling Areas Compromised

Description: The air baffle, component filler panels, and server top cover maintain and direct the flow of cool air through the server. These server components must be in place for the server to function as a sealed system.

If these components are not installed correctly, the airflow inside the server can become chaotic and non-directional, which can cause server components to overheat and fail.

Action: Inspect the server interior to ensure that the air baffle is properly installed. Ensure that all external-facing slots (storage drive, PCIe) are occupied with either a component or a component filler panel. Ensure that the server top cover is in place and sits flat and snug on top of the server.

Prevention: When servicing the server, ensure that the air baffle is installed correctly and that the server has no unoccupied external-facing slots. Never operate the server without the top cover installed.

Hardware Component Failure

Description: Components, such as power supplies and fan modules, are an integral part of the server cooling system.

When one of these components fails, the server internal temperature can rise. This rise in temperature can cause other components to enter an overtemperature state. Some components, such as processors, might overheat when they are failing, which can also generate an overtemperature event.

To reduce the risk related to component failure, power supplies and fan modules are installed in pairs to provide redundancy. Redundancy ensures that if one component in the pair fails, the other functioning component can continue to maintain the subsystem.

Action: Investigate the cause of the overtemperature event, and replace failed components immediately. See Diagnosing Server Component Hardware Faults.

Prevention: Component redundancy is provided to allow for component failure in critical subsystems, such as the cooling subsystem.

However, once a component in a redundant system fails, the redundancy no longer exists, and the risk of server shutdown and component failures increases. Therefore, it is important to maintain redundant systems and replace failed components immediately.

Troubleshoot With Diagnostic Tools

The server and its accompanying software and firmware contain diagnostic tools and features that can help you isolate component problems, monitor the status of a functioning system, and exercise one or more subsystems to disclose more subtle or intermittent hardware-related problems.

Each diagnostic tool has its own specific strength and application. Review the tools listed in this section and determine which tool might be best to use for your situation. After you determine the tool to use, you can access it locally, while at the server, or remotely. The diagnostic tools available for your server range in complexity from a comprehensive validation test suite (Oracle VTS) to a chronological event log (Oracle ILOM event log). They also include standalone software packages, firmware-based tests, and hardware-based LED indicators.

The following table summarizes the diagnostic tools that you can use when troubleshooting or monitoring your server.

Table 2-3 Diagnostic Tool Selection

Diagnostic Tool Type What It Does Accessibility Remote Capability

Oracle ILOM

Oracle ILOM Documentation

SP firmware

Oracle Integrated Lights Out Manager (ILOM) management software

Oracle ILOM event log. Monitors environmental condition and component functionality sensors, generates alerts, performs fault isolation, and provides remote access.

Can function in either Standby power mode or Main power mode and is not OS dependent.

Local Oracle ILOM command-line access using a serial connection

Support for Ethernet access to the SP through a dedicated management port (NET MGT) and optionally through the host NET0 Ethernet port (sideband management)

Remote and local access.

IPMI 2.0-compliant remote management capabilities

Support for remote KVMS (keyboard, video, mouse, and storage) over IP

Hardware-based LED indicators

Server Status Indicator LEDs

System indicators and sensors

Hardware and SP firmware

Indicates status of overall system and particular components.

Available when system power is available.

Local, but sensors and indicators are accessible from the Oracle ILOM web interface or command-line interface (CLI).

Power-On Self-Test (POST)

Oracle x86 Servers Administration, Diagnostics, and Applications Documentation

Host firmware

Tests core components of system: CPUs, memory, and motherboard I/O bridge integrated circuits.

Runs on startup. Available when the operating system is not running.

Local, but can be accessed through Oracle ILOM Remote System Console Plus.

UEFI Diagnostics

Oracle x86 Servers Diagnostics and Troubleshooting Guide

SP firmware

Tests and detects problems on all processors, memory, disk drives, and network ports.

Use either the Oracle ILOM web interface or the command-line interface (CLI) to run UEFI diagnostics.

Remote access through Oracle ILOM Remote System Console Plus.

Oracle ILOM SP/Diag shell

Oracle x86 Servers Diagnostics and Troubleshooting Guide

SP firmware

Allows you to run HWdiag commands to check the status of a system and its components, and access HWdiag logs.

Can function on Standby power and when operating system is not running.

Local, but remote serial access is possible if the SP serial port is connected to a network-accessible terminal server.

Oracle Linux commands

Operating system software

Displays system information.

Requires operating system.

Local, and over network.

Attaching Devices to the Server

Attach devices to the server so you can access diagnostic tools when troubleshooting and servicing the server.

Attach Devices to the Server

This procedure explains how to connect devices to the server, so that you can locally and remotely interact with the service processor (SP) and the server console. See Back Panel Connector Locations.

  1. Attach local Oracle ILOM command-line access using a serial connection.

    To access the Oracle ILOM service processor command-line interface (CLI) locally, connect a serial null modem cable to the RJ-45 serial port labeled SER MGT.

    To access the system console, connect the RJ-45 cable to a terminal or terminal emulator, log in to Oracle ILOM, and type start /HOST/console.

    Note: The serial management port does not support network connections.

  2. Attach Ethernet access to the SP through a dedicated management port (NET MGT).

    To connect to the Oracle ILOM service processor over the network remotely, connect an Ethernet cable to the Ethernet port labeled NET MGT.

  3. Attach Ethernet access optionally through the host NET0 Ethernet port (sideband management).

    Connect an Ethernet cable to the Gigabit Ethernet port labeled NET 0 as needed for remote OS support. Refer to Oracle ILOM Documentation.

Back Panel Connector Locations

The following illustration shows and describes the locations of the back panel connectors. Use this information to set up the server, so that you can access diagnostic tools and manage the server during service.

Figure showing back panel cable connections and ports.
Callout Cable Port Description

1

Power supply 0 input power

Power supply 1 input power

The server has two power supply connectors AC0 and AC1, one for each power supply. Do not attach power cables to the power supplies until you finish connecting the data cables to the server.

The server goes into Standby power mode, and the Oracle ILOM service processor initializes when the AC power cables are connected to the power source. System messages might be lost after 60 seconds if the server is not connected to a terminal, PC, or workstation.

Note: Oracle ILOM signals a fault on any installed power supply that is not connected to an AC power source, as it might indicate a loss of redundancy.

2

Ethernet port (NET 0)

The Ethernet port enables you to connect the system to the network. The Gigabit Ethernet port uses an RJ-45 cable for a 1GbE 100/1000BASE-T connection.

3

Network management port (NET MGT)

The service processor NET MGT port is the optional connection to the Oracle ILOM service processor. The service processor NET MGT port uses an RJ-45 cable for a 1GbE 100/1000BASE-T connection.

4

USB port

One USB 3.1 port on Exadata Server X10M back panel. The USB port supports hot-plugging. You can connect and disconnect a USB cable or a peripheral device while the server is running without affecting system operations.

5

Serial management port (SER MGT)

Local Oracle ILOM command-line access using a serial connection: The service processor SER MGT port uses an RJ-45 cable and terminal (or emulator) to provide access to the Oracle ILOM command-line interface (CLI). Using Oracle ILOM, you can configure it to connect to the system console. Refer to Oracle ILOM Documentation.

Note: The serial management port does not support network connections.

See Server Status Indicator LEDs.

Configuring Serial Port Sharing

By default, the service processor (SP) controls the serial management (SER MGT) port and uses it to redirect the host serial console output. Using Oracle ILOM, you can assign the host console (COM1) as owner of the SER MGT port output, which allows the host console to output information directly to the SER MGT port. Serial port sharing is useful for Windows kernel debugging, because you can view non-ASCII character traffic output from the host console.

Set up the network on the SP before attempting to change the serial port owner to the host server. If the network is not set up first, and you switch the serial port owner to the host server, you cannot connect using the CLI or web interface to change the serial port owner back to the SP. To return the serial port owner setting to the SP, restore access to the serial port on the server.

If you accidentally lose access to Oracle ILOM, contact Oracle Service and follow the process to return serial port ownership to the SP.

You can assign serial port output using either the Oracle ILOM CLI interface or web interface:

Oracle ILOM CLI interface

  1. Open an SSH session, and at the command line, log in to the SP Oracle ILOM CLI.

    Log in as a user with root or administrator privileges. For example, type ssh root@ipaddress, where ipaddress is the IP address of the server SP.

    The Oracle ILOM CLI prompt (->) appears.

  2. To set the serial port owner, type: -> set /SP/serial/portsharing owner=host

    Note:

    The serial port sharing value, by default, is owner=SP.
  3. Connect a serial host to the server.
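Because assigning the port to the host before the SP network is set up can leave you without CLI or web access, a script that makes this change can check that precondition first. The following minimal Python sketch only builds the Oracle ILOM CLI lines for step 2, followed by a show of the same target to confirm the new owner; it does not connect to the SP. The function name and the sp_network_configured flag are hypothetical, and how you confirm that the SP network is set up is left to you.

```python
VALID_OWNERS = ("SP", "host")  # owner=SP is the default; owner=host gives the port to the host console


def portsharing_commands(owner: str, sp_network_configured: bool) -> list:
    """Return Oracle ILOM CLI lines that set, then display, the serial port owner."""
    if owner not in VALID_OWNERS:
        raise ValueError(f"owner must be one of {VALID_OWNERS}, got {owner!r}")
    if owner == "host" and not sp_network_configured:
        # Without SP network access, you could not reach the CLI or web
        # interface to give the serial port back to the SP.
        raise RuntimeError("set up the SP network before assigning the serial port to the host")
    target = "/SP/serial/portsharing"
    return [f"set {target} owner={owner}", f"show {target}"]


if __name__ == "__main__":
    for line in portsharing_commands("host", sp_network_configured=True):
        print("->", line)
```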

Oracle ILOM web interface

  1. Log in to the SP Oracle ILOM web interface.

    Open a web browser and point it to the IP address of the server SP. Log in as root or as a user with administrator privileges.

  2. On the Summary Information page, select ILOM Administration → Connectivity from the navigation menu on the left side of the screen.
  3. Select the Serial Port tab.

    Note:

    The serial port sharing setting, by default, is Service Processor.
  4. On the Serial Port Settings page, select Host Server as the serial port owner.
  5. Click Save for the changes to take effect.
  6. Connect a serial host to the server.

For details, refer to Oracle ILOM Documentation.

Ethernet Device Naming

This topic contains information about the device naming for the 1GbE 100/1000BASE-T Gigabit Ethernet (GbE) port (labeled NET 0) on the back panel of the server. See Back Panel Connector Locations and Ethernet Port Status Indicators.

The device naming for the Ethernet interface is reported differently by different interfaces and operating systems. The following table shows the BIOS (physical) and operating system (logical) naming convention for the interface. The device naming convention might vary, depending on the conventions of your operating system and which devices are installed in the server.

Note:

Naming used by the interfaces might be different from the names in the following table, depending on which devices are installed in the system.

Table 2-4 Ethernet Device Naming

Ethernet Port | Oracle Linux 8 and 9 | Oracle Solaris | Windows (example default name, see note below)
NET 0 | eno1 | igb0 | Ethernet

MAC Address Mapping to Ethernet Ports

A system serial label that displays the MAC ID (and the associated barcode) for the server is attached to the top, front left side of the Exadata Server X10M server disk cage bezel.

This MAC ID (and barcode) corresponds to the hexadecimal (base 16) base MAC address of a sequence of six consecutive MAC addresses. These six MAC addresses correspond to the server network ports, as shown in the following table.

Table 2-5 Ethernet Port MAC Address Map

Base MAC Address | Corresponding Ethernet Port
“base” + 0 | NET 0
“base” + 1 | Unassigned
“base” + 2 | Unassigned
“base” + 3 | Unassigned
“base” + 4 | SP (NET MGT)
“base” + 5 | Used only when Network Controller-Sideband Interface (NC-SI) sideband management is configured.
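The port assignments in Table 2-5 are simple offsets from the base address on the system serial label. The following minimal Python sketch expands a base MAC address into the six addresses and their ports, carrying across octet boundaries. The base address in the example is hypothetical, and the function name is illustrative only.

```python
import string

# Port for each offset from the base address, in Table 2-5 order.
PORT_ROLES = (
    "NET 0",
    "Unassigned",
    "Unassigned",
    "Unassigned",
    "SP (NET MGT)",
    "NC-SI sideband management (only when configured)",
)

MAX_MAC = (1 << 48) - 1  # largest 48-bit MAC address value


def mac_address_map(base_mac: str) -> list:
    """Return (MAC address, port) pairs for the six addresses starting at base_mac."""
    digits = base_mac.replace(":", "").replace("-", "")
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"expected a 48-bit MAC address, got {base_mac!r}")
    base = int(digits, 16)
    if base + len(PORT_ROLES) - 1 > MAX_MAC:
        raise ValueError("the six-address range would exceed 48 bits")
    pairs = []
    for offset, role in enumerate(PORT_ROLES):
        hex_digits = f"{base + offset:012X}"  # "base" + offset, zero-padded, uppercase
        mac = ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))
        pairs.append((mac, role))
    return pairs


if __name__ == "__main__":
    for mac, port in mac_address_map("00:10:E0:00:00:FE"):  # hypothetical base address
        print(mac, port)
```

For example, with the hypothetical base 00:10:E0:00:00:FE, NET 0 is 00:10:E0:00:00:FE and the SP (NET MGT) is 00:10:E0:00:01:02, because the +2 offset already carries into the fifth octet.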

Net 0 Gigabit Ethernet Port

The server has one auto-negotiating 100/1000BASE-T Gigabit Ethernet (GbE) system domain port labeled NET 0 that uses a standard RJ-45 connector.

Figure showing a NET 0 GbE Ethernet port.

NET 0 GbE Ethernet port transfer rates are shown in the following table.

Table 2-6 Ethernet Port Transfer Rates

Connection Type | IEEE Terminology | Transfer Rate
Fast Ethernet | 100BASE-T | 100 Mbps
Gigabit Ethernet | 1000BASE-T | 1,000 Mbps

Network Management Port

The server has one 100/1000BASE-T RJ-45 Oracle Integrated Lights Out Manager (ILOM) service processor (SP) network management Ethernet port labeled NET MGT. The auto-negotiating 100/1000BASE-T Ethernet management domain interface port is configured by default to use Dynamic Host Configuration Protocol (DHCP).

SPs with Oracle Integrated Lights Out Manager (ILOM) include support for Ethernet access to the SP through a dedicated Network Management port (NET MGT). Access to the SP is available through a rear panel RJ-45 100/1000BASE-T port labeled NET MGT. The NET MGT port is connected to a system rack switch and can be used for accessing the SP remotely over an Ethernet connection. See Accessing Oracle ILOM.

Figure showing NET MGT port.

See Network Management Port Status Indicators. For information on configuring this port for managing the server with Oracle ILOM, refer to Oracle ILOM Documentation.

Serial Management Port

The serial management connector, labeled SER MGT, is an RJ-45 connector that can be accessed from the back panel. This port is the default connection to the server Oracle ILOM SP. Use only the SER MGT port for server management. Refer to Oracle ILOM Documentation.

The following figure and table describe the SER MGT port pin signals.

Figure showing the serial management port.

Table 2-7 Serial Management Port Signals

Pin | Signal Description
1 | Request to Send
2 | Data Terminal Ready
3 | Transmit Data
4 | Ground
5 | Ground
6 | Receive Data
7 | Data Set Ready
8 | Clear to Send

Table 2-8 Default Serial Connections for Serial Port

Parameter | Setting
Connector | SER MGT
Rate | 115200 baud
Parity | None
Stop bits | 1
Data bits | 8
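On a Linux or other POSIX workstation, you can apply the Table 2-8 settings to the serial device that connects to SER MGT. The following minimal Python sketch uses only the standard termios module: it sets the rate, parity, stop bits, and data bits listed in the table, enables the receiver, and leaves every other setting (including flow control, which the table does not specify) unchanged. The function name is hypothetical, and the demonstration uses a pseudo-terminal so that no real serial device is needed.

```python
import os
import termios


def configure_ser_mgt(fd: int) -> None:
    """Apply the default SER MGT settings (115200 baud, 8N1) to an open serial fd."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    cflag &= ~termios.PARENB  # Parity: None
    cflag &= ~termios.CSTOPB  # Stop bits: 1
    cflag &= ~termios.CSIZE   # clear the data-size field before setting it
    cflag |= termios.CS8      # Data bits: 8
    cflag |= termios.CREAD    # enable the receiver so console output can be read
    ispeed = ospeed = termios.B115200  # Rate: 115200 baud
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, ispeed, ospeed, cc])


if __name__ == "__main__":
    # Demonstrate on a pseudo-terminal instead of a real serial adapter.
    master_fd, slave_fd = os.openpty()
    try:
        configure_ser_mgt(slave_fd)
        attrs = termios.tcgetattr(slave_fd)
        print("8 data bits:", (attrs[2] & termios.CSIZE) == termios.CS8)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

To use it with a real adapter, open the device with os.open(path, os.O_RDWR | os.O_NOCTTY) and pass the descriptor to configure_ser_mgt. The device path depends on your adapter and operating system; /dev/ttyUSB0, common on Linux, is only an assumption here.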

If you need to connect to the SER MGT port using a cable with either a DB-9 or a DB-25 connector, use the pin descriptions in the following tables to create a crossover adapter appropriate for your serial connection.

Table 2-9 RJ-45 to DB-9 Adapter Crossovers Wiring Reference

Serial Port (RJ-45 Connector) Pin | Serial Port (RJ-45 Connector) Signal Description | DB-9 Adapter Pin | DB-9 Adapter Signal Description
1 | RTS | 8 | CTS
2 | DTR | 6 | DSR
3 | TXD | 2 | RXD
4 | Signal ground | 5 | Signal ground
5 | Signal ground | 5 | Signal ground
6 | RXD | 3 | TXD
7 | DSR | 4 | DTR
8 | CTS | 7 | RTS

Table 2-10 RJ-45 to DB-25 Adapter Crossovers Wiring Reference

Serial Port (RJ-45 Connector) Pin | Serial Port (RJ-45 Connector) Signal Description | DB-25 Adapter Pin | DB-25 Adapter Signal Description
1 | RTS | 5 | CTS
2 | DTR | 6 | DSR
3 | TXD | 3 | RXD
4 | Signal ground | 7 | Signal ground
5 | Signal ground | 7 | Signal ground
6 | RXD | 2 | TXD
7 | DSR | 20 | DTR
8 | CTS | 4 | RTS
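If you build your own adapter, you can check its wiring against these tables before connecting it. The following minimal Python sketch encodes the SER MGT pinout from Table 2-7 and the crossovers in Tables 2-9 and 2-10, then verifies that each RJ-45 signal lands on its partner signal (TXD to RXD, RTS to CTS, DTR to DSR, ground to ground). The DB-9 and DB-25 pin-to-signal assignments are the standard RS-232 DTE pinouts, which the tables here do not state; treat them as an assumption. All names are hypothetical.

```python
# RJ-45 SER MGT pinout (Table 2-7), with abbreviated signal names.
RJ45_SER_MGT = {1: "RTS", 2: "DTR", 3: "TXD", 4: "GND", 5: "GND", 6: "RXD", 7: "DSR", 8: "CTS"}

# Assumption: the DB-9/DB-25 side uses the standard RS-232 DTE pinout
# (only the pins used by these adapters are listed).
DB9_PINOUT = {2: "RXD", 3: "TXD", 4: "DTR", 5: "GND", 6: "DSR", 7: "RTS", 8: "CTS"}
DB25_PINOUT = {2: "TXD", 3: "RXD", 4: "RTS", 5: "CTS", 6: "DSR", 7: "GND", 20: "DTR"}

# RJ-45 pin -> adapter pin, from Tables 2-9 and 2-10.
DB9_WIRING = {1: 8, 2: 6, 3: 2, 4: 5, 5: 5, 6: 3, 7: 4, 8: 7}
DB25_WIRING = {1: 5, 2: 6, 3: 3, 4: 7, 5: 7, 6: 2, 7: 20, 8: 4}

# In a null-modem crossover, each signal must reach its partner on the far side.
PARTNER = {"TXD": "RXD", "RXD": "TXD", "RTS": "CTS", "CTS": "RTS",
           "DTR": "DSR", "DSR": "DTR", "GND": "GND"}


def crossover_errors(wiring: dict, pinout: dict) -> list:
    """Describe every RJ-45 pin that is not wired to its partner signal."""
    errors = []
    for rj45_pin, signal in RJ45_SER_MGT.items():
        far_pin = wiring.get(rj45_pin)
        far_signal = pinout.get(far_pin)
        if far_signal != PARTNER[signal]:
            errors.append(f"RJ-45 pin {rj45_pin} ({signal}) -> adapter pin {far_pin} "
                          f"({far_signal}), expected {PARTNER[signal]}")
    return errors


if __name__ == "__main__":
    print("DB-9 errors:", crossover_errors(DB9_WIRING, DB9_PINOUT))    # prints DB-9 errors: []
    print("DB-25 errors:", crossover_errors(DB25_WIRING, DB25_PINOUT))  # prints DB-25 errors: []
```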

USB Port

The server has a single USB port located on the server back panel for attaching supported USB 3.1–compliant devices.

The USB port supports hot-plugging. You can connect and disconnect a USB cable or a peripheral device while the server is running without affecting system operations.

The USB port is for service operations only and should not be left connected permanently. Enable the USB port in the BIOS as required.

Manually Resetting a Server's Service Processor

You need a non-conductive stylus no more than 1.5 mm in diameter.

Caution:

Using a conductive tool, such as a metal paper clip or graphite pencil, can cause a short circuit that results in an immediate host power-off, circuit damage, or both.

This section shows the location of the service processor (SP) reset button on the back panel of the server. The button is recessed to prevent accidental pressing. If the service processor becomes inaccessible, you can use a non-conductive stylus to press the SP reset button.

If the Oracle ILOM SP stops running and you cannot reset it using the Oracle ILOM web interface or the Oracle ILOM CLI, use the following procedure to reset the SP from the server back panel.

  1. Locate the SP reset pinhole button on the server back panel.

    Figure showing the location of the SP Reset pinhole switch on the server back panel.
    Callout | Description
    1 | SP Reset button
  2. Insert a non-conductive stylus straight into the SP reset pinhole, no more than 6.5 mm deep (the distance needed to reach and depress the reset button).

    Take care not to insert the stylus at an angle, insert it too far, or accidentally touch the sensitive electrical components near the button. The stylus must be non-conductive, with a diameter of no more than 1.5 mm. To depress the pinhole button, the stylus must reach 6.5 mm into the chassis.

  3. After you initiate the SP reset, the OK LED blinks rapidly while the SP reboots.

    This can take a few minutes. The host continues to operate normally.

    Note:

    Any Oracle ILOM user sessions running on the SP are terminated during the SP reset. After the SP reboots successfully, you can log in to Oracle ILOM again.

  4. After the SP successfully boots, the OK LED remains steadily lit.

    You can confirm that the SP is working by logging in to Oracle ILOM for that system.

    For details, refer to Oracle ILOM Documentation.

Accessing Oracle ILOM

You can connect to Oracle ILOM using one of these methods:

  • Serial remote host console – Access the host console remotely

  • Serial connection to SER MGT port (CLI only) – Oracle ILOM command-line interface (CLI) locally using the RJ-45 serial management port (SER MGT)

  • Dedicated remote network management connection – Oracle ILOM CLI or Oracle ILOM web interface remotely using a network port on the server (NET MGT)

  • Sideband network management connection – Refer to "Sideband Network Management Connection" in Oracle ILOM Administrator's Guide for Configuration and Maintenance.

  • Host-to-ILOM interconnect – Refer to "Dedicated Interconnect SP Management" in Oracle ILOM Administrator's Guide for Configuration and Maintenance.

Oracle ILOM Documentation

Prerequisites: You need to know the IP address or host name of the service processor (SP) to log in to Oracle ILOM CLI or web interface remotely using one of the network ports on the server.

Note:

To enable first time login and access to Oracle ILOM, a default Administrator account and its password are provided with the system. To build a secure environment, change the default password (changeme) for the default Administrator account (root) after your initial login. If this default Administrator account has since been changed, contact your system administrator for an Oracle ILOM user account with Administrator privileges.

To prevent unauthorized access to Oracle ILOM, create user accounts for each user. For procedures to change the root password and create user accounts, refer to Oracle ILOM Documentation.

If you do not know the IP address of the SP, reset the Oracle ILOM SP. See Manually Resetting a Server's Service Processor. You might also encounter either of the following situations with the Oracle ILOM service processor (SP):

  • You need to reset the Oracle ILOM SP to complete an upgrade or to clear an error. Resetting the server SP automatically disconnects any current Oracle ILOM sessions and renders the SP unmanageable until the reset process is complete.

  • As the system administrator, you forgot the root account password and need to recover it.

Access Serial Remote Host Console

To access the host console remotely:

  1. Configure serial host console properties in Oracle ILOM as required.

    Before you access the host console, you can configure properties in Oracle ILOM to make the serial host console easier to view and to enable logging. Refer to the Oracle ILOM Administrator's Guide for Configuration and Maintenance, and see Configuring Serial Port Sharing.

  2. Establish a connection to the Oracle ILOM CLI. Log in to the Oracle ILOM CLI using an account with Administrator privileges.

    1. Ensure that the server is cabled for a local serial connection to Oracle ILOM. See Back Panel Connector Locations.

    2. Press Enter on the terminal device that is connected to the server.

    3. At the Oracle ILOM login prompt, type your user name, and press Enter.

    4. At the password prompt, type the password associated with your user name, and press Enter.

      Oracle ILOM displays a default command prompt (->), indicating that you successfully logged in.

  3. At the Oracle ILOM command prompt (->), type: start /HOST/console

    The serial console output appears on the screen.

    Note:

    If the serial console is in use, stop and restart it using the stop /HOST/console command followed by the start /HOST/console command.

  4. To return to the Oracle ILOM CLI prompt, press Esc, and then press Shift+9 to type the open parenthesis character "(".
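When you automate a host-console session through a terminal program, it can help to spell out the exact keystrokes. The following Python sketch assumes the escape sequence is the two bytes Esc (0x1B) and "(" (0x28), as described in step 4, and lists the commands from step 3 and its note. The helper names are hypothetical and are not part of Oracle ILOM.

```python
from __future__ import annotations

# Hypothetical helpers for scripting a host-console session.

ESC = b"\x1b"
# Returns from the host console to the Oracle ILOM prompt: the Esc key
# followed by "(" (typed as Shift+9 on a US keyboard layout).
ILOM_CONSOLE_ESCAPE = ESC + b"("


def console_commands(console_in_use: bool = False) -> list[str]:
    """Return the CLI commands that (re)start the serial host console."""
    commands = []
    if console_in_use:
        # Per the note in step 3: stop the console before restarting it.
        commands.append("stop /HOST/console")
    commands.append("start /HOST/console")
    return commands


if __name__ == "__main__":
    print(console_commands(console_in_use=True))
    print(ILOM_CONSOLE_ESCAPE.hex())
```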

Oracle ILOM CLI locally using the SER MGT port

To establish a connection to the Oracle ILOM CLI locally using the RJ-45 serial management port (SER MGT):

  1. Ensure that the server is cabled for a local serial connection to Oracle ILOM. See Back Panel Connector Locations.

  2. Press Enter on the terminal device that is connected to the server.

  3. At the Oracle ILOM login prompt, type your user name, and press Enter.

  4. At the password prompt, type the password associated with your user name, and press Enter.

    Oracle ILOM displays a default command prompt (->), indicating that you successfully logged in.

Oracle ILOM CLI remotely using a server network port

To establish a connection to the Oracle ILOM CLI:

  1. Ensure that the server is cabled for a remote network management connection to Oracle ILOM. See Connecting Cables and Applying Power.

  2. From the command line, initiate a Secure Shell (SSH) session by typing: ssh username@hostname

    Where username is the user name of an Oracle ILOM account with Administrator privileges, and hostname is either the IP address or hostname (when using DNS) of the server SP.

    The Oracle ILOM password prompt appears. Password:

  3. At the Oracle ILOM password prompt, type your password and press Enter. For example: Password: changeme

    Oracle ILOM displays the default command prompt ->, indicating that you successfully logged in to the Oracle ILOM CLI.
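The SSH step can also be launched from a script. This minimal Python sketch builds the `ssh username@hostname` command from step 2 and hands it to the local `ssh` client, which then prompts for the password interactively. It assumes an OpenSSH-compatible `ssh` is on your `PATH`; `ilom_ssh_argv` and `open_ilom_cli` are hypothetical helper names.

```python
from __future__ import annotations

import shlex
import subprocess


def ilom_ssh_argv(username: str, hostname: str) -> list[str]:
    """Build the argv for `ssh username@hostname`.

    hostname is the IP address or DNS host name of the server SP.
    """
    if not username or not hostname:
        raise ValueError("username and hostname are required")
    return ["ssh", f"{username}@{hostname}"]


def open_ilom_cli(username: str, hostname: str) -> int:
    """Start an interactive SSH session; ssh itself prompts for the password."""
    return subprocess.call(ilom_ssh_argv(username, hostname))


if __name__ == "__main__":
    # Print the command instead of running it. 192.0.2.10 is a
    # documentation address standing in for your SP.
    print(shlex.join(ilom_ssh_argv("root", "192.0.2.10")))
```

Leaving password entry to `ssh` keeps the Oracle ILOM password out of the script and its process arguments.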

Oracle ILOM web interface remotely using a server network port

To establish a connection to the Oracle ILOM web interface:

  1. Type the IP address or host name of the server SP in the address field of your web browser and press Enter.

  2. On the Oracle ILOM login screen, type your user name and password, and click Log In.

    The Summary Information page appears, indicating that you successfully logged in to the Oracle ILOM web interface.
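If you build the address from a script or a bookmark list, note that an IPv6 SP address must be enclosed in square brackets inside a URL. This Python sketch assumes the web interface is served over HTTPS on the default port; confirm that against your Oracle ILOM configuration. `ilom_web_url` is a hypothetical helper.

```python
import ipaddress


def ilom_web_url(sp_address: str) -> str:
    """Return a browser URL for the Oracle ILOM web interface.

    Assumes HTTPS on the default port. IPv6 literals are wrapped in
    brackets, as URL syntax requires.
    """
    try:
        ip = ipaddress.ip_address(sp_address)
    except ValueError:
        # Not an IP literal: treat it as a DNS host name.
        return f"https://{sp_address}/"
    if ip.version == 6:
        return f"https://[{ip.compressed}]/"
    return f"https://{ip}/"
```

For example, `ilom_web_url("2001::db8:5dff:febe:5000")` returns `https://[2001::db8:5dff:febe:5000]/`.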

Service Processor Connection and Login

This procedure describes how to cable the server to access its Root-of-Trust circuitry and service processor (SP).

  1. For dedicated access to the server Root-of-Trust circuitry, connect an RJ-45 Ethernet cable to the TOR/NET port on the card located on the server back panel. Attach the other end of the cable or connector to your end point.

    This provides command-line interface (CLI) access over the network.

    Note:

    This port is for service use only.
  2. For access to the server Oracle ILOM service processor, do one of the following:
    • For dedicated network access to Oracle ILOM, connect an RJ-45 Ethernet cable to the 10/100/1000 Ethernet management port (labeled NET MGT) on the card located on the server back panel. Attach the other end of the RJ-45 Ethernet cable to your switch. This provides command-line interface (CLI) access over the network.

    • For access to either Oracle ILOM or the RoT circuitry, connect an RJ-45 serial console cable to the RJ-45 serial port (labeled TOR/SER) on the card located on the server back panel. Attach the other end of the RJ-45 serial console cable or connector to your end point. This provides CLI access over a local serial connection or terminal server.

  3. Ensure that AC power is plugged into the server power supplies.

    Standby power is required for SP access; main power and a running host are not required.

  4. Access Oracle ILOM.
    If this is the first time you are connecting to Oracle ILOM, do one of the following:
    • Access Oracle ILOM using a local serial connection to the command-line interface:

      1. From the terminal device connected to the server's SER MGT port, press Enter to obtain a prompt.

      2. At the Oracle ILOM login prompt, type root as the username, and then press Enter.

      3. At the Oracle ILOM password prompt, enter the default password. Type:

        Password: changeme

        Oracle ILOM displays the default command prompt (->), indicating that you successfully logged in to the Oracle ILOM CLI.

    • Access Oracle ILOM using an Ethernet connection to the command-line interface:

      From a terminal device with network access to the server's NET MGT port, initiate a Secure Shell session and log in to the Oracle ILOM root account. Type:

      # ssh root@hostname

      Where hostname is either the IP address or host name (when using DNS) of the server SP.

      Note:

      By default, Oracle ILOM is configured for DHCP. The DHCP server should list the IP address for your Oracle ILOM host name or SP MAC address.
  5. If you connected using SSH, at the Oracle ILOM password prompt, enter the default password.

    Type:

    Password: changeme

    Oracle ILOM displays the default command prompt (->), indicating that you successfully logged in to the Oracle ILOM CLI.

  6. Once logged in, change the default root password for security.

    Note:

    To enable first-time login and access to Oracle ILOM, a default Administrator account (root) and its password are provided with the system. To build a secure environment and enforce user authentication and authorization in Oracle ILOM, you must change the default password (changeme) for the default Administrator account (root) after your initial login to Oracle ILOM. If this default Administrator account has since been changed, contact your system administrator for an Oracle ILOM user account with Administrator privileges.

    For more information on changing account information, refer to Oracle ILOM Documentation.

Test the IPv4 or IPv6 Network Configuration

  1. Use either the Oracle ILOM CLI or web interface to test the IPv4 or IPv6 network configuration.
    • From the Oracle ILOM CLI:

      1. At the CLI prompt, type cd /SP/network/test, and then type show to view the network test targets and properties.

        For example, the following output shows the test target properties:

        -> show
        /SP/network/test
        Targets:
        
        Properties:
        ping = (Cannot show property)
        ping6 = (Cannot show property)
        Commands:
        cd
        set
        show
      2. Use the set ping or set ping6 command to send a network test from the device to a network destination, as described in the following table.

        Property Set Property Value Description

        ping

        set ping=<IPv4_address>

        Type set ping= at the command prompt, followed by the IPv4 address of the test destination. For example:

        -> set ping=192.168.10.106
        Ping of 192.168.10.106 succeeded

        ping6

        set ping6=<IPv6_address>

        Type set ping6= at the command prompt, followed by the IPv6 address of the test destination. For example:

        -> set ping6=2001::db8:5dff:febe:5000
        Ping of 2001::db8:5dff:febe:5000 succeeded

    • From the Oracle ILOM web interface:

      1. Click ILOM Administration → Connectivity → Network.

      2. On the Connectivity page, click the Tools button. The Network Tools dialog box appears.


        Figure showing the Network Configuration Test screen, from which you can issue a Ping or Ping6 test.
      3. In the Network Tools dialog box, in the Test Type list box, select Ping (for an IPv4 network configuration) or Ping6 (for an IPv6 network configuration).

      4. Type the IPv4 or IPv6 test destination address in the Destination field and click Test.

        If the test is successful, the message Ping of ip_address succeeded appears below the Destination field.
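Choosing between `ping` and `ping6` depends only on the address family of the destination, so a script can pick the property automatically and check the reply. This Python sketch assumes the current CLI target is `/SP/network/test`, as in the show output above, and recognizes only the success message shown in the table. Because the failure wording is not documented here, any other line starting with `Ping of` is treated as a failure. Both function names are hypothetical.

```python
import ipaddress
import re
from typing import Optional


def ping_test_command(destination: str) -> str:
    """Return the set command for a network test from /SP/network/test.

    Uses ping for an IPv4 destination and ping6 for an IPv6 destination.
    Raises ValueError if destination is not an IP address.
    """
    version = ipaddress.ip_address(destination).version
    prop = "ping" if version == 4 else "ping6"
    return f"set {prop}={destination}"


# Success message as shown in the table above.
_SUCCESS = re.compile(r"^Ping of (\S+) succeeded\b")


def ping_succeeded(output: str) -> Optional[bool]:
    """True if the success message is present, False if some other
    'Ping of ...' line is present, None if no result line is found."""
    saw_result = False
    for line in output.splitlines():
        line = line.strip()
        if _SUCCESS.match(line):
            return True
        if line.startswith("Ping of "):
            saw_result = True
    return False if saw_result else None
```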

Set the Mouse Mode

In Oracle ILOM, you can set the Mouse Mode property to optimize mouse movement in the Oracle ILOM Remote System Console Plus. The mouse mode can be set to either Absolute or Relative and must match the requirements of the operating system running on the server host.

Read the following guidelines to determine the appropriate mouse mode for your system.

Table 2-11 Mouse Mode Properties

Operating Systems Mouse Mode

Oracle Linux

Absolute

Oracle VM

Not applicable

VMware ESXi Software

Not applicable

Windows Server

Absolute

For more information about selecting a mouse mode, refer to Oracle ILOM Administrator's Guide for Configuration and Maintenance in Oracle ILOM Documentation.

To set the mouse mode, perform the following steps:

  1. Log in to the Oracle ILOM web interface.
  2. Navigate to the Remote Control → KVMS page, and then select a mouse mode from the Mouse Mode drop-down list.
  3. Click Save.

Reset the Server Using Oracle ILOM

  1. Log in to the Oracle ILOM web interface or command-line interface (CLI) using an account with admin (a) role privileges.
  2. Reset the server:
    • From Oracle ILOM CLI, type: -> reset /System. When prompted, type y to confirm:

      Are you sure you want to reset /System (y/n)? y

      Performing hard reset on /System

    • From Oracle ILOM web interface, select Host Management → Power Control, and in the Select Action list box, select Graceful Reset, Reset, or Power On. Click Save, and then click OK.

    • From the local server, press the On/Standby button on the front panel of the server for approximately 1 second to power off the server, and then press the On/Standby button again to power on the server.

      The power-on self-test (POST) sequence begins.

Log Out of Oracle ILOM

Use the following procedure to log out of the Oracle ILOM CLI or web interface.

  1. To end an Oracle ILOM session:
    • From the Oracle ILOM CLI – Type exit at the CLI prompt.

    • From the Oracle ILOM web interface – Click the Log Out button in the top-right corner of the screen.

Monitoring Component Health and Faults Using Oracle ILOM

Oracle ILOM interfaces provide easy-to-view information about the health status of system components. From the Oracle ILOM command-line interface (CLI) or web interface, you can collect system-specific information about the server, determine the health state of discrete components, and view any open problems on the server. Oracle ILOM automatically detects system hardware faults and environmental conditions on the server. If a problem occurs on the server, Oracle ILOM automatically does the following:

  • Identifies the faulted component in the Open Problems table. Open problems detected on a host server or system chassis are viewable from either the /System/Open_problems CLI target or the Open Problems web page. To view server open problems, type show /System/Open_problems. Refer to View Open Problems Detected on a Managed Device in the Oracle ILOM User's Guide for System Monitoring and Diagnostics.

  • Records system information about the faulted component or condition in the Oracle ILOM event log. See Exadata Server X10M Events. Refer to Managing ILOM Log Entries in the Oracle ILOM User's Guide for System Monitoring and Diagnostics.

    • For the event log, type: show /SP/Logs/event/list

    • For the system log, type: show /System/Log/list

    • For the audit log, type: show /SP/Logs/audit/list

      To scroll through a list, press any key except the q key.

  • Illuminates the Fault-Service Required LED status indicator on the server front and back panels. See Server Status Indicator LEDs.

    See Server Components for status and fault messages.

To collect system-level information or to verify the system health status from the CLI, type show /System. To access subsystem and component health details from the CLI, type show /System/subsystem-name, where subsystem-name is one of the following:
  • PROCESSORS
  • MEMORY
  • POWER
  • COOLING
  • STORAGE
  • NETWORKING
  • PCI_DEVICES
  • FIRMWARE
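For a routine health sweep, the commands above can be generated as a list and pasted at the `->` prompt. The following Python sketch does that and, optionally, wraps each command in a one-shot `ssh` invocation. Passing a single CLI command as an `ssh` argument is an assumption about your SP, not something this guide states; if it does not work on your system, type the commands interactively. The helper names are hypothetical.

```python
from __future__ import annotations

# Subsystem targets listed above, under /System.
SUBSYSTEMS = [
    "PROCESSORS", "MEMORY", "POWER", "COOLING",
    "STORAGE", "NETWORKING", "PCI_DEVICES", "FIRMWARE",
]


def health_check_commands() -> list[str]:
    """System summary, open problems, then each subsystem."""
    commands = ["show /System", "show /System/Open_problems"]
    commands += [f"show /System/{name}" for name in SUBSYSTEMS]
    return commands


def ssh_invocations(username: str, hostname: str) -> list[list[str]]:
    """Wrap each command as a one-shot ssh argv (assumption: the SP
    accepts a single CLI command passed as an ssh argument)."""
    target = f"{username}@{hostname}"
    return [["ssh", target, cmd] for cmd in health_check_commands()]


if __name__ == "__main__":
    for cmd in health_check_commands():
        print("->", cmd)
```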

For further information about administering open problems that Oracle ILOM detects and reports, refer to Administering Open Problems in the Oracle ILOM Administrator's Guide for Configuration and Maintenance, available in Oracle ILOM Documentation.

Getting Help

The following sections describe how to get additional help to resolve server-related problems.

Contacting Support

My Oracle Support

If the troubleshooting procedures in this chapter do not solve your problem, use the following table to collect information that you might need to communicate to Oracle Support.

Table 2-12 System Configuration Information

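System Configuration Information Needed: Your Information

Service contract number: ____________________

System model: ____________________

Operating environment: ____________________

System serial number: ____________________

Peripherals attached to the system: ____________________

Email address and phone number for you and a secondary contact: ____________________

Street address where the system is located: ____________________

Superuser password: ____________________

Summary of the problem and the work being done when the problem occurred: ____________________

Other Useful Information

IP address: ____________________

Server name (system host name): ____________________

Network or internet domain name: ____________________

Proxy server configuration: ____________________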

Locating the Chassis Serial Number

You might need your server serial number when you ask for service on your system. Record this number for future use. Use one of the following resources or methods to locate your server serial number.

  • The serial number is located on the Radio-frequency Identification (RFID) label on the bottom left side of the front panel bezel, below the general status LEDs. For illustrations of the server front panel, see Front Panel Components.

  • The serial number is recorded on a label that is attached to the top front surface of the system.

  • The serial number is recorded on the yellow Customer Information Sheet (CIS) that is attached to your server packaging.

  • Using Oracle ILOM:

    • From the command-line interface (CLI), type the command: show /System

    • From the web interface, view the serial number on the System Information screen.
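The `show` output earlier in this chapter lists properties as `name = value` lines. Assuming `show /System` follows the same layout and names the property `serial_number` (verify both against your own output), a short Python sketch can extract the serial number for a support request. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_show_properties(output: str) -> Dict[str, str]:
    """Collect 'name = value' lines from Oracle ILOM show output.

    Lines without ' = ' (target paths, headers such as 'Properties:',
    command lists) are skipped.
    """
    props: Dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(" = ")
        if sep and name:
            props[name] = value.strip()
    return props


def serial_number(show_system_output: str) -> Optional[str]:
    """Return the serial number from `show /System` output, or None.

    Assumes the property is named serial_number.
    """
    return parse_show_properties(show_system_output).get("serial_number")
```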

Auto Service Requests

Oracle Auto Service Requests (ASR) is available at no additional cost to customers with Oracle Premier Support. Oracle ASR is the fastest way to restore system availability if a hardware fault occurs. Oracle ASR software is secure and customer installable, with the software and documentation downloadable at My Oracle Support. When you log in to My Oracle Support, refer to the "Oracle Auto Service Request" knowledge article document (ID 1185493.1) for instructions on downloading the Oracle ASR software.

When a hardware fault is detected, Oracle ASR opens a service request with Oracle and transfers electronic fault telemetry data to help expedite the diagnostic process. Oracle diagnostics analyze the telemetry data for known issues and deliver immediate corrective actions. For security, the electronic diagnostic data sent to Oracle includes only what is needed to solve the problem. The software does not use any incoming Internet connections and does not include any remote access mechanisms.

For more information about Oracle ASR, go to Oracle Premier Support.