Using Oracle ILOM to Monitor a System and Diagnose Components

When something goes wrong with a system, diagnostic tools can help you to determine what caused the problem. However, this approach is inherently reactive. It means waiting until a component fails. Oracle ILOM provides diagnostic tools that allow you to be more proactive by monitoring the system while it is still “healthy.” Monitoring tools give you early warning of imminent failure, thereby allowing planned maintenance and better system availability. Remote monitoring is also a convenient way to check the status of many machines from one centralized location.

Using Oracle ILOM, you can view detailed information about the overall health of a system and the status of system components. In addition, you can monitor open problems and close fault status. Oracle ILOM also provides access to informational system management log files.

To monitor a system or diagnose components, see:

View System-Level Information and Health Status (Web)

The system-level health status properties for a server are viewable from the Summary Information page in the web interface.

  1. To view system-level health status details, click System Information → Summary.

    The Summary Information page appears.

  2. To collect information about the system, review the entries in the General Information table.

    Information in the General Information table includes the model number, serial number, system type, firmware currently installed, primary operating system installed, host MAC address, IP address for the SP, and MAC address for the SP.

    Note:

    The property value for the primary operating system installed on the server is shown only when Oracle Hardware Management Pack is installed on the server.
  3. To identify problems detected on the system or to view the total problem count, review the entries in the Status table.

    The overall health status and total problem count appear at the top of the table.

    To view additional information about a component category reported in the Status table, click the link in the Subsystem column.

  4. To view the current firmware on the system, click System Information → Firmware.

View System-Level Information and Health Status (CLI)

You can view the host system-level health status properties from the command-line interface (CLI) under the /System target.

  1. To collect system-level information or to verify the system health status, type show /System.

    For example:

        Properties:
            health = OK
            health_details = -
            open_problems_count = 0
            type = Rack Mount
            model = ORACLE SERVER X9-2
            qpart_id = Q13015
            part_number = 7336847-B2
            serial_number = 1715XC4010A
            rfid_serial_number = changeme
            system_identifier = (none)
            system_fw_version = 5.0.0.21
            primary_operating_system = Not Available
            primary_operating_system_detail = Comprehensive System monitoring is not available. Ensure the host is running with the Hardware Management
                                              Pack. For details go to http://www.oracle.com/goto/ilom-redirect/hmp
            host_primary_mac_address = 00:10:e0:b5:df:ba
            ilom_address = 10.129.129.183
            ilom_mac_address = 00:10:E0:B5:DF:BE
            locator_indicator = Off
            power_state = Off
            actual_power_consumption = 66 watts
            action = (Cannot show property)
    

    Note:

    The property value for the primary operating system installed on the managed device is shown only when Oracle Hardware Management Pack is installed on the managed device.
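When the output of show /System is captured as text by a script, the Properties section can be turned into key-value pairs for automated checks. The following Python sketch is illustrative only and is not part of Oracle ILOM. The function names parse_show_properties and is_healthy are hypothetical. The sketch assumes the output layout shown in the example above, including long values that wrap onto a continuation line, and it assumes that health = OK together with open_problems_count = 0 indicates a healthy system, as in that sample.

```python
import re

# Section headers as they appear in the sample `show` output.
SECTION_HEADERS = ("Targets:", "Properties:", "Commands:")

# A property line looks like "name = value"; the value may be empty.
PROPERTY_RE = re.compile(r"(\w+)\s*=\s*(.*)")


def parse_show_properties(text):
    """Return the Properties section of captured `show` output as a dict."""
    props = {}
    section = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in SECTION_HEADERS:
            section = line
            last_key = None
            continue
        if section != "Properties:":
            continue
        match = PROPERTY_RE.match(line)
        if match:
            last_key = match.group(1)
            props[last_key] = match.group(2).strip()
        elif last_key is not None:
            # Assumption: a line without "name =" continues the previous value.
            props[last_key] += " " + line
    return props


def is_healthy(props):
    """Assumed rule: health is OK and no open problems are reported."""
    count = props.get("open_problems_count", "")
    return props.get("health") == "OK" and count.isdigit() and int(count) == 0
```

For example, passing the sample output above to parse_show_properties yields a dictionary in which the wrapped primary_operating_system_detail value is joined into a single string, and is_healthy returns True for that dictionary.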

View Subsystem and Component Information and Health Status (Web)

The subsystem and component health status properties for a server are viewable from the Summary Information page in the web interface.

Installation of Oracle Hardware Management Pack is required for the following:

  • To view health and inventory status properties on the Networking page for InfiniBand network controllers.

  • To view the majority of the health and inventory status properties on the Storage page, and to view the controller Type property or the controller Details properties (such as Location, World Wide Name (WWN) for FC controllers, and Number Of Ports).

  1. To view subsystem and component health status properties, click System Information → category-name.

    For example, the navigation pane shows a list of subsystems such as Processors, Memory, Power, Cooling, and Storage. To view server component health status details for Processors, click System Information → Processors.

  2. On the component category page, you can:
    • Determine the overall health for the subsystem category and the number of components installed for each category.

    • Determine the health details and the installed location for each component currently installed on the server.

      On some servers, you can also enable and disable components from the component category page. For further information about enabling or disabling subcomponents on your Oracle server, refer to the Oracle ILOM documentation.

    • View further information about the installed component by clicking the Details link in the table.

View Subsystem and Component Information and Health Status (CLI)

You can view the health status properties for subsystems and components from the command-line interface (CLI) under the /System target.

  1. To access subsystem and component health details from the CLI, type show /System/category-name.

    Where category-name is one of the subsystem target names listed in the output of show /System.

    For example:

    • To view the subsystem health status for memory modules (DIMMs) on a server, type show /System/Memory.

      /System/Memory
         Targets:
            DIMMs
       
         Properties:
            health = OK
            health_details = -
            installed_memory = 16 GB
            installed_dimms = 2
            max_dimms = 16
       
         Commands:
            cd
            show
      
    • To view the health status for a specific DIMM on a server, type show /System/Memory/DIMMs/DIMM_0.

       /System/Memory/DIMMs/DIMM_0
          Targets:
      
          Properties:
              health = OK
              health_details = -
              part_number = 07075400,M393A4K40CB2-CTD
              serial_number = 00CE0117490324CDF0
              location = P0/D0 (CPU 0 DIMM 0)
              manufacturer = Samsung
              memory_size = 32 GB
              type = DDR4 SDRAM
      
          Commands:
              cd
              show
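The examples above descend from /System/Memory to /System/Memory/DIMMs/DIMM_0 by following the names listed under Targets. The following Python sketch shows one way a script might derive the next show commands from captured output. It is illustrative only and is not part of Oracle ILOM. The function names list_targets and child_show_commands are hypothetical, and the sketch assumes the Targets, Properties, and Commands layout shown in the sample output.

```python
def list_targets(show_output):
    """Return the names listed under 'Targets:' in captured `show` output."""
    targets = []
    in_targets = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line == "Targets:":
            in_targets = True
            continue
        if line in ("Properties:", "Commands:"):
            in_targets = False
            continue
        if in_targets and line:
            targets.append(line)
    return targets


def child_show_commands(parent, show_output):
    """Build one `show` command per child target of the given parent path."""
    base = parent.rstrip("/")
    return [f"show {base}/{name}" for name in list_targets(show_output)]
```

For the /System/Memory output above, child_show_commands("/System/Memory", output) returns a single command, show /System/Memory/DIMMs. For the DIMM_0 output, whose Targets section is empty, it returns an empty list.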