6 Hardware Monitoring

All hardware assets are monitored for their status, according to the asset's monitoring profile. The user interface reports information for a selected asset in a series of tabbed displays. The tabs and the type of information are specific to the asset type.

Based on your observations, you can control your hardware assets and do the following actions:

Viewing Hardware Details

Enterprise Manager Ops Center reports the information that it can acquire from an individual asset. Hardware information is displayed in increasing detail on the Summary tab, the Hardware tab, and the Monitoring tab.

See the Enterprise Manager Ops Center Reference Guide for a list of the asset attributes that can be used in monitoring.

To View Details About Hardware

  1. Expand Assets in the Navigation pane.

  2. Select All Assets.

  3. (Optional) Select an asset type to filter the assets. The default groups for hardware are Systems, Chassis, and Switches.

  4. Select an asset. The Summary tab provides a high-level view of the asset's attributes and, for most assets, displays the firmware version.

  5. Select the Hardware tab to see information about that asset's hardware and firmware components. Depending on the asset, you can refine the information to specific components.

  6. Select the Monitoring tab, if the asset has one, to view the current state of various hardware variables. For each variable, this tab also shows the values for the warning threshold, the critical threshold, and the non-recoverable threshold.

Server Details

For server hardware, the Summary page displays:

  • Name

  • Description and Tags

  • Current Alert Status

  • Model

  • Serial Number

  • Management Interface IP

  • MAC Address

  • Processor

  • Memory

  • Power state

    • On – The server is powered on and running.

    • Standby – The server is powered off but responds to commands.

    • Unknown – An error occurred while attempting to retrieve the power status of the hardware. The server is connected but is not returning any information on power status.

    • Unreachable – The server cannot be contacted for information about its power state. This indicates a network problem or that the server is in standby mode.

  • Locator Lights state

  • Notification

Use the Hardware tab to view information about each component of the system:

  • System: Description, type, and version of all firmware installed except for disk firmware. See the Disk tab for firmware version.

  • CPU: Name, Model, Architecture, Speed, Manufacturer

  • Memory: Name, Type, Size in bytes, Manufacturer, Part number, Serial number

  • Network Adaptors: Name of each, MAC Address, Manufacturer, Part number, Serial Number

  • Disks: Name of each, Model, Size in bytes, Slot ID, Node ID, Firmware Version, Manufacturer, Root Disk, RAID Disk

  • Power Supply: Name, Manufacturer, Part number, Serial number

  • Disk Controller: Name, Model Number, Firmware Version, BIOS Version, PCI Address, PCI Version ID

  • Disk Expander: Name, Manufacturer, Version, Model Number, Firmware Version, Chassis ID

  • Fan Tray: Name, Manufacturer, Part number, Serial number

  • Fans: Name, Speed

Chassis Details

For chassis hardware as a group, the Summary page shows:

  • The five largest consumers of CPU

  • The five largest consumers of memory

  • The five largest consumers of the network

For chassis hardware, the Summary page displays:

  • Group Name

  • Description

  • Location

  • Type

M-Series Server

The hardware resources in a SPARC Enterprise M-Series Server are divided into one or more logical units, called dynamic system domains. Enterprise Manager Ops Center can monitor each domain, in addition to the server hardware.

For an M-Series server, the Dashboard tab displays:

  • Number of Dynamic System Domains it is supporting

  • Model

  • Product Serial Number

  • Description

  • Support contract

  • XCP Firmware version

  • OBP Firmware version

  • XSCF Firmware version

  • Operator Panel Switch Status: Locked

  • Current Alert Status

The Summary tab repeats some of the Dashboard's information and adds details. For the Power status, the reported status is for the server's domains. When any domain is powered on, the status is reported as powered on. When all domains are powered off, the Summary tab shows a status of Powered Off; the M-Series server itself remains powered on.

  • Name

  • Model

  • Product Serial Number

  • Management IP

  • MAC Address

  • Current Alert Status

  • Power

  • Locator Light

  • Notification

  • All firmware versions

  • For each domain: Name, Model, Health, Power, Locator Light, Notification
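The power status rule described for the Summary tab can be sketched in a few lines. This is only an illustration of the stated rule, not the product's implementation; the function name and the status strings are assumptions chosen for the example.

```python
def reported_power_status(domain_powered_on):
    """Return the status shown for an M-Series server.

    domain_powered_on: iterable of booleans, one per dynamic system domain
    (True means that domain is powered on). Hypothetical helper.
    """
    # Any powered-on domain makes the server report as powered on;
    # only when every domain is off is the status Powered Off.
    return "Powered On" if any(domain_powered_on) else "Powered Off"
```

Note that a result of Powered Off describes the domains only; as stated above, the M-Series server itself remains powered on.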

The Hardware tab shows the state of the server or, if a Dynamic System Domain is selected, the state of that domain. The Hardware tab reports the following:

  • Model

  • Serial Number

  • State

  • Power

  • Locator Light

  • Notification

  • Operator Panel Switch State

At the System level, the hardware report includes:

  • The Unallocated Resources table lists all the physical system boards and their status: PSB ID, Assignment Status, Power Status, Connection Status, Diagnostics Status, Operational Status

  • The Allocated Resources table lists all domains that are using the physical system boards and their status: Domain ID, PSB ID, XSB ID, LSB ID, Assignment Status, Power Status, Connection Status, Diagnostics Status, Operational Status

  • The Dynamic System Domain table lists all the domains and their details: Domain ID, MAC Address, Autoboot Policy, Secure Mode Policy, CPU Mode, Diagnostics Level, Domain Degradation Policy, Operational Status

You can change the display to show information about each component of the system:

  • CPU: Name, Architecture, Type, Manufacturer, Speed, Core Count, Thread Count, Serial Number, Part Number, Version, Status. For Sensors: Name, Description, Type, Value

  • Memory: Name, Type, Size in bytes, Serial number, Part number, Status. For Sensors: Name, Description, Type, Value

  • Board: Name, Serial number, Part number, Memory mirrored, Version, Status

  • Power Supply: Name, Serial number, Part Number, Status. For Sensors: Name, Description, Type, Value

  • Board: Name, XSB Mode, Memory Mirrored, Serial Number, Part Number, Version, Status. For Sensors: Name, Description, Type, Value

  • IO Unit: Name, Serial Number, Part Number, Version, Status. For Sensors: Name, Description, Type, Value

  • XSCF: Name, Host Name, Serial Number, Part Number, Version, Status

  • Fan Tray: Name, Manufacturer, Part number, Serial number

  • Fans: Name, Speed. For Sensors: Name, Description, Type, Value

Enterprise Manager Ops Center monitors the voltage for the Board and IO Unit components and the speed for the Fan components. Click on the Monitoring tab to see the actual value and the threshold values.

See SPARC Enterprise M-Series Server Support for information about requirements.

Sun ZFS Storage Appliance

The Sun ZFS Storage Appliance supports both file storage and application use.

The Dashboard tab reports the following hardware information:

  • Name

  • Description

  • Current Alert Status

  • Model

  • Serial Number

  • Management IP

  • Memory

  • Power

  • Locator Light

  • Appliance Kit Version

  • Running Time

  • Processor

The Hardware tab displays the appliance's firmware version and the following information for each component:

  • CPU: Name, Model, Architecture, Speed, Manufacturer

  • Memory: Name, Type, Size in bytes, Manufacturer, Part number, Serial number

  • Network Adapters: Name of each, MAC Address, Description, Manufacturer, Part number, Serial number

  • Disks: Name, Size in bytes, Manufacturer, Part number, Serial number

  • Power Supply: Name, Manufacturer, Part number, Serial Number

  • Fan Tray: Name, Manufacturer, Part number, Serial number

Switch Details

Enterprise Manager Ops Center can manage 10G Ethernet Fabric Switches and Datacenter Infiniband Switches. These types of switches reside in the system or blade system, providing the switch fabric.

Enterprise Manager Ops Center reports hardware information on the Summary tab:

  • Name

  • Model

  • Port count

  • Serial number

  • Management Interface IP

  • MAC Address

  • Fabric Manager: true or false

  • Fabric Manager Address

  • Power state

  • Locator lights state

  • Notification state

  • Current Alert Status

  • Firmware types and versions

At the System level, the Hardware tab includes:

  • Model

  • Server Name

  • Serial Number

  • State

  • Power

  • Firmware versions

  • Sensors: temperature and voltage

You can change the display to show information about each component of the switch:

  • Network Adaptors: Name of each, MAC Address, IP Address, Description

  • Power Supply: Name, Manufacturer, Part number, Serial Number. For Sensors: Description, Type, Status

  • Fan Sensors: Description, Type, Value, Status, Warning Threshold (Lower), Warning Threshold (Upper), Critical Threshold (Lower), Critical Threshold (Upper), Non-Recoverable Threshold (Lower), Non-Recoverable Threshold (Upper)

For more information about supported hardware, see Supported Systems.

Monitoring Hardware Health

Enterprise Manager Ops Center monitors the sensors in the hardware and displays the following information:

  • CPU temperature

  • Ambient temperature

  • Fan speed in revolutions per minute

  • Voltages

  • LEDs

States of Hardware Health

If a hardware asset can report a value for a hardware variable, Enterprise Manager Ops Center compares that value to the threshold values and reports the asset's current state. The possible states are:

  • Good – The hardware asset is working properly.

  • Unknown – Enterprise Manager Ops Center is unable to retrieve information from the sensor. The hardware asset is connected but is not reporting information.

  • Unreachable – The hardware asset cannot be contacted. This state indicates a network problem.

  • Warning Failure – Enterprise Manager Ops Center has detected a potential or impending fault condition. Take action to prevent the problem.

  • Critical Failure – A fault condition has occurred. Take corrective action.

  • Nonrecoverable Failure – The hardware asset has failed. Recovery is not possible.

  • Faulted – The hardware asset reports a fault. Contact service personnel to repair.
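The comparison of a sensor value against its thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's logic: the `Thresholds` class and `classify` function are hypothetical, the comparison is assumed to be inclusive, and the Unreachable and Faulted states are not modeled because they do not depend on a threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Hypothetical container for lower and upper limits; None means not set."""
    warning_lower: Optional[float] = None
    warning_upper: Optional[float] = None
    critical_lower: Optional[float] = None
    critical_upper: Optional[float] = None
    nonrecoverable_lower: Optional[float] = None
    nonrecoverable_upper: Optional[float] = None


def classify(value: Optional[float], t: Thresholds) -> str:
    """Map a sensor reading to one of the threshold-based health states."""
    if value is None:
        # The sensor is connected but returned no reading.
        return "Unknown"
    # Check the most severe level first so that it takes precedence.
    levels = [
        ("Nonrecoverable Failure", t.nonrecoverable_lower, t.nonrecoverable_upper),
        ("Critical Failure", t.critical_lower, t.critical_upper),
        ("Warning Failure", t.warning_lower, t.warning_upper),
    ]
    for state, lower, upper in levels:
        too_high = upper is not None and value >= upper  # inclusive: an assumption
        too_low = lower is not None and value <= lower
        if too_high or too_low:
            return state
    return "Good"
```

For example, with upper thresholds of 70, 80, and 90 for a temperature sensor, a reading of 75 would be classified as Warning Failure under these assumptions.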

Monitoring Hardware Variables

Enterprise Manager Ops Center monitors hardware assets according to the monitoring profile for that type of asset. The following hardware variables can be monitored:

  • Current

  • Disk

  • Fan

  • Power supply

  • Temperature

  • Voltage

To see the default profile for monitoring hardware, see Monitoring Profiles.

To View Hardware Variables

  1. Expand Assets in the Navigation pane.

  2. Expand the hardware type.

  3. Select the hardware. The Summary page of the hardware is displayed in the center pane.

  4. Click the Monitoring tab to view the variables.

  5. Select the Variable type. The variables are listed with their Warning, Critical, and Non-recoverable threshold values.

See Editing A Monitoring Rule to change a threshold value.

Monitoring Connectivity

Connectivity is the network interface of the system. You can view information about a hardware asset's Network Interface Card (NIC).

To Monitor Connectivity

  1. Expand Assets in the Navigation pane.

  2. Expand the hardware type and select the hardware asset. The Dashboard page of the hardware is displayed in the center panel.

  3. Click the Connectivity tab. The details about the network interface cards such as name, connection status, MAC address, and the corresponding IP address are displayed.

Monitoring Power Utilization

Input power is the power pulled into a power supply from an external resource. The power consumption of a hardware asset is the sum of the input power consumed by each power supply of the asset. Output power is the amount of power provided from the power supply to the system components, measured at the power supply output. Input power is calculated from output power by applying an efficiency function to the output power from each power supply.
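The relationship between output power, input power, and the asset's total consumption can be sketched as below. The efficiency function that the product applies is not documented here, so `flat_efficiency` is a placeholder assumption (a constant 90 percent), and dividing output power by efficiency is the conventional definition rather than a confirmed detail of the product.

```python
def input_power(output_watts, efficiency):
    """Estimate the input power of one power supply from its output power.

    efficiency: a function that returns the supply's efficiency (0 < e <= 1)
    at a given output load. Hypothetical; the real curve depends on the supply.
    """
    return output_watts / efficiency(output_watts)


def asset_power_consumption(outputs_watts, efficiency):
    """Sum the estimated input power over every power supply of the asset."""
    return sum(input_power(w, efficiency) for w in outputs_watts)


def flat_efficiency(output_watts):
    """Placeholder efficiency curve: a constant 90 percent at any load."""
    return 0.9
```

With two supplies that each deliver 450 W and the placeholder curve, `asset_power_consumption([450.0, 450.0], flat_efficiency)` evaluates to approximately 1000 W.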

Calculating power consumption for blades is difficult because the power supplies are shared. Each blade reports a value based on the power consumption of its local components, but this is not an accurate power consumption value for an individual blade.

To measure the input power, the interfaces must be exposed and the service processors must be able to retrieve and report data with one-minute accuracy. Servers that can report power usage have a Charts tab. Use the following procedure to check whether a hardware asset can report its power utilization.

Checking Power Capability

  1. Expand the All Assets section of the Navigation pane.

  2. Select the server.

  3. Click the Capabilities tab.

  4. In the list of enabled capabilities, locate Report Power Usage.

Viewing Power Utilization

You can see current power usage and change the display of power graphs using the controls on the Energy tab and the Charts tab.

Energy Tab

The asset's Energy tab reports power consumption as the current value and for a period of time, as well as attributes of the fan and power supplies. The current values are reported for the following attributes:

  • Wattage

  • System Load for an OS

  • Utilization Per Cent for an Oracle VM Server for SPARC

  • Incoming air temperature and outgoing air temperature

  • Power Policy

  • Cost Per KiloWatthour

  • Currency units used to compute cost. The price per currency unit is set by the Edit Energy Cost action in the Administration section of the Navigation pane.

    See Editing the Energy Cost for more information.

  • Total Power Cost for one day
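As a rough illustration of how a one-day power cost relates to wattage and the cost per kilowatt-hour, the following sketch assumes a constant average wattage over 24 hours. The function name and the formula are assumptions for the example; the product's own calculation may use the sampled data instead.

```python
def daily_power_cost(average_watts, cost_per_kwh):
    """Estimate the cost of running an asset for one day.

    average_watts: assumed constant average power draw in watts.
    cost_per_kwh: price of one kilowatt-hour in the configured currency.
    """
    kwh_per_day = average_watts * 24 / 1000  # watts * hours -> kilowatt-hours
    return kwh_per_day * cost_per_kwh
```

Under this assumption, an asset that averages 500 W at a cost of 0.12 per kilowatt-hour uses 12 kWh and costs about 1.44 per day.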

The data over time is represented in the following graphs:

  • Power Consumption and Utilization: By default, the graph shows the power consumed in the last day in watts. If the server is shut down, the graph shows any existing historical data.

  • Temperature and Fan Speed: By default, the graph shows the incoming air temperature and the outgoing air temperature in Fahrenheit, and the average fan speed in RPM. Click on any point on the graph to see that data for that point in time.

By default, the graphs are in Live mode, which reports new information every five seconds. Click on the Live button in the graph's toolbar to make the information static. This enables you to change the period of the graphs to one of the following:

  • One hour (1H) – One point for every five minutes

  • One day (1D) – One point for every five minutes

  • Five days (5D) – One point for every five minutes

  • Three weeks (3W) – One point every hour

  • Six weeks (6W) – One point every 12 hours

  • Six months (6M) – One point for every day

To make a graph with the minimum of two points, a hardware asset must have been managed for at least 10 minutes to view a one-hour graph and for at least two days to view the six-months graph.

The data for these time periods is stored separately. For example, if a server has been managed for two hours and you select the 6W view, the graph cannot be displayed because only one point of data of that type has been stored; the second point has not yet occurred. If you then select the 1D view, the graph can display 24 points of data (120 minutes at 5-minute intervals). However, the graph displays these points over a 24-hour period and not over the actual two-hour period. For the most accurate representation of the data, choose a time period that is less than or equal to the time that the hardware asset has been managed.
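The arithmetic behind these examples can be sketched as follows, using the sampling intervals listed for each view. This is a simplified model, not the product's storage logic: it counts only completed intervals, so it can differ by one from the number of points actually stored, and it does not cap the count at the length of the view. The names are hypothetical.

```python
from datetime import timedelta

# Sampling interval for each view, taken from the list of time periods above.
SAMPLE_INTERVAL = {
    "1H": timedelta(minutes=5),
    "1D": timedelta(minutes=5),
    "5D": timedelta(minutes=5),
    "3W": timedelta(hours=1),
    "6W": timedelta(hours=12),
    "6M": timedelta(days=1),
}


def completed_points(managed_for, view):
    """Count the sampling intervals completed since the asset became managed."""
    return managed_for // SAMPLE_INTERVAL[view]


def can_display(managed_for, view):
    """A graph needs at least two points before it can be drawn."""
    return completed_points(managed_for, view) >= 2
```

For an asset managed for two hours, `completed_points(timedelta(hours=2), "1D")` is 24, which matches the example above, and `can_display(timedelta(hours=2), "6W")` is False.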

You can export the data for either the current view or all available data to a file in either CSV or XML format. Use the Export Chart Data toolbar icon to choose options for exporting the data.

If the graph is blank, one of the following conditions has occurred:

  • The server does not have the appropriate ILOM version.

  • The server has not been discovered through the ILOM driver.

  • The server is unreachable.

Charts Tab

The Charts tab provides more ways to display the power utilization data. You can change the graphed data to a bar chart or an area chart. You can also export the data for either the current view or all available data to a file in either CSV or XML format. Use the Export Chart Data button to choose options for exporting the data.

For groups and virtual pools, the following options are available:

  • Select Order: The five highest or five lowest historical power utilization.

  • Select Resource: Select the Power or Aggregate Power option for a homogeneous or heterogeneous group of servers.

    • The Power option displays power utilization for the five highest or lowest power consumers in the group or virtual pool.

    • The Aggregate Power option displays the power utilization, using the sum of all members that report power consumption. The number of systems in the aggregate is included. For a heterogeneous group, the Charts tab includes a table of all systems in the group and their various power attributes for the selected time period. From this table, you can power off and power on selected servers to conserve power.
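The two group options can be illustrated with a short sketch. It assumes the group's readings are available as a mapping from server name to watts, with None for members that do not report power; the function names and the data shape are hypothetical and are not part of the product.

```python
def top_consumers(power_by_server, n=5, highest=True):
    """Return up to n (server, watts) pairs, ranked highest or lowest first."""
    # Members that do not report power are left out of the ranking.
    reporting = {name: w for name, w in power_by_server.items() if w is not None}
    ranked = sorted(reporting.items(), key=lambda item: item[1], reverse=highest)
    return ranked[:n]


def aggregate_power(power_by_server):
    """Return (total watts, number of members that report power)."""
    values = [w for w in power_by_server.values() if w is not None]
    return sum(values), len(values)
```

The count returned by `aggregate_power` corresponds to the number of systems included in the aggregate, which can be smaller than the size of the group.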

To View Power Utilization Charts

  1. Select a hardware asset from the Assets section in the Navigation pane. You can also select a group or virtual pool from the Assets section if the group contains a hardware asset.

  2. Select the Energy tab to view the current data. The Power Utilization and Consumption graph shows the power use in the last hour. The Temperature and Fan Speed graph shows the incoming air temperature, the outgoing air temperature, and the fan speed.

  3. To see historical data, click on the Live button to stop updating the data.

  4. Click on the Display field and select one of the time periods. Both graphs change immediately.

  5. To see the cost of the power use, click on the Chart Options drop-down list and select the Chart Cost option.

  6. To change the type of graph, click on the Charts tab and select either Bar or Area for the type of graph.

To Export Power Utilization Charts

  1. Expand Assets in the Navigation pane.

  2. Expand the hardware type and select the hardware asset.

  3. Select the Energy tab to display the graphs and, in the graph's toolbar, click the Export to CSV/XML icon. Alternatively, select the Charts tab and then click the Export Chart Data button. The Export Data window is displayed.

  4. Select the format in which you want to store the data, either CSV or XML.

  5. If you have already set the time period of the chart, select the option Current View for the Time Period. If you want the data for six months, select the 6 Months option for the Time Period.

  6. Click Export to store the data. The data is exported and saved in the directory where you are running the user interface.

Configuring Power Utilization

The CPUs of a server have the ability to manage power consumption.

To Set Policy for an Asset's Power Utilization

  1. Select a server, a group of servers, or some members of a group of servers from the Assets list in the Navigation pane.

  2. Click Set Power Policy in the Actions pane. If a selected server does not support power configuration, the action is not available. The Modify/Change Power Control Settings popup is displayed.

  3. Click one of the options for configuring a power policy:

    • Set Elastic Mode to enable CPU power management. This option conserves power but decreases performance.

    • Set Performance Mode to disable CPU power management. This option increases performance but increases power consumption.

  4. Click Close.

Using a Hardware Monitoring Profile

A hardware monitoring profile is a set of rules applied to a hardware asset. If a status changes or a threshold is crossed, an alert is created. Enterprise Manager Ops Center provides default profiles for each asset type. You can create new profiles or modify existing profiles.

To Display the Current Hardware Monitoring Profile:

  1. Expand the Assets section of the Navigation pane.

  2. Expand All Assets.

  3. Select a hardware asset.

  4. Click on the Monitoring tab. For each rule in the profile, the Monitoring tab shows the name of the rule, the limits of the rule, and whether the rule is in effect.

To Apply a Hardware Monitoring Profile:

  1. Expand the Assets section of the Navigation pane.

  2. Expand All Assets.

  3. Select a hardware asset.

  4. Click on the Monitoring tab.

  5. Click Apply a Monitoring Profile in the Action pane. The Apply Monitoring Profile wizard starts, with the selected hardware asset specified as the target.

  6. Click on the Profile list and select a profile from the list. To see details of the profile, click the icon.

  7. Click Apply. The selected asset is now monitored according to the rules in the profile.

Managing Locator Lights

You can activate or deactivate LED locator lights on managed servers and blades to locate a specific asset among many of the same type. This can simplify physical maintenance tasks.

To Activate Locator Lights

  1. Expand Assets in the Navigation pane.

  2. Expand All Assets and select one hardware asset or a homogeneous group.

  3. Click Locator Lights On in the Actions pane. The LED locator lights on the asset or assets are activated.

To Deactivate Locator Lights

  1. Expand Assets in the Navigation pane.

  2. Expand All Assets and select one hardware asset or a homogeneous group.

  3. Click Locator Lights Off in the Actions pane. The LED locator lights on the asset or assets are deactivated.

Resetting a Server

You can reset a server or a set of servers.

To Reset a Server

  1. Select a server or server group from either the Navigation pane or the Membership Graph.

  2. Click Reset Server(s) to reset the system. For a group, select the list of servers from the group and click the Reset Server(s) icon. A Reset pop-up window appears with the following options:

    • Reset

    • Force Reset

    • Reset with Network Boot

  3. Click the appropriate option as required to reset the selected hardware.

Powering an Asset On and Off

You can use Enterprise Manager Ops Center to power on and power off a server or chassis. Stopping a server initiates a graceful shutdown of the operating system and subsequent power-off of the server. If no operating system is installed, you must force a shutdown of the server.

You can power on a managed server or a server group. If boot PROMs are configured, the servers will boot.

To Power On a Server or Chassis

  1. Expand Assets in the Navigation pane.

  2. Select a hardware asset.

  3. Choose Power On in the Actions pane. For server groups, select the servers from the list and click the Power On icon.

  4. To start the server, choose Default Power On.

  5. To start the server and use a manual network boot, choose Power On with Network Boot. A manual network boot is required for OS-based or manually discovered servers.

  6. Click OK. A job is submitted. Jobs initiated on groups of servers will run longer than jobs on individual servers.

To Power Off a Server or Chassis

  1. Expand Assets in the Navigation pane.

  2. Select a hardware asset.

  3. Choose Power Off in the Actions pane. For server groups, select the servers from the list and click the Power Off icon. A Power OFF pop-up window opens. The following options are available:

    • Power OFF

    • Force Power OFF

    • Emergency Power OFF (This is available only for Chassis.)

  4. Click the appropriate option. A job is initiated. The selected asset is powered off.

Accessing the Serial Console

From the Enterprise Manager Ops Center UI, you can get access to a managed hardware asset's operating system through the asset's serial console. The Enterprise Manager Ops Center UI opens the asset's serial console, starts an ssh session, and logs into the operating system using the stored credentials for the asset's service processor. You can then issue operating system commands but you cannot issue service processor commands.

Before You Begin

  • Enable ssh. Use Custom Discovery to discover the OS and the SPs on the hardware so that you can enable ssh access. If you have already discovered and managed an asset and now want to use the serial console, re-discover the asset using Custom Discovery.

  • Verify network access. Verify that the Enterprise Controller can use ssh on Port 22 so that the Enterprise Controller can reach the asset's Proxy Controller or agent.

  • Verify that your role gives you permission to access and change the asset.
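One way to check the network prerequisite is a simple TCP connection test on port 22, run from the Enterprise Controller host. The sketch below is a generic check and not an Ops Center tool; the function name is hypothetical, and the host you pass to it (for example, the address of the Proxy Controller) is your own value. A successful connection shows only that the port is open, not that ssh authentication will succeed.

```python
import socket


def can_reach(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If the function returns False, check for firewalls or routing problems between the Enterprise Controller and the target before you try to enable the console.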

To Access the Serial Console

  1. Expand All Assets in the Assets section of the Navigation pane.

  2. (Optional) Filter the assets by selecting an asset type in the View window in the center pane.

  3. Select the hardware asset. The asset details are displayed in the center pane.

  4. Click the Console tab.

  5. Click Enable Console to activate the console. A job is submitted to activate the console and log you into the operating system.

  6. Issue operating system commands.

  7. (Optional) Click Undock to detach the console from the UI and move it to another location on your monitor.

  8. To close the console, press the ESC+( keys.