14 Infiniband Switch

This chapter provides information about the Infiniband Switch metrics.

For each metric, it provides the following information:

  • Description

  • Metric table

    The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.

These metrics describe the performance of each port of the switch and the aggregation of performance for Switch-to-Node and Switch-to-Switch link types. They also define whether a switch is a subnet manager for the network or not. Switch statistics are also covered.

14.1 Aggregate Sensors

This metric category is not initiated by the agent. The IB switch pushes information to the agent through the SNMP trap mechanism, so it works only when the agent subscribes to SNMP traps. Note that this metric is used only for generating alerts. No data is uploaded to the repository, and the All Metrics page does not show any data for this metric.

14.1.1 Alarm Status

This metric reports whether the severity is set or cleared (Major/Cleared).

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined major The aggregate sensor %keyValue% has a fault.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.1.2 Sensor Value

This metric reports whether the aggregate sensor state is de-asserted (1) or asserted (2).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.2 Fan Speed Sensors

Similar to Aggregate Sensors, this metric category contains SNMP trap-based metrics.

14.2.1 Alarm Status

This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the fan speed has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as a Warning.

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: warning|FAULT_DIAGNOSED|FAULT_SUSPECTED|WARNING
Default Critical Threshold: critical|major|CRITICAL|ERROR|FAILED|FAULTED|NOT_PRESENT|NON_RECOVERABLE|PREDICTIVE_FAILURE_ASSERTED|LOWER_CRITICAL|UPPER_CRITICAL|LOWER_NON_RECOVERABLE|UPPER_NON_RECOVERABLE
Alert Text: The speed of fan %keyValue% has exceeded its threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.
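
The default thresholds for this metric are lists of status strings separated by `|`. The following minimal sketch shows one way to read them, assuming each alternative is compared to the reported status as an exact, case-sensitive match. How Enterprise Manager evaluates these thresholds internally is not described here, and the function name `classify_alarm` is hypothetical.

```python
# Default threshold strings copied from the metric table above.
WARNING_PATTERN = "warning|FAULT_DIAGNOSED|FAULT_SUSPECTED|WARNING"
CRITICAL_PATTERN = (
    "critical|major|CRITICAL|ERROR|FAILED|FAULTED|NOT_PRESENT|"
    "NON_RECOVERABLE|PREDICTIVE_FAILURE_ASSERTED|LOWER_CRITICAL|"
    "UPPER_CRITICAL|LOWER_NON_RECOVERABLE|UPPER_NON_RECOVERABLE"
)


def classify_alarm(status: str) -> str:
    """Map an alarm status string to an alert severity.

    Assumption: each '|'-separated alternative must match the status
    exactly; critical alternatives are checked before warning ones.
    """
    if status in CRITICAL_PATTERN.split("|"):
        return "Critical"
    if status in WARNING_PATTERN.split("|"):
        return "Warning"
    return "Clear"
```

Under these assumptions, both `critical` and `major` map to a Critical alert, which matches the description of the first two states above.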

14.2.2 Fan Speed (revolutions per minute)

This metric reports the speed of the fan in revolutions per minute.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.3 Fan Speed Sensor Alerts

Similar to Fan Speed Sensors, this metric category contains SNMP trap-based metrics.

14.3.1 Alarm Status

This metric reports the alarm status. These values (Critical/Major/Warning) indicate that fan speed has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as Warning.

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: warning|WARNING
Default Critical Threshold: critical|major|CRITICAL|ERROR|FAILED
Alert Text: The speed of fan %keyValue% has exceeded its threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.3.2 Fan Speed (revolutions per minute)

This metric reports the speed of the fan in revolutions per minute.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.4 FRU Removal Alerts

This metric category provides information about field replaceable unit (FRU) removal alerts.

14.4.1 FRU Status

This metric displays an alert that is sent for all FRU removals.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined The FRU %keyValue% has been removed from the system.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.5 Response

The metric in this category is used to detect whether the Infiniband switch is up and reachable.

14.5.1 Response Status

This metric is checked at 1-minute intervals. A value of 1 in the status column indicates that the switch is up; otherwise, the switch is down.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 1 Minute Not Defined 0 Failed to connect to Infiniband switch %target%.

Data Source

Not available.

User Action

No user action is required.

14.6 Switch Gateway Port State

This metric category provides information about the gateway metrics for gateway ports of an Infiniband switch.

14.6.1 10 Gb/s Ethernet Port

This metric displays the 10 Gb/s Ethernet port number.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.2 State

This metric displays the state of the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.3 Received Bytes

This metric displays the number of bytes received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.4 Received Packets

This metric displays the number of packets received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.5 Received Jumbo Packets

This metric displays the number of jumbo packets received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.6 Received Unicast Packets

This metric displays the number of unicast packets received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.7 Received Broadcast Packets

This metric displays the number of broadcast packets received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.8 Received Buffers

This metric displays the number of buffers received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.9 Received CRC Errors

This metric displays the number of Cyclic Redundancy Check (CRC) errors received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.10 Received Runtime Errors

This metric displays the number of runtime errors received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.11 Received Total Errors

This metric displays the total number of errors received by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.12 Transmitted Bytes

This metric displays the number of bytes transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.13 Transmitted Packets

This metric displays the number of packets transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.14 Transmitted Jumbo Packets

This metric displays the number of jumbo packets transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.15 Transmitted Unicast Packets

This metric displays the number of unicast packets transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.16 Transmitted Multicast Packets

This metric displays the number of multicast packets transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.17 Transmitted Broadcast Packets

This metric displays the number of broadcast packets transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.6.18 Transmitted Total Errors

This metric displays the total number of errors transmitted by the gateway.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.7 Switch Performance Summary

This metric category provides overall performance of the ibswitch across all ports.

14.7.1 Average link throughput (KBPS)

This metric reports the average number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.7.2 Highest link throughput (KBPS)

This metric reports the maximum number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.7.3 Lowest link throughput (KBPS)

This metric reports the minimum number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.
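
As an illustration of how the three summary metrics in this category relate to one another, the sketch below derives them from a set of per-port throughput values. It assumes the inputs are per-port link throughput figures in KBPS, such as those in the Switch Port Performance category; the function name `summarize_throughput` and the zero result for an empty input are assumptions, not documented behavior.

```python
from statistics import mean


def summarize_throughput(port_kbps: dict[int, float]) -> dict[str, float]:
    """Return average, highest, and lowest throughput across all ports.

    port_kbps maps a port number to that port's link throughput in KBPS.
    """
    values = list(port_kbps.values())
    if not values:
        # Assumption: report zeros when no port data is available.
        return {"average": 0.0, "highest": 0.0, "lowest": 0.0}
    return {
        "average": mean(values),
        "highest": max(values),
        "lowest": min(values),
    }
```

For example, `summarize_throughput({1: 100.0, 2: 300.0, 3: 200.0})` gives an average of 200.0, a highest value of 300.0, and a lowest value of 100.0.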

14.8 Switch Port Configuration Monitor

This metric category is mainly used for monitoring the connectivity of ports and raising alerts when there is a disconnection.

14.8.1 GUID on the other end of the link

This metric reports the IB globally unique identifier (GUID) of the entity to which the port is connected. This is not an Enterprise Manager target GUID. It can be the switch GUID if the other end is a switch port, or the port GUID if it is an HCA port.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.8.2 Name of the entity to which this port is connected

This metric reports the name of the entity (Switch/Cell/Compute Node) to which this switch port is connected.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.8.3 Node GUID if the peer is a Switch port, Port GUID otherwise

This metric displays the node GUID if the peer port is a switch port. Otherwise, it displays the port GUID, indicating an HCA port.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.8.4 Port number of the peer port

This metric reports the port number of the peer port.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.8.5 Type of entity to which this disconnected port was connected

If this port is currently disconnected, this metric reports the type of the entity from which it was disconnected. It can take one of four values (Switch/Cell/Node/None). When the port is connected, the value of this metric is None.

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: node|cell|switch
Alert Text: Port %PortNumber% on %target% is disconnected from port %ConnectedToPortNumberPrev% on %ConnectedToNamePrev%.

Data Source

The data is collected using SNMP.

User Action

No user action is required.
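
The alert text for this metric contains `%name%` placeholders that are filled in from the collected row. The following sketch shows one way such a template could be expanded when the metric value matches the default critical threshold. The helper names, the `DisconnectedFromType` key, and the case-insensitive comparison are all assumptions made for illustration; only the template and the threshold values come from the metric table.

```python
import re

# Alert text and critical values copied from the metric table above.
ALERT_TEXT = (
    "Port %PortNumber% on %target% is disconnected from port "
    "%ConnectedToPortNumberPrev% on %ConnectedToNamePrev%."
)
CRITICAL_VALUES = {"node", "cell", "switch"}


def render_alert(template: str, values: dict) -> str:
    """Replace each %name% placeholder with values[name].

    Placeholders with no matching key are left unchanged.
    """
    return re.sub(
        r"%(\w+)%",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


def disconnect_alert(row: dict):
    """Return the alert message for a disconnected port, or None.

    Assumption: the metric value is stored under the hypothetical key
    'DisconnectedFromType' and is compared case-insensitively.
    """
    if str(row.get("DisconnectedFromType", "None")).lower() not in CRITICAL_VALUES:
        return None
    return render_alert(ALERT_TEXT, row)
```

A row whose type is None produces no alert, which corresponds to the connected state described above.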

14.8.6 Type of the entity to which this port is connected

This metric can take any of the three values (Switch/Cell/Compute Node) depending on what entity this port is connected to.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9 Switch Port Errors

The metrics in this metric category provide statistics obtained from perfquery output on the switch. The metric values give the change in each error counter since the last collection. Alerts are raised only if there are new errors since the last metric collection.
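
The delta behavior described above can be sketched as follows. This is an illustration only: the counter names are hypothetical, and the handling of a counter that has decreased (treated here as a reset, so the current value is taken as the delta) is an assumption rather than documented behavior.

```python
def counter_deltas(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the change in each cumulative error counter since the last collection."""
    deltas = {}
    for name, now in current.items():
        before = previous.get(name, 0)
        # Assumption: a value lower than the previous one means the counter
        # was reset on the switch, so everything counted so far is new.
        deltas[name] = now - before if now >= before else now
    return deltas


def has_new_errors(deltas: dict[str, int]) -> bool:
    """True when any counter increased, which is when an alert can be raised."""
    return any(delta > 0 for delta in deltas.values())
```

With this approach, a port whose cumulative counters are unchanged between two collections reports zero for every metric in this category and raises no alert.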

14.9.1 Excessive buffer overruns

This metric reports the number of "buffer overruns exceeding the threshold" since the last collection (the collection interval is 5 minutes).

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% excessive buffer overruns, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.2 Incoming VL15 packets dropped due to resource limitation

This metric reports the number of incoming VL15 packets dropped due to a lack of buffers since the last metric collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% incoming VL15 packets dropped, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.3 Link integrity errors

This metric displays the number of link integrity errors, that is, errors on the local link.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% link integrity errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.4 Link recovers

This metric reports the number of times the link error recovery process completed successfully since the last collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% link recovers, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.5 Packets not transmitted due to constraints

This metric reports the number of packets not transmitted due to constraints since the last collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% packets not transmitted due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.6 Received packets discarded due to constraints

This metric reports the number of received packets discarded due to constraints since the last collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% received packets discarded due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.7 Received packets marked with the EBP delimiter

This metric reports the number of packets marked with the EBP delimiter received on the port.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% received packets marked with the EBP delimiter, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.8 Received packets with error

This metric reports the number of packets received with errors since the last collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% received packets containing an error, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.9 Symbol errors

This metric reports the number of symbol errors detected since the last collection.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Port %PortNumber% has %value% symbol errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.9.10 Total errors

This metric reports the sum of all the errors described in the preceding metrics of this category.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes 10 Not Defined Port %PortNumber% has %value% total errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.
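
The relationship between the individual error metrics, the total, and the default warning threshold of 10 can be sketched as below. The counter names are hypothetical, and the strictly-greater-than comparison is an assumption, since the comparison operator is not stated in the metric table.

```python
from typing import Optional

# Default warning threshold from the metric table above; no critical default is defined.
DEFAULT_WARNING_THRESHOLD = 10


def total_errors(deltas: dict[str, int]) -> int:
    """Sum the per-counter error deltas for one port."""
    return sum(deltas.values())


def total_errors_severity(
    deltas: dict[str, int],
    warning: Optional[int] = DEFAULT_WARNING_THRESHOLD,
    critical: Optional[int] = None,
) -> str:
    """Compare the total against the thresholds (assumed operator: >)."""
    total = total_errors(deltas)
    if critical is not None and total > critical:
        return "Critical"
    if warning is not None and total > warning:
        return "Warning"
    return "Clear"
```

Because the default critical threshold is not defined, only a Warning can be produced unless a critical value is configured.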

14.10 Switch Port Performance

This metric category contains performance metrics at the switch port level.

14.10.1 Link Throughput: bytes transmitted and received per sec (KBPS)

This metric reports the number of bytes transmitted and received per second (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.
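
A per-second rate such as this one is typically derived from two samples of cumulative byte counters. The sketch below shows that arithmetic under stated assumptions: the counters are cumulative byte counts, the interval is the 5-minute collection frequency, and one kilobyte is taken as 1024 bytes. None of these details is confirmed here, and the function name is hypothetical.

```python
def link_throughput_kbps(
    rx_prev: int,
    rx_now: int,
    tx_prev: int,
    tx_now: int,
    interval_s: float = 300.0,   # 5-minute collection interval
    bytes_per_kb: int = 1024,    # assumption: 1 KB = 1024 bytes
) -> float:
    """Return combined receive and transmit throughput in KB per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    transferred = (rx_now - rx_prev) + (tx_now - tx_prev)
    return transferred / bytes_per_kb / interval_s
```

For example, 307200 bytes received and 307200 bytes transmitted over one 300-second interval correspond to 2.0 KBPS.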

14.10.2 Number of bytes received per sec (KBPS)

This metric reports the number of bytes received per second (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.10.3 Number of bytes transmitted per sec (KBPS)

This metric reports the number of bytes transmitted per second (KBPS).

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.10.4 Number of packets received per sec

This metric reports the number of packets received per second.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.10.5 Number of packets transmitted per sec

This metric reports the number of packets transmitted per second.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.11 Switch Port State

This metric category contains switch port state metrics.

14.11.1 Active link width of port based on cable connectivity

This metric displays the active link width of the port based on the cable connectivity.

Target Version Collection Frequency
All Versions Every 5 Minutes

14.11.2 Is the link degraded?

This metric reports whether or not the link is degraded. If the active speed of a link is less than the enabled speed, then it is considered to be degraded and this column value is set to 1. It is mainly used for raising alerts.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined 1 Port %PortNumber% is running in degraded mode.

Data Source

The data is collected using SNMP.

User Action

No user action is required.
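
The degraded-link rule described above reduces to a single comparison. A minimal sketch, with hypothetical parameter names and speeds expressed in Gbps:

```python
def is_link_degraded(active_speed_gbps: float, enabled_speed_gbps: float) -> int:
    """Return 1 when the active speed is below the enabled speed, otherwise 0."""
    return 1 if active_speed_gbps < enabled_speed_gbps else 0
```

A returned value of 1 matches the default critical threshold in the metric table, so a link running below its enabled speed raises a Critical alert.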

14.11.3 Link state

This metric reports the link state. The link is down if the physical link state is 0.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.11.4 Physical link state

This metric reports the physical link state. The physical link state is 0 if the port is in polling or disabled state.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.
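
The relationship between the port state, the physical link state, and the link state can be sketched as follows. Only the zero case is described above; the use of 1 for every other port state, the state names as lowercase strings, and the function names are assumptions made for illustration.

```python
# Port states for which the physical link state is reported as 0.
DOWN_PORT_STATES = {"polling", "disabled"}


def physical_link_state(port_state: str) -> int:
    """Return 0 for a polling or disabled port.

    Assumption: any other port state is reported as 1.
    """
    return 0 if port_state.strip().lower() in DOWN_PORT_STATES else 1


def link_is_down(physical_state: int) -> bool:
    """The link is down when the physical link state is 0."""
    return physical_state == 0
```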

14.11.5 The active link speed (Gbps)

This metric reports the speed of the active link in Gbps.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.12 Switch Port State (For Alerts)

This metric category contains switch port state metrics (for alerts).

14.12.1 Indicates that cable is present but port is disabled

This metric reports that the cable is present but that the port is disabled. This metric's collection frequency is event-driven.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Event-driven Not Defined 1 Cable is present on Port %PortNumber% but the port is disabled.

14.12.2 Indicates that cable is present but port is polling for peer port

This metric reports that the cable is present but the port is checking for the peer port. This metric's collection frequency is event-driven.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Event-driven Not Defined 1 Cable is present on Port %PortNumber% but it is polling for peer port. This could happen when the peer port is unplugged/disabled.

14.13 Switch State Summary

This metric category contains metrics that report the overall state of switch ports.

14.13.1 Number of active ports

This metric reports the total number of active ports.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.13.2 Number of degraded ports

This metric reports the total number of degraded ports.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Number of degraded ports is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.13.3 Number of ports with errors

This metric reports the number of ports with errors. Starting with Exadata plug-in 12.1.0.3, degraded ports are counted in both the degraded ports and the ports with errors categories.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Number of ports with errors is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.
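
The three counts in this category can be sketched as below, including the rule that a degraded port is also counted as a port with errors (plug-in 12.1.0.3 and later). The `PortStatus` type and its field names are hypothetical and are used only to make the counting rule concrete.

```python
from dataclasses import dataclass


@dataclass
class PortStatus:
    active: bool
    degraded: bool
    has_errors: bool


def summarize_ports(ports: list[PortStatus]) -> dict[str, int]:
    """Count active, degraded, and error ports for one switch."""
    return {
        "active": sum(p.active for p in ports),
        "degraded": sum(p.degraded for p in ports),
        # A degraded port is counted here even if it has no error counters set.
        "with_errors": sum(p.has_errors or p.degraded for p in ports),
    }
```

For a switch with one healthy port, one degraded port, and one port with errors, this gives one degraded port and two ports with errors.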

14.14 Switch Temperatures

This metric category contains metrics that report the switch temperature.

14.14.1 Back of switch temperature

This metric reports the rear chassis temperature.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Switch back temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.14.2 Front of switch temperature

This metric reports the front chassis temperature.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Switch front temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.14.3 Switch I4 chip temperature

This metric reports the I4 chip temperature.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Switch I4 chip temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.14.4 Switch Service Processor temperature

This metric reports the management controller temperature.

Target Version Evaluation and Collection Frequency Default Warning Threshold Default Critical Threshold Alert Text
All Versions Every 5 Minutes Not Defined Not Defined Switch service processor temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.15 Temperature Sensors

Similar to other SNMP trap-based metrics, the metrics in this category are used only for generating alerts and are not uploaded to the repository.

14.15.1 Alarm Status

This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the temperature has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as a Warning.

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: critical|major|CRITICAL|ERROR|FAILED
Alert Text: The temperature sensor %keyValue% has exceeded its threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.15.2 Temperature (degrees Celsius)

This metric reports the temperature of the rear chassis, front chassis, I4 chip, or management controller.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.16 Voltage Sensors

This metric category contains metrics that report on the voltage sensors.

14.16.1 Alarm Status

This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the voltage has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as a Warning.

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: critical|major|CRITICAL|ERROR|FAILED
Alert Text: The voltage sensor %keyValue% has exceeded its threshold.

Data Source

The data is collected using SNMP.

User Action

No user action is required.

14.16.2 Voltage (mV)

This metric reports the voltage recorded by various voltage sensors on the ibswitch.

Target Version Collection Frequency
All Versions Every 5 Minutes

Data Source

The data is collected using SNMP.

User Action

No user action is required.