This chapter provides information about the Infiniband Switch metrics.
For each metric, it provides the following information:
Description
Metric table
The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.
These metrics describe the performance of each switch port and the aggregated performance for the Switch-to-Node and Switch-to-Switch link types. They also indicate whether the switch is the subnet manager for the network, and they report switch statistics.
The agent does not poll for this metric category. Instead, the IB switch pushes information to the agent through the SNMP trap mechanism, which works only when the agent is subscribed to SNMP traps. Note that this metric is used only for generating alerts: no data is uploaded to the repository, so the All Metrics page does not show any data for this metric.
This metric reports whether the severity is set or cleared (Major/Cleared).
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | major | The aggregate sensor %keyValue% has a fault. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Like the Aggregate Sensors category, this metric category contains SNMP trap-based metrics.
This metric reports the alarm status. The values Critical, Major, and Warning indicate that the fan speed has exceeded the fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager, and the last state is shown as a Warning alert.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | warning\|FAULT_DIAGNOSED\|FAULT_SUSPECTED\|WARNING | critical\|major\|CRITICAL\|ERROR\|FAILED\|FAULTED\|NOT_PRESENT\|NON_RECOVERABLE\|PREDICTIVE_FAILURE_ASSERTED\|LOWER_CRITICAL\|UPPER_CRITICAL\|LOWER_NON_RECOVERABLE\|UPPER_NON_RECOVERABLE | The speed of fan %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
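The default thresholds in the table above are lists of status strings: a value that matches an entry in the critical list raises a Critical alert, and a value that matches the warning list raises a Warning alert. The following Python sketch illustrates that matching. It assumes an exact, case-sensitive comparison against the pipe-separated lists (suggested, but not confirmed, by the presence of both `critical` and `CRITICAL`), and the function name `classify_status` is hypothetical, not part of Enterprise Manager.

```python
from typing import Optional

# Default thresholds copied from the fan speed table above (pipe-separated lists).
WARNING_VALUES = "warning|FAULT_DIAGNOSED|FAULT_SUSPECTED|WARNING"
CRITICAL_VALUES = (
    "critical|major|CRITICAL|ERROR|FAILED|FAULTED|NOT_PRESENT|NON_RECOVERABLE|"
    "PREDICTIVE_FAILURE_ASSERTED|LOWER_CRITICAL|UPPER_CRITICAL|"
    "LOWER_NON_RECOVERABLE|UPPER_NON_RECOVERABLE"
)


def classify_status(status: str) -> Optional[str]:
    """Return 'Critical', 'Warning', or None for a trap status value.

    Assumes an exact, case-sensitive match against the threshold lists;
    the critical list is checked first so that it takes precedence.
    """
    if status in CRITICAL_VALUES.split("|"):
        return "Critical"
    if status in WARNING_VALUES.split("|"):
        return "Warning"
    return None  # the value matches neither list, so no alert is raised


# Both critical and major surface as a Critical alert; warning stays a Warning.
for value in ("critical", "major", "warning", "OK"):
    print(value, "->", classify_status(value))
```

This mirrors the behavior described above, where the first two alarm states collapse into a single Critical alert in Enterprise Manager.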
Like the Fan Speed Sensors category, this metric category contains SNMP trap-based metrics.
This metric reports the alarm status. The values Critical, Major, and Warning indicate that the fan speed has exceeded the fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager, and the last state is shown as a Warning alert.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | warning\|WARNING | critical\|major\|CRITICAL\|ERROR\|FAILED | The speed of fan %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric category provides information about field replaceable unit (FRU) removal alerts.
This metric displays the alert that is sent whenever an FRU is removed.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | The FRU %keyValue% has been removed from the system. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
The metric in this category is used to detect whether the Infiniband switch is up and reachable.
This metric is checked at 1-minute intervals. A value of 1 in the Status column indicates that the switch is up; otherwise, the switch is down.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 1 Minute | Not Defined | 0 | Failed to connect to Infiniband switch %target%. |
Data Source
Not available.
User Action
No user action is required.
This metric category provides information about the gateway metrics for gateway ports of an Infiniband switch.
This metric displays the 10 Gb/s Ethernet port number.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the state of the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of bytes received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of jumbo packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of unicast packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of broadcast packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of buffers received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of Cyclic Redundancy Check (CRC) errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of runtime errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the total number of errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of bytes transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of jumbo packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of unicast packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric displays the number of multicast packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric category provides the overall performance of the ibswitch across all ports.
This metric reports the average number of bytes received and transmitted per second, in KBPS, across all ports of the ibswitch.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the maximum number of bytes received and transmitted per second, in KBPS, across all ports of the ibswitch.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the minimum number of bytes received and transmitted per second, in KBPS, across all ports of the ibswitch.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
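The three aggregate metrics above summarize per-port throughput as an average, a maximum, and a minimum. The following Python sketch shows that aggregation, assuming each port's combined received-plus-transmitted rate in KBPS for the interval is already known. The input dictionary and the function name `aggregate_rates` are hypothetical, and whether idle or disconnected ports are included is not stated here, so the sketch simply uses every port it is given.

```python
from statistics import mean
from typing import Dict, Tuple


def aggregate_rates(port_rates_kbps: Dict[int, float]) -> Tuple[float, float, float]:
    """Return (average, maximum, minimum) throughput across all ports.

    port_rates_kbps maps a port number to its combined received + transmitted
    rate in KBPS for the last collection interval (hypothetical input format).
    """
    if not port_rates_kbps:
        raise ValueError("no port data collected")
    rates = list(port_rates_kbps.values())
    return mean(rates), max(rates), min(rates)


# Hypothetical rates for a three-port example.
avg, peak, low = aggregate_rates({1: 120.0, 2: 30.0, 3: 0.0})
print(f"avg={avg} max={peak} min={low}")
```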
This metric category is mainly used for monitoring the connectivity of ports and raising alerts when there is a disconnection.
This metric reports the IB globally unique identifier (GUID) of the entity to which the port is connected. This is not an Enterprise Manager target GUID. It can be the switch GUID, if the other end is a switch port, or the port GUID, if it is an HCA port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the name of the entity (Switch/Cell/Compute Node) to which this switch port is connected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric displays the node GUID if the peer port is a switch port. Otherwise, it displays the port GUID, indicating an HCA port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the port number of the peer port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
If this port is currently disconnected, this metric reports the type of the entity from which it was disconnected. It can take one of four values (Switch/Cell/Node/None). When the port is connected, the value of this metric is None.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | node\|cell\|switch | Port %PortNumber% on %target% is disconnected from port %ConnectedToPortNumberPrev% on %ConnectedToNamePrev%. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
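The disconnection value described above depends on knowing what the port was connected to before it went down. The following Python sketch shows one way to derive it, assuming the peer type from the previous collection is remembered per port; that bookkeeping and the function names `disconnected_from` and `raises_critical` are hypothetical. Values are lowercased to match the default critical threshold (node, cell, or switch), and `'none'` stands for the documented None value.

```python
from typing import Optional

# Peer types that match the default critical threshold (node|cell|switch).
ALERTING_TYPES = {"node", "cell", "switch"}


def disconnected_from(previous_peer_type: Optional[str], connected_now: bool) -> str:
    """Return the type of entity the port was disconnected from, or 'none'.

    previous_peer_type is the peer type recorded at the last collection
    (for example 'Switch', 'Cell', or 'Node'); tracking it this way is an
    assumption made for illustration.
    """
    if connected_now or previous_peer_type is None:
        return "none"
    return previous_peer_type.lower()


def raises_critical(value: str) -> bool:
    """True if the metric value matches the default critical threshold."""
    return value in ALERTING_TYPES


value = disconnected_from("Cell", connected_now=False)
print(value, raises_critical(value))
```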
This metric takes one of three values (Switch/Cell/Compute Node), depending on the type of entity this port is connected to.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
The metrics in this category provide statistics obtained from the perfquery output on the switch. Each metric value is the change (delta) in the corresponding error counter since the last collection, so alerts are raised only if new errors occurred since the last metric collection.
This metric reports the number of "buffer overruns exceeding the threshold" since the last collection (5 minutes earlier).
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% excessive buffer overruns, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of incoming VL15 packets dropped due to a lack of buffers since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% incoming VL15 packets dropped, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric displays the number of link integrity errors, that is, errors on the local link.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% link integrity errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of times the link error recovery process completed successfully since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% link recovers, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of packets not transmitted due to constraints since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% packets not transmitted due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of received packets discarded due to constraints since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets discarded due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of packets marked with the EBP delimiter received on the port.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets marked with the EBP delimiter, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of packets received with errors since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets containing an error, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of symbol errors detected since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% symbol errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the sum of all the errors reported by the preceding metrics in this category.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | 10 | Not Defined | Port %PortNumber% has %value% total errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
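Because each metric in this category is the change since the previous collection, its value is the difference between two readings of a cumulative counter, and the total errors metric is the sum of those differences. The following Python sketch illustrates this under stated assumptions: counters are non-negative integers keyed by hypothetical names, a counter that goes backward (for example, after being reset on the switch) is treated as restarting from zero, and the comparison against the default warning threshold of 10 is assumed to be "greater than". The function names are hypothetical.

```python
from typing import Dict

TOTAL_ERRORS_WARNING = 10  # default warning threshold for the total errors metric


def counter_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return the per-counter change since the previous collection.

    Counters missing from the previous reading are treated as starting at 0.
    If a counter is lower than before (assumed to mean it was reset),
    the current reading itself is taken as the delta.
    """
    deltas = {}
    for name, now in current.items():
        before = previous.get(name, 0)
        deltas[name] = now - before if now >= before else now
    return deltas


def total_errors_alert(deltas: Dict[str, int]) -> bool:
    """True if the summed deltas exceed the warning threshold (operator assumed)."""
    return sum(deltas.values()) > TOTAL_ERRORS_WARNING


# Hypothetical counter readings from two consecutive 5-minute collections.
prev = {"symbol_errors": 100, "link_recovers": 2, "rcv_errors": 7}
curr = {"symbol_errors": 112, "link_recovers": 2, "rcv_errors": 3}
d = counter_deltas(prev, curr)
print(d, sum(d.values()), total_errors_alert(d))
```

A zero delta for every counter produces a total of 0, which is why no alert is raised when no new errors have occurred since the last collection.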
This metric category contains performance metrics at the switch port level.
This metric reports the number of bytes transmitted and received.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of bytes received per second (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of bytes transmitted per second (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
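The per-port KBPS values above can be derived from two readings of a port's cumulative byte counter taken one collection interval apart. In the following Python sketch, the 300-second interval matches the 5-minute collection frequency in the tables; treating 1 KB as 1024 bytes and the function name `kbps` are assumptions, not confirmed by this reference.

```python
COLLECTION_INTERVAL_S = 300  # 5-minute collection frequency
BYTES_PER_KB = 1024          # assumption: KBPS is computed with 1 KB = 1024 bytes


def kbps(bytes_before: int, bytes_after: int,
         interval_s: float = COLLECTION_INTERVAL_S) -> float:
    """Average rate in KB per second between two cumulative byte counter readings."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = bytes_after - bytes_before
    if delta < 0:
        raise ValueError("counter went backward; reading cannot be used")
    return delta / BYTES_PER_KB / interval_s


# Hypothetical readings: 30,720,000 bytes received over one interval.
rx = kbps(1_000_000, 31_720_000)
print(f"received: {rx:.1f} KBPS")
```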
This metric category contains Switch Port state metrics.
This metric displays the active link width of the port based on the cable connectivity.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
This metric reports whether or not the link is degraded. If the active speed of a link is less than the enabled speed, then it is considered to be degraded and this column value is set to 1. It is mainly used for raising alerts.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | 1 | Port %PortNumber% is running in degraded mode. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
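The degraded-link rule above compares the active speed of a link with its enabled speed. The following Python sketch expresses that rule, assuming both speeds are numbers in the same unit and that the alert fires when the metric value equals the default critical threshold of 1; the function and parameter names are hypothetical.

```python
DEGRADED_CRITICAL = 1  # default critical threshold for the degraded-link metric


def link_degraded(active_speed: float, enabled_speed: float) -> int:
    """Return 1 if the link runs below its enabled speed, otherwise 0."""
    return 1 if active_speed < enabled_speed else 0


def degraded_alert(active_speed: float, enabled_speed: float) -> bool:
    """True when the metric value matches the critical threshold (comparison assumed)."""
    return link_degraded(active_speed, enabled_speed) == DEGRADED_CRITICAL


# Hypothetical port speeds in Gb/s: running at 10 while 40 is enabled.
print(link_degraded(10, 40), degraded_alert(10, 40))
print(link_degraded(40, 40), degraded_alert(40, 40))
```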
This metric reports the link state. The link is down if the physical link state is 0.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric category contains Switch Port state metrics (for alerts).
This metric reports that a cable is present but the port is disabled. This metric is collected on an event-driven basis.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Event-driven | Not Defined | 1 | Cable is present on Port %PortNumber% but the port is disabled. |
This metric reports that a cable is present but the port is polling for the peer port. This metric is collected on an event-driven basis.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Event-driven | Not Defined | 1 | Cable is present on Port %PortNumber% but it is polling for peer port. This could happen when the peer port is unplugged/disabled. |
This metric category contains metrics that report the overall state of the switch ports.
This metric reports the total number of active ports.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the total number of degraded ports.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Number of degraded ports is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the number of ports with errors. Starting with the 12.1.0.3 Exadata plug-in, degraded ports are counted in both the degraded ports and the error ports metrics.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Number of ports with errors is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
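Starting with the 12.1.0.3 Exadata plug-in, a degraded port contributes to both the degraded-port count and the error-port count. The following Python sketch illustrates that counting rule, assuming each port's state is summarized by hypothetical `active`, `degraded`, and `has_errors` flags; the `PortState` class and the `summarize` function are illustrative names, not part of the plug-in.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PortState:
    active: bool
    degraded: bool
    has_errors: bool  # hypothetical flag: new errors seen in the last interval


def summarize(ports: Iterable[PortState]) -> Tuple[int, int, int]:
    """Return (active ports, degraded ports, ports with errors).

    A degraded port is also counted as a port with errors, following the
    12.1.0.3 plug-in behavior described above.
    """
    active = degraded = errored = 0
    for p in ports:
        active += int(p.active)
        degraded += int(p.degraded)
        errored += int(p.has_errors or p.degraded)
    return active, degraded, errored


ports = [
    PortState(active=True, degraded=True, has_errors=False),
    PortState(active=True, degraded=False, has_errors=True),
    PortState(active=False, degraded=False, has_errors=False),
]
print(summarize(ports))
```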
This metric category contains metrics that report the switch temperature.
This metric reports the rear chassis temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch back temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the front chassis temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch front temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the I4 chip temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch I4 chip temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric reports the management controller temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch service processor temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Like the other SNMP trap-based metric categories, this category contains metrics that are used only for generating alerts and are not uploaded to the repository.
This metric reports the alarm status. The values Critical, Major, and Warning indicate that the temperature has exceeded the fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager, and the last state is shown as a Warning alert.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | critical\|major\|CRITICAL\|ERROR\|FAILED | The temperature sensor %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
This metric category contains metrics that report the voltage sensor status.
This metric reports the alarm status. The values Critical, Major, and Warning indicate that the voltage has exceeded the fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager, and the last state is shown as a Warning alert.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | critical\|major\|CRITICAL\|ERROR\|FAILED | The voltage sensor %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.