Telemetry Measurements

The library collects various device telemetry data that you can use to monitor library performance.

Telemetry data can be a running total (such as robot and CAP operational data) or show an instantaneous value for a point in time (such as temperature or fan speed). Running totals reset after each library startup.
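
Because running totals start over at each library startup, comparing two samples needs some care. The following is a minimal sketch, not part of the library software. It assumes you have already exported the counter samples as plain integers, and it treats any drop in value as a restart. The function name and sample values are hypothetical.

```python
def counter_delta(previous, current):
    """Return the increase in a running total between two samples.

    Assumption: if the value went down, the library restarted and the
    counter began again from zero, so the whole current value is new.
    """
    if current < previous:
        return current
    return current - previous


# Hypothetical "Gets" samples; the third sample was taken after a restart.
samples = [120, 150, 10, 35]
deltas = [counter_delta(a, b) for a, b in zip(samples, samples[1:])]
print(deltas)
```

A restart that happens between two samples hides any activity recorded after the earlier sample and before the reset, so the result is a lower bound for that interval.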

Robot Measurements

The robot tracks gets and puts of tape cartridges.

  • Gets - running total of gets performed by robot
  • Get Retries - running total of retries performed during gets
  • Get Failures - running total of failures during gets
  • Puts - running total of puts performed by robot
  • Put Retries - running total of retries performed during puts
  • Put Failures - running total of failures during puts

CAP Measurements

The library collects CAP measurements for both rotational and Access Module CAPs.

  • Operations - running total of open and close operations performed by CAP
  • Retries - running total of retries
  • Unrecoverable Errors - running total of unrecoverable errors for the CAP (typically zero or one because an unrecoverable error requires replacement)
  • Reboots - running total of CAP restarts (typically just one at library startup, but this can be higher if you replace the CAP controller card while the library is running)

Library Energy Measurements

Each PDU in the library has a single sensor that collects the power draw and energy consumption for the PDU. Total energy consumption for the library is the sum of the energy usage of all PDUs in the library.

  • Kilowatts - average power draw over measurement period, in kW
  • Kilowatt hours - energy consumption over measurement period, in kWh
  • Duration - measurement period
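
The relationship between the three values can be sketched as follows. This is an illustrative calculation only. It assumes the duration is expressed in hours and that all PDU readings cover the same measurement period. The function name and readings are hypothetical.

```python
def energy_kwh(average_kw, duration_hours):
    """Energy consumed (kWh) = average power draw (kW) x duration (hours)."""
    return average_kw * duration_hours


# Hypothetical readings for two PDUs: (average kW, duration in hours).
pdu_readings = [(1.2, 0.5), (0.8, 0.5)]

# Library totals are the sums across all PDUs.
total_kw = sum(kw for kw, _ in pdu_readings)
total_kwh = sum(energy_kwh(kw, hours) for kw, hours in pdu_readings)
print(total_kw, total_kwh)
```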

Device Power Measurements

Some devices contain "hot swap controller chips" that collect the power draw and energy consumption for the device and any downstream components.

  • Input Voltage - input DC voltage to the device in volts
  • Watts - power draw in watts at the time of measurement
  • Input Amps - input current to the device in amps
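
For a DC input, power in watts equals voltage multiplied by current, so the three readings can be cross-checked against each other. The sketch below is an assumption-based sanity check, not a description of how the controller chip computes its values. The function names and the 5% tolerance are hypothetical.

```python
def dc_power_watts(input_volts, input_amps):
    """DC power (W) = voltage (V) x current (A)."""
    return input_volts * input_amps


def readings_consistent(input_volts, input_amps, reported_watts, tolerance=0.05):
    """Check that reported watts is within a relative tolerance of V x A.

    The 5% default tolerance is an arbitrary illustrative value.
    """
    expected = dc_power_watts(input_volts, input_amps)
    if expected == 0:
        return reported_watts == 0
    return abs(reported_watts - expected) / expected <= tolerance


# Hypothetical reading from a 12V input sensor.
print(dc_power_watts(12.0, 2.5))
print(readings_consistent(12.0, 2.5, 30.6))
```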

Devices monitored in the Base card cage:

  • Library controllers
  • Feature cards
  • Root switches, which each have a 12V input sensor and a 48V power sensor for the web camera
  • Fan assemblies
  • Storage hard drives
  • Video card

Robot-related devices monitored:

  • Rail controller, which has an input sensor to the rail controller and an input sensor to the rail
  • Robot controller, which captures input to the robot controller and robot, including all mechanical mechanisms

Drive-related devices monitored:

  • Drive switches
  • Drive controller, which captures input to the drive tray (including power for the drive controller, fans in the drive tray, tape drive, and encryption card, if present)

Access Module devices monitored:

  • Access Module controllers, which capture the input to the Access Module (including power for the Access Module controller and the mechanical mechanisms in the module)

Other devices monitored:

  • Rotational CAPs (rotary)

Network Statistics Measurements

Ethernet switches and some device controllers in the library collect a running total of network performance data.

Because of their high number of ports, the root and drive switches collect data infrequently, about every 10 to 20 minutes. The rail, robot, drive, and Access Module controllers collect data every few minutes. Each device has multiple sensors, one for each network port on that device that links to another device. The sensor names reflect the destination of the link.

  • Port Speed - speed at which the port is running.
  • Transmit Bytes - bytes transmitted by the port.
  • TX Dropped Packets - transmit packets dropped by the port due to lack of resources or internal MAC sublayer transmit error.
  • TX Collisions - collisions experienced by a port during packet transmissions.
  • TX Pause Events - PAUSE packets transmitted on the port.
  • Receive Bytes - bytes of data received by the port.
  • RX Dropped Packets - packets received by a port that were dropped due to lack of resources. This increments only if the receive error was not counted by the RX Alignment Errors or the RX FCS Errors counters.
  • RX Pause Frames - PAUSE packets received by a port.
  • RX Alignment Errors - packets received by a port that have a bad FCS with a nonintegral number of bytes.
  • RX FCS Errors - packets received by a port that have a bad FCS with an integral number of bytes.
  • RX Symbol Errors - number of times a valid length packet was received at a port and at least one invalid data symbol was detected.

Devices that record network statistics:

  • Access Module controller
  • Drive controller
  • Drive switch
  • Rail controller
  • Robot controller
  • Root switch

Fan Measurements

The library monitors fan speed and performance.

  • Performance - an overall assessment of the fan's health based on comparing the measured fan speed to the expected speed. The actual speed may be higher or lower than the expected speed.
    • GOOD - measured speed is within 15% of the expected speed.
    • MARGINAL - measured speed is between 15% and 20% away from the expected speed.
    • POOR - measured speed is more than 20% away from the expected speed.
    • UNSTABLE - the fan speed cannot be measured accurately.
    • NO_READING - the fan performance cannot be determined at the time of the measurement.
  • Speed - actual fan speed in RPM at the time of the measurement
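
The percentage thresholds above can be sketched as a small classifier. This is an illustration, not the library's actual algorithm. It assumes a deviation of exactly 15% counts as GOOD and exactly 20% counts as MARGINAL, it returns NO_READING when either speed is unavailable, and it does not model UNSTABLE, since the text does not define how that state is detected. The function name is hypothetical.

```python
def classify_fan(measured_rpm, expected_rpm):
    """Rate fan health from the deviation between measured and expected speed."""
    if measured_rpm is None or expected_rpm is None or expected_rpm <= 0:
        return "NO_READING"
    # The measured speed may be above or below the expected speed,
    # so use the absolute deviation as a fraction of the expected speed.
    deviation = abs(measured_rpm - expected_rpm) / expected_rpm
    if deviation <= 0.15:   # boundary handling is an assumption
        return "GOOD"
    if deviation <= 0.20:   # boundary handling is an assumption
        return "MARGINAL"
    return "POOR"


# Hypothetical readings against an expected speed of 10000 RPM.
for rpm in (9000, 11800, 7000, None):
    print(rpm, classify_fan(rpm, 10000))
```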

Devices that record fan measurements:

  • Drive controller (up to three fans, depending on drive type)
  • Fan assembly (four fans)

Temperature Measurements

The library measures the temperature of most controllers in the card cage.

  • Temperature - in degrees Celsius at time of measurement

Devices that record temperature:

  • Library controllers (two sensors)
  • Robot controllers (two sensors)
  • Drive controllers
  • Root switches (two sensors)
  • Drive switches
  • Access Module controllers
  • DC converters (two sensors)