Monitoring the System Using the CLI
This chapter describes the CLI commands you can use to monitor the 5800 system. For information on using the GUI to monitor the system, see Monitoring the 5800 System Using the GUI.
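This chapter contains the following sections:
- Obtaining System Status
- Displaying Performance Statistics
- Viewing the System Software Version
- Obtaining FRU Listings
- Obtaining Disk Status
- Obtaining Voltage, Temperature, and Fan Speed Data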
Obtaining System Status
- Obtain basic system state information with the command sysstat. This command provides an estimate of free space in the system that is available for data storage. For a detailed breakdown of space usage per disk, refer to the df command described in Obtaining Disk Status.
Note - In a multicell configuration, you can specify a cell ID with the -c or --cellid option to see information about a particular cell. If you do not specify a cell ID, information about all cells is displayed. If you use the -v (--verbose) or -i (--interval) option with the sysstat command in a multicell configuration, you must specify the cell ID.
For example:
ST5800 $ sysstat
Cell 23: Online. Estimated Free Space: 14.96 TB
16 nodes online, 64 disks online.
Data VIP 10.7.226.22, Admin VIP 10.7.226.21
Data services Online, Query Engine Status: HAFaultTolerant
Data Integrity check not completed since boot
Data Reliability check not completed since boot
Query Integrity not established
NDMP status: Backup ready.
The output that the sysstat command produces is explained below. Data reported is for all online disks in the entire system.
- Data services Online indicates that the system is available to read and write to via the API, while Data services Offline means that the system is not available to read and write to via the API.
- Query Engine Status reports the state of the query engine, as follows:
HAFaultTolerant - Query services are available and highly fault tolerant.
FaultTolerant - Query services are available, but not as fault tolerant as in the HAFaultTolerant state.
Operational - Query services are available, but not fault tolerant.
Starting - The query engine is starting up. This process may include creating the query database or recreating the connection to the database. Query services are not available during this process.
Unknown - The query engine is in an undetermined state. This may be because it is too early in the startup process to establish a connection to the query engine, or because the query engine is restarting.
Stopped - The query engine is stopped; query services are not available.
Unavailable - The query engine is not returning any status at this time, probably because it is in a transitional state; query services may not be available.
Nonoperational - The query engine is corrupted; no query services will be available until the system has completed recreating the engine.
- Data Integrity check indicates when the system last completed checking each fragment on the system for integrity against bit rot. Each cycle of this testing might take up to one week to complete, so the check will be listed as not complete for the first week after a system reboot.
- Data Reliability check indicates when the system last completed a full cycle of testing to detect and recover any missing fragments, indicating that the system has full reliability. Each cycle of this testing takes approximately 12 hours to complete, so the check will be listed as not complete for the first 12 hours after a system reboot.
- Query Integrity established provides assurance that a query of data stored on the 5800 system will accurately reflect the contents of the object archive. Exceptions would be data that was stored or deleted from the 5800 system while the query was in progress, as well as objects that were stored after the query integrity time and for which the store operation returned the special error status isIndexed=false to the storing application.
- NDMP Status check indicates the status of the Network Data Management Protocol (NDMP), which enables you to back up the data stored on the system to tape and restore that data in the event of catastrophic system loss. This check indicates whether the data has been backed up and is available for restoration, and also whether a backup or restore is in progress.
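The fields described above lend themselves to simple scripted checks. The following Python sketch parses the sample sysstat output shown earlier. The helper name parse_sysstat is hypothetical and not part of the 5800 software, the regular expressions are based only on the single sample in this chapter, and how you capture the command output is left open.

```python
import re

# Sample text copied from the sysstat example in this chapter.
SAMPLE = """\
Cell 23: Online. Estimated Free Space: 14.96 TB
16 nodes online, 64 disks online.
Data VIP 10.7.226.22, Admin VIP 10.7.226.21
Data services Online, Query Engine Status: HAFaultTolerant
Data Integrity check not completed since boot
Data Reliability check not completed since boot
Query Integrity not established
NDMP status: Backup ready.
"""

# States in which the chapter says query services are available.
QUERY_AVAILABLE = {"HAFaultTolerant", "FaultTolerant", "Operational"}


def parse_sysstat(text):
    """Extract a few fields from basic sysstat output (hypothetical helper)."""
    info = {}
    m = re.search(r"Cell (\d+): (\w+)\. Estimated Free Space: ([\d.]+) (\w+)", text)
    if m:
        info["cell"] = int(m.group(1))
        info["cell_state"] = m.group(2)
        info["free_space"] = (float(m.group(3)), m.group(4))
    m = re.search(r"(\d+) nodes online, (\d+) disks online", text)
    if m:
        info["nodes_online"] = int(m.group(1))
        info["disks_online"] = int(m.group(2))
    m = re.search(r"Data services (\w+), Query Engine Status: (\w+)", text)
    if m:
        info["data_services"] = m.group(1)
        info["query_engine"] = m.group(2)
        info["query_available"] = m.group(2) in QUERY_AVAILABLE
    return info


summary = parse_sysstat(SAMPLE)
print(summary)
```

Keeping the set of "available" states in one place makes it easy to adjust if your monitoring policy treats the Operational state (available but not fault tolerant) as a warning.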
- Obtain extended system state with the command sysstat -v or sysstat --verbose. Use the -i or --interval option to indicate the number of seconds at which to repeat the statistics listing. (If you use the -v, --verbose, -i, or --interval options in a multicell system, you must use -c or --cellid to specify the cell ID.)
Verbose output includes the online/offline status of each node and disk in the system. The online/offline status reported by the command refers to the logical system status. To see the state of hardware components, refer to the hwstat command described in Obtaining FRU Listings.
For example:
ST5800 $ sysstat --verbose
NODE-101 [ONLINE]
DISK-101:0 [ONLINE]
DISK-101:1 [OFFLINE]
DISK-101:2 [ONLINE]
DISK-101:3 [ONLINE]
NODE-102 [ONLINE]
DISK-102:0 [ONLINE]
DISK-102:1 [ONLINE]
DISK-102:2 [ONLINE]
DISK-102:3 [ONLINE]
NODE-103 [ONLINE]
DISK-103:0 [ONLINE]
Note - If a disk is listed as offline, the disk should be replaced.
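Because the verbose listing has one component per line, finding offline nodes or disks can be scripted. The sketch below is an illustration only: the function name offline_components is hypothetical, and the line pattern is inferred from the sample output above rather than from a documented format.

```python
import re

# Sample text copied from the sysstat --verbose example in this chapter.
SAMPLE = """\
NODE-101 [ONLINE]
DISK-101:0 [ONLINE]
DISK-101:1 [OFFLINE]
DISK-101:2 [ONLINE]
DISK-101:3 [ONLINE]
NODE-102 [ONLINE]
DISK-102:0 [ONLINE]
"""

# One component per line: NODE-nnn or DISK-nnn:d, then the status in brackets.
COMPONENT = re.compile(r"^[ \t]*((?:NODE|DISK)-[\d:]+)[ \t]+\[(\w+)\][ \t]*$", re.MULTILINE)


def offline_components(text):
    """Return the names of all components whose status is not ONLINE."""
    return [name for name, status in COMPONENT.findall(text) if status != "ONLINE"]


offline = offline_components(SAMPLE)
print(offline)
```

Remember that this status is the logical system status; the hwstat command reports the state of the hardware components.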
Displaying Performance Statistics
- Display real-time performance metrics about throughput and operations using the command perfstats.
The metrics displayed reflect activity during a specified time interval (default is 15 seconds). There is a delay between the time an action occurs on the system and the time it is displayed by the perfstats command.
For example:
ST5800 $ perfstats
Cell Performance Statistics:
                              Avg           Avg
                # Ops      Op/sec        KB/sec
             --------  ----------  ------------
Add MD:             0        0.00          0.00
Store:              0        0.00          0.00
Retrieve:           1        0.20          0.15
Retrieve MD:        0        0.00          0.00
Delete:             0        0.00             -
Query:            687       22.90             -
WebDAV Put:         0        0.00          0.00
WebDAV Get:         0        0.00          0.00
Hive Performance Statistics:
Load 1m: 4.12 Load 5m: 4.21 Load 15m: 4.43
Disk Used: 241.28 GB Disk Total: 13.38 TB Usage: 1.8%
- Display performance statistics repeatedly over a specified period using the commands perfstats --howlong minutes and perfstats --interval seconds. The --howlong option sets how long the command runs, and the --interval option sets the length of each reporting interval.
Note - To specify that the perfstats command should run indefinitely, use --howlong 0.
- Display performance statistics for a single node in the system using the command perfstats --node node_id.
For example:
ST5800 $ perfstats --node NODE-101
NODE-101 Performance Statistics:
                              Avg           Avg
                # Ops      Op/sec        KB/sec
             --------  ----------  ------------
Add MD:             0        0.00          0.00
Store:              0        0.00          0.00
Retrieve:           1        0.20          0.15
Retrieve MD:        0        0.00          0.00
Delete:             0        0.00             -
Query:            687       22.90             -
WebDAV Put:         0        0.00          0.00
WebDAV Get:         0        0.00          0.00
Hive Performance Statistics:
Load 1m: 4.12 Load 5m: 4.21 Load 15m: 4.43
Disk Used: 241.28 GB Disk Total: 13.38 TB Usage: 1.8%
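To feed these statistics into another tool, the operation rows can be turned into a dictionary. The following is a minimal sketch, assuming the row layout shown in the examples above (operation name, operation count, operations per second, and KB per second or a dash). The function name parse_perfstats is hypothetical.

```python
import re

# Sample text copied from the perfstats example in this chapter.
SAMPLE = """\
Cell Performance Statistics:
Avg Avg
# Ops Op/sec KB/sec
-------- ---------- ------------
Add MD: 0 0.00 0.00
Store: 0 0.00 0.00
Retrieve: 1 0.20 0.15
Retrieve MD: 0 0.00 0.00
Delete: 0 0.00 -
Query: 687 22.90 -
WebDAV Put: 0 0.00 0.00
WebDAV Get: 0 0.00 0.00
Hive Performance Statistics:
Load 1m: 4.12 Load 5m: 4.21 Load 15m: 4.43
Disk Used: 241.28 GB Disk Total: 13.38 TB Usage: 1.8%
"""

# "<operation>: <count> <ops/sec> <KB/sec or ->"; load and disk lines do not match.
ROW = re.compile(r"^([^:]+):\s+(\d+)\s+([\d.]+)\s+([\d.]+|-)$")


def parse_perfstats(text):
    """Map each operation name to its count and per-second averages."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, ops, ops_sec, kb_sec = m.groups()
            stats[name] = {
                "ops": int(ops),
                "ops_per_sec": float(ops_sec),
                # A dash means no KB/sec figure is reported for this operation.
                "kb_per_sec": None if kb_sec == "-" else float(kb_sec),
            }
    return stats


stats = parse_perfstats(SAMPLE)
for name, values in stats.items():
    print(name, values)
```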
Viewing the System Software Version
- Display the version of the system software using the command version.
For example:
ST5800 $ version
ST5800 1.1 release [1.1-11076]
- Display version information for each node, the service node, and the switches using the command version --verbose.
Note - In normal operation, all nodes should be running the same version of the Service Management Daughter Card (SMDC) and the same version of the Basic Input/Output System (BIOS).
For example:
ST5800 $ version --verbose
ST5800 1.1 release [1.1-11076]
Service Node:
BIOS Version: 1.1.3
SMDC Version: 4.13
Switch:
Overlay Version (sw#1): 11068
Overlay Version (sw#2): 11068
NODE-101:
BIOS version: 0.1.8
SMDC version: 4.18
NODE-102:
BIOS version: 0.1.8
SMDC version: 4.18
NODE-103:
BIOS version: 0.1.8
SMDC version: 4.18
NODE-104:
BIOS version: 0.1.8
SMDC version: 4.18
.
.
.
ST5800 $
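The expectation that all nodes run the same BIOS and SMDC versions can be checked mechanically. The sketch below works on the sample version --verbose output; the helper names node_versions and mismatches are hypothetical, and the section layout (a "NODE-nnn:" header followed by version lines) is inferred from the example only.

```python
import re

# Sample text copied from the version --verbose example in this chapter.
SAMPLE = """\
ST5800 1.1 release [1.1-11076]
Service Node:
BIOS Version: 1.1.3
SMDC Version: 4.13
Switch:
Overlay Version (sw#1): 11068
Overlay Version (sw#2): 11068
NODE-101:
BIOS version: 0.1.8
SMDC version: 4.18
NODE-102:
BIOS version: 0.1.8
SMDC version: 4.18
NODE-103:
BIOS version: 0.1.8
SMDC version: 4.18
"""


def node_versions(text):
    """Collect BIOS and SMDC versions for each NODE-nnn section."""
    nodes = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(.+):", line)
        if header:
            # Only storage node sections are tracked; other sections reset the state.
            name = header.group(1)
            current = name if name.startswith("NODE-") else None
            if current:
                nodes[current] = {}
            continue
        field = re.fullmatch(r"(BIOS|SMDC) version:\s*(\S+)", line, re.IGNORECASE)
        if field and current:
            nodes[current][field.group(1).upper()] = field.group(2)
    return nodes


def mismatches(nodes):
    """Return the set of versions per component when nodes disagree."""
    result = {}
    for key in ("BIOS", "SMDC"):
        versions = {v.get(key) for v in nodes.values()}
        if len(versions) > 1:
            result[key] = versions
    return result


nodes = node_versions(SAMPLE)
print(nodes)
print(mismatches(nodes))
```

The service node is deliberately excluded from the comparison, since the sample shows it reporting different BIOS and SMDC versions from the storage nodes.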
Obtaining FRU Listings
- Obtain a list of field-replaceable units (FRUs) with the command hwstat --cellid cellid.
For example:
ST5800 $ hwstat --cellid 8
Component    Type   FRU ID                                         Status
------------ ------ ---------------------------------------------- --------
NODE-101     NODE   cd904c73-d8ca-d311-0080-c88c5581e000           ONLINE
DISK-101:0   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHVTZD  ENABLED
DISK-101:1   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHVTVD  ENABLED
DISK-101:2   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHVZBD  ENABLED
DISK-101:3   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHWPYD  ENABLED
NODE-102     NODE   e3904c73-d8ca-d311-0080-558c5581e000           ONLINE
DISK-102:0   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHVKWD  ENABLED
DISK-102:1   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGG68AD  ENABLED
DISK-102:2   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHYPXD  ENABLED
DISK-102:3   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHWS0D  ENABLED
DISK-108:0   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHEE3D  ENABLED
DISK-108:1   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHEHAD  ENABLED
DISK-108:2   DISK   SATA_____HITACHI_HDS7250S______KRVN63ZAGHJ6BD  ENABLED
.
.
.
SWITCH-1     SWITCH 00:11:95:a2:25:00                              ACTIVE
SWITCH-2     SWITCH 00:11:95:a2:30:00                              STANDBY
SN           SN     ec29694a-58c5-d311-0080-826c5c81e000           ONLINE
ST5800 $
- Obtain information about a specific FRU using the command hwstat --FRUID fruid or hwstat -f fruid.
For example:
ST5800 $ hwstat --FRUID NODE-107
Component    Type   FRU ID                                       Status
------------ ------ -------------------------------------------- --------
NODE-107     NODE   72cda8b6-aec3-d311-0080-2a835981e000         ONLINE
DISK-107:0   DISK   ATA_____HITACHI_HDS7250S______KRVN63ZAGLX7GD ENABLED
DISK-107:1   DISK   ATA_____HITACHI_HDS7250S______KRVN63ZAGLY5PD ENABLED
DISK-107:2   DISK   ATA_____HITACHI_HDS7250S______KRVN63ZAGGY8VD ENABLED
DISK-107:3   DISK   ATA_____HITACHI_HDS7250S______KRVN63ZAGLXA7D ENABLED
ST5800 $ hwstat -f SWITCH-1
Component: SWITCH-1 Type: SWITCH Status: [ACTIVE]
FRU ID: 00:11:95:a2:25:00
ST5800 $ hwstat -f SN
Component: SN Type: SN Status: [ONLINE]
FRU ID: c0904c73-d8ca-d311-0080-6d285981e000
ST5800 $
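The tabular hwstat listing can be converted into records for inventory or reporting. This is a minimal sketch, assuming the four-column layout shown in the examples above and FRU IDs that contain no whitespace. The function name parse_hwstat is hypothetical, and the set of component types is limited to those that appear in the sample.

```python
from collections import Counter

# Rows copied from the hwstat --cellid example in this chapter (abbreviated).
SAMPLE = """\
Component Type FRU ID Status
------------ ------ ---------------------------------------------- --------
NODE-101 NODE cd904c73-d8ca-d311-0080-c88c5581e000 ONLINE
DISK-101:0 DISK SATA_____HITACHI_HDS7250S______KRVN63ZAGHVTZD ENABLED
DISK-101:1 DISK SATA_____HITACHI_HDS7250S______KRVN63ZAGHVTVD ENABLED
SWITCH-1 SWITCH 00:11:95:a2:25:00 ACTIVE
SWITCH-2 SWITCH 00:11:95:a2:30:00 STANDBY
SN SN ec29694a-58c5-d311-0080-826c5c81e000 ONLINE
"""

# Component types seen in the sample listing; other types are ignored here.
FRU_TYPES = {"NODE", "DISK", "SWITCH", "SN"}


def parse_hwstat(text):
    """Return one dict per FRU row, skipping the header and separator lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[1] in FRU_TYPES:
            component, fru_type, fru_id, status = fields
            rows.append(
                {"component": component, "type": fru_type, "fru_id": fru_id, "status": status}
            )
    return rows


rows = parse_hwstat(SAMPLE)
counts = Counter((row["type"], row["status"]) for row in rows)
for (fru_type, status), n in sorted(counts.items()):
    print(f"{fru_type:<6} {status:<8} {n}")
```

Note that the single-FRU output for switches and the service node (hwstat -f SWITCH-1) uses a different, labeled layout, which this sketch does not parse.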
Obtaining Disk Status
Use the df command to display a summary of disk usage. In a multicell configuration, you can specify a cell ID with the -c or --cellid option to see information about a particular cell. If you do not specify a cell ID, information about all cells is displayed.
Note - In a multicell configuration, the df -p or df --physical option, which displays the physical free space on all disks, requires a cell ID.
Note the following information about the utilization numbers displayed:
- The used value in the display is not equivalent to the total number of object bytes stored in the system. The used value includes space consumed by data parity, object headers and footers, and query indexes.
- Storage utilization statistics displayed by df are refreshed every three minutes.
- When you view storage utilization with df, keep in mind that the system reserves 15% of raw storage space to allow for data recovery on a full system.
- Obtain a summary of disk usage in an easily readable format with the command df --human-readable or df -h.
The displayed numbers refer to the logical space used by, or available for, the user’s data storage. Reserved space is set aside by the system for data recovery operations and is not available to the user. Total space is the sum of Available + Used + Reserved. The usage percentage is calculated as Used / (Used + Available).
For example:
ST5800 $ df -h
Contacting all cells, please wait.
All Cells:
Total: 52.18 TB; Avail: 51.33 TB; Used: 864.16 GB; Usage: 1.6%
Cell 22:
Total: 26.71 TB; Avail: 26.29 TB; Used: 438.70 GB; Usage: 1.6%
Cell 23:
Total: 25.46 TB; Avail: 25.05 TB; Used: 425.46 GB; Usage: 1.6%
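As a worked check of the formula Used / (Used + Available), the following Python sketch recomputes the usage percentages from the df -h example. It assumes that 1 TB equals 1024 GB in this output; the guide does not state the convention, so treat that constant as an assumption. The function name usage_percent is hypothetical.

```python
# Assumption: df -h uses binary units, so 1 TB = 1024 GB.
GB_PER_TB = 1024


def usage_percent(used_tb, avail_tb):
    """Usage as described in the text: Used / (Used + Available), in percent."""
    return 100.0 * used_tb / (used_tb + avail_tb)


# (Avail in TB, Used in GB) copied from the df -h example.
cells = {
    "All Cells": (51.33, 864.16),
    "Cell 22": (26.29, 438.70),
    "Cell 23": (25.05, 425.46),
}

results = {}
for name, (avail_tb, used_gb) in cells.items():
    results[name] = round(usage_percent(used_gb / GB_PER_TB, avail_tb), 1)
    print(f"{name}: {results[name]}%")
```

Under that assumption, each computed value rounds to the 1.6% shown in the example output.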
- Obtain information on the physical space available on the disk with the command df -p or df --physical.
Note - The system can no longer accept objects for storage when any disk in the system reaches 80% capacity.
For example:
ST5800 $ df --physical
All sizes expressed in 1K blocks
DISK-101:0: Total: 449128448; Avail: 434057216; Used: 15071232; Usage: 3.4%
DISK-101:1: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
DISK-101:2: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
DISK-101:3: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
DISK-102:0: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
.
.
.
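Since the system stops accepting objects when any disk reaches 80% capacity, it can be useful to watch per-disk usage. The sketch below parses the df --physical sample and lists disks at or above a chosen threshold. The helper names parse_df_physical and disks_at_or_above are hypothetical, and the line format is inferred from the example above.

```python
import re

# Sample text copied from the df --physical example in this chapter.
SAMPLE = """\
All sizes expressed in 1K blocks
DISK-101:0: Total: 449128448; Avail: 434057216; Used: 15071232; Usage: 3.4%
DISK-101:1: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
DISK-101:2: Total: 449128448; Avail: 444561408; Used: 4567040; Usage: 1.0%
"""

LINE = re.compile(
    r"^(DISK-\d+:\d+): Total: (\d+); Avail: (\d+); Used: (\d+); Usage: ([\d.]+)%"
)

# Threshold taken from the note above: no new objects once any disk reaches 80%.
FULL_THRESHOLD = 80.0


def parse_df_physical(text):
    """Map each disk name to its sizes (in 1K blocks) and reported usage."""
    disks = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            disks[m.group(1)] = {
                "total": int(m.group(2)),
                "avail": int(m.group(3)),
                "used": int(m.group(4)),
                "usage": float(m.group(5)),
            }
    return disks


def disks_at_or_above(disks, threshold=FULL_THRESHOLD):
    """Return the names of disks whose reported usage meets the threshold."""
    return sorted(name for name, d in disks.items() if d["usage"] >= threshold)


disks = parse_df_physical(SAMPLE)
print(disks_at_or_above(disks))
print(max(d["usage"] for d in disks.values()))
```

In practice you would likely alert at a lower threshold than 80% so that there is time to act before stores are refused.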
Obtaining Voltage, Temperature, and Fan Speed Data
Use the command sensors to display voltage, temperature, and fan speed data, as collected by system sensors.
For example:
ST5800 $ sensors
NODE-101:
DDR Voltage 2.60 Volts
CPU Voltage 1.42 Volts
VCC 3.3V 3.32 Volts
VCC 5V 5.12 Volts
VCC 12V 12.03 Volts
Battery Voltage 2.98 Volts
CPU Temperature 49 degrees C
System Temperature 32 degrees C
System Fan 1 speed 11340 RPM
System Fan 2 speed 11340 RPM
System Fan 3 speed 11070 RPM
System Fan 4 speed 10980 RPM
System Fan 5 speed 11070 RPM
NODE-102:
DDR Voltage 2.60 Volts
CPU Voltage 1.43 Volts
VCC 3.3V 3.32 Volts
VCC 5V 5.10 Volts
VCC 12V 12.10 Volts
Battery Voltage 2.98 Volts
CPU Temperature 49 degrees C
System Temperature 33 degrees C
System Fan 1 speed 11700 RPM
System Fan 2 speed 11430 RPM
.
.
.
ST5800 $
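The sensor listing can be parsed into per-node readings for trending or alerting. The following is a minimal sketch based on the sample output above; the function name parse_sensors is hypothetical, only the three units seen in the sample are recognized, and no alert thresholds are implied because this chapter does not specify any.

```python
import re

# Sample text copied from the sensors example in this chapter (abbreviated).
SAMPLE = """\
NODE-101:
DDR Voltage 2.60 Volts
CPU Voltage 1.42 Volts
VCC 3.3V 3.32 Volts
VCC 12V 12.03 Volts
CPU Temperature 49 degrees C
System Temperature 32 degrees C
System Fan 1 speed 11340 RPM
NODE-102:
DDR Voltage 2.60 Volts
CPU Temperature 49 degrees C
System Temperature 33 degrees C
System Fan 1 speed 11700 RPM
"""

NODE = re.compile(r"^(NODE-\d+):$")
# "<sensor name> <value> <unit>", where the unit is one of those seen in the sample.
READING = re.compile(r"^(.+?)\s+([\d.]+)\s+(Volts|degrees C|RPM)$")


def parse_sensors(text):
    """Map node name -> sensor name -> (value, unit)."""
    readings = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        node = NODE.match(line)
        if node:
            current = node.group(1)
            readings[current] = {}
            continue
        reading = READING.match(line)
        if reading and current:
            name, value, unit = reading.groups()
            readings[current][name] = (float(value), unit)
    return readings


readings = parse_sensors(SAMPLE)
for node, sensors in readings.items():
    print(node, sensors["System Temperature"])
```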
Sun StorageTek 5800 System Administration Guide
820-4118-10
Copyright © 2008, Sun Microsystems, Inc. All Rights Reserved.