ping
ping [-json] [-shard <shardId>]
The ping and verify commands return information about the runtime entities of a data store. The command accesses components and Admin services available from the topology, returning information about the state of various components.
- -json
  Displays output in JSON format.
- -shard <shardId>
  Displays a subset of status information about the specific shard ID you supply.
kv-> ping
Pinging components of store mystore based upon topology sequence #308
300 partitions and 3 storage nodes
Time: 2019-01-03 20:19:27 UTC Version: 19.1.0
Shard Status: healthy:1 writable-degraded:0 read-only:0 offline:0 total:1 Admin Status: healthy
Zone [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:3 read-only:0 offline:0
maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on localhost:13230
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 19.1.0 2019-01-03 08:17:52 UTC Build id: 12641466031c Edition: Enterprise
Admin [admin1] Status: RUNNING,MASTER
Rep Node [rg1-rn1] Status: RUNNING,MASTER sequenceNumber:633 haPort:13233
available storage size:109 GB
Storage Node [sn2] on localhost:13240
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 19.1.0 2019-01-03 08:17:52 UTC Build id: 12641466031c Edition: Enterprise
Admin [admin2] Status: RUNNING,REPLICA
Rep Node [rg1-rn2] Status: RUNNING,REPLICA sequenceNumber:633 haPort:13243
available storage size:109 GB delayMillis:0 catchupTimeSecs:0
Storage Node [sn3] on localhost:13250
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 19.1.0 2019-01-03 08:17:52 UTC Build id: 12641466031c Edition: Enterprise
Admin [admin3] Status: RUNNING,REPLICA
Rep Node [rg1-rn3] Status: RUNNING,REPLICA sequenceNumber:633 haPort:13253
available storage size:109 GB delayMillis:0 catchupTimeSecs:0
After running a ping command, you should understand what is most useful (or troubling) about the system health. The most important content is the Shard Status entry. The following ping output details indicate one shard (total:1) that is healthy (healthy:1). All of the status types you'd prefer not to see (writable-degraded, read-only, and offline) are zero (0), indicating that no shard is in any of those states. Everything is good.
Shard Status: healthy:1
writable-degraded:0
read-only:0
offline:0
total:1
What exactly does a healthy shard indicate? A healthy shard is one with all of its RNs running. Thus, if all shards in the topology are healthy, then all RNs are running, and no failures exist. Why are RNs so important? Because they are the components that perform read and write data operations.
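The check described above can be automated. The following is a minimal sketch, assuming the Shard Status line has already been captured as text from ping output; the function names `parse_shard_status` and `all_shards_healthy` are illustrative and are not part of the product.

```python
import re


def parse_shard_status(line):
    """Return the counters from a 'Shard Status:' line as a dict of ints."""
    # Each counter looks like 'healthy:1' or 'writable-degraded:0'.
    pairs = re.findall(r"([a-z-]+):(\d+)", line)
    return {name: int(value) for name, value in pairs}


def all_shards_healthy(counts):
    """True when at least one shard exists and every shard is healthy."""
    return counts["total"] > 0 and counts["healthy"] == counts["total"]


line = "Shard Status: healthy:1 writable-degraded:0 read-only:0 offline:0 total:1"
counts = parse_shard_status(line)
print(counts)
print(all_shards_healthy(counts))
```

Comparing healthy against total avoids having to test each of the unwanted states separately.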
Checking the Admin nodes status is also useful. In this simple example, only one Admin shard exists, so there is a single result: Admin Status: healthy. Other possible states are: writable-degraded, read-only, or offline.
Result | Meaning |
---|---|
healthy | All nodes are running, and the system is fully operational. |
writable-degraded | A majority of the nodes are running. All operations are supported, but a minority of the nodes are offline or don't support writes. If you are using RF=3, this state is one step closer to being unable to support all operations. For example, with one node offline, losing another node means quorum will be lost, and the shard becomes read-only. Most people use RF=3, so this is typically what writable-degraded means. |
read-only | Only a minority of the nodes are running. Read operations are supported, but write operations are not. |
offline | No nodes are running, so no operations are supported. |
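The table's rules can be expressed as a small function. This is a simplified sketch, not the product's implementation: it assumes that only the count of running nodes matters, and it ignores the case of nodes that are running but don't support writes. The name `shard_state` is hypothetical.

```python
def shard_state(running, total):
    """Classify a shard from the number of running nodes out of `total`."""
    if total <= 0 or not 0 <= running <= total:
        raise ValueError("expected 0 <= running <= total and total > 0")
    if running == 0:
        return "offline"            # no nodes are running
    if running == total:
        return "healthy"            # all nodes are running
    if 2 * running > total:
        return "writable-degraded"  # a strict majority is still running
    return "read-only"              # quorum is lost, reads only


# With RF=3, each lost node moves the shard one state down.
for running in (3, 2, 1, 0):
    print(running, shard_state(running, 3))
```

With RF=3 the four states correspond to 3, 2, 1, and 0 running nodes, which matches the quorum example in the table.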
About Zone Status
The next information from ping is about zones:
Zone [name=1 id=zn1
type=PRIMARY
allowArbiters=false
masterAffinity=false]
RN Status: online:3 read-only:0 offline:0
maxDelayMillis:0
maxCatchupTimeSecs:0
For stores with multiple zones, this information provides the status of nodes in different locations. For example, if a store was deployed using three zones, with the machines for each zone in a separate building, this information gives a quick summary status for machines in each building. In this simple example, there is only one zone, so that status information is similar to that for the entire store. The maxDelayMillis and maxCatchupTimeSecs entries provide information about data replication to replicas located in the zone. In our example, both values are zero (0). However, having large numbers for these entries could suggest that there are hardware problems with the machines in the zone, or problems with the network that connects that zone to other zones. Such information would be used only for more detailed debugging.
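As an illustration of watching these two entries, the sketch below extracts them from a zone's RN Status line and flags values above a limit. The limits (1000 ms and 60 seconds) are arbitrary placeholders chosen for the example, not recommended values, and the function names are hypothetical.

```python
import re


def parse_zone_lag(text):
    """Extract maxDelayMillis and maxCatchupTimeSecs; None if missing or unknown."""
    lag = {}
    for key in ("maxDelayMillis", "maxCatchupTimeSecs"):
        match = re.search(key + r":(\S+)", text)
        # Strip thousands separators; non-numeric values such as '?' become None.
        value = match.group(1).replace(",", "") if match else ""
        lag[key] = int(value) if value.isdigit() else None
    return lag


def lag_warnings(lag, delay_limit_ms=1000, catchup_limit_secs=60):
    """Return the names of entries that exceed the (assumed) limits."""
    limits = {"maxDelayMillis": delay_limit_ms,
              "maxCatchupTimeSecs": catchup_limit_secs}
    return [key for key, limit in limits.items()
            if lag[key] is not None and lag[key] > limit]


zone_line = "RN Status: online:3 read-only:0 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0"
lag = parse_zone_lag(zone_line)
print(lag, lag_warnings(lag))
```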
About Storage Nodes
Storage Node [sn1] on localhost:13230
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 19.1.0 2019-01-03 08:17:52 UTC Build id: 12641466031c Edition: Enterprise
Admin [admin1] Status: RUNNING,MASTER
Rep Node [rg1-rn1] Status: RUNNING,MASTER sequenceNumber:633 haPort:13233 available storage size:109 GB
The Status: entry for the SN can have several possible values:
Status | Description |
---|---|
STARTING | The storage node is starting up. |
WAITING_FOR_DEPLOY | The storage node is running but is waiting to be deployed in a new store. |
RUNNING | The storage node is running; this is the usual state. |
STOPPING | The storage node is in the process of stopping, but is not yet in a STOPPED status. |
STOPPED | The storage node is stopped. |
UNREACHABLE | The storage node is not reachable, either because the SN service is down, the host machine is offline, or the machine is not reachable over the network. |
About RNs and Admins on the Storage Node
Admin [admin1] Status: RUNNING,MASTER
Rep Node [rg1-rn1] Status: RUNNING,MASTER sequenceNumber:633 haPort:13233 available storage size:109 GB
The Status: entry for both Admin nodes and RNs can have the following values:
Status | Description |
---|---|
STARTING | The node is starting up. |
RUNNING,MASTER | The node is up and is the master. The master is in contact with a majority of nodes in the shard, and can perform writes requiring acknowledgment. This is the first of two normal states. |
RUNNING,REPLICA | The node is up and is a replica. This is the second of two normal states. |
RUNNING,MASTER (non-authoritative) | The node is up and is the master, but is not in contact with a majority of nodes in the shard. A non-authoritative master can perform only writes that do not require acknowledgment. |
STOPPING | The node is stopping. |
UNREACHABLE | The node could not be contacted over the network. The node is either stopped, failed, or there is a problem with the network connection to the machine. |
Additional status values that can be appended to the status line to provide more information: | |
readonly requests enabled | The node is running in read-only mode because the plan enable-requests command was run to set the node into read-only user operations mode. |
requests disabled | The node is running with all user operation requests disabled, because the plan enable-requests command was run to disable all requests on the node. The plan enable-requests command disables requests on a per-shard basis, so it will prevent writes or all operations on all data in the shard. |
The ping and verify commands can display one of the following states for RNs and shards. The table describes the effects and outcome of each state:
Displayed State | Effects | Outcome |
---|---|---|
Unknown | Masters go down. | Represents the read-only state of the RNs and shards still running. Currently, we do not support read-only status for any RN. |
Non-Authoritative Master | Replica nodes go down. | After Replica nodes are down, remaining RNs and shards are in read-only mode. Currently, we do not support read-only status for any RN. |
Out of disk space | Masters and replica nodes go down. Replicas are left in the RUNNING,UNKNOWN state, and the masters are in the Non-Authoritative state. | When masters and replica nodes go down, any remaining RNs and shards are in read-only mode. Currently, we do not support read-only status for any RN. |
Write requests disabled | RNs and shard health are in read-only enabled request state. | RNs and shards are unable to accept any user requests, and are marked offline. |
The ping and verify commands detect these states. The following output of a ping command on a shard (rg2) in a normal state shows how results are returned:
kv-> ping -shard rg2
Pinging components of store mystore based upon topology sequence #2376
shard rg2 500 partitions and 3 storage nodes
Time: 2018-09-28 07:06:46 UTC Version: 18.3.2
Shard Status: healthy: Admin Status: healthy
Zone [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:3 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn10] on nodeA:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 18.3.2 2018-09-17 09:33:45 UTC Build id: a72484b8b33c Edition: Enterprise
Rep Node [rg2-rn1] Status: RUNNING,MASTER sequenceNumber:71,166 haPort:5010
available storage size:8 GB
Storage Node [sn11] on nodeB:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 18.3.2 2018-09-17 09:33:45 UTC Build id: a72484b8b33c Edition: Enterprise
Rep Node [rg2-rn2] Status: RUNNING,REPLICA sequenceNumber:71,166 haPort:5011
available storage size:4 GB delayMillis:0 catchupTimeSecs:0
Storage Node [sn12] on nodeC:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING Ver: 18.3.2 2018-09-17 09:33:45 UTC Build id: a72484b8b33c Edition: Enterprise
Rep Node [rg2-rn3] Status: RUNNING,REPLICA sequenceNumber:71,166 haPort:5012
available storage size:6 GB delayMillis:0 catchupTimeSecs:0
- Shard status becomes writable-degraded or read-only:
kv-> ping
Pinging components of store concurrent plan store based upon topology sequence #1082
1000 partitions and 9 storage nodes
Time: 2018-11-06 05:12:36 UTC Version: 18.3.8
Shard Status: healthy:2 writable-degraded:12 read-only:4 offline:0 total:18
Admin Status: healthy
Zone [name=dc1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:30 read-only:24 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on slcao397:5000
Zone: [name=dc1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] Status: RUNNING
Ver: 18.3.8 2018-10-26 11:36:43 UTC Build id: 6259xxxxxxxx Edition: Enterprise
- RNs can have the RUNNING,UNKNOWN state for more than one reason, including reaching a disk limit, or when the RN is down:
Storage Node [sn4] on slcao400:5000 Zone: [name=dc1 id=zn1 type=PRIMARY
allowArbiters=false masterAffinity=false] Status: RUNNING Ver: 18.3.8 2018-10-26 11:36:43 UTC
Build id: 6259xxxxxxxx Edition: Enterprise
Rep Node [rg7-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,717,825 haPort:5020
available storage size:-3 GB delayMillis:? catchupTimeSecs:?
Rep Node [rg8-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,555,937 haPort:5021
available storage size:-3 GB delayMillis:? catchupTimeSecs:?
Rep Node [rg9-rn1] Status: RUNNING,MASTER sequenceNumber:173,697,007 haPort:5022 available storage size:-3 GB
Rep Node [rg10-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,293,747 haPort:5023
available storage size:-3 GB delayMillis:? catchupTimeSecs:?
Rep Node [rg11-rn1] Status: RUNNING,UNKNOWN sequenceNumber:170,561,758 haPort:5024 available storage size:-3 GB
delayMillis:? catchupTimeSecs:?
Rep Node [rg12-rn1] Status: RUNNING,MASTER sequenceNumber:170,410,483 haPort:5025 available storage size:-3 GB
- An out-of-disk-space error results in the master becoming non-authoritative:
Storage Node [sn6] on slcao402:5000 Zone: [name=dc1 id=zn1 type=PRIMARY allowArbiters=false
masterAffinity=false] Status: RUNNING Ver: 18.3.8 2018-10-26 11:36:43 UTC Build id: 6259xxxxxxxx
Edition: Enterprise
Rep Node [rg7-rn3] Status: RUNNING,MASTER (non-authoritative)
sequenceNumber:173,754,579 haPort:5020 available storage size:45 GB
Rep Node [rg8-rn3] Status: RUNNING,REPLICA sequenceNumber:173,555,937
haPort:5021 available storage size:46 GB
delayMillis:0 catchupTimeSecs:0
Rep Node [rg9-rn3] Status: RUNNING,REPLICA
sequenceNumber:173,697,007 haPort:5022 available storage size:45 GB
delayMillis:0 catchupTimeSecs:0
Rep Node [rg10-rn3] Status: RUNNING,MASTER (non-authoritative)
sequenceNumber:173,293,747 haPort:5023 available storage size:45 GB
Rep Node [rg11-rn3] Status: RUNNING,REPLICA sequenceNumber:170,561,758
haPort:5024 available storage size:45 GB delayMillis:0 catchupTimeSecs:0
Rep Node [rg12-rn3] Status: RUNNING,REPLICA sequenceNumber:170,410,483 haPort:5025
available storage size:46 GB delayMillis:0 catchupTimeSecs:0
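To pick out such states from text output, the Rep Node lines can be scanned for any status other than the two normal states, RUNNING,MASTER and RUNNING,REPLICA. The sketch below assumes the ping output has been captured as a string; the regular expression is based only on the sample output shown on this page, and `abnormal_rep_nodes` is a hypothetical helper name.

```python
import re

# Matches 'Rep Node [rg7-rn1] Status: RUNNING,UNKNOWN', including the optional
# '(non-authoritative)' suffix. \s+ also tolerates a line break before 'Status:'.
REP_NODE = re.compile(
    r"Rep Node \[([^\]]+)\]\s+Status:\s+([A-Z_,]+(?: \(non-authoritative\))?)"
)

NORMAL_STATES = {"RUNNING,MASTER", "RUNNING,REPLICA"}


def abnormal_rep_nodes(ping_text):
    """Return (rep node id, status) pairs whose status is not a normal state."""
    return [(rn, status) for rn, status in REP_NODE.findall(ping_text)
            if status not in NORMAL_STATES]


sample = """\
Rep Node [rg7-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,717,825 haPort:5020
Rep Node [rg9-rn1] Status: RUNNING,MASTER sequenceNumber:173,697,007 haPort:5022
Rep Node [rg7-rn3] Status: RUNNING,MASTER (non-authoritative)
Rep Node [rg8-rn3] Status: RUNNING,REPLICA sequenceNumber:173,555,937
"""
for rn, status in abnormal_rep_nodes(sample):
    print(rn, status)
```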
The following is sample output of ping -json:
kv-> ping -json
{
"operation" : "ping",
"returnCode" : 5000,
"description" : "No errors found",
"returnValue" : {
"topology" : {
"storeName" : "OurStore",
"sequenceNumber" : 104,
"numPartitions" : 100,
"numStorageNodes" : 1,
"time" : 1546801860520,
"version" : "18.3.4"
},
"adminStatus" : "healthy",
"shardStatus" : {
"healthy" : 1,
"writable-degraded" : 0,
"read-only" : 0,
"offline" : 0,
"total" : 1
},
"zoneStatus" : [ {
"resourceId" : "zn1",
"name" : "OurZone",
"type" : "PRIMARY",
"allowArbiters" : false,
"masterAffinity" : false,
"rnSummaryStatus" : {
"online" : 1,
"offline" : 0,
"read-only" : 0,
"hasReplicas" : false
}
} ],
"snStatus" : [ {
"resourceId" : "sn1",
"hostname" : "OurHost",
"registryPort" : 5000,
"zone" : {
"resourceId" : "zn1",
"name" : "OurZone",
"type" : "PRIMARY",
"allowArbiters" : false,
"masterAffinity" : false
},
"serviceStatus" : "RUNNING",
"version" : "18.4.0 2018-12-06 09:21:03 UTC Build id: fbfbd1541004 Edition: Enterprise",
"adminStatus" : {
"resourceId" : "admin1",
"status" : "RUNNING",
"state" : "MASTER",
"authoritativeMaster" : true
},
"rnStatus" : [ {
"resourceId" : "rg1-rn1",
"status" : "RUNNING",
"requestsEnabled" : "ALL",
"state" : "MASTER",
"authoritativeMaster" : true,
"sequenceNumber" : 381,
"haPort" : 5013,
"availableStorageSize" : "97 GB"
} ],
"anStatus" : [ ]
} ],
"exitCode" : 0
}
}
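The JSON form is easier to check from a script than the text form. The following is a minimal sketch, assuming the JSON document has already been captured as a string; it relies only on field names that appear in the sample output above, and `find_problems` is a hypothetical helper, not part of the product. The embedded sample is trimmed to the fields the function reads.

```python
import json


def find_problems(ping_json):
    """Return a list of human-readable problems found in 'ping -json' output."""
    result = json.loads(ping_json)["returnValue"]
    problems = []

    shards = result["shardStatus"]
    if shards["healthy"] != shards["total"]:
        unhealthy = shards["total"] - shards["healthy"]
        problems.append("%d of %d shards not healthy" % (unhealthy, shards["total"]))
    if result["adminStatus"] != "healthy":
        problems.append("admin status is " + result["adminStatus"])

    for sn in result["snStatus"]:
        if sn["serviceStatus"] != "RUNNING":
            problems.append("%s is %s" % (sn["resourceId"], sn["serviceStatus"]))
        for rn in sn.get("rnStatus", []):
            if rn["status"] != "RUNNING":
                problems.append("%s is %s" % (rn["resourceId"], rn["status"]))
            elif rn.get("state") == "MASTER" and rn.get("authoritativeMaster") is False:
                problems.append("%s is a non-authoritative master" % rn["resourceId"])
    return problems


SAMPLE = """
{ "operation": "ping", "returnCode": 5000, "returnValue": {
    "adminStatus": "healthy",
    "shardStatus": {"healthy": 1, "writable-degraded": 0,
                    "read-only": 0, "offline": 0, "total": 1},
    "snStatus": [ {
      "resourceId": "sn1", "serviceStatus": "RUNNING",
      "rnStatus": [ {"resourceId": "rg1-rn1", "status": "RUNNING",
                     "state": "MASTER", "authoritativeMaster": true} ]
    } ]
} }
"""
print(find_problems(SAMPLE))
```

An empty list means none of the checked conditions was found; it does not replace reading the full output when debugging.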
The ping utility is also available through the Admin utility tools in kvtool.jar. For more information, see ping.