ping

ping [-json] [-shard <shardId>]

The ping and verify commands return information about the runtime entities of a data store. Each command contacts the components and Admin services available from the topology and reports their state.

-json

Displays output in JSON format.

-shard <shardId>

Displays a subset of status information about the specific shard ID you supply.

Here is a basic example of calling ping from the Admin CLI:

kv-> ping
Pinging components of store mystore based upon topology sequence #308
300 partitions and 3 storage nodes
Time: 2019-01-03 20:19:27 UTC   Version: 19.1.0
Shard Status: healthy:1 writable-degraded:0 read-only:0 offline:0 total:1 Admin Status: healthy
Zone [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:3 read-only:0 offline:0 
maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on localhost:13230    
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]    
Status: RUNNING   Ver: 19.1.0 2019-01-03 08:17:52 UTC  Build id: 12641466031c Edition: Enterprise
Admin [admin1]	    Status: RUNNING,MASTER
Rep Node [rg1-rn1]	Status: RUNNING,MASTER sequenceNumber:633 haPort:13233 
available storage size:109 GB
Storage Node [sn2] on localhost:13240    
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]    
Status: RUNNING   Ver: 19.1.0 2019-01-03 08:17:52 UTC  Build id: 12641466031c Edition: Enterprise
Admin [admin2]		  Status: RUNNING,REPLICA
	Rep Node [rg1-rn2]     Status: RUNNING,REPLICA sequenceNumber:633 haPort:13243 available storage size:109 GB delayMillis:0 catchupTimeSecs:0
Storage Node [sn3] on localhost:13250    
Zone: [name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]    
Status: RUNNING   Ver: 19.1.0 2019-01-03 08:17:52 UTC  Build id: 12641466031c Edition: Enterprise
	Admin [admin3]	  Status: RUNNING,REPLICA
	Rep Node [rg1-rn3]     Status: RUNNING,REPLICA sequenceNumber:633 haPort:13253 available storage size:109 GB delayMillis:0 catchupTimeSecs:0

About Shard and Admin Status

After running a ping command, look first for what is most useful (or troubling) about the system health. The most important content is the Shard Status entry. The following ping output indicates one shard (total:1) that is healthy (healthy:1). All of the status types you'd prefer not to see (writable-degraded, read-only, and offline) are zero (0), indicating that no shard is in any of those states. Everything is good.

Shard Status: healthy:1 
writable-degraded:0 
read-only:0 
offline:0 
total:1

What exactly does a healthy shard indicate? A healthy shard is one with all of its RNs running. Thus, if all shards in the topology are healthy, then all RNs are running, and no failures exist. Why are RNs so important? Because they are the components that perform read and write data operations.
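If you capture the text output of ping in a script, the Shard Status counters can be checked mechanically. The following is a minimal Python sketch, not part of the product: the function names `parse_shard_status` and `all_shards_healthy` are hypothetical, and it assumes the entry has the `name:count` form shown in the examples on this page.

```python
import re


def parse_shard_status(ping_output: str) -> dict:
    """Return the counters from the 'Shard Status:' entry of ping text output."""
    # Capture the run of name:count pairs that follows "Shard Status:".
    # The run ends at the first token that is not name:count, such as "Admin Status:".
    match = re.search(r"Shard Status:\s*((?:[\w-]+:\d+\s*)+)", ping_output)
    if match is None:
        raise ValueError("no 'Shard Status:' entry with counters found")
    pairs = re.findall(r"([\w-]+):(\d+)", match.group(1))
    return {name: int(count) for name, count in pairs}


def all_shards_healthy(counts: dict) -> bool:
    """True when every shard in the store is reported as healthy."""
    return counts["total"] > 0 and counts["healthy"] == counts["total"]


if __name__ == "__main__":
    line = ("Shard Status: healthy:1 writable-degraded:0 read-only:0 "
            "offline:0 total:1 Admin Status: healthy")
    counts = parse_shard_status(line)
    print(counts, all_shards_healthy(counts))
```

Comparing healthy against total, rather than testing each unwanted counter for zero, keeps the check valid even if a state is omitted from the line.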

Checking the Admin nodes status is also useful. In this simple example, only one Admin shard exists, so there is a single result: Admin Status: healthy. Other possible states are: writable-degraded, read-only, or offline.

For both RN shards and admins, these are what each result indicates:

Result	Meaning
`healthy`	All nodes are running, and the system is fully operational.
`writable-degraded`	A majority of the nodes are running. All operations are supported, but a minority of the nodes are offline or don't support writes. If you are using RF=3, this state is one step closer to being unable to support all operations. For example, with one node offline, losing another node means quorum will be lost, and the shard becomes read-only. Most people use RF=3, so this is typically what writable-degraded means.
`read-only`	Only a minority of the nodes are running. Read operations are supported, but write operations are not.
`offline`	No nodes are running, so no operations are supported.
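The rules in the table can be sketched as a small Python function. This is an illustration only, under simplifying assumptions: `classify_shard` is a hypothetical name, a node counts as "running" only if it is online and supports writes, and a shard with exactly half of its nodes running is treated as read-only because it lacks a majority.

```python
def classify_shard(running: int, total: int) -> str:
    """Map the number of running nodes in a shard to the status names in the table."""
    if total <= 0 or not 0 <= running <= total:
        raise ValueError("running must be between 0 and total, and total must be positive")
    if running == total:
        return "healthy"            # all nodes are running
    if running == 0:
        return "offline"            # no nodes are running
    if running * 2 > total:
        return "writable-degraded"  # a strict majority is still running
    return "read-only"              # quorum is lost, only reads are supported


if __name__ == "__main__":
    # With RF=3, each lost node moves the shard one step down.
    for running in (3, 2, 1, 0):
        print(running, classify_shard(running, 3))
```

For RF=3 this reproduces the progression described in the table: three nodes running is healthy, two is writable-degraded, one is read-only, and none is offline.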

About Zone Status

The next information from ping is about zones:

Zone [name=1 id=zn1
type=PRIMARY 
allowArbiters=false 
masterAffinity=false] 
RN Status: online:3 read-only:0 offline:0 
maxDelayMillis:0 
maxCatchupTimeSecs:0

For stores with multiple zones, this information provides the status of nodes in different locations. For example, if a store was deployed using three zones, with the machines for each zone in a separate building, this information gives a quick summary status for machines in each building. In this simple example, there is only one zone, so that status information is similar to that for the entire store. The maxDelayMillis and maxCatchupTimeSecs entries provide information about data replication to replicas located in the zone. In our example, both values are zero (0). However, having large numbers for these entries could suggest that there are hardware problems with the machines in the zone, or problems with the network that connects that zone to other zones. Such information would be used only for more detailed debugging.
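As a rough illustration of how the zone entries might be monitored, the Python sketch below extracts maxDelayMillis and maxCatchupTimeSecs from a zone entry and flags large values. The function names and the limits (1000 milliseconds and 60 seconds) are assumptions chosen for the example, not values recommended by the product; pick limits that suit your own deployment.

```python
import re


def zone_replication_lag(zone_entry: str) -> dict:
    """Extract maxDelayMillis and maxCatchupTimeSecs from a zone entry of ping text output."""
    pairs = re.findall(r"(maxDelayMillis|maxCatchupTimeSecs):(\d+)", zone_entry)
    return {name: int(value) for name, value in pairs}


def lag_warnings(lag: dict, delay_limit_millis: int = 1000, catchup_limit_secs: int = 60) -> list:
    """Return a message for each lag value that is missing or above its (hypothetical) limit."""
    limits = {"maxDelayMillis": delay_limit_millis, "maxCatchupTimeSecs": catchup_limit_secs}
    warnings = []
    for name, limit in limits.items():
        value = lag.get(name)
        if value is None:
            # Values that are absent or not numeric are reported rather than guessed.
            warnings.append(f"{name} not reported")
        elif value > limit:
            warnings.append(f"{name}={value} exceeds {limit}")
    return warnings


if __name__ == "__main__":
    entry = "RN Status: online:3 read-only:0 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0"
    print(lag_warnings(zone_replication_lag(entry)))
```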

About Storage Nodes

Next, there is information about the nodes associated with a particular storage node:

Storage Node [sn1] on localhost:13230 Zone: 
[name=1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
Status: RUNNING Ver: 19.1.0 2019-01-03 08:17:52 UTC Build id: 12641466031c 
Edition: Enterprise 
Admin [admin1] Status: RUNNING,MASTER 
Rep Node [rg1-rn1] Status: RUNNING,MASTER 
sequenceNumber:633 haPort:13233 available storage size:109 GB

The Status: entry for the SN can have several possible values:

Status	Description
`STARTING`	The storage node is starting up.
`WAITING_FOR_DEPLOY`	The storage node is running but is waiting to be deployed in a new store.
`RUNNING`	The storage node is running -- this is the usual state.
`STOPPING`	The storage node is in the process of stopping, but is not yet in a `STOPPED` status.
`STOPPED`	The storage node is stopped.
`UNREACHABLE`	The storage node is not reachable, either because the SN service is down, the host machine is offline, or the machine is not reachable over the network.

About RNs and Admins on the Storage Node

The next entries provide status information about RNs and any Admin processes that are running on the storage node. Not all storage nodes have admin nodes. The number of RNs running on the storage node depends on the SN capacity.

Admin [admin1] Status: RUNNING,MASTER Rep Node [rg1-rn1] 
Status: RUNNING,MASTER sequenceNumber:633 haPort:13233 available storage size:109 GB

The Status: entry for both admin nodes and RNs can have the following values:

Status	Description
`STARTING`	The node is starting up.
`RUNNING,MASTER`	The node is up and is the master. The master is in contact with a majority of nodes in the shard, and can perform writes requiring acknowledgment. This is the first of two normal states.
`RUNNING,REPLICA`	The node is up and is a replica. This is the second of two normal states.
`RUNNING,MASTER (non-authoritative)`	The node is up and is the master, but is not in contact with a majority of nodes in the shard. A non-authoritative master can perform only writes that do not require acknowledgment.
`STOPPING`	The node is stopping.
`UNREACHABLE`	The node could not be contacted over the network. The node is either stopped, failed, or there is a problem with the network connection to the machine.
Additional status values that can be appended to the status line to provide more information:
`readonly requests enabled`	The node is running in read-only mode because the `plan enable-requests` command was run to set the node into read-only user operations mode.
`requests disabled`	The node is running with all user operation requests disabled, because the `plan enable-requests` command was run to disable all requests on the node. The `plan enable-requests` command disables requests on a per-shard basis, so it will prevent writes or all operations on all data in the shard.

While not shown in the initial example, the ping and verify commands can display one of the following states for RNs and shards. The table describes their effects and outcomes:

Displayed State	Effects	Outcome
`Unknown`	Masters go down.	Represents the read-only state of the RNs and shards still running. Currently, we do not support read-only status for any RN.
`Non-Authoritative Master`	Replica nodes go down.	After Replica nodes are down, remaining RNs and shards are in read-only mode. Currently, we do not support read-only status for any RN.
`Out of disk space`	Masters and replica nodes go down. Replicas are left in the `RUNNING, UNKNOWN` state, and the masters are in the `Non-Authoritative` state.	When masters and replica nodes go down, any remaining RNs and shards are in read-only mode. Currently, we do not support read-only status for any RN.
`Write requests disabled`	RNs and shard health are in read-only enabled request state.	RNs and shards are unable to accept any user requests, and are marked offline.

Both the ping and verify commands detect these states. Following is the output of a ping command on a shard (rg2), in a normal state, showing how results are returned:

kv-> ping -shard rg2
Pinging components of store mystore based upon topology sequence #2376
shard rg2 500 partitions and 3 storage nodes
Time: 2018-09-28 07:06:46 UTC   Version: 18.3.2
Shard Status: healthy: Admin Status: healthy
Zone [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:3 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn10] on nodeA:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c Edition: Enterprise
        Rep Node [rg2-rn1] Status: RUNNING,MASTER sequenceNumber:71,166 haPort:5010 available storage size:8 GB
Storage Node [sn11] on nodeB:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c Edition: Enterprise
        Rep Node [rg2-rn2] Status: RUNNING,REPLICA sequenceNumber:71,166 haPort:5011 available storage size:4 GB delayMillis:0 catchupTimeSecs:0
Storage Node [sn12] on nodeC:5000
Zone: [name=shardzone id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
Status: RUNNING   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c Edition: Enterprise
        Rep Node [rg2-rn3] Status: RUNNING,REPLICA sequenceNumber:71,166 haPort:5012 available storage size:6 GB delayMillis:0 catchupTimeSecs:0

Following are examples of return information when different states occur.

Some shards become writable-degraded and others become read-only:

kv-> ping 
Pinging components of store concurrent plan store based upon topology sequence #1082
 1000 partitions and 9 storage nodes 
Time: 2018-11-06 05:12:36 UTC Version: 18.3.8 
        Shard Status: healthy:2 writable-degraded:12 read-only:4 offline:0 total:18 
Admin Status: healthy 
Zone [name=dc1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
RN Status: online:30 read-only:24 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0 
Storage Node [sn1] on slcao397:5000 
Zone: [name=dc1 id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] Status: RUNNING 
Ver: 18.3.8 2018-10-26 11:36:43 UTC Build id: 6259xxxxxxxx Edition: Enterprise

RNs can be in the RUNNING,UNKNOWN state for more than one reason, including when the RN reaches a disk limit or when the RN is down:

Storage Node [sn4] on slcao400:5000 Zone: [name=dc1 id=zn1 type=PRIMARY 
allowArbiters=false masterAffinity=false] Status: RUNNING Ver: 18.3.8 2018-10-26 11:36:43 UTC 
Build id: 6259xxxxxxxx Edition: Enterprise 
        Rep Node [rg7-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,717,825 haPort:5020
        available storage size:-3 GB delayMillis:? catchupTimeSecs:? 
        Rep Node [rg8-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,555,937 haPort:5021 
        available storage size:-3 GB delayMillis:? catchupTimeSecs:? 
        Rep Node [rg9-rn1] Status: RUNNING,MASTER sequenceNumber:173,697,007 haPort:5022 available storage size:-3 GB 
        Rep Node [rg10-rn1] Status: RUNNING,UNKNOWN sequenceNumber:173,293,747 haPort:5023 
        available storage size:-3 GB delayMillis:? catchupTimeSecs:? 
        Rep Node [rg11-rn1] Status: RUNNING,UNKNOWN sequenceNumber:170,561,758 haPort:5024 available storage size:-3 GB 
        delayMillis:? catchupTimeSecs:? 
        Rep Node [rg12-rn1] Status: RUNNING,MASTER sequenceNumber:170,410,483 haPort:5025 available storage size:-3 GB

An out-of-disk-space error causes the master to become non-authoritative:

 Storage Node [sn6] on slcao402:5000 Zone: [name=dc1 id=zn1 type=PRIMARY allowArbiters=false 
masterAffinity=false] Status: RUNNING Ver: 18.3.8 2018-10-26 11:36:43 UTC Build id: 6259xxxxxxxx 
Edition: Enterprise 
Rep Node [rg7-rn3] Status: RUNNING,MASTER (non-authoritative) 
sequenceNumber:173,754,579 haPort:5020 available storage size:45 GB 
Rep Node [rg8-rn3] Status: RUNNING,REPLICA sequenceNumber:173,555,937 
haPort:5021 available storage size:46 GB 
delayMillis:0 catchupTimeSecs:0 
Rep Node [rg9-rn3] Status: RUNNING,REPLICA 
sequenceNumber:173,697,007 haPort:5022 available storage size:45 GB 
delayMillis:0 catchupTimeSecs:0 
Rep Node [rg10-rn3] Status: RUNNING,MASTER (non-authoritative) 
sequenceNumber:173,293,747 haPort:5023 available storage size:45 GB 
Rep Node [rg11-rn3] Status: RUNNING,REPLICA sequenceNumber:170,561,758 
haPort:5024 available storage size:45 GB delayMillis:0 catchupTimeSecs:0 
Rep Node [rg12-rn3] Status: RUNNING,REPLICA sequenceNumber:170,410,483 haPort:5025 
available storage size:46 GB delayMillis:0 catchupTimeSecs:0
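Output like the examples above can be scanned for RNs that are not in one of the two normal states (RUNNING,MASTER and RUNNING,REPLICA). The Python sketch below is one possible approach, with the hypothetical function name `abnormal_rep_nodes`. It assumes each RN appears as `Rep Node [<id>] Status: <value>`, as in the examples on this page, and it ignores any additional values appended to the status, such as `requests disabled`.

```python
import re

# The two states that the status table describes as normal.
NORMAL_STATES = {"RUNNING,MASTER", "RUNNING,REPLICA"}

# Matches "Rep Node [rg7-rn3] Status: RUNNING,MASTER (non-authoritative)" and similar.
# Whitespace between the parts may be spaces, tabs, or a line break.
REP_NODE_PATTERN = re.compile(
    r"Rep Node \[(?P<rn>[^\]]+)\]\s+Status:\s*"
    r"(?P<status>[A-Z_]+(?:,[A-Z_]+)?(?:\s*\(non-authoritative\))?)"
)


def abnormal_rep_nodes(ping_output: str) -> dict:
    """Return {rn id: status} for every RN whose status is not a normal state."""
    result = {}
    for match in REP_NODE_PATTERN.finditer(ping_output):
        # Collapse any internal whitespace so wrapped lines compare equal.
        status = " ".join(match.group("status").split())
        if status not in NORMAL_STATES:
            result[match.group("rn")] = status
    return result


if __name__ == "__main__":
    sample = (
        "Rep Node [rg7-rn3] Status: RUNNING,MASTER (non-authoritative) \n"
        "sequenceNumber:173,754,579 haPort:5020 available storage size:45 GB \n"
        "Rep Node [rg8-rn3] Status: RUNNING,REPLICA sequenceNumber:173,555,937 \n"
    )
    print(abnormal_rep_nodes(sample))
```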

Finally, here is a basic example of calling ping -json:

kv-> ping -json
{
  "operation" : "ping",
  "returnCode" : 5000,
  "description" : "No errors found",
  "returnValue" : {
    "topology" : {
      "storeName" : "OurStore",
      "sequenceNumber" : 104,
      "numPartitions" : 100,
      "numStorageNodes" : 1,
      "time" : 1546801860520,
      "version" : "18.3.4"
    },
    "adminStatus" : "healthy",
    "shardStatus" : {
      "healthy" : 1,
      "writable-degraded" : 0,
      "read-only" : 0,
      "offline" : 0,
      "total" : 1
    },
    "zoneStatus" : [ {
      "resourceId" : "zn1",
      "name" : "OurZone",
      "type" : "PRIMARY",
      "allowArbiters" : false,
      "masterAffinity" : false,
      "rnSummaryStatus" : {
        "online" : 1,
        "offline" : 0,
        "read-only" : 0,
        "hasReplicas" : false
      }
    } ],
    "snStatus" : [ {
      "resourceId" : "sn1",
      "hostname" : "OurHost",
      "registryPort" : 5000,
      "zone" : {
        "resourceId" : "zn1",
        "name" : "OurZone",
        "type" : "PRIMARY",
        "allowArbiters" : false,
        "masterAffinity" : false
      },
      "serviceStatus" : "RUNNING",
      "version" : "18.4.0 2018-12-06 09:21:03 UTC  Build id: fbfbd1541004 Edition: Enterprise",
      "adminStatus" : {
        "resourceId" : "admin1",
        "status" : "RUNNING",
        "state" : "MASTER",
        "authoritativeMaster" : true
      },
      "rnStatus" : [ {
        "resourceId" : "rg1-rn1",
        "status" : "RUNNING",
        "requestsEnabled" : "ALL",
        "state" : "MASTER",
        "authoritativeMaster" : true,
        "sequenceNumber" : 381,
        "haPort" : 5013,
        "availableStorageSize" : "97 GB"
      } ],
      "anStatus" : [ ]
    } ],
    "exitCode" : 0
  }
}
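Because the -json output is structured, it is easier to consume from a script than the text output. The following Python sketch summarizes a ping result using only field names that appear in the example above. The function name `summarize_ping` and the shape of the summary are hypothetical, and the sketch assumes the JSON object has already been captured into a string, without the `kv->` prompt line.

```python
import json


def summarize_ping(ping_json: str) -> dict:
    """Build a short health summary from the JSON text produced by ping -json."""
    doc = json.loads(ping_json)
    result = doc["returnValue"]
    shards = result["shardStatus"]

    storage_nodes = {}
    rep_nodes = {}
    for sn in result.get("snStatus", []):
        storage_nodes[sn["resourceId"]] = sn.get("serviceStatus")
        for rn in sn.get("rnStatus", []):
            # Combine status and state as the text output does, for example "RUNNING,MASTER".
            parts = [p for p in (rn.get("status"), rn.get("state")) if p]
            label = ",".join(parts)
            if rn.get("state") == "MASTER" and rn.get("authoritativeMaster") is False:
                label += " (non-authoritative)"
            rep_nodes[rn["resourceId"]] = label

    return {
        "returnCode": doc.get("returnCode"),
        "adminStatus": result.get("adminStatus"),
        "allShardsHealthy": shards["total"] > 0 and shards["healthy"] == shards["total"],
        "storageNodes": storage_nodes,
        "repNodes": rep_nodes,
    }
```

Optional fields are read with `get` so that an entry for a node that reports less detail does not cause the summary to fail.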

You can also access the ping utility through the Admin utility tools available in kvtool.jar. For more information, see ping.