Use the Health Monitor to Determine OCSDM Server Health
A heartbeat is a periodic message that a server sends to signal that it is active. The Health Monitor maintains OCSDM server statistics and heartbeats for all OCSDM cluster nodes. Based on the number of received and missed heartbeats, it also counts how many times a node was considered inactive and how many times the node returned to an active state.
- From the menu bar, select Tools, Health Monitor.
- In the Health Monitor Console display, select Heartbeat from the Select Monitor drop-down list.
- The default IP address for this node is displayed in the Select Source drop-down list. If this node is part of a cluster, you can check the health status of another node by selecting its IP address from the Select Source drop-down list. The following table columns are described below for the targeted node:
  - Cluster Member: The IP address of the cluster node. If the (Master) label is appended to the IP address, this node is running the master replication database.
  - Status: One of the following states applies to a cluster node:
    - ACTIVE: This node is actively participating in the cluster.
    - DOWN: The node failed to send its heartbeats, is in a failed state, or a network partition exists between the cluster and this node.
  - Up Time (dd:hh:mm): The number of days, hours, and minutes the node has been active.
  - Down Time (dd:hh:mm): The number of days, hours, and minutes the node has been down.
  - Last Heartbeat Timestamp: The date and time of the last known heartbeat of the node.
  - Heartbeat Count: The total number of node heartbeats.
  - Missed Heartbeat Count: The total number of times the heartbeat monitor on this node missed a heartbeat from other nodes in the cluster. Note: An increase in this statistic might indicate network issues between nodes in the cluster.
  - HBFM: The Heartbeat Failure Meter (HBFM) statistic indicates the number of times a required heartbeat from a node was not received. This number increases when the heartbeats start arriving again. If this statistic reaches a count of 10 (default), the node status is DOWN.
  - MHFM: The Maximum Heartbeat Failed Meter (MHFM) statistic maintains the high-water mark of the HBFM statistic. This statistic is reset only if a node that left the cluster (status=DOWN) rejoins and starts sending heartbeats again.
  - Inactivity Count: The number of times the node was considered to be in the DOWN state.
  - Reset Count: The number of times the node has gone from the DOWN state to the ACTIVE state. If the node rejoins the cluster after being DOWN, the reset counter is incremented by 1 and the MHFM is reset to 0.
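
The relationship between these counters can be easier to follow as a small model. The following Python sketch is illustrative only and is not OCSDM code: the class name `NodeHealth` and its methods are hypothetical, and it assumes that each missed heartbeat adds 1 to HBFM and that HBFM returns to 0 when a DOWN node rejoins. How HBFM changes while an ACTIVE node keeps sending heartbeats is not modeled. The threshold of 10 and the MHFM, Inactivity Count, and Reset Count behavior follow the descriptions above.

```python
from dataclasses import dataclass

# Default HBFM count at which a node is marked DOWN (per the HBFM description).
DOWN_THRESHOLD = 10


@dataclass
class NodeHealth:
    """Hypothetical model of the Health Monitor counters for one node."""
    status: str = "ACTIVE"
    heartbeat_count: int = 0
    hbfm: int = 0              # Heartbeat Failure Meter
    mhfm: int = 0              # high-water mark of HBFM
    inactivity_count: int = 0  # times the node was considered DOWN
    reset_count: int = 0       # DOWN -> ACTIVE transitions

    def on_missed_heartbeat(self) -> None:
        # Assumption: every missed heartbeat raises HBFM by one.
        self.hbfm += 1
        self.mhfm = max(self.mhfm, self.hbfm)
        if self.status == "ACTIVE" and self.hbfm >= DOWN_THRESHOLD:
            self.status = "DOWN"
            self.inactivity_count += 1

    def on_heartbeat(self) -> None:
        self.heartbeat_count += 1
        if self.status == "DOWN":
            # Node rejoins: Reset Count goes up by 1 and MHFM is reset to 0.
            self.status = "ACTIVE"
            self.reset_count += 1
            self.mhfm = 0
            # Assumption: HBFM also starts over after a rejoin.
            self.hbfm = 0
```

In this model, ten consecutive calls to `on_missed_heartbeat()` move a node to DOWN and raise its Inactivity Count, and the next call to `on_heartbeat()` returns it to ACTIVE, increments its Reset Count, and clears MHFM.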