Java Dynamic Management Kit 5.0 Tutorial

Configuring the Heartbeat

To monitor the connection, the connector client sends periodic heartbeats (ping requests) to the connector server that acknowledges them by sending a reply (ping responses). If either heartbeat is lost, the components of the connector retry until either the connection is reestablished or the number of retries has been exhausted.

In a connector client, the methods specified by the HeartBeatClientHandler interface set the heartbeat period and the number of retries that are attempted. You should determine these parameters empirically to implement the desired connection monitoring behavior, taking into account the network conditions and topology between the hosts of your manager and agent applications.

In Example 10–7, the management application configures the heartbeat mechanism before the connection to an agent is established.

Example 10–7 Configuring the Heartbeat in the Connector Client

// CREATE a new RMI connector client
//
echo("\tInstantiate the RMI connector client...");
connectorClient = new RmiConnectorClient();

// SET heartbeat period to 1 sec. Default value is 10 secs
//
echo("\tSet heartbeat period to 1 second...");
connectorClient.setHeartBeatPeriod(1000);

// SET heartbeat number of retries to 2. Default value is 6 times
//
echo("\tSet heartbeat number of retries to 2 times...");
connectorClient.setHeartBeatRetries(2);

Using the same methods, the heartbeat configuration can also be modified at any time, even after the connection has been established. By default, the heartbeat mechanism is activated in a connector with a 10 second heartbeat and 6 retries, meaning that a connection that cannot be reestablished within one minute is assumed to be lost.

Setting the number of heartbeat retries to zero causes a lost connection to be signalled immediately after the heartbeat fails. Setting the heartbeat period to zero deactivates the mechanism and prevents any further connection failures from being detected.

No specific configuration is necessary on the agent-side connector server. It automatically responds to the heartbeat messages. These heartbeat messages contain the current heartbeat settings from the connector client that also configure the connector server. In this way, both client and server apply the same retry policy, and when the configuration is updated in the connector client, it is immediately reflected in the connector server. The connector server can handle multiple connections from different management applications, each with its specific heartbeat configuration.

The connector server applies its retry policy when the next expected heartbeat message is not received within the heartbeat period. From that moment, the connector server begins a timeout period that lasts 20% longer than the number of retries times the heartbeat period. This corresponds to the time during which the connector client attempts to resend the heartbeat, with a safety margin to allow for communication delays. If no further heartbeat is received in that timeout, the connection is determined to be lost.

The heartbeat ping messages also contain a connection identifier so that connections are not erroneously reestablished. If a connector server is stopped, thereby closing all connections, and then restarted between two heartbeats or before the client's timeout period has elapsed, the server responds to heartbeats from a previous connection. However, the connector client detects that the identifier has changed and immediately declares that the connection is lost, regardless of the number of remaining retries.

During the timeout period, the notification push mechanism in the connector server is suspended to avoid losing notifications (see Chapter 12, Notification Forwarding). Similarly, while the connector client is retrying the heartbeat, it must suspend the notification pull mechanism if it is in effect.

When a connection is determined to be lost, both the connector client and server free any resources that were allocated for maintaining the connection. For example, the connector server unregisters all local listeners and deletes the notification cache needed to forward notifications. Both components also return to a coherent, functional state, ready to establish or accept another connection.

The state of both components after a connection is lost is identical to the state that is reached after the disconnect method of the connector client is called. In fact, the disconnect method is called internally by the connector client when a connection is determined to be lost, and the equivalent, internal method is called in the connector server when its timeout elapses without recovering a heartbeat.