Java Dynamic Management Kit 5.0 Tutorial

Heartbeat Mechanism

The heartbeat mechanism monitors the connection between a manager and an agent and automates the cleanup procedure when the connection is lost. This enables both the manager and the agent to release resources that were allocated for maintaining the connection.

The mechanism is entirely contained in the connector client and connector server components, no additional objects are involved. In addition, connector clients send notifications that the manager application can receive to be aware of changes in the status of a connection.

All connector clients of the Java DMK implement the HeartBeatClientHandler interface to provide a heartbeat mechanism. This means that agent-manager connections over RMI, HTTP, and HTTPS can be monitored and controlled in the same way. A manager application could even use the same notification handler for all connector clients where the heartbeat mechanism is activated.

Configuring the Heartbeat

To monitor the connection, the connector client sends periodic heartbeats (ping requests) to the connector server that acknowledges them by sending a reply (ping responses). If either heartbeat is lost, the components of the connector retry until either the connection is reestablished or the number of retries has been exhausted.

In a connector client, the methods specified by the HeartBeatClientHandler interface set the heartbeat period and the number of retries that are attempted. You should determine these parameters empirically to implement the desired connection monitoring behavior, taking into account the network conditions and topology between the hosts of your manager and agent applications.

In Example 10–7, the management application configures the heartbeat mechanism before the connection to an agent is established.


Example 10–7 Configuring the Heartbeat in the Connector Client

// CREATE a new RMI connector client
//
echo("\tInstantiate the RMI connector client...");
connectorClient = new RmiConnectorClient();

// SET heartbeat period to 1 sec. Default value is 10 secs
//
echo("\tSet heartbeat period to 1 second...");
connectorClient.setHeartBeatPeriod(1000);

// SET heartbeat number of retries to 2. Default value is 6 times
//
echo("\tSet heartbeat number of retries to 2 times...");
connectorClient.setHeartBeatRetries(2);

Using the same methods, the heartbeat configuration can also be modified at any time, even after the connection has been established. By default, the heartbeat mechanism is activated in a connector with a 10 second heartbeat and 6 retries, meaning that a connection that cannot be reestablished within one minute is assumed to be lost.

Setting the number of heartbeat retries to zero causes a lost connection to be signalled immediately after the heartbeat fails. Setting the heartbeat period to zero deactivates the mechanism and prevents any further connection failures from being detected.

No specific configuration is necessary on the agent-side connector server. It automatically responds to the heartbeat messages. These heartbeat messages contain the current heartbeat settings from the connector client that also configure the connector server. In this way, both client and server apply the same retry policy, and when the configuration is updated in the connector client, it is immediately reflected in the connector server. The connector server can handle multiple connections from different management applications, each with its specific heartbeat configuration.

The connector server applies its retry policy when the next expected heartbeat message is not received within the heartbeat period. From that moment, the connector server begins a timeout period that lasts 20% longer than the number of retries times the heartbeat period. This corresponds to the time during which the connector client attempts to resend the heartbeat, with a safety margin to allow for communication delays. If no further heartbeat is received in that timeout, the connection is determined to be lost.

The heartbeat ping messages also contain a connection identifier so that connections are not erroneously reestablished. If a connector server is stopped, thereby closing all connections, and then restarted between two heartbeats or before the client's timeout period has elapsed, the server responds to heartbeats from a previous connection. However, the connector client detects that the identifier has changed and immediately declares that the connection is lost, regardless of the number of remaining retries.

During the timeout period, the notification push mechanism in the connector server is suspended to avoid losing notifications (see Chapter 12, Notification Forwarding). Similarly, while the connector client is retrying the heartbeat, it must suspend the notification pull mechanism if it is in effect.

When a connection is determined to be lost, both the connector client and server free any resources that were allocated for maintaining the connection. For example, the connector server unregisters all local listeners and deletes the notification cache needed to forward notifications. Both components also return to a coherent, functional state, ready to establish or accept another connection.

The state of both components after a connection is lost is identical to the state that is reached after the disconnect method of the connector client is called. In fact, the disconnect method is called internally by the connector client when a connection is determined to be lost, and the equivalent, internal method is called in the connector server when its timeout elapses without recovering a heartbeat.

Receiving Heartbeat Notifications

The connector client also sends notifications that signal any changes in the state of the connection. These notifications are instances of the HeartBeatNotification class. The HeartBeatClientHandler interface includes methods specifically for registering for heartbeat notifications. These methods are distinct from those of the NotificationRegistration interface that a connector client implements for transmitting agent-side notifications (see Registering Manager-Side Listeners).


Example 10–8 Registering for Heartbeat Notifications

// Register this manager as a listener for heartbeat notifications
// (the filter and handback objects are not used in this example)
//
echo("\tAdd this manager as a listener for heartbeat notifications...");
connectorClient.addHeartBeatNotificationListener(this, null, null);

Instances of heartbeat notifications contain the connector address object from the connection that generated the event. This enables the notification handler to listen to any number of connectors and retrieve all relevant information about a specific connection when it triggers a notification. The HeartBeatNotification class defines constants to identify the possible notification type strings for heartbeat events:

Once they are established, connections can go through any number of retrying-reestablished cycles and then be terminated by the user or determined to be lost and terminated automatically. When the heartbeat mechanism is deactivated by setting the heartbeat period to zero, only heartbeat notifications for normally established and normally terminated connections continue to be sent. In that case, connections can be lost but they are not detected nor indicated by a notification.

The following diagram shows the possible sequence of heartbeat notifications during a connection. Retries are enabled when the getHeartBeatRetries method returns an integer greater than zero.

Figure 10–1 Sequencing of Heartbeat Notifications

Diagram showing sequencing of heartbeat notifications

Example 10–9 shows the source code for the notification handler method in our manager class. The handler prints out the notification type and the RMI address associated with the connector that generated the notification:


Example 10–9 Heartbeat Notification Handler

public void handleNotification(Notification notification, Object handback){

    echo("\n>>> Notification has been received...");
    echo("\tNotification type = " + notification.getType());

    if (notification instanceof HeartBeatNotification) {
        ConnectorAddress notif_address =
            ((HeartBeatNotification)notification).getConnectorAddress();

        if (notif_address instanceof RmiConnectorAddress) {
            RmiConnectorAddress rmi_address =
                (RmiConnectorAddress) notif_address;

            echo("\tNotification connector address:");
            echo("\t\tTYPE   = " + rmi_address.getConnectorType());
            echo("\t\tHOST   = " + rmi_address.getHost());
            echo("\t\tPORT   = " + rmi_address.getPort());
            echo("\t\tSERVER = " + rmi_address.getName());
	        }
    }
}

In the agent application, the connector server does not emit any notifications about the state of its connections. The HTTP protocol-based connectors do provide a count of active clients, but there is no direct access to heartbeat information in an agent's connector servers.

Running the Heartbeat Example

The examplesDir/HeartBeat directory contains all of the files for the Agent and Client applications that demonstrate the heartbeat mechanism through an RMI connector.

To Run the Heartbeat Example: Normal Termination
  1. Compile all files in this directory with the javac command.

    For example, on the Solaris platform with the Korn shell, type:


    $ cd examplesDir/HeartBeat/
    $ javac -classpath classpath *.java
    

    To demonstrate the various communication scenarios, we will run the example three times: once to see a normal termination, once to see how the manager reacts to a lost connection, and once to see how the agent reacts to a lost connection.

  2. Start the agent on another host or in another terminal window with the following command:


    $ java -classpath classpath Agent
    

    The agent only creates the RMI connector server to which the client application will establish a connection, and then it waits for management operations.

  3. Wait for the agent to be completely initialized, then start the manager with the following command, where hostname is the name of the host running the agent.

    The RMI connector in this example uses port 1099 by default. If you started the agent on the same host, you can omit the hostname and the port number:


    $ java -classpath classpath Client hostname 1099
    

    The client application creates the RMI connector client, configures its heartbeat, and registers a notification listener, as seen in the code examples. When the connection is established, the listener outputs the notification information in the terminal window.

  4. Press Enter in the manager window to call the disconnect method on the connector client and stop the Client application.

    In the terminal window, the heartbeat notification listener outputs the information for the normal connection termination before the application ends.

  5. Leave the agent application running for the next scenario.

To Run the Heartbeat Example: Connector Client Reaction
  1. Start the Client application again:


    $ java -classpath classpath Client hostname 1099
    
  2. When the connection is established, press Control-C in the agent's window to stop the connector server and the agent application.

    This simulates a broken communication channel as seen by the connector client.

    Less than a second later, when the next heartbeat fails, the heartbeat retrying notification is displayed in the manager's terminal window. Two seconds later, after both retries have failed, the lost connection and terminated connection notifications are displayed.

  3. Press Enter in the manager window to exit the Client application.

To Run the Heartbeat Example: Connector Server Reaction
  1. Start the agent in debug mode on another host or in another terminal window with the following command:


    $ java -classpath classpath -DLEVEL_DEBUG Agent
    
  2. Wait for the agent to be completely initialized, then start the Client application again:


    $ java -classpath classpath Client hostname 1099
    

    When the connection is established, the periodic heartbeat messages are displayed in the debug output of the agent.

  3. This time, press Control-C in the client's window to stop the connector client and the manager application.

    This simulates a broken communication channel as seen by the connector server in the agent.

    After the heartbeat retry timeout elapses in the agent, the lost connection message is displayed in the heartbeat debugging output of the agent.

  4. Type Control-C in the agent window to stop the agent application and end the example.