Chapter 4. Utilities

Table of Contents

Administering the Replication Group
Listing Group Members
Locating the Current Master
Adding and Removing Nodes from the Group
Restoring Log Files
Backing up a Replicated Application
Converting Existing Environments for Replication

This chapter discusses the APIs that you use to administer and manage your replication group.

Administering the Replication Group

An application might need to perform a number of administrative activities on a replication group. These activities can be performed by electable nodes in the replication group or by applications that do not have access to a replicated environment (in other words, utilities designed to help administer and monitor the group). All of these functions are available through the ReplicationGroupAdmin class.

You can use the ReplicationGroupAdmin class to:

  1. List replication group members.

  2. Locate the current Master.

  3. Remove nodes from the replication group.

You create a ReplicationGroupAdmin instance by providing the name of the replication group that you want to administer, as well as a Set of InetSocketAddress objects. These InetSocketAddress objects identify the helper hosts that the application uses to perform administrative functions. For example:

...

    Set<InetSocketAddress> helpers =
        new HashSet<InetSocketAddress>();
    InetSocketAddress helper1 =
        new InetSocketAddress("node1.example.com", 1550);
    InetSocketAddress helper2 =
        new InetSocketAddress("node2.example.com", 1550);

    helpers.add(helper1);
    helpers.add(helper2);

    ReplicationGroupAdmin rga =
        new ReplicationGroupAdmin("test_rep_group", helpers);   
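Administrative utilities often receive their helper hosts from a command line or a properties file rather than from hard-coded values. The following is a minimal sketch of turning such a string into the Set of InetSocketAddress objects described above. The HelperHosts class, its parse() method, and the comma-separated "host:port" input format are all assumptions made for illustration; none of them is part of the JE API.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the JE API.
final class HelperHosts {

    private HelperHosts() {}

    // Parses a comma-separated list such as "host1:1550,host2:1550"
    // (an assumed format) into a set of socket addresses.
    static Set<InetSocketAddress> parse(String spec) {
        Set<InetSocketAddress> helpers = new LinkedHashSet<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                    "Expected host:port but found: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            // parseInt rejects non-numeric ports; the InetSocketAddress
            // constructor rejects ports outside 0..65535.
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            helpers.add(new InetSocketAddress(host, port));
        }
        return helpers;
    }
}
```

The resulting set could then be passed as the second argument to the ReplicationGroupAdmin constructor, for example with HelperHosts.parse("node1.example.com:1550,node2.example.com:1550").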

Listing Group Members

To list all the members of a replication group, use the ReplicationGroupAdmin.getGroup() method. This returns an instance of ReplicationGroup. You can then:

  1. Use the ReplicationGroup.getNodes() method to locate all the nodes in the replication group.

  2. Use the ReplicationGroup.getElectableNodes() method to locate all the electable nodes in the replication group.

  3. Use the ReplicationGroup.getMonitorNodes() method to locate all the monitor nodes that currently belong to the replication group.

Note

In order to obtain a ReplicationGroup object, the process must be able to discover the current Master. This means that the helper nodes you provide when you instantiate the ReplicationGroupAdmin class must be reachable and able to identify the current Master. If they cannot, then these methods throw an UnknownMasterException.

Each of the ReplicationGroup methods listed above returns a set of ReplicationNode objects, which you can then query for node information, such as the node's name, the InetSocketAddress where the node is located, and the node's type.

For example:

...

    Set<InetSocketAddress> helpers =
        new HashSet<InetSocketAddress>();
    InetSocketAddress helper1 =
        new InetSocketAddress("node1.example.com", 1550);
    InetSocketAddress helper2 =
        new InetSocketAddress("node2.example.com", 1550);

    helpers.add(helper1);
    helpers.add(helper2);

    ReplicationGroupAdmin rga =
        new ReplicationGroupAdmin("test_rep_group", helpers); 

    try {
        ReplicationGroup rg = rga.getGroup();
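        for (ReplicationNode rn : rg.getElectableNodes()) {
            // Do something with the replication node; here we
            // simply report its name and type.
            System.out.println(rn.getName() + " is of type " +
                        rn.getType());
        }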
    } catch (UnknownMasterException ume) {
        // Can't find a master
    }   

Locating the Current Master

You can use the ReplicationGroupAdmin class to locate the current Master in the replication group. This information is available using the ReplicationGroupAdmin.getMasterNodeName() and ReplicationGroupAdmin.getMasterSocket() methods.

ReplicationGroupAdmin.getMasterNodeName() returns a string that holds the node name associated with the Master.

ReplicationGroupAdmin.getMasterSocket() returns an InetSocketAddress object that represents the host and port where the Master can currently be found.

Both methods will throw an UnknownMasterException if the helper nodes are not able to identify the current Master.

For example:

import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;

import com.sleepycat.je.rep.UnknownMasterException;
import com.sleepycat.je.rep.util.ReplicationGroupAdmin;

...

    Set<InetSocketAddress> helpers =
        new HashSet<InetSocketAddress>();
    InetSocketAddress helper1 =
        new InetSocketAddress("node1.example.com", 1550);
    InetSocketAddress helper2 =
        new InetSocketAddress("node2.example.com", 1550);

    helpers.add(helper1);
    helpers.add(helper2);

    ReplicationGroupAdmin rga =
        new ReplicationGroupAdmin("test_rep_group", helpers); 

    try {
        InetSocketAddress master = rga.getMasterSocket();
        System.out.println("Master is on host " +
                    master.getHostName() + " at port " +
                    master.getPort());
    } catch (UnknownMasterException ume) {
        // Can't find a master
    }
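A helper may be unable to identify the Master only temporarily, for example while an election is still in progress, so a monitoring utility might prefer to retry the lookup a few times before giving up. The following is a minimal sketch of such a retry loop. The MasterLookupRetry class and its retry() method are hypothetical names introduced here, and the sketch assumes that the exception to retry on is unchecked. It is written against java.util.function.Supplier so that it compiles without the JE library.

```java
import java.util.function.Supplier;

// Hypothetical retry helper for illustration only; not part of the JE API.
final class MasterLookupRetry {

    private MasterLookupRetry() {}

    // Calls lookup up to maxAttempts times, pausing pauseMillis between
    // attempts. Only exceptions of type retryOn trigger a retry; any
    // other exception, or the failure on the final attempt, propagates.
    static <T> T retry(Supplier<T> lookup,
                       Class<? extends RuntimeException> retryOn,
                       int maxAttempts,
                       long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return lookup.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt and stop retrying.
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

With the rga object from the example above, a call might look like MasterLookupRetry.retry(rga::getMasterNodeName, UnknownMasterException.class, 5, 1000L), assuming UnknownMasterException is an unchecked exception in your JE release.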

Adding and Removing Nodes from the Group

To add a node to a replication group, you simply start the node and provide it with at least one helper node that can identify the current Master. After the new node has been populated with a sufficiently current copy of the data contained on the Master, it automatically becomes a member of the replication group.

The node's status as a member of the group is persistent. That is, it is a member of the group regardless of whether it is running, and whether other nodes in the group can reach it over the network. This means that for the purposes of elections and message acknowledgements, the node counts toward the total number of nodes that must respond and/or participate in an event.

If, for example, you are using a durability guarantee that requires all nodes in the replication group to acknowledge a transaction commit on the Master, and if a node is down or otherwise unavailable for some reason, then the commit cannot complete on the Master because it will not receive acknowledgements from all the nodes in the replication group.

Similarly, elections for Masters require a bare majority of nodes to participate in the election. If so many nodes are shut down or unavailable due to a network partition event that a bare majority of nodes cannot be found to hold the election, then your replication group cannot perform any write activity. This situation persists until enough nodes come back online to represent a bare majority of the nodes belonging to the replication group.
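The two availability rules above reduce to simple arithmetic. The following is a minimal sketch of that arithmetic and is not JE code: QuorumMath and its methods are hypothetical names, groupSize is assumed to count only the nodes that take part in elections and acknowledgements, and reachable is assumed to include the Master itself.

```java
// Hypothetical model for illustration only; not part of the JE API.
final class QuorumMath {

    private QuorumMath() {}

    // Smallest number of nodes that forms a bare majority of the group.
    static int electionQuorum(int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be at least 1");
        }
        return groupSize / 2 + 1;
    }

    // An election can be held only if a bare majority is reachable.
    static boolean canElectMaster(int groupSize, int reachable) {
        return reachable >= electionQuorum(groupSize);
    }

    // A commit that requires acknowledgements from every node can
    // complete only if no member of the group is unavailable.
    static boolean canCommitWithAllAcks(int groupSize, int reachable) {
        return reachable >= groupSize;
    }

    public static void main(String[] args) {
        int groupSize = 5;
        for (int reachable = groupSize; reachable >= 1; reachable--) {
            System.out.println(reachable + " of " + groupSize + " reachable:"
                + " election possible=" + canElectMaster(groupSize, reachable)
                + ", all-ack commit possible="
                + canCommitWithAllAcks(groupSize, reachable));
        }
    }
}
```

Under these assumptions, a five-node group stops committing with an all-acknowledgement policy as soon as one node is unavailable, and loses the ability to elect a Master once three nodes are unavailable. Removing a long-absent node lowers groupSize and therefore both thresholds.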

For this reason, if you have a node that you intend to shut down for a long time, then you should remove that node from the replication group. You do this using the ReplicationGroupAdmin.removeMember() method. Note the following rules when using this method:

  • For best results, shut down the node before removing it.

  • You use the node's name (not the host/port pair) to identify the node you want to remove from the group. If the node name that you specify is unknown to the replication group, a MemberNotFoundException is thrown.

  • Once removed, the node can no longer connect to the Master, nor can it participate in elections. If you want to reconnect the node to the Master (that is, you want to add it back to the replication group), you will have to do so using a different node name than the node was using when it was removed from the group.

  • An active Master cannot be removed from the group. To remove the active Master, either shut it down or wait until it transitions to the Replica state. If you attempt to remove an active Master, a MasterStateException is thrown.

For example:

...

    Set<InetSocketAddress> helpers =
        new HashSet<InetSocketAddress>();
    InetSocketAddress helper1 =
        new InetSocketAddress("node1.example.com", 1550);
    InetSocketAddress helper2 =
        new InetSocketAddress("node2.example.com", 1550);

    helpers.add(helper1);
    helpers.add(helper2);

    ReplicationGroupAdmin rga =
        new ReplicationGroupAdmin("test_rep_group", helpers); 

    try {
        rga.removeMember("NODE3");
    } catch (MemberNotFoundException mnfe) {
        // Specified a node name that is not known to the
        // replication group.
    } catch (MasterStateException mse) {
        // Tried to remove an active Master
    }