Building the communications infrastructure

Replication Manager provides a built-in communications infrastructure that uses IPv6 whenever possible but also supports IPv4. When multiple addresses are defined for a site, Replication Manager attempts connections first on any IPv6 addresses and then on any IPv4 addresses, until one succeeds. When one site is using IPv6 and the other is using IPv4, Replication Manager relies on the platform's configuration and defaults to govern the use of IPv4-mapped IPv6 addresses. If additional control over connections is required, a Replication Manager application can use the DB_ENV->repmgr_set_socket() method to specify a socket callback that determines whether a particular socket should be used in a connection attempt.
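The connection order described above (IPv6 addresses first, then IPv4, stopping at the first success) can be pictured with a small model. This is a sketch, not Replication Manager code, and it does not call Berkeley DB: `struct candidate_addr`, `order_for_connect()`, `connect_first()` and `refuse_ipv6()` are hypothetical names, and each connection attempt is simulated by a callback rather than a real socket.

```c
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical description of one configured address of a remote site. */
struct candidate_addr {
    int family;       /* AF_INET6 or AF_INET */
    const char *host; /* printable address, for logging only */
};

/* Copy the IPv6 candidates first and then the IPv4 ones, keeping the
 * configured order within each family; other families are skipped.
 * Returns the number of entries written to out (at most n). */
size_t order_for_connect(const struct candidate_addr *in, size_t n,
                         struct candidate_addr *out)
{
    static const int families[] = { AF_INET6, AF_INET };
    size_t count = 0;

    for (size_t f = 0; f < sizeof families / sizeof families[0]; f++)
        for (size_t i = 0; i < n; i++)
            if (in[i].family == families[f])
                out[count++] = in[i];
    return count;
}

/* Stand-in for one connection attempt; returns 0 on success. */
typedef int (*attempt_fn)(const struct candidate_addr *addr);

/* Try each ordered candidate until one attempt succeeds. Returns the index
 * of the address that connected, or -1 if every attempt failed. */
long connect_first(const struct candidate_addr *ordered, size_t n,
                   attempt_fn attempt)
{
    for (size_t i = 0; i < n; i++)
        if (attempt(&ordered[i]) == 0)
            return (long)i;
    return -1;
}

/* Demo attempt function: simulates a network on which only IPv4 works. */
int refuse_ipv6(const struct candidate_addr *addr)
{
    return addr->family == AF_INET6 ? -1 : 0;
}
```

With one IPv6 and two IPv4 addresses configured and IPv6 unreachable, the IPv6 address is still tried first and the first IPv4 address (index 1 in the ordered list) is the one that connects.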

Base API applications must provide their own communications infrastructure, which is typically written with one or more threads of control looping on one or more communication channels, receiving and sending messages. These threads accept messages from remote environments for the local database environment, and accept messages from the local environment for remote environments. Messages from remote environments are passed to the local database environment using the DB_ENV->rep_process_message() method. Messages from the local environment are passed to the application for transmission using the callback function specified to the DB_ENV->rep_set_transport() method.
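A minimal sketch of the receive side of such a loop follows, assuming messages have already been read off the wire into an in-memory queue. `struct inbound_msg`, `drain_inbound()` and `demo_process()` are hypothetical; `demo_process()` merely stands in for the call to DB_ENV->rep_process_message(), so the example compiles without Berkeley DB. A real application would typically run a loop like this in one or more threads per communication channel.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one received message: the control and rec
 * byte ranges exactly as the sending site's send function produced them. */
struct inbound_msg {
    int from_eid;        /* environment ID of the sending site */
    const void *control; /* control data */
    size_t control_len;
    const void *rec;     /* rec data */
    size_t rec_len;
};

/* Stand-in for handing one message to DB_ENV->rep_process_message(). */
typedef int (*process_fn)(void *ctx, const struct inbound_msg *msg);

/* Hand queued messages to the local environment in arrival order, starting
 * at *next. Stops after the first nonzero return so the caller can act on
 * it (for example, by calling for an election) and then call again to
 * resume. Returns that value, or 0 once the queue is exhausted. */
int drain_inbound(const struct inbound_msg *queue, size_t n, size_t *next,
                  process_fn process, void *ctx)
{
    while (*next < n) {
        int ret = process(ctx, &queue[(*next)++]);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Demo processor: counts calls in *ctx and reports a made-up code (99) for
 * any message that carries no control data. */
int demo_process(void *ctx, const struct inbound_msg *msg)
{
    ++*(int *)ctx;
    return msg->control_len == 0 ? 99 : 0;
}
```

Stopping at the first nonzero return is a design choice of this sketch: it lets the caller react to an informational return before later messages are delivered, then resume from where it left off.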

Processes establish communication channels by calling the DB_ENV->rep_set_transport() method, whether they are running in client or master environments. This method specifies the send function, a callback that Berkeley DB uses to send messages to other database environments in the replication group. The send function takes an environment ID and two opaque data objects. It is the send function's responsibility to transmit the information in the two data objects to the database environment corresponding to the ID; the receiving application then calls the DB_ENV->rep_process_message() method to process the message.
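The following sketch illustrates the send function's contract: route two opaque objects to the site named by an environment ID. It deliberately avoids db.h. `struct blob` is a hypothetical stand-in for a DBT that keeps only a data pointer and size, `struct outbox` is a hypothetical per-site buffer used in place of a socket, and `demo_send()` does not have the real callback's signature, which takes additional arguments (see the DB_ENV->rep_set_transport() reference).

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 256 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a DBT: the fields the transport must preserve. */
struct blob {
    const void *data;
    size_t size;
};

/* Hypothetical per-site outbox; a real application would write to a socket. */
struct outbox {
    int eid;                           /* environment ID this outbox serves */
    unsigned char control[MAX_PAYLOAD];
    size_t control_size;
    unsigned char rec[MAX_PAYLOAD];
    size_t rec_size;
    int pending;                       /* 1 if a message is waiting */
};

/* Send-style function: find the outbox for envid and copy both opaque
 * objects into it. Returns 0 on success and nonzero on failure (the
 * convention this sketch assumes for a send callback). */
int demo_send(struct outbox *sites, size_t nsites, int envid,
              const struct blob *control, const struct blob *rec)
{
    for (size_t i = 0; i < nsites; i++) {
        struct outbox *o = &sites[i];

        if (o->eid != envid)
            continue;
        if (control->size > MAX_PAYLOAD || rec->size > MAX_PAYLOAD ||
            o->pending)
            return -1;                 /* too large, or outbox still busy */
        if (control->size > 0)
            memcpy(o->control, control->data, control->size);
        if (rec->size > 0)
            memcpy(o->rec, rec->data, rec->size);
        o->control_size = control->size;
        o->rec_size = rec->size;
        o->pending = 1;
        return 0;
    }
    return -1;                         /* unknown environment ID */
}
```

The sketch copies the bytes rather than keeping the caller's pointers, on the conservative assumption that the caller's buffers need not outlive the call.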

The details of the transport mechanism are left entirely to the application. The only requirement is that the data buffer and size of each of the control and rec DBTs passed to the send function on the sending site be faithfully copied and delivered to the receiving site, which passes them to DB_ENV->rep_process_message() as the corresponding arguments. Messages that are broadcast, whether by broadcast media or because Berkeley DB called the send function with an envid argument of DB_EID_BROADCAST, should not be processed by the message sender. In all cases, the application's transport media or software must ensure that DB_ENV->rep_process_message() is never called with a message intended for a different database environment, or with a broadcast message sent from the same environment on which DB_ENV->rep_process_message() will be called. The DB_ENV->rep_process_message() method is free-threaded: it is safe to deliver any number of messages simultaneously, from any thread or process in the Berkeley DB environment.
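One way to meet the faithful-copy requirement is to length-prefix both objects in a single frame, so that the receiver can rebuild the exact data and size pairs before calling DB_ENV->rep_process_message(). The frame layout, `frame_encode()`, `frame_decode()`, `should_process()` and the `DEMO_EID_BROADCAST` sentinel are all assumptions of this sketch; the real broadcast value is the DB_EID_BROADCAST constant from db.h, and the sender and target IDs passed to `should_process()` would come from the application's own connection bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EID_BROADCAST (-1) /* hypothetical stand-in for DB_EID_BROADCAST */

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Frame layout assumed here, sizes big-endian:
 *   [control size:4][control bytes][rec size:4][rec bytes]
 * Returns the frame length, or 0 if it does not fit in cap bytes. */
size_t frame_encode(const void *control, uint32_t control_size,
                    const void *rec, uint32_t rec_size,
                    unsigned char *out, size_t cap)
{
    size_t need = 8 + (size_t)control_size + rec_size;

    if (need > cap)
        return 0;
    put_u32(out, control_size);
    if (control_size > 0)
        memcpy(out + 4, control, control_size);
    put_u32(out + 4 + control_size, rec_size);
    if (rec_size > 0)
        memcpy(out + 8 + control_size, rec, rec_size);
    return need;
}

/* Parse a frame without copying; on success the pointers refer into buf.
 * Returns 0 on success, -1 if the frame is truncated or inconsistent. */
int frame_decode(const unsigned char *buf, size_t len,
                 const unsigned char **control, uint32_t *control_size,
                 const unsigned char **rec, uint32_t *rec_size)
{
    if (len < 4)
        return -1;
    *control_size = get_u32(buf);
    if (len - 4 < *control_size || len - 4 - *control_size < 4)
        return -1;
    *control = buf + 4;
    *rec_size = get_u32(buf + 4 + *control_size);
    if (len - 8 - *control_size != *rec_size)
        return -1;
    *rec = buf + 8 + *control_size;
    return 0;
}

/* Should this site hand the message to its processing call? Never for a
 * message addressed to another site, never for its own broadcast. */
int should_process(int local_eid, int sender_eid, int target_eid)
{
    if (target_eid == DEMO_EID_BROADCAST)
        return sender_eid != local_eid;
    return target_eid == local_eid;
}
```

Because `frame_decode()` returns pointers into the received buffer rather than copies, this sketch requires that buffer to stay valid until the processing call returns.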

There are several informational return values from the DB_ENV->rep_process_message() method:

When DB_ENV->rep_process_message() returns DB_REP_DUPMASTER, it means that another database environment in the replication group also believes itself to be the master. The application should complete all active transactions, close all open database handles, reconfigure itself as a client using the DB_ENV->rep_start() method, and then call for an election using the DB_ENV->rep_elect() method.

When DB_ENV->rep_process_message() returns DB_REP_HOLDELECTION, it means that another database environment in the replication group has called for an election. The application should call the DB_ENV->rep_elect() method.

When DB_ENV->rep_process_message() returns DB_REP_IGNORE, it means that this message cannot be processed. This normally indicates that the message is irrelevant to the current replication state, such as a message from an old master that arrived late.

When DB_ENV->rep_process_message() returns DB_REP_ISPERM, it means that a permanent record, perhaps a message previously returned as DB_REP_NOTPERM, was successfully written to disk. This record may have filled a gap in the log that allowed additional records to be written. The ret_lsnp argument contains the maximum LSN of the permanent records written.

When DB_ENV->rep_process_message() returns DB_REP_NEWSITE, it means that a message from a previously unknown member of the replication group has been received. The application should reconfigure itself as necessary so that it can send messages to this site.

When DB_ENV->rep_process_message() returns DB_REP_NOTPERM, it means that a message marked as DB_REP_PERMANENT was processed successfully but was not written to disk. This normally indicates that one or more messages that should have arrived before this message have not yet arrived. This operation will be written to disk when the missing messages arrive. The ret_lsnp argument contains the LSN of this record. The application should take whatever action it deems necessary to retain its recoverability characteristics.
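The guidance above amounts to a dispatch on the return value. The sketch below models that dispatch without Berkeley DB: the `DEMO_REP_*` values and `struct demo_lsn` are hypothetical stand-ins for the DB_REP_* constants and DB_LSN, and the `ACT_*` values only name what the application would do next; the code does not perform those actions. It also shows one way to keep the highest permanent LSN reported with DB_REP_ISPERM, comparing LSNs by log file number and then by offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the informational returns; the real DB_REP_*
 * constants come from db.h and have different values. */
enum demo_rep_ret {
    DEMO_OK = 0,
    DEMO_REP_DUPMASTER, DEMO_REP_HOLDELECTION, DEMO_REP_IGNORE,
    DEMO_REP_ISPERM, DEMO_REP_NEWSITE, DEMO_REP_NOTPERM
};

/* What the application decides to do next. */
enum demo_action {
    ACT_NONE,
    ACT_DEMOTE_AND_ELECT, /* finish txns, close handles, rep_start as client, rep_elect */
    ACT_ELECT,            /* call rep_elect */
    ACT_ADD_SITE,         /* make the new site reachable for sends */
    ACT_WAIT_FOR_PERM     /* record not yet on disk; apply the app's policy */
};

/* Hypothetical stand-in for DB_LSN: log file number and offset within it. */
struct demo_lsn {
    uint32_t file;
    uint32_t offset;
};

static int lsn_less(const struct demo_lsn *a, const struct demo_lsn *b)
{
    return a->file < b->file || (a->file == b->file && a->offset < b->offset);
}

/* Map one return value to an action. For ISPERM, ret_lsn holds the maximum
 * permanent LSN written, so *max_perm advances to it when it is larger.
 * Zero and any other value map to ACT_NONE here; a real application must
 * handle genuine errors separately. */
enum demo_action handle_rep_return(int ret, const struct demo_lsn *ret_lsn,
                                   struct demo_lsn *max_perm)
{
    switch (ret) {
    case DEMO_REP_DUPMASTER:
        return ACT_DEMOTE_AND_ELECT;
    case DEMO_REP_HOLDELECTION:
        return ACT_ELECT;
    case DEMO_REP_NEWSITE:
        return ACT_ADD_SITE;
    case DEMO_REP_NOTPERM:
        return ACT_WAIT_FOR_PERM;
    case DEMO_REP_ISPERM:
        if (lsn_less(max_perm, ret_lsn))
            *max_perm = *ret_lsn;
        return ACT_NONE;
    case DEMO_REP_IGNORE: /* stale or irrelevant message: drop it */
    default:
        return ACT_NONE;
    }
}
```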