Chapter 7. Administration

Table of Contents

Hardware
Time Synchronization
Node Configuration
Running Backups
Adding and Removing Nodes
Upgrading a JE Replication Group
Upgrade Process
Things To Remember While Upgrading
Handling Problems While Upgrading
Resetting a Replication Group

This chapter describes issues pertaining to running a JE replication application. The topics discussed here concern hardware configuration, backups, node configuration, and other management issues that arise once the application has been placed into production.

Hardware

A JE replicated application should run well on typical commodity multi-core hardware, although the architecture of your particular application may impose greater requirements. Check with the software developers who wrote your JE replicated application for any requirements they may have over and above typical multi-core hardware.

That said, keep the following in mind when putting a JE replication application into production:

  • Examine the hardware you intend to use, and review it for common points of failure between nodes in the replication group, such as shared power supplies, routers, and so forth.

  • The hardware that you use does not have to be identical across the entire replication group. However, it is important to ensure that the least capable electable node has the resources to function as the Master.

    The Master is typically the node where demand for machine resources is the greatest. It needs to supply the replication streams for each active Replica, in addition to servicing the transaction load.

    Note that Monitor nodes place only minimal resource demands on JE (although, again, your application developers may have written your Monitor nodes such that they need resources over and above what JE requires), because Monitor nodes only listen for changes in the replication group.

  • Finally, your network is a critical part of your hardware requirements. Your network must be capable of delivering adequate throughput under peak expected production workloads.

    Remember that your replicated application can consume quite a lot of network resources when a Replica starts up for the first time, or starts up after having been shut down for a long time. This is because the Replica must obtain all the data that it needs to operate. Essentially, this is a duplicate of the data contained by the Master node. So however much data the Master node holds, that much data is transmitted across your network for each new node that you start.

    For restarting nodes, the amount of data that crosses your network is equal to the delta between the Replica's state when it last shut down and the Master's state at the time the Replica starts up again. If the Replica has been down for a long time (days or weeks), this can be quite a lot of data, depending on your Master node's workload.

    Be aware, however, that restarting nodes do not have to get their data from the Master node. It is possible for them to catch up, or nearly catch up, using data obtained from some other currently running Replica. See Restoring Log Files for more information, and the restart sketch following this list.

    Good application performance also depends on the latency of the network connections used by electable and monitor nodes to perform elections, report election results, and obtain acknowledgments. Because secondary nodes do not participate in elections and do not supply acknowledgments, you can consider deploying them on machines with higher latency connections to the other members of the replication group, keeping in mind that these nodes still have the same throughput requirements as electable nodes. A configuration sketch for a secondary node appears at the end of this section.
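
The following sketch illustrates the restart behavior described above: a node that has fallen too far behind copies the log files it is missing from another member of the group rather than replaying the entire replication stream. It follows the usual JE HA pattern of retrying the environment open after a network restore; the class and method names (RestartNode, openEnvironment) are placeholders rather than part of any JE API.

    import java.io.File;

    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.rep.InsufficientLogException;
    import com.sleepycat.je.rep.NetworkRestore;
    import com.sleepycat.je.rep.NetworkRestoreConfig;
    import com.sleepycat.je.rep.ReplicatedEnvironment;
    import com.sleepycat.je.rep.ReplicationConfig;

    public class RestartNode {

        /*
         * Open a replicated environment. If this node has been down so
         * long that it can no longer catch up from the replication
         * stream, copy the missing log files over the network and retry.
         */
        static ReplicatedEnvironment openEnvironment(File envHome,
                                                     ReplicationConfig repConfig,
                                                     EnvironmentConfig envConfig) {
            try {
                return new ReplicatedEnvironment(envHome, repConfig, envConfig);
            } catch (InsufficientLogException ile) {
                // The node is too far behind the group. Obtain the missing
                // log files from another member (not necessarily the
                // Master) and then open the environment again.
                NetworkRestore restore = new NetworkRestore();
                NetworkRestoreConfig restoreConfig = new NetworkRestoreConfig();
                restoreConfig.setRetainLogFiles(false); // discard obsolete local files
                restore.execute(ile, restoreConfig);
                return new ReplicatedEnvironment(envHome, repConfig, envConfig);
            }
        }
    }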
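
If you do place a node on a higher latency connection, designating it as a secondary keeps it out of elections and acknowledgments. The fragment below is a minimal sketch of such a configuration; the group name, node name, and host:port values are placeholders for your own deployment.

    import com.sleepycat.je.rep.NodeType;
    import com.sleepycat.je.rep.ReplicationConfig;

    // Placeholder group name, node name, and host:port values.
    ReplicationConfig repConfig =
        new ReplicationConfig("PlanetaryDefense",         // group name
                              "RemoteObserver",           // this node's name
                              "remote.example.com:5001"); // this node's host:port

    // Other members of the group that this node can contact at startup.
    repConfig.setHelperHosts("mercury.example.com:5001,venus.example.com:5001");

    // Secondary nodes receive the replication stream but do not vote in
    // elections or supply commit acknowledgments.
    repConfig.setNodeType(NodeType.SECONDARY);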