Sun OpenSSO Enterprise 8.0 Deployment Planning Guide

Analyzing the Deployment Architecture

Using the OpenSSO Enterprise embedded configuration data store can lower response time and help ensure service availability when a machine failure occurs. You can deploy multiple OpenSSO Enterprise instances to serve as a single system, and their corresponding embedded configuration data store instances are automatically configured in data replication mode. Each embedded configuration data store instance in the system contains the same set of data, and any update made on one instance is replayed on all the other instances in the system. To keep the architecture as simple as possible, the embedded configuration data store replication model uses a multi-master (peer-to-peer) network structure.

Single-Server and Multiple-Servers Modes

The following figure illustrates OpenSSO Enterprise deployed with the embedded configuration data store in single-server mode.

Figure 15–1 Single-Server Mode

Embedded OpenDS configuration data store in single-server mode.

In multiple-servers mode, each OpenSSO Enterprise instance works with its own embedded configuration data store instance, which runs in the same memory space within the web container. The embedded configuration data store replication mechanism uses a custom replication protocol to maintain data consistency among the directory server instances.

The following figure illustrates OpenSSO Enterprise deployed with the embedded configuration data store in multiple-servers mode.

Figure 15–2 Multiple-Servers Mode

OpenSSO Enterprise deployed with the embedded configuration data store in multiple-servers mode.

Replication Structure

Replication is handled entirely by OpenSSO Enterprise. The OpenSSO Enterprise embedded configuration data store replication model supports a multi-master network architecture. The embedded configuration data store separates the actual data from the replication metadata. In this model, the server that stores the configuration data is called the directory server, and the server that stores the replication metadata is called the replication server.

Even the smallest deployment must include two replication server instances, to ensure availability if one of the replication server instances fails. Replication servers perform the following functions:

Each replication server contains a list of all the other replication servers in the replication topology. Replication servers are also responsible for providing other servers with information about the replication topology.

Directory servers perform the following functions:

Each directory server contains a list of the suffix DNs to be synchronized. For each suffix DN to be synchronized, each directory server contains a list of replication servers to connect to. When a change is made on a directory server, that directory server forwards the change to the local replication server. The replication server then relays the change to other replication servers in the topology, which in turn relay the change to all other directory servers in the topology.

Applications should typically perform reads and writes on the same directory server instance. This reduces the likelihood of consistency problems due to replication.

Every replication server instance maintains a message queue that stores pending changes. When one of the directory servers is down, changes applied to the other servers are stored in the message queue of the replication server instance that receives the requests. Once the directory server instance is back online, the replication servers relay all the queued changes to restore data consistency. However, both the size of the message queue and the purge delay are limited. By default, the message queue holds 10000 changes and the purge delay is 24 hours. If one of the servers is down for longer than the purge delay, or if the number of changes applied to a particular directory server exceeds the size of the message queue, the replication system will lose synchronization.

You can change the purge delay and the size of the message queue by adding the ds-cfg-replication-purge-delay and ds-cfg-queue-size attributes to the config.ldif file. The config.ldif file is located in the OpenSSO base directory/opends/config directory. The value of ds-cfg-replication-purge-delay is specified in seconds, and the value of ds-cfg-queue-size is an integer number of changes. Once an embedded configuration data store instance loses synchronization, the only way to bring the system back into synchronization is to reconfigure OpenSSO Enterprise with the embedded configuration data store.
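
For example, the attributes might appear in config.ldif as shown in the following sketch. The entry DN shown here is an assumption based on a typical OpenDS configuration layout; locate the actual replication server entry in your own config.ldif file, and note that the values shown simply restate the defaults (24 hours and 10000 changes).

   # Hypothetical excerpt from OpenSSO base directory/opends/config/config.ldif.
   # The entry DN is an assumption; edit the replication server entry found in your own file.
   dn: cn=Replication Server,cn=Multimaster Synchronization,cn=Synchronization Providers,cn=config
   # Purge delay in seconds (86400 seconds = 24 hours, the default).
   ds-cfg-replication-purge-delay: 86400
   # Maximum number of pending changes held in the message queue (default 10000).
   ds-cfg-queue-size: 10000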

To determine whether the embedded configuration data store instances are synchronized, the OpenSSO Enterprise command-line tool ssoadm provides the embedded-status subcommand, which checks the status of the embedded configuration data store instances. See Chapter 1, ssoadm Command Line Interface Reference, in Sun OpenSSO Enterprise 8.0 Administration Reference. Alternatively, you can check the embedded configuration data store logs when you suspect configuration data store inconsistencies. The logs are located in the OpenSSO base directory/opends/logs directory. The current OpenSSO Enterprise embedded configuration data store replication implementation is recommended for server instances located within the same geographical region.
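
For example, a status check with ssoadm might look like the following sketch. The option names and the port value are assumptions based on common ssoadm usage; see the ssoadm Command Line Interface Reference for the exact syntax of the embedded-status subcommand.

   # Hypothetical invocation; verify the exact embedded-status options in the
   # ssoadm Command Line Interface Reference before running.
   ./ssoadm embedded-status --adminid amadmin --password-file /tmp/pwd.txt --port 50389

   # When inconsistencies are suspected, review the embedded data store logs.
   # Replace <OpenSSO base directory> with the base directory of your deployment.
   ls <OpenSSO base directory>/opends/logs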

Summary of Actual Replication Test Results

Replication tests were run using up to four instances of the OpenSSO Enterprise embedded configuration data store, deployed on the Tomcat and GlassFish web containers. The results show that replication was successful among the four instances with 8000 policies. The following is a summary of the test results: