Administration and Deployment Guide


Failover and System Reliability

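This section describes the features of AquaLogic Enterprise Security (ALES) that support recovery from failure. It covers the following topics:

  • Understanding Failover
  • Failover Considerations for the Database Server
  • Failover Considerations for a Security Service Module
  • Failover Considerations for a Service Control Manager
  • Setting up Administration Servers for Failover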

 


Understanding Failover

In general, failover is the ability of a product to detect the failure of a particular component and switch to a working replica of that component without losing functionality. ALES supports two failover scenarios, runtime availability for SSMs and administrative availability, each described below.

Assuring Runtime Availability for SSMs

ALES security providers depend on data stores for authentication, authorization, and credential mapping. You can configure ALES for failover in these three important cases:

The ALES SSMs have no runtime dependency on the Administration Server.

Figure 7-1 Runtime Availability


Assuring Administrative Availability

You can provide failover capability for ALES administration functions by installing redundant Administration Servers: a primary and a secondary. The secondary Administration Server is used only when the primary becomes unavailable.

For example, consider the global deployment illustrated in Figure 7-2. In this case, the enterprise has applications staged on servers in New York, Tokyo and London. The enterprise has deployed redundant ALES Administration Servers in its New York and London data centers as well as a replicated database to store ALES policies and entitlements information. Under normal conditions, administrators interact with the primary Administration Server in New York only. When policies are updated, the Administration Server pushes the changes to all the SSMs in the global environment.

Figure 7-2 Administrative Availability (Working Normally)


Now consider the case when the data center in New York goes down, illustrated in Figure 7-3. The SSMs detect that the primary Administration Server is down and connect to the secondary Administration Server. The secondary Administration Server detects that the primary database is down and connects to the secondary database server (the replica).

Figure 7-3 Administrative Availability (After Failure)


One benefit of the ALES architecture is that even if all the Administration Servers go down (for maintenance or because of a failure), including the secondary Administration Servers, there is no impact on the applications in production or on the security services provided by the Security Service Modules and providers that you have configured. However, because you can enroll Security Service Modules only with a primary Administration Server, you cannot install or enroll new Security Service Modules until the primary Administration Server is running again or you have reconfigured the secondary server as the primary. Whenever the primary database is available, both the primary and secondary Administration Servers use it.

For information on how to configure the Administration Server for failover, see Setting up Administration Servers for Failover.

 


Failover Considerations for the Database Server

Figure 7-3 shows the logical view of failover when the primary database server fails. The number of redundant database servers you configure can vary; however, a minimum of two is recommended to maintain reliable services. It is up to the system administrator to set up database failover and configure data replication between the database instances.

Because the database server contains all of the configuration and security data used by the Administration Application, it must be highly available and reliable to protect your applications and resources. You can accomplish this by implementing the recommendations of your database manufacturer (for example, a clustering architecture or a hot standby).

There are two approaches for making sure that two instances of the ALES database contain the same data:

Figure 7-4 illustrates the failover mechanism for the ALES Administration Server using Oracle RAC.

Figure 7-4 ALES Administration Servers using Oracle RAC


Common methods of achieving high availability include periodic backups, fault-tolerant disks, and copying files manually whenever they are changed. This is also the case for any optional external data sources you have configured. A database backup can be used for database recovery in the case of disk failure.

 


Failover Considerations for a Security Service Module

You can use the Administration Console to configure failover support for database-related providers and LDAP authentication providers. For database-related providers, you specify a secondary database; for LDAP authenticators, you specify a secondary LDAP server.

The following providers support configuration of a secondary database:

The ASI Authorization Provider contacts an external process to evaluate its authorization queries. If that process dies, the ASI Authorization Provider denies access to all resources. The ASI Authorization Provider can be configured to contact the Administration database to retrieve subject attributes and group membership for use in authorization and delegation decisions. If the database connection fails, the provider connects to the configured secondary database. The provider tries to reconnect to the failed database after a configurable time-out. If all database connections fail and the defined policies operate on user attributes and group membership, all access is denied.
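
The database failover behavior described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the provider's actual implementation: the class name, the `connect` callable, the `retry_after_s` parameter, and the `evaluate` callback are all hypothetical names introduced for the example.

```python
import time


class FailoverConnector:
    """Illustrative primary/secondary selection with a retry time-out."""

    def __init__(self, connect, primary, secondary, retry_after_s,
                 clock=time.monotonic):
        # connect is a hypothetical callable: target -> connection,
        # raising ConnectionError when the target is unreachable.
        self._connect = connect
        self._primary = primary
        self._secondary = secondary
        self._retry_after_s = retry_after_s
        self._clock = clock
        self._primary_failed_at = None  # None: primary believed healthy

    def get_connection(self):
        """Return a connection, or None when every database is unreachable."""
        now = self._clock()
        primary_due = (
            self._primary_failed_at is None
            or now - self._primary_failed_at >= self._retry_after_s
        )
        if primary_due:
            try:
                conn = self._connect(self._primary)
                self._primary_failed_at = None
                return conn
            except ConnectionError:
                # Remember the failure so the primary is retried only
                # after the configured time-out has elapsed.
                self._primary_failed_at = now
        try:
            return self._connect(self._secondary)
        except ConnectionError:
            return None


def is_access_allowed(connector, policy_needs_attributes, evaluate):
    """Deny when the policy needs user attributes and no database answers."""
    conn = connector.get_connection()
    if conn is None and policy_needs_attributes:
        return False
    return evaluate(conn)
```

The sketch fails closed: when neither database answers and the policy depends on user attributes or group membership, the result is a denial, which mirrors the behavior described for the provider.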

The following providers support configuration of a secondary LDAP server:

The NT Authenticator already supports multiple domain controllers. The WebLogic Authenticator, WebLogic Authorizer, and WebLogic Role Mapper use the internal LDAP server of WebLogic Server as their data store, so no support for a redundant source is required.

 


Failover Considerations for a Service Control Manager

When the Service Control Manager (SCM) server starts, it contacts an Administration Server to make sure that it is using the latest version of the configuration data. When the SCM receives configuration data, it caches the data locally. When configuration data is modified, the Administration Server pushes the updates to the SCM. Failover for the SCM server is implemented as follows:

  1. You can configure the SCM with addresses for primary and secondary Administration Servers. During installation, you can provide the address for a secondary Administration Server. After installation, you can set the addresses in the SCM_HOME/config/SCM.properties file. The domain.asi.primary.pdurl property points to the primary Administration Server and the domain.asi.secondary.pdurl property points to the secondary Administration Server. By default, if you did not provide secondary Administration Server information during installation, both of these properties point to the Administration Server that was current when the SCM was installed.
  2. If no Administration Server is available, the SCM continues to operate using the previously cached set of policies and configuration data. If the SCM is starting for the first time or does not have a cache, it stays up and continues looking for an Administration Server to connect to. Once a primary or secondary Administration Server is available, the SCM gets its configuration data and caches it.
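
As a rough illustration of the two steps above, the following Python sketch reads the two property names from text in SCM.properties style and tries the servers in order, falling back to a cached copy. It assumes a simplified `key=value` format (real Java properties files also support escapes and line continuations), and the `fetch` callable, the `cached` argument, and all function names are hypothetical; the SCM's real logic is not shown in this guide.

```python
def parse_properties(text):
    """Parse simple key=value lines; '#' and '!' start comment lines.

    This is a simplification of the Java .properties format.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def admin_server_candidates(props):
    """Return the primary URL, then the secondary, without blanks or repeats."""
    urls = []
    for key in ("domain.asi.primary.pdurl", "domain.asi.secondary.pdurl"):
        url = props.get(key, "")
        if url and url not in urls:
            urls.append(url)
    return urls


def load_configuration(candidates, fetch, cached=None):
    """Try each Administration Server in order.

    fetch is a hypothetical callable that returns configuration data or
    raises OSError. If no server answers, the cached copy is returned;
    None means there is no cache yet and the caller should keep looking.
    """
    for url in candidates:
        try:
            return fetch(url)
        except OSError:
            continue
    return cached
```

When both properties point to the same server, as in the default case described in step 1, `admin_server_candidates` returns a single URL, so the sketch does not try the same address twice.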

 


Setting up Administration Servers for Failover

You can install two Administration Servers: a primary and a secondary. The secondary Administration Server is used for failover when the primary becomes unavailable. The order in which the Administration Servers are installed is not important; using the Administration Console, you designate which one is the primary and which one is the secondary. See Configure the Secondary Server Trust Synchronization Mechanism.

When an Administration Server is installed, a set of unique certificates is generated for it. Common trusted certificates enable SSMs and SCMs to connect by 2-way SSL. To enable failover, the trust stores of the primary and secondary Administration Servers must be synchronized initially and then kept synchronized as additional SSMs and SCMs are enrolled. The following sections describe how to set up and configure Administration Server trust synchronization.

Installing the Secondary Administration Server

The secondary Administration Server must be set up in the same manner as the primary Administration Server. It should be installed on a separate machine from the one on which the primary Administration Server has been installed.

  1. Install the servlet container that will host the Admin web application (WebLogic Server 8.1 or Apache Tomcat).
  2. Run the admin installation program to install the secondary Administration Server.
  3. Enter the following information:
    1. When prompted for the Enterprise Domain, make sure to enter the same domain name that you entered during the primary installation (default is asi).
    2. When prompted for the Secondary Server URL, leave this blank.
    3. When prompted for the Database Configuration, make sure to use the exact same database username that you specified during the primary installation.
    4. The database passwords used by the primary and secondary Administration Servers should be identical to each other.
    5. It is recommended that you use the same passwords you used to install the primary Administration Server; however, you can use instance specific passwords to protect the various sensitive artifacts on the Administration Servers. For example, you may use different key passwords for the CA, Admin, SSM, and SCM identities that you entered in the primary Administration Server installation. The same applies to the Identity, Peer, and Trust key store passwords.
    6. Note: Do not install the database schema at the conclusion of the secondary Administration Server installation process.
  4. Follow the steps described in Initialize the Secondary Server Trust Stores.
  5. Start the secondary Administration Server just as you normally would start a primary Administration Server.
  6. Follow the steps described in Configure the Secondary Server Trust Synchronization Mechanism.

Initialize the Secondary Server Trust Stores

Before starting the secondary Administration Server, you must synchronize the various trust stores used by the secondary Administration Server with those of the primary. If this is not done, the secondary Administration Server will not trust the SSMs and SCMs currently enrolled with the primary Administration Server, and as a result, there can be problems during failover.

To initialize the secondary Administration Server trust stores:

  1. On the secondary Administration Server, create the following directories:
    • ALES_HOME/primary-admin-ssl
    • ALES_HOME/primary-scm-ssl
  2. Copy the /ssl directory from the primary Administration Server to the secondary Administration Server machine into the ALES_HOME/primary-admin-ssl directory.
  3. Copy the /ssl directory from the primary Service Control Manager to the secondary Administration Server machine into the ALES_HOME/primary-scm-ssl directory.
  4. From the /bin directory of the secondary Administration Server installation, execute the initialize_backup_trust.bat (on Windows platforms) or initialize_backup_trust.sh (on UNIX platforms) command. When prompted for the primary Service Control Manager SSL directory, enter the path to the ALES_HOME/primary-scm-ssl directory. Likewise, when prompted for the primary Admin SSL directory, enter the path to the ALES_HOME/primary-admin-ssl directory.
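
Steps 1 through 3 can be scripted once the two /ssl directories from the primary are reachable from the secondary machine (for example, on a mounted share or after a manual transfer). The Python sketch below is an assumption-laden illustration: the function name and arguments are hypothetical, and it assumes that the contents of each /ssl directory should land directly inside the corresponding target directory. It does not run initialize_backup_trust; that command is still executed by hand as described in step 4.

```python
import shutil
from pathlib import Path


def stage_primary_ssl(ales_home, primary_admin_ssl, primary_scm_ssl):
    """Copy the primary Admin and SCM ssl directories under ALES_HOME.

    Assumption: the files inside each source directory are placed directly
    in ALES_HOME/primary-admin-ssl and ALES_HOME/primary-scm-ssl.
    """
    ales_home = Path(ales_home)
    targets = {
        ales_home / "primary-admin-ssl": Path(primary_admin_ssl),
        ales_home / "primary-scm-ssl": Path(primary_scm_ssl),
    }
    for target, source in targets.items():
        if not source.is_dir():
            raise FileNotFoundError(f"missing source directory: {source}")
        # dirs_exist_ok (Python 3.8+) allows copying into a directory
        # that was already created in step 1.
        shutil.copytree(source, target, dirs_exist_ok=True)
    return sorted(str(t) for t in targets)
```

The returned paths are the two directories to enter when initialize_backup_trust prompts for the primary Admin and Service Control Manager SSL directories.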

Configure the Secondary Server Trust Synchronization Mechanism

Even though the secondary Administration Server trust stores are synchronized with those of the primary Administration Server when you complete the procedure described in Initialize the Secondary Server Trust Stores, they can become out of sync over time. This happens when a new SSM or SCM is enrolled with the primary Administration Server. The trust stores of the primary Administration Server are updated with the new SSM or SCM certificate during enrollment, but because enrollment happens only with the primary Administration Server, the secondary Administration Server trust stores do not receive the new certificates. A similar situation occurs when an SSM or SCM is un-enrolled.

To prevent the trust stores from becoming unsynchronized, the Administration Server has a trust synchronization mechanism that should be enabled on the secondary Administration Server. The trust synchronization mechanism on the secondary Administration Server periodically polls the primary Administration Server for any updates to its trust store, and if a change has occurred, the mechanism updates the secondary Administration Server's trust store with the contents of the primary. It is very important that you enable the trust synchronization mechanism only on the secondary Administration Server.
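
The polling behavior described above amounts to periodically replacing the secondary's view of trusted entities with the primary's. The following Python sketch models one polling cycle with plain sets; the function names and the `fetch_primary_entries` callable are hypothetical, and the real mechanism works on key stores rather than on sets of names.

```python
def sync_trust_store(primary_entries, secondary_entries):
    """Make the secondary mirror the primary and report the differences."""
    primary = set(primary_entries)
    secondary = set(secondary_entries)
    added = primary - secondary    # entities enrolled since the last poll
    removed = secondary - primary  # entities un-enrolled since the last poll
    return primary, added, removed


def poll_once(fetch_primary_entries, secondary_entries):
    """Run one polling cycle.

    fetch_primary_entries is a hypothetical callable that returns the
    primary's current trusted entities. Returns the updated secondary
    set and a flag that is True when anything changed.
    """
    primary = fetch_primary_entries()
    updated, added, removed = sync_trust_store(primary, secondary_entries)
    return updated, bool(added or removed)
```

In a real deployment, a cycle like `poll_once` would run once per synchronization interval. Because the secondary always copies from the primary, enabling the same loop on the primary would have nothing authoritative to copy from, which is one way to read the warning about enabling the mechanism only on the secondary.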

To configure the secondary Administration Server for trust synchronization:

  1. In the Administration Console, click on Administration Console at the top of the navigation tree and then select the Set Console Preferences page.
  2. Click the Failover tab.
  3. On the Failover tab (Figure 7-5), configure this Administration Server as either a primary or a secondary (backup) Administration Server. For a secondary server, you must specify the parameters that permit it to locate the primary server and periodically request a list of trusted entities. This mechanism keeps the trust stores of the primary and secondary Administration Servers synchronized. For a primary server, you only need to ensure that the Primary option is selected.

    Figure 7-5 Configuring a Backup Admin Server in the Administration Console

  4. Select Backup.
  5. In the Primary URL text box, enter the URL of the primary Administration Server. This URL is used to synchronize the trust relationship and is the same URL used to access the Administration Console on the primary Administration Server.
  6. In the Username text box, enter the admin username (default is "system").
  7. In the Enter Password and Confirm Password text boxes, enter the password for the admin user.
  8. In the Synchronization interval text box, enter the number of seconds between trust synchronization attempts. The appropriate value depends on how frequently SSM or SCM instances are enrolled with and un-enrolled from the primary Administration Server in your environment.
  9. Click Apply.
