11 Configuring a Failover Processing Engine System

This chapter describes the procedure for configuring a failover Processing Engine system that will immediately take over processing of network traffic in the event that the primary Processing Engine system becomes unavailable. Note that the described procedure assumes that the primary Processing Engine system has been installed, configured, and is fully operational.

The procedure to configure failover Reporter and Collector systems is described in Section 9, "Configuring a Failover Reporter System" and Section 10, "Configuring a Failover Collector System".

11.1 Introduction to Failover Processing Engine Systems

The configuration of a secondary (or failover) Processing Engine system offers the advantage that it can seamlessly take over processing of monitored traffic in the event that the primary Processing Engine system becomes unavailable. In this way, a high level of operational reliability is achieved. The configuration of a failover Processing Engine system is shown in Figure 11-1.

Figure 11-1 Failover Processing Engine Configuration


At server level, a crossover cable connects the primary and secondary Processing Engine systems. As long as a regular "heartbeat" continues between the primary and secondary servers, the secondary server will not initiate processing of traffic. However, the secondary server will immediately take over the processing task of the primary server as soon as it detects an alteration in the "heartbeat" of the primary server. This process is referred to as failover.
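The takeover decision described above can be pictured with a minimal sketch. This is not the logic of the script shipped with RUEI, whose internals are not shown in this chapter. It only illustrates the idea that the secondary server acts when the primary stops answering. The function name peer_is_down and the PING_CMD override are hypothetical, and the sketch assumes the heartbeat is a simple ping over the crossover link.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual RUEI failover script.
# PING_CMD is a hypothetical override so the check can be tried without a network.
PING_CMD=${PING_CMD:-"ping -c 1 -W 2"}

# Succeeds (exit status 0) when the peer does NOT answer a single probe,
# which is the condition under which a secondary server would take over.
peer_is_down() {
    peer_ip=$1
    if $PING_CMD "$peer_ip" >/dev/null 2>&1; then
        return 1    # heartbeat answered: peer is up, do nothing
    fi
    return 0        # no answer: peer is considered unavailable
}
```

On the secondary server, a call such as peer_is_down 192.168.56.201 would succeed only when the primary's crossover address stops responding.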

Note that failback (that is, the process of restoring the RUEI installation to its original state) must be performed manually. The procedure is described in Section 11.5, "Instigating Processing Engine Failback".

Prerequisites

In order to configure a failover Processing Engine installation, the following conditions must be met:

  • The primary and secondary Processing Engine systems must be directly connected via a crossover cable. In addition, both systems must also be connected to a local or public network in order to connect to the Reporter, remote Collector, and database systems.

  • The database and Collector instances used by the RUEI installation must both be remote.

  • The primary and secondary Processing Engine systems must share the same storage (such as SAN or NFS). In particular, the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories must reside on this shared storage.

11.2 Preparing the Primary Processing Engine

Make the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories available on a shared storage location.

  1. Stop all processing on the primary Processing Engine system by issuing the following command as the RUEI_USER user:

    project -stop
    
  2. Mount the shared Processing Engine location on the primary Processing Engine system. To do so, edit the /etc/fstab file so that it is mounted at boot. For example:

    10.6.5.9:/home/nfs /processing_share nfs rsize=1024,wsize=1024  0 0
    
  3. Move the existing data and sslkeys directories to the shared Processing Engine location. For example:

    mv RUEI_DATA/processor/data /processing_share
    mv RUEI_DATA/processor/sslkeys /processing_share
    

    where processing_share specifies the shared location for data and SSL keys on the primary and secondary Processing Engine systems.
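Before moving the directories, it can be useful to confirm that the share is actually mounted. The following is a minimal sketch, assuming the shared storage is NFS as in the /etc/fstab example above (a SAN volume would show a different filesystem type). The function name is_nfs_mounted is hypothetical, and the optional second argument exists only so that the check can be tried against a sample file instead of /proc/mounts.

```shell
#!/bin/sh
# Succeeds when mount_point is listed in the mounts table with a filesystem
# type beginning with "nfs" (for example nfs or nfs4).
# Assumes the /proc/mounts layout: device, mount point, type, options, ...
is_nfs_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, is_nfs_mounted /processing_share || echo "share is not mounted" reports a missing mount before any data is moved.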

11.3 Installing the Secondary Processing Engine

The installation procedure for a secondary Processing Engine system is almost identical to that of a standalone Processing Engine system. Note that the Initial Setup Wizard should not be run. Do the following:

  1. When starting the installation procedure for the secondary Processing Engine system, ensure that the /etc/ruei.conf file is identical to that of the primary Processing Engine system.

  2. Install the Linux operating system and Processing Engine software on the secondary Processing Engine system. The procedure to do this is described in Chapter 2, "Installing the RUEI Software". Specifically:

11.4 Configuring Processing Engine Failover

Do the following:

  1. If you have not already done so, log in to the primary Processing Engine system as the RUEI_USER user, and issue the following command to stop all processing of monitored traffic:

    project -stop 
    
  2. Copy the .ssh directory of the RUEI_USER user on the primary Processing Engine system, created while performing the procedure described in Section 2.13, "Configuring Reporter Communication (Split-Server Setup Only)", to the secondary Processing Engine system. Note that it must be copied to the same location.

  3. Ensure that the uid and gid settings of the RUEI_USER user are the same on both the primary and secondary Processing Engine systems. For example:

    id moniforce
    uid=501(moniforce) gid=502(moniforce) groups=502(moniforce)
    
  4. Configure the static IP addresses on both Processing Engine systems used for the crossover cable. This can be done using a utility such as system-config-network.

  5. Edit the /etc/fstab file so the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories are mounted at boot. For example:

    10.6.5.9:/home/nfs /reporter_share nfs rsize=1024,wsize=1024  0 0
    

    where reporter_share specifies the shared location for data and SSL keys on the primary and secondary Processing Engine systems.

  6. Replace the local data and sslkeys directories on the secondary Processing Engine system with symbolic links to the shared Processing Engine location by issuing the following commands:

    rm -rf RUEI_DATA/processor/data
    rm -rf RUEI_DATA/processor/sslkeys
    ln -s /reporter_share/data RUEI_DATA/processor/data 
    ln -s /reporter_share/sslkeys RUEI_DATA/processor/sslkeys 
    
  7. Log in to the secondary Processing Engine system as the RUEI_USER user, and issue the following command:

    project -new -fromdb UX 
    

    This creates the secondary Processing Engine's on-disk configuration files using the primary Processing Engine's database configuration.

  8. Edit the /etc/ruei.conf file on both the primary and secondary Processing Engines to specify the virtual, primary, and standby IP addresses. For example:

    export RUEI_REP_FAILOVER_PRIMARY_IP=192.168.56.201 
    export RUEI_REP_FAILOVER_STANDBY_IP=192.168.56.202 
    export RUEI_REP_FAILOVER_VIRTUAL_IP=10.11.12.23 
    export RUEI_REP_FAILOVER_VIRTUAL_DEV=eth0 
    export RUEI_REP_FAILOVER_VIRTUAL_MASK=255.255.255.0 
    

    The RUEI_REP_FAILOVER_PRIMARY_IP and RUEI_REP_FAILOVER_STANDBY_IP settings should specify the IP addresses of the crossover cable between the two Processing Engine systems. See Section 2.4.1, "Check The RUEI Configuration File" for an explanation of these settings. Note that the settings specified on both Processing Engine systems must be identical except for the RUEI_REP_FAILOVER_VIRTUAL_DEV setting.

  9. Issue the following command to restart processing of monitored traffic on the primary Processing Engine system:

    project -start 
    
  10. Install the ruei-reporter-failover.sh script on both Processing Engine systems (for example, in the /usr/local/sbin directory). The script is located in the RUEI zip file (see Section 2.3, "Unpacking the RUEI Software").

  11. Add the following entry to the root user's crontab file of both the primary and secondary Processing Engine systems:

    * * * * * /usr/local/sbin/ruei-reporter-failover.sh
    

    This causes the secondary Processing Engine to send a heartbeat signal to the primary Processing Engine every 60 seconds, and to take over processing of RUEI monitored traffic in the event that the primary Processing Engine becomes unavailable.

    Wait at least 60 seconds.

  12. Ensure that all user access to the Reporter GUI is via the specified virtual IP address. This is necessary to ensure automatic failover to the secondary Processing Engine system in the event that the primary Processing Engine system becomes unavailable.

  13. Check the RUEI_DATA/processor/log/failover.log file on both Processing Engine systems. These files contain the results of the "ping" commands. Ensure that there are no error messages (for example, about unspecified failover configuration settings).

  14. Check the output of the /sbin/ifconfig command on the primary Processing Engine to ensure that the virtual IP address has been correctly configured. For example:

    /sbin/ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:F7:B0:14
              inet addr:192.168.56.201  Bcast:192.168.56.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fef7:b014/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:80 errors:0 dropped:0 overruns:0 frame:0
              TX packets:311 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:12793 (12.4 KiB)  TX bytes:26268 (25.6 KiB)
    
    eth0:0    Link encap:Ethernet  HWaddr 08:00:27:F7:B0:14
              inet addr:10.11.12.23  Bcast:192.168.56.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
  15. Shut down the primary Processing Engine system, and verify that the secondary Processing Engine begins processing monitored traffic. A warning that the primary system is unreachable and that the secondary system is being activated is reported in the Event log. Note that after doing so, you must perform a failback to return your RUEI installation to its original state.
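Because the failover settings in /etc/ruei.conf must match on both systems except for RUEI_REP_FAILOVER_VIRTUAL_DEV, a quick comparison can catch mistakes before the cron job is enabled. The following is a minimal sketch, assuming a copy of the other system's /etc/ruei.conf has been placed alongside the local one. The function names failover_settings and same_failover_settings are hypothetical and are not part of RUEI.

```shell
#!/bin/sh
# Prints the RUEI_REP_FAILOVER_* export lines from a ruei.conf-style file,
# omitting the one setting that is allowed to differ between the systems.
# Trailing whitespace is stripped and lines are sorted so that only the
# values matter, not their order or spacing.
failover_settings() {
    grep '^export RUEI_REP_FAILOVER_' "$1" \
        | grep -v 'RUEI_REP_FAILOVER_VIRTUAL_DEV=' \
        | sed 's/[[:space:]]*$//' \
        | sort
}

# Succeeds when both files carry the same failover settings.
same_failover_settings() {
    [ "$(failover_settings "$1")" = "$(failover_settings "$2")" ]
}
```

For example, with the secondary system's file copied to a hypothetical path such as /tmp/ruei.conf.secondary, the call same_failover_settings /etc/ruei.conf /tmp/ruei.conf.secondary succeeds only when the primary, standby, virtual IP, and mask settings agree.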

11.5 Instigating Processing Engine Failback

Failback to the primary Processing Engine system must be performed manually in order to return your RUEI installation to its original state. Do the following:

  1. Load your global RUEI configuration settings using the following command as the root user:

    . /etc/ruei.conf
    
  2. Ensure that the heartbeat mechanism between the primary and secondary Processing Engine systems is functioning correctly. To do so, verify that they can 'ping' each other on the RUEI_REP_FAILOVER_PRIMARY_IP and RUEI_REP_FAILOVER_STANDBY_IP IP addresses.

  3. To instigate the failback, remove the active-failover-server file, and shut down the virtual interface on the secondary server by issuing the following commands:

    rm $RUEI_DATA/processor/data/active-failover-server
    ifconfig $RUEI_REP_FAILOVER_VIRTUAL_DEV:0 down
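
Both failback commands depend on variables loaded from /etc/ruei.conf. If a variable is empty, the rm command would point at the wrong path. The following is a minimal sketch of a dry-run helper that prints the two commands with the variables expanded, so they can be reviewed before being run. The function name print_failback_commands is hypothetical and is not part of RUEI.

```shell
#!/bin/sh
# Prints (does not execute) the two failback commands with the current
# values of RUEI_DATA and RUEI_REP_FAILOVER_VIRTUAL_DEV filled in.
# Fails with a message if either variable is unset or empty.
print_failback_commands() {
    if [ -z "${RUEI_DATA:-}" ] || [ -z "${RUEI_REP_FAILOVER_VIRTUAL_DEV:-}" ]; then
        echo "required settings are missing; load /etc/ruei.conf first" >&2
        return 1
    fi
    printf 'rm %s/processor/data/active-failover-server\n' "$RUEI_DATA"
    printf 'ifconfig %s:0 down\n' "$RUEI_REP_FAILOVER_VIRTUAL_DEV"
}
```

Running the helper after ". /etc/ruei.conf" shows exactly which file and interface the failback commands will act on.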