9 Configuring a Failover Reporter System

This chapter describes the procedure for configuring a failover Reporter system that will immediately take over processing of network traffic if the primary Reporter system becomes unavailable. The procedure assumes that the primary Reporter system has been installed, configured, and is fully operational. The installation procedure for a primary Reporter is identical to that of a standalone Reporter. The procedure to configure a failover Collector system is described in Chapter 10, "Configuring a Failover Collector System".

9.1 Introduction to Failover Reporter Systems

Configuring a secondary (or failover) Reporter system allows it to seamlessly take over processing of monitored traffic if the primary Reporter system becomes unavailable. In this way, a high level of operational reliability is achieved. The failover Reporter configuration is shown in Figure 9-1.

Figure 9-1 Failover Reporter Configuration


At the server level, a crossover cable connects the primary and secondary Reporter systems. As long as a regular "heartbeat" continues between the primary and secondary servers, the secondary server does not initiate processing of traffic. However, as soon as the secondary server detects that the heartbeat of the primary server has been interrupted, it immediately takes over the processing task of the primary server. This process is referred to as failover.
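
The heartbeat decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of the RUEI failover script (which this chapter does not show): a single ping over the crossover link is assumed to stand in for the heartbeat, and the function names and the PEER_CHECK_CMD override are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of the standby's heartbeat decision; not RUEI code.
# PEER_CHECK_CMD is an assumed override (default: one ping, 2 s timeout)
# so the decision logic can be exercised without a real peer.

# Succeeds if the peer answers the heartbeat check.
heartbeat_ok() {
  peer_ip="$1"
  # The default command string is left unquoted on purpose so that it
  # is split into "ping -c 1 -W 2".
  ${PEER_CHECK_CMD:-ping -c 1 -W 2} "$peer_ip" >/dev/null 2>&1
}

# Prints the action the standby would take for a given primary address.
standby_action() {
  if heartbeat_ok "$1"; then
    echo "stay-idle"   # heartbeat present: primary keeps processing
  else
    echo "take-over"   # heartbeat lost: standby assumes processing
  fi
}

# Example: standby_action 192.168.56.201
```

Because the check runs on a schedule (see the crontab entry in Section 9.4), detection in practice is bounded by the polling interval rather than being instantaneous.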

Note that failback (that is, the process of restoring the RUEI installation to its original state) must be performed manually. The procedure is described in Section 9.5, "Initiating Reporter Failback".

Prerequisites

In order to configure a failover Reporter installation, the following conditions must be met:

  • The primary and secondary Reporter systems must be directly connected via a crossover cable. In addition, both systems must also be connected to a local or public network in order to connect to the remote Collector, Processing Engine, and database systems.

  • The database and Collector instances used by the RUEI installation must both be remote.

  • The primary and secondary Reporter systems must share the same storage (such as SAN or NFS). In particular, the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories must be located on this shared storage.
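
Once the shared storage is in place, you might confirm on each Reporter that both directories actually resolve to the shared location. The following is a minimal sketch, assuming the directories are reached through symbolic links into a shared mount point such as /reporter_share (the layout used later in this chapter). The function name is hypothetical, and RUEI_DATA is assumed to be set, for example by sourcing /etc/ruei.conf.

```shell
#!/bin/bash
# Hypothetical check: do data and sslkeys resolve under the shared root?
# Usage: check_shared_dirs RUEI_DATA_DIR SHARED_ROOT
check_shared_dirs() {
  ruei_data="$1"
  shared_root="$(readlink -f "$2")"
  rc=0
  for dir in data sslkeys; do
    # Follow symbolic links to the physical location of each directory.
    real="$(readlink -f "$ruei_data/processor/$dir")"
    case "$real" in
      "$shared_root"/*) echo "OK: $dir -> $real" ;;
      *) echo "NOT SHARED: $dir -> $real"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Example (as RUEI_USER): check_shared_dirs "$RUEI_DATA" /reporter_share
```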

9.2 Preparing the Primary Reporter

To make the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories available on a shared storage location, do the following:

  1. Stop all processing on the primary Reporter system by issuing the following command as the RUEI_USER user:

    project -stop
    
  2. Mount the shared Reporter location on the primary Reporter system. To do so, edit the /etc/fstab file so that the shared location is mounted at boot. For example:

    10.6.5.9:/home/nfs /reporter_share nfs rsize=1024,wsize=1024  0 0
    
  3. Move the existing data and sslkeys directories to the shared Reporter location. For example:

    mv RUEI_DATA/processor/data /reporter_share
    mv RUEI_DATA/processor/sslkeys /reporter_share
    

    where reporter_share specifies the shared location for data and SSL keys on the primary and secondary Reporter systems.
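
Steps 2 and 3 above can be wrapped in a small helper. This is a sketch, not a supplied RUEI tool: the function name is hypothetical, and it assumes that the shared location is already mounted and that processing has been stopped. It additionally leaves symbolic links at the original locations pointing to the shared copies, mirroring the links created on the secondary Reporter in Section 9.4. The steps above do not state this explicitly for the primary Reporter, so verify it against your RUEI release.

```shell
#!/bin/bash
# Hypothetical helper: move data and sslkeys to the shared location and
# leave symbolic links behind so the original paths keep working.
# Usage: relocate_to_share RUEI_DATA_DIR SHARED_ROOT
# Assumes "project -stop" has been run and SHARED_ROOT is mounted.
relocate_to_share() {
  ruei_data="$1"
  share="$2"
  for dir in data sslkeys; do
    src="$ruei_data/processor/$dir"
    if [ -L "$src" ]; then
      echo "skip: $src is already a symbolic link"
      continue
    fi
    if [ -e "$share/$dir" ]; then
      # Refuse to overwrite anything that already exists on the share.
      echo "error: $share/$dir already exists" >&2
      return 1
    fi
    mv "$src" "$share/$dir" || return 1
    ln -s "$share/$dir" "$src" || return 1
  done
}

# Example (as RUEI_USER): relocate_to_share "$RUEI_DATA" /reporter_share
```

Checking for an existing symbolic link first makes the helper safe to run twice: a second run only reports that the links are already in place.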

9.3 Installing the Secondary Reporter

The installation procedure for a secondary Reporter system is almost identical to that of a standalone Reporter system. Note that the Initial Setup Wizard should not be run. Do the following:

  1. When starting the installation procedure for the secondary Reporter system, ensure that the /etc/ruei.conf file is identical to that of the primary Reporter system.

  2. Install the Linux operating system and RUEI Reporter software on the secondary Reporter system. The procedure to do this is described in Chapter 6, "Configuring RUEI". Specifically:

9.4 Configuring Reporter Failover

Do the following:

  1. If you have not already done so, log in to the primary Reporter system as the RUEI_USER user, and issue the following command to stop all processing of monitored traffic:

    project -stop
    
  2. Copy the .ssh directory of the RUEI_USER user on the primary Reporter system (created when you performed the procedure described in Section 2.13, "Configuring Reporter Communication (Split-Server Setup Only)") to the same location on the secondary Reporter system.

  3. Ensure that the uid and gid settings of the RUEI_USER user are the same on both the primary and secondary Reporter systems. For example:

    id moniforce
    uid=501(moniforce) gid=502(moniforce) groups=502(moniforce)
    
  4. On both Reporter systems, configure the static IP addresses used for the crossover cable connection. This can be done using a utility such as system-config-network.

  5. On the secondary Reporter system, edit the /etc/fstab file so that the shared Reporter location containing the RUEI_DATA/processor/data and RUEI_DATA/processor/sslkeys directories is mounted at boot. For example:

    10.6.5.9:/home/nfs /reporter_share nfs rsize=1024,wsize=1024  0 0
    

    where reporter_share specifies the shared location for data and SSL keys on the primary and secondary Reporter systems.

  6. Replace the local data and sslkeys directories on the secondary Reporter system with symbolic links to the shared Reporter location by issuing the following commands:

    rm -rf RUEI_DATA/processor/data
    rm -rf RUEI_DATA/processor/sslkeys
    ln -s /reporter_share/data RUEI_DATA/processor/data 
    ln -s /reporter_share/sslkeys RUEI_DATA/processor/sslkeys 
    
  7. Log in to the secondary Reporter system as the RUEI_USER user, and issue the following command:

    project -new -fromdb UX
    

    This creates the secondary Reporter's on-disk configuration files using the primary Reporter's database configuration.

  8. Edit the /etc/ruei.conf file on both the primary and secondary Reporters to specify the virtual, primary, and standby IP addresses. For example:

    export RUEI_REP_FAILOVER_PRIMARY_IP=192.168.56.201 
    export RUEI_REP_FAILOVER_STANDBY_IP=192.168.56.202 
    export RUEI_REP_FAILOVER_VIRTUAL_IP=10.11.12.23 
    export RUEI_REP_FAILOVER_VIRTUAL_DEV=eth0 
    export RUEI_REP_FAILOVER_VIRTUAL_MASK=255.255.255.0 
    

    The RUEI_REP_FAILOVER_PRIMARY_IP and RUEI_REP_FAILOVER_STANDBY_IP settings should specify the IP addresses of the crossover cable between the two Reporter systems. See Section 2.4.1, "Check The RUEI Configuration File" for an explanation of these settings. Note that the settings specified on both Reporter systems must be identical except for the RUEI_REP_FAILOVER_VIRTUAL_DEV setting.

  9. Issue the following command to restart processing of monitored traffic on the primary Reporter system:

    project -start
    
  10. Install the ruei-reporter-failover.sh script on both Reporter systems (for example, in the /usr/local/sbin directory). The script is located in the RUEI zip file (see Section 2.3, "Unpacking the RUEI Software").

  11. Add the following entry to the root user's crontab file on both the primary and secondary Reporter systems:

    * * * * * /usr/local/sbin/ruei-reporter-failover.sh
    

    This causes the secondary Reporter to send a heartbeat signal to the primary Reporter every 60 seconds, and to take over processing of RUEI monitored traffic if the primary Reporter becomes unavailable.

    Wait at least 60 seconds.

  12. Ensure that all user access to the Reporter GUI is via the specified virtual IP address. This is necessary to ensure automatic failover to the secondary Reporter system in the event that the primary Reporter system becomes unavailable.

  13. Check the RUEI_DATA/processor/log/failover.log file on both Reporter systems. These files contain the results of the "ping" commands. Ensure that they contain no error messages (for example, about unspecified failover configuration settings).

  14. Check the output of the /sbin/ifconfig command on the primary Reporter to ensure that the virtual IP address has been correctly configured. For example:

    /sbin/ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:F7:B0:14
              inet addr:192.168.56.201  Bcast:192.168.56.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fef7:b014/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:80 errors:0 dropped:0 overruns:0 frame:0
              TX packets:311 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:12793 (12.4 KiB)  TX bytes:26268 (25.6 KiB)
    
    eth0:0    Link encap:Ethernet  HWaddr 08:00:27:F7:B0:14
              inet addr:10.11.12.23  Bcast:192.168.56.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
  15. Unregister all remote Collectors with the primary Reporter, and re-register them using the virtual IP address.

  16. Shut down the primary Reporter system, and verify that the secondary Reporter begins processing monitored traffic. A warning that the primary system is unreachable and that the secondary system is being activated is reported in the Event log. Note that after doing so, you must perform a failback to return your RUEI installation to its original state.

  17. Update the Reporter URL (select System, then Maintenance, and then E-mail setup) with the virtual Reporter host name or IP address.
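
Step 8 requires the failover settings in /etc/ruei.conf to match on both Reporters, except for RUEI_REP_FAILOVER_VIRTUAL_DEV. One way to check this is to copy the primary's file to the secondary (for example, as /tmp/ruei.conf.primary, a hypothetical path) and compare the two. The sketch below assumes the settings are written as simple `export NAME=value` lines, as in the example in step 8; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical check: compare RUEI_REP_FAILOVER_* settings in two
# ruei.conf files, ignoring RUEI_REP_FAILOVER_VIRTUAL_DEV.
# Assumes one "export NAME=value" per line; commented lines are skipped.
failover_settings() {
  grep -E '^[[:space:]]*export[[:space:]]+RUEI_REP_FAILOVER_' "$1" \
    | sed -E 's/^[[:space:]]*export[[:space:]]+//; s/[[:space:]]+$//' \
    | grep -v '^RUEI_REP_FAILOVER_VIRTUAL_DEV=' \
    | sort
}

# Prints any differing lines; returns non-zero if the files disagree.
compare_failover_conf() {
  a="$(mktemp)"; b="$(mktemp)"
  failover_settings "$1" > "$a"
  failover_settings "$2" > "$b"
  diff "$a" "$b"; rc=$?
  rm -f "$a" "$b"
  return "$rc"
}

# Example: compare_failover_conf /tmp/ruei.conf.primary /etc/ruei.conf
```

Trailing whitespace is stripped before comparing, so stray spaces at the end of an export line do not produce false differences.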

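The check in step 14 can also be scripted by searching the ifconfig output for the virtual IP address. The following is a minimal sketch, assuming the net-tools output format shown in step 14 (`inet addr:...`); some newer ifconfig versions print `inet 10.11.12.23` instead, so the pattern accepts both. The function name is hypothetical, and it reads the output from standard input so that it can be tested without a live interface.

```shell
#!/bin/bash
# Hypothetical check: is the virtual IP configured on some interface?
# Reads ifconfig output on stdin; accepts "inet addr:IP" and "inet IP".
# Usage: /sbin/ifconfig | has_virtual_ip 10.11.12.23
has_virtual_ip() {
  # Escape dots so they match literally in the regular expression.
  ip_re="$(printf '%s' "$1" | sed 's/\./\\./g')"
  # Require whitespace or end of line after the address so that
  # 10.11.12.2 does not match 10.11.12.23.
  grep -Eq "inet (addr:)?${ip_re}([[:space:]]|$)"
}
```

On the primary Reporter, a command such as `/sbin/ifconfig | has_virtual_ip 10.11.12.23 && echo configured` would then report whether the virtual address is active.
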
9.5 Initiating Reporter Failback

Failback to the primary Reporter system must be performed manually in order to return your RUEI installation to its original state. Do the following:

  1. Load your global RUEI configuration settings on the secondary server using the following command as the root user:

    . /etc/ruei.conf
    
  2. Ensure that the heartbeat mechanism between the primary and secondary Reporter systems is functioning correctly. To do so, verify that each system can ping the other on the RUEI_REP_FAILOVER_PRIMARY_IP and RUEI_REP_FAILOVER_STANDBY_IP addresses.

  3. To initiate the failback, remove the active-failover-server file, and shut down the virtual interface on the secondary server by issuing the following commands:

    rm $RUEI_DATA/processor/data/active-failover-server
    ifconfig $RUEI_REP_FAILOVER_VIRTUAL_DEV:0 down
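
The failback steps above can be wrapped in a helper that refuses to proceed unless the primary Reporter answers on its heartbeat address. This is a hedged sketch, not a supplied RUEI tool. The function names and the DRY_RUN and PEER_CHECK_CMD variables are hypothetical, and the helper assumes that /etc/ruei.conf has already been sourced (step 1) so that RUEI_DATA and the RUEI_REP_FAILOVER_* variables are set. It only checks the secondary-to-primary direction, so the reverse ping from the primary still needs to be verified separately.

```shell
#!/bin/bash
# Hypothetical failback helper for the secondary Reporter (run as root).
# Assumes ". /etc/ruei.conf" has been run. DRY_RUN=1 only prints the
# commands; PEER_CHECK_CMD overrides the default single ping.
failback_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

initiate_failback() {
  if [ -z "$RUEI_DATA" ] || [ -z "$RUEI_REP_FAILOVER_PRIMARY_IP" ] \
     || [ -z "$RUEI_REP_FAILOVER_VIRTUAL_DEV" ]; then
    echo "failover settings not loaded; source /etc/ruei.conf first" >&2
    return 1
  fi
  # Step 2: the primary must answer on the crossover link.
  if ! ${PEER_CHECK_CMD:-ping -c 1 -W 2} "$RUEI_REP_FAILOVER_PRIMARY_IP" >/dev/null 2>&1; then
    echo "primary $RUEI_REP_FAILOVER_PRIMARY_IP unreachable; not failing back" >&2
    return 1
  fi
  # Step 3: remove the marker file and bring down the virtual interface.
  failback_run rm "$RUEI_DATA/processor/data/active-failover-server"
  failback_run ifconfig "$RUEI_REP_FAILOVER_VIRTUAL_DEV:0" down
}
```

Running it first with DRY_RUN=1 shows exactly which rm and ifconfig commands would be issued before anything is changed.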