CHAPTER 11
Failover Groups
Sun Ray servers configured in a failover group (FOG) provide users with a high level of availability when one of those servers becomes unavailable because of a network or system failure. This chapter describes how to configure failover groups.
For a discussion of how to use multiple failover groups for regional hotdesking, see Hotdesking (Mobile Sessions).
This chapter covers these topics:
A failover group consists of two or more Sun Ray servers grouped together to provide highly available and scalable Sun Ray service for a population of Sun Ray DTUs. Releases earlier than 2.0 supported DTUs available to the servers only on a common, dedicated interconnect. Beginning with the 2.0 release, this capability was expanded to allow access across the LAN to either local or remote Sun Ray devices. However, the servers in a failover group must still be able to reach one another, using multicast or broadcast, over at least one shared subnet. Servers in a group authenticate (or “trust”) one another using a common group signature. The group signature is a key used to sign messages sent between servers in the group; it must be configured to be identical on each server.
Failover groups that use more than one version of Sun Ray Server Software will be unable to use all the features provided in the latest releases. On the other hand, the failover group can be a heterogeneous group of Sun servers running various releases of the Solaris operating environment, such as Solaris 9 and Solaris 10.
When a dedicated interconnect is used, all servers in the failover group should have access to, and be accessible by, all the Sun Ray DTUs on a given subnet. The failover environment supports the same interconnect topologies that are supported by a single-server Sun Ray environment; however, switches should be multicast-enabled.
FIGURE 11-1 illustrates a typical Sun Ray failover group. For an example of a redundant failover group, see FIGURE 11-2.
FIGURE 11-1 Simple Failover Group
When a server in a failover group fails for any reason, each Sun Ray DTU connected to that server reconnects to another server in the same failover group. The failover occurs at the user authentication level: the DTU connects to a previously existing session for the user’s token. If there is no existing session, the DTU connects to a server selected by the load-balancing algorithm. This server then presents a login screen to the user, and the user must log in again to create a new session. The state of the session on the failed server is lost.
The principal components needed to implement failover are:
A module that monitors the availability (liveness) of the Sun Ray servers and facilitates redirection when needed.
All DHCP servers configured to assign IP addresses to Sun Ray DTUs have a non-overlapping subset of the available address pool.
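The non-overlap requirement for DHCP address pools can be checked before the servers are configured. The following is a minimal sketch, not part of Sun Ray Server Software: the helper names `pool_range` and `overlapping_pools` are hypothetical, and the pool values are illustrative, expressed as a first address and an address count in the same form that utadm prompts for.

```python
from ipaddress import IPv4Address
from itertools import combinations


def pool_range(first, count):
    """Return the (start, end) integer bounds of a pool of `count`
    addresses beginning at the dotted-quad address `first`."""
    start = int(IPv4Address(first))
    return start, start + count - 1


def overlapping_pools(pools):
    """Return the pairs of server names whose pools share an address.

    `pools` maps a server name to a (first_address, count) tuple.
    An empty result means the pools are non-overlapping.
    """
    clashes = []
    for (name_a, a), (name_b, b) in combinations(pools.items(), 2):
        a_start, a_end = pool_range(*a)
        b_start, b_end = pool_range(*b)
        # Two closed intervals intersect when each starts before the other ends.
        if a_start <= b_end and b_start <= a_end:
            clashes.append((name_a, name_b))
    return clashes


# Illustrative values only: three servers, 34 addresses each.
pools = {
    "sr47": ("192.168.128.16", 34),
    "sr48": ("192.168.128.50", 34),
    "sr49": ("192.168.128.84", 34),
}
print(overlapping_pools(pools))
```

An empty list indicates that no two servers would offer the same address to a DTU.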
FIGURE 11-2 Redundant Failover Group
The redundant failover group illustrated in FIGURE 11-2 can provide maximum resources to a few Sun Ray DTUs. The server sr47 is the primary Sun Ray server, and sr48 is the secondary Sun Ray server; other secondary servers (sr49, sr50, and so on) are not shown.
The utadm command assists you in setting up a DHCP server. The default DHCP setup configures each interface for 225 hosts and uses private network addresses for the Sun Ray interconnect. For more information on using the utadm command, see the man page for utadm.
Before setting up IP addressing, you must decide upon an addressing scheme. The following examples discuss setting up class C and class B addresses.
The loss of a server usually implies the loss of its DHCP service and its allocation of IP addresses. Therefore, more DHCP addresses must be available from the address pool than there are Sun Ray DTUs. Consider the situation of five servers and 100 DTUs. If one of the servers fails, the remaining DHCP servers must have enough available addresses so that every “orphaned” DTU gets a new working address.
TABLE 11-1 lists configuration settings used to configure five servers for 100 DTUs, accommodating the failure of two servers (class C) or four servers (class B).
The formula for address allocation is: address range (AR) = number of DTUs/(total servers - failed servers), rounded up to the next whole number. For example, in the case of the loss of two servers, each DHCP server must be given a range of 100/(5-2) = 33.3, rounded up to 34 addresses.
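The address allocation formula can be worked through in a few lines. This is a minimal sketch; the function name `address_range` is hypothetical, and the result is rounded up because a DHCP server cannot offer a fraction of an address.

```python
import math


def address_range(num_dtus, total_servers, failed_servers):
    """Return the number of addresses each DHCP server needs so that the
    surviving servers can still cover every DTU."""
    surviving = total_servers - failed_servers
    if surviving <= 0:
        raise ValueError("at least one server must survive")
    # Round up: the surviving pools together must cover all DTUs.
    return math.ceil(num_dtus / surviving)


# Five servers, 100 DTUs, tolerating the loss of two servers.
print(address_range(100, 5, 2))  # 34
# Tolerating the loss of four servers means each server covers every DTU.
print(address_range(100, 5, 4))  # 100
```

The second call corresponds to the case in which each server holds an address for each DTU.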
Ideally, each server would have an address for each DTU. This would require a class B network. Consider these conditions:
Server IP addresses assigned for the Sun Ray interconnect should all be unique. Use the utadm tool to assign them.
When the Sun Ray DTU boots, it sends a DHCP broadcast request to all possible servers on the network interface. One or more servers respond with an IP address allocated from their own range of addresses. The DTU accepts the first IP address that it receives and configures itself to send and receive at that address.
The accepted DHCP response also contains information about the IP address and port numbers of the Authentication Managers on the server that sent the response.
The DTU then tries to establish a TCP connection to an Authentication Manager on that server. If it is unable to connect, it uses a protocol similar to DHCP, in which it uses a broadcast message to ask the Authentication Managers to identify themselves. The DTU then tries to connect to the Authentication Managers that respond in the order in which the responses are received.
Once a TCP connection to an Authentication Manager has been established, the DTU presents its token. The token is either a pseudo-token representing the individual DTU (its unique Ethernet address) or a smart card. The Session Manager then starts an X window/X server session and binds the token to that session.
The Authentication Manager then sends a query to all the other Authentication Managers on the same subnet and asks for information about existing sessions for the token. The other Authentication Managers respond, indicating whether there is a session for the token and the last time the token was connected to the session.
The requesting Authentication Manager selects the server with the latest connection time and redirects the DTU to that server. If no session is found for the token, the requesting Authentication Manager selects the server with the lightest load and redirects the token to that server. A new session is created for the token.
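The selection rule described above can be sketched as follows. This is an illustration of the stated rule only, not the Authentication Manager's actual code: the `ServerReply` type, its field names, and the numeric load value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerReply:
    name: str
    load: float                    # hypothetical load metric; lower is lighter
    last_connect: Optional[float]  # when the token last connected to a session
                                   # on this server, or None if it has no session


def select_server(replies: List[ServerReply]) -> str:
    """Pick the server a DTU should be redirected to for a given token."""
    with_session = [r for r in replies if r.last_connect is not None]
    if with_session:
        # An existing session wins; prefer the most recently connected one.
        return max(with_session, key=lambda r: r.last_connect).name
    # No session anywhere: choose the most lightly loaded server.
    return min(replies, key=lambda r: r.load).name


replies = [
    ServerReply("sr47", load=0.7, last_connect=1000.0),
    ServerReply("sr48", load=0.2, last_connect=None),
    ServerReply("sr49", load=0.5, last_connect=1500.0),
]
print(select_server(replies))  # sr49
```

In this example sr49 is chosen even though sr48 is less loaded, because an existing session for the token takes precedence over load.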
The Authentication Manager enables both implicit (smart card) and explicit switching. For explicit switching, see Group Manager.
In a large IP network, a DHCP server distributes the IP addresses and other configuration information for interfaces on that network.
The Sun Ray DHCP server can coexist with DHCP servers on other subnets, provided you isolate the Sun Ray DHCP server from other DHCP traffic. Verify that all routers on the network are configured not to relay DHCP requests. This is the default behavior for most routers.
If the Sun Ray server has multiple interfaces, one of which is the Sun Ray interconnect, the Sun Ray DHCP server should be able to manage both the Sun Ray interconnect and the other interfaces without cross-interference.
To Set Up IP Addressing on Multiple Servers, Each with One Sun Ray Interface
1. Log in to the Sun Ray server as superuser and open a shell window. Type:
where <interface_name> is the name of the Sun Ray network interface to be configured; for example, hme[0-9], qfe[0-9], or ge[0-9]. You must be logged on as superuser to run this command. The utadm script configures the interface (for example, hme1) at the subnet (in this example, 128).
The script displays default values, such as the following:
The default values are the same for each server in a failover group. Certain values must be changed to be unique to each server.
2. When you are asked to accept the default values, type n:
3. Change the second server’s IP address to a unique value, in this case 192.168.128.2:
4. Accept the default values for netmask, host name, and net name:
5. Change the DTU address ranges for the interconnect to unique values. For example:
Do you want to offer IP addresses for this interface? [Y/N]:
new first Sun Ray address: [192.168.128.16] 192.168.128.50
number of Sun Ray addresses to allocate: [205] 34
6. Accept the default firmware server and router values:
The utadm script asks if you want to specify an authentication server list:
These servers are specified by a file containing a space-delimited list of server IP addresses or by manually entering the server IP addresses.
The newly selected values for interface hme1 are displayed:
7. If these are correct, accept the new values:
8. Stop and restart the server and power cycle the DTUs to download the firmware.
TABLE 11-2 lists the options available for the utadm command. For additional information, see the utadm man page.
Every server has a group manager module that monitors availability and facilitates redirection. It is coupled with the Authentication Manager.
In setting policies, the Authentication Manager uses the selected authentication modules and decides what tokens are valid and which users have access.
The Group Managers create maps of the failover group topology by exchanging keepalive messages among themselves. These keepalive messages are sent to a well-known UDP port (typically 7009) on all of the configured network interfaces. The keepalive message contains enough information for each Sun Ray server to construct a list of servers and the common subnets that each server can access. In addition, the Group Manager remembers the last time that a keepalive message was received from each server on each interface.
The keepalive message contains the following information about the server:
Note - The last two items are used to facilitate load distribution. See Load Balancing.
The information maintained by the Group Manager is used primarily for server selection when a token is presented. The server and subnet information is used to determine the servers to which a given DTU can connect. These servers are queried about sessions belonging to the token. Servers whose last keepalive message is older than the timeout are deleted from the list, since either the network connection or the server is probably down.
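The timeout rule for keepalive messages can be sketched as a simple filter. This is a simplified illustration under stated assumptions: the function name `live_servers` is hypothetical, times are plain seconds, and only one timestamp per server is tracked, whereas the Group Manager records the last keepalive per server on each interface.

```python
def live_servers(last_seen, now, timeout):
    """Return the servers whose most recent keepalive is no older than
    `timeout` seconds at time `now`.

    `last_seen` maps a server name to the time its last keepalive arrived.
    Servers that have exceeded the timeout are dropped from the result.
    """
    return {name: t for name, t in last_seen.items() if now - t <= timeout}


# Illustrative values: sr48 has been silent for 50 seconds.
last_seen = {"sr47": 100.0, "sr48": 60.0}
print(sorted(live_servers(last_seen, now=110.0, timeout=20.0)))  # ['sr47']
```

Only the servers that remain in the result would be queried about sessions belonging to a token.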
In addition to automatic redirection at authentication, you can use the utselect or utswitch command for manual redirection.
Note - The utselect GUI is the preferred method to use for server selection. For more information, see the utselect man page.
The Authentication Manager configuration file, /etc/opt/SUNWut/auth.props, contains properties used by the Group Manager at runtime. The properties are:
Property changes do not take effect until the Authentication Manager is restarted.
As superuser, open a shell window and type:
The Authentication Manager is restarted.
At the time of a server failure, the Group Manager on each remaining server attempts to distribute the failed server’s sessions evenly among the remaining servers. The load balancing algorithm takes into account each server’s capacity (number and speed of its CPUs) and load so that larger or less heavily loaded servers host more sessions.
When the Group Manager receives a token from a Sun Ray DTU and finds that no server owns an existing session for that token, it redirects the Sun Ray DTU to whichever server in the group has the lightest load. A Sun Ray DTU may appear to connect twice, once on the server that answered its DHCP request and a second time on a server that was less loaded than the first.
A failover group is one in which two or more Sun Ray servers use a common policy and share services. It is composed of a primary server and one or more secondary servers. For such a group, you must configure a Sun Ray Data Store to enable replication of the Sun Ray administration data across the group. Configure the secondary servers so that they serve users directly in addition to serving the Data Store. For best results in groups of four or more servers, configure the primary server so that it serves only the Sun Ray Data Store.
The utconfig command sets up the data store for a single system initially, and enables the Sun Ray servers for failover. The utreplica command then configures the Sun Ray servers as a failover group.
If the Sun Ray server is currently monitored by Sun Management Center, utreplica restarts the agent automatically. Log files for Sun Ray servers contain time-stamped error messages that are difficult to interpret if the server clocks are out of sync. To make troubleshooting easier, all secondary servers should periodically synchronize with their primary server.
Tip - Use rdate <primary-host>, preferably with crontab, to synchronize secondary servers with their primary server.
Layered administration of the group takes place on the primary server. The utreplica command designates a primary server, advises the server of its Administration Primary status, and tells it the host names of all the secondary servers.
Adding or removing secondary servers requires services to be restarted on the primary server. In large failover groups, significant loads may be pushed onto the primary server from various sources. In addition, runaway processes from user applications on the primary can degrade the health of the entire failover group. Failover groups of more than four servers should have a dedicated primary server devoted solely to serving the Sun Ray Data Store, that is, one that does not host any Sun Ray sessions.
Tip - Configure the primary server before you configure the secondary servers.
As superuser, open a shell window on the primary server and type:
where secondary_server1 [secondary_server2...] is a space-separated list of unique host names of the secondary servers.
The purpose of a dedicated primary server is to serve the Sun Ray Data Store.
Follow the procedure to specify a primary server, as above; however, do not run utadm on this server.
The secondary servers in the group store a replicated version of the primary server’s administration data. Use the utreplica command to advise each secondary server of its secondary status and also the host name of the primary server for the group.
As superuser, open a shell window on the secondary server and type:
where primary-server is the hostname of the primary server.
To include an additional secondary server in an already configured failover group:
1. On the primary server, rerun utreplica -p -a with a list of secondary servers.
2. Run utreplica -s primary-server on the new secondary server.
As superuser, open a shell window and type:
This removes the replication configuration.
As superuser, open a shell window and type:
The result indicates whether the server is standalone (dedicated), primary (with the secondary host names), or secondary (with the primary host name).
To View Network (Failover Group) Status
A failover group is a set of Sun Ray servers all running the same release of Sun Ray Server Software and all having access to all the Sun Ray DTUs on the interconnect.
1. From the Servers tab in the Admin GUI, click on a server name to display its Server Details screen.
FIGURE 11-3 Network Status Screen
The Network Status screen provides information on group membership and network connectivity for trusted servers--those in the same failover group.
Note - Sun Ray server broadcasts do not traverse routers or servers other than Sun Ray servers.
If one of the servers of a failover group fails, the remaining group members operate from the administration data that existed prior to the failure.
The recovery procedure depends on the severity of the failure and whether a primary or secondary server has failed.
Note - When the primary server fails, you cannot make administrative changes to the system. For replication to work, all changes must be successful on the primary server.
There are several strategies for recovering the primary server. The following procedure is performed on the server that was the primary, after it has been made fully operational again.
Use this procedure to rebuild the primary server data store from a secondary server. This procedure uses the same hostname for the replacement server.
1. On one of the secondary servers, capture the current data store to a file called /tmp/store:
This provides an LDIF format file of the current data store.
2. FTP this file to the /tmp directory on the primary server.
3. Follow the directions in the Sun Ray Server Software 4.0 Installation and Configuration Guide to install Sun Ray Server Software.
4. After running utinstall, configure the server as a primary server for the group. Make sure that you use the same admin password and group signature.
5. Shut down the Sun Ray services, including the data store:
This populates the primary server and synchronizes its data with the secondary server. The replacement server is now ready for operation as the primary server.
8. (Optional) Confirm that the data store is repopulated:
9. (Optional) Perform any additional configuration procedures.
Note - This procedure is also known as promoting a secondary server to primary.
1. Choose a server in the existing failover group to be promoted and configure it as the primary server:
2. Reconfigure each of the remaining secondary servers in the failover group to use the new primary server:
This resynchronizes the secondary server with the new primary server.
Where a secondary server has failed, administration of the group can continue. A log of updates is maintained and applied automatically to the secondary server when it has recovered. If the secondary server needs to be reinstalled, repeat the steps described in the Sun Ray Server Software 4.0 Installation and Configuration Guide.
The utconfig command asks for a group signature if you choose to configure for failover. The signature, which is stored in the /etc/opt/SUNWut/gmSignature file, must be the same on all servers in the group.
The location can be changed in the gmSignatureFile property of the auth.props file.
To form a fully functional failover group, the signature file must:
Tip - For slightly better security, use long passwords.
1. As superuser of the Sun Ray server, open a shell window and type:
You are prompted for the signature.
2. Enter it twice identically for acceptance.
3. For each Sun Ray server in the group, repeat the steps, starting at step 1.
Note - It is important to use the utgroupsig command, rather than any other method, to enter the signature. utgroupsig also ensures proper internal replication.
Being able to take servers offline makes maintenance easier. In an offline state, no new sessions are created. However, old sessions continue to exist and can be reactivated unless Sun Ray Server Software is affected.
At the command-line interface, type:
At the command-line interface, type:
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.