Chapter 10
Failover Groups
Sun Ray servers configured in a failover group provide users with a high level of availability when one of those servers becomes unavailable because of a network or system failure. This chapter describes how to configure failover groups.
For a discussion of how to use multiple failover groups to implement regional hotdesking, see .
This chapter covers these topics:
A failover group consists of two or more Sun Ray servers grouped together to provide highly available and scalable Sun Ray service for a population of Sun Ray DTUs. Releases earlier than 2.0 supported only DTUs that the servers could reach on a common, dedicated interconnect. Beginning with the 2.0 release, this capability was expanded to allow access across the LAN to either local or remote Sun Ray devices. However, the servers in a failover group must still be able to reach one another, using multicast or broadcast, over at least one shared subnet. Servers in a group authenticate (or "trust") one another using a common group signature, a key used to sign messages sent between servers in the group. The group signature must be configured identically on each server.
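The trust relationship can be pictured as message authentication with a shared key. The following Python sketch is only an illustration: this chapter does not say which signing algorithm Sun Ray Server Software uses, so HMAC-SHA256 is swapped in as an assumption, and `sign_message` and `verify_message` are hypothetical names. It shows why the signature must be identical on every server: a message signed with one key fails verification under any other key.

```python
import hashlib
import hmac

# Assumption: HMAC-SHA256 stands in for whatever scheme Sun Ray Server
# Software actually uses; only the shared-key idea comes from this chapter.


def sign_message(group_signature: bytes, message: bytes) -> bytes:
    """Compute a tag over the message using the shared group signature."""
    return hmac.new(group_signature, message, hashlib.sha256).digest()


def verify_message(group_signature: bytes, message: bytes, tag: bytes) -> bool:
    """Trust the message only if it was signed with the same group signature."""
    expected = sign_message(group_signature, message)
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"example-group-signature"  # hypothetical value, not a real default
    msg = b"keepalive from sr47"
    tag = sign_message(key, msg)
    print(verify_message(key, msg, tag))           # True: same signature
    print(verify_message(b"other-key", msg, tag))  # False: mismatched signature
```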
Failover groups that use more than one version of Sun Ray Server Software will be unable to use all the features provided in the latest releases. On the other hand, the failover group can be a heterogeneous group of Sun servers.
When a dedicated interconnect is used, all servers in the failover group should have access to, and be accessible by, all the Sun Ray DTUs on a given subnet. The failover environment supports the same interconnect topologies that are supported by a single-server Sun Ray environment. However, switches should be multicast-enabled.
FIGURE 10-1 illustrates a typical Sun Ray failover group. For an example of a redundant failover group, see FIGURE 10-2.
When a server in a failover group fails for any reason, each Sun Ray DTU connected to that server reconnects to another server in the same failover group. Failover occurs at the user authentication level: the DTU connects to a previously existing session for the user's token. If there is no existing session, the DTU connects to a server selected by the load-balancing algorithm. That server presents a login screen, and the user must log in again to create a new session. The state of the session on the failed server is lost.
The principal components needed to implement failover are:
The redundant failover group illustrated in FIGURE 10-2 can provide maximum resources to a few Sun Ray DTUs. The server sr47 is the primary Sun Ray server and sr48 is the secondary Sun Ray server; other secondary servers (sr49, sr50, and so on) are not shown.
The utadm command assists you in setting up a DHCP server. The default DHCP setup configures each interface for 225 hosts and uses private network addresses for the Sun Ray interconnect. For more information on using the utadm command, see the man page for utadm.
Before setting up IP addressing, you must decide upon an addressing scheme. The following examples discuss setting up class C and class B addresses.
The loss of a server usually implies the loss of its DHCP service and its allocation of IP addresses. Therefore, more DHCP addresses must be available from the address pool than there are Sun Ray DTUs. Consider the situation of 5 servers and 100 DTUs. If one of the servers fails, the remaining DHCP servers must have enough available addresses so that all "orphaned" DTUs get a new working address.
TABLE 10-1 describes how to configure five servers for 100 DTUs, accommodating the failure of two servers (class C) or four servers (class B).
The formula for address allocation is: address range (AR) = number of DTUs / (total servers - failed servers), rounded up to the next whole number. For example, if two servers are lost, each DHCP server must be given a range of 100/(5-2) = 33.3, rounded up to 34 addresses.
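The address-range formula can be checked with a few lines of Python. This is a minimal sketch; `address_range` is a hypothetical helper, not part of Sun Ray Server Software. It rounds up because a fractional address cannot be allocated and rounding down would leave some orphaned DTUs without an address.

```python
import math


def address_range(num_dtus: int, total_servers: int, failed_servers: int) -> int:
    """AR = number of DTUs / (total servers - failed servers), rounded up."""
    surviving = total_servers - failed_servers
    if surviving < 1:
        raise ValueError("at least one DHCP server must survive")
    # Round up: rounding down would leave some orphaned DTUs without an address.
    return math.ceil(num_dtus / surviving)


if __name__ == "__main__":
    for failed in (2, 4):
        ar = address_range(100, 5, failed)
        print(f"{failed} failed: {ar} addresses per server, {5 * ar} in total")
```

For the five-server, 100-DTU example, tolerating two failures needs 34 addresses per server (170 in total). Tolerating four failures needs 100 per server, or 500 in total, which exceeds the 254 usable host addresses of a class C network and is why that case calls for class B addressing.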
Ideally, each server would have an address for each DTU. This would require a class B network. Consider these conditions:
Server IP addresses assigned for the Sun Ray interconnect should all be unique. Use the utadm tool to assign them.
When the Sun Ray DTU boots, it sends a DHCP broadcast request to all possible servers on the network interface. One or more servers respond with an IP address allocated from their ranges of addresses. The DTU accepts the first IP address that it receives and configures itself to send and receive at that address.
The accepted DHCP response also contains information about the IP address and port numbers of the Authentication Managers on the server that sent the response.
The DTU then attempts to establish a TCP connection to an Authentication Manager on that server. If it is unable to connect, it uses a protocol similar to DHCP in which it uses a broadcast message to ask the Authentication Managers to identify themselves. The DTU then attempts to connect to the Authentication Managers that responded in the order in which the responses were received.
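The start-up order described above (accept the first DHCP offer, try that server's Authentication Manager, then fall back to the broadcast responders in arrival order) can be modeled as follows. This is a simplified sketch, not DTU firmware logic: `choose_auth_manager` is a hypothetical name, and `can_connect` is a stand-in for opening a TCP connection.

```python
from typing import Callable, Optional, Sequence


def choose_auth_manager(
    dhcp_offers: Sequence[str],
    am_responses: Sequence[str],
    can_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the server whose Authentication Manager the DTU ends up using.

    dhcp_offers: servers whose DHCP offers arrived, in arrival order.
    am_responses: servers that answered the Authentication Manager
        broadcast, in arrival order.
    can_connect: hypothetical stand-in for a TCP connection attempt.
    """
    if not dhcp_offers:
        return None  # no address, so the DTU cannot proceed
    # The DTU accepts the first offer and first tries that server's AM.
    dhcp_server = dhcp_offers[0]
    if can_connect(dhcp_server):
        return dhcp_server
    # Otherwise it tries the broadcast responders in the order they answered.
    for server in am_responses:
        if can_connect(server):
            return server
    return None


if __name__ == "__main__":
    reachable = {"sr48", "sr49"}
    print(choose_auth_manager(["sr47", "sr48"], ["sr49", "sr48"],
                              lambda s: s in reachable))
```

Here sr47 answered DHCP first but its Authentication Manager is unreachable, so the DTU moves on to sr49, the first broadcast responder it can reach.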
Once a TCP connection to an Authentication Manager has been established, the DTU presents its token. The token is either a pseudo-token representing the individual DTU (its unique Ethernet address) or a smart card. The Session Manager then starts an X window/X server session and binds the token to that session.
The Authentication Manager then sends a query to all of the other Authentication Managers on the same subnet and asks for information about existing sessions for the token. The other Authentication Managers respond, indicating whether there is a session for the token and the last time the token was connected to the session.
The requesting Authentication Manager selects the server with the latest connection time and redirects the DTU to that server. If no session is found for the token, the requesting Authentication Manager selects the server with the lightest load and redirects the DTU to that server, where a new session is created for the token.
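The selection rule reduces to two cases: the most recently used existing session wins, and otherwise the lightest-loaded server gets a new session. The sketch below assumes each reply carries a flag and a last-connection timestamp and that load is a single comparable number per server; `SessionReply` and `select_server` are hypothetical names, and the real query format is not described in this chapter.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SessionReply:
    """One Authentication Manager's answer to the session query (hypothetical)."""
    server: str
    has_session: bool
    last_connect: float = 0.0  # when the token was last connected, if at all


def select_server(replies: Sequence[SessionReply], loads: Dict[str, float]) -> str:
    """Prefer the most recently used session; otherwise the lightest load."""
    owners = [r for r in replies if r.has_session]
    if owners:
        # Latest connection time wins when several servers hold a session.
        return max(owners, key=lambda r: r.last_connect).server
    # No session anywhere: pick the lightest-loaded server for a new session.
    # Raises ValueError if loads is empty (no server available).
    return min(loads, key=lambda name: loads[name])


if __name__ == "__main__":
    replies = [
        SessionReply("sr47", True, 100.0),
        SessionReply("sr48", True, 250.0),
        SessionReply("sr49", False),
    ]
    print(select_server(replies, {"sr47": 0.7, "sr48": 0.4, "sr49": 0.1}))
```

In the usage example the token has sessions on sr47 and sr48, so sr48 (the later connection) is chosen even though sr49 is less loaded.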
The Authentication Manager enables both implicit (smart card) and explicit switching. For explicit switching, see Group Manager.
In a large IP network, a DHCP server distributes the IP addresses and other configuration information for interfaces on that network.
The Sun Ray DHCP server can coexist with DHCP servers on other subnets, provided you isolate the Sun Ray DHCP server from other DHCP traffic. Verify that all routers on the network are configured not to relay DHCP requests. This is the default behavior for most routers.
If the Sun Ray server has multiple interfaces, one of which is the Sun Ray interconnect, the Sun Ray DHCP server should be able to manage both the Sun Ray interconnect and the other interfaces without cross-interference.
To Set Up IP Addressing on Multiple Servers Each With One Sun Ray Interface
1. Log in to the Sun Ray server as superuser, open a shell window, and type:
where <interface_name> is the name of the Sun Ray network interface to be configured, for example, hme[0-9], qfe[0-9], or ge[0-9]. You must be logged in as superuser to run this command. The utadm script configures the interface (for example, hme1) on the subnet (in this example, 128).
The script displays default values, such as the following:
The default values are the same for each server in a failover group. Certain values must be changed to be unique to each server.
2. When you are asked to accept the default values, type n:
3. Change the second server's IP address to a unique value, in this case 192.168.128.2:
4. Accept the default values for netmask, host name, and net name:
5. Change the DTU address ranges for the interconnect to unique values. For example:
Do you want to offer IP addresses for this interface? [Y/N]:
new first Sun Ray address: [192.168.128.16] 192.168.128.50
number of Sun Ray addresses to allocate: [205] 34
6. Accept the default firmware server and router values:
The utadm script asks if you want to specify an authentication server list:
These servers are specified by a file containing a space-delimited list of server IP addresses or by manually entering the server IP addresses.
The newly selected values for interface hme1 are displayed:
7. If these are correct, accept the new values:
8. Stop and restart the server and power cycle the DTUs to download the firmware.
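Step 5 requires each server's DTU address range to be unique. One way to plan the ranges, assuming they are laid out back to back starting at the default first address, is sketched below; `plan_ranges` is a hypothetical helper and the back-to-back layout is an assumption, although with 34 addresses per server it reproduces the 192.168.128.50 starting address used for the second server in step 5.

```python
import ipaddress
from typing import List, Tuple


def plan_ranges(first: str, per_server: int, servers: int,
                subnet: str = "192.168.128.0/24") -> List[Tuple[str, str]]:
    """Return (first, last) DTU addresses for each server, back to back."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.IPv4Address(first)
    plan = []
    for i in range(servers):
        lo = start + i * per_server  # first address for server i + 1
        hi = lo + per_server - 1     # last address for server i + 1
        # Every range must stay inside the subnet and avoid the broadcast address.
        if lo not in net or hi not in net or hi == net.broadcast_address:
            raise ValueError(f"server {i + 1}'s range does not fit in {subnet}")
        plan.append((str(lo), str(hi)))
    return plan


if __name__ == "__main__":
    for n, (lo, hi) in enumerate(plan_ranges("192.168.128.16", 34, 5), 1):
        print(f"server {n}: {lo} - {hi}")
```

Because the helper raises an error when a range spills out of the subnet, it also shows quickly when a class C interconnect is too small for the chosen range size.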
TABLE 10-2 lists the options available for the utadm command. For additional information, see the utadm man page.
Every server has a Group Manager module that monitors availability and facilitates redirection. It works in conjunction with the Authentication Manager.
In setting policies, the Authentication Manager uses the selected authentication modules and decides what tokens are valid and which users have access.
Each Group Manager builds a map of the failover group topology by exchanging keepalive messages with the other Group Managers. These keepalive messages are sent to a well-known UDP port (typically 7009) on all configured network interfaces. The keepalive message contains enough information for each Sun Ray server to construct a list of servers and the common subnets that each server can access. In addition, the Group Manager remembers the last time that a keepalive message was received from each server on each interface.
The keepalive message contains the following information about the server:
Note - The last two items are used to facilitate load distribution. See Load Balancing.
The information maintained by the Group Manager is used primarily for server selection when a token is presented. The server and subnet information is used to determine the servers to which a given DTU can connect. These servers are queried about sessions belonging to the token. Servers whose last keepalive message is older than the timeout are deleted from the list, since either the network connection or the server is probably down.
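The bookkeeping described above (last keepalive time per server and interface, with stale entries dropped after a timeout) can be sketched as a small table. This is an illustration only: `GroupView` is a hypothetical class, the timeout value in the example is arbitrary, and the actual timeout is one of the Group Manager properties in auth.props.

```python
import time
from typing import Dict, Optional, Set, Tuple


class GroupView:
    """Hypothetical model of one Group Manager's view of the failover group."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        # (server, subnet) -> time the last keepalive arrived on that subnet
        self.last_seen: Dict[Tuple[str, str], float] = {}

    def record_keepalive(self, server: str, subnet: str,
                         now: Optional[float] = None) -> None:
        self.last_seen[(server, subnet)] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> None:
        """Drop entries whose last keepalive is older than the timeout."""
        now = time.monotonic() if now is None else now
        stale = [k for k, t in self.last_seen.items() if now - t > self.timeout]
        for key in stale:
            del self.last_seen[key]

    def servers_on(self, subnet: str) -> Set[str]:
        """Servers that a DTU on this subnet could be redirected to."""
        return {server for (server, net) in self.last_seen if net == subnet}


if __name__ == "__main__":
    view = GroupView(timeout=30.0)  # arbitrary example timeout
    view.record_keepalive("sr47", "192.168.128.0", now=0.0)
    view.record_keepalive("sr48", "192.168.128.0", now=25.0)
    view.prune(now=40.0)
    print(view.servers_on("192.168.128.0"))
```

At time 40, sr47 has been silent for 40 seconds, longer than the 30-second example timeout, so only sr48 remains a candidate for that subnet.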
In addition to automatic redirection at authentication, you can use the utselect graphical user interface (GUI) or utswitch command for manual redirection.
Note - The utselect GUI is the preferred method to use for server selection. For more information, see the utselect man page.
The Authentication Manager configuration file, /etc/opt/SUNWut/auth.props, contains properties used by the Group Manager at runtime. The properties are:
These properties have default values that are rarely changed. Only very knowledgeable Sun support personnel should direct customers to change these values to help tune or debug their systems. If any properties are changed, they must be changed on every server in the failover group, because the auth.props file must be identical on all servers in the group.
Property changes do not take effect until the Authentication Manager is restarted.
As superuser, open a shell window and type:
The Authentication Manager is restarted.
At the time of a server failure, the Group Manager on each remaining server attempts to distribute the failed server's sessions evenly among the remaining servers. The load balancing algorithm takes into account each server's capacity (number and speed of its CPUs) and load so that larger or less heavily loaded servers host more sessions.
When the Group Manager receives a token from a Sun Ray DTU and finds that no server owns an existing session for that token, it redirects the Sun Ray DTU to the server in the group with the lightest load. As a result, a Sun Ray DTU can appear to connect twice: once on the server that answered its DHCP request and a second time on a server that was less loaded than the first.
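The chapter states only that capacity (number and speed of CPUs) and load both influence placement. The following sketch makes two assumptions that are not from the source: capacity is modeled as CPU count times clock speed, and orphaned sessions are placed one at a time on the server with the lowest sessions-to-capacity ratio. `Server` and `redistribute` are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List


@dataclass
class Server:
    name: str
    cpus: int
    cpu_mhz: int
    sessions: int  # sessions already hosted

    @property
    def capacity(self) -> int:
        # Hypothetical capacity metric: CPU count times clock speed.
        return self.cpus * self.cpu_mhz

    @property
    def relative_load(self) -> Fraction:
        # Exact ratio, so equal loads compare as equal.
        return Fraction(self.sessions, self.capacity)


def redistribute(orphans: int, survivors: List[Server]) -> Dict[str, int]:
    """Place orphaned sessions one at a time on the least relatively loaded server."""
    placed = {s.name: 0 for s in survivors}
    for _ in range(orphans):
        target = min(survivors, key=lambda s: s.relative_load)
        target.sessions += 1
        placed[target.name] += 1
    return placed


if __name__ == "__main__":
    survivors = [Server("sr48", 4, 1000, 0), Server("sr49", 2, 1000, 0)]
    print(redistribute(30, survivors))  # {'sr48': 20, 'sr49': 10}
```

With twice the capacity, sr48 ends up hosting twice as many of the 30 orphaned sessions, matching the chapter's statement that larger or less heavily loaded servers host more sessions.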
A failover group is one in which two or more Sun Ray servers use a common policy and share services. It is composed of a primary server and one or more secondary servers. For such a group, you must configure a Sun Ray Data Store to enable replication of the Sun Ray administration data across the group.
The utconfig command sets up the internal database for a single system initially, and enables the Sun Ray servers for failover. The utreplica command then configures the Sun Ray servers as a failover group.
Log files for Sun Ray servers contain time-stamped error messages which are difficult to interpret if the time is out of sync. To make troubleshooting easier, all secondary servers should periodically synchronize with their primary server.
Tip - Use rdate <primary-host>, preferably scheduled with crontab, to synchronize secondary servers with their primary server.
Layered administration of the group takes place on the primary server. The utreplica command designates a primary server, advises the server of its Administration Primary status, and tells it the host names of all the secondary servers.
Tip - Configure the primary server before you configure the secondary servers.
As superuser, open a shell window on the primary server and type:
where secondary_server1 [secondary_server2...] is a space-separated list of unique host names of the secondary servers.
The secondary servers in the group store a replicated version of the primary server's administration data. Use the utreplica command to advise each secondary server of its secondary status and also the host name of the primary server for the group.
As superuser, open a shell window on the secondary server and type:
where primary-server is the hostname of the primary server.
To include an additional secondary server in an already configured failover group:
1. On the primary server, rerun utreplica -p -a with a list of secondary servers.
2. Run utreplica -s primary-server on the new secondary server.
As superuser, open a shell window and type:
This removes the replication configuration.
As superuser, open a shell window and type:
The result indicates whether the server is standalone, primary (with the secondary host names), or secondary (with the primary host name).
A failover group is a set of Sun Ray servers all running the same release of Sun Ray Server Software and all having access to all the Sun Ray DTUs on the interconnect.
To View Failover Group Status
1. From the navigation menu in the Admin GUI, select the arrow to the left of Failover Group to expand the menu.
The Failover Group Status window is displayed.
The Failover Group Status window describes the health and current state of multiple Sun Ray servers within your failover group. This window also describes the health of any Sun Ray servers that have responded to a Sun Ray broadcast.
The Failover Group Status window provides information on group membership and network connectivity. The servers are listed by name in the first column. Failover Group Status only displays public networks and Sun Ray interconnect fabrics.
In FIGURE 10-3, the information is shown from the point of view of the server in the upper-left corner of the table, in this example ray-146.
Note - Sun Ray server broadcasts do not traverse routers or servers other than Sun Ray servers.
These icons depict current failover group status:
If one of the servers of a failover group fails, the remaining group members operate from the administration data that existed prior to the failure.
The recovery procedure depends on the severity of the failure and whether a primary or secondary server has failed.
Note - When the primary server fails, you cannot make administrative changes to the system. For replication to work, all changes must be successful on the primary server.
There are several strategies for recovering the primary server. The following procedure is performed on the server that was previously the primary, after it has been made fully operational again.
Use this procedure to rebuild the primary server administration data store from a secondary server. This procedure uses the same hostname for the replacement server.
1. On one of the secondary servers, capture the current data store to a file called /tmp/store:
This provides an LDIF format file of the current database.
2. FTP this file to the /tmp directory on the primary server.
3. Follow the directions in the Sun Ray Server Software 3.1.1 Installation and Configuration Guide to install Sun Ray Server Software.
4. After running utinstall, configure the server as a primary server for the group. Make sure that you use the same admin password and group signature.
5. Shut down the Sun Ray services, including the data store:
This populates the primary server and synchronizes its data with the secondary server. The replacement server is now ready for operation as the primary server.
8. (Optional) Confirm that the data store is repopulated:
9. (Optional) Perform any additional configuration procedures.
Note - This procedure is also known as promoting a secondary server to primary.
1. Choose a server in the existing failover group to be promoted and configure it as the primary server:
2. Reconfigure each of the remaining secondary servers in the failover group to use the new primary server:
This resynchronizes the secondary server with the new primary server.
Where a secondary server has failed, administration of the group can continue. A log of updates is maintained and applied automatically to the secondary server when it has recovered. If the secondary server needs to be reinstalled, repeat the steps described in the Sun Ray Server Software 3.1.1 Installation and Configuration Guide.
The utconfig command asks for a group signature if you chose to configure for failover. The signature, which is stored in the /etc/opt/SUNWut/gmSignature file, must be the same on all servers in the group.
The location can be changed in the gmSignatureFile property of the auth.props file.
To form a fully functional failover group, the signature file must:
1. Log in to the Sun Ray server as superuser, open a shell window, and type:
You are prompted for the signature.
2. Enter the signature twice; both entries must be identical for it to be accepted.
3. For each Sun Ray server in the group, repeat the steps, starting at step 1.
Note - It is important to use the utgroupsig command, rather than any other method, to enter the signature, because utgroupsig also ensures that internal database replication occurs properly.
Being able to take servers offline makes maintenance easier. In an offline state, no new sessions are created. However, old sessions continue to exist and can be reactivated unless Sun Ray Server Software is affected.
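The offline rule (no new sessions, but existing sessions can still be reactivated) can be expressed as a filter applied only when a new session is needed. This sketch is hypothetical: `route_token` and its inputs are illustrative, and it does not model the case in which Sun Ray Server Software itself is affected.

```python
from typing import Dict, Optional, Set


def route_token(token: str, sessions: Dict[str, str], online: Set[str],
                loads: Dict[str, float]) -> Optional[str]:
    """Return the server that should handle the token.

    sessions maps a token to the server that holds its session.
    online holds the servers that currently accept new sessions.
    """
    owner = sessions.get(token)
    if owner is not None:
        # An existing session can be reactivated even on an offline server.
        return owner
    # New sessions go only to online servers, lightest load first.
    candidates = {s: load for s, load in loads.items() if s in online}
    if not candidates:
        return None
    return min(candidates, key=lambda s: candidates[s])


if __name__ == "__main__":
    sessions = {"card-1": "sr47"}
    online = {"sr48", "sr49"}  # sr47 has been taken offline
    loads = {"sr47": 0.1, "sr48": 0.5, "sr49": 0.3}
    print(route_token("card-1", sessions, online, loads))
    print(route_token("card-2", sessions, online, loads))
```

Although sr47 is the least loaded server, it receives no new sessions while offline; card-1 still returns to its existing session there, and card-2 is sent to sr49.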
At the command-line interface, type:
At the command-line interface, type:
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.