6.11 Load Balancing Troubleshooting

6.11.1 How Load Balancing Works
6.11.2 Verifying Load Balancing Configuration
6.11.3 Fixing Poor Load Balancing Situations

There may be times when load balancing is not working properly. This section provides details about how load balancing works and ways to troubleshoot any problems that may occur.

6.11.1 How Load Balancing Works

The first step in troubleshooting load balancing is to understand how it works. When a desktop client initiates or re-initiates a connection to the Authentication Manager (such as when a smart card is inserted or a user name is entered at the NSCM login GUI), a token is presented to the Authentication Manager (utauthd). The Authentication Manager checks whether a user session for the token is available on any of the Sun Ray servers in the failover group. If no user session is available, the load balancing process is initiated. Idle sessions are ignored during the load balancing process.

For the load balancing process, various load-related parameters and the server's total CPU power are combined into a parameter called "desirability." Then, a weighted random selection is made among all "online" servers in the same failover group, so that the token is more likely to be redirected to a server with higher desirability. Once the token is redirected to a Sun Ray server, the Authentication Manager on the selected server checks whether an idle session already exists for the token on that server. If none exists, the Authentication Manager initiates a new session.
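The order of checks described above can be modeled as a small Python sketch. It is not the Authentication Manager's implementation: Server, pick_server, and route_token are hypothetical names, and each server's desirability is passed in as a precomputed number because the actual formula combining load parameters and CPU power is not documented here. The sketch only shows the sequence (existing user session first, then weighted random selection among online servers, then idle-session reuse on the chosen server).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one Sun Ray server in a failover group."""
    name: str
    online: bool
    desirability: float  # assumed precomputed; the real formula is not shown here
    user_sessions: set = field(default_factory=set)  # tokens with a user session
    idle_sessions: set = field(default_factory=set)  # tokens with an idle session


def pick_server(servers, rng):
    """Weighted random choice among online servers, proportional to desirability."""
    candidates = [s for s in servers if s.online and s.desirability > 0]
    if not candidates:
        raise RuntimeError("no online server available for a new session")
    weights = [s.desirability for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def route_token(token, servers, rng):
    """Return (server name, action) for a token presented to the Authentication Manager."""
    # 1. An existing user session anywhere in the group wins; idle sessions are ignored here.
    for s in servers:
        if token in s.user_sessions:
            return s.name, "reconnect to user session"
    # 2. No user session: load balance across the online servers.
    target = pick_server(servers, rng)
    # 3. On the selected server, reuse an idle session for this token if one exists.
    if token in target.idle_sessions:
        return target.name, "reuse idle session"
    return target.name, "start new session"


if __name__ == "__main__":
    rng = random.Random(1)
    group = [
        Server("srv1", online=True, desirability=3.0),
        Server("srv2", online=True, desirability=1.0, user_sessions={"card-A"}),
        Server("srv3", online=False, desirability=5.0),  # offline: never picked
    ]
    print(route_token("card-A", group, rng))  # ('srv2', 'reconnect to user session')
    picks = [route_token(f"card-{i}", group, rng)[0] for i in range(1000)]
    print({name: picks.count(name) for name in ("srv1", "srv2", "srv3")})
```

In this model, srv1 receives roughly three quarters of the new sessions and the offline srv3 receives none, however high its desirability.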

Note

The reason to incorporate a weighted random selection into the load balancing algorithm is to avoid all sessions ending up on the same server when many users log in simultaneously, for example, at specific times in the morning when everyone arrives at the office.
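To see why the random element matters, the following Python sketch compares a deterministic rule ("always pick the most desirable server") with a weighted random choice during a simultaneous login burst. It assumes, purely for illustration, that every login in the burst sees the same desirability snapshot; the server names and the values in DESIRABILITY are made up.

```python
import random
from collections import Counter

# Made-up desirability snapshot, assumed not to change during the login burst.
DESIRABILITY = {"srv1": 3.0, "srv2": 2.0, "srv3": 1.0}


def pick_best(desirability):
    """Deterministic alternative: always choose the most desirable server."""
    return max(desirability, key=desirability.get)


def pick_weighted(desirability, rng):
    """Weighted random choice, with probability proportional to desirability."""
    names = list(desirability)
    return rng.choices(names, weights=[desirability[n] for n in names], k=1)[0]


def simulate_burst(logins, chooser):
    """Count how many of `logins` simultaneous sessions land on each server."""
    return Counter(chooser() for _ in range(logins))


if __name__ == "__main__":
    rng = random.Random(42)
    print(simulate_burst(300, lambda: pick_best(DESIRABILITY)))  # Counter({'srv1': 300})
    print(simulate_burst(300, lambda: pick_weighted(DESIRABILITY, rng)))
```

With the deterministic rule every login in the burst lands on srv1; the weighted choice spreads the same logins across all three servers in roughly a 3:2:1 ratio.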

Here are some examples to show when load balancing occurs:

  • A smart card user logs out, pulls out the card, and reinserts the card. Load balancing occurs when the card is reinserted.

  • An NSCM user logs out. Load balancing always occurs when the user next logs in at the NSCM login GUI.

  • A user terminates the session on a Sun Ray Client by pressing Ctrl-Alt-Bksp-Bksp. Load balancing occurs on the next login.

Here are some more general notes about load balancing:

  • A Sun Ray server that is "offline" will still provide the NSCM login GUI if NSCM is enabled. However, if a user then logs in, load balancing is triggered and the actual user session will be created on another server.

  • DHCP options that define the Sun Ray server to which a desktop client connects affect only the initial connection. The Authentication Manager then assigns the client to an online server in the failover group based on the normal load balancing process.

  • The initial connection to the Authentication Manager can be load balanced by adding several hosts to the servers= line of the .parms file and adding the select=random line. The select=random line forces the Sun Ray Client to randomly select one of the hosts in the servers= line.

  • The load balancing algorithm relies on the Sun Ray server's OS to provide correct system information. Some CPUs can run multiple threads per core. If the OS reports each of these threads as an additional core, the server's total CPU power, and therefore its desirability, is overstated relative to other servers, which can cause poor load balancing in a heterogeneous failover group.

  • Load balancing is completely unrelated to assigning DHCP addresses to desktop clients. Load balancing occurs after a desktop client obtains a DHCP address and connects to the Authentication Manager.

  • NSCM sessions are automatically terminated at logout and newly created when the user name is entered at the NSCM login GUI. Because every NSCM login therefore triggers load balancing, NSCM usually leads to a much better distribution of sessions.
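The note about CPUs with multiple threads per core can be illustrated with simple arithmetic. The Python sketch below assumes, hypothetically, that CPU power is modeled as reported cores multiplied by clock speed and that each server's share of new sessions follows that power alone; the real desirability calculation also includes load parameters and is not reproduced here. Server a has 8 physical cores with 2 threads per core, server b has 16 single-threaded cores, and both run at the same clock speed.

```python
def cpu_power(reported_cores, ghz):
    """Hypothetical CPU power figure: reported cores x clock speed (GHz)."""
    return reported_cores * ghz


def selection_share(powers):
    """Share of new sessions each server receives if selection follows power alone."""
    total = sum(powers.values())
    return {name: power / total for name, power in powers.items()}


# Counting physical cores only.
physical = {"a": cpu_power(8, 2.0), "b": cpu_power(16, 2.0)}
# The OS reports each of server a's hardware threads as a separate core.
reported = {"a": cpu_power(8 * 2, 2.0), "b": cpu_power(16, 2.0)}

print(selection_share(physical))  # a gets 1/3, b gets 2/3
print(selection_share(reported))  # {'a': 0.5, 'b': 0.5}
```

In this model server a is offered half of the new sessions instead of a third, even though a hardware thread does not add the throughput of a full core.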

6.11.2 Verifying Load Balancing Configuration

Use the following list to verify whether load balancing is properly configured in your failover group.

  • Make sure that all the servers are configured as part of a failover group through the utconfig or utsetup command, as shown below:

    Configure this server for a failover group? (y/[n])? y
    About to configure the following software products:
    .
    .
    .
    Failover group: yes 
    You have chosen to configure this server for a failover group.
    
    All servers in a failover group must share a unique signature,
    which is a string of 8 or more characters where at least two
    characters are letters and at least one is not.
    
    Enter signature:
    Re-enter signature:         
    

    The utconfig or utsetup command creates a log file in /var/adm/log (Oracle Solaris) or /var/log/SUNWut (Oracle Linux). Check this log file to verify whether the server was configured for a failover group. You can also verify that the /etc/opt/SUNWut/utadmin.conf file exists on each server.

  • Make sure all the servers are using the same group signature, which is requested by the utconfig or utsetup command.

    You can use the utgstatus command on a server to show the list of Sun Ray servers that are in the same failover group, meaning they share the same group signature. If a server is not on the list, you can use the utgroupsig command to update the group signature on the missing server.

  • Make sure the session selection policy is enabled on each server in the failover group by using the -g option of the utpolicy command.

    Beyond the session selection policy, all Sun Ray Software policy options must be identical across all servers in a failover group.

    Note

    The currently configured utpolicy options are synced from the primary server's Sun Ray data store to all the secondary servers when they are configured. Once the primary and secondary servers are configured, any utpolicy changes made on any server in the failover group are automatically replicated to all the other servers.

  • It is recommended that all servers in a failover group have an identical /etc/opt/SUNWut/auth.props file.

  • Make sure the Group Manager and load balancing are enabled in the server's /etc/opt/SUNWut/auth.props file:

    enableLoadBalancing = true
    enableGroupManager = true
    

    If either value is false, the Sun Ray server is not included in load balancing.

  • If all Sun Ray servers in the failover group are in the same subnet and the network components are dropping multicast packets, change the communication to broadcast by disabling the multicast setting in the /etc/opt/SUNWut/auth.props file:

    enableMulticast = false     
    

    By default, Sun Ray servers within the same failover group use multicast communication.

  • Make sure all the Sun Ray servers in the failover group are "online" by using the Admin GUI or the utgstatus command.

  • Make sure all the Sun Ray interfaces are up and reachable through the utgstatus command.

    Also, check the /var/opt/SUNWut/log/auth_log* files for "token query timed out" messages, for example:

    token query timed out to host labhost2 interface 192.168.128.2      
    

    In this example, labhost2 was unreachable on interface 192.168.128.2, so this interface was ignored during load balancing.

    Note

    If the group signatures match but utgstatus shows some servers or interfaces as down or unreachable even though they are up, some network component (such as bad firmware or a bad port on a switch) is likely dropping multicast packets. Try disabling the multicast setting as described above. Also, check the patch state of the interface driver, especially if complex configurations such as IPMP are used.

  • If different network interfaces are connected to the same physical switch, make sure the network interfaces have different Ethernet (MAC) addresses.

  • If a network issue is likely, run the following commands to display any errors or collisions:

                
    /bin/netstat -in 
    /bin/netstat -sn              
    

    Also, collect a few minutes of /opt/SUNWut/sbin/utcapture output to check for packet loss. Note that utcapture checks only server-to-client UDP traffic.

  • Make sure you are following the rules for mixing Sun Ray servers in a failover group. See Section 6.4, “Mixing Different Sun Ray Servers” for details.

  • Make sure the utauthd daemon is running on the Sun Ray servers in order to accept new sessions.
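Several of the checks above come down to inspecting plain text: the group signature rule quoted by utconfig (useful for checking a candidate signature before typing it at the prompt), the enableLoadBalancing and enableGroupManager settings in auth.props, the presence of utadmin.conf, and the "token query timed out" messages in auth_log. The Python sketch below scripts those checks. It is not a Sun Ray Software tool: the function names are hypothetical, it assumes auth.props uses the simple key = value layout shown above, and it only matches log lines containing the exact wording of the example message. It does not replace utgstatus or the Admin GUI.

```python
import glob
import os
import re
import string

AUTH_PROPS = "/etc/opt/SUNWut/auth.props"
UTADMIN_CONF = "/etc/opt/SUNWut/utadmin.conf"
AUTH_LOGS = "/var/opt/SUNWut/log/auth_log*"
TIMEOUT_RE = re.compile(r"token query timed out to host (\S+) interface (\S+)")


def signature_is_valid(sig):
    """Rule quoted by utconfig: 8+ characters, at least two letters, at least one non-letter."""
    letters = sum(ch in string.ascii_letters for ch in sig)
    return len(sig) >= 8 and letters >= 2 and letters < len(sig)


def parse_props(text):
    """Parse `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def auth_props_issues(text):
    """Flag settings that exclude this server from load balancing (explicit false only)."""
    props = parse_props(text)
    return [
        f"{key} is false: server is excluded from load balancing"
        for key in ("enableLoadBalancing", "enableGroupManager")
        if props.get(key, "").lower() == "false"
    ]


def unreachable_interfaces(log_text):
    """Return sorted unique (host, interface) pairs from token query timeout messages."""
    return sorted(set(TIMEOUT_RE.findall(log_text)))


def main():
    if not os.path.exists(UTADMIN_CONF):
        print(f"{UTADMIN_CONF} not found: server may not be configured for a failover group")
    if os.path.exists(AUTH_PROPS):
        with open(AUTH_PROPS) as f:
            for issue in auth_props_issues(f.read()):
                print(issue)
    for path in sorted(glob.glob(AUTH_LOGS)):
        with open(path, errors="replace") as f:
            for host, iface in unreachable_interfaces(f.read()):
                print(f"{path}: token query timed out to {host} on {iface}")


if __name__ == "__main__":
    main()
```

Run it on each server in the failover group; differing output between servers points to the configuration differences the list above warns about.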

6.11.3 Fixing Poor Load Balancing Situations

After a server outage in a failover group with two servers, almost all users may end up hosted on the server that stayed up while the other server is idle. This happens because, during the outage, the remaining server has to host all user sessions. The imbalance does not change even after both servers are available again, because load balancing is strictly limited to Sun Ray session creation. As described earlier, a session is created when a token is presented that currently has no user session, such as when a smart card is inserted that does not have a user session for its token.

There is no way to move an existing user session to another server to balance the load. To fix this problem, at least half of the current users should force the creation of new sessions so that those sessions are load balanced. For example, they could log out of an NSCM session, or log out of a smart card session and then pull out and reinsert their smart card. You can also influence load balancing by temporarily taking a heavily loaded server "offline", which prevents it from being assigned any new sessions.
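The recovery advice above can be explored with a small Python simulation. It is a model, not Sun Ray behavior: it assumes, hypothetically, that desirability is simply free capacity (an arbitrary capacity of 200 sessions minus current sessions), that affected users log out and back in one at a time, and that a server taken "offline" is excluded from selection for new sessions. The names desirability and recreate_sessions are illustrative.

```python
import random


def desirability(sessions, capacity=200):
    """Hypothetical desirability: free session capacity, never below 1."""
    return max(capacity - sessions, 1)


def recreate_sessions(load, movers, rng, offline=()):
    """`movers` users on the busiest server log out and back in, one at a time.

    Each logout frees a session on the busiest server; each login is placed by
    weighted random selection among servers not listed in `offline`.
    """
    load = dict(load)  # do not modify the caller's dictionary
    busiest = max(load, key=load.get)
    if movers > load[busiest]:
        raise ValueError("more movers than sessions on the busiest server")
    targets = [name for name in load if name not in offline]
    if not targets:
        raise ValueError("no online server available for new sessions")
    for _ in range(movers):
        load[busiest] -= 1  # the old session ends at logout
        weights = [desirability(load[name]) for name in targets]
        load[rng.choices(targets, weights=weights, k=1)[0]] += 1  # new session
    return load


if __name__ == "__main__":
    rng = random.Random(7)
    after_outage = {"srv1": 100, "srv2": 0}
    for movers in (0, 25, 50, 100):
        print(movers, recreate_sessions(after_outage, movers, rng))
    # Taking the busy server offline sends every new session to the other one.
    print(recreate_sessions(after_outage, 50, rng, offline={"srv1"}))  # {'srv1': 50, 'srv2': 50}
```

In this model even 50 recreated sessions typically leave srv1 with more sessions than srv2, because the busy server still attracts part of the new logins; taking it offline while users recreate their sessions avoids that.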