7.7. Troubleshooting Arrays and Load Balancing

This section describes some typical problems when using SGD servers, and how to fix them.

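The following troubleshooting topics are covered:

  • Section 7.7.1, “Troubleshooting Array Resilience”

  • Section 7.7.2, “Troubleshooting Clock Synchronization Issues”

  • Section 7.7.3, “Troubleshooting Advanced Load Management”

  • Section 7.7.4, “SGD Uses Too Much Network Bandwidth”

  • Section 7.7.5, “Users Cannot Connect to an SGD Server When It Is In Firewall Traversal Mode”

  • Section 7.7.6, “Users Cannot Relocate Their Sessions”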

7.7.1. Troubleshooting Array Resilience

To help you to diagnose and fix problems when using array resilience, you can do the following:

  • Show status information for the SGD array

  • Enable array resilience logging

7.7.1.1. Showing Status Information For an SGD Array

You use the tarantella status command on an SGD server to show status information for the server.

This section includes some examples of using tarantella status to show status information for an SGD array when the primary server in the array goes down. Section 7.1.6.2.1, “Primary Server Goes Down” includes a detailed description of this array resilience scenario.

The original network configuration used for the examples is a three-node array of SGD servers in the domain example.com, as follows:

  • Primary server – boston

  • Secondary servers – newyork, detroit

When the primary server boston goes down, running tarantella status on newyork indicates that there is a connection problem with the SGD array, as follows:

$ tarantella status
Array members (3):
 - newyork.example.com (secondary): Accepting standard connections.
 - boston.example.com (primary): NOT ACCEPTING CONNECTIONS.
 - detroit.example.com (secondary): Accepting standard connections.
...
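If you want to check this output from a script, the following Python sketch shows one way to pick out array members that are not accepting connections. It assumes the member lines look exactly like the example above; the function name and the sample text are illustrative only, and real output might differ between SGD releases.

```python
import re

# Matches member lines as they appear in the example output above, e.g.
# " - boston.example.com (primary): NOT ACCEPTING CONNECTIONS."
MEMBER_LINE = re.compile(r"^\s*-\s+(\S+)\s+\((primary|secondary)\):\s+(.*)$")

def unavailable_members(status_output):
    """Return (host, role) pairs for members not accepting connections."""
    down = []
    for line in status_output.splitlines():
        match = MEMBER_LINE.match(line)
        if match and "NOT ACCEPTING" in match.group(3).upper():
            down.append((match.group(1), match.group(2)))
    return down

# Sample text copied from the example above (assumed format).
SAMPLE = """\
Array members (3):
 - newyork.example.com (secondary): Accepting standard connections.
 - boston.example.com (primary): NOT ACCEPTING CONNECTIONS.
 - detroit.example.com (secondary): Accepting standard connections.
"""

print(unavailable_members(SAMPLE))
```

In practice you would pass the captured output of tarantella status to the function instead of the sample text.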

If the SGD servers in the array do not agree on the array membership, tarantella status shows the array configuration as seen by every SGD server in the array. For example, running tarantella status on newyork during the failover stage might show the following information:

$ tarantella status
Inconsistent array: the servers report different array membership.
...
boston.example.com reports an error:
 - Host is unavailable
 
newyork.example.com reports 3 members as:
 - newyork.example.com
 - boston.example.com
 - detroit.example.com
 
detroit.example.com reports 1 member as:
 - detroit.example.com
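The per-server reports in this output can be compared programmatically. The following Python sketch groups the reported members by the server that reported them and checks whether all servers agree. It assumes the block headers are worded exactly as in the example above; the function names are hypothetical.

```python
import re

# Block headers as they appear in the example output, for instance
# "newyork.example.com reports 3 members as:" or
# "boston.example.com reports an error:".
HEADER = re.compile(r"^(\S+) reports (?:(\d+) members? as|an error):$")

def membership_views(status_output):
    """Map each reporting server to its member list, or None on error."""
    views = {}
    current = None
    for raw in status_output.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            # group(2) is the member count; it is absent for error blocks.
            views[current] = [] if header.group(2) else None
        elif line.startswith("- ") and current and views[current] is not None:
            views[current].append(line[2:].strip())
    return views

def servers_agree(views):
    """True only if every server answered and all reported the same members."""
    if not views or any(members is None for members in views.values()):
        return False
    return len({frozenset(members) for members in views.values()}) == 1

# Sample text copied from the example above (assumed format).
SAMPLE = """\
Inconsistent array: the servers report different array membership.
boston.example.com reports an error:
 - Host is unavailable

newyork.example.com reports 3 members as:
 - newyork.example.com
 - boston.example.com
 - detroit.example.com

detroit.example.com reports 1 member as:
 - detroit.example.com
"""

views = membership_views(SAMPLE)
print(servers_agree(views))
```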

The tarantella status command indicates if the array is in a repaired state. For example, running tarantella status from detroit after the failover stage has completed might show the following information:

$ tarantella status
Array members (2):
 - newyork.example.com (primary)
 - detroit.example.com (secondary)
...
This node is in a repaired array. Any alterations to array state will prevent recovery
of the original array.
Use the tarantella status --originalstate command to see the original array state.

You use the --originalstate option to list the members of the array before it was repaired. For example, using the --originalstate option on any server in the array shows the original array members, as follows:

$ tarantella status --originalstate
Original array members (3):
 - boston.example.com (primary)
 - newyork.example.com (secondary)
 - detroit.example.com (secondary)
...

After the recovery stage, you can use the tarantella status command to verify that the original array formation has been recreated. For example, running tarantella status on newyork might display the following information:

$ tarantella status
Array members (3):
 - newyork.example.com (secondary): Accepting standard connections.
 - boston.example.com (primary): Accepting standard connections.
 - detroit.example.com (secondary): Accepting standard connections.
...

7.7.1.2. Enabling Array Resilience Logging

To enable logging for array resilience, add the following log filters in the Log Filter field on the Global Settings → Monitoring tab in the Administration Console:

server/failover/*:failover%%PID%%.log
server/failover/*:failover%%PID%%.jsl

See Section 7.4.3, “Using Log Filters to Troubleshoot Problems With an SGD Server” for more information on configuring and using SGD log filters.
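The two filters differ only in the file extension of the destination. The following Python sketch generates such a pair from a component name and a file name prefix. It assumes only the pattern visible in the filters shown in this section; the helper name is hypothetical and is not part of SGD.

```python
def log_filter_pair(component, basename):
    """Build the .log and .jsl filters for one server component.

    The pattern "server/<component>/*:<basename>%%PID%%.<ext>" is taken
    from the examples in this section and is an assumption for any
    other component name.
    """
    return [
        f"server/{component}/*:{basename}%%PID%%.{ext}"
        for ext in ("log", "jsl")
    ]

for line in log_filter_pair("failover", "failover"):
    print(line)
```

The printed lines can be pasted into the Log Filter field, one filter per line.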

7.7.2. Troubleshooting Clock Synchronization Issues

Problems can arise if the clocks on the SGD servers in an array are not synchronized. If possible, use NTP software or the rdate command to ensure the clocks on all SGD hosts are synchronized.

You run the tarantella status command on the primary SGD server to show any clock synchronization issues for the array. The following example indicates that the clock on the secondary server newyork.example.com is out of synchronization.

$ tarantella status
Array members (3):
 - boston.example.com (primary): Accepting standard connections.
 - newyork.example.com (secondary): Accepting standard connections.
 - detroit.example.com (secondary): Accepting standard connections.
 
WARNING: The clocks on the array nodes are not synchronized.
The following array members disagree with the primary:
 - newyork.example.com

If clocks are out of synchronization, a warning message is also displayed on the Secure Global Desktop Servers tab of the Administration Console.

You use the --byserver option of tarantella status to display the clock setting on each SGD server in the array, as follows:

$ tarantella status --byserver
 
boston.example.com:
 - Array member (primary): Accepting standard connections.
 ...
 - Current time reported: Wed Apr 28 09:36:16 BST 2010
 
newyork.example.com:
 - Array member (secondary): Accepting standard connections.
 ...
 - Current time reported: Wed Apr 28 09:38:02 BST 2010
 
detroit.example.com:
 - Array member (secondary): Accepting standard connections.
 ...
 - Current time reported: Wed Apr 28 09:36:16 BST 2010
 
WARNING: The clocks on the array nodes are not synchronized.
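To see how far each clock has drifted, you can compare the reported times. The following Python sketch parses the "Current time reported" lines and computes the offset of each server from a reference server. It assumes the output format shown above and that all servers report in the same time zone, so the zone abbreviation is ignored; the function names are illustrative.

```python
import re
from datetime import datetime

HOST_LINE = re.compile(r"^(\S+):$")
# Captures the date and time, the zone abbreviation, and the year, e.g.
# "Current time reported: Wed Apr 28 09:36:16 BST 2010"
TIME_LINE = re.compile(
    r"Current time reported:\s+(\w{3} \w{3} +\d+ [\d:]+) (\S+) (\d{4})"
)

def reported_times(byserver_output):
    """Map each server name to the time it reported (time zone ignored)."""
    times = {}
    host = None
    for raw in byserver_output.splitlines():
        line = raw.strip()
        host_match = HOST_LINE.match(line)
        if host_match:
            host = host_match.group(1)
            continue
        time_match = TIME_LINE.search(line)
        if time_match and host:
            stamp = f"{time_match.group(1)} {time_match.group(3)}"
            times[host] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return times

def skew_seconds(times, reference):
    """Offset in seconds of every other server from the reference server."""
    base = times[reference]
    return {
        host: (stamp - base).total_seconds()
        for host, stamp in times.items()
        if host != reference
    }

# Sample text copied from the example above (assumed format).
SAMPLE = """\
boston.example.com:
 - Array member (primary): Accepting standard connections.
 - Current time reported: Wed Apr 28 09:36:16 BST 2010

newyork.example.com:
 - Array member (secondary): Accepting standard connections.
 - Current time reported: Wed Apr 28 09:38:02 BST 2010

detroit.example.com:
 - Array member (secondary): Accepting standard connections.
 - Current time reported: Wed Apr 28 09:36:16 BST 2010
"""

print(skew_seconds(reported_times(SAMPLE), "boston.example.com"))
```

For the example output, newyork.example.com is 106 seconds ahead of the primary and detroit.example.com shows no offset.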

7.7.3. Troubleshooting Advanced Load Management

If you experience problems with the Least CPU Usage and Most Free Memory methods of application load balancing, you can get information from the following places to help you understand what is happening:

  • SGD server log files

    Add the following filters to the Log Filter field on the Global Settings → Monitoring tab in the Administration Console:

    server/tier3loadbalancing/*:t3loadbal%%PID%%.log
    server/tier3loadbalancing/*:t3loadbal%%PID%%.jsl

    This provides detailed information about the decision to run an application and the data being sent by the application server.

    See Section 7.4.3, “Using Log Filters to Troubleshoot Problems With an SGD Server” for more information on configuring and using SGD log filters.

  • SGD Enhancement Module logs

    For UNIX or Linux platform application servers, these are in the /opt/tta_tem/var/log/tier3loadprobePID_error.log file.

    For Windows application servers, this information is displayed in the Event Viewer.

  • Load balancing service connection Common Gateway Interface (CGI) program

    Go to the https://applicationserver:3579?get&ttalbinfo URL.

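You can use this information to troubleshoot the following common problems:

  • Section 7.7.3.1, “The Load Balancing Service Is Not Working”

  • Section 7.7.3.2, “SGD Ignores an Application Server Load Balancing Properties File”

  • Section 7.7.3.3, “One of the Application Servers Is Never Picked”

  • Section 7.7.3.4, “One of the Application Servers Is Always Picked”

  • Section 7.7.3.5, “Two Identical Application Servers, But One Runs More Applications Than the Other”

  • Section 7.7.3.6, “The SGD Server Log File Shows an Update Received for an Unknown ID”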

7.7.3.1. The Load Balancing Service Is Not Working

If you think the load balancing service is not working, check the following.

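Questions

  • 7.7.3.1.1: Is the SGD Enhancement Module installed and running?

  • 7.7.3.1.2: Is the primary SGD server running?

  • 7.7.3.1.3: Is your firewall blocking the load balancing service?

  • 7.7.3.1.4: What do the log files show?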

Questions and Answers

7.7.3.1.1: Is the SGD Enhancement Module installed and running?

On Microsoft Windows application servers, use Control Panel → Administrative Tools → Services to check whether the Tarantella Load Balancing Service is listed and is started.

On UNIX and Linux platform application servers, run the following command as superuser (root) to check that load balancing processes are running:

# /opt/tta_tem/bin/tem status

7.7.3.1.2: Is the primary SGD server running?

The load balancing service on the application server sends load information to the primary SGD server. If the primary is not available, SGD uses Fewest application sessions as the method for load balancing application servers.

7.7.3.1.3: Is your firewall blocking the load balancing service?

For the load balancing service to work, the firewall must allow the following connections:

  • A TCP connection on port 3579 between the SGD server and the application server.

  • A UDP connection on port 3579 between the application server and the SGD server.

Note

These connections do not need to be authenticated.
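A quick way to test the TCP part of this requirement is to attempt a connection on port 3579. The following Python sketch does this with the standard socket module. It assumes you run it on the SGD host against the application server; the function name is hypothetical. It cannot confirm the UDP path, because UDP has no connection handshake to observe.

```python
import socket

def tcp_port_open(host, port=3579, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    A False result means the connection was refused, timed out, or the
    host name could not be resolved. It does not say which of these
    happened, and it says nothing about UDP traffic on the same port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_port_open("applicationserver") returns False if a firewall drops or rejects the connection, where applicationserver stands for the name of your application server.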

7.7.3.1.4: What do the log files show?

Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.

7.7.3.2. SGD Ignores an Application Server Load Balancing Properties File

After creating a load balancing properties file for an application server, you must do a warm restart of the primary SGD server for the file to take effect.

Before you restart, ensure that no users are logged in to the SGD server, and that there are no application sessions, including suspended application sessions, running on the SGD server. Then run the following command as superuser (root):

# tarantella restart sgd --warm

7.7.3.3. One of the Application Servers Is Never Picked

If one of the application servers is never picked to run applications, check the following.

Questions

  • 7.7.3.3.1: Is the load balancing service running on the application server?

  • 7.7.3.3.2: Is the application server available to run applications?

  • 7.7.3.3.3: What do the log files show?

Questions and Answers

7.7.3.3.1: Is the load balancing service running on the application server?

See Section 7.7.3.1, “The Load Balancing Service Is Not Working”.

7.7.3.3.2: Is the application server available to run applications?

Check the application server object in the Administration Console. Ensure the Application Start check box is selected on the General tab for the application server object.

Check that the application server is up.

7.7.3.3.3: What do the log files show?

Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.

7.7.3.4. One of the Application Servers Is Always Picked

If one application server is always picked to run applications regardless of its load, check the following.

Questions

  • 7.7.3.4.1: Is more than one application server configured to run the application?

  • 7.7.3.4.2: Are the other application servers available to run applications?

  • 7.7.3.4.3: Is the correct load balancing method selected?

  • 7.7.3.4.4: Are you using server affinity?

  • 7.7.3.4.5: Is the load balancing service running on the application server?

  • 7.7.3.4.6: What do the log files show?

Questions and Answers

7.7.3.4.1: Is more than one application server configured to run the application?

Check the Hosting Application Servers tab for the application object.

7.7.3.4.2: Are the other application servers available to run applications?

Check the application server objects in the Administration Console. Ensure the Application Start check box is selected on the General tab for each application server object.

Check that all the application servers are up.

7.7.3.4.3: Is the correct load balancing method selected?

In the Administration Console, check that either Most Free Memory or Least CPU Usage is selected as the load balancing method on the Performance tab for the application object, or on the Global Settings → Performance tab.

7.7.3.4.4: Are you using server affinity?

Server affinity means that, if possible, SGD starts an application on the same application server as the last application started by the user. Server affinity is on by default, so it can cause the same application server to be picked repeatedly. See Section 7.2.5.5, “Server Affinity”.

7.7.3.4.5: Is the load balancing service running on the application server?

See Section 7.7.3.1, “The Load Balancing Service Is Not Working”.

7.7.3.4.6: What do the log files show?

Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.

7.7.3.5. Two Identical Application Servers, But One Runs More Applications Than the Other

Check that the server weighting values for the two servers are the same. See Section 7.2.7.1, “Application Server's Relative Power”.

7.7.3.6. The SGD Server Log File Shows an Update Received for an Unknown ID

The SGD server log file might show an information message containing the following text:

Got an update for unknown id from machine applicationserver

This message can be ignored. It occurs only when the primary SGD server is restarted.

7.7.4. SGD Uses Too Much Network Bandwidth

If SGD is using a lot of network bandwidth, set the Bandwidth Limit attribute for a user profile to reduce the maximum allowable bandwidth the user can use.

Note

Reducing the available bandwidth might have implications for application usability.

In the Administration Console, go to the User Profiles tab and select the user profile object you want to configure. Go to the Performance tab and select a value from the Bandwidth Limit list.

Alternatively, use the following command:

$ tarantella object edit --name obj --bandwidth bandwidth

The following are the available bandwidths:

Administration Console    Command Line
----------------------    ------------
2400 bps                  2400
4800 bps                  4800
9600 bps                  9600
14.4 Kbps                 14400
19.2 Kbps                 19200
28.8 Kbps                 28800
33.6 Kbps                 33600
38.8 Kbps                 38800
57.6 Kbps                 57600
64 Kbps                   64000
128 Kbps                  128000
256 Kbps                  256000
512 Kbps                  512000
768 Kbps                  768000
1 Mbps                    1000000
1.5 Mbps                  1500000
10 Mbps                   10000000
None                      0

Note

None is the default. This means there is no limit on bandwidth usage.
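If you set the bandwidth limit from a script, a lookup table avoids mistyping the command-line values. The following Python sketch maps the Administration Console labels listed above to their command-line values and builds the argument list for the tarantella object edit command shown earlier. The function name and the object name in the example are hypothetical.

```python
# Administration Console label -> command-line value, as listed above.
BANDWIDTH_VALUES = {
    "2400 bps": 2400,
    "4800 bps": 4800,
    "9600 bps": 9600,
    "14.4 Kbps": 14400,
    "19.2 Kbps": 19200,
    "28.8 Kbps": 28800,
    "33.6 Kbps": 33600,
    "38.8 Kbps": 38800,
    "57.6 Kbps": 57600,
    "64 Kbps": 64000,
    "128 Kbps": 128000,
    "256 Kbps": 256000,
    "512 Kbps": 512000,
    "768 Kbps": 768000,
    "1 Mbps": 1000000,
    "1.5 Mbps": 1500000,
    "10 Mbps": 10000000,
    "None": 0,  # the default: no limit on bandwidth usage
}

def bandwidth_command(object_name, label):
    """Return the argument list that sets the limit for a user profile.

    Raises KeyError if the label is not one of the listed values.
    """
    value = BANDWIDTH_VALUES[label]
    return [
        "tarantella", "object", "edit",
        "--name", object_name,
        "--bandwidth", str(value),
    ]

# "o=Example/cn=Test User" is a made-up object name for illustration.
print(bandwidth_command("o=Example/cn=Test User", "1.5 Mbps"))
```

Passing the list to subprocess.run, rather than building a single string, avoids shell quoting problems with object names that contain spaces.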

7.7.5. Users Cannot Connect to an SGD Server When It Is In Firewall Traversal Mode

If users cannot connect to an SGD server when it is in firewall traversal mode, this is usually caused by starting the SGD server before the SGD web server.

In firewall traversal mode, an SGD server listens on port 443 and forwards any web connections to the SGD web server, which is configured to listen on localhost port 443 (127.0.0.1:443).

If an SGD server is started before the SGD web server, the SGD server binds to all the available interfaces and this means that the SGD server forwards any web connections to itself in an infinite loop.

One solution is to always start the SGD web server before the SGD server. If you use the tarantella start command, the SGD server and web server are always started in the correct order.

Another solution is to configure SGD so that it never binds to the localhost interface. To do this, use the following command:

$ tarantella config edit \
--tarantella-config-server-bindaddresses-external "!127.0.0.1"

Note

In some shells you cannot use double straight quotation marks, "!127.0.0.1", because the shell might perform history substitution on !127. Use single straight quotation marks instead, '!127.0.0.1'.

You can also use this command to specify exactly which interfaces you do want SGD to bind to. You do this by typing a comma-separated list of DNS names or IP addresses.
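If you generate this command from a script, you can let the standard library handle the quoting. The following Python sketch joins a list of addresses with commas and quotes each argument with shlex.quote, which wraps values such as !127.0.0.1 in single quotation marks. The function name and the addresses in the second example are illustrative assumptions.

```python
import shlex

def bind_address_command(addresses):
    """Return a shell-safe command line for the bind address setting.

    addresses is a list of DNS names or IP addresses. An entry such as
    "!127.0.0.1" is passed through unchanged and quoted for the shell.
    """
    argv = [
        "tarantella", "config", "edit",
        "--tarantella-config-server-bindaddresses-external",
        ",".join(addresses),
    ]
    return " ".join(shlex.quote(arg) for arg in argv)

print(bind_address_command(["!127.0.0.1"]))
# 192.0.2.10 and sgd.example.com are placeholder values.
print(bind_address_command(["192.0.2.10", "sgd.example.com"]))
```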

See Section 1.5.2, “Firewall Traversal” for more details about running SGD in firewall traversal mode.

7.7.6. Users Cannot Relocate Their Sessions

When a user logs in to an SGD server without logging out of another, normally the user's session is relocated to the new server. This is sometimes called session moving, or session grabbing.

If the clocks on the SGD servers in the array are not synchronized, user sessions might not relocate successfully.

SGD uses the time stamps on user sessions to determine which is newer. The newer user session is considered to be current. If clocks are not synchronized, the time stamps might give misleading information.
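The effect of clock skew on this comparison can be shown with a small model. The following Python sketch is an illustration only, not SGD's actual implementation: it picks the session with the latest time stamp and shows how a server whose clock runs fast can make an older session appear newer. The server names, times, and the 106-second skew are made-up values.

```python
from datetime import datetime, timedelta

def current_session(sessions):
    """Pick the session with the latest time stamp (the 'newer' one)."""
    return max(sessions, key=lambda session: session["stamp"])

# The user logs in to server A first, then to server B 60 seconds later.
login_a = datetime(2010, 4, 28, 9, 36, 0)
login_b = login_a + timedelta(seconds=60)

# With synchronized clocks, the later login on B is correctly newer.
synchronized = [
    {"server": "A", "stamp": login_a},
    {"server": "B", "stamp": login_b},
]

# If A's clock runs 106 seconds fast, A stamps its session later than B.
skew = timedelta(seconds=106)
skewed = [
    {"server": "A", "stamp": login_a + skew},
    {"server": "B", "stamp": login_b},
]

print(current_session(synchronized)["server"])
print(current_session(skewed)["server"])
```

In the skewed case the older session on server A is treated as current, so the session does not relocate to server B as the user expects.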

Because time synchronization is important, use Network Time Protocol (NTP) software to synchronize clocks. Alternatively, use the rdate command.

See also Section 7.4.2, “User Sessions and Application Sessions” for more information about user sessions in SGD.