Sun Java System Access Manager 7.1 Performance Tuning and Troubleshooting Guide

Part II Troubleshooting Performance Issues

Chapter 6 Best Practices for Performance Tuning and Testing

Using a planned, systematic approach to tuning will help you avoid most performance troubleshooting pitfalls. This chapter includes the following topics:

Avoiding Common Performance Testing and Tuning Mistakes

Don't make the same mistakes deployment engineers and performance test teams usually make. Deployment engineers usually construct the system and perform the functional tests. Next the engineers hand over the system to the performance testing team. The testing team develops test plans and test scripts based on the targeted load assumptions. The project manager usually gives the testing team only a few hours or a few days to conduct the performance tests. Using this approach, the testing team usually encounters unexpected behaviors during the tests.

The testing team then realizes that performance tuning was not done been before the tests. Tuning is hastily done, but problems still persist. The testing team starts to experiment with different parameter settings and configurations. This frequently leads to more problems which jeopardize the schedule. Even when the testing team successfully produces a performance report, the report usually fails to cover test cases and information crucial to capacity planning and production launch support. For example, the report often does not capture the system capacity, request breakdowns, and the system stability under stress.

You can avoid these performance testing and tuning mistakes by using a systematic approach, and by allocating adequate project resources and time.

Using a Systematic Approach to Performance Tuning

The best practice is a systematic approach to performance testing with an allocation of a minimum of three weeks testing time. A good performance tuning plan includes the following phases:

Constructing the System

During the system construction phase, the entire system is built step by step in a modular fashion. For a detailed example, see the document Deployment Example 1: Access Manager 7.1 Load Balancing, Distributed Authentication UI, and Session Failover. Each module is in the example is built and then verified. It's always easier to verify a module build than to troubleshoot an entire system. The modular verification tests prevent configuration problems from being buried in the system. Some of these verification steps are performance related. For example, there are steps to verify that sticky load balancing is working properly. See To Configure the Access Manager Load Balancer in Deployment Example 1: Access Manager 7.1 Load Balancing, Distributed Authentication UI, and Session Failover

Automated Performance Tuning

In this phase, you tune the system using the automated tuning script amtune that comes with the product. The amtune script automates most of the performance tunings and address most, if not all, Access Manager tuning needs. Manual tweaking is unnecessary and may cause harm unless you run into some of the known extreme problems

Related Systems Tuning

In this phase, you manually tune Directory Server, any Web Servers that host Web Policy Agents, and any Application Servers that host J2EE Policy Agents. The typical tuning steps are as follows:

Run amtune to tune the Access Manager system. For more detailed information, see Chapter 2, Access Manager Tuning Scripts.
Follow the amtune onscreen prompts to tune the related Directory Server configuration instances. The following is an overview of the primary tuning steps you must complete:
1. Increase the nsslapd-dbcachesize value.
2. Relocate nsslapd-db-home-directory to the /tmp directory.
For detailed information, see the Directory Server documentation.
Manually tune the user Directory Server user database instance if one is used. The following is an overview of the primary tuning steps you must complete:
1. Increase the nsslapd-dbcachesize value.
2. Relocate the nsslapd-db-home-directoryto the /tmp directory.
If the Access Manager sub-realm is pointing to an external Directory Server user database, then manually tune the sub-realm LDAP connection pool.

The amtune script tunes only the LDAP connection pools of the root realm. See Tuning the LDAP Connection Pool and LDAP Configurations. You can configure the following parameters on LDAPv3IDRpo:
1. LDAP Connection Pool Minimum Size
2. LDAP Connection Pool Maximum Size
If you have installed a Web Policy Agent on a Sun Web Server, then manually tune the Web Server. You must configure the following parameters in the magnus.conf:
- RqThrottle
- RqThrottleMin
- RqThrottleIncrement
- ConnQueueSize
If Access Manager is deployed on a Sun Web Server, the amtune script will modify the Web Server magnus.conf file. You can copy the changes and use the changed values in the Web Policy Agent Web Server.
If you have installed a J2EE Policy Agent on an application server, seeThird-Party Web Containers for instructions on manually tune both the J2EE Policy Agent and the application server. You must configure settings for heap sizes and for garbage collection (GC) behaviors.

Baseline Modular Performance Testing

The system is largely performance tuned after you've run the amtune script. But it is still too early to perform the final complex performance tests. It's always more difficult to troubleshoot performance problems in the entire system than to troubleshoot individual system components performing basic transactions. So in this phase, you perform several baseline tests. Be sure that the specific baseline test scripts you write will:

Verify the functions of the sub-systems under the stress load of basic transactions such as authentications and authorizations.
Establish baseline performance benchmarks for basic transactions.

Conducting Baseline Authentication Tests

You will need the following test scripts are to generate the basic authentication workload:

For all tests, randomly pick user IDs from a large user pool, from minimally 100K to one million users. The load test script should first log the user in, then either log the user out or simply drop the session and let the session time out. A good practice is to remove all static pages and graphics requests from the scripts. This will make the workload cleaner and well— defined. The results are easier to interpret.

The test scripts should have zero think time to put the maximum workload on the system. The tests are not focused on response times in this phase. The baseline tests should determine the maximum system capacity based on maximum throughput. The number of test users, sometimes called test threads, is usually a few hundred. The exact number is unimportant. What is important is to achieve as close to 100% Access Manager CPU usage as possible while keeping the average response time to at least 500 ms. A minimum of 500 ms is used to minimize the impact of relatively small network latencies. If the average response time is too low (for example 50ms), a large portion is likely to be caused by network latency. The data will be contaminated with unnecessary noise.

Determine the Number of Test Users

In the following example baseline test, 200 users per one AM instance are used. For your tests, you could use 200 users for one Access Manager instance, 400 users for two Access Manager instances, 600 users for three Access Manager instances, and so forth. If the workload is too low, start with 100 users, and increase it by increments of 100 to find out the minimum number. Once you have determined the minimum test users per AM instance, use with this number for the rest of the tests to make the results more comparable.

Determine the System Steady State

In the example baseline tests, the performance data is captured at the steady state. The system can take any where from 5 to 15 minutes to reach its steady state. Watch the tests. The following indicators will settle into predictable patterns when the system has reached its steady state:

Transactions per second (TPS), also called throughput
Average response time of individual transactions
CPU usage of all affected servers (including Access Manager, Directory Server, and any load generation machines)
Number of transactions performed by each component in a given period, categorized by transaction types (see Appendix for details)

The following are examples of capturing transactions by categories on different sytems.

On each Access Manager host, parse the container access log to gather the number of different transactions received. For example, if Access Manager is deployed on Sun Web Server, use the following command to obtain the result:

cd /opt/SUNwbsvr/https-<am1host>/logs
cp access a; grep Login a | wc; grep naming a | wc; grep session a| 
wc; grep policy a | wc ; grep jaxrpc a | wc; grep notifi a | wc;  
grep Logout a | wc; wc a;

On each LDAP server, parse the LDAP access log to gather the number of different transactions received. For example, use the following command to obtain the result:

cd <slapd-xxx>/logs
cp access a; grep BIND a | grep "uid=u" | wc; grep BIND a|wc; 
grep UNBIND a| wc; grep SRCH a| wc; grep RESULT a| wc; wc a ;

Conduct the Baseline Test

In this example, the baseline test follows this sequence:

Log in and log out on each individual AM directly.
Log in and time out on each individual AM directly.
Log in and log out using a load balancer with one Access Manager server.
Log in and time out using a load balancer with one Access Manager server.
Log in and log out test on LB with two AM instances behind
Perform login and timeout test on LB with two AM instances behind

If you have two Access Manager instances behind a load balancer, the above tests actually involve at least ten individual test runs: two test runs for 1 through 4, one test run, and one test run for 6.

Note –

In order to perform any log in and timeout test, you must reduce the maximum session timeout value to lower than the default value. For example, change the default 30 minutes to one minute. Otherwise, at the maximum throughput, there will be too many sessions lingering on the system for so long that the memory will be exhausted quickly.

Analyze the Baseline Test Results

The data you capture will help you identify possible trouble spots in the system. The following are examples of things to look for in the baseline test results.

Compare the maximum authentication throughput of individual Access Manager instances with no load balancer in place.

If identical hardware is used in the test, the number of authentication transactions per second should be roughly the same for each Access Manager instance. If there is a large variance in throughput, investigate why one server behaves differently than another.

Compare the maximum authentication throughput of individual Access Manager instances that have a load balancer in front of them.

Using a load balancer should not cause a decrease in the maximum throughput. In the example above, test 3 should yield results similar to test 1 results, and test 4 should yield results similar to test 2 results. If the maximum throughput numbers go down when a load balancer is added to the system, investigate why the load balancer introduces significant overhead. For example, you could conduct a further test with static pages through the load balancer.

Verify that the maximum throughput on a load balancer with two Access Manager instances is roughly twice the throughput on a load balancer with one Access Manager instance behind it.

Verify that for each test, the Access Manager transaction counts report indicates no unexpected Access Manager requests.

For example, if you perform the Access Manager login and logout test, your test results may look similar to this:

    1581   15810  139128
       0       0       0
       0       0       0
       0       0       0
       0       0       0
       0       0       0
    1609   16090  146419
    3198   31972  286043 a

This output indicates three important pieces of information. First, the system processed 1581 login requests and 1609 logouts request. They are roughly equal. This is expected as each login is followed by one logout. Secondly, all other types of AM requests were absent. This is expected. Lastly, the total number of requests received, 3198, is roughly the sum of 1581 and 1609. This indicates there are no unexpected requests that we didn't grepin the command.

Troubleshoot the Problems You Find

A common problem is that when two Access Manager instances are both running, you see not only login and logout requests, but session requests as well. The test results may look similar to this:

    3159   31590  277992
       0       0       0
    5096   50960  486676
       0       0       0
       0       0       0
    1305   13050  127890
    3085   30850  280735
   12664  126621 1174471 a

In this example, for each logout request, there are now extra session and notification requests. The total number of requests does add up. This means there are no other unexpected requests. The reason for the session request is that the sticky load balancing is not working properly. A user logged in on one Access Manager instance, then is sent to another AM instance for logout. The second Access Manager instance must generate an extra session request to the originating AM instance to perform the request. The extra session request increases the system workload and reduces the maximum throughput the system can provide. In this case, the two Access Manager instances cannot double the throughout of the single Access Manager throughput. Instead, there is a mere 20% increase. You can address the problem at this point by reconfiguring the load balancer. This is an example of a problem should have been caught during modular verification steps in the system construction phase.

Run Extended Tests for System Stability

Once the system has passed all the basic authentication tests, it's a good practice to put the system under the test workload for an extended period of time to test the stability. You can use test 6 let it run over several hours. You may need to set up automated scripts to periodically remove excessive access logs generated so that they do not fill up the file systems.

Conducting Baseline Authorization Tests

You will need the following test scripts are to generate the basic authorization workload:

In this example, the baseline authorization test follows this sequence:

Perform login, page-access and logout test on each individual Access Manager instance, with no load balancer in place.

This test determines the Access Manager capacity without the influence of a network element such as the load balancer.
Perform login, page-access and logout test on the load balancer with only one Access Manager instance behind it.

This test determines the impact of the load balancer.
Perform login, page-access and logout test on the load balancer with two Access Manager instances behind it.

This test determines the baseline results when multiple Access Manager instances are running, and indicate whether the sticky load balancing is configured properly.

It is a good practice to set up a single URL policy that allows all authenticated users to access the wildcard URL protected by the policy agent. This simplified setup keep things simple in the baseline tests.

For all tests, randomly pick user IDs from a large user pool, from minimally 100K to one million users. The load test scripts log the user in, accesses a protected static page twice, and then logs the user out. A good practice is to remove all other static page or gif requests from the scripts. This will make the workload cleaner, well-defined, and the results are easier to interpret.

The test scripts should have zero think time to put the maximum workload on the system. The tests are not focused on response times in this phase. The baseline tests should determine the maximum system capacity based on maximum throughput. The number of test users, sometimes called test threads, is usually a few hundred. The exact number is unimportant. What is important is to achieve as close to 100% Access Manager CPU usage as possible while keeping the average response time to at least 500 milliseconds. A well executed test indicates the maximum system capacity while minimizing the impact of network latencies.

Determine the Number of Test Users

A typical 200 users per one Access Manager instance can be used . For example, you could use 200 users for one Access Manager instance, 400 users for two Access Manager instances, 600 users for three Access Manager instances, and so on. If the workload is too low, start with 100 users, and increase it by a 100—user increments to find out the minimum number. Once the number of test users per Access Manager instance is determined, continue to use this number for the rest of the tests to make the results more comparable. If you have two Access Manager instances behind a load balancer, the above tests actually involve at least five individual test runs. You conduct two runs each for tests 1 and 2, and conduct one run for test 3.

Verify that for each test, the response time of the second protected resource access is significantly lower than the response time of the first protected page access. On the first access to a protected resource, the agent needs to perform uncached session validation and authorization. This involves the agent communicating with Access Manager servers. On the second access to a protected resource, the agent can perform cached session validation and authorization. The agent does not need to communicate with the Access Manager servers. Thus the second access tends to be significantly faster. It's common to see the first page access takes 1 second (this highly depends on the number of test users used), while the second page access takes less than 10 ms (this does not depend too much on the number of test users used). If the second page access is not as fast as it should be, compared with the first page access, you should investigate to find out why. Is it because first page access being relatively too fast ? If so, you can increase the number of test users to increase the response time of the first page access. Is it because the agent machine is undersized so that no matter how much load you put on the system, Access Manager does not reach full capacity, and the agent machine reaches full capactiy first. In this case, since the agent machine is the bottleneck, and not the AccessManager, you can expect both the first and second page access to be slow while Access Manager responds quickly.

Analyze the Test Results

The data you capture will help you identify possible trouble spots in the system. The following are examples of things to look for in the baseline test results.

Compare the maximum authorization throughput of individual Access Manager instances with no load balancer in place.

If identical hardware is used in the test, the number of authorizations transactions per second should be roughly the same for each Access Manager instance. If there is a large variance in throughput, investigate why one server behaves differently than another.

Compare the maximum authorization throughput of individual Access Manager instances that have a load balancer in front of them.

Using a load balancer should not cause a decrease in the maximum throughput. In the example above, test 2 should yield results similar to test 1 results. If the maximum throughput numbers go down when a load balancer is added to the system, investigate why the load balancer introduces significant overhead. For example, you could conduct a further test with static pages through the load balancer.

Verify that the maximum throughput on a load balancer with two Access Manager instances is roughly twice the throughput on a load balancer with one Access Manager instance behind it.

If the throughput numbers to do not increase proportionately with the number of Access Manager instances, you have not configured sticky load balancing properly. Users logged in to one Access Manager instance are being redirected to another instance for logout. You must correct the load balancer configuration. When sticky load balancing is properly configured, each Access Manager should serve requests independently and thus the system would scale near linearly. If the throughput numbers to do not increase proportionately with the number of Access Manager instances, you have not configured sticky load balancing correctly. For related information, see Configuring the Access Manager Load Balancer in Deployment Example 1: Access Manager 7.1 Load Balancing, Distributed Authentication UI, and Session Failover.

Verify that for each test, the Access Manager transaction counts report indicates no unexpected Access Manager requests.

For example, if you perform the Access Manager login and logout test, your test results should look similar to this:

    1079   10790   94952
    1032   10320   99072
    1044   10440  101268
    1064   10640  101080
       0       0       0
       0       0       0
    1066   10660   97006
    5312   53093  495052 a

This output indicates three pieces of information. First, the system processed 1079 login, 1032 naming, 1044 session, 1064 policy and 1066 logout requests. These numbers are roughly equal. For each login, there is one naming call, one session call (to validate the user's session), one policy call (to authorize the user's access) and one logout. Secondly, all other types of Access Manager requests were absent. This is expected. Lastly, the total number of request received 5312 is roughly the sum of login, naming, session, policy and logout requests. This indicates there are no unexpected requests that we didn't grep in the command.

Troubleshoot Problems You Find

A common problem is that when two AM instances are both running, you see the number of session requests exceeds the number of logins. For example, the test output may look similar to this:

	
    4075   40750  358600
    4167   41670  400032
   19945  199450 1913866
    3979   39790  381984
       0       0       0
    3033   30330  297234
    3946   39460  359086
   39194  391891 3713840 a

Note that for each login request, there are now 5 session requests, and 0.75 notifications. The total number of requests do add up though. This indicates there are no other unexpected requests. There more session requests per login because the sticky load balancing is not working properly. A user logged in on one Access Manager instance is sometimes sent to another Access Manager instance for session validation and logout. The second Access Manager instance must generate extra session and notification requests to the originating Access Manager instance to perform the request. The extra requests increase the system workload and reduce the maximum throughput the system can provide. In this case, the two Access Manager instances cannot double the throughout of the single AM throughput. You can address the problem by reconfiguring the load balancer. The problem should have been caught during modular verification steps in the system construction phase.

Conduct Extended Stability Tests

Once you've passed all the basic authorization tests, it's a good idea to put the system under the workload for extended period of time to test the stability. You can use test 3 and let it run over several hours. You may need to set up automated scripts to periodically remove excessive access logs generated so that they do not fill up the file systems.

Advanced Performance Tuning

The amtune script is specifically designed to address most, if not all, of the performance tuning needs. This means that you almost never need to manually tweak performance parameters. With the large number of performance related parameters, tweaking them invite more problems instead of solving them. However, there are a few special situations that amtune currently does not tune or tune well. This is documented in Chapter 7, Advanced Performance Tuning. For each special situation, there is an explanation of what amtune is doing today, how to identify whether you need to manually tune the parameters, and how to tune them. It is worth repeating here that most, if not all, of your performance tuning should be addressed by the amtune script. Performance problems are usually caused by poor system configuration. The special tuning cases should be used only if they actually apply to your specific case.

Targeted Performance Testing

By the time you've reached this test phase, you've already done enough baseline tests that give you both the confidence the system performs properly, and a rough idea of how the system should perform in your targeted performance test scenarios. Target performance tests typically have the projected real-world workload in mind. They usually include many more test users, but also slower users (by introducing realistic think time). The test also tries not to test the system at maximum CPU usage. Instead, the tests usually focus on several scenarios. Examples:

Average workload that gauge the users' experience in terms of the average response time.
Peak workload when demands peak or one or more servers are down, and load transfer has occurred, to gauge the users perceived average response time, and the system stability.
Stability tests that use average or peak workload to run extended period of time, such as a day or a week.

Regardless what scenarios you are testing, if a problem occurs, it always helps to go back to the baseline tests to validate if certain things have changed in the environment, and to isolate the new elements (hardware or software configuration changes) that may have contributed to the problem. Unless you've isolated the problem, haphazardly tweaking performance related parameters is not productive, and usually do more harm and cause more confusion. Detailed troubleshooting methodology and techniques are beyond the scope of this document. See section name for suggestions on troubleshooting some common performance problems.

Chapter 7 Advanced Performance Tuning

After conducting basic performance tuning and following the best practices recommendations described in previous chapters, you may still encounter performance issues. This chapter helps you troubleshoot the most common Access Manager performance issue. Topics in this chapter include:

Tuning the LDAP Connection Pool and LDAP Configurations

The amtune script provided by AccessManager recommends parameter values for the following three LDAP connection pools:

Root Realm User Authentication LDAP Connection Pool
Root Realm Data Store LDAP Connection Pool
Access Manager Configuration Store and SMS LDAP Connection Pools

But the script does not actually tune the LDAP connection pools for you. You need to make the changes manually. In addition, in deployments with a subrealm, you must also manually tune the subrealm's connection pools. Just like the root realm, each sub-realm can have its own user authentication LDAP connection pool and data store LDAP connection pool. You must manually tune these as well.

You can modify one or more of the three LDAP connection pool configurations . In each configuration, the recommended values are MIN=8 and MAX=32. Under some conditions, you can increase the MAX value up to 64. The following sections describe how to manually tune the connection pools:

To Tune the User Authentication LDAP Configuration

You can modify the settings on one of the following depending upon the module you use for user authentication.

LDAP Authentication Module: This module is used only to authenticate the user. In the Access Manager console, under Configuration, click Authentication > Core.
Data Store Authentication Module: When the Data Store is as the authentication module, the Data Store LDAP connection pool settings are used. No additional Authentication connection pool settings are used.

To Tune the Data Store LDAP Configuration

The Data Store LDAP Configuration is used for retrieving user profiles and can also be used for authentication. By default, Access Manager 7.1 supports two types of Data Store plug-ins: AMSDK and LDAPv3. If the Data Store Authentication module is used for authentication (see above), then the recommended Data Store LDAP configuration settings are MIN=8 and MAX=64. You can modify the settings on one of the following depending upon the Data Store plug-in you use:

AMSDK Configuration: The AMSDK LDAP configuration is stored in the serverconfig.xml file under the Access Manager config directory. The server group name is default.
LDAPv3 Configuration: To modify the LDAPv3 Configuration, in the Access Manager console, under Access Control, click Realm > DataStore.

To Tune the Access Manager Configuration Store and SMS LDAP Configuration

The Service Management (SMS) LDAP Configuration is used for storing and retrieving all Access Manager configuration and Policy Service configuration. The SMS LDAP Configuration is stored in the serverconfig.xml file under the Access Manager config directory. The server group is sms.

Start by setting all the connection pool configurations with MIN=8 and MAX=32.
If you must make adjustments based on performance test results, adhere to the following requirements:
- The MIN value should be at least 8.
- The MAX value for any pool should not be greater than 64. The MAX value of 32 is enough for most typical deployments.
Special requirements are outside the scope of this document.
After following steps 1 and 2, if low throughput or low response times persist, then try the following solutions:
- Verify that the Directory Server instance is not at 100% CPU usage. If the Directory Server instance is at 100% and the throughput is still low, revisit the indexing on the Directory Server entries. Be sure that Directory Server indexing is configured properly.
- Run load tests to verify that Access Manager login is not causing performance to slow down. First run the tests with logging enabled, and then run the tests with logging disabled. If you find that logging is causing low response time, then you can tune the logging service through the Access Manager console. See Logging in Sun Java System Access Manager 7.1 Administration Reference.

Resolving Memory Issues

The amtune script automatically tunes all memory related parameters. In most deployments, this is sufficient. However, occasionally the amtune tuning may not be sufficient and you may run into memory issues. Memory issues manifest themselves through excessively frequent garbage collection (GC) operations or frequent “Out of Memory” errors.

To resolve memory related issues, tune the following parameters:

com.iplanet.am.sdk.cache.maxSize

User cache/SDK cache.
com.iplanet.am.session.maxSessions

Max Active Session the system should allow.
com.iplanet.am.notification.threadpool.size

Number of threads to process session notifications.
com.iplanet.am.notification.threadpool.threshold

Notification Queue size.
com.iplanet.am.session.purgedelay

Number of minutes to delay the purge timed-out session.

All the parameters listed above can be tuned by editing the AMconfig.properties file which is located in under /etc/opt/SUNWam/config if installed using the JES installer. If the Access Manger is installed using the single WAR, than AMConfig.properties is located in directory you specified when you configured the WAR file.

The minimum required JVM heap size for Access Manager is 1024 mb.

Tuning com.iplanet.am.session.maxSessions: The tuning of this property entire depnds on the JVM Heap size configured in the web container where the Access Manager is deployed. The minimum required JVM heap size for Access Manager is 1024 mb and the # of sessions supported for 1024mb is 12000 and every additional 512mb can support upto 18000 sessions.
Tuning com.iplanet.am.sdk.cache.maxSize: The sdk cache size should be same as the value set for com.iplanet.am.session.maxSessions.
Tuning com.iplanet.am.notification.threadpool.threshold: This is the Notification Queue size. The Notification Queue size should be less than or equal to 30% of the Max Sessions.

The following table lists sample settings for the parameters listed above based on the rules described above.

Maximum JVM Heap Size	Maximum Active Sessions	SDK Cache Size	Notification Queue Size
1024mb	12000	12000	4000
1536mb	30000	30000	10000
2048mb	48000	48000	16000
2560mb	66000	66000	22000
3136mb	90000	90000	30000

The above settings may not be suitable for certain deployments. When the number of user attributes retrieved is large, the SDK cache size will increase. Similarly, if the Extra Session properties are set, the Session size will increase.

In these cases, use one of the following options to solve the memory related issues:

Reduce the Max Sessions limit and make sure you follow the above rules. If you reduce the Max Sessions you may need to add additional instances to support additional sessions. If you do not want to add additional instances you can use the 64bit JVM.
Reduce the SDK cache size. If you reduce the SDK cache size, your performance will go down. For better performance it is always better to set the SDK cache size equal to Max Sessions, and add additional instances to support more sessions.

To Tune the Notification Threadpool Size

Set the value of com.iplanet.am.notification.threadpool.size based on number of CPUs and based on the purgedelay value. See To Tune the Purge Delay Settingsfor related information.

If purgedelay is set to 0, the threadpool should be set using the following formula: (number of CPUs) x 3 = threadpool size. For example, for a machine with 8 CPUs, the threadpool size is 24. ForNiagra T1000/T2000 machines, use the formula: (number of cores) x 3 = threadpool size.
If the purgedelay value is set to greater than 0, then the threadpool should be set using the following formula: (number of CPUs) x 4 = threadpool size . ForNiagra T1000/T2000 machines, use the formula: (number of cores) x 4 = threadpool size. The amtune script currently does not set this value based on the above rules. This configuration should be done manually. With the above setting if you still see problems such as frequent "Cannont send notification" or "Notification task queue full" errors in the amSession debug file, this indicate that the SessionNotificationqueue is full. The problem could be related to the Policy Agent or SDK client which is receiving notifications. The Policy Agent or SDK client is not able to process notifications properly. Consider disabling notification mode on the Policy Agent.

To Tune the Purge Delay Settings

The purgedelay property is used to keep the session in memory in a timed-out state after the session has timed out. If the value is set to 0, then the session is removed from memory immediately. If the value is greater than zero, then the session is maintained in the memory until the purgedelay time elapses.

In almost all deployments, purgedelay should be set to 0. The amtune script will set the value to 0 when run.
In special cases when the purgedelay value is greater than 0, reduce the number of active sessions (com.iplanet.am.session.maxSessions). Additionally, increase the notification threadpool size (com.iplanet.am.notification.threadpool.size)

The property com.iplanet.am.session.maxSessions describes the maximum number of active sessions that the system will allow. When the purgedelay is set to 0, the total number of sessions (active sessions and timed-out sessions) in memory will be equal to the value set for com.iplanet.am.session.maxSessions. If purgedelay is greater than 0, then the total number of sessions (active and timed-out sessions) in memory can be greater than active sessions. The difference will be based on three factors: the purgedelay time , the percentage of timed-out sessions, and the authentication rate. Therefore, when purgedelay is greater than zero, the maximum active sessions value should be reduced accordingly.

The simple way to do this is to look in the AccessManager 7.1sp1 session stats file. The amMasterSessionTable shows the current and peak values for maxSessions (active sessions + timed-out sessions) and maxActive (only active sessions) sessions in memory . Based on this information, the maxSessions value in the stats file limit should not exceed the 90000 limit for a JVM heap size of 3136mb. When the purgedelay is set to 0, only one notification is sent when a session is removed from memory. When the purgedelay is greater than 0, then there will be two notifications for each timed-out session. The number of notifications for timed-out sessions are increased, and now more notification threads are needed. So the notification thread pool size should also be increased.

Performance Tuning Questions and Answers

Question:

How can I improve authentication performance against any LDAP v3 data repository?

Answer:

If the Profile Ignored option is selected in the Access Manager console (go to Realm > Authentication >Advanced Properties), performance may improve. However, improved performance is not guaranteed because the Profile Ignored option prevents applications and policy agents from retrieving the user's profile attributes. The amtune script automatically tunes the LDAP connection pool for the Access Manager root realm which points to the configuration Directory Server instance. But the amtune script will not tune the subrealm you created, the subrealm where the LDAP v3 data repository is configured. You may need to manually tune the LDAP connection pool. After tuning the LDAP connection pool, if poor performance persists, troubleshoot the LDAP v3 repository itself.

Consider limiting the time you spend troubleshooting authentication performance issues. Authentication usually contributes only a small portion of the overall system overhead. Authorizations tend to be slower and a lot more frequent than other processes. But each user session involves only one authentication and multiple authorizations.

Question:

How do I set the JVM heap sizes and other JVM option tuning parameters for a Distributed Authentication UI web application?

Answer:

The web container that will load the Access Manager Distributed Authentication UI web application should have the same heap sizes and the same JVM tuning settings as the web container that runs Access Manager. You can use amtune-ws7, amtune-ws61 or amtune-as8 which come with Access Manager 7.1. You don't need much CPU usage as for Access Manager server machines. It is hard to tell by what ratio one can reduce the number of CPU usage on a Distributed Authentication UI machine. The ratio can be 1:4 or less. Run some load tests for a specific scenario to determine a good ratio.

The reason why a Distributed Authentication UI web container needs the same JVM heap sizes and garbage collection tuning parameters as those for the Access Manager server web containers is that amclientsdk maintains the same number of Access Manager sessions on the client side as on the Access Manager server itself. A Request for Enhancement (CR 6465831) has been filed for removal of the Access Manager sessions in Distributed Authentication UI amclientsdk deployments.

Question:

What is the impact of checking notenforced_list for a set of URIs or URLs on J2EE policy agents?

Answer:

The performance impact of checking notenforced_list is negligible. In general, having a notenforced_list of commonly requested and static content improves the overall system performance.

Question:

What is the impact of using the SSL , for example the NSS library version that comes with JES 4 installer, on the performance of Access Manager 7.0 deployed on Niagara boxes such as T1000 or T2000?

Answer:

If Access Manager 7.0 was installed using JES 4 installer and its default SSL libraries, then the markedly improved performance that comes with NSS 3.11 may not be present and won't be used. Use the NSS libraries version 3.11 or higher when Access Manager 7.0 is deployed on Niagra T1000 or T2000 systems. Go to the Sunsolve web site for downloading the NSS libraries. Note that starting with JES 5 and Access Manager 7.1, the NSS libraries have been upgraded to a version higher than 3.11.

Question:

Why is it so slow to create or delete users if I use a program based on amsdk, but much faster if I use the ldapmodify command?

Answer:

If the same policy is modified for each user, the XML parsing and processing must occur for every user. So you should group as many users as possible with the same one policy, and then add the users to that policy. You should use the same LDAP group or role for as many users as possible in an organization.

Be sure that a policy is not modified or updated for each user. Modifying a policy is an expensive operation since the policy is stored as XML data.

Question:

Is Sun Java Message Queue tuning necessary when session failover is configured for Access Manager?

Answer:

In most deployments using Access Manager session failover, Java Message Queue tuning only requires setting adequate JVM heap and stack sizes. See the Sun Java System Message Queue 2005Q1 Administration Guide at http://docs.sun.com/app/docs/coll/MessageQueue_2005Q1 for further information.

Question:

When the amtune script tunes the Directory Server with the recommended values, an onscreen message says the tuning parameters such as minConnPool and maxConnPool in serverconfig.xml are dependent on the number of Access Manager instances and other factors. How exactly should I tune the Directory Server with these factors taken into account?

Answer:

Values recommended by the amtune script for minConnPool and maxConnPool are per Access Manager server instance. The parameters are stored in /etc/opt/SUNWam/config/serverconfig.xml. The recommended values are based on the following assumptions:

One AM server instance is in front of one Directory Server.
The Directory Server contains both Access Manager configuration data and user data.

When multiple Access Manager instances exist, the total number of persistent LDAP connections may be too high for the Directory Server to handle. Each Access Manager instance establishes its own pool of the same size. Memory allocation is also on the high side if the user data is not stored there. The amtune script assumes the user data is stored together with Access Manager configuration data in the Directory Server.

For example, consider the typical real-world deployment scenario illustrated in the document Deployment Example 1: Access Manager 7.1 Load Balancing, Distributed Authentication UI, and Session Failover. This deployment differs from the amtune script assumptions in the following ways:

Multiple Access Manager instances are in front of the Directory Server.
The Directory Server instance stores only Access Manager configuration data, and does not store user data.
A separate Directory Server stores the user data.

First, if you have a large number of Access Manager instances, you can reduce the recommended pool size for the configuration Directory Server. This only applies when you have large number of Access Manager instances. When you have only two or three Access Manager instances, it may not be necessary to reduce the pool size.

Secondly, you can significantly reduce the memory allocation to the configuration Directory Server . The configuration data is minimal with usually only a few thousands entries. Reducing the memory allocation is particularly important if the configuration Directory Server runs on the same host as the user data Directory Server. You do not want the smaller configuration Directory Server to compete with the larger user data Directory Server for the system memory.

Thirdly, be sure to tune the user data Directory Server. This directory contains a large data set. You can use the amtune recommended Directory Server tuning changes as a starting point. For more information, see step 3 of Related Systems Tuning.

Ultimately, you have to look at your directory data and tune it specifically. This is the standard Directory Server tuning procedure. See the DS Performance Tuning Guide.

Finally, the amtune script does not tune the LDAPv3 data store connection pool which is used by Access Manager to access the user data Directory Server. You have to manually tune the data store connection pool. See step 4 of Related Systems Tuning.

Question:

Where do I find specific performance tuning guidelines for Access Manager implementations on the T2000 platform?

Answer:

The Access Manager amtune script does the automatic tuning specifically for the T2000 platform. No manual tuning is necessary. The following is the tuning specific to T2000, done automatically by the amtune script.

Sun Fire CoolThreads technology servers, specifically Sun Fire T1000 and Sun Fire T2000 servers, contain a single Ultrasparc T1 chip or processor. The T1 processor is a unique design of 8 individual processing units, called cores, sharing one on-chip interconnection. It is somewhat like an 8-way system on a single chip.

Each core supports 4 hardware threads of execution. These hardware threads are scheduled on the core processing unit in round-robin order. A different software thread can run on each one of these hardware threads. Thirty-two software threads can run in parallel on a single T1 processor.

You can determine the number of cores by dividing the number of hardware threads (run psrinfo -v) by 4. The T1000 and T2000 can have a maximum of 4 hardware threads per core. So the number of cores is usually 6 (a 24 thread system) or 8 (a 32-thread system).

The only JVM parameter that would be different for Chip Multi-threading (CMT) servers is the following parameter

-XX:ParallelGCThreads=N

By default, if the parameter is not set, the value of ParallelGCThreads would be the same number as the number of hardware threads (either 24 or 32) on the T1000 and 2000. This is unnecessarily high. The amtune script today automatically sets the number of these parallel GC threads to be equal the number of cores in a T1000 or T2000 box.

For more information, see the document Java Tuning Whitepaper at http://java.sun.com/performance/reference/whitepapers/tuning.html#section4.2.1 .

More Resources

For more information on performance tuning and troubleshooting, see the following resources:

Java Performance portal site

http://java.sun.com/javase/technologies/performance.jsp
Java Tuning Whitepaper

http://java.sun.com/performance/reference/whitepapers/tuning.html
Java Hotspot VM Options

http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
Solaris TCP Tuning Parameters

http://docs.sun.com/app/docs/doc/817-0404/6mg74vsaj?a=view
Understanding Tuning TCP

http://www.sun.com/blueprints/1205/819-5144.pdf
Tuning for Linux platforms

http://docs.sun.com/app/docs/doc/819-4742/6n6sfgme9?l=en&a=view
Java 5.0 Troubleshooting and Diagnostic Guide

http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf