E.1 Introduction to Oracle Access Management Troubleshooting

Oracle Access Management is a business-critical system; downtime can carry a high cost to your business. The goal of system analysis is to quickly isolate and correct the cause of any problem. This requires a big-picture view of your system, along with tools to observe the live system and relate individual components to that bigger picture.

E.1.1 System Analysis and Problem Scenarios

System analysis includes understanding how the product works, what can go wrong, how likely each failure scenario is, and what its consequences or observable symptoms are.

System problems can be divided into two basic categories:

  • Cascading catastrophic failure

  • Gradual breakdown in performance

A cascading catastrophic failure might develop as follows:

  • The LDAP server is heavily loaded and unresponsive

  • The morning peak load starts

  • Webgates send requests to the primary OAM Server

  • Webgate requests time out, and the Webgates retry against the secondary OAM Server
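The amplification in this scenario can be illustrated with a rough load model. The following Python sketch is a simplification, not a description of Webgate internals: it assumes that each attempt times out with a fixed probability, that a timed-out request is retried against the secondary OAM Server up to a fixed number of times, and that both OAM Servers query the same LDAP server. The function name and its parameters are hypothetical.

```python
def backend_load(offered_rps: float, timeout_fraction: float, max_retries: int = 1) -> float:
    """Estimate requests/sec reaching the shared LDAP server, counting retries.

    Hypothetical model: every attempt times out with probability
    `timeout_fraction`, and a timed-out request is retried (against the
    secondary OAM Server) up to `max_retries` times. Because both OAM
    Servers use the same LDAP server, every retry adds LDAP load.
    """
    if not 0.0 <= timeout_fraction <= 1.0:
        raise ValueError("timeout_fraction must be between 0 and 1")
    total = 0.0
    attempts = offered_rps
    for _ in range(max_retries + 1):
        total += attempts
        # Only the attempts that timed out are sent again.
        attempts *= timeout_fraction
    return total
```

For example, backend_load(1000, 0.5) returns 1500.0: if half of 1,000 requests per second time out at peak, a single retry raises LDAP load by 50% at the moment the LDAP server is least able to absorb it.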

Gradual breakdown in performance might occur over time when, for example:

  • OAM is sized and rolled out for 10,000 users and 500 groups

  • Over the course of a year, the number of users and groups grows well beyond that sizing (for example, to 50,000 users, with a corresponding increase in groups)
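Periodically comparing current directory counts against the figures OAM was originally sized for can catch this kind of drift before it becomes an outage. The following minimal sketch assumes you already collect counts such as users and groups from your identity store; the function name and the dictionary layout are hypothetical.

```python
def exceeded_sizing(baseline: dict[str, int], current: dict[str, int],
                    tolerance: float = 1.0) -> dict[str, float]:
    """Return metrics whose growth ratio (current / baseline) exceeds `tolerance`.

    `baseline` holds the figures the deployment was sized for; `current`
    holds the latest measured figures. Metrics missing from `current`, or
    with a zero baseline, are skipped.
    """
    over = {}
    for metric, sized_for in baseline.items():
        if metric in current and sized_for > 0:
            ratio = current[metric] / sized_for
            if ratio > tolerance:
                over[metric] = ratio
    return over
```

With the figures above, exceeded_sizing({"users": 10_000}, {"users": 50_000}) returns {"users": 5.0}, indicating that the deployment now serves five times the user population it was sized for.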

For information on the most commonly encountered issues, see the following topics:

E.1.2 LDAP Server or Identity Store Issues

This topic provides symptoms, probable cause, and steps to diagnose the following issues:

Symptoms: Operational Slowness

  • Poor user experience

  • Agent timeouts lead to retries

Cause

  • Non-OAM load on the LDAP server might be impacting OAM operations

  • Capacity problems due to a gradual increase in peak load

Symptoms: Total Loss of Service

Cause

  • Outage of all LDAP servers

  • A load balancer in front of the LDAP servers is timing out old connections

Diagnosis

  1. Shut down the LDAP server.

  2. Restart your browser.

  3. Try to access a protected site.

  4. Review errors in the OAM Server log file, as described in Logging Component Event Messages (alternatively, in Monitoring Performance and Logs with Fusion Middleware Control).

  5. Try to access the Oracle Access Management Console.

  6. Observe errors in the WebLogic AdminServer log file.

  7. Bring up the LDAP server again.

  8. Retry access to a protected application.

  9. Retry access to the Oracle Access Management Console.

  10. Correct the issue based on the requirements in your environment.
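While working through these steps, a quick reachability probe run from the OAM Server host can confirm whether the LDAP listener is accepting connections. The following Python sketch is a minimal example; the function name is hypothetical, the host name in the usage note is a placeholder, and port 389 is only the conventional LDAP port (636 is conventional for LDAPS), which may differ in your deployment.

```python
import socket
import time
from typing import Optional


def probe_ldap(host: str, port: int = 389, timeout: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the server is unreachable.

    This only checks that the listener accepts TCP connections; it does not
    bind or search, so an overloaded LDAP server can still pass this probe.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return None
```

For example, probe_ldap("ldap.example.com") would be expected to return None after step 1 and a small connect time again after step 7.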

E.1.3 OAM Server or Host Issues

This topic provides symptoms, probable cause, and steps to diagnose the following issues:

Symptoms: Capacity Problems

  • Poor user experience due to slow operations

  • Agent timeouts and retries can result in extra load

Cause

  • Insufficient CPU capacity

  • Memory issues

Symptoms: Interference with Other Services on the Host

  • Poor user experience due to slow operations

  • Agent timeouts and retries may result in extra load

Cause

  • CPU cycle contention

  • Memory contention

  • File system full
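A quick host check can help distinguish among these causes. The following sketch reports file system usage and compares the 1-minute load average with the CPU count. It assumes a Unix host (os.getloadavg is not available on Windows) and that you pass the path of the file system holding the OAM domain and logs; the default of "/", the 90% threshold, and the function name are placeholders. Memory contention is not covered; use top or a similar tool for that.

```python
import os
import shutil


def host_pressure(path: str = "/", disk_limit: float = 0.90) -> dict:
    """Summarize disk usage and CPU load on the OAM Server host (Unix only)."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    load1, _, _ = os.getloadavg()
    return {
        "disk_used_fraction": used_fraction,
        # Treat the file system as full once usage reaches the threshold.
        "disk_full": used_fraction >= disk_limit,
        # A 1-minute load average above the CPU count suggests CPU contention.
        "cpu_contention": load1 > (os.cpu_count() or 1),
    }
```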

Diagnosis: OAM Server

  1. Shut down the OAM Server.

  2. Try to access a resource protected by Webgate or mod_osso.

  3. Bring up the OAM Server.

  4. Use the Access Tester to test authentication and authorization as described in Validating Connectivity and Policies Using the Access Tester.

  5. Use the top command to determine the CPU and memory consumption of the OAM Server while you run the Access Tester.

  6. Get a thread dump of the OAM Server.
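Several thread dumps taken a few seconds apart are usually more informative than one, because they show whether threads are stuck in the same place or making progress. The sketch below calls the JDK jstack tool (jstack -l <pid>) repeatedly and saves each dump to a file. It assumes that jstack from the JDK running the OAM Server is on the PATH, that the script runs as the same operating system user as the server, and that you already know the server's process ID (for example, from jps -l). The function name and output file naming are hypothetical; the runner parameter exists only so the function can be exercised without a JDK.

```python
import subprocess
import time
from pathlib import Path


def take_thread_dumps(pid: int, out_dir: str, count: int = 3, interval: float = 10.0,
                      runner=subprocess.run) -> list[Path]:
    """Capture `count` thread dumps of a JVM with `jstack -l`, `interval` seconds apart."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = []
    for i in range(count):
        # check=True raises CalledProcessError if jstack cannot attach.
        result = runner(["jstack", "-l", str(pid)], capture_output=True, text=True, check=True)
        dump_file = out / f"oam_threads_{pid}_{i + 1}.txt"
        dump_file.write_text(result.stdout)
        files.append(dump_file)
        if i < count - 1:
            time.sleep(interval)
    return files
```

For example, take_thread_dumps(12345, "/tmp/oam_dumps") writes three files ten seconds apart; 12345 is a placeholder process ID.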

Diagnosis: AdminServer

  1. Shut down the AdminServer.

  2. Restart your browser and access a protected resource, which should work.

  3. Use remote registration to register a new partner, as described in Registering and Managing OAM 11g Agents. Registration should fail while the AdminServer is down.

  4. Start the OAM AdminServer.
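The checks in steps 2 and 3 can be supplemented with a scripted HTTP probe. The following sketch returns the HTTP status code for a URL, or None if no connection can be made. Note that urlopen follows redirects, so a protected resource normally reports the status of the page it finally lands on (such as a login page). The function name is hypothetical, and the host names, ports, and paths in the usage note are placeholders for your environment.

```python
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if no connection could be made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with success (for example 401 or 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing answered.
        return None
```

While the AdminServer is down, http_status("http://adminhost.example.com:7001/oamconsole") would be expected to return None, while a protected resource such as http_status("http://app.example.com/protected/") should still return a status code.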

E.1.4 Agent-Side Configuration and Load Issues

This topic provides symptoms, probable cause, and steps to diagnose clock-time differences between agents and servers.

Symptoms: Difference in Clock Time Between Agent and Server

  • High CPU usage at both agent and server

  • Users experience a system hang

Cause

  • The agent treats the token issued by the server as invalid because the agent and server clocks differ

  • The agent keeps going back to the server to have the token reissued

Diagnosis

  1. Access a protected resource.

  2. Confirm that client access hangs.

  3. Confirm high CPU usage on both the agent and the server.
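The loop arises because the token's validity window is set by the server's clock but checked against the agent's clock. The following sketch illustrates this with a simple validity window; it does not reproduce the actual Webgate validation logic or token lifetimes, and the function name, lifetime, and tolerance are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def token_looks_valid(issued_at: datetime, lifetime: timedelta, agent_now: datetime,
                      tolerance: timedelta = timedelta(0)) -> bool:
    """Hypothetical agent-side check: is `agent_now` inside the token's validity window?

    `issued_at` comes from the server's clock and `agent_now` from the
    agent's clock, so any difference between the clocks shifts the window.
    """
    not_before = issued_at - tolerance
    not_after = issued_at + lifetime + tolerance
    return not_before <= agent_now <= not_after


# A token issued at 12:00 UTC with a one-hour lifetime (illustrative values).
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
lifetime = timedelta(hours=1)
print(token_looks_valid(issued, lifetime, issued + timedelta(seconds=5)))  # clocks in sync
print(token_looks_valid(issued, lifetime, issued + timedelta(hours=2)))    # agent 2 hours ahead
```

The first call prints True and the second prints False: when the agent's clock runs ahead by more than the token lifetime, every freshly issued token already looks expired, so the agent returns to the server indefinitely.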

E.1.5 Runtime Database (Audit or Session Data) Issues

The audit and session functions are both write-intensive operations. By contrast, the policy database can be tuned for read-intensive service.

Symptoms

  • Audit and session operations are slow

  • The file system on the OAM Server fills with audit data that has not yet been written to the database

  • In-memory session data is lost when one of the servers in the cluster fails
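When the database is unavailable, the rate at which buffered audit data consumes local disk determines how long the OAM Server can continue before its file system fills. The sketch below is back-of-the-envelope arithmetic under stated assumptions: a constant audit event rate, a fixed average record size, and no data flushed to the database during the outage. The function name is hypothetical, and all inputs are figures you would measure in your own environment.

```python
def hours_until_full(free_bytes: int, events_per_second: float, bytes_per_event: int) -> float:
    """Estimate hours until locally buffered audit data fills the file system.

    Assumes a constant event rate and that nothing is written to the
    database while it is unavailable.
    """
    if events_per_second <= 0 or bytes_per_event <= 0:
        # No audit growth means the disk never fills from this source.
        return float("inf")
    bytes_per_hour = events_per_second * bytes_per_event * 3600
    return free_bytes / bytes_per_hour
```

For example, with 10 GiB free, 200 audit events per second, and 1,024 bytes per event, hours_until_full(10 * 2**30, 200, 1024) returns about 14.6 hours.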

Cause

  • The database is not tuned for write-intensive operations

  • The database is unavailable due to maintenance

  • Space issues in the database

Diagnosis

  1. Shut down the database used to store audit and session data.

  2. Try to access a protected resource.

  3. Review error and warning messages in the OAM Server log files, as described in Logging Component Event Messages (alternatively, in Monitoring Performance and Logs with Fusion Middleware Control).

E.1.6 Change Propagation or Activation Issues

This topic provides symptoms and probable causes of change propagation and activation issues.

Symptoms

  • Changes to policy do not take immediate effect

  • Changes to system configuration do not take immediate effect

Cause

  • Servers are too busy handling runtime requests (CPU contention)

  • Coherence network slowness
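To tell whether a change is merely slow to propagate or not propagating at all, it can help to measure how long the change takes to become observable. The following sketch polls a check until it succeeds or a deadline passes. The check itself is a hypothetical callable you supply, for example one that requests a resource whose authorization policy you just changed and inspects the response; the function name, timeout, and poll interval are also assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_propagation(change_visible: Callable[[], bool], timeout: float = 60.0,
                         poll_interval: float = 1.0) -> Optional[float]:
    """Poll `change_visible` until it returns True.

    Returns the elapsed seconds when the change is observed, or None if
    `timeout` seconds pass without observing it.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if change_visible():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Recording the elapsed time across several changes, at quiet and at busy periods, can show whether propagation delay tracks runtime load (CPU contention) rather than being constant.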

E.1.7 Policy Store Database Issues

This topic provides symptoms, probable cause, and steps to diagnose policy database issues.

Symptoms

Policy changes cannot be made, but runtime access is not affected

Cause

  • The database is unavailable (for example, down for maintenance)

  • Space issues in the database

Diagnosis

  1. Shut down the database containing OAM policies.

  2. Try to access a protected resource, and observe that runtime access is not affected.

  3. Try to access the Oracle Access Management Console to edit policies, and observe errors in the AdminServer log file.