Oracle Access Management is a business-critical system; downtime can carry a high cost to your business. The goal of system analysis is to quickly isolate and correct the cause of any problem. This requires a big-picture view of your system, along with tools to observe the live system and relate individual components to that bigger picture.
To help administrators diagnose problems quickly, this section covers system analysis and the most commonly encountered issues.
System analysis includes understanding how the product works, what can go wrong, how likely the scenarios are, and the consequences or observable issues.
System problems can be divided into two basic categories:
Cascading catastrophic failure
Gradual breakdown in performance
Cascading catastrophic failure might be caused by:
The LDAP server is heavily loaded and unresponsive
Morning peak load starts
Webgates send requests to the primary OAM Server
Webgate requests time out and Webgates retry against the secondary OAM Server
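The sequence above can be illustrated with a small arithmetic sketch of why retries make an overloaded LDAP server worse. It is not a model of OAM internals. It assumes that every Webgate request causes one LDAP lookup and that each timed-out request is retried exactly once against the secondary OAM Server, which queries the same LDAP server. The function name and parameters are hypothetical.

```python
def requests_seen_by_ldap(webgate_requests: int, timed_out_fraction: float) -> int:
    """Estimate LDAP-bound requests when timed-out requests are retried once.

    Assumptions (not taken from the product): one LDAP lookup per Webgate
    request, and one retry per timed-out request, served by a secondary
    OAM Server that uses the same LDAP server.
    """
    if not 0.0 <= timed_out_fraction <= 1.0:
        raise ValueError("timed_out_fraction must be between 0 and 1")
    retries = round(webgate_requests * timed_out_fraction)
    return webgate_requests + retries


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        total = requests_seen_by_ldap(1000, fraction)
        print(f"{fraction:.0%} timed out: {total} LDAP-bound requests")
```

Under these assumptions, the load reaching the LDAP server can double at exactly the moment it is least able to respond.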
Gradual breakdown in performance might occur over time when, for example:
OAM is sized and rolled out for 10,000 users and 500 groups
Over the course of a year, the number of users and groups increases significantly (for example, to 50,000 users and 250 groups)
The topics that follow cover the most commonly encountered issues.
This topic provides symptoms, probable cause, and steps to diagnose the following issues:
Symptoms: Operational Slowness
Poor user experience
Agent timeouts lead to retries
Cause
Non-OAM load might be impacting OAM operations
Capacity problems due to gradual increase in peak load
Symptoms: Total loss of service
Total loss of service
Cause
Outage of all LDAP servers
The load balancer is timing out old connections
Diagnosis
Shut down the LDAP server.
Restart your browser.
Try to access a protected site.
Review errors in the OAM Server log file, as described in Logging Component Event Messages (alternatively, in Monitoring Performance and Logs with Fusion Middleware Control).
Try to access Oracle Access Management Console.
Observe errors in WebLogic AdminServer log file.
Bring up the LDAP server again.
Retry access to a protected application.
Retry access to the Oracle Access Management Console.
Correct the issue based on the requirements in your environment.
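While working through the steps above, it can help to confirm independently whether the LDAP server's port accepts connections at all. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and a successful TCP connection shows only that the port is open, not that the directory is answering queries promptly.

```python
import socket


def ldap_port_reachable(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `ldap_port_reachable("ldap.example.com", 389)` checks the standard LDAP port on a placeholder host name. Substitute the host and port used in your environment.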
This topic provides symptoms, probable cause, and steps to diagnose the following issues:
Symptoms: Capacity Problems
Poor user experience due to slow operations
Agent timeouts and retries can result in extra load
Cause
CPU cycles
Memory issues
Symptoms: Interference with Other Services on the Host
Poor user experience due to slow operations
Agent timeouts and retries may result in extra load
Cause
CPU cycle contention
Memory contention
File system full
Diagnosis: OAM Server
Shut down the OAM Server.
Try to access a Webgate or mod_osso protected resource.
Bring up the OAM Server.
Use the Access Tester to test authentication and authorization as described in Validating Connectivity and Policies Using the Access Tester.
Use 'top' to determine the CPU and memory consumption of the OAM Server while you use the Access Tester.
Get a thread dump of the OAM Server.
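Two of the causes listed earlier, a full file system and CPU contention, can also be checked with a short script alongside 'top'. The sketch below uses only the Python standard library. The function names are hypothetical and the path to check is an assumption; point it at the file system that holds your OAM Server logs and audit files.

```python
import os
import shutil


def disk_used_percent(path: str) -> float:
    """Return the percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def host_snapshot(path: str = "/") -> dict:
    """Collect a coarse view of disk usage and, where available, system load."""
    snapshot = {"disk_used_percent": round(disk_used_percent(path), 1)}
    try:
        # 1-minute load average; available on Unix-like systems only.
        snapshot["load_average_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return snapshot


if __name__ == "__main__":
    print(host_snapshot("/"))
```

This gives only a host-level view. Per-process CPU and memory figures for the OAM Server still come from 'top' and the thread dump.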
Diagnosis: AdminServer
Shut down the AdminServer.
Restart your browser and access a protected resource, which should work.
Use remote registration to register a new partner, as described in Registering and Managing OAM 11g Agents (this should fail).
Start up the OAM AdminServer.
This topic provides symptoms, probable cause, and steps to diagnose time issues between agents and servers.
Symptoms
Difference in clock time between agent and server
High CPU usage at both agent and server
User experiences a system hang
Cause
Agent thinks the token issued by the server is invalid
Agent keeps going back to the server to re-issue the token
Diagnosis
Access protected resource.
Confirm: Client access hangs.
Confirm: High CPU usage on agent and server.
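The loop described above, in which the agent rejects a token and returns to the server for a new one, can be illustrated with a toy model. This is not OAM's actual validation logic. It assumes the agent rejects any token whose issue time lies further in the agent's future than a fixed tolerance, and all names and values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def token_accepted(issued_at: datetime, agent_now: datetime, max_skew: timedelta) -> bool:
    """Hypothetical agent-side check: reject tokens stamped too far in the agent's future."""
    return issued_at - agent_now <= max_skew


def attempts_until_accepted(server_offset: timedelta, max_skew: timedelta,
                            max_attempts: int = 5) -> Optional[int]:
    """Return the attempt on which a token is accepted, or None if it never is."""
    agent_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed for repeatability
    for attempt in range(1, max_attempts + 1):
        issued_at = agent_now + server_offset  # the server stamps with its own clock
        if token_accepted(issued_at, agent_now, max_skew):
            return attempt
        # Token rejected: the agent goes back to the server for a new one.
    return None


if __name__ == "__main__":
    tolerance = timedelta(seconds=60)
    print(attempts_until_accepted(timedelta(seconds=0), tolerance))   # clocks agree
    print(attempts_until_accepted(timedelta(minutes=5), tolerance))   # server 5 minutes ahead
```

With synchronized clocks the first token is accepted. With the server five minutes ahead, every reissued token is rejected in the same way, which matches the hang and the high CPU usage on both sides. Synchronizing clocks across hosts removes the condition.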
The audit and session functions are both write-intensive operations. The policy database can be tuned for read-intensive service.
Symptoms
Audit and session operations are slow
File system on the OAM Server is full with audit data that is not yet written to the database
Loss of in-memory session data when one of the servers in the cluster fails
Cause
Database is not tuned for write-intensive operations
Database is unavailable due to maintenance
Space issues in the database
Diagnosis
Shut down the database used to store Audit and Session data.
Try to access a protected resource.
Review error and warning messages in the OAM Server log files, as described in Logging Component Event Messages (alternatively, in Monitoring Performance and Logs with Fusion Middleware Control).
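When the log files are large, a small filter can narrow the review to error and warning entries. The sketch below assumes that each log line carries its severity in square brackets, such as `[ERROR]` or `[WARNING]`, optionally followed by a numeric level. Verify this against your own log files before relying on it. The function name is hypothetical.

```python
import re

# Assumed severity marker: "[ERROR]" or "[WARNING]", optionally with ":<level>".
SEVERITY_PATTERN = re.compile(r"\[(ERROR|WARNING)(?::\d+)?\]")


def filter_severe_lines(lines):
    """Return the lines that contain an ERROR or WARNING severity marker."""
    return [line.rstrip("\n") for line in lines if SEVERITY_PATTERN.search(line)]
```

To apply it to a file, open the log and pass the file object directly, for example `filter_severe_lines(open(path, encoding="utf-8", errors="replace"))`, where `path` is the location of the OAM Server log in your environment.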
This topic provides symptoms, probable cause, and steps to diagnose the following issues:
Symptoms
Changes to policy do not take immediate effect
Changes to system configuration do not take immediate effect
Cause
Servers are too busy handling runtime requests (CPU contention)
Coherence network slowness
Diagnosis
This topic provides symptoms, probable cause, and steps to diagnose policy database issues.
Symptoms
No policy changes are allowed; no impact on runtime
Cause
Database is unavailable (down for maintenance)
Space issues in the database
Diagnosis
Shut down the database containing OAM policies.
Try to access a protected resource and observe that runtime access is not impacted.
Try to access the Oracle Access Management Console to edit policies, and then observe errors in the AdminServer log file.