Oracle9i Database Performance Planning
Release 2 (9.2)

Part Number A96532-01
3
Emergency Performance Techniques

This chapter contains the following sections:

  • Introduction to Emergency Performance Techniques
  • Steps in the Emergency Performance Method

Introduction to Emergency Performance Techniques

This chapter provides techniques for dealing with performance emergencies. Ideally, you have already read the first two chapters of this book, which define a detailed methodology for establishing and improving application performance. In an emergency, however, some component of the system has changed, transforming it from a reliable, predictable system into one that is unpredictable and fails to satisfy user requests.

In this case, the role of the performance engineer is to rapidly determine what has changed and take appropriate actions to resume normal service as quickly as possible. In many cases, it is necessary to take immediate action, and a rigorous performance improvement project is unrealistic.

After addressing the immediate performance problem, the performance engineer must collect sufficient debugging information either to clarify the performance problem further or, at the very least, to ensure that it does not happen again.

The method for debugging emergency performance problems is the same as the performance improvement method described earlier in this book. However, shortcuts are taken in various stages because of the time-critical nature of the problem. Keeping detailed notes and records of the facts found as the debugging process progresses is essential for later analysis and justification of any remedial actions. This is analogous to a doctor keeping good patient notes for future reference.

Steps in the Emergency Performance Method

The Emergency Performance Method is as follows:

  1. Survey the performance problem and collect its symptoms. This process should include the following:
    • Gather user feedback on how the system is underperforming. Is the problem one of throughput or response time?
    • Ask the question, "What has changed since we last had good performance?" The answer can give clues to the problem; however, getting unbiased answers in an escalated situation can be difficult.
  2. Sanity-check the hardware utilization of all components of the application system. Check where the highest CPU utilization is, and check the disk, memory usage, and network performance on all the system components. This quick process identifies which tier is causing the problem. If the problem is in the application, then shift analysis to application debugging. Otherwise, move on to database server analysis.
  3. Determine if the database server is constrained on CPU or if it is spending time waiting on wait events. If the database server is CPU-constrained, then investigate the following:
    • Sessions that are consuming large amounts of CPU at the operating system level
    • Sessions or statements that perform many buffer gets at the database level (check V$SESSTAT, V$SQL)
    • Execution plan changes causing sub-optimal SQL execution (these can be difficult to locate)
    • Incorrect setting of initialization parameters
    • Algorithmic issues as a result of code changes or upgrades of any component

    If the database sessions are waiting on events, then follow the wait events listed in V$SESSION_WAIT to determine what is causing the serialization. In cases of massive contention for the library cache, it might not be possible to log on or submit SQL to the database. In this case, use historical data to determine why there is suddenly contention on this latch. If most waits are for I/O, then sample the SQL being run by the sessions that are performing all of the I/Os.

  4. Apply emergency action to stabilize the system. This could involve taking parts of the application offline or restricting the workload that can be applied to the system. It could also involve a system restart or the termination of jobs in process. Such actions naturally have service-level implications.
  5. Validate that the system is stable. Having made changes and restrictions to the system, validate that the system is now stable, and collect a reference set of statistics for the database. Now follow the rigorous performance method described earlier in this book to bring back all functionality and users to the system. This process may require significant application re-engineering before it is complete.

    See Also:

    Chapter 2, "Monitoring and Improving Application Performance" for detailed information on the performance improvement method and a list of the most common mistakes made with Oracle
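
The database-side checks in step 3 can be sketched as queries against the V$ views named above. This is an illustrative sketch, not an Oracle-prescribed script: the ROWNUM cutoff and the filtering of SQL*Net idle waits are assumptions made for brevity, and you would normally compare the results against a known-good baseline.

```sql
-- Top SQL by buffer gets: candidate statements when the server is
-- CPU-constrained. (The ROWNUM limit of 10 is an arbitrary cutoff.)
SELECT sql_text, buffer_gets, executions
  FROM (SELECT sql_text, buffer_gets, executions
          FROM v$sql
         ORDER BY buffer_gets DESC)
 WHERE ROWNUM <= 10;

-- What sessions are currently waiting on, grouped by event: heavy I/O
-- waits or library cache latch contention would show up here.
SELECT event, COUNT(*) AS sessions_waiting
  FROM v$session_wait
 WHERE event NOT LIKE 'SQL*Net%'   -- assumption: exclude idle network waits
 GROUP BY event
 ORDER BY sessions_waiting DESC;

-- CPU consumed per session, to correlate with operating system findings.
SELECT s.sid, ss.value AS cpu_used
  FROM v$sesstat  ss
  JOIN v$statname sn ON sn.statistic# = ss.statistic#
  JOIN v$session  s  ON s.sid = ss.sid
 WHERE sn.name = 'CPU used by this session'
 ORDER BY ss.value DESC;
```

Sessions that rank high in both the buffer-gets and CPU queries are the usual starting point for investigating execution plan changes or parameter problems.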


Copyright © 2000, 2002 Oracle Corporation.

All Rights Reserved.