2
Monitoring and Improving Application Performance

This chapter contains the following sections:

Importance of Statistics

Before reacting to a problem, collect all possible statistics and get an overall picture of the application. Getting a complete landscape of the system may take considerable effort. But, if data has already been collected and embedded into the application, then this process is much easier.

After collecting as much initial data as possible, outline issues found from the statistics, the same way doctors collect symptoms from patients. Reacting to symptoms too early in the performance analysis process generally results in an incorrect analysis, which wastes time later. For example, it is extremely risky for a doctor to prescribe open heart surgery for a patient who complains of chest pains on the initial consultation.

Operating System Statistics

Operating system statistics provide information on the usage and performance of the main hardware components of the system, as well as the performance of the operating system itself. This information is crucial for detecting potential resource exhaustion, such as CPU cycles and physical memory, and for detecting bad performance of peripherals, such as disk drives.

Operating system statistics are only an indication of how the hardware and operating system are working. Many system performance analysts react to a hardware resource shortage by installing more hardware. This is a reactionary response to a series of symptoms shown in the operating system statistics. It is always best to consider operating system statistics as a diagnostic tool, similar to the way many doctors use body temperature, pulse rate, and patient pain when making a diagnosis. To help identify bottlenecks, gather operating system statistics for all servers in the system under performance analysis.

Operating system statistics include the following:

CPU Statistics

CPU utilization is the most important operating system statistic in the tuning process. Get CPU utilization for the entire system and for each individual CPU on multi-processor environments. Utilization for each CPU can detect single-threading and scalability issues.

Most operating systems report CPU usage as time spent in user space or mode and time spent in kernel space or mode. These additional statistics allow better analysis of what is actually being executed on the CPU.

On an Oracle data server system, where there is generally only one application running, the server runs database activity in user space. Activities required to service database requests (such as scheduling, synchronization, I/O, memory management, and process/thread creation and tear down) run in kernel mode. In a system where all CPU is fully utilized, a healthy Oracle system runs between 65% and 95% in user space.

Note:

On UNIX systems, where wait for I/O is derived for part of the CPU statistics, this value should be treated as idle time.

Virtual Memory Statistics

Virtual memory statistics should mainly be used as a check to validate that there is very little paging or swapping activity on the system. System performance degrades rapidly and unpredictably when paging or swapping occurs.

Individual process memory statistics can detect memory leaks due to a programming failure to deallocate memory taken from the process heap. These statistics should be used to validate that memory usage does not increase after the system has reached a steady state after startup. This problem is particularly acute on shared server applications on middle tier machines where session state may persist across user interactions, and on completion state information that is not fully deallocated.

Disk Statistics

Because the database resides on a set of disks, the performance of the I/O subsystem is very important to the performance of the database. Most operating systems provide extensive statistics on disk performance. The most important disk statistics are the current response time and the length of the disk queues. These statistics show if the disk is performing optimally or if the disk is being overworked. If a disk shows response times over 20 milliseconds, then it is performing badly or is overworked. This is your bottleneck. If disk queues start to exceed two, then the disk is a potential bottleneck of the system.

Network Statistics

Network statistics can be used in much the same way as disk statistics to determine if a network or network interface is overloaded or not performing optimally. In today's networked applications, network latency can be a large portion of the actual user response time. For this reason, these statistics are a crucial debugging tool.

Database Statistics

Database statistics provide information on the type of load on the database, as well as the internal and external resources used by the database. When database resources become exhausted, it is possible to identify bottlenecks in the application.

Database statistics can be queried directly from the database in a relational manner using SQL. These statistics can be inserted back into the database with the INSERT INTO x AS SELECT ... or CREATE TABLE x AS SELECT ... statements. This is the basis of most snapshot mechanisms that allow statistical gathering over time. Most statistics are contained in a series of virtual tables or views known as the V$ tables, because they are prefixed with V$. These are read only, and they are owned by SYS. Many of the tables contain identifiers and keys that can be joined to other V$ tables.

In order to get meaningful database statistics, the TIMED_STATISTICS parameter must be enabled for the database instance. The performance impact of having TIMED_STATISTICS enabled is minimal compared to instance performance. The performance improvements and debugging value of a complete set of statistics make this parameter crucial to effective performance analysis.

The core database statistics are:

Buffer Cache

The buffer cache manages blocks read from disk into buffers in memory. It also holds information on the most recently used buffers and those modified in normal database operation. To get best query performance, a user query accesses all required data blocks within the buffer cache, thus satisfying the query from memory. However, this might not always happen, because the database is many multiples the size of the buffer cache. With this in mind, it is easy to see that the buffer cache requires management and tuning.

The objective in tuning the buffer cache is to get acceptable user query time by having as many of the required blocks in the cache as possible. Also, eliminate time consuming I/Os without inducing any serialization points or performance spikes as old blocks are aged out of the cache. This process requires a working knowledge of the buffer cache mechanism, the database writer, and the checkpointing mechanism. Most information can be extracted from the V$SYSSTAT table.

Shared Pool

The shared pool contains information about user sessions, shared data structures used by all database sessions, and the dictionary cache.

Querying the shared pool allows analysis of the SQL statements run in the database. This is particularly important if you have limited or no knowledge of the application source code. In addition to the actual SQL, you can determine how many times it is run and how much CPU and disk I/Os are performed by the SQL. This information can be extracted from the V$SQL table. Analyzing this information is crucial in objective bottleneck identification when debugging an unknown application.

Wait Events

In the process of usual database server operations, there are times when processes need to share resources or synchronize with other processes; for example, allocating memory in the shared pool or waiting for a lock. Similarly, there are times when the database process gives control to external code or other processes out of its control; for example, performing I/O and waiting for the log writer to synchronize the redo log.

In these cases, the user process stops working and starts waiting. This wait time becomes part of the eventual user response time. If there are multiple processes queuing on a shared resource or demanding the same external resource, then the database starts to single-thread, and scalability is impacted. Performance analysis should determine why queuing on resources in the database is happening.

The V$SYSTEM_EVENT, V$SESSION_EVENT, and V$SESSION_WAIT tables allow querying of historical wait events or wait events in real time. The V$SESSION_WAIT table has additional columns that can be joined to other V$ tables based on the wait event recorded. These additional join columns specified in V$SESSION_WAIT allow focused drill down and analysis of the wait event.

See Also:

Oracle9i Database Reference for reference information on the V$ tables

Application Statistics

Application statistics are probably the most difficult statistics to get, but they are the most important statistics in measuring any performance improvements made to the system. At a minimum, application statistics should provide a daily summary of user transactions processed for each working period. More complete statistics provide precise details of what transactions were processed and the response times for each transaction type. Detailed statistics also provide statistics on the decomposition of each transaction time spent in the application server, the network, the database, and so on.

The best statistics require considerable instrumentation of the application. This is best built into the application from the start, because it is difficult to retrofit into existing applications.

Statistics Gathering Tools

Operating System Data Gathering Tools

Table 2-1 shows the various tools for gathering operating statistics on UNIX.

Table 2-1 UNIX Tools for Operating Statistics

	UNIX
CPU	sar, vmstat, mpstat, iostat
Memory	sar, vmstat
Disk	sar, iostat
Network	netstat

For Windows NT/2000, use the Performance Monitor tool.

Database Data Gathering Tools

Oracle provides three primary data gathering tools. These tools are increasingly more complex to install and run. However, as they increase in complexity, they provide better reporting output. The tools are:

Statspack

Statspack builds on the BSTAT/ESTAT scripts, but it extends the data capture to store all statistics in a database repository, which allows better baseline setting and offline analysis. The statspack report provides considerably more information than BSTAT/ESTAT in a format useful for bottleneck detection. This mechanism is the best way to record and collect database statistics.
Oracle Enterprise Manager (EM)

Oracle Enterprise Manager provides a graphical user interface for collecting, storing, and reporting performance data. The EM Intelligent Agent data gathering service can collect this performance data on a scheduled basis. A single agent can manage the data collections for all Oracle databases and the operating system of the target node. The data is automatically stored in an historical data repository for performance reporting. Data stored in the repository can be used to analyze many facets of database performance, such as database load, cache allocations and efficiency, resource contention, and high-impact SQL.

Performance data collections can be initiated directly from the EM Console or through the EM Diagnostics Pack - Capacity Planner application. HTML reports of historical performance data can be generated from the EM Console. These reports provide a comprehensive analysis of database system usage and performance, which can be easily accessed and navigated from a browser. EM also provides a graphical real-time Performance Overview for monitoring a subset of these performance metrics using line charts, bar graphs, and so forth.

The Performance Overview charts let you troubleshoot existing performance problems by drilling into performance data to track down the source of a performance bottleneck. For example, a decline in the memory sort percentage can be immediately investigated by drilling down to the sessions and corresponding SQL responsible for high-volume sort activity. High-impact SQL statements discovered through this process can be further investigated by launching SQL diagnostic tools in the context of the problem.

BSTAT/ESTAT scripts

These scripts are located in the $ORACLE_HOME/rdbms/admin directory in files UTLBSTAT.SQL and UTLESTAT.SQL. They produce a simple report of database activity between two points in time. The reports can then be archived over time. These statistics represent the bare minimum of statistics that should be kept.

Note:

BSTAT/ESTAT is not up to date with many of the new features introduced since 8.0, and eventually will be desupported.

Users who require help for performance tuning, and are running any release earlier than Oracle 8.0, are asked to provide Statspack output rather than BSTAT/ESTAT. Statspack is much easier to interpret, provides more detailed information, and makes tuning faster and more effective.

Importance of Historical Data and Baselines

One of the biggest challenges for performance engineers is determining what changed in the system to cause a satisfactory application to start having performance problems. The list of possibilities on a modern complex system is extensive.

Historical performance data is crucial in eliminating as many variables as possible. This means that you should collect operating system, database, and application statistics from the first day an application is rolled out into production. This applies even if the performance is unsatisfactory. As the applications stabilize and the performance characteristics are better understood, a set of statistics becomes the baseline for future reference. These statistics can be used to correlate against a day when performance is not satisfactory. They are also essential for future capacity and growth planning.

Performance Intuition

Database and operating system statistics provide an indication of how well a system is performing. By correlating statistics with actual throughput, you can see how the system is performing and determine where future bottlenecks and resource shortages could exist. This is a skill acquired through the experience of monitoring systems and working with the Oracle server.

CPU utilization is the easiest system usage statistic to understand and monitor. Monitoring this statistic over time, you can see how the system is used during the work day and over a number of weeks. However, this statistic provides no indication of how many business transactions were executed or what resources were used for each transaction.

Two other statistics that give a better indication of actual business transactions executed are the number of commits and the volume of redo generated. These are found in the V$SYSSTAT view under USER COMMITS and REDO SIZE. These statistics show the number of actual transactions and the volume of data changed in the database. If these statistics increase in number over time, and if application and transaction logic are not altered, then you know that more business transactions were executed. The number of logical blocks read (V$SYSSTAT statistic 'session logical reads') also indicates the query workload on a system. Be careful interpreting this number. A change in the number of logical blocks read can be a result of an execution plan change rather than an increase in workload.

With experience, it becomes easy to correlate database statistics with the application workload. A performance DBA learns to use intuition with database statistics and the application profile to determine a system's workload characteristics. A DBA must also anticipate the expected performance of frequently executed transactions. Understanding the core SQL statements in an application is key to performance diagnosis. Much of this activity can be done informally.

For example, a core business transaction is required to run in a subsecond response time. Initial investigation of the transaction shows that this transaction performs 200 logical reads, of which 40 are always obtained from disk. Taking a disk response time of 20 milliseconds, the likely I/O time is 40 x .02 = 0.8 seconds, which probably fails the response time target. The DBA requests that the transaction be rewritten, and the number of logical I/Os is reduced to 80, with an average of five coming from disk.

To avoid poor performance, it is wise to perform this type of calculation before production roll out. The process should be repeated after the system is in production, because the data volumes grow and the transaction statistics can change.

The Oracle Performance Improvement Method

Introduction to Performance Improvement

Oracle's performance methodology helps you to pinpoint performance problems in your Oracle system. This involves identifying bottlenecks and fixing them. It is recommended that changes be made to a system only after you have confirmed that there is a bottleneck.

Performance improvement, by its nature, is iterative. For this reason, removing the first bottleneck might not lead to performance improvement immediately, because another bottleneck might be revealed. Also, in some cases, if serialization points move to a more inefficient sharing mechanism, then performance could degrade. With experience, and by following a rigorous method of bottleneck elimination, applications can be debugged and made scalable.

Performance problems generally result from either a lack of throughput, unacceptable user/job response time, or both. The problem might be localized between application modules, or it might be for the entire system.

Before looking at any database or operating system statistics, it is crucial to get feedback from the most important components of the system: the users of the system and the people ultimately paying for the application. Typical user feedback includes statements like the following:

"The online performance is so bad that it prevents my staff from doing their jobs."
"The billing run takes too long."
"When I experience high amounts of Web traffic, the response time becomes unacceptable, and I am losing customers."
"I am currently performing 5000 trades a day, and the system is maxed out. Next month, we roll out to all our users, and the number of trades is expected to quadruple."

From candid feedback, it is easy to set critical success factors for any performance work. Determining the performance targets and the performance engineer's exit criteria make managing the performance process much simpler and more successful at all levels. These critical success factors are better defined in terms of real business goals rather than system statistics.

Some real business goals for these typical user statements might be:

"The billing run must process 1,000,000 accounts in a three-hour window."
"At a peak period on a Web site, the response time will not exceed five seconds for a page refresh."
"The system must be able to process 25,000 trades in an eight-hour window."

The ultimate measure of success is the user's perception of system performance. The performance engineer's role is to eliminate any bottlenecks that degrade performance. These bottlenecks could be caused by inefficient use of limited shared resources or by abuse of shared resources, causing serialization. Because all shared resources are limited, the goal of a performance engineer is to maximize the number of business operations with efficient use of shared resources. At a very high level, the entire database server can be seen as a shared resource. Conversely, at a low level, a single CPU or disk can be seen as shared resources.

The Oracle performance improvement method can be applied until performance goals are met or deemed impossible. This process is highly iterative, and it is inevitable that some investigations will be made that have little impact on the performance of the system. It takes time and experience to develop the necessary skills to accurately pinpoint critical bottlenecks in a timely manner. However, prior experience can sometimes work against the experienced engineer who neglects to use the data and statistics available to him. It is this type of behavior that encourages database tuning by myth and folklore. This is a very risky, expensive, and unlikely to succeed method of database tuning.

Today's systems are so different and complex that hard and fast rules for performance analysis cannot be made. In essence, the Oracle performance improvement method defines a way of working, but not a definitive set of rules. With bottleneck detection, the only rule is that there are no rules! The best performance engineers use the data provided and think laterally to determine performance problems.

Steps in The Oracle Performance Improvement Method

Get candid feedback from users. Determine the performance project's scope and subsequent performance goals, as well as performance goals for the future. This process is key in future capacity planning.
Get a full set of operating system, database, and application statistics from the system when the performance is both good and bad. If these are not available, then get whatever is available. Missing statistics are analogous to missing evidence at a crime scene: They make detectives work harder and it is more time-consuming.
Sanity-check the operating systems of all machines involved with user performance. By sanity-checking the operating system, you look for hardware or operating system resources that are fully utilized. List any over-used resources as symptoms for analysis later. In addition, check that all hardware shows no errors or diagnostics.
Check for the top ten most common mistakes with Oracle, and determine if any of these are likely to be the problem. List these as symptoms for analysis later. These are included because they represent the most likely problems.

See Also:
"Top Ten Mistakes Found in Oracle Systems"
Build a conceptual model of what is happening on the system using the symptoms as clues to understand what caused the performance problems.

See Also:
"A Sample Decision Process for Performance Conceptual Modeling"
Propose a series of remedy actions and the anticipated behavior to the system, and apply them in the order that can benefit the application the most. A golden rule in performance work is that you only change one thing at a time and then measure the differences. Unfortunately, system downtime requirements might prohibit such a rigorous investigation method. If multiple changes are applied at the same time, then try to ensure that they are isolated.
Validate that the changes made have had the desired effect, and see if the user's perception of performance has improved. Otherwise, look for more bottlenecks, and continue refining the conceptual model until your understanding of the application becomes more accurate.
Repeat the last three steps until performance goals are met or become impossible due to other constraints.

This method identifies the biggest bottleneck and uses an objective approach to performance improvement. The focus is on making large performance improvements by increasing application efficiency and eliminating resource shortages and bottlenecks. In this process, it is anticipated that minimal (less than 10%) performance gains are made from instance tuning, and large gains (100% +) are made from isolating application inefficiencies.

How to Check the Operating System

The following list should be considered when checking operating system symptoms.
- Check CPU utilization in user and kernel space for the total system and on each CPU.
- Confirm that there is no paging or swapping.
- Check that network latencies between machines are acceptable.
- Find disks with poor response times or long queues.
- Confirm that there are no hardware errors.
A Sample Decision Process for Performance Conceptual Modeling

Conceptual modeling is almost deterministic. However, as your performance tuning experience increases, you will appreciate that there are no real rules to follow. A flexible "heads up" approach is required to interpret the various statistics and make good decisions.

This section illustrates how a performance engineer might look for bottlenecks. Use this only as a guideline for the process. With experience, performance engineers add to the steps involved. This analysis assumes that statistics for both the operating system and the database have been gathered.
1. Is the response time/batch run time acceptable for a single user on an empty or lightly loaded machine?
  
  If it is not acceptable, then the application is probably not coded or designed optimally, and it will never be acceptable in a multiple user situation when system resources are shared. In this case, get application internal statistics, and get SQL Trace and SQL plan information. Work with developers to investigate problems in data, index, transaction SQL design, and potential deferral of work to batch/background processing.
2. Is all the CPU being utilized?
  
  If the kernel utilization is over 40%, then investigate the operating system for network transfers, paging, swapping, or process thrashing. Otherwise, move onto CPU utilization in user space. Check to see if there are any non-database jobs consuming CPU on the machine limiting the amount of shared CPU resources, such as backups, file transforms, print queues, and so on. After determining that the database is using most of the CPU, investigate the top SQL by CPU utilization. These statements form the basis of all future analysis. Check the SQL and the transactions submitting the SQL for optimal execution. In Oracle Server releases prior to 9i, use buffer gets as the measure for CPU usage. With release 9i, Oracle provides the actual CPU statistics in V$SQL.
  
  See Also:
  Oracle9i Database Reference for more information on V$SQL
  
  If the application is optimal and there are no inefficiencies in the SQL execution, consider rescheduling some work to off-peak hours or using a bigger machine.
3. At this point, the system performance is unsatisfactory, yet the CPU resources are not fully utilized.
  
  In this case, you have serialization and unscalable behavior within the server. Get the WAIT_EVENTS statistics from the server, and determine the biggest serialization point. If there are no serialization points, then the problem is most likely outside the database, and this should be the focus of investigation. Elimination of WAIT_EVENTS involves modifying application SQL and tuning database parameters. This process is very iterative and requires the ability to drill down on the WAIT_EVENTS systematically to eliminate serialization points.
Top Ten Mistakes Found in Oracle Systems

This section lists the most common mistakes found in Oracle systems. By following Oracle's performance improvement methodology, you should be able to avoid these mistakes altogether. If you find these mistakes in your system, then re-engineer the application where the performance effort is worthwhile.

See Also:
Oracle9i Database Performance Tuning Guide and Reference for more information on wait events
1. Bad Connection Management
  
  The application connects and disconnects for each database interaction. This problem is common with stateless middleware in application servers. It has over two orders of magnitude impact on performance, and it is totally unscalable.
2. Bad Use of Cursors and the Shared Pool
  
  Not using cursors results in repeated parses. If bind variables are not used, then there is hard parsing of all SQL statements. This has an order of magnitude impact in performance, and it is totally unscalable. Use cursors with bind variables that open the cursor and execute it many times. Be suspicious of applications generating dynamic SQL.
3. Getting Database I/O Wrong
  
  Many sites lay out their databases poorly over the available disks. Other sites specify the number of disks incorrectly, because they configure disks by disk space and not I/O bandwidth.
4. Redo Log Setup Problems
  
  Many sites run with too few redo logs that are too small. Small redo logs cause system checkpoints to continuously put a high load on the buffer cache and I/O system. If there are too few redo logs, then the archive cannot keep up, and the database will wait for the archive process to catch up.
5. Serialization of data blocks in the buffer cache due to lack of free lists, free list groups, transaction slots (INITRANS), or shortage of rollback segments.
  
  This is particularly common on INSERT-heavy applications, in applications that have raised the block size to 8K or 16K, or in applications with large numbers of active users and few rollback segments.
6. Long Full Table Scans
  
  Long full table scans for high-volume or interactive online operations could indicate poor transaction design, missing indexes, or poor SQL optimization. Long table scans, by nature, are I/O intensive and unscalable.
7. In Disk Sorting
  
  In disk sorts for online operations could indicate poor transaction design, missing indexes, or poor SQL optimization. Disk sorts, by nature, are I/O-intensive and unscalable.
8. High Amounts of Recursive (SYS) SQL
  
  Large amounts of recursive SQL executed by SYS could indicate space management activities, such as extent allocations, taking place. This is unscalable and impacts user response time. Recursive SQL executed under another user ID is probably SQL and PL/SQL, and this is not a problem.
9. Schema Errors and Optimizer Problems
  
  In many cases, an application uses too many resources because the schema owning the tables has not been successfully migrated from the development environment or from an older implementation. Examples of this are missing indexes or incorrect statistics. These errors can lead to sub-optimal execution plans and poor interactive user performance. When migrating applications of known performance, export the schema statistics to maintain plan stability using the DBMS_STATS package.
  
  Likewise, optimizer parameters set in the initialization parameter file can override proven optimal execution plans. For these reasons, schemas, schema statistics, and optimizer settings should be managed together as a group to ensure consistency of performance.
10. Use of Nonstandard Initialization Parameters
  
  These might have been implemented based on poor advice or incorrect assumptions. In particular, parameters associated with SPIN_COUNT on latches and undocumented optimizer features can cause a great deal of problems that can require considerable investigation.
Performance Characteristics of Hardware Configurations

Again, today's systems are so different and complex that hard and fast rules for performance analysis cannot be made. However, this section provides some of the numbers that you should consider.
1. Disk characteristics:
  - Size 512MB - 36GB
  - Seek 5 - 10msec
  - Transfer 5 - 10msec
  - Thoughput 20 - 40 I/O seconds for each disk
  - Controller throughput at 1750 I/Os a second
2. Speed reading from memory should be 1 - 10 microseconds.
3. Point-to-point network latencies should be 1 - 25msec.
4. Busy systems: (worst case)
  - Operational systems - 60% usr, 40% sys
  - Decision support systems - 90% usr, 10% sys

2 Monitoring and Improving Application Performance