|Oracle® Database Performance Tuning Guide
10g Release 1 (10.1)
Part Number B10752-01
This chapter discusses the gathering of performance statistics. This chapter contains the following topics:
Oracle Database 2 Day DBA for information on monitoring and tuning the database
To effectively diagnose performance problems, statistics must be available. Oracle generates many types of cumulative statistics for the system, sessions, and individual SQL statements. Oracle also tracks cumulative statistics on segments and services. When analyzing a performance problem in any of these scopes, you typically look at the change in statistics (delta value) over the period of time you are interested in. Specifically, you look at the difference between the cumulative value of a statistic at the start of the period and the cumulative value at the end.
Cumulative values for statistics are generally available through dynamic performance views, such as the
V$SYSSTAT views. Note that the cumulative values in dynamic views are reset when the database instance is shutdown. The Automatic Workload Repository (AWR) automatically persists the cumulative and delta values for most of the statistics at all levels except the session level. This process is repeated on a regular time period and the result is called an AWR snapshot. The delta values captured by the snapshot represent the changes for each statistic over the time period. See "Automatic Workload Repository".
Another type of statistic collected by Oracle is called a metric. A metric is defined as the rate of change in some cumulative statistic. That rate can be measured against a variety of units, including time, transactions, or database calls. For example, the number database calls per second is a metric. Metric values are exposed in some V$ views, where the values are the average over a fairly small time interval, typically 60 seconds. A history of recent metric values is available through V$ views, and some of the data is also persisted by AWR snapshots.
A third type of statistical data collected by Oracle is sampled data. This sampling is performed by the active session history (ASH) sampler. ASH samples the current state of all active sessions. This data is collected into memory and can be accessed by a V$ view. It is also written out to persistent store by the AWR snapshot processing. See "Active Session History (ASH)".
A powerful tool for diagnosing performance problems is the use of statistical baselines. A statistical baseline is collection of statistic rates usually taken over time period where the system is performing well at peak load. Comparing statistics captured during a period of bad performance to a baseline helps discover specific statistics that have increased significantly and could be the cause of the problem.
AWR supports the capture of baseline data by enabling you to specify and preserve a pair or range of AWR snapshots as a baseline. Carefully consider the time period you choose as a baseline; the baseline should be a good representation of the peak load on the system. In the future, you can compare these baselines with snapshots captured during periods of poor performance.
Oracle Enterprise Manager is the recommended tool for viewing both real time data in the dynamic performance views and historical data from the AWR history tables. Enterprise manager also is able to capture operating system and network statistical data that can be correlated with AWR data.
Database statistics provide information on the type of load on the database, as well as the internal and external resources used by the database. This section describes some of the more important statistics.
Wait events are statistics that are incremented by a server process/thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention.
To enable easier high-level analysis of the wait events, the events are grouped into classes. The wait event classes include: Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O.
The wait classes are based on a common solution that usually applies to fixing a problem with the wait event. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue.
The following list includes common examples of the waits in some of the classes:
Oracle Database Reference for more information about Oracle wait events
When tuning an Oracle system, each component has its own set of statistics. To look at the system as a whole, it is necessary to have a common scale for comparisons. Because of this, most Oracle advisories and reports describe statistics in terms of time. In addition, the
V$SYS_TIME_MODEL views provide time model statistics. Using the common time instrumentation helps to identify quantitative effects on the database operations.
The most important of the time model statistics is
time. This statistics represents the total time spent in database calls and is a indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).
time is measured cumulatively from the time that the instance was started. Because
time it is calculated by combining the times from all non-idle user sessions, it is possible that the
time can exceed the actual time elapsed since the instance started up. For example, a instance that has been running for 30 minutes could have four active user sessions whose cumulative
time is approximately 120 minutes.
The objective for tuning an Oracle system could be stated as reducing the time that users spend in performing some action on the database, or simply reducing
time. Other time model statistics provide quantitative effects (in time) on specific actions, such as logon operations and hard and soft parses.
Oracle Database Reference for information about the
V$ACTIVE_SESSION_HISTORY view provides sampled session activity in the instance. Active sessions are sampled every second and are stored in a circular buffer in SGA. Any session that is connected to the database and is waiting for an event that does not belong to the Idle wait class is considered as an active session. This includes any session that was on the CPU at the time of sampling.
Each session sample is a set of rows and the
V$ACTIVE_SESSION_HISTORY view returns one row for each active session per sample, returning the latest session sample rows first. Because the active session samples are stored in a circular buffer in SGA, the greater the system activity, the smaller the number of seconds of session activity that can be stored in the circular buffer. This means that the duration for which a session sample appears in the V$ view, or the number of seconds of session activity that is displayed in the V$ view, is completely dependent on the database activity.
As part of the Automatic Workload Repository (AWR) snapshots, the content of
V$ACTIVE_SESSION_HISTORY is also flushed to disk. Because the content of this
V$ view can get quite large during heavy system activity, only a portion of the session samples is written to disk.
By capturing only active sessions, a manageable set of data is represented with the size being directly related to the work being performed rather than the number of sessions allowed on the system. Using the Active Session History enables you to examine and perform detailed analysis on both current data in the
V$ACTIVE_SESSION_HISTORY view and historical data in the
DBA_HIST_ACTIVE_SESS_HISTORY view, often avoiding the need to replay the workload to gather additional performance tracing information. The data present in ASH can be rolled up on various dimensions that it captures, including the following:
Oracle Database Reference for more information about the
A large number of cumulative database statistics are available on a system and session level through the
Oracle Database Reference for information about the
Operating system statistics provide information on the usage and performance of the main hardware components of the system, as well as the performance of the operating system itself. This information is crucial for detecting potential resource exhaustion, such as CPU cycles and physical memory, and for detecting bad performance of peripherals, such as disk drives.
Operating system statistics are only an indication of how the hardware and operating system are working. Many system performance analysts react to a hardware resource shortage by installing more hardware. This is a reactionary response to a series of symptoms shown in the operating system statistics. It is always best to consider operating system statistics as a diagnostic tool, similar to the way many doctors use body temperature, pulse rate, and patient pain when making a diagnosis. To help identify bottlenecks, gather operating system statistics for all servers in the system under performance analysis.
Operating system statistics include the following:
For information on tools for gathering operating statistics, see "Operating System Data Gathering Tools".
CPU utilization is the most important operating system statistic in the tuning process. Get CPU utilization for the entire system and for each individual CPU on multi-processor environments. Utilization for each CPU can detect single-threading and scalability issues.
Most operating systems report CPU usage as time spent in user space or mode and time spent in kernel space or mode. These additional statistics allow better analysis of what is actually being executed on the CPU.
On an Oracle data server system, where there is generally only one application running, the server runs database activity in user space. Activities required to service database requests (such as scheduling, synchronization, I/O, memory management, and process/thread creation and tear down) run in kernel mode. In a system where all CPU is fully utilized, a healthy Oracle system runs between 65% and 95% in user space.
V$OSSTAT view captures machine level information in the database making it easier for you to determine if there are hardware level resource issues. The
V$SYS_TIME_MODEL supplies statistics on the CPU usage by the Oracle database. Using both sets of statistics enable you to determine whether the Oracle database or other system activity is the cause of the CPU problems.
Virtual memory statistics should mainly be used as a check to validate that there is very little paging or swapping activity on the system. System performance degrades rapidly and unpredictably when paging or swapping occurs.
Individual process memory statistics can detect memory leaks due to a programming failure to deallocate memory taken from the process heap. These statistics should be used to validate that memory usage does not increase after the system has reached a steady state after startup. This problem is particularly acute on shared server applications on middle tier machines where session state may persist across user interactions, and on completion state information that is not fully deallocated.
Because the database resides on a set of disks, the performance of the I/O subsystem is very important to the performance of the database. Most operating systems provide extensive statistics on disk performance. The most important disk statistics are the current response time and the length of the disk queues. These statistics show if the disk is performing optimally or if the disk is being overworked.
Measure the normal performance of the I/O system; typical values for a single block read range from 5 to 20 milliseconds, depending on the hardware used. If the hardware shows response times much higher than the normal performance value, then it is performing badly or is overworked. This is your bottleneck. If disk queues start to exceed two, then the disk is a potential bottleneck of the system.
Network statistics can be used in much the same way as disk statistics to determine if a network or network interface is overloaded or not performing optimally. In today's networked applications, network latency can be a large portion of the actual user response time. For this reason, these statistics are a crucial debugging tool. See "Using Dynamic Performance Views for Network Performance".
Table 5-1 shows the various tools for gathering operating statistics on UNIX. For Windows NT/2000, use the Performance Monitor tool.
sar, vmstat, mpstat, iostat
When initially examining performance data, you can formulate potential theories by examining your statistics. One way to ensure that your interpretation of the statistics is correct is to perform cross-checks with other data. This establishes whether a statistic or event is really of interest.
Some pitfalls are discussed in the following sections:
When tuning, it is common to compute a ratio that helps determine whether there is a problem. Such ratios include the buffer cache hit ratio, the soft-parse ratio, and the latch hit ratio. These ratios should not be used as 'hard and fast' identifiers of whether there is or is not a performance bottleneck. Rather, they should be used as indicators. In order to identify whether there is a bottleneck, other related evidence should be examined. See "Calculating the Buffer Cache Hit Ratio".
TIMED_STATISTICS to true at the instance level directs the Oracle server to gather wait time for events, in addition to wait counts already available. This data is useful for comparing the total wait time for an event to the total elapsed time between the performance data collections. For example, if the wait event accounts for only 30 seconds out of a two hour period, then there is probably little to be gained by investigating this event, even though it may be the highest ranked wait event when ordered by time waited. However, if the event accounts for 30 minutes of a 45 minute period, then the event is worth investigating. See "Wait Events Statistics".
Timed statistics are automatically collected for the database if the initialization parameter
If you explicitly set
When looking at statistics, it is important to consider other factors that influence whether the statistic is of value. Such factors include the user load and the hardware capability. Even an event that had a wait of 30 minutes in a 45 minute snapshot might not be indicative of a problem if you discover that there were 2000 users on the system, and the host hardware was a 64 node machine.
TIMED_STATISTICS is false, then the amount of time waited for an event is not available. Therefore, it is only possible to order wait events by the number of times each event was waited for. Although the events with the largest number of waits might indicate the potential bottleneck, they might not be the main bottleneck. This can happen when an event is waited for a large number of times, but the total time waited for that event is small. The converse is also true: an event with fewer waits might be a problem if the wait time is a significant proportion of the total wait time. Without having the wait times to use for comparison, it is difficult to determine whether a wait event is really of interest.
Oracle uses some wait events to indicate if the Oracle server process is idle. Typically, these events are of no value when investigating performance problems, and they should be ignored when examining the wait events. See "Idle Wait Events".
When interpreting computed statistics (such as rates, statistics normalized over transactions, or ratios), it is important to cross-verify the computed statistic with the actual statistic counts. This confirms whether the derived rates are really of interest: small statistic counts usually can discount an unusual ratio. For example, on initial examination, a soft-parse ratio of 50% generally indicates a potential tuning area. If, however, there was only one hard parse and one soft parse during the data collection interval, then the soft-parse ratio would be 50%, even though the statistic counts show this is not an area of concern. In this case, the ratio is not of interest due to the low raw statistic counts.
The Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-tuning purposes. This data is both in memory and stored in the database. The gathered data can be displayed in both reports and views. See "Workload Repository Views" and "Workload Repository Reports".
The statistics collected and processed by AWR include:
AWR automatically generates snapshots of the performance data once every hour and collects the statistics in the workload repository. You can also manually create snapshots, but this is usually not necessary. The data in the snapshot interval is then analyzed by the Automatic Database Diagnostic Monitor (ADDM). See "Automatic Database Diagnostic Monitor".
AWR compares the difference between snapshots to determine which SQL statements to capture based on the effect on the system load. This reduces the number of SQL statements that need to be captured over time.
The space consumed by the Automatic Workload Repository is determined by several factors:
The snapshot interval determines the frequency at which snapshots are captured. A smaller snapshot interval increases the frequency, which increases the volume of data collected by the Automatic Workload Repository.
The retention period determines how long this data is retained before being purged. A longer retention period increases the space consumed by the Automatic Workload Repository.
By default, the snapshots are captured once every hour and are retained in the database for 7 days. With these default settings, a typical system with an average of 10 concurrent active sessions can require approximately 200 to 300 MB of space for its AWR data. It is possible to change the default values for both snapshot interval and retention period. See "Accessing the Automatic Workload Repository with Oracle Enterprise Manager" and "Modifying Snapshot Settings" for information on modifying AWR settings.
The Automatic Workload Repository space consumption can be reduced by the increasing the snapshot interval and reducing the retention period. When reducing the retention period, note that several Oracle self-managing features depend on AWR data for proper functioning. Not having enough data can affect the validity and accuracy of these components and features, including the following:
If possible, Oracle Corporation recommends that you set the AWR retention period large enough to capture at least one complete workload cycle. If your system experiences weekly workload cycles, such as OLTP workload during weekdays and batch jobs during the weekend, you do not need to change the default AWR retention period of 7 days. However if your system is subjected to a monthly peak load during month end book closing, you may have to set the retention period to one month.
Under exceptional circumstances, the automatic snapshot collection can be completely turned off by setting the snapshot interval to 0. Under this condition, the automatic collection of the workload and statistical data is stopped and much of the Oracle self-management functionality is not operational. In addition, you will not be able to manually create snapshots. For this reason, Oracle Corporation strongly recommends that you do not turn off the automatic snapshot collection.
It is important that you create baselines from the Automatic Workload Repository to capture typical performance periods. The baselines, which are specified by a range of snapshots, are preserved for comparisons with other similar workload periods when performance problems occur.
STATISTICS_LEVEL initialization parameter must be set to the
ALL to enable the Automatic Workload Repository. If the value is set to
BASIC, you can manually capture AWR statistics using procedures in the
DBMS_WORKLOAD_REPOSITORY package. However, because setting the
STATISTICS_LEVEL parameter to
BASIC turns off in-memory collection of many system statistics, such as segments statistics and memory advisor information, manually captured snapshots will not contain these statistics and will be incomplete.
Oracle Database Reference for information on the
In addition to the data collection by the AWR, Automatic Optimizer Statistics Collection is performed by the
DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure as a scheduled job of the Maintenance Window. See "Automatic Statistics Gathering".
To access Automatic Workload Repository through Oracle Enterprise Manager Database Control:
Oracle Enterprise Manager Concepts and Oracle Enterprise Manager online help for information about monitoring and diagnostic tools available with Oracle Enterprise Manager
While the primary interface for managing the Automatic Workload Repository is the Oracle Enterprise Manager Database Control, monitoring functions can be managed with procedures in the
Snapshots are automatically generated for an Oracle database; however, you can use
DBMS_WORKLOAD_REPOSITORY procedures to manually create, drop, and modify the snapshots and baselines that are used by automatic database diagnostic monitoring. Snapshots and baselines are sets of historical data for specific time periods that are used for performance comparisons.
To invoke these procedures, a user must be granted the DBA role.
PL/SQL Packages and Types Reference for detailed information on the
You can manually create snapshots with the
CREATE_SNAPSHOT procedure if you want to capture statistics at times different than those of the automatically generated snapshots. For example:
In this example, a snapshot for the instance is created immediately with the flush level specified to the default flush level of
TYPICAL. You can view this snapshot in the
You can drop a range of snapshots using the
DROP_SNAPSHOT_RANGE procedure. To view a list of the snapshot Ids along with database Ids, check the
DBA_HIST_SNAPSHOT view. For example, you can drop the following range of snapshots:
BEGIN DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE (low_snap_id => 22, high_snap_id => 32, dbid => 3310949047); END; /
In the example, the range of snapshot Ids to drop is specified from 22 to 32. The optional database identifier is
3310949047. If you do not specify a value for
dbid, the local database identifier is used as the default value.
Active Session History data (ASH) that belongs to the time period specified by the snapshot range is also purged when the
DROP_SNAPSHOT_RANGE procedure is called.
You can adjust the interval and retention of snapshot generation for a specified database Id, but note that this can affect the precision of the Oracle diagnostic tools.
INTERVAL setting affects how often in minutes that snapshots are automatically generated. The
RETENTION setting affects how long in minutes that snapshots are stored in the workload repository. To adjust the settings, use the
MODIFY_SNAPSHOT_SETTINGS procedure. For example:
BEGIN DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS( retention => 43200, interval => 30, dbid => 3310949047); END; /
In this example, the retention period is specified as 43200 minutes (30 days) and the interval between each snapshot is specified as 30 minutes. If NULL is specified, the existing value is preserved. The optional database identifier is
3310949047. If you do not specify a value for
dbid, the local database identifier is used as the default value. You can check the current settings for your database instance with the
A baseline is created with the
CREATE_BASELINE procedure. A baseline is simply performance data for a set of snapshots that is preserved and used for comparisons with other similar workload periods when performance problems occur. You can review the existing snapshots in the
DBA_HIST_SNAPSHOT view to determine the range of snapshots that you want to use. For example:
BEGIN DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE (start_snap_id => 270, end_snap_id => 280, baseline_name => 'peak baseline', dbid => 3310949047); END; /
In this example,
270 is the start snapshot sequence number and
280 is the end snapshot sequence.
baseline is the name of baseline and
3310949047 is an optional database identifier. If you do not specify a value for
dbid, the local database identifier is used as the default value.
The system automatically assign a unique baseline Id to the new baseline when the baseline is created. The baseline Id and database identifier are displayed in the
The pair of snapshots associated with the baseline are retained until you explicitly drop the baseline. You can drop a baseline with the
DROP_BASELINE procedure. For example:
BEGIN DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE (baseline_name => 'peak baseline', cascade => FALSE, dbid => 3310949047); END; /
In the example,
baseline is the name of baseline and
FALSE specifies that only the baseline is dropped.
TRUE specifies that drop operation should remove the pair of snapshots associated with baseline along with the baseline.
3310949047 is an optional database identifier.
Typically, you would view the AWR data through Oracle Enterprise Manager screens or AWR reports. However, you can view the statistics with the following views:
This view displays active database session activity, sampled once every second. See "Active Session History (ASH)".
The metric views are organized into various groups, such as event, event class, system, session, service, file, and tablespace metrics. These groups are identified in the
DBA_HIST views contain historical data stored in the database. This group of views includes:
DBA_HIST_ACTIVE_SESS_HISTORYdisplays the history of the contents of the in-memory active session history for recent system activity.
DBA_HIST_BASELINEdisplays information about the baselines captured on the system
DBA_HIST_DATABASE_INSTANCEdisplays information about the database environment
DBA_HIST_SNAPSHOTdisplays information on snapshots in the system
DBA_HIST_SQL_PLANdisplays the SQL execution plans
DBA_HIST_WR_CONTROLdisplays the settings for controlling AWR
Oracle Database Reference for information on dynamic and static data dictionary views
You can view the AWR reports with Oracle Enterprise Manager or by running the following SQL scripts:
awrrpt.sqlSQL script generates an HTML or text report that displays statistics for a range of snapshot Ids.
awrrpti.sqlSQL script generates an HTML or text report that displays statistics for a range of snapshot Ids for a specified database and instance.
To run an AWR report, a user must be granted the DBA role.
The reports are divided into multiple sections. The HTML report includes links that can be used to navigate quickly between sections. The content of the report contains the workload profile of the system for the selected range of snapshots.
If you run a report on a database that does not have any workload activity during the specified range of snapshots, calculated percentages for some report statistics can be less than 0 or greater than 100. This result simply means that there is no meaningful value for the statistic.
To generate a text report for a range of snapshot Ids, run the
awrrpt.sql script at the SQL prompt:
First, you need to specify whether you want an HTML or a text report.
Specify the number days for which you want to list snapshot Ids.
After the list displays, you are prompted for the beginning and ending snapshot Id for the workload repository report.
Next, accept the default report name or enter a report name. The default name is accepted in the following example:
The workload repository report is generated.
If you want to specify a database and instance before entering a range of snapshot Ids, run the
awrrpti.sql script at the SQL prompt to generate a text report:
First, specify whether you want an HTML or a text report. After that, a list of the database Ids and instance numbers displays, similar to the following:
Instances in this Workload Repository schema ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DB Id Inst Num DB Name Instance Host ----------- -------- ------------ ------------ ------------ 3309173529 1 MAIN main dlsun1690 3309173529 1 TINT251 tint251 stint251
Enter the values for the database identifier (
dbid) and instance number (
inst_num) at the prompts.
Next you are prompted for the number of days and snapshot Ids, similar to the
awrrpt.sql script, before the text report is generated. See "Running the awrrpt.sql Report".