22
Instance Tuning

This chapter discusses the method used for performing tuning. This chapter contains the following sections:

Performance Tuning Principles

Performance tuning requires a different, although related, method to the initial configuration of a system. Configuring a system involves allocating resources in an ordered manner so that the initial system configuration is functional.

Tuning is driven by identifying the most significant bottleneck and making the appropriate changes to reduce or eliminate the effect of that bottleneck. Usually, tuning is performed reactively, either while the system is preproduction or after it is live.

Baselines

The most effective way to tune is to have an established performance baseline that can be used for comparison if a performance issue arises. Most DBAs know their system well and can easily identify peak usage periods. For example, the peak periods could be between 10.00am and 12.00pm and also between 1.30pm and 3.00pm. This could include a batch window of 12.00am midnight to 6am.

It is important to identify these high-load times at the site and install a monitoring tool that gathers performance data for those times. Optimally, data gathering should be configured from when the application is in its initial trial phase (during the QA cycle). Otherwise, this should be configured when the system is first in production.

Note:

Oracle recommends using the Enterprise Manager (EM) Diagnostics Pack for systems monitoring and tuning due to its extended feature list. However, if your site does not have EM, then Statspack can be used to gather Oracle instance statistics.

For illustration purposes, a combination of Statspack report output and direct queries from the V$ views are used in examples, because they are available on all installations.

See Also:

Chapter 20, "Oracle Tools to Gather Database Statistics" for detailed information on Oracle instance performance tuning tools

Ideally, baseline data gathered should include the following:

Application statistics (transaction volumes, response time)
Database statistics
OS statistics
Disk I/O statistics
Network statistics

The Symptoms and the Problems

A common pitfall in performance tuning is to mistake the symptoms of a problem for the actual problem itself. It is important to recognize that many performance statistics indicate the symptoms, and that identifying the symptom is not sufficient data to implement a remedy. For example:

Slow physical I/O

Generally, this is caused by poorly-configured disks. However, it could also be caused by a significant amount of unnecessary physical I/O on those disks issued by poorly-tuned SQL.
Latch contention

Rarely is latch contention tunable by reconfiguring the instance. Rather, latch contention usually is resolved through application changes.
Excessive CPU usage

Excessive CPU usage usually means that there is little idle CPU on the system. This could be caused by an inadequately-sized system, by untuned SQL statements, or by inefficient application programs.

See Also:
Table 22-1, "Wait Events and Potential Causes"

When to Tune

There are two distinct types of tuning: proactive monitoring and bottleneck elimination.

Proactive Monitoring

Proactive monitoring usually occurs on a regularly scheduled interval, where a number of performance statistics are examined to identify whether the system behavior and resource usage has changed. Proactive monitoring also can be called proactive tuning.

Usually, monitoring does not result in configuration changes to the system, unless the monitoring exposes a serious problem that is developing. In some situations, experienced performance engineers can identify potential problems through statistics alone, although accompanying performance degradation is usual.

'Tweaking' a system when there is no apparent performance degradation as a proactive action can be a dangerous activity, resulting in unnecessary performance drops. Tweaking a system should be considered reactive tuning, and the steps for reactive tuning should be followed.

Monitoring is usually part of a larger capacity planning exercise, where resource consumption is examined to see the changes in the way the application is being used and the way the application is using the database and host resources.

Bottleneck Elimination: Tuning

Tuning usually implies fixing a performance problem. However, tuning should be part of the lifecycle of an application, through the analysis, design, coding, production, and maintenance stages. Many times, the tuning phase is left until the system is in production. At this time, tuning becomes a reactive fire-fighting exercise, where the most important bottleneck is identified and fixed.

Usually, the purpose for tuning is to reduce resource consumption or to reduce the elapsed time for an operation to complete. Either way, the goal is to improve the effective use of a particular resource. In general, performance problems are caused by the over-use of a particular resource. That resource is the bottleneck in the system. There are a number of distinct phases in identifying the bottleneck and the potential fixes. These are discussed below.

Remember that the different forms of contention are symptoms that can be fixed by making changes in the following places:

Changes in the application, or the way the application is used
Changes in Oracle
Changes in the host hardware configuration

Often, the most effective way of resolving a bottleneck is to change the application.

Performance Tuning Steps

Below are the main steps in the Oracle Performance Method:

Get candid feedback from users about the scope of the performance problem. This step is to Define the Problem.
Obtain a full set of OS, database, and application statistics. Then Examine the Host System and Examine the Oracle Statistics for any evidence.
Consider the list of common performance errors to see whether the data gathered suggests that they are contributing to the problem.
Build a conceptual model of what is happening on the system using the performance data gathered.
Propose changes to be made and the expected result of implementing the changes. Then, Implement and Measure Change in application performance.
Determine whether the performance objective defined in step 1 has been met. If not, then repeat steps 5 and 6 until the performance goals are met.

See Also:
Oracle9i Database Performance Methods for a list of common errors and for a theoretical description of this performance method

The remainder of this chapter covers the steps of the Oracle performance method in detail.

Define the Problem

It is vital to develop a good understanding of the purpose of the tuning exercise and the nature of the problem before attempting to implement a solution. Without this understanding, it is virtually impossible to implement effective changes. The data gathered during this stage helps determine the next step to take and what evidence to examine.

Gather the following data:

Identify the performance objective.

What is the measure of acceptable performance? How many transactions per hour, or seconds, response time will meet the required performance level?
Identify the scope of the problem.

What is affected by the slowdown? For example, is the whole instance slow? Is it a particular application, program, specific operation, or a single user?
Identify the time frame when the problem occurs.

Is the problem only evident during peak hours? Does performance deteriorate over the course of the day? Was the slowdown gradual (over the space of months or weeks) or sudden?
Quantify the slowdown.

This helps identify the extent of the problem and also acts as a measure for comparison when deciding whether changes implemented to fix the problem have actually made an improvement. Find a consistently reproducible measure of the response time or job run time. How much worse are the timings than when the program was running well?
Identify any changes.

Identify what has changed since performance was acceptable. This may narrow the potential cause quickly. For example, has the operating system software, hardware, application software, or Oracle release been upgraded? Has more data been loaded into the system, or has the data volume or user population grown?

At the end of this phase, you should have a good understanding of the symptoms. If the symptoms can be identified as local to a program or set of programs, then the problem is handled in a different manner than instance-wide performance issues.

See Also:

Chapter 6, "Optimizing SQL Statements" for information on solving performance problems specific to an application or user

Examine the Host System

Look at the load on the database server, as well as the database instance. Consider the operating system, the I/O subsystem, and network statistics, because examining these areas helps determine what might be worth further investigation. In multitier systems, also examine the application server middle-tier hosts.

Examining the host hardware often gives a strong indication of the bottleneck in the system. This determines which Oracle performance data could be useful for cross-reference and further diagnosis.

Data to examine includes the following:

CPU Usage

If there is a significant amount of idle CPU, then there could be an I/O, application, or database bottleneck. (Note that wait I/O should be considered as idle CPU).

If there is high CPU usage, then determine whether the CPU is being used effectively. Is the majority of CPU usage attributable to a small number of high-CPU using programs, or is the CPU consumed by an evenly distributed workload?

If the CPU is used by a small number of high-usage programs, then look at the programs to determine the cause.

Non-Oracle Processes

If the programs are not Oracle programs, then identify whether they are legitimately requiring that amount of CPU. If so, then can their execution can be delayed to off-peak hours?

Oracle Processes

If a small number of Oracle processes consumes most of the CPU resources, then use SQL_TRACE and TKPROF to identify the SQL or PL/SQL statements to see if a particular query or PL/SQL program unit can be tuned. For example, a SELECT statement could be CPU-intensive if it's execution involves many reads of data in cache (logical reads) that could be avoided with better SQL optimization.

Oracle CPU Statistics

Oracle CPU statistics are available in three V$ views:

V$SYSSTAT shows Oracle CPU usage for all sessions. The statistic "CPU used by this session" shows the aggregate CPU used by all sessions.
V$SESSTAT shows Oracle CPU usage per session. Use this view to determine which particular session is using the most CPU.
V$RSRC_CONSUMER_GROUP shows CPU utilization statistics on a per consumer group basis when the Oracle Database Resource Manager is running.

Interpreting CPU Statistics

It is important to recognize that CPU time and real time are distinct. With eight CPUs, for any given minute in real time, there are eight minutes of CPU time available. On NT and UNIX, this can be either user time or system time (privileged mode on NT). Thus, CPU time utilized by all processes (threads) on the system could be greater than one minute per one minute real time interval.

At any given moment, you know how much time Oracle has used on the system. So, if eight minutes are available and Oracle uses four minutes of that time, then you know that 50% of all CPU time is used by Oracle. If your process is not consuming that time, then some other process is. Identify the processes that are using CPU time, figure out why, and then attempt to tune them.

If the CPU usage is evenly distributed over many Oracle server processes, then examine the Statspack report for other evidence.

Detecting I/O Problems

An overly active I/O system can be evidenced by disk queue lengths greater than two, or disk service times that are over 20-30ms. If the I/O system is overly active, then check for potential hot spots that could benefit from distributing the I/O across more disks. Also identify whether the load can be reduced by lowering the resource requirements of the programs using those resources.

Use operating system monitoring tools to determine what processes are running on the system as a whole and to monitor disk access to all files. Remember that disks holding datafiles and redo log files can also hold files that are not related to Oracle. Reduce any heavy access to disks that contain database files. Access to non-Oracle files can be monitored only through operating system facilities, rather than through the V$ views.

Tools, such as sar -d (or iostat) on many UNIX systems and Performance Monitor on Windows 2000 systems, examine I/O statistics for the entire system.

Check the Oracle wait event data in V$SYSTEM_EVENT to see whether the top wait events are I/O related. I/O related events include db file sequential read, db file scattered read, db file single write, and db file parallel write. These are all events corresponding to I/Os performed against the data file headers, control files, or data files. If any of these wait events correspond to high average time, then investigate the I/O contention.

Cross reference the host I/O system data with the I/O sections in the Statspack report to identify hot datafiles and tablespaces. Also compare the I/O times reported by the OS with the times reported by Oracle to see if they are consistent.

Before investigating whether the I/O system should be reconfigured, determine if the load on the I/O system can be reduced. To reduce Oracle I/O load, look at SQL statements that perform many physical reads by querying the V$SQLAREA view or by reviewing the 'SQL ordered by physical reads' section of the Statspack report. Examine these statements to see how they can be tuned to reduce the number of I/Os.

If there are Oracle-related I/O problems caused by SQL statements, then tune them. If the Oracle server is not consuming the available I/O resources, then identify the process that is using up the I/O. Determine why the process is using up the I/O, and then tune this process.

See Also:

Network

Using OS utilities, look at the network round-trip ping time and the number of collisions. If the network is causing large delays in response time, then investigate possible causes.

Network load can be reduced by scheduling large data transfers to off-peak times, or by coding applications to batch requests to remote hosts, rather than accessing remote hosts once (or more) per request.

See Also:

Oracle9i Database Performance Methods for a description of important operating system statistics

Examine the Oracle Statistics

Oracle statistics are examined and cross-referenced with OS statistics to ensure a consistent diagnosis of the problem. OS statistics can indicate a good place to begin tuning. However, if the goal is to tune the Oracle instance, then look at the Oracle statistics to identify the resource bottleneck from Oracle's perspective before implementing corrective action.

See Also:

"Interpreting Oracle Statistics"

Below are the common Oracle data sources used while tuning. The sources can be divided into two types of statistics: wait events and system statistics.

Wait Events

Wait events are statistics that are incremented by a server process/thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention. Remember that these are only symptoms of problems--not the actual causes.

A server process can wait for the following:

A resource to become available (such as a buffer or a latch)
An action to complete (such as an I/O)
More work to do (such as waiting for the client to provide the next SQL statement to execute). Events that identify that a server process is waiting for more work are known as idle events.

Wait event statistics include the number of times an event was waited for and the time waited for the event to complete. The views V$SESSION_WAIT, V$SESSION_EVENT, and V$SYSTEM_EVENT can be queried for wait event statistics. If the configuration parameter TIMED_STATISTICS is set to true, then you can also see how long each resource was waited for. To minimize user response time, reduce the time spent by server processes waiting for event completion. Not all wait events have the same wait time. Therefore, it is more important to examine events with the most total time waited rather than wait events with a high number of occurrences. Usually, it is best to set the dynamic parameter TIMED_STATISTICS to true at least while monitoring performance.

Investigate wait events and related timing data when performing reactive performance tuning. The events with the most time listed against them are often strong indications of the performance bottleneck. For example, by looking at V$SYSTEM_EVENT, you might notice lots of buffer busy waits. It might be that many processes are inserting into the same block and must wait for each other before they can insert. The solution could be to introduce freelists for the object in question.

System Statistics

System statistics are typically used in conjunction with wait event data to find further evidence of the cause of a performance problem.

For example, if V$SYSTEM_EVENT indicates that the largest wait event (in terms of wait time) is the event buffer busy waits, then look at the specific buffer wait statistics available in the view V$WAITSTAT to see which block type has the highest wait count and the highest wait time. After the block type has been identified, also look at V$SESSION_WAIT real-time while the problem is occurring to identify the contended-for object(s) using the file number and block number indicated. The combination of this data indicates the appropriate corrective action.

Statistics are available in many V$ views. Some common views include the following:

V$SYSSTAT

This contains overall statistics for many different parts of Oracle, including rollback, logical and physical I/O, and parse data. Data from V$SYSSTAT is used to compute ratios, such as the buffer cache hit ratio.

V$FILESTAT

This contains detailed file I/O statistics on a per-file basis, including the number of I/Os per file and the average read time.

V$ROLLSTAT

This contains detailed rollback and undo segment statistics on a per-segment basis.

V$ENQUEUE_STAT

This contains detailed enqueue statistics on a per-enqueue basis, including the number of times an enqueue was requested and the number of times an enqueue was waited for, and the wait time.

V$LATCH

This contains detailed latch usage statistics on a per-latch basis, including the number of times each latch was requested and the number of times the latch was waited for.

See Also:

Chapter 24, "Dynamic Performance Views for Tuning" for detailed descriptions of the V$ views used in tuning

Implement and Measure Change

Often at the end of a tuning exercise, it is possible to identify two or three changes that could potentially alleviate the problem. To identify which change provides the most benefit, it is recommended that only one change be implemented at a time. The effect of the change should measured against the baseline data measurements found in the problem definition phase.

Typically, most sites with dire performance problems implement a number of overlapping changes at once, and thus cannot identify which changes provided any benefit. Although this is not immediately an issue, this becomes a significant hindrance if similar problems subsequently appear, because it is not possible to know which of the changes provided the most benefit and which efforts to prioritize.

If it is not possible to implement changes separately, then try to measure the effects of dissimilar changes. For example, measure the effect of making an initialization change to optimize redo generation separately from the effect of creating a new index to improve the performance of a modified query. It is impossible to measure the benefit of performing an OS upgrade if SQL is tuned, the OS disk layout is changed, and the initialization parameters are also changed at the same time.

Performance tuning is an iterative process. It is unlikely to find a 'silver bullet' that solves an instance-wide performance problem. In most cases, excellent performance requires iteration through the performance tuning phases, because solving one bottleneck often uncovers another (sometimes worse) problem.

Knowing when to stop tuning is also important. The best measure of performance is user perception, rather than how close the statistic is to an ideal value.

Interpreting Oracle Statistics

Gather statistics that cover the time when the instance had the performance problem. If you previously captured baseline data for comparison, then you can compare the current data to the data from the baseline that most represents the problem workload.

When comparing two reports, ensure that the two reports are from times where the system was running comparable workloads.

See Also:

"Principles of Data Gathering"

Examine Load

Usually, wait events are the first data examined. However, if you have a baseline report, then check to see if the load has changed. Regardless of whether you have a baseline, it is useful to see whether the resource usage rates are high.

Load-related statistics to examine include redo size, session logical reads, db block changes, physical reads, physical writes, parse count (total), parse count (hard), and user calls. This data is queried from V$SYSSTAT. It is best to normalize this data per second and per transaction.

In the Statspack report, look at the Load Profile section. (The data has been normalized per transaction and per second.)

Changing Load

The per second statistics show the changes in throughput (that is, whether the instance is performing more work per second). The per transaction statistics identify changes in the application characteristics by comparing these to the corresponding statistics from the baseline report.

High Rates of Activity

Examine the per second statistics to identify whether the 'rates' of activity are very high. It is difficult to make blanket recommendations on 'high' values, because the thresholds are different on each site and are contingent on the application characteristics, the number and speed of CPUs, the operating system, the I/O system, and the Oracle release.

Below are some generalized examples (acceptable values vary at each site):

A hard parse rate of more than 100 per second indicates that there is a very high amount of hard parsing on the system. High hard parse rates cause serious performance issues and must be investigated. Usually, a high hard parse rate is accompanied by latch contention on the shared pool and library cache latches. Check whether waits for 'latch free' appear in the top-5 wait events, and if so, examine the latching sections of the Statspack report.
A high soft parse rate could be in the rate of 300 or more per second. Unnecessary soft parses also limit application scalability. Optimally, a SQL statement should be soft parsed once per session and executed many times.

Using Wait Event Statistics to Drill Down to Bottlenecks

Whenever an Oracle process waits for something, it records the wait using one of a set of predefined wait events. (See V$EVENT_NAME for a list of all wait events.) Some of these events are termed idle events, because the process is idle, waiting for work to perform. Nonidle events indicate nonproductive time spent waiting for a resource or action to complete.

Note:

Not all symptoms can be evidenced by wait events. See "Additional Statistics" for the statistics that can be checked.

The most effective way to use wait event data is to order the events by the wait time. This is only possible if TIMED_STATISTICS is set to true. Otherwise, the wait events can only be ranked by the number of times waited, which is often not the ordering that best represents the problem.

To get an indication of where time is spent, follow these steps:

Examine the data collection for V$SYSTEM_EVENT. The events of interest should be ranked by wait time.

Identify the wait events that have the most significant percentage of wait time. To determine the percentage of wait time, add the total wait time for all wait events, excluding idle events (such as Null event, SQL*Net message from client, SQL*Net message to client, SQL*Net more data). Calculate the relative percentage of the five most prominent events by dividing each event's wait time by the total time waited for all events
.

See Also:
"Idle Wait Events" for the complete list of idle events

Alternatively, look at the Top 5 Wait Events section on the front page of the Statspack report; this section automatically orders the wait events (omitting idle events), and calculates the relative percentage:
```
Top 5 Wait Events                                                           
~~~~~~~~~~~~~~~~~                                           Wait     % Total
Event                                             Waits   Time (cs)  Wt Time
------------------------------------------ ------------ ------------ -------
latch free                                      217,224       65,056   63.55
db file sequential read                          39,836       31,844   31.11
db file scattered read                            3,679        2,846    2.78
SQL*Net message from dblink                       1,186          870     .85
log file sync                                       830          775     .76
```
In the example above, the highest ranking wait event is the latch free event. In some situations, there might be a few events with similar percentages. This can provide extra evidence if all the events all related to the same type of resource request (for example, all I/O related events).

Look at the number of waits for these events, and the average wait time. For example, for I/O related events, the average time might help identify whether the I/O system is slow. Below is an example of this data taken from the Wait Event section of the Statspack report:

                                                                 Avg        
                                                   Total Wait   wait  Waits 
Event                             Waits   Timeouts  Time (s)    (ms)   /txn 
-------------------------- ------------ ---------- ----------- ----- ------ 
latch free                    5,560,989  2,705,969      26,117     5  827.7 
db file sequential read         137,027          0       2,129    16   20.4 
SQL*Net break/reset to cli        1,566          0       1,707  1091    0.2

The top wait events identify the next places to investigate. A table of common wait events is listed in "Table of Wait Events and Potential Causes" below. For the example above, the appropriate data to check would be latch-related. (It is usually a good idea to also have quick look at high-load SQL).
Examine the related data indicated by the wait events to see what other information this data provides. Determine whether this information is consistent with the wait event data. In most situations, there is enough data to begin developing a theory about the potential causes of the performance bottleneck.
To determine whether this theory is valid, cross-check data you have already examined with other statistics available for consistency. (The appropriate statistics vary depending on the problem, but usually include load profile-related data in V$SYSSTAT, OS statistics, and so on). Perform cross-checks with other data to confirm or refute the developing theory.

Table of Wait Events and Potential Causes

The table below links wait events to possible causes and gives an overview of the Oracle data that could be most useful to review next.

See Also:

"Wait Events" for detailed information on each event listed below and for other information to cross-check
"Dynamic Performance Views for Tuning" for detailed information on querying V$ views

Wait Event	General Area	Possible Causes	Look For / Examine
buffer busy waits	Buffer cache, DBWR	Dependent on type of buffer: index block in a primary key that is based on an ascending sequence rollback segment header	Examine `V$SESSION_WAIT` while the problem is occurring to determine the type of block contended for.
free buffer waits	Buffer cache, DBWR, I/O	Slow DBWR (possibly due to I/O?) Cache too small	Examine write time using OS statistics. Check buffer cache statistics for evidence of too small cache.
db file scattered read	I/O, SQL statement tuning	Poorly tuned SQL Slow I/O system	Investigate `V$SQLAREA` to see whether there are SQL statements performing many disk reads. Cross-check I/O system and `V$FILESTAT` for poor read time.
db file sequential read	I/O, SQL statement tuning	Poorly tuned SQL Slow I/O system	Investigate `V$SQLAREA` to see whether there are SQL statements performing many disk reads. Cross-check I/O system and `V$FILESTAT` for poor read time.
enqueue	Locks	Depends on type of enqueue	Look at `V$ENQUEUE_STAT`.
latch free	Latch contention	Depends on latch	`Check V$LATCH.`
log buffer space	Log buffer, I/O	Log buffer small Slow I/O system	Check the statistic redo buffer allocation retries in `V$SYSSTAT`. Check configuring log buffer section in configuring memory chapter. Check the disks that house the online redo logs for resource contention.
log file sync	I/O, over- committing	Slow disks that store the online logs Un-batched commits	Check the disks that house the online redo logs for resource contention. Check the number of transactions (commits + rollbacks) per second, from `V$SYSSTAT`.

Table 22-1 Wait Events and Potential Causes

Wait Event General Area Possible Causes Look For / Examine

buffer busy waits

Buffer cache, DBWR

Dependent on type of buffer:

index block in a primary key that is based on an ascending sequence

rollback segment header

Examine V$SESSION_WAIT while the problem is occurring to determine the type of block contended for.

free buffer waits

Buffer cache, DBWR, I/O

Slow DBWR (possibly due to I/O?)
Cache too small

Examine write time using OS statistics.
Check buffer cache statistics for evidence of too small cache.

db file scattered read

I/O, SQL statement tuning

Poorly tuned SQL
Slow I/O system

Investigate V$SQLAREA to see whether there are SQL statements performing many disk reads.
Cross-check I/O system and V$FILESTAT for poor read time.

db file sequential read

I/O, SQL statement tuning

Poorly tuned SQL
Slow I/O system

Investigate V$SQLAREA to see whether there are SQL statements performing many disk reads.
Cross-check I/O system and V$FILESTAT for poor read time.

enqueue

Locks

Depends on type of enqueue

Look at V$ENQUEUE_STAT.

latch free

Latch contention

Depends on latch

Check V$LATCH.

log buffer space

Log buffer, I/O

Log buffer small
Slow I/O system

Check the statistic redo buffer allocation retries in V$SYSSTAT. Check configuring log buffer section in configuring memory chapter.
Check the disks that house the online redo logs for resource contention.

log file sync

I/O, over- committing

Slow disks that store the online logs
Un-batched commits

Check the disks that house the online redo logs for resource contention.
Check the number of transactions (commits + rollbacks) per second, from V$SYSSTAT.

Additional Statistics

There are a number of statistics that can indicate performance problems that do not have corresponding wait events.

Redo Log Space Requests Statistic

The V$SYSSTAT statistic redo log space requests indicates how many times a server process had to wait for space in the online redo log, not for space in the redo log buffer. A significant value for this statistic and the wait events should be used as an indication that checkpoints, DBWR, or archiver activity should be tuned, not LGWR. Increasing the size of log buffer does not help.

Read Consistency

Your system might spend excessive time rolling back changes to blocks in order to maintain a consistent view. Consider the following scenarios:

If there are many small transactions and an active long-running query is running in the background on the same table where the changes are happening, then the query might need to roll back those changes often, in order to obtain a read-consistent image of the table. Compare the following V$SYSSTAT statistics to determine whether this is happening:
- consistent changes statistic indicates the number of times a database block has rollback entries applied to perform a consistent read on the block. Workloads that produce a great deal of consistent changes can consume a great deal of resources.
- consistent gets statistic counts the number of logical reads in consistent mode.
If there are few very, large rollback segments, then your system could be spending a lot of time rolling back the transaction table during delayed block cleanout in order to find out exactly which SCN a transaction was committed. The ratio of the following V$SYSSTAT statistics should be close to 1:
```
ratio = transaction tables consistent reads undo records applied /
        transaction tables consistent read rollbacks
```
A solution is to create more, smaller rollback segments, or to use automatic undo management.
If there are insufficient rollback segments, then there is rollback segment (header or block) contention. Evidence of this problem is available by the following:
- Comparing the number of WAITS to the number of GETS in V$ROLLSTAT; the proportion of WAITS to GETS should be small.
- Examining V$WAITSTAT to see whether there are many WAITS for buffers of CLASS 'undo header'.
A solution is to create more rollback segments.

Table Fetch by Continued Row

You can detect migrated or chained rows by checking the number of table fetch continued row statistic in V$SYSSTAT. A small number of chained rows (less than 1%) is unlikely to impact system performance. However, a large percentage of chained rows can affect performance. This might not be relevant for segments that store LOBs, because these by nature can exceed the size of a block.

If an UPDATE statement increases the amount of data in a row so that the row no longer fits in its data block, then Oracle tries to find another block with enough free space to hold the entire row. If such a block is available, then Oracle moves the entire row to the new block. This is called migrating a row. If the row is too large to fit into any available block, then Oracle splits the row into multiple pieces and stores each piece in a separate block. This is called chaining a row. Rows can also be chained when they are inserted.

Migration and chaining are especially detrimental to performance with the following:

UPDATE statements that cause migration and chaining perform poorly.
Queries that select migrated or chained rows must perform more I/O.

Identify migrated and chained rows in a table or cluster using the ANALYZE statement with the LIST CHAINED ROWS clause. This statement collects information about each migrated or chained row and places this information in a specified output table.

The definition of a sample output table named CHAINED_ROWS appears in a SQL script available on your distribution medium. The common name of this script is UTLCHN1.SQL, although its exact name and location varies depending on your platform. Your output table must have the same column names, datatypes, and sizes as the CHAINED_ROWS table.

Increase PCTFREE to avoid migrated rows. If you leave more free space available in the block, then the row has room to grow. You can also reorganize or re-create tables and indexes with high deletion rates.

Note:
PCTUSED is not the opposite of PCTFREE.

See Also:

Oracle9i Database Concepts for more information on PCTUSED
Oracle9i Database Administrator's Guide for information on reorganizing tables

Parse-related Statistics

The more your application parses, the more contention exists, and the more time your system spends waiting. If parse time CPU represents a large percentage of the CPU time, then time is being spent parsing instead of executing statements. If this is the case, then it is likely that the application is using literal SQL and so SQL cannot be shared, or the shared pool is poorly configured.

There are a number of statistics available to identify the extent of time spent parsing by Oracle. Query the parse related statistics from V$SYSSTAT. For example:

SELECT NAME, VALUE
  FROM V$SYSSTAT
 WHERE NAME IN (  'parse time cpu', 'parse time elapsed'
                , 'parse count (hard)', 'CPU used by this session' );

There are various ratios that can be computed to assist in determining whether parsing may be a problem:

parse time CPU / parse time elapsed This ratio indicates how much of the CPU time spent parsing was due to the parse operation itself, rather than waiting for resources, such as latches. A ratio of one is good, indicating that the elapsed time was not spent waiting for highly contended resources.
parse time CPU / CPU used by this session This ratio indicates how much of the total CPU used by Oracle server processes was spent on parse-related operations. A ratio closer to zero is good, indicating that the majority of CPU is not spent on parsing.

Wait Events

The views V$SESSION_WAIT, V$SESSION_EVENT and V$SYSTEM_EVENT provide information on what resources were waited for, and, if the configuration parameter TIMED_STATISTICS is set to true, how long each resource was waited for.

The three views contain related, but different, views of the same data:

V$SESSION_WAIT is a current state view. It lists either the event currently being waited for or the event last waited for on each session
V$SESSION_EVENT lists the cumulative history of events waited for on each session. After a session exits, the wait event statistics for that session are removed from this view.
V$SYSTEM_EVENT lists the events and times waited for by the whole instance (that is, all session wait events data rolled up) since instance startup.

Because V$SESSION_WAIT is a current state view, it also contains a finer-granularity of information than V$SESSION_EVENT or V$SYSTEM_EVENT. It includes additional identifying data for the current event in three parameter columns: P1, P2, and P3.

For example, V$SESSION_EVENT can show that session 124 (SID=124) had many waits on the db file scattered read event, but it does not show which file and block number. However, V$SESSION_WAIT shows the file number in P1, the block number read in P2, and the number of blocks read in P3 (P1 and P2 let you determine for which segments the wait event is occurring).

This chapter concentrates on examples using V$SESSION_WAIT. However, Oracle recommends capturing performance data over an interval and keeping this data for performance and capacity analysis. This form of rollup data is queried from the V$SYSTEM_EVENT view by tools such as Enterprise Manager Diagnostics Pack and Statspack.

Most commonly encountered events are described in this chapter, listed in case-sensitive alphabetical order. Other event-related data to examine is also included. The case used for each event name is that which appears in the V$SYSTEM_EVENT view.

See Also:

Oracle9i Database Reference for a complete list of wait events

SQL*Net

The following events signify that the database process is waiting for acknowledgment from a database link or a client process:

SQL*Net break/reset to client
SQL*Net break/reset to dblink
SQL*Net message from client
SQL*Net message from dblink
SQL*Net message to client
SQL*Net message to dblink
SQL*Net more data from client
SQL*Net more data from dblink
SQL*Net more data to client
SQL*Net more data to dblink

If these waits constitute a significant portion of the wait time on the system or for a user experiencing response time issues, then the network or the middle-tier could be a bottleneck.

Events that are client-related should be diagnosed as described for the event SQL*Net message from client. Events that are dblink-related should be diagnosed as described for the event SQL*Net message from dblink.

SQL*Net message from client

Although this is an idle event, it is important to explain when this event can be used to diagnose what is not the problem. When a server process is waiting for work from the client process, it waits on this event. However, there are several situations where this event could accrue most of the wait time for a user experiencing poor response time.

Network Bottleneck

This could be the case if the application causes a lot of traffic between the server and the client, and the network latency (time for a round-trip) is high. Symptoms include the following:

Large number of waits for this event
Both the database and client process are idle (waiting for network traffic) most of the time

To alleviate network bottlenecks, try the following:

Tune the application to reduce round trips.
Explore options to reduce latency (for example, terrestrial lines opposed to VSAT links).
Change system configuration to move higher traffic components to lower latency links.

See Also:
Oracle9i Database Performance Methods

Resource Bottleneck on the Client Process

If the client process is using most of the resources, then there is nothing that can be done in the database. Symptoms include the following:

Number of waits might not be large, but the time waited might be significant.
Client process has a high resource usage.

In some cases, you can see the wait time for a waiting user tracking closely with the amount of CPU used by the client process. The term client here refers to any process other than the database process (middle-tier, desktop client) in the n-tier architecture.

SQL*Net Message from dblink

This event signifies that the session has sent a query to the remote node and is waiting for a response from the database link. This time could go up because of the following:

Network bottleneck

Time taken to run the query on the remote node

See Also:

For #1, see "SQL*Net message from client" above.
For #2, it is useful to see the query being run on the remote node. (Login to the remote database, find the session created by the database link, and examine the SQL statement being run by it).

buffer busy waits

This wait indicates that there are some buffers in the buffer cache that multiple processes are attempting to access concurrently. Query V$WAITSTAT for the wait statistics for each class of buffer. Common buffer classes that have buffer busy waits include data block, segment header, undo header, and undo block.

V$SESSION_WAIT Parameter Columns

P1 - File ID

P2 - Block ID

Causes

To determine the possible causes, identify the type of class contended for by querying V$WAITSTAT:

SELECT class, count
  FROM V$WAITSTAT
 WHERE count > 0
  ORDER BY count DESC;

Example output:

CLASS                   COUNT
------------------ ----------
data block              43383
undo header             10680
undo block               5237
segment header            785

To identify the segment and segment type contended for, query DBA_EXTENTS using the values for File Id and Block Id returned from V$SESSION_WAIT (p1 and p2 columns):

SELECT segment_owner, segment_name
  FROM DBA_EXTENTS
 WHERE file_id = <&p1>
   AND <&p2> BETWEEN block_id AND block_id + blocks - 1;

Actions

The action required depends on the class of block contended for and the actual segment.

segment header

If the contention is on the segment header, then this is most likely freelist contention.

Automatic segment-space management in locally managed tablespaces eliminates the need to specify the PCTUSED, FREELISTS, and FREELIST GROUPS parameters. If possible, switch from manual space management to automatic segment-space management.

The following information is relevant if you are unable to use automatic segment-space management (for example, because the tablespace uses dictionary space management).

A freelist is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Blocks in freelists contain free space greater than PCTFREE. This is the percentage of a block to be reserved for updates to existing rows. In general, blocks included in process freelists for a database object must satisfy the PCTFREE and PCTUSED constraints. Specify the number of process freelists with the FREELISTS parameter. The default value of FREELISTS is one. The maximum value depends on the data block size.

To find the current setting for freelists for that segment, run the following:

SELECT SEGMENT_NAME, FREELISTS
  FROM DBA_SEGMENTS
 WHERE SEGMENT_NAME = <segment name>
   AND SEGMENT_TYPE = <segment type>;

Set freelists, or increase of number of freelists. If adding more freelists does not alleviate the problem, then use freelist groups (even in single instance this can make a difference). If using Oracle Real Application Clusters, then ensure that each instance has its own freelist group(s).

See Also:

Oracle9i Database Concepts for information on automatic segment-space management, freelists, PCTFREE, and PCTUSED
Oracle9i Real Application Clusters Installation and Configuration for information about using freelist groups in an Oracle Real Application Clusters environment

data block

If the contention is on tables or indexes (not the segment header):

Check for SQL statements using unselective indexes
Check for 'right-hand-indexes' (that is, indexes that are inserted into at the same point by many processes; for example, those which use sequence number generators for the key values)
Consider using automatic segment-space management, or increasing freelists to avoid multiple processes attempting to insert into the same block

undo header

For contention on rollback segment header:

If you are not using automatic undo management, then add more rollback segments

undo block

For contention on rollback segment block:

If you are not using automatic undo management, then consider making rollback segment sizes larger

db file scattered read

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered read issues a scatter-read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read.

The db file scattered read wait event identifies that a full table scan is occurring. When performing a full table scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other. Such reads are called scattered read calls, because the blocks are scattered throughout memory. This is why the corresponding wait event is called 'db file scattered read'. Multiblock (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) reads due to full table scans into the buffer cache show up as waits for 'db file scattered read'.

V$SESSION_WAIT Parameter Columns

P1 - The absolute file number

P2 - The block being read

P3 - The number of blocks (should be greater than 1)

Actions

On a healthy system, physical read waits should be the biggest waits after the idle waits. However, also consider the following:

Are there direct read waits (signifying full table scans with parallel query) or db file scattered read waits on an operational (OLTP) system that should be doing small indexed accesses?
Are there db file sequential reads on a large data warehouse that should be seeing mostly full table scans with parallel query?

Other things that could indicate excessive I/O load on the system include the following:

Poor buffer cache hit ratio
These events accruing most of the wait time for a user experiencing poor response time

Managing Excessive I/O

There are several ways to handle excessive I/O waits. In the order of effectiveness, these are as follows:

Reduce the I/O activity by SQL tuning.
Reduce the need to do I/O by managing the workload.
Add more disks to reduce the number of I/Os per disk.
Alleviate I/O hot spots by redistributing I/O across existing disks.

See Also:
"I/O Configuration and Design"

The first course of action should be to find opportunities to reduce I/O. Examine the SQL statements being run by sessions waiting for these events, as well as statements causing high physical I/Os from V$SQLAREA. Factors that can adversely affect the execution plans causing excessive I/O include the following:

Improperly optimized SQL
Missing indexes
High degree of parallelism for the table (skewing the optimizer towards scans)
Lack of accurate statistics for the optimizer

Inadequate I/O Distribution

Besides reducing I/O, also examine the I/O distribution of files across the disks. Is I/O distributed uniformly across the disks, or are there hot spots on some disks? Are the number of disks is sufficient to meet the I/O needs of the database?

See the total I/O operations (reads and writes) by the database, and compare those with the number of disks used. Remember to include the I/O activity of LGWR and ARCH processes.

Finding the SQL Statement executed by Sessions Waiting for I/O

Use the following query to find the SQL statement for sessions waiting for I/O:

SELECT s.sql_hash_value
  FROM V$SESSION s, V$SESSION_WAIT w
 WHERE w.event LIKE `db file%read'
   AND w.sid = s.sid ;

Finding the Object Requiring I/O

Use the following query to find the object being accessed:

SELECT segment_owner, segment_name
  FROM DBA_EXTENTS
 WHERE file_id = &p1
   AND &p2 between block_id AND block_id + blocks - 1 ;

db file sequential read

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. This call differs from a scattered read, because a sequential read is reading data into contiguous memory space. A sequential read is usually a single-block read.

Single block I/Os are usually the result of using indexes. Rarely, full table scan calls could get truncated to a single block call due to extent boundaries, or buffers already present in the buffer cache. These waits would also show up as 'db file sequential read'.

V$SESSION_WAIT Parameter Columns

P1 - The absolute file number

P2 - The block being read

P3 - The number of blocks (should be 1)

See Also:

"db file scattered read" for information on managing excessive I/O, inadequate I/O distribution, and finding the SQL causing the I/O and the segment the I/O is performed on

Figure 22-1 depicts the differences between the following wait events:

db file sequential read (single block read into one SGA buffer)
db file scattered read (multiblock read into many discontinuous SGA buffers)
direct read (single or multiblock read into the PGA, bypassing the SGA)

Figure 22-1 Scattered Read, Sequential Read, and Direct Path Read

Text description of stu81196.gif follows

Text description of the illustration stu81196.gif

direct path read

When a session is reading buffers from disk directly into the PGA (opposed to the buffer cache in SGA), it waits on this event. If the I/O subsystem does not support asynchronous I/Os, then each wait corresponds to a physical read request.

If the I/O subsystem supports asynchronous I/O, then the process is able to overlap issuing read requests with processing the blocks already existing in the PGA. When the process attempts to access a block in the PGA that has not yet been read from disk, it then issues a wait call and updates the statistics for this event. Hence, the number of waits is not necessarily the same as the number of read requests (unlike 'db file scattered read' and 'db files sequential read').

V$SESSION_WAIT Parameter Columns

P1 - File_id for the read call

P2 - Start block_id for the read call

P3 - Number of blocks in the read call

Causes

This happens in the following situations:

The sorts are too large to fit in memory and go to disk.
Parallel slaves are used for scanning data.
The server process is processing buffers faster than the I/O system can return the buffers. This can indicate an overloaded I/O system.

Actions

The file_id shows if the reads are for an object in TEMP tablespace (sorts to disk) or full table scans by parallel slaves. This is the biggest wait for large data warehouse sites. However, if the workload is not a DSS workload, then examine why this is happening.

Sorts to Disk

Examine the SQL statement currently being run by the session experiencing waits to see what is causing the sorts. Query V$SORT_USAGE to find the SQL statement that is generating the sort. Also query the statistics from V$SESSTAT for the session to determine the size of the sort. See if it is possible to reduce the sorting by tuning the SQL statement. If WORKAREA_SIZE_POLICY is MANUAL, then consider increasing the SORT_AREA_SIZE for the system (if the sorts are not too big) or for individual processes. If WORKAREA_SIZE_POLICY is AUTO, then investigate whether to increase PGA_AGGREGATE_TARGET.

See Also:

"Configuring the PGA Working Memory"

Full Table Scans

If tables are defined with a high degree of parallelism, then this could skew the optimizer to use full table scans with parallel slaves. Check the object being read into using the direct path reads, as well as the SQL statement being run by the query-coordinator. If the full table scans are a valid part of the workload, then ensure that the I/O subsystem is sized adequately for the degree of parallelism.

Hash Area Size

For query plans that call for a hash join, excessive I/O could result from having HASH_AREA_SIZE too small. If WORKAREA_SIZE_POLICY is MANUAL, then consider increasing the HASH_AREA_SIZE for the system or for individual processes. If WORKAREA_SIZE_POLICY is AUTO, then investigate whether to increase PGA_AGGREGATE_TARGET.

See Also:

direct path write

When a process is writing buffers directly from PGA (as opposed to the DBWR writing them from the buffer cache), the process waits on this event for the write call to complete. Operations that could perform direct path writes include when a sort goes to disk, during parallel DML operations, direct-path INSERTs, parallel create table as select, and some LOB operations.

Like direct path reads, the number of waits is not the same as number of write calls issued if the I/O subsystem supports asynchronous writes. The session waits if it has processed all buffers in the PGA and is unable to continue work until an I/O request completes.

V$SESSION_WAIT Parameter Columns

P1 - File_id for the write call

P2 - Start block_id for the write call

P3 - Number of blocks in the write call

Causes

This happen in the following situations:

Sorts are too large to fit in memory and are going to disk
Parallel DML are issued to create/populate objects

Actions

For large sorts see "Sorts to Disk".

For parallel DML, check the I/O distribution across disks and make sure that the I/O subsystem is adequately sized for the degree of parallelism.

enqueue

Enqueues are locks that serialize access to database resources. This event indicates that the session is waiting for a lock that is held by another session.

V$SESSION_WAIT Parameter Columns

P1 - Name and type of the lock

P2 - Resource identifier ID1 for the lock

P3 - Resource identifier ID2 for the lock

Comparison with V$LOCK Columns

V$LOCK.ID1 = P2

V$LOCK.ID2 = P3

Performing the following SQL transformation of the P1 column results in the same value displayed in V$LOCK.TYPE:

V$LOCK.TYPE = chr(bitand(P1,-16777216)/16777215)|| 
chr(bitand(P1,16711680)/65535)

Actions

The appropriate action depends on the type of enqueue.

ST enqueue

If the contended-for enqueue is the ST enqueue, then the problem is most likely dynamic space allocation. Oracle dynamically allocates an extent to a segment when there is no more free space available in the segment. This enqueue is only used for dictionary managed tablespaces.

To solve contention on this resource:

Check to see whether the temporary (that is, sort) tablespace uses TEMPFILES. If not, then switch to using TEMPFILES.
Switch to using locally managed tablespaces if the tablespace that contains segments that are growing dynamically.

See Also:
Oracle9i Database Concepts for detailed information on TEMPFILEs and locally managed tablespaces
If it is not possible to switch to locally managed tablespaces, then ST enqueue resource usage can be decreased by changing the next extent sizes of the growing objects to be large enough to avoid constant space allocation. To determine which segments are growing constantly, monitor the EXTENTS column of the DBA_SEGMENTS view for all SEGMENT_NAMEs over time to identify which segments are growing and how quickly.
Preallocate space in the segment (for example, by allocating extents using the ALTER TABLE ALLOCATE EXTENT SQL statement).

Other Locks

Query V$LOCK to find the sessions holding the lock. For every session waiting for the event enqueue, there is a row in V$LOCK with REQUEST <> 0. Therefore, use either of the two queries to find the sessions holding the locks and waiting for the locks.

SELECT DECODE(l.request,0,'Holder: ','Waiter: ')||sid sess
     , id1, id2, lmode, request, type
  FROM V$LOCK l
 WHERE (l.id1,l.id2,l.type) IN
      ( SELECT w.p2, w.p3,   chr(bitand(w.p1,-16777216)/16777215)
                          || chr(bitand(w.p1,16711680)/65535)
          FROM V$SESSION_WAIT w
         WHERE w.wait_time = 0 AND w.event = 'enqueue' )
ORDER BY l.id1, l.request;

SELECT DECODE(request,0,'Holder: ','Waiter: ')||sid sess
     , id1, id2, lmode, request, type
  FROM V$LOCK
 WHERE (id1, id2, type) IN
             (SELECT id1, id2, type FROM V$LOCK WHERE lmode = 0)
 ORDER BY id1, request;

See Also:

"Dynamic Performance Views for Tuning" for more information on using V$LOCK
Oracle9i Database Reference for more information on enqueues

Other Locks - HW enqueue

The HW enqueue is used to serialize the allocation of space above the high-water mark of a segment.

V$SESSION_WAIT.P2 / V$LOCK.ID1 is the tablespace number

V$SESSION_WAIT.P2 / V$LOCK.ID2 is the relative dba of segment header of the object for which space is being allocated.

If this is a point of contention for an object, then manual allocation of extents solves the problem.

Other Locks - TM enqueue

The most common reason for waits on TM locks tend to involve foreign key constraints where the constrained columns are not indexed. Index the foreign key columns to avoid this problem.

Other Locks - TX enqueue

These are acquired exclusive when a transaction initiates its first change and held until the transaction does a COMMIT or ROLLBACK.

TX in mode 6: occurs when a session is waiting for a row level lock that is already held by another session. This occurs when one user is updating (or deleting) a row, which another session wishes to update or delete.

The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK.
TX in mode 4 can occur if the session is waiting for an ITL (interested transaction list) slot in a block. This happens when the session wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL. If so, the session waits for a slot with a TX enqueue in mode 4.

The solution is to increase the number of ITLs available, either by changing the INITTRANS or MAXTRANS for the table (either by using an ALTER statement, or by recreating the table with the higher values).
TX in mode 4 can also occur if a session is waiting due to potential duplicates in UNIQUE index. If two sessions try to insert the same key value the second session has to wait to see if an ORA-0001 should be raised or not.

The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK.
TX in mode 4 is also possible if the session is waiting due to shared bitmap index fragment. Bitmap indexes index key values and a range of ROWIDs. Each 'entry' in a bitmap index can cover many rows in the actual table. If two sessions want to update rows covered by the same bitmap index fragment, then the second session waits for the first transaction to either COMMIT or ROLLBACK by waiting for the TX lock in mode 4.

free buffer waits

This wait event indicates that a server process was unable to find a free buffer and has posted the database writer to make free buffers by writing out dirty buffers. A dirty buffer is a buffer whose contents have been modified. Dirty buffers are freed for reuse when DBWR has written the blocks to disk.

Causes

DBWR may not be keeping up with writing dirty buffers in the following situations:

The I/O system is slow.
There are resources it is waiting for, such as latches.
The buffer cache is so small that DBWR spends most of it's time cleaning out buffers for server processes.
The buffer cache is so big that one DBWR process is not enough to free enough buffers in the cache to satisfy requests.

Actions

If this event occurs frequently, then examine the session waits for DBWR to see whether there is anything delaying DBWR.

Writes

If it is waiting for writes, then determine what is delaying the writes and fix it. Check the following:

Examine V$FILESTAT to see where most of the writes are happening.
Examine the host OS statistics for the I/O system. Are the write times acceptable?

If I/O is slow:

Consider using faster I/O alternatives to speed up write times.
Spread the I/O activity across large number of spindles (disks) and controllers.

See Also:
Chapter 15, "I/O Configuration and Design" for information on balancing I/O

Cache is Too Small

It is possible DBWR is very active because of the cache is too small. Investigate whether this is a probable cause by looking to see if the buffer cache hit ratio is low. Also use the V$DB_CACHE_ADVICE view to determine whether a larger cache size would be advantageous.

See Also:

"Sizing the Buffer Cache"

Cache Is Too Big for One DBWR

If the cache size is adequate and the I/O is already evenly spread, then you can potentially modify the behavior of DBWR by using asynchronous I/O or by using multiple database writers.

Consider Multiple Database Writer (DBWR) Processes or I/O Slaves

Configuring multiple database writer processes, or using I/O slaves, is useful when the transaction rates are high or when the buffer cache size is so large that a single DBWn process cannot keep up with the load.

DB_WRITER_PROCESSES

The DB_WRITER_PROCESSES initialization parameter lets you configure multiple database writer processes (from DBW0 to DBW9). Configuring multiple DBWR processes distributes the work required to identify buffers to be written, and it also distributes the I/O load over these processes.

DBWR_IO_SLAVES

If it is not practical to use multiple DBWR processes, then Oracle provides a facility whereby the I/O load can be distributed over multiple slave processes. The DBWR process is the only process that scans the buffer cache LRU list for blocks to be written out. However, the I/O for those blocks is performed by the I/O slaves. The number of I/O slaves is determined by the parameter DBWR_IO_SLAVES. I/O slaves are also useful when asynchronous I/O is not available, because the multiple I/O slaves simulate nonblocking, asynchronous requests by freeing DBWR to continue identifying blocks in the cache to be written.

DBWR I/O slaves are allocated immediately following database open when the first I/O request is made. The DBWR continues to perform all of the DBWR-related work, apart from performing I/O. I/O slaves simply perform the I/O on behalf of DBWR. The writing of the batch is parallelized between the I/O slaves.

Note:

Implementing DBWR_IO_SLAVES requires that extra shared memory be allocated for I/O buffers and request queues. Multiple DBWR processes cannot be used with I/O slaves. Configuring I/O slaves forces only one DBWR process to start.

Choosing Between Multiple DBWR Processes and I/O Slaves

Configuring multiple DBWR processes benefits performance when a single DBWR process is unable to keep up with the required workload. However, before configuring multiple DBWR processes, check whether asynchronous I/O is available and configured on the system. If the system supports asynchronous I/O but it is not currently used, then enable asynchronous I/O to see if this alleviates the problem. If the system does not support asynchronous I/O, or if asynchronous I/O is already configured and there is still a DBWR bottleneck, then configure multiple DBWR processes.

Note:

If asynchronous I/O is not available on your platform, then asynchronous I/O can be disabled by setting the DISK_ASYNCH_IO initialization parameter to false.

Using multiple DBWRs parallelizes the gathering and writing of buffers. Therefore, multiple DBWn processes should deliver more throughput than one DBWR process with the same number of I/O slaves. For this reason, the use of I/O slaves has been deprecated in favor of multiple DBWR processes. I/O slaves should only be used if multiple DBWR processes cannot be configured.

See Also:

Chapter 17, "Configuring Instance Recovery Performance" for details on tuning checkpoints

latch free

A latch is a low-level internal lock used by Oracle to protect memory structures. The latch free event is updated when a server process attempts to get a latch, and the latch is unavailable on the first attempt.

See Also:

Oracle9i Database Concepts for more information on latches and internal locks

Actions

This event should only be a concern if latch waits are a significant portion of the wait time on the system as a whole, or for individual users experiencing problems.

To help determine the cause of this wait event, identify the latch(es) contended for. There are many types of latches used for different purposes. For example, the shared pool latch protects certain actions in the shared pool, and the cache buffers LRU chain protects certain actions in the buffer cache.
Examine the resource usage for related resources. For example, if the library cache latch is heavily contended for, then examine the hard and soft parse rates.
Examine the SQL statements for the sessions experiencing latch contention to see if there is any commonality.

V$SESSION_WAIT Parameter Columns

P1 - Address of the latch

P2 - Latch number

P3 - Number of times process has already slept, waiting for the latch

Example: Find Latches Currently Waiting For

SELECT n.name, SUM(w.p3) Sleeps
  FROM V$SESSION_WAIT w, V$LATCHNAME n
 WHERE w.event = `latch free'
   AND w.p2 = n.latch#
 GROUP BY n.name;

Table 22-2 Latch Free Wait Event

Latch SGA Area Possible Causes Look For:

Shared pool, library cache

Shared pool

Lack of statement reuse

Statements not using bind variables

Insufficient size of application cursor cache

Cursors closed explicitly after each execution

Frequent logon/logoffs

Underlying object structure being modified (for example truncate)

Shared pool too small

Sessions (in V$SESSTAT) with high:

parse time CPU
parse time elapsed
Ratio of parse count (hard) / execute count
Ratio of parse count (total) / execute count

Cursors (in V$SQLAREA/V$SQL) with:

High ratio of PARSE_CALLS / EXECUTIONS
EXECUTIONS = 1 differing only in literals in the WHERE clause (that is, no bind variables used)
High RELOADS
High INVALIDATIONS
Large (> 1mb) SHARABLE_MEM

cache buffers lru chain

Buffer cache LRU lists

Excessive buffer cache throughput. For example, many cache-based sorts, inefficient SQL that accesses incorrect indexes iteratively (large index range scans), or many full table scans

DBWR not keeping up with the dirty workload; hence, foreground process spends longer holding the latch looking for a free buffer

Cache may be too small

Statements with very high LIO/PIO using unselective indexes

cache buffers chains

Buffer cache buffers

Repeated access to a block (or small number of blocks), known as 'hot block'

Sequence number generation code that updates a row in a table to generate the number, rather than using a sequence number generator

Identify the segment the hot block belongs to

Shared Pool and Library Cache Latch Contention

A main cause of shared pool or library cache latch contention is parsing. There are a number of techniques that can be used to identify unnecessary parsing and a number of types of unnecessary parsing:

Unshared SQL

This method identifies similar SQL statements that could be shared if literals were replaced with bind variables. The idea is to either:

Manually inspect SQL statements that have only one execution to see whether they are similar:
```
SELECT sql_text
  FROM V$SQLAREA
 WHERE executions = 1
 ORDER BY sql_text;
```
Or, automate this process by grouping together what may be similar statements. Do this by estimating the number of bytes of a SQL statement which will likely be the same, and group the SQL statements by that many bytes. For example, the example below groups together statements that differ only after the first 60 bytes.
```
SELECT SUBSTR(sql_text,1, 60), COUNT(*)
  FROM V$SQLAREA
 WHERE executions = 1
 GROUP BY SUBSTR(sql_text, 1, 60)
 HAVING COUNT(*) > 1;
```

Reparsed Sharable SQL

check the V$SQLAREA view. Enter the following query:

SELECT SQL_TEXT, PARSE_CALLS, EXECUTIONS 
FROM V$SQLAREA 
ORDER BY PARSE_CALLS;

When the PARSE_CALLS value is close to the EXECUTIONS value for a given statement, you might be continually reparsing that statement. Tune the statements with the higher numbers of parse calls.

By Session

Identify unnecessary parse calls by identifying the session in which they occur. It might be that particular batch programs or certain types of applications do most of the reparsing. To do this, run the following query:

column sid format 99999
column name format a20
SELECT ss.sid, sn.name, ss.value
  FROM V$SESSTAT  ss
     , V$STATNAME sn
 WHERE name IN ('parse count (hard)','execute count')
   AND ss.statistic# = sn.statistic#
   AND ss.value > 0
 ORDER BY value, sid;

The result is a list of all sessions and the amount of reparsing they do. For each system identifier (SID), go to V$SESSION to find the name of the program that causes the reparsing. The output is similar to the following:

   SID NAME                      VALUE
------ -------------------- ----------
     7 parse count (hard)            1
     8 parse count (hard)            3
     7 execute count                20
     6 parse count (hard)           26
    11 parse count (hard)           84
     6 execute count               325
    11 execute count              1619
     8 execute count             12690

cache buffer lru chain

The cache buffer lru chain latches protect the lists of buffers in the cache. When adding, moving, or removing a buffer from a list, a latch must be obtained.

For symmetric multiprocessor (SMP) systems, Oracle automatically sets the number of LRU latches to a value equal to one half the number of CPUs on the system. For non-SMP systems, one LRU latch is sufficient.

Contention for the LRU latch can impede performance on SMP machines with a large number of CPUs. LRU latch contention is detected by querying V$LATCH, V$SESSION_EVENT, and V$SYSTEM_EVENT. To avoid contention, consider bypassing the buffer cache or redesigning the application.

cache buffers chains

The cache buffers chains latches are used to protect a buffer list in the buffer cache. These latches are used when searching for, adding, or removing a buffer from the buffer cache. Contention on this latch usually means that there is a block that is greatly contended for (that is, 'hot)' block.

To identify the heavily accessed buffer chain, and hence the contended for block, look at latch statistics for the cache buffers chains latches using the view V$LATCH_CHILDREN. If there is a specific cache buffers chains child latch that has many more GETS, MISSES, and SLEEPS when compared with the other child latches, then this is the contended for child latch.

This latch has a memory address, identified by the ADDR column. Use the value in the ADDR column joined with the V$BH view to identify the blocks protected by this latch. For example, given the address (V$LATCH_CHILDREN.ADDR) of a heavily contended latch, this queries the file and block numbers:

SELECT file#, dbablk, class, state
  FROM X$BH
 WHERE HLADDR='address of latch';

There are many blocks protected by each latch. One of these buffers will likely be the hot block. Perform this query a number of times, and identify the block that consistently appears in the output, using the combination of file number (file#) and block number (dbablk). This is most likely the hot block. After the hot block has been identified, query DBA_EXTENTS using the file number and block number, to identify the segment.

See Also:

"Finding the Object Requiring I/O" for instructions on how to do this

log buffer space

This event occurs when server processes are waiting for free space in the log buffer, because you are writing redo to the log buffer faster than LGWR can write it out.

Actions

Modify the redo log buffer size. If the size of the log buffer is already reasonable, then ensure that the disks on which the online redo logs reside do not suffer from I/O contention. The log buffer space wait event could be indicative of either disk I/O contention on the disks where the redo logs reside, or of a too-small log buffer. Check the I/O profile of the disks containing the redo logs to investigate whether the I/O system is the bottleneck. If the I/O system is not a problem, then the redo log buffer could be too small. Increase the size of the redo log buffer until this event is no longer significant.

log file switch

There are two wait events commonly encountered:

log file switch (archiving needed)
log file switch (checkpoint incomplete)

In both of the events, the LGWR is unable to switch into the next online redo log, and all the commit requests wait for this event.

Actions

For the log file switch (archiving needed) event, examine why the archiver is unable to archive the logs in a timely fashion. It could be due to the following:

Archive destination is running out of free space.
Archiver is not able to read redo logs fast enough (contention with the LGWR).
Archiver is not able to write fast enough (contention on the archive destination, or not enough ARCH processes).

Depending on the nature of bottleneck, you might need to redistribute I/O or add more space to the archive destination to alleviate the problem. For the log file switch (checkpoint incomplete) event:

Check if DBWR is slow, possibly due to an overloaded or slow I/O system. Check the DBWR write times, check the I/O system, and distribute I/O if necessary.

See Also:
Chapter 15, "I/O Configuration and Design"
Check if there are too few, or too small redo logs. If you have a few and/or small redo logs (for example two x 100k logs), and your system produces enough redo to cycle through all of the logs before DBWR has been able to complete the checkpoint, then increase the size and/or number of redo logs.

See Also:
"Sizing Redo Log Files"

log file sync

When a user session commits (or rolls back), the session's redo information must be flushed to the redo logfile by LGWR. The server process performing the COMMIT or ROLLBACK waits under this event for the write to the redo log to complete.

Actions

If this event's waits constitute a significant wait on the system or a significant amount of time waited by a user experiencing response time issues or on a system, then examine the time per wait.

If the average time waited is low, but the number of waits are high, then the application might be committing after every INSERT, rather than batching COMMITs. Applications can reduce the wait by committing after 50 rows, rather than every row.

If the time per wait is high, then examine the session waits for the log writer and see what it is spending most of it's time doing and waiting for. If the waits are because of slow I/O, then try the following:

Reduce other I/O activity on the disks containing the redo logs, or use dedicated disks.
Alternate redo logs on different disks to minimize the effect of the archiver on the log writer.
Move the redo logs to faster disks or a faster I/O subsystem (for example, switch from RAID 5 to RAID 1).
Consider using raw devices (or simulated raw devices provided by disk vendors) to speed up the writes.
Depending on the type of application, it might be possible to batch COMMITs by committing every N rows, rather than every row, so that fewer log file syncs are needed.

rdbms ipc reply

This event is used to wait for a reply from one of the background processes.

Idle Wait Events

These events indicate that the server process is waiting because it has no work. This usually implies that if there is a bottleneck, then the bottleneck is not for database resources.

The majority of the idle events should be ignored when tuning, because they do not indicate the nature of the performance bottleneck. Some idle events can be useful in indicating what the bottleneck is not. An example of this type of event is the most commonly encountered idle wait-event 'SQL Net message from client'. This and other idle events (and their categories) are listed below.

Table 22-3 Idle Wait Events

Wait Name	Background Process Idle Event	User Process Idle Event	Parallel Query Idle Event	Shared Server Idle Event	Oracle Real Application Clusters Idle Event
`dispatcher timer`				X
`lock manager wait for remote message`					X
`pipe get`		X
`pmon timer`	X
`PX Idle Wait`			X
`PX Deq Credit: need buffer`			X
`PX Deq Credit: send blkd`			X
`rdbms ipc message`	X
`smon timer`	X
`SQL*Net message from client`		X
`virtual circuit status`				X

Note:

If Statspack is installed, then it is also possible to query the STATS$IDLE_EVENT table, which contains a list of idle events.

See Also:

Oracle9i Database Reference for explanations of each idle wait event

22 Instance Tuning

Performance Tuning Principles

Baselines

The Symptoms and the Problems

When to Tune

Proactive Monitoring

Bottleneck Elimination: Tuning

Performance Tuning Steps

Define the Problem

Examine the Host System

CPU Usage

Non-Oracle Processes

Oracle Processes

Oracle CPU Statistics

Interpreting CPU Statistics

Detecting I/O Problems

Network

Examine the Oracle Statistics

Wait Events

System Statistics

V$SYSSTAT

V$FILESTAT

V$ROLLSTAT

V$ENQUEUE_STAT

V$LATCH

Implement and Measure Change

Interpreting Oracle Statistics

Examine Load

Changing Load

High Rates of Activity

Using Wait Event Statistics to Drill Down to Bottlenecks

Table of Wait Events and Potential Causes

Additional Statistics

Redo Log Space Requests Statistic

Read Consistency

Table Fetch by Continued Row

Parse-related Statistics

Wait Events

SQL*Net

SQL*Net message from client

Network Bottleneck

Resource Bottleneck on the Client Process

SQL*Net Message from dblink

buffer busy waits

V$SESSION_WAIT Parameter Columns

Causes

Actions

segment header

data block

undo header

undo block

db file scattered read

V$SESSION_WAIT Parameter Columns

Actions

Managing Excessive I/O

Inadequate I/O Distribution

Finding the SQL Statement executed by Sessions Waiting for I/O

Finding the Object Requiring I/O

db file sequential read

V$SESSION_WAIT Parameter Columns

Figure 22-1 Scattered Read, Sequential Read, and Direct Path Read

direct path read

V$SESSION_WAIT Parameter Columns

Causes

Actions

Sorts to Disk

Full Table Scans

Hash Area Size

direct path write

V$SESSION_WAIT Parameter Columns

Causes

Actions

enqueue

V$SESSION_WAIT Parameter Columns

Comparison with V$LOCK Columns

Actions

ST enqueue

Other Locks

Other Locks - HW enqueue

Other Locks - TM enqueue

22
Instance Tuning