Oracle7 Tuning, release 7.3.3 Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index



Go to previous file in sequence Go to next file in sequence

Tuning Resource Contention


Contention occurs when multiple processes try to access the same resource simultaneously. Some processes must then wait for access to various database structures. Topics discussed in this chapter include:

Understanding Contention Issues

Symptoms of resource contention problems can be found in V$SYSTEM_EVENT. This view reveals various system problems that may be impacting performance, problems such as latch contention, buffer contention, I/O contention. It is important to remember that these are only symptoms of problems--not the actual causes.

For example, by looking at V$SYSTEM_EVENT you might notice lots of buffer busy waits. It may be that many processes are inserting into the same block, but must wait for each other before they can insert. The solution might be to introduce free lists for the object in question.

Buffer busy waits may also have caused some latch free waits. Since most of these waits were caused by misses on the cache buffer hash chain latch, this was also a side effect of trying to insert into the same block. Rather than increasing SPINCOUNT to reduce the latch free waits (a symptom), you should change the object to allow for multiple processes to insert into free blocks. This approach will effectively reduce contention.

How to Detect Contention Problems

If the system has idle time, start your investigation by checking V$SYSTEM_EVENT. Examine the events with the highest average wait time, then take appropriate action on each.

For example, if there is a high number of latch free waits, look in V$LATCH to see which latch is the problem. For excessive buffer busy waits, look in V$WAITSTAT to see which block type has the highest wait count and the highest wait time. Look in V$SESSION_WAIT for cache buffer waits so you can decode the file and block number of an object.

The rest of this chapter describes common problems. Remember that the different forms of contention are symptoms which can be fixed by making changes in one of two places:

Sometimes you have no alternative but to change the application in order to overcome performance constraints.

How to Solve Contention Problems

The rest of the sections in this chapter provide strategies for solving contention problems. These include:

Reducing Contention for Rollback Segments

In this section, you will learn how to reduce contention for rollback segments. The following issues are discussed:

Identifying Rollback Segment Contention

Contention for rollback segments is reflected by contention for buffers that contain rollback segment blocks. You can determine whether contention for rollback segments is reducing performance by checking the dynamic performance table V$WAITSTAT.

V$WAITSTAT contains statistics that reflect block contention. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These statistics reflect contention for different classes of block:

system undo header

 

the number of waits for buffers containing header blocks of the SYSTEM rollback segment

 

system undo block

 

the number of waits for buffers containing blocks other than header blocks of the SYSTEM rollback segment

 

undo header

 

the number of waits for buffers containing header blocks of rollback segments other than the SYSTEM rollback segment

 

undo block

 

the number of waits for buffers containing blocks other than header blocks of rollback segments other than the SYSTEM rollback segment

 

Use the following query to monitor these statistics over a period of time while your application is running:

SELECT class, count
FROM v$waitstat
WHERE class IN ('system undo header', 'system undo block',
'undo header', 'undo block');

The result of this query might look like this:

CLASS              COUNT
------------------ ----------
system undo header 2089
system undo block 633
undo header 1235
undo block 942

Compare the number of waits for each class of block with the total number of requests for data over the same period of time. You can monitor the total number of requests for data over a period of time with this query:

SELECT SUM(value)
FROM v$sysstat
WHERE name IN ('db block gets', 'consistent gets');

The output of this query might look like this:

SUM(VALUE)
----------
929530

The information in V$SYSSTAT can also be obtained through SNMP.

If the number of waits for any class is greater than 1% of the total number of requests, you should consider creating more rollback segments to reduce contention.

Creating Rollback Segments

To reduce contention for buffers containing rollback segment blocks, create more rollback segments. The following table shows some general guidelines for choosing how many rollback segments to allocate based on the number of concurrent transactions on your database. These guidelines are appropriate for most application mixes.

Table 16-1: Choosing a Number of Rollback Segments
Number of
Current Transactions(n)
 
Recommended Number of
Rollback Segments
 

n < 16

 

4

 

16 <= n < 32

 

8

 

32 <= n

 

n/4

 

Reducing Contention for Multi-Threaded Server Processes

In this section, you will learn how to reduce contention for certain processes used by the Oracle's multi-threaded server architecture:

Reducing Contention for Dispatcher Processes

This section discusses how to identify contention for dispatcher processes and how to add dispatcher processes.

Identifying Contention for Dispatcher Processes

Contention for dispatcher processes can be reflected by either of these symptoms:

Examining Busy Rates for Dispatcher Processes V$DISPATCHER contains statistics reflecting the activity of dispatcher processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns reflect busy rates for dispatcher processes:

IDLE

 

This column shows the idle time for the dispatcher process in hundredths of a second.

 

BUSY

 

This column shows the busy time for the dispatcher process in hundredths of a second.

 

Use the following query to monitor these statistics over a period of time while your application is running:

SELECT network                               "Protocol",
SUM(busy) / ( SUM(busy) + SUM(idle) ) "Total Busy Rate"
FROM v$dispatcher
GROUP BY network;

This query returns the total busy rate for the dispatcher processes of each protocol; that is, the percentage of time the dispatcher processes of each protocol are busy. The result of this query might look like this:

Protocol  Total Busy Rate
-------- ---------------
decnet .004589828
tcp .029111042

From this result, you can make these observations:

If the database is only in use 8 hours per day, statistics need to be normalized by the effective work times. You cannot simply look at statistics from the time the instance started; rather, you must check statistics relevant to the workload you are applying. Thus, if the dispatcher processes for a specific protocol are busy more than 50% of the effective work time, then by adding dispatcher processes you may be able to improve performance for users connected to Oracle using that protocol.

Examining Wait Times for Dispatcher Process Response Queues V$QUEUE contains statistics reflecting the response queue activity for dispatcher processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns show wait times for responses in the queue:

WAIT

 

This column shows the total waiting time, in hundredths of a second, for all responses that have ever been in the queue.

 

TOTALQ

 

This column shows the total number of responses that have ever been in the queue.

 

Use the following query to monitor these statistics occasionally while your application is running:

SELECT network     "Protocol",
DECODE( SUM(totalq), 0, 'No Responses',
SUM(wait)/SUM(totalq) || ' hundredths of seconds')
"Average Wait Time per Response"
FROM v$queue q, v$dispatcher d
WHERE q.type = 'DISPATCHER'
AND q.paddr = d.paddr
GROUP BY network;

This query returns the average time, in hundredths of a second, that a response waits in each response queue for a dispatcher process to route it to a user process. This query uses the V$DISPATCHER table to group the rows of the V$QUEUE table by network protocol. The query also uses the DECODE syntax to recognize those protocols for which there have been no responses in the queue. The result of this query might look like this:

Protocol  Average Wait Time per Response
-------- ------------------------------
decnet .1739130 hundredths of seconds
tcp No Responses

From this result, you can tell that a response in the queue for DECNET dispatcher processes waits an average of 0.17 hundredths of a second and that there have been no responses in the queue for TCP dispatcher processes.

If the average wait time for a specific network protocol continues to increase steadily as your application runs, then by adding dispatcher processes you may be able to improve performance of those user processes connected to Oracle using that protocol.

Adding Dispatcher Processes

To add dispatcher processes while Oracle is running, use the MTS_DISPATCHERS parameter of the ALTER SYSTEM command.

The total number of dispatcher processes across all protocols is limited by the value of the initialization parameter MTS_MAX_DISPATCHERS. You may need to increase this value before adding dispatcher processes. The default value of this parameter is 5 and the maximum value varies depending on your operating system.

See Also: Oracle7 Server Administrator's Guide for more information on adding dispatcher processes.

Reducing Contention for Shared Server Processes

This section discusses how to identify contention for shared server processes and how to increase the maximum number of shared server processes.

Identifying Contention for Shared Server Processes

Contention for shared server processes can be reflected by a steady increase in waiting time for requests in the request queue. The dynamic performance table V$QUEUE contains statistics reflecting the request queue activity for shared server processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns show wait times for requests in the queue:

WAIT

 

This column shows the total waiting time, in hundredths of a second, for all requests that have ever been in the queue.

 

TOTALQ

 

This column shows the total number of requests that have ever been in the queue.

 

Use the following query to monitor these statistics occasionally while your application is running:

SELECT DECODE( totalq, 0, 'No Requests',
wait/totalq || ' hundredths of seconds')
"Average Wait Time Per Requests"
FROM v$queue
WHERE type = 'COMMON';

This query returns the total wait time for all requests and total number of requests for the request queue. The result of this query might look like this:

Average Wait Time per Request
-----------------------------
.090909 hundredths of seconds

From the result, you can tell that a request waits an average of 0.09 hundredths of a second in the queue before it is processed.

You can also determine how many shared server processes are currently running by issuing this query:

SELECT COUNT(*) "Shared Server Processes"
FROM v$shared_servers
WHERE status != 'QUIT';

The result of this query might look like this:

Shared Server Processes
-----------------------
10

Adding Shared Server Processes

Since Oracle automatically adds shared server processes if the load on existing ones increases drastically, you are unlikely to improve performance simply by explicitly adding more shared server processes. However, if the number of shared server processes has reached the limit established by the initialization parameter MTS_MAX_SERVERS and the average wait time in the requests queue is still increasing, you may be able to improve performance by increasing the MTS_MAX_SERVERS value. The default value of this parameter is 20 and the maximum value varies depending on your operating system. You can then either allow Oracle to automatically add shared server processes, or explicitly add shared processes through one of these means:

See Also: Oracle7 Server Administrator's Guide for more information on adding shared server processes.

Reducing Contention for Query Servers

This section describes how to detect and alleviate contention for query servers when using the parallel query option:

Identifying Query Server Contention

Statistics in the V$PQ_SYSSTAT view are useful for determining the appropriate number of query server processes for an instance. The statistics that are particularly useful are "Servers Busy", "Servers Idle", "Servers Started", and "Servers Shutdown".

Frequently, you will not be able to increase the maximum number of query servers for an instance because the maximum number is heavily dependent upon the capacity of your CPUs and your I/O bandwidth. However, if servers are continuously starting and shutting down, you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.

For example, if you have determined that the maximum number of concurrent query servers that your machine can manage is 100, you should set PARALLEL_MAX_SERVERS to 100. Next, determine how many query servers the average query needs, and how many queries are likely to be executed concurrently. For this example, assume you will have two concurrent queries with 20 as the average degree of parallelism. Thus at any given time there could be 80 query servers busy on an instance. Thus you should set the parameter PARALLEL_MIN_SERVERS to be 80.

Periodically examine V$PQ_SYSSTAT to determine if the 80 query servers for the instance are actually busy. To determine if the instance's query servers are active, issue the following query:

SELECT * FROM V$PQ_SYSSTAT 
WHERE statistic = "Servers Busy";
STATISTIC VALUE
--------------------- -----------
Servers Busy 70

Reducing Query Server Contention

If you find that typically there are fewer than PARALLEL_MIN_SERVERS busy at any given time, your idle query servers are additional system overhead that is not being used. Consider decreasing the value of the parameter PARALLEL_MIN_SERVERS. If you find that there are typically more query servers active than the value of PARALLEL_MIN_SERVERS and the "Servers Started" statistic is continuously growing, consider increasing the value of the parameter PARALLEL_MIN_SERVERS.

Reducing Contention for Redo Log Buffer Latches

Contention for redo log buffer access rarely inhibits database performance. However, Oracle provides you with methods to monitor and reduce any latch contention that does occur. This section covers:

Detecting Contention for Space in the Redo Log Buffer

When LGWR writes redo entries from the redo log buffer to a redo log file, user processes can then copy new entries over the entries that have been written to disk. LGWR normally writes fast enough to ensure that space is always available in the buffer for new entries, even when access to the redo log is heavy.

The statistic redo log space requests reflects the number of times a user process waits for space in the redo log buffer. This statistic is available through the dynamic performance table V$SYSSTAT. By default, this table is only available to the user SYS and to users granted SELECT ANY TABLE system privilege, such as SYSTEM. Use the following query to monitor these statistics over a period of time while your application is running:

SELECT name, value
FROM v$sysstat
WHERE name = 'redo log space requests';

The information in V$SYSSTAT can also be obtained through SNMP.

The value of redo log space requests should be near 0. If this value increments consistently, processes have had to wait for space in the buffer. This may be caused by the log buffer being too small, or it could be caused by checkpointing or log switching. Increase the size of the redo log buffer, if necessary, by changing the value of the initialization parameter LOG_BUFFER. The value of this parameter, expressed in bytes, must be a multiple of DB_BLOCK_SIZE. Alternatively, improve the checkpointing or archiving process.

Detecting Contention for Redo Log Buffer Latches

Access to the redo log buffer is regulated by latches. Two types of latch control access to the redo log buffer: the redo allocation latch and redo copy latches

The Redo Allocation Latch

The redo allocation latch controls the allocation of space for redo entries in the redo log buffer. To allocate space in the buffer, an Oracle user process must obtain the redo allocation latch. Since there is only one redo allocation latch, only one user process can allocate space in the buffer at a time. The single redo allocation latch enforces the sequential nature of the entries in the buffer.

After allocating space for a redo entry, the user process may copy the entry into the buffer. This is called "copying on the redo allocation latch". A process may only copy on the redo allocation latch if the redo entry is smaller than a threshold size.

The maximum size of a redo entry that can be copied on the redo allocation latch is specified by the initialization parameter LOG_SMALL_ENTRY_MAX_SIZE. The value of this parameter is expressed in bytes. The minimum, maximum, and default values vary depending on your operating system.

Redo Copy Latches

The copy latch is obtained first. Then the allocation latch is obtained, the allocation is performed, and the allocation latch is released. Next the copy is performed under the copy latch, and the copy latch is released. The allocation latch is thus held for only a very short period of time, and the system does not try to obtain the copy latch while holding the allocation latch.

If the redo entry is too large to copy on the redo allocation latch, the user process must obtain a redo copy latch before copying the entry into the buffer. While holding a redo copy latch, the user process copies the redo entry into its allocated space in the buffer and then releases the redo copy latch.

If your computer has multiple CPUs, your redo log buffer can have multiple redo copy latches. Multiple redo copy latches allow multiple processes to copy entries to the redo log buffer concurrently. The number of redo copy latches is determined by the initialization parameter LOG_SIMULTANEOUS_COPIES. The default value of LOG_SIMULTANEOUS_COPIES is the number of CPUs available to your Oracle instance.

On single-CPU computers, there should be no redo copy latches since only one process can by active at once. In this case, all redo entries are copied on the redo allocation latch, regardless of size.

Examining Redo Log Activity

Heavy access to the redo log buffer can result in contention for redo log buffer latches. Latch contention can reduce performance. Oracle collects statistics for the activity of all latches and stores them in the dynamic performance table V$LATCH. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM.

Each row in the V$LATCH table contains statistics for a different type of latch. The columns of the table reflect activity for different types of latch requests. The distinction between these types of requests is whether the requesting process continues to request a latch if it is unavailable:

willing-to-wait

 

If the latch requested with a willing-to-wait request is not available, the requesting process waits a short time and requests the latch again. The process continues waiting and requesting until the latch is available.

 

immediate

 

If the latch requested with an immediate request is not available, the requesting process does not wait, but continues processing.

 

These columns of the V$LATCH table reflect willing-to-wait requests:

GETS

 

This column shows the number of successful willing-to-wait requests for a latch.

 

MISSES

 

This column shows the number of times an initial willing-to-wait request was unsuccessful.

 

SLEEPS

 

This column shows the number of times a process waited and requested a latch after an initial willing-to-wait request.

 

For example, consider the case in which a process makes a willing-to-wait request for a latch that is unavailable. The process waits and requests the latch again and the latch is still unavailable. The process waits and requests the latch a third time and acquires the latch. This activity increments the statistics as follows:

These columns of the V$LATCH table reflect immediate requests:

IMMEDIATE GETS

 

This column shows the number of successful immediate requests for each latch.

 

IMMEDIATE MISSES

 

This column shows the number of unsuccessful immediate requests for each latch.

 

Use the following query to monitor the statistics for the redo allocation latch and the redo copy latches over a period of time:

SELECT ln.name, gets, misses, immediate_gets, immediate_misses
FROM v$latch l, v$latchname ln
WHERE ln.name IN ('redo allocation', 'redo copy')
AND ln.latch# = l.latch#;

The output of this query might look like this:

NAME               GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
------------ ---------- ---------- -------------- ----------------
redo allo... 252867 83 0 0
redo copy 0 0 22830 0

From the output of the query, calculate the wait ratio for each type of request.

Contention for a latch may be affecting performance if either of these conditions is true:

If either of these conditions is true for a latch, try to reduce contention for that latch. These contention thresholds are appropriate for most operating systems, though some computers with many CPUs may be able to tolerate more contention without performance reduction.

Reducing Latch Contention

Most cases of latch contention occur when two or more Oracle processes concurrently attempt to obtain the same latch. Latch contention rarely occurs on single-CPU computers where only a single process can be active at once.

Reducing Contention for the Redo Allocation Latch

To reduce contention for the redo allocation latch, you should minimize the time that any single process holds the latch. To reduce this time, reduce copying on the redo allocation latch. Decreasing the value of the LOG_SMALL_ENTRY_MAX_SIZE initialization parameter reduces the number and size of redo entries copied on the redo allocation latch.

Reducing Contention for Redo Copy Latches

On multiple-CPU computers, multiple redo copy latches allow multiple processes to copy entries to the redo log buffer concurrently. The default value of LOG_SIMULTANEOUS_COPIES is the number of CPUs available to your Oracle instance.

If you observe contention for redo copy latches, add more latches. To increase the number of redo copy latches, increase the value of LOG_SIMULTANEOUS_COPIES. It can help to have up to twice as many redo copy latches as CPUs available to your Oracle instance.

Reducing Contention for the LRU Latch

The LRU (least recently used) latch controls the replacement of buffers in the buffer cache. For SMP systems, Oracle automatically sets the number of LRU latches to be one half the number of CPUs on the system. For non-SMP systems, one LRU latch is sufficient.

Contention for the LRU latch can impede performance on symmetric multiprocessor (SMP) machines with a large number of CPUs. You can detect LRU latch contention by querying V$LATCH, V$SESSION_EVENT, and V$SYSTEM_EVENT. To avoid contention, consider bypassing the buffer cache or redesigning the application.

You can specify the number of LRU latches on your system with the initialization parameter DB_BLOCK_LRU_LATCHES. This parameter sets the maximum value for the desired number of LRU latches. Each LRU latch controls a set of buffers; Oracle balances allocation of replacement buffers among the sets.

To select the appropriate value for DB_BLOCK_LRU_LATCHES, consider the following:

Note: You cannot dynamically change the number of sets during the lifetime of the instance.

Reducing Free List Contention

Free list contention can reduce the performance of some applications. This section covers:

Identifying Free List Contention

Contention for free lists is reflected by contention for free data blocks in the buffer cache. You can determine whether contention for free lists is reducing performance by querying the dynamic performance table V$WAITSTAT.

The V$WAITSTAT table contains block contention statistics. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM.

Use the following procedure to find the segment names and free lists which have contention:

  1. Check V$WAITSTAT for contention on "Data Blocks".
  2. Check V$SYSTEM_EVENT for "Buffer Busy Waits".
High numbers indicate that there is some contention.
  1. In this case, check V$SESSION_WAIT to see, for each Buffer Busy Wait, the values for File, Block and ID.
  2. Construct a query as follows to give you the name of the objects and free lists which have the buffer busy waits:
    SELECT SEGMENT_NAME, SEGMENT_TYPE
    FROM DBA_EXTENTS
    WHERE FILE_ID = file
    AND BLOCK BETWEEN block_id AND block_id + blocks
This will return the segment name (segment) and type (type).
  1. To find the free lists, query as follows:
    SELECT SEGMENT_NAME, FREELISTS
    FROM DBA_SEGMENTS
    WHERE SEGMENT_NAME = segment
    AND SEGMENT_TYPE = type

Adding More Free Lists

To reduce contention for the free lists of a table, re-create the table with a larger value for the FREELISTS storage parameter. Increasing the value of this parameter to the number of Oracle processes that concurrently insert data into the table may benefit performance for the INSERT statements.

Re-creating the table may simply involve dropping and creating it again. However, you may want to use one of these means instead:




Go to previous file in sequence Go to next file in sequence
Prev Next
Oracle
Copyright © 1997 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index