Oracle8i Parallel Server Administration, Deployment, and Performance
Release 2 (8.1.6)

Part Number A76970-01






Tuning Oracle Parallel Server and Inter-Instance Performance

This chapter describes Oracle Parallel Server and Cache Fusion-related statistics and provides procedures that explain how to use these statistics to monitor and tune performance. This chapter also briefly explains how Cache Fusion resolves reader/writer conflicts in Oracle Parallel Server and describes Cache Fusion's benefits in general terms that apply to most types of systems and applications.

The topics in this chapter include:

How Cache Fusion Produces Consistent Read Blocks

When a data block requested by one instance is in the memory cache of a remote instance, Cache Fusion resolves the read/write conflict using remote memory access, not disk access. The requesting instance sends a request for a consistent-read copy of the block to the holding instance. The Block Server Process (BSP) on the holding instance transmits the consistent-read image of the requested block directly from the holding instance's buffer cache to the requesting instance's buffer cache across a high speed interconnect.

As Figure 12-1 illustrates, Cache Fusion enables the buffer cache of one node to send data blocks directly to the buffer cache of another node by way of low latency, high bandwidth interconnects. This reduces the need for expensive disk I/O in parallel cache management.

Cache Fusion also leverages new interconnect technologies for low latency, user-space based, interprocessor communication. This potentially lowers CPU usage by reducing operating system context switches for inter-node messages. Oracle manages write/write contention using conventional disk-based Parallel Cache Management (PCM).

Figure 12-1 Cache Fusion Ships Blocks from Cache to Cache Across the Interconnect


Cache Fusion is always enabled.  

Partitioning Data to Improve Write/Write Conflict Resolution

Cache Fusion solves only part of the block conflict resolution issue: it provides improved scalability for applications that experience high levels of reader/writer contention. For applications with high writer/writer concurrency, you must also accurately partition your application's tables to reduce the potential for writer/writer conflicts.

Improved Scalability with Cache Fusion

Cache Fusion improves application transaction throughput and scalability by providing:

Applications demonstrating high reader/writer conflict rates under disk-based PCM benefit the most from Cache Fusion. Packaged applications also scale more effectively as a result of Cache Fusion. Applications in which OLTP and reporting functions execute on separate nodes may also benefit from Cache Fusion.

Reporting functions that access data from tables modified by OLTP functions receive their versions of data blocks by way of high speed interconnects. This reduces the pinging of data blocks to disk. Performance gains are derived primarily from reduced X-to-S lock conversions and the corresponding reduction in disk I/O for those conversions.

Furthermore, the instance that was changing the cached data block before it received a read request for the same block from another instance would not have to request exclusive access to the block again for subsequent changes. This is because the instance retains the exclusive lock and the buffer after the block is shipped to the reading instance.


All applications achieve some performance gains from Cache Fusion. The degree of improvement depends upon the operating system, the application workload, and the overall system configuration.  

Consistent-Read Block Transfers By Way of High Speed Interconnects

Because Cache Fusion exploits high speed IPCs, Oracle Parallel Server benefits from the performance gains of the latest technologies for low latency communication across cluster interconnects. Further performance gains can be expected with even more efficient protocols, such as Virtual Interface Architecture (VIA) and user-mode IPCs.

Cache Fusion reduces CPU utilization by taking advantage of user-mode IPCs, also known as "memory-mapped IPCs", on both UNIX- and Windows NT-based platforms. If the appropriate hardware support is available, operating system context switches are minimized beyond the basic reductions achieved with Cache Fusion alone. This also eliminates costly data copying and system calls.

User-mode IPCs, if efficiently implemented by hardware support, can reduce CPU use because user processes can communicate without using the operating system kernel. In other words, there is no need to switch from user execution mode to kernel execution mode.

Reduced I/O for Block Pinging and Reduced X to S Lock Conversions

Cache Fusion reduces expensive lock operations and disk I/O for data and undo segment blocks by transmitting consistent-read blocks directly from one instance's buffer cache to another. This can reduce the latency required to resolve reader/writer conflicts by as much as 90 percent.

Cache Fusion resolves reader/writer concurrency with approximately one tenth of the processing effort required by disk-based PCM, using little or no disk I/O. Cache Fusion incurs overhead only for processing the consistent-read request, constructing a consistent-read copy of the requested block in memory, and transferring it to the requesting instance. On some platforms this can take less than one millisecond.

The Interconnect and Interconnect Protocols for Oracle Parallel Server

The primary components affecting Cache Fusion performance are the interconnect and the protocols that process inter-node communication. The interconnect bandwidth, its latency, and the efficiency of the IPC protocol determine the speed with which Cache Fusion processes consistent-read block requests.

Influencing Interconnect Processing

Once your interconnect is operative, you cannot significantly influence its performance. However, you can influence a protocol's efficiency by adjusting the IPC buffer sizes.

See Also:

For more information, consult your vendor-specific interconnect documentation.  

Supported Interconnect Software

Interconnects that support Oracle Parallel Server and Cache Fusion use one of these protocols:

Oracle Parallel Server can use any interconnect product that supports these protocols. The interconnect product must also be certified for Oracle Parallel Server hardware cluster platforms.

Performance Expectations

Cache Fusion performance levels may vary in terms of latency and throughput from application to application. Performance is further influenced by the type and mixture of transactions your system processes.

The performance gains from Cache Fusion also vary with each workload. The hardware, the interconnect protocol specifications, and the operating system resource usage also affect performance.

If your application did not demonstrate a significant amount of consistent-read contention prior to Cache Fusion, your performance with Cache Fusion will likely remain unchanged. However, if your application experienced numerous lock conversions and heavy disk I/O as a result of consistent-read conflicts, your performance with Cache Fusion should improve significantly.

A comparison of the locking and I/O statistics for Oracle 8.1 and Oracle 8.0 reveals a major reduction in exclusive-to-shared lock requests and physical write I/O. The following section, "Monitoring Cache Fusion and Inter-Instance Performance", describes how to evaluate Cache Fusion performance in more detail.

See Also:

Chapter 7 for more information on lock types.  

Monitoring Cache Fusion and Inter-Instance Performance

This section describes how to obtain and analyze Oracle Parallel Server and Cache Fusion statistics to monitor inter-instance performance. Topics in this section include:

Cache Fusion and Oracle Parallel Server Performance Monitoring Goals

The main goal of monitoring Cache Fusion and Oracle Parallel Server performance is to determine the cost of global processing and quantify the resources required to maintain coherency and synchronize the instances. Do this by analyzing the performance statistics from several views as described in the following sections. Use these monitoring procedures on an ongoing basis to observe processing trends and to maintain processing at optimal levels.

Many statistics are available to measure the work done by different components of the database kernel, such as the cache layer, the transaction layer or the I/O layer. Moreover, timed statistics allow you to accurately determine the time spent on processing certain requests or the time waited for specific events.

From these statistics sources, work rates, wait time and efficiency ratios can be derived.

See Also:

Chapter 11 for additional suggestions on which statistics to collect and how to use them to compute performance ratios.  

Statistics for Monitoring Oracle Parallel Server and Cache Fusion

Oracle collects Cache Fusion-related performance statistics from the buffer cache and DLM layers. Oracle also collects general Oracle Parallel Server statistics for lock requests and lock waits. You can use several views to examine these statistics.

Maintaining an adequate history of system performance helps you identify trends as these statistics change. This facilitates identifying contributors to increasing response times and reduced throughput. It also helps you spot workload changes and peak processing requirements.

Procedures in this section use statistics that are grouped according to the following topics:

As described in Chapter 11, consider maintaining statistics from the V$SYSSTAT view and the V$SYSTEM_EVENT view on a per second and per transaction basis to obtain a general profile of the workload. Relevant observations from these views are:

By maintaining these statistics, you can accurately estimate the effect of an increasing cost for a certain type of operation on transaction response times. Major increases in work rates or average delays also contribute to identifying capacity issues.

You must set the parameter TIMED_STATISTICS to TRUE for Oracle to collect statistics for most views discussed in the procedures in this section. The timed statistics from the views discussed in this chapter are displayed in units of hundredths of a second.


You must also run CATPARR.SQL to create Oracle Parallel Server-related views and tables for storing and viewing statistics as described under the next heading.  

Creating Oracle Parallel Server Data Dictionary Views with CATPARR.SQL

The SQL script CATPARR.SQL creates parallel server data dictionary views. To run this script, you must have SYSDBA privileges.

In addition to the standard V$ dynamic views created by CATALOG.SQL, CATPARR.SQL creates:

You can rerun CATPARR.SQL if you want the EXT_TO_OBJ table to contain the latest information after you add extents. If you drop objects without rerunning CATPARR.SQL, EXT_TO_OBJ may display misleading information.

See Also:

Oracle8i Reference for more information on dynamic views. 

Global Dynamic Performance Views

Tuning and performance information for the Oracle database is stored in a set of dynamic performance tables known as the "V$ fixed views". Each active instance has its own set of fixed views. In Oracle Parallel Server, you can query a global dynamic performance (GV$) view to retrieve the V$ view information from all qualified instances. A global fixed view is available for all of the existing dynamic performance views except for V$ROLLNAME, V$CACHE_LOCK, V$LOCK_ACTIVITY, and V$LOCKS_WITH_COLLISIONS.

The global view contains all the columns from the local view, with an additional column, INST_ID (datatype INTEGER). This column displays the instance number from which the associated V$ information was obtained. You can use the INST_ID column as a filter to retrieve V$ information from a subset of available instances. For example, the following query retrieves information from the V$LOCK view on instances 2 and 5:


   SELECT * FROM GV$LOCK WHERE INST_ID IN (2, 5);


Each global view contains a GLOBAL hint that creates a parallel query to retrieve the contents of the local view on each instance.

If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and you attempt to query a GV$ view, one additional parallel server process will be spawned for this purpose. The extra process is not available for parallel operations other than GV$ queries.


If PARALLEL_MAX_SERVERS is set to zero for an instance, additional parallel server processes do not spawn to accommodate a GV$ query.  

If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and issue multiple GV$ queries, all but the first query will fail. For most other parallel queries, failure to allocate a server process results in either an error or sequential execution of the query by the query coordinator.

See Also:

  • Chapter 2

  • Oracle8i Reference for restrictions on GV$ views and complete descriptions of all related parameters and V$ dynamic performance views


Analyzing Global Cache and Cache Fusion Statistics

Oracle collects global cache statistics at the buffer cache layer within an instance. These statistics include counts and timings of requests for global resources.

Requests for global locks on data blocks originate in the buffer cache of the requesting instance. Before a request enters the DLM, Oracle allocates data structures in the System Global Area to track the state of the request. These structures are called "lock elements".

To monitor global cache statistics, query the V$SYSSTAT view and analyze its output as described in the following procedures.

Procedures for Monitoring Global Cache Statistics

Complete the following steps to analyze global cache statistics.

  1. Use this syntax to query V$SYSSTAT:

       SELECT * FROM V$SYSSTAT WHERE NAME LIKE 'global cache%';

Oracle responds with output similar to:

NAME                                                           VALUE
global cache cr blocks received                                 7372
global cache cr block receive time                              2293
global cache cr blocks served                                   7882
global cache cr block serve time                                  60
global cache cr block send time                                  239
global cache cr block log flushes                                119
global cache cr block log flush time                             140
global cache cr timeouts                                           2
global cache cr requests blocked                                   0

  • The average latency of a consistent block request, in other words, its round-trip time, can be calculated as:

        10 * (global cache cr block receive time) / (global cache cr blocks received)

    where the factor of 10 converts hundredths of a second to milliseconds. The result, which should typically be about 15 milliseconds depending on your system configuration and volume, is the average latency of a consistent-read request round trip from requesting instance, to holding instance, and back to the requesting instance. If your CPU has limited idle time and your system typically processes long-running queries, the latency may be higher. However, it is possible to have an average latency of less than one millisecond.

    Consistent-read server request latency can also be influenced by a high value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting process may issue more than one request for a block depending on the setting of this parameter. Correspondingly, the requesting process may wait longer.

  • For a high number of incoming requests, especially in report-intensive applications, or if there are multiple nodes from which requests can arrive at a BSP, the round-trip time can increase because BSP's service time increases. To determine whether the delay is caused by BSP, compute the average service time per request:

        (global cache cr block serve time) / (global cache cr blocks served)

    Track the average BSP service time per request and the total round-trip time per request as presented in this step.

    To determine which part of the service time correlates most with the total service time, derive the average time waited for a log flush and the average time spent sending the completed request using these two equations:

        (global cache cr block log flush time) / (global cache cr block log flushes)

        (global cache cr block send time) / (global cache cr blocks served)

    By calculating these averages, you can account for almost all the processing steps of a consistent-read block request. The remaining difference between the total round-trip time and the BSP service time per request is attributable to LMD processing and network IPC time.

  • Calculate the average convert time and average get time using these formulas:

        average convert time = (global cache convert time) / (global cache converts)

        average get time = (global cache get time) / (global cache gets)

    High convert times may indicate excessive global concurrency. A large number of global cache gets, global cache converts, and a rapid increase in average convert or get times indicates that there is excessive contention. Another cause may be that latencies for lock operations are high due to overall system workload or system problems. A reasonable value for a cache get is 20 to 30 milliseconds while converts should take 10 to 20 milliseconds on average.

    Oracle increments global cache gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.

    The elapsed time for a get thus includes the allocation and initialization of new locks. If the average cache get or average convert times are excessive, your system may be experiencing timeouts.

    If the global cache convert times or global cache get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED statistics.

  • Analyze lock convert timeouts by examining the value for "global cache convert timeouts". If your V$SYSSTAT output shows a value other than zero for this statistic, check your system for congestion or high contention. In general, convert timeouts should not occur; their existence indicates serious performance problems.

  • Analyze the global cache consistent-read timeouts by examining the value for this statistic in your V$SYSSTAT output. Oracle increments this statistic after the system waits too long for the completion of a consistent-read request. If this statistic shows a value other than zero, too much time has elapsed after the initiation of a consistent-read request and a timeout has occurred. If this happens, you will also usually find that the average time for consistent-read request completions has increased. If you have timeouts and the latency is high, your system may have an excessive workload or there may be excessive contention for data blocks. Consistent-read timeouts can also indicate IPC or network problems.
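The averages described in the steps above can be sketched with a short calculation. The following Python fragment is an illustration only, not part of the Oracle toolset; it uses the statistic values from the sample V$SYSSTAT output in step 1, and the factor of 10 converts hundredths of a second to milliseconds:

```python
# Sample 'global cache%' statistics from the V$SYSSTAT output above.
# Timed statistics are in hundredths of a second.
stats = {
    "global cache cr blocks received":      7372,
    "global cache cr block receive time":   2293,
    "global cache cr blocks served":        7882,
    "global cache cr block serve time":       60,
    "global cache cr block send time":       239,
    "global cache cr block log flushes":     119,
    "global cache cr block log flush time":  140,
}

def avg_ms(time_stat, count_stat):
    """Average time per operation in milliseconds (centiseconds * 10)."""
    return 10.0 * stats[time_stat] / stats[count_stat]

# Average round-trip latency of a consistent-read block request.
cr_latency = avg_ms("global cache cr block receive time",
                    "global cache cr blocks received")

# Average BSP service time per request, and its two main components.
bsp_service = avg_ms("global cache cr block serve time",
                     "global cache cr blocks served")
log_flush   = avg_ms("global cache cr block log flush time",
                     "global cache cr block log flushes")
send_time   = avg_ms("global cache cr block send time",
                     "global cache cr blocks served")

print(f"CR request round trip:  {cr_latency:.2f} ms")
print(f"BSP serve time/request: {bsp_service:.3f} ms")
print(f"Log flush wait/flush:   {log_flush:.2f} ms")
print(f"Send time/request:      {send_time:.2f} ms")
```

With these sample values, the round trip averages about 3.1 ms, and the log flush wait (about 11.8 ms per flush) is by far the largest single component of BSP's work.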

    Other Useful Cache Fusion Statistics

    The following describes additional Cache Fusion statistics that you may find useful in diagnosing global cache and Cache Fusion operations. Use these statistics to monitor all the major operations of a consistent block request.

    global cache cr blocks received

    When a process requests a consistent read for a data block that it cannot satisfy from its local cache, it sends a request to another instance. Once the request is complete, in other words, the buffer has been received, Oracle increments this statistic.

    global cache cr block receive time

    This statistic records the total time it took for consistent read requests to complete, in other words, the accumulated round-trip time for all requests for consistent read blocks.

    global cache cr timeouts

    This statistic identifies a request for a consistent read block that has a long delay and that has timed out. This could be due to system performance problems, a slow interconnect network or dropped network packets. The value of this statistic should always be 0.

    global cache cr blocks served

    This is the number of requests for a consistent read block served by BSP. Oracle increments this statistic when the block is sent.

    global cache cr block serve time

    This statistic represents the accumulated time it took BSP to fill all incoming requests. For each request, the start time is recorded immediately after BSP takes a request off the request queue. The interval is computed after the block is sent.

    global cache cr block send time

    This is the time required by BSP to initiate a send of a consistent read block. For each request, timing starts when the send is initiated and stops when the send call has completed. It is a part of the serve time. Note that this statistic only measures the time it takes to initiate the send; it does not measure the time elapsed before the block arrives at the requestor.

    global cache cr block log flushes

    For changes to buffers containing a version of a data block that the block server process has produced, a log flush must be initiated. BSP handles the wait asynchronously by managing a completion queue. Once LGWR has completed flushing the changes to a buffer that is on the log flush queue, BSP can send it. Therefore it periodically checks the queue. Oracle increments this statistic when a log flush is queued.

    global cache cr block log flush time

    This is the time waited for a log flush. It is part of the serve time.

    Analyzing Global Lock Statistics

    Global lock statistics provide counts and timings for both PCM and non-PCM lock activity. Oracle collects global lock statistics from the DLM API layer. All Oracle clients to the DLM, of which the buffer cache is only one, make their requests to the DLM through this layer. Thus, global lock statistics include lock requests originating from all layers of the kernel, while global cache statistics relate to buffer cache Oracle Parallel Server activity.

    Use the procedures in this section to monitor data from the V$SYSSTAT view to derive averages, latencies, and counts. This establishes a rough indicator of the Oracle Parallel Server workload generated by an instance.

    Procedures for Analyzing Global Lock Statistics

    Use the following procedures to view and analyze statistics from the V$SYSSTAT view for global lock processing.

    1. Use this syntax to query V$SYSSTAT:

         SELECT * FROM V$SYSSTAT WHERE NAME LIKE 'global lock%';

      Oracle responds with output similar to:

      NAME                                                             	VALUE
      ---------------------------------------------------------------- 	----------
      global lock sync gets                                                   	703
      global lock async gets                                                	12748
      global lock get time                                                   	1071
      global lock sync converts                                               	303
      global lock async converts                                               	41
      global lock convert time                                                 	93
      global lock releases                                                    	573

      Use your V$SYSSTAT output to perform the calculations and analyses described in the remaining procedures in this group of procedures.

    2. Calculate the average global lock get time using this formula:

         10 * (global lock get time) / (global lock sync gets + global lock async gets)

       where the factor of 10 converts hundredths of a second to milliseconds. If the result is more than 20 or 30 milliseconds, identify which lock events are waited for most frequently by querying the TIME_WAITED column in the V$SYSTEM_EVENT view in descending order, using a query similar to:

         SELECT EVENT, TIME_WAITED
           FROM V$SYSTEM_EVENT
          ORDER BY TIME_WAITED DESC;

      Oracle increments global lock gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.

      The elapsed time for a get thus includes the allocation and initialization of new locks. If the average lock get or average lock convert times are excessive, your system may be experiencing timeouts.

      If the global lock convert times or global lock get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED statistics.

    3. Calculate the average global lock convert time using this formula:

         10 * (global lock convert time) / (global lock sync converts + global lock async converts)

       If the result is more than 20 milliseconds, query the TIME_WAITED column in the V$SYSTEM_EVENT view in descending order to identify the event causing the delay.

    4. As mentioned, global lock statistics apply to buffer cache lock operations and lock operations for resources other than data blocks. To determine which type of resources may be a performance problem, divide the global lock get and global lock convert statistics into two categories:

      • Synchronous Operations

        Synchronous lock operations include, for example, global lock sync gets. These are usually performed for lock requests for resources other than cached data blocks. To determine the proportion of the time required for synchronous lock operations, divide global lock get time or global lock convert time by the corresponding number of synchronous operations.

      • Asynchronous Operations

        Asynchronous lock operations include, for example, global lock async gets. These are typically lock operations for global cache locks. You can derive the proportion of the total time using the same calculation as for synchronous operations. In this way, the proportion of work and the cost of global cache lock requests and other lock requests can be determined.

      Normally, if the proportion of global lock requests for resources other than global cache lock requests dominates the cost for all lock operations, the V$SYSTEM_EVENT view shows high wait times for row cache locks, enqueues or library cache pins.

    5. Analyze the V$LIBRARYCACHE and V$ROWCACHE views to observe DLM activity on non-PCM resources. These views have DLM-specific columns that identify DLM resource use. Analyze these views for DLM activity if you have frequent and extended waits for library cache pins, enqueues, or DFS lock handles.
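As a worked example of steps 2 and 3, the following Python sketch (an illustration, not part of the Oracle toolset) computes the average global lock get and convert times from the sample V$SYSSTAT output in step 1; the factor of 10 converts hundredths of a second to milliseconds:

```python
# Sample 'global lock%' statistics from the V$SYSSTAT output in step 1.
# Timed statistics are in hundredths of a second.
sync_gets, async_gets, get_time = 703, 12748, 1071
sync_cvts, async_cvts, cvt_time = 303, 41, 93

# Average global lock get time in milliseconds (step 2).
avg_get_ms = 10.0 * get_time / (sync_gets + async_gets)

# Average global lock convert time in milliseconds (step 3).
avg_convert_ms = 10.0 * cvt_time / (sync_cvts + async_cvts)

print(f"average global lock get:     {avg_get_ms:.2f} ms")
print(f"average global lock convert: {avg_convert_ms:.2f} ms")
# Both results are well under the 20-30 millisecond threshold that
# would warrant investigating TIME_WAITED in V$SYSTEM_EVENT.
```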

    Analyzing DLM Resource, Lock, Message, and Memory Resource Statistics

    Oracle collects DLM resource, lock, and message statistics at the DLM level. Use these statistics to monitor DLM latency and workloads. These statistics appear in the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views.

    These views record average convert times, count information, and timed statistics for each type of lock request. The V$DLM_CONVERT_LOCAL view shows statistics for local lock operations. The V$DLM_CONVERT_REMOTE view shows values for remote conversions. The average convert times in these views are in 100ths of a second.


    Count information in these views is cumulative for the life of an instance.  

    How DLM Workloads Affect Performance

    The DLM workload is an important aspect of Oracle Parallel Server and Cache Fusion performance because each consistent-read request results in a lock request. High DLM workloads as a result of heavy request rates can adversely affect performance.

    The DLM performs local lock operations entirely within the local node, or in other words, without sending messages. Remote lock operations require sending messages to and waiting for responses from other nodes. Most down-converts, however, are local operations for the DLM.

    The following procedures for analyzing DLM resource, locks, and message statistics appear in two groups. The first group of procedures explains how to monitor DLM resources and locks. The second group explains how to monitor message statistics.

    Procedures for Analyzing DLM Resource and Lock Statistics

    Use the following procedures to obtain and analyze statistics from the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views for DLM resource processing.

    You must enable event 29700 to populate the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views. Do this by setting the event in your initialization parameter file using this syntax:

         EVENT = "29700 trace name context forever"

    1. Use this syntax to query the V$DLM_CONVERT_LOCAL view:

         SELECT CONVERT_TYPE, AVERAGE_CONVERT_TIME, CONVERT_COUNT
           FROM V$DLM_CONVERT_LOCAL;

      Oracle responds with output similar to:

      CONVERT_TYPE                           AVERAGE_CONVERT_TIME CONVERT_COUNT
      -------------------------------------- -------------------- -------------
      NULL -> SS                                                0             0
      NULL -> SX                                                0             0
      NULL -> S                                                 1           146
      NULL -> SSX                                               0             0
      NULL -> X                                                 1            92
      SS   -> SX                                                0             0
      SS   -> S                                                 0             0
      SS   -> SSX                                               0             0
      SS   -> X                                                 0             0
      SX   -> S                                                 0             0
      SX   -> SSX                                               0             0
      SX   -> X                                                 0             0
      S    -> SX                                                0             0
      S    -> SSX                                               0             0
      S    -> X                                                 3            46
      SSX  -> X                                                 0             0
      16 rows selected.
    2. Use this syntax to query the V$DLM_CONVERT_REMOTE view:

         SELECT CONVERT_TYPE, AVERAGE_CONVERT_TIME, CONVERT_COUNT
           FROM V$DLM_CONVERT_REMOTE;

      Oracle responds with output identical in format to the output for the V$DLM_CONVERT_LOCAL view.

      Use your output from the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views to perform the calculation described in the following procedure.

    3. Calculate the ratio of local-to-remote lock operations using a query similar to:

            SELECT r.CONVERT_TYPE,
                   SUM(l.CONVERT_COUNT) / SUM(r.CONVERT_COUNT) "RATIO"
              FROM V$DLM_CONVERT_LOCAL l, V$DLM_CONVERT_REMOTE r
             WHERE l.CONVERT_TYPE = r.CONVERT_TYPE
             GROUP BY r.CONVERT_TYPE;
    4. It is useful to maintain a history of workloads and latencies for lock converts. Changes in lock requests per transaction without increases in average latencies usually result from changing application workload patterns.

      Deterioration of both request rates and latencies usually indicates an increased rate of lock conflicts or an increased workload due to message latencies, system problems, or timeouts. If the LMD process shows high CPU consumption, or consumption is greater than 20 percent of the CPU while overall system resource consumption is normal, consider binding the LMD process to a specific processor if the system has more than one CPU.

      If latencies increase, also examine CPU data and other operating system statistics that you can obtain using utilities such as "sar," "vmstat" and "netstat" on UNIX or Perfmon on Windows NT.

    5. Derive an estimate of CPU busy time for LMD from the V$SYSTEM_EVENT view.

      For a quick estimate of the CPU time spent by LMD, you can transform the wait time event for LMD presented in the V$SYSTEM_EVENT view. To do this, look for the event name "lkmgr wait for remote messages" that represents the time that the LMD process is idle. The TIME_WAITED column contains the accumulated idle time for LMD in units of hundredths of a second.

      To derive the busy time, divide the value for TIME_WAITED by the length of the measurement interval after normalizing it to seconds. In other words, a value of 17222 centiseconds is 172.22 seconds. The result is the idle time of the LMD process, or the percentage of idle time. Subtract that value from 1 and the result is the busy time for the LMD process. This is a fairly accurate estimate when compared with operating system utilities that provide information about CPU utilization per process.


      You should have beginning and ending snapshots to make an accurate calculation.  
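The LMD busy-time estimate described in step 5 can be sketched as follows. This Python fragment is an illustration only; the interval length and snapshot values are hypothetical, and only the 17222-centisecond idle delta comes from the example in the text:

```python
# Estimate LMD busy time from the accumulated idle time reported by the
# "lkmgr wait for remote messages" event. TIME_WAITED is in hundredths
# of a second. The interval length and snapshot values are hypothetical.
interval_seconds = 300.0        # length of the measurement interval

# TIME_WAITED at the beginning and end of the interval (centiseconds).
idle_start_csec = 120500
idle_end_csec   = 137722        # delta of 17222 csec = 172.22 s idle

idle_seconds  = (idle_end_csec - idle_start_csec) / 100.0
idle_fraction = idle_seconds / interval_seconds
busy_fraction = 1.0 - idle_fraction    # subtract idle fraction from 1

print(f"LMD idle: {idle_seconds:.2f} s of {interval_seconds:.0f} s")
print(f"LMD busy: {busy_fraction:.1%} of the interval")
```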

    DLM Message Statistics

    The DLM sends messages either directly or by using flow control. For both methods, the DLM attaches markers known as "tickets" to each message. The allotment of tickets for each DLM is limited. However, the DLM can re-use tickets indefinitely.

    DLMs send messages directly until no more tickets are available. When a DLM runs out of tickets, messages must wait in a flow control queue until outstanding messages have been acknowledged and more tickets are available. Flow-controlled messaging is managed by the LMD process.

    The rationing of tickets prevents one node from sending an excessive amount of messages to another node during periods of heavy inter-instance communication. This also prevents one node with heavy remote consistent-read block requirements from assuming control of messaging resources throughout a cluster at the expense of other, less-busy nodes.

    The V$DLM_MISC view contains the following statistics about message activity:

    Procedures for Analyzing DLM Message Statistics

    Use the following procedures to obtain and analyze message statistics in the V$DLM_MISC view.

    1. Use this syntax to query the V$DLM_MISC view:
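
      A representative statement producing the columns shown in the output below might be the following; the exact predicate is an assumption of this sketch and may differ from the original:

      ```sql
      -- Sketch: list the DLM message statistics from V$DLM_MISC.
      -- The LIKE predicates are assumed to match the four statistics
      -- shown in the sample output.
      SELECT STATISTIC#, NAME, VALUE
        FROM V$DLM_MISC
       WHERE NAME LIKE 'dlm messages%'
          OR NAME LIKE 'dlm total incoming%';
      ```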


      Oracle responds with output similar to:

      STATISTIC# NAME                                VALUE
      ---------- ----------------------------------- -----------
               0 dlm messages sent directly           29520
               1 dlm messages flow controlled          1851
               2 dlm messages received                29668
               3 dlm total incoming msg queue length    297
      4 rows selected.


      Oracle support may request information from your V$DLM_MISC output for debugging purposes.  

      Use your output from the V$DLM_MISC view to perform the following procedure.

    2. Calculate the average receive queue length between two snapshots using this equation:
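
      In terms of the two statistics explained below, the average receive queue length over an interval is the change in "dlm total incoming msg queue length" divided by the change in "dlm messages received". As a single-snapshot approximation (a sketch; strictly, you should difference the counters between two snapshots):

      ```sql
      -- Sketch: approximate the average LMD receive queue length
      -- from one snapshot of V$DLM_MISC.
      SELECT q.VALUE / r.VALUE AS avg_receive_queue_length
        FROM V$DLM_MISC q, V$DLM_MISC r
       WHERE q.NAME = 'dlm total incoming msg queue length'
         AND r.NAME = 'dlm messages received';
      ```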

      Oracle increments the value for "total incoming message queue length" whenever a new request enters the LMD process' message queue. When messages leave the LMD queue to begin processing, Oracle increments the value for "messages received".

    The size of the queue may increase if a large number of requests arrive at the LMD simultaneously. This can occur when the volume of locking activity is high or when the LMD processes a large quantity of consistent-read requests. Typically, the average receive queue length is less than 10.

    Analyzing Oracle Parallel Server I/O Statistics

    In addition to the global cache and global lock statistics that were previously discussed, you can also use statistics in the V$SYSSTAT view to measure the I/O workload related to global cache synchronization. There are three important statistics in the V$SYSSTAT view for this purpose:

    DBWR cross-instance writes, also known as forced writes, occur when Oracle resolves inter-instance data block contention by writing the requested block to disk before the requesting node can use it.

    Cache Fusion minimizes the disk I/O for consistent-reads. This can lead to a substantial reduction in physical writes and reads performed by each instance. Before Cache Fusion, a consistent-read requesting data from a remote instance could result in up to three write I/Os on the remote instance and three corresponding read I/Os for the requesting instance: one for the data block, one for the rollback segment header, and one for a rollback segment block.

    Analyzing Oracle Parallel Server I/O Statistics in the V$SYSSTAT View

    You can obtain the following statistics to quantify the write I/Os required for global cache synchronization.

    1. Use this syntax to query the V$SYSSTAT view:

         SELECT NAME, VALUE FROM V$SYSSTAT 
            WHERE NAME IN ('physical writes',
            'DBWR cross-instance writes',
            'remote instance undo block writes',
            'remote instance undo header writes');

      Oracle responds with output similar to:

      NAME                                                           VALUE
      --------------------------------------------------------- ----------
      physical writes                                                41802
      DBWR cross-instance writes                                      5403
      remote instance undo block writes                                  0
      remote instance undo header writes                                 2
      4 rows selected.

      The statistic "physical writes" refers to all physical writes performed by DBWR from a particular instance, while "DBWR cross-instance writes" accounts for all writes caused by writing a dirty buffer containing a data block that another instance has requested for modification. Because cross-instance writes are also handled by DBWR, "DBWR cross-instance writes" is a subset of all "physical writes".

      The other notable statistics, "remote instance undo block writes" and "remote instance undo header writes", refer to the number of times that Oracle writes a rollback segment block to disk because another instance intends to build a consistent read version of a data block but the information required to roll back the block is not in that instance's cache. Both are a subset of "DBWR cross-instance writes". Their significance for performance is less critical in Oracle8i because Cache Fusion reduces the need to "export" and "import" rollback information. In most cases, instances send the complete version of a data block by way of the interconnect to the requesting instance.

      Note that every lock conversion from Exclusive (X) to Null (N) or from Exclusive (X) to Shared (S) is associated with a write to disk when the buffer under the lock is dirty. However, in Oracle8i, the number of X to S lock conversions is reduced because Cache Fusion does not require them. In most cases, the holding instance retains the X lock.

    2. Calculate the ratio of Oracle Parallel Server-related I/O to overall physical I/O using this equation:
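
      Based on the statistics described above, the ratio is "DBWR cross-instance writes" divided by "physical writes". A sketch using a single snapshot:

      ```sql
      -- Sketch: fraction of physical writes caused by
      -- inter-instance (cross-instance) contention.
      SELECT ci.VALUE / pw.VALUE AS ops_io_ratio
        FROM V$SYSSTAT ci, V$SYSSTAT pw
       WHERE ci.NAME = 'DBWR cross-instance writes'
         AND pw.NAME = 'physical writes';
      ```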

      You should see a noticeable decrease in this ratio between this calculation and pre-Cache Fusion statistics.

    3. Use this equation to calculate how many writes to rollback segments occur when a remote instance needs to read from rollback segments that are in use by a local instance:
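
      Using the two remote instance undo statistics described earlier, a sketch of this calculation is:

      ```sql
      -- Sketch: fraction of physical writes spent writing rollback
      -- segment blocks on behalf of remote consistent reads.
      SELECT (ub.VALUE + uh.VALUE) / pw.VALUE AS undo_write_ratio
        FROM V$SYSSTAT ub, V$SYSSTAT uh, V$SYSSTAT pw
       WHERE ub.NAME = 'remote instance undo block writes'
         AND uh.NAME = 'remote instance undo header writes'
         AND pw.NAME = 'physical writes';
      ```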

      The ratio shows how much disk I/O is related to writes to rollback segments. With Cache Fusion, this ratio should be very low.

    4. To estimate the number or percentage of reads due to global cache synchronization, use the number of lock requests for conversions from NULL(N) to Shared mode (S) counted in V$LOCK_ACTIVITY and the "physical reads" statistics from V$SYSSTAT.

      The following formula computes the percentage of reads that are only for local work:
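
      Assuming the N to S conversions appear in V$LOCK_ACTIVITY under the action "Lock buffers for read" (the exact literal is an assumption of this sketch and may vary by release), the percentage can be computed as:

      ```sql
      -- Sketch: percentage of physical reads performed for local work,
      -- i.e. reads not caused by N-to-S lock conversions.
      SELECT (pr.VALUE - la.COUNTER) / pr.VALUE * 100 AS pct_local_reads
        FROM V$SYSSTAT pr, V$LOCK_ACTIVITY la
       WHERE pr.NAME = 'physical reads'
         AND la.ACTION_VAL = 'Lock buffers for read';
      ```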

      Where "lock buffers for read" represents the N to S lock conversions.

      These so-called "forced reads" occur when a cached data block that was previously modified by the local instance had to be written to disk due to a "ping" from another instance and the block is then re-acquired by the local instance for a read.

      See Also:

      Chapter 11 for more observations regarding estimations of local and global work rates and percentages in Oracle Parallel Server clusters. 

    Analyzing Lock Conversions by Type

    This section describes how to analyze output from three views to quantify lock conversions by type. The tasks and the views discussed in this section are:

    Using the V$LOCK_ACTIVITY View to Analyze Lock Conversions

    The V$LOCK_ACTIVITY view summarizes how many lock up- and down-converts have occurred during an instance's lifetime. X-to-N down-converts denote the number of times a lock was down-converted because another instance wanted to modify a resource.

    The other major type of down-convert is X-to-S. This type of down-convert occurs when another instance reads a resource that was last modified by the local instance. Both types of lock conversions involve I/O. However, Cache Fusion should reduce X-to-S down-converts because they are not needed for buffer locks.
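
    To examine the up- and down-convert counters directly, a simple query along these lines can be used (the column names follow the V$LOCK_ACTIVITY view definition):

    ```sql
    -- Sketch: show lock conversion counts by from/to mode.
    SELECT FROM_VAL, TO_VAL, ACTION_VAL, COUNTER
      FROM V$LOCK_ACTIVITY;
    ```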

    Using the V$CLASS_PING View to Identify Pinging by Block Class

    The V$CLASS_PING view summarizes lock conversion activity by showing whether disk I/O is occurring on the following classes of blocks:

    All X_2_NULL_FORCED_WRITE and X_2_S_FORCED_WRITE conversions involve write I/O. In other words, values in the columns for each block class provide an indicator of the cause of the disk I/O.
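
    For example, a query along the following lines shows forced-write activity by block class; the column names are assumed from the view definition:

    ```sql
    -- Sketch: forced-write conversions per block class.
    SELECT CLASS, X_2_NULL_FORCED_WRITE, X_2_S_FORCED_WRITE
      FROM V$CLASS_PING;
    ```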

    Using the V$PING View to Identify Hot Objects

    The V$PING view helps identify "hot" blocks and "hot" objects. The sum of each column, FORCED_READS and FORCED_WRITES, indicates the actual pinging activity on a particular block or object.
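
    A query such as the following, ordered by total pinging activity, surfaces the hottest objects first; the column list is assumed from the view definition:

    ```sql
    -- Sketch: rank objects and blocks by forced read/write activity.
    SELECT NAME, KIND, FILE#, BLOCK#, FORCED_READS, FORCED_WRITES
      FROM V$PING
     ORDER BY FORCED_READS + FORCED_WRITES DESC;
    ```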

    All three views provide different levels of detail. If you suspect that pinging or Oracle Parallel Server itself is the cause of a performance problem, monitor the V$LOCK_ACTIVITY view to generate an overall Oracle Parallel Server workload profile. Use information from the V$LOCK_ACTIVITY view to record the rate at which lock conversions occur.

    For more details, use the V$CLASS_PING view to identify the type of block on which lock conversions and pinging are occurring. Once you have identified the class, use the V$PING view to obtain details about a particular table or index and the file and block numbers on which there is significant lock conversion activity.

    If your response time or throughput requirements are no longer being met, you would normally examine the V$LOCK_ACTIVITY, V$CLASS_PING, V$CACHE, V$PING or V$FILE_PING views. In addition, you might also examine:

    In summary, a change in the application profile and the work rates typically warrant a detailed analysis using the above-mentioned views. Apart from diagnosing performance problems of existing applications, these views are also useful when developing applications or when deciding on a partitioning strategy.

    Analyzing Latch, Oracle Parallel Server, and DLM Statistics

    Latches are low-level locking mechanisms that protect System Global Area data structures. Excessive contention for latches degrades performance.

    Use the V$DLM_LATCH and V$LATCH_MISSES views to monitor latch contention within the DLM. These views show information about a particular latch, its statistics, and the location in the code from where the latch is acquired.

    For normal operations, the value of latch statistics is limited. In some cases, multiple latches can increase the performance of certain layers by a small amount. High latch contention is often the result of either:

    The following procedures show the kind of information that is available. Oracle does not recommend that you monitor these statistics on a regular basis or attribute performance problems solely to latching. However, gathering this information might be useful to Oracle Support or Oracle's development staff. Latch tuning can also be the object of advanced tuning activities, but in the majority of cases latches are not the actual performance problem.

    On the other hand, record information from these procedures if the TIME_WAITED value for the "latch free" wait event is very high and ranks among the events that accrue the largest times as indicated by the V$SYSTEM_EVENT view.

    Procedures for Analyzing Latch, Parallel Server, and DLM Statistics

    Use the following procedures to analyze latch, Oracle Parallel Server, and DLM-related statistics.

    1. Query the V$LATCH view using this syntax:

         SELECT * FROM V$LATCH; 

      Oracle responds with output similar to the following, where the columns from left to right show the latch name, gets, misses, sleeps, and the sleeps-to-misses ratio. The numeric values are not reproduced here, and the latch names appear truncated as in the original output:

      cache buffer handle 
      cache buffers chained 
      cache buffers lru 
      channel handle pool 
      channel operations 
      dlm ast latch 
      dlm cr bast queue 
      dlm deadlock list 
      dlm domain lock la 
      dlm domain lock ta 
      dlm group lock lat 
      dlm group lock tab 
      dlm lock table fre 
      dlm process hash l 
      dlm process table 
      dlm recovery domai 
      dlm resource hash 
      dlm resource scan 
      dlm resource table 
      dlm shared communication 
      dlm timeout list 
      dml lock allocation 

      22 rows selected. 

    2. If the output from the previous procedure reveals a high ratio of sleeps-to-misses, attempt to determine where the sleeps occur. To do this, execute this query on the V$LATCH_MISSES view:
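
      A representative query is shown below. The exact statement is an assumption of this sketch; note that the location column of V$LATCH_MISSES is named WHERE and must be quoted because it is a reserved word:

      ```sql
      -- Sketch: find where latch sleeps occur, worst first.
      SELECT PARENT_NAME, "WHERE", SLEEP_COUNT
        FROM V$LATCH_MISSES
       ORDER BY SLEEP_COUNT DESC;
      ```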


      Oracle responds with output similar to the following, where the columns show the parent latch name, the location in the code where the miss occurred, and the sleep count. The numeric values are not reproduced here:

      dlm resource hash list    kjrrmas1: lookup master n
      cache buffers chains      kcbgtcr: kslbegin
      library cache             kglhdgn: child:
      shared pool               kghfnd: min scan
      cache buffers chains      kcbrls: kslbegin
      shared pool               
      dlm ast latch             kjucll: delete lock from

      7 rows selected. 



    Copyright © 1996-2000, Oracle Corporation.

    All Rights Reserved.