|Oracle8 Parallel Server Concepts & Administration
Last of the gods, Contention ends her tale.
This chapter provides an overview of tuning issues for Oracle Parallel Server.
With experience, you can anticipate most parallel server application problems before rollout and testing of the application, using the methods described in this document. In addition, a number of tunable parameters can enhance system performance. Tuning these parameters can significantly improve performance, but it cannot overcome problems caused by a poor analysis of potential LM lock contention.
The techniques used to tune single-instance applications remain valid when tuning OPS applications: it is still important to tune the buffer cache, the shared pool, and the disk subsystems effectively. OPS, however, introduces additional parameters and statistics that you must collect and understand.
Note: In Oracle8, locks are mastered and remastered dynamically, so the instances do not need to be started in any particular order.
When you collect statistics to monitor the performance of OPS, the following general guidelines make debugging and monitoring of the system simpler and more accurate.
It is important to monitor all instances in the same way, but keep separate statistics for each instance. This is particularly true if the partitioning strategy results in a highly asymmetrical solution. By monitoring the instances you can determine the highest loaded node and test how well the system partitioning has been performed.
The best statistics to monitor within the database are those kept within the SGA: the "V$" and "X$" tables, for example. It is best to snapshot these views over a period of time. In addition, good operating system statistics should be kept to assist in the monitoring and debugging process. The most important of these are CPU usage, disk I/O, virtual memory usage, network usage and lock manager statistics.
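For example, you can snapshot V$SYSSTAT into scratch tables at the beginning and end of a measurement interval and report the deltas. This is only a sketch of the technique; the scratch table names are illustrative:

```sql
-- Capture a baseline before the measured workload runs
-- (scratch table names are illustrative).
CREATE TABLE stats_begin AS SELECT name, value FROM V$SYSSTAT;

-- ... run the workload to be measured ...

CREATE TABLE stats_end AS SELECT name, value FROM V$SYSSTAT;

-- Report only the statistics that changed during the interval.
SELECT e.name, e.value - b.value delta
  FROM stats_begin b, stats_end e
 WHERE b.name = e.name
   AND e.value - b.value > 0
 ORDER BY delta DESC;
```

Collecting the operating system statistics over the same interval makes it much easier to correlate database activity with CPU, disk, and network consumption.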
In benchmarking or capacity planning exercises it is important to manage changes to the system setup effectively. By documenting each change and quantifying its effect on the system, you can profile and understand the mechanics of the system and application. This is particularly important when debugging a system or determining whether more hardware resources are required. You must adopt a systematic approach for the measurement and tuning phases of a performance project. Although this approach may seem time consuming, it saves time and system resources in the long term.
To detect whether a large number of lock conversions is taking place, you can examine the "V$" tables which enable you to see that locks are being upgraded and downgraded. The best views for initially determining whether a lock contention problem exists are V$LOCK_ACTIVITY and V$SYSSTAT.
To determine the number of lock converts over a period of time, query the V$LOCK_ACTIVITY table. From this you should be able to determine whether you have reached the maximum lock convert rate for the LM. If this is the case, you must repartition the application to remove the bottleneck. In this situation, adding more hardware resources such as CPUs, disk drives, and memory is unlikely to improve system performance significantly.
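A sketch of this check follows. It assumes the V$LOCK_ACTIVITY columns FROM_VAL, TO_VAL, and COUNTER; to derive a convert rate, run the query at the start and end of an interval and divide the change in each counter by the elapsed time in seconds:

```sql
-- Cumulative lock convert counts, by conversion type,
-- since instance startup.
SELECT from_val, to_val, counter
  FROM V$LOCK_ACTIVITY
 ORDER BY counter DESC;
```

Comparing the derived rate against the maximum convert rate of your platform's LM shows whether you have reached this ceiling.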
Note: Maximum lock convert rate depends on the implementation of the IPC mechanism on your platform.
To determine whether lock converts are being performed too often, calculate how often the transaction requires a lock convert operation when a data block is accessed for either a read or a modification. Query the V$SYSSTAT table.
In this way you can calculate a lock hit ratio, which may be compared to the cache hit ratio: the number of data block accesses that did not require a lock convert, divided by the total number of data block accesses. The lock hit ratio is computed as:

lock hit ratio = (consistent gets - global lock converts (async)) / consistent gets
A SQL statement that computes this ratio is as follows:
SELECT (b1.value - b2.value) / b1.value ops_ratio
  FROM V$SYSSTAT b1, V$SYSSTAT b2
 WHERE b1.name = 'consistent gets'
   AND b2.name = 'global lock converts (async)';
If this ratio drops below 95%, optimal scaling of performance may not be achieved as additional nodes are added.
Another indication of too many PCM lock conversions is the ping/write ratio: the proportion of data block writes that are performed to satisfy pings from other instances rather than normal cache aging.
See Also: "Tuning Your PCM Locks" on page 15-16
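A sketch of the ping/write ratio calculation follows. It assumes the V$SYSSTAT statistic names 'DBWR cross instance writes' (blocks written to satisfy a ping from another instance) and 'physical writes'; verify the exact names on your release:

```sql
SELECT p.value / w.value ping_write_ratio
  FROM V$SYSSTAT p, V$SYSSTAT w
 WHERE p.name = 'DBWR cross instance writes'
   AND w.name = 'physical writes';
```

A high ratio indicates that a large share of database writer activity is servicing pings rather than normal cache aging.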
If an application shows performance problems and you determine that excessive lock convert operations are the major cause, you must identify the transactions and SQL statements responsible. Excessive lock contention is likely to be caused by one of three problem areas in the setup of the OPS environment.
This section describes excessive lock conversion rates associated with contention for a common resource.
In some cases an OPS system may not perform as anticipated because one small area of the database setup or application design overlooked database blocks that must be accessed in exclusive mode by all instances, for the entire time that the application runs. This forces the whole system to effectively single thread with respect to this resource.
This problem can also occur in single instance cases where all users require exclusive access to a single resource. In an inventory system, for example, all users may wish to modify a common stock level.
In OPS applications the most common points for contention are associated with contention for a common set of database blocks. To determine whether this is happening you can query an additional set of V$ tables (V$BH, V$CACHE and V$PING). All these tables yield basically the same data, but V$CACHE and V$PING have been created as views joining additional data dictionary tables to make them easier to use. These tables and views examine the status of the current data blocks within an instance's SGA. They enable you to construct queries to see how many times a block has been pinged between nodes, and how many revisions of the same data block exist within the SGA at the same time. You can use both of these features to determine whether excessive single threading upon a single database block is occurring.
Note: GV$BH, GV$CACHE, and GV$PING views are also available, enabling you to query across all instances.
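For example, the following query lists the most heavily pinged blocks. It assumes the XNC column, which counts lock conversions forced by another instance's access to the block:

```sql
-- Objects whose blocks have been pinged between instances most often.
SELECT name, kind, file#, block#, xnc
  FROM V$PING
 ORDER BY xnc DESC;
```

Blocks at the top of this list are the prime candidates for partitioning or for transaction routing changes.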
The most common areas of high block contention tend to be:
This involves routing all transactions that alter the table to a single instance and running the system asymmetrically. If this cannot be done, consider partitioning the table and using a data dependent routing strategy.
In tables that have random access for SELECT, UPDATE and DELETE statements, each node will need to perform a number of PCM lock upgrades and downgrades. If these lock convert operations require a disk I/O they will be particularly expensive and performance will be affected.
If, however, many of the lock converts can be satisfied by just converting the lock without a disk I/O, performance improves. This is often referred to as an I/O-less ping. The lock convert can be achieved without an I/O because the database writer is able to age the data blocks out of the SGA, as it would in a single instance. This is only likely when the table is very large in comparison to the size of the SGA; small tables are unlikely to be aged out of the SGA and so are likely to require a disk I/O.
With small tables where random access occurs you can still achieve performance improvements by reducing the number of rows stored in a data block. You can do this by increasing the table PCTFREE value and by reducing the number of data blocks managed by a PCM lock. The process of adjusting the number of rows managed per PCM lock can be performed until lock converts are minimized or the hardware configuration runs out of PCM locks.
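A sketch of both adjustments follows; the table name, PCTFREE value, file number, and lock count are all illustrative and must be chosen for your own schema and configuration:

```sql
-- Spread rows across more blocks on a small, randomly accessed table.
-- Existing rows are not moved: the table must be rebuilt for the new
-- PCTFREE to affect data already stored.
ALTER TABLE stock_levels PCTFREE 90 PCTUSED 5;
```

In the initialization file, the GC_FILES_TO_LOCKS parameter controls how many PCM locks cover the blocks of a given data file, for example GC_FILES_TO_LOCKS = "4=1000" to allocate 1000 locks to file 4.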
The number of PCM locks managed by the LM is not an infinite resource. Each lock requires memory on each OPS node, and this resource may quickly be exhausted. Within an OPS environment, adding more PCM locks lengthens the time taken to restart or recover an OPS instance. In environments where high availability is required, the time taken to restart or recover an instance may determine the maximum number of PCM locks that you can practically allocate.
In certain situations excessive lock conversion rates cannot be reduced due to certain constraints. In large tables, clusters, or indexes many gigabytes in size, it becomes impossible to allocate enough PCM locks to prevent high lock convert rates even if these are all false pings. This is mainly due to the physical limitations of allocating enough locks. In this situation a single PCM lock may effectively manage more than a thousand data blocks.
Where random access is taking place, lock converts are performed even if there is no contention for the same data block. In this situation tuning the number of locks is unlikely to enhance performance, since the number of locks required far exceeds what the lock manager can actually create.
In such cases you must either restrict access to these database objects or else develop a partitioned solution.
Failure of an Oracle instance on one Parallel Server node may or may not require rebooting the failed node. If the node fails and requires a reboot or restart, the recovery process on the remaining nodes takes longer. Assuming a full recovery is required, the recovery process is performed in three discrete phases:
The first phase of recovery is to detect that either a node or an OPS instance has failed. Complete node failure or failure of an Oracle instance is detected through the operating system node management facility.
If a complete node failure has occurred, the remaining nodes must re-master the locks held by the failed node. At this point all database processing on the surviving instances halts until recovery has completed. To speed this process for the Integrated DLM it is important to have the minimum number of PCM locks. This is ultimately a trade-off between database performance and availability requirements.
Once the IDLM has recovered all lock information, one of the remaining nodes acquires an exclusive lock on the failed instance's IDLM instance lock. That node then recovers the failed instance by performing roll forward/roll backward recovery of the failed instance's redo logs; this work is performed by the SMON background process. The time needed depends on the quantity of redo to be recovered, which is a function of how often the system was checkpointed at runtime. Again, this is a trade-off between system runtime performance, which favors a minimum of checkpoints, and system availability requirements.