5 Oracle Streams Advanced Queuing Performance and Scalability

This chapter discusses performance and scalability issues relating to Oracle Streams Advanced Queuing (AQ).

This chapter contains the following topics:

Persistent Messaging Performance Overview
Persistent Messaging Basic Tuning Tips
Propagation Tuning Tips
Buffered Messaging Tuning

Persistent Messaging Performance Overview

When persistent messages are enqueued, they are stored in database tables. The performance characteristics of queue operations on persistent messages are similar to underlying database operations. The code path of an enqueue operation is comparable to SELECT and INSERT into a multicolumn queue table with three index-organized tables. The code path of a dequeue operation is comparable to a SELECT operation on the multi-column table and a DELETE operation on the dequeue index-organized table. In many scenarios, for example when Oracle Real Application Clusters (Oracle RAC) is not used and there is adequate streams pool memory, the dequeue operation is optimized and is comparable to a SELECT operation on a multi-column table.

Note:

Performance is not affected by the number of queues in a table.

Oracle Streams Advanced Queuing and Oracle Real Application Clusters

Oracle Real Application Clusters (Oracle RAC) can be used to ensure highly available access to queue data. The entry and exit points of a queue, commonly called its tail and head respectively, can be extreme hot spots. Because Oracle RAC may not scale well in the presence of hot spots, limit usual access to a queue from one instance only. If an instance failure occurs, then messages managed by the failed instance can be processed immediately by one of the surviving instances.

You can associate Oracle RAC instance affinities with 8.1-compatible queue tables. If you are using q1 and q2 in different instances, then you can use ALTER_QUEUE_TABLE or CREATE_QUEUE_TABLE on the queue table and set primary_instance to the appropriate instance_id.

See Also:

Oracle Streams Advanced Queuing in a Shared Server Environment

Queue operation scalability is similar to the underlying database operation scalability. If a dequeue operation with wait option is applied, then it does not return until it is successful or the wait period has expired. In a shared server environment, the shared server process is dedicated to the dequeue operation for the duration of the call, including the wait time. The presence of many such processes can cause severe performance and scalability problems and can result in deadlocking the shared server processes. For this reason, Oracle recommends that dequeue requests with wait option be applied using dedicated server processes. This restriction is not enforced.

See Also:

"DEQUEUE_OPTIONS_T Type" in Oracle Database PL/SQL Packages and Types Reference for more information on the wait option

Persistent Messaging Basic Tuning Tips

Oracle Streams Advanced Queuing table layout is similar to a layout with ordinary database tables and indexes.

See Also:

Oracle Database Performance Tuning Guide for tuning recommendations

Memory Requirements

Streams pool size should be at least 20 MB for optimal multi-consumer dequeue performance in a non-RACdatabase. Persistent queuing dequeue operations use the streams pool to optimize performance, especially under concurrency situations. This is, however, not a requirement and the code automatically switches to a less optimal code path.

Using Storage Parameters

Storage parameters can be specified when creating a queue table using the storage_clause parameter. Storage parameters are inherited by other IOTs and tables created with the queue table. The tablespace of the queue table should have sufficient space to accommodate data from all the objects associated with the queue table. With retention specified, the history table as well as the queue table can grow to be quite big.

Oracle recommends you use automatic segment-space management (ASSM). Otherwise initrans, freelists and freelist groups must be tuned for AQ performance under high concurrency.

Increasing PCTFREE will reduce the number of messages in a queue table/IOT block. This will reduce block level contention when there is concurrency.

Storage parameters specified at queue table creation are shared by the queue table, IOTs and indexes. These may be individually altered by an online redefinition using DBMS_REDEFINTION.

I/O Configuration

Because Oracle Streams Advanced Queuing is very I/O intensive, you will usually need to tune I/O to remove any bottlenecks.

See Also:

"I/O Configuration and Design" in Oracle Database Performance Tuning Guide

Running Enqueue and Dequeue Processes Concurrently in a Single Queue Table

Some environments must process messages in a constant flow, requiring that enqueue and dequeue processes run concurrently. If the message delivery system has only one queue table and one queue, then all processes must work on the same segment area at the same time. This precludes reasonable performance levels when delivering a high number of messages.

The best number for concurrent processes depends on available system resources. For example, on a four-CPU system, it is reasonable to start with two concurrent enqueue and two concurrent dequeue processes. If the system cannot deliver the wanted number of messages, then use several subscribers for load balancing rather than increasing the number of processes.

Tune the enqueue and dequeue rates on the queue so that in the common case the queue size remains small and bounded. A queue that grows and shrinks considerably will have indexes and IOTs that are out of balance, which will affect performance.

With multi-consumer queues, using several subscribers for load balancing rather than increasing the number of processes will reduce contention. Multiple queue tables may be used garnering horizontal scalability.

Running Enqueue and Dequeue Processes Serially in a Single Queue Table

When enqueue and dequeue processes are running serially, contention on the same data segment is lower than in the case of concurrent processes. The total time taken to deliver messages by the system, however, is longer than when they run concurrently. Increasing the number of processes helps both enqueuing and dequeuing. The message throughput rate may be higher for enqueuers than for dequeuers when the number of processes is increased, especially with single consumer queues. Dequeue processes on multi-consumer queues scale much better.

Creating Indexes on a Queue Table

Creating an index on a queue table is useful if you:

Dequeue using correlation ID

An index created on the column corr_id of the underlying queue table AQ$_QueueTableName expedites dequeues.
Dequeue using a condition

This is like adding the condition to the where-clause for the SELECT on the underlying queue table. An index on QueueTableName expedites performance on this SELECT statement.

Other Tips

Ensure that statistics are being gathered so that the optimal query plans for retrieving messages are being chosen. By default, queue tables are locked out from automatic gathering of statistics. The recommended use is to gather statistics with a representative queue message load and lock them.
The queue table indexes and IOTs are automatically coalesced by AQ background processes. However, they must continue to be monitored and coalesced if needed. In 10.2, with automatic space segment management (ASSM), an online shrink operation may be used for the same purpose. A well balanced index reduces queue monitor CPU consumption, and ensures optimal enqueue-dequeue performance.
Ensure that there are enough queue monitor processes running to perform the background tasks. The queue monitor must also be running for other crucial background activity. Multiple qmn processes share the load; make sure that there are enough of them. These are auto-tuned, but can be forced to a minimum number, if needed.
It is recommended that dequeue with a wait time is only used with dedicated server processes. In a shared server environment, the shared server process is dedicated to the dequeue operation for the duration of the call, including the wait time. The presence of many such processes can cause severe performance and scalability problems and can result in deadlocking the shared server processes.
Long running dequeue transactions worsen dequeue contention on the queue, and must be avoided.
Batching multiple dequeue operations on multi-consumer queues into a single transaction gives best throughput.
Use NEXT as navigation mode, if not using message priorities. This offers the same semantics but improved performance.
Use the REMOVE_NODATA dequeue mode if dequeuing in BROWSE mode followed by a REMOVE.

Propagation Tuning Tips

Propagation can be considered a special kind of dequeue operation with an additional INSERT at the remote (or local) queue table. Propagation from a single schedule is not parallelized across multiple job queue processes. Rather, they are load balanced. For better scalability, configure the number of propagation schedules according to the available system resources (CPUs).

Propagation rates from transactional and nontransactional (default) queue tables vary to some extent because Oracle Streams Advanced Queuing determines the batching size for nontransactional queues, whereas for transactional queues, batch size is mainly determined by the user application.

Optimized propagation happens in batches. If the remote queue is in a different database, then Oracle Streams Advanced Queuing uses a sequencing algorithm to avoid the need for a two-phase commit. When a message must be sent to multiple queues in the same destination, it is sent multiple times. If the message must be sent to multiple consumers in the same queue at the destination, then it is sent only once.

Buffered Messaging Tuning

Buffered messaging operations in a Oracle Real Application Clusters environment will be fastest on the OWNER_INSTANCE of the queue.

Performance Views

Oracle provides views to monitor system performance and troubleshooting:

These views are integrated with the Automatic Workload Repository (AWR). Users can generate a report based on two AWR snapshots to compute enqueue rate, dequeue rate, and other statistics per queue/subscriber.