Oracle9i Data Warehousing Guide
Release 1 (9.0.1)

Part Number A90237-01
Go To Documentation Library
Go To Product List
Book List
Go To Table Of Contents
Go To Index

Master Index


Go to previous page Go to next page

Using Parallel Execution

This chapter covers tuning in a parallel execution environment and discusses:

Introduction to Parallel Execution Tuning

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution improves processing for:

You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects (LOBs).

Parallel execution benefits systems with all of the following characteristics:

If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.

When to Implement Parallel Execution

Parallel execution provides the greatest performance improvements in DSS and data warehousing environments. OLTP systems also benefit from parallel execution, but usually only during batch processing.

During the day, most OLTP systems should probably not use parallel execution. During off-hours, however, parallel execution can effectively process high-volume batch operations. For example, a bank might use parallelized batch programs to perform millions of updates to apply interest to accounts.

Operations That Can Be Parallelized

The Oracle server can use parallel execution for any of the following:

  1. Access Methods

    Table Scans, Index Full Scans, and Partitioned Index Range Scans

  2. Join Methods

    Nested Loop, Sort Merge, Hash, and Star Transformation

  3. DDL Statements


  4. DML Statements

    Inserts as Select, Updates, Deletes, and Merges

  5. Miscellaneous SQL Operations

    GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as Aggregate and Table Functions

The Parallel Execution Server Pool

When an instance starts up, Oracle creates a pool of parallel execution servers which are available for any parallel operation. The initialization parameter PARALLEL_MIN_SERVERS specifies the number of parallel execution servers that Oracle creates at instance startup.

When executing a parallel operation, the parallel execution coordinator obtains parallel execution servers from the pool and assigns them to the operation. If necessary, Oracle can create additional parallel execution servers for the operation. These parallel execution servers remain with the operation throughout job execution, then become available for other operations. After the statement has been processed completely, the parallel execution servers return to the pool.


The parallel execution coordinator and the parallel execution servers can only service one statement at a time. A parallel execution coordinator cannot coordinate, for example, a parallel query and a parallel DML statement at the same time.  

When a user issues a SQL statement, the optimizer decides whether to execute the operations in parallel and determines the degree of parallelism (DOP) for each operation. You can specify the number of parallel execution servers required for an operation in various ways.

If the optimizer targets the statement for parallel processing, the following sequence of events takes place:

  1. The SQL statement's foreground process becomes a parallel execution coordinator.

  2. The parallel execution coordinator obtains as many parallel execution servers as needed (determined by the DOP) from the server pool or creates new parallel execution servers as needed.

  3. Oracle executes the statement as a sequence of operations. Each operation is performed in parallel, if possible.

  4. When statement processing is completed, the coordinator returns any resulting data to the user process that issued the statement and returns the parallel execution servers to the server pool.

The parallel execution coordinator calls upon the parallel execution servers during the execution of the SQL statement, not during the parsing of the statement. Therefore, when parallel execution is used with the shared server, the server process that processes the EXECUTE call of a user's statement becomes the parallel execution coordinator for the statement.

See Also:

"Setting the Degree of Parallelism" 

Variations in the Number of Parallel Execution Servers

If the number of parallel operations processed concurrently by an instance changes significantly, Oracle automatically changes the number of parallel execution servers in the pool.

If the number of parallel operations increases, Oracle creates additional parallel execution servers to handle incoming requests. However, Oracle never creates more parallel execution servers for an instance than the value specified by the initialization parameter PARALLEL_MAX_SERVERS.

If the number of parallel operations decreases, Oracle terminates any parallel execution servers that have been idle for a threshold period of time. Oracle does not reduce the size of the pool below the value of PARALLEL_MIN_SERVERS, no matter how long the parallel execution servers have been idle.

Processing Without Enough Parallel Execution Servers

Oracle can process a parallel operation with fewer than the requested number of processes.

If all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started, the parallel execution coordinator switches to serial processing.

See Also:


How Parallel Execution Servers Communicate

To execute a query in parallel, Oracle generally creates a producer queue server and a consumer server. The producer queue server retrieves rows from tables and the consumer server performs operations such as join, sort, DML, and DDL on these rows. Each server in the producer execution process set has a connection to each server in the consumer set. This means that the number of virtual connections between parallel execution servers increases as the square of the DOP.

Each communication channel has at least one, and sometimes up to four memory buffers. Multiple memory buffers facilitate asynchronous communication among the parallel execution servers.

A single-instance environment uses at most three buffers per communication channel. An Oracle Real Application Cluster environment uses at most four buffers per channel. Figure 21-1 illustrates message buffers and how producer parallel execution servers connect to consumer parallel execution servers.

Figure 21-1 Parallel Execution Server Connections and Buffers

Text description of scn81138.gif follows
Text description of the illustration scn81138.gif

When a connection is between two processes on the same instance, the servers communicate by passing the buffers back and forth. When the connection is between processes in different instances, the messages are sent using external high-speed network protocols. In Figure 21-1, the DOP is equal to the number of parallel execution servers, which in this case is n. Figure 21-1 does not show the parallel execution coordinator. Each parallel execution server actually has an additional connection to the parallel execution coordinator.

Parallelizing SQL Statements

Each SQL statement undergoes an optimization and parallelization process when it is parsed. When the data changes, if a more optimal execution or parallelization plan becomes available, Oracle can automatically adapt to the new situation.

After the optimizer determines the execution plan of a statement, the parallel execution coordinator determines the parallelization method for each operation in the plan. For example, the parallelization method might be to parallelize a full table scan by block range or parallelize an index range scan by partition. The coordinator must decide whether an operation can be performed in parallel and, if so, how many parallel execution servers to enlist. The number of parallel execution servers is the DOP.

See Also:


Dividing Work Among Parallel Execution Servers

The parallel execution coordinator examines the redistribution requirements of each operation. An operation's redistribution requirement is the way in which the rows operated on by the operation must be divided or redistributed among the parallel execution servers.

After determining the redistribution requirement for each operation in the execution plan, the optimizer determines the order in which the operations must be performed. With this information, the optimizer determines the data flow of the statement.

Figure 21-2 illustrates the data flow for a query to join the emp and dept tables:

SELECT dname, MAX(sal), AVG(sal) 
FROM emp, dept 
WHERE emp.deptno = dept.deptno 
GROUP BY dname;

Figure 21-2 Data Flow Diagram for a Join of the EMP and DEPT Tables

Text description of scn81019.gif follows
Text description of the illustration scn81019.gif

Parallelism Between Operations

Operations that require the output of other operations are known as parent operations. In Figure 21-2 the GROUP BY SORT operation is the parent of the HASH JOIN operation because GROUP BY SORT requires the HASH JOIN output.

Parent operations can begin consuming rows as soon as the child operations have produced rows. In the previous example, while the parallel execution servers are producing rows in the FULL SCAN dept operation, another set of parallel execution servers can begin to perform the HASH JOIN operation to consume the rows.

Each of the two operations performed concurrently is given its own set of parallel execution servers. Therefore, both query operations and the data flow tree itself have parallelism. The parallelism of an individual operation is called intraoperation parallelism and the parallelism between operations in a data flow tree is called interoperation parallelism.

Due to the producer-consumer nature of the Oracle server's operations, only two operations in a given tree need to be performed simultaneously to minimize execution time.

To illustrate intraoperation and interoperation parallelism, consider the following statement:


The execution plan implements a full scan of the emp table. This operation is followed by a sorting of the retrieved rows, based on the value of the ename column. For the sake of this example, assume the ename column is not indexed. Also assume that the DOP for the query is set to 4, which means that four parallel execution servers can be active for any given operation.

Figure 21-3 illustrates the parallel execution of the example query.

Figure 21-3 Interoperation Parallelism and Dynamic Partitioning

Text description of scn81020.gif follows
Text description of the illustration scn81020.gif

As you can see from Figure 21-3, there are actually eight parallel execution servers involved in the query even though the DOP is 4. This is because a parent and child operator can be performed at the same time (interoperation parallelism).

Also note that all of the parallel execution servers involved in the scan operation send rows to the appropriate parallel execution server performing the SORT operation. If a row scanned by a parallel execution server contains a value for the ename column between A and G, that row gets sent to the first ORDER BY parallel execution server. When the scan operation is complete, the sorting processes can return the sorted results to the coordinator, which, in turn, returns the complete query results to the user.


When a set of parallel execution servers completes its operation, it moves on to operations higher in the data flow. For example, in Figure 21-3, if there was another ORDER BY operation after the ORDER BY, the parallel execution servers performing the table scan would perform the second ORDER BY operation after completing the table scan.  

Types of Parallelism

The following types of parallelism are discussed in this section:

Parallel Query

You can parallelize queries and subqueries in SELECT statements. You can also parallelize the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE).

However, you cannot parallelize the query portion of a DDL or DML statement if it references a remote object. When you issue a parallel DML or DDL statement in which the query portion references a remote object, the operation is automatically executed serially.

See Also:


Parallel Queries on Index-Organized Tables

The following parallel scan methods are supported on index-organized tables:

These scan methods can be used for index-organized tables with overflow areas and for index-organized tables that contain LOBs.

Nonpartitioned Index-Organized Tables

Parallel query on a nonpartitioned index-organized table uses parallel fast full scan. The DOP is determined, in decreasing order of priority, by:

  1. A PARALLEL hint (if present)


  3. The parallel degree associated with the table, if the parallel degree is specified in the CREATE TABLE or ALTER TABLE statement

The allocation of work is done by dividing the index segment into a sufficiently large number of block ranges and then assigning the block ranges to parallel execution servers in a demand-driven manner. The overflow blocks corresponding to any row are accessed in a demand-driven manner only by the process which owns that row.

Partitioned Index-Organized Tables

Both index range scan and fast full scan can be performed in parallel. For parallel fast full scan, parallelization is exactly the same as for nonpartitioned index-organized tables. For parallel index range scan on partitioned index-organized tables, the DOP is the minimum of the degree picked up from the above priority list (like in parallel fast full scan) and the number of partitions in the index-organized table. Depending on the DOP, each parallel execution server gets one or more partitions (assigned in a demand-driven manner), each of which contains the primary key index segment and the associated overflow segment, if any.

Parallel Queries on Object Types

Parallel queries can be performed on object type tables and tables containing object type columns. Parallel query for object types supports all of the features that are available for sequential queries on object types, including:

There are no limitations on the size of the object types for parallel queries.

The following restrictions apply to using parallel query for object types.

In all cases where the query cannot execute in parallel because of any of the above restrictions, the whole query executes serially without giving an error message.

Parallel DDL

This section includes the following topics on parallelism for DDL statements:

DDL Statements That Can Be Parallelized

You can parallelize DDL statements for tables and indexes that are nonpartitioned or partitioned. Table 21-3 summarizes the operations that can be parallelized in DDL statements.

The parallel DDL statements for nonpartitioned tables and indexes are:

The parallel DDL statements for partitioned tables and indexes are:

All of these DDL operations can be performed in no-logging mode for either parallel or serial execution.

CREATE TABLE for an index-organized table can be parallelized either with or without an AS SELECT clause.

Different parallelism is used for different operations (see Table 21-3). Parallel CREATE TABLE ... AS SELECT statements on partitioned tables and parallel CREATE INDEX statements on partitioned indexes execute with a DOP equal to the number of partitions.

Partition parallel analyze table is made less necessary by the ANALYZE {TABLE, INDEX} PARTITION statements, since parallel analyze of an entire partitioned table can be constructed with multiple user sessions.

Parallel DDL cannot occur on tables with object columns or LOB columns.

See Also:



For performance reasons, decision support applications often require large amounts of data to be summarized or rolled up into smaller tables for use with ad hoc, decision support queries. Rollup occurs regularly (such as nightly or weekly) during a short period of system inactivity.

Parallel execution lets you parallelize the query and create operations of creating a table as a subquery from another table or set of tables.

Figure 21-4 illustrates creating a table from a subquery in parallel.


Clustered tables cannot be created and populated in parallel. 

Figure 21-4 Creating a Summary Table in Parallel

Text description of scn81018.gif follows
Text description of the illustration scn81018.gif

Recoverability and Parallel DDL

When summary table data is derived from other tables' data, recoverability from media failure for the smaller summary table may not be important and can be turned off during creation of the summary table.

If you disable logging during parallel table creation (or any other parallel DDL operation), you should back up the tablespace containing the table once the table is created to avoid loss of the table due to media failure.

Use the NOLOGGING clause of the CREATE TABLE, CREATE INDEX, ALTER TABLE, and ALTER INDEX statements to disable undo and redo log generation.

See Also:

Oracle9i Database Administrator's Guide for information about recoverability of tables created in parallel 

Space Management for Parallel DDL

Creating a table or index in parallel has space management implications that affect both the storage space required during a parallel operation and the free space available after a table or index has been created.

Storage Space When Using Dictionary-Managed Tablespaces

When creating a table or index in parallel, each parallel execution server uses the values in the STORAGE clause of the CREATE statement to create temporary segments to store the rows. Therefore, a table created with a NEXT setting of 5 MB and a PARALLEL DEGREE of 12 consumes at least 60 megabytes (MB) of storage during table creation because each process starts with an extent of 5 MB. When the parallel execution coordinator combines the segments, some of the segments may be trimmed, and the resulting table may be smaller than the requested 60 MB.

See Also:


Free Space and Parallel DDL

When you create indexes and tables in parallel, each parallel execution server allocates a new extent and fills the extent with the table or index data. Thus, if you create an index with a DOP of 3, the index will have at least three extents initially. Allocation of extents is the same for rebuilding indexes in parallel and for moving, splitting, or rebuilding partitions in parallel.

Serial operations require the schema object to have at least one extent. Parallel creations require that tables or indexes have at least as many extents as there are parallel execution servers creating the schema object.

When you create a table or index in parallel, it is possible to create pockets of free space--either external or internal fragmentation. This occurs when the temporary segments used by the parallel execution servers are larger than what is needed to store the rows.

For example, if you specify a DOP of 3 for a CREATE TABLE ... AS SELECT statement, but there is only one datafile in the tablespace, then internal fragmentation may occur, as shown in Figure 21-5. The pockets of free space within the internal table extents of a datafile cannot be coalesced with other free space and cannot be allocated as extents.

See Also:

Oracle9i Database Performance Guide and Reference for more information about creating tables and indexes in parallel 

Figure 21-5 Unusable Free Space (Internal Fragmentation)

Text description of scn81032.gif follows
Text description of the illustration scn81032.gif

Parallel DML

Parallel DML (PARALLEL, INSERT, UPDATE, and DELETE) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes.


Although DML generally includes queries, in this chapter the term DML refers only to inserts, updates, and deletes.

Also note that the partitioning option must be installed to enable parallel DML. 

This section discusses the following parallel DML topics:

Advantages of Parallel DML over Manual Parallelism

You can parallelize DML operations manually by issuing multiple DML statements simultaneously against different sets of data. For example, you can parallelize manually by:

However, manual parallelism has the following disadvantages:

Parallel DML removes these disadvantages by performing inserts, updates, and deletes in parallel automatically.

When to Use Parallel DML

Parallel DML operations are mainly used to speed up large DML operations against large database objects. Parallel DML is useful in a DSS environment where the performance and scalability of accessing large objects are important. Parallel DML complements parallel query in providing you with both querying and updating capabilities for your DSS databases.

The overhead of setting up parallelism makes parallel DML operations infeasible for short OLTP transactions. However, parallel DML operations can speed up batch jobs running in an OLTP database.

Some of the scenarios where parallel DML is used include:

Refreshing Tables in a Data Warehouse System

In a data warehouse system, large tables need to be refreshed (updated) periodically with new or modified data from the production system. You can do this efficiently by using parallel DML combined with updatable join views. You can also use the MERGE statement.

The data that needs to be refreshed is generally loaded into a temporary table before starting the refresh process. This table contains either new rows or rows that have been updated since the last refresh of the data warehouse. You can use an updatable join view with parallel UPDATE to refresh the updated rows, and you can use an anti-hash join with parallel INSERT to refresh the new rows.

See Also:

Chapter 14, "Maintaining the Data Warehouse" for further information 

Creating Intermediate Summary Tables

In a DSS environment, many applications require complex computations that involve constructing and manipulating many large intermediate summary tables. These summary tables are often temporary and frequently do not need to be logged. Parallel DML can speed up the operations against these large intermediate tables. One benefit is that you can put incremental results in the intermediate tables and perform parallel update.

In addition, the summary tables may contain cumulative or comparison information which has to persist beyond application sessions; thus, temporary tables are not feasible. Parallel DML operations can speed up the changes to these large summary tables.

Using Scoring Tables

Many DSS applications score customers periodically based on a set of criteria. The scores are usually stored in large DSS tables. The score information is then used in making a decision, for example, inclusion in a mailing list.

This scoring activity queries and updates a large number of rows in the large table. Parallel DML can speed up the operations against these large tables.

Updating Historical Tables

Historical tables describe the business transactions of an enterprise over a recent time interval. Periodically, the DBA deletes the set of oldest rows and inserts a set of new rows into the table. Parallel INSERT ... SELECT and parallel DELETE operations can speed up this rollover task.

Although you can also use parallel direct loader (SQL*Loader) to insert bulk data from an external source, parallel INSERT ... SELECT is faster for inserting data that already exists in another table in the database.

Dropping a partition can also be used to delete old rows. However, to do this, the table has to be partitioned by date and with the appropriate time interval.

Running Batch Jobs

Batch jobs executed in an OLTP database during off hours have a fixed time window in which the jobs must complete. A good way to ensure timely job completion is to parallelize their operations. As the work load increases, more machine resources can be added; the scaleup property of parallel operations ensures that the time constraint can be met.

Enabling Parallel DML

A DML statement can be parallelized only if you have explicitly enabled parallel DML in the session with the ENABLE PARALLEL DML clause of the ALTER SESSION statement. This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements.

The default mode of a session is DISABLE PARALLEL DML. When parallel DML is disabled, no DML will be executed in parallel even if the PARALLEL hint is used.

When parallel DML is enabled in a session, all DML statements in this session will be considered for parallel execution. However, even if parallel DML is enabled, the DML operation may still execute serially if there are no parallel hints or no tables with a parallel attribute or if restrictions on parallel operations are violated.

The session's PARALLEL DML mode does not influence the parallelism of SELECT statements, DDL statements, and the query portions of DML statements. Thus, if this mode is not set, the DML operation is not parallelized, but scans or join operations within the DML statement may still be parallelized.

See Also:


Transaction Restrictions for Parallel DML

To execute a DML operation in parallel, the parallel execution coordinator acquires or spawns parallel execution servers, and each parallel execution server executes a portion of the work under its own parallel process transaction.

The coordinator also has its own coordinator transaction, which can have its own rollback segment. In order to ensure user-level transactional atomicity, the coordinator uses a two-phase commit protocol to commit the changes performed by the parallel process transactions.

A session that is enabled for parallel DML may put transactions in the session in a special mode: If any DML statement in a transaction modifies a table in parallel, no subsequent serial or parallel query or DML statement can access the same table again in that transaction. This means that the results of parallel modifications cannot be seen during the transaction.

Serial or parallel statements that attempt to access a table that has already been modified in parallel within the same transaction are rejected with an error message.

If a PL/SQL procedure or block is executed in a parallel DML enabled session, then this rule applies to statements in the procedure or block.

Rollback Segments

Oracle assigns transactions to rollback segments that have the fewest active transactions. To speed up both forward and undo operations, you should create and bring online enough rollback segments so that at most two parallel process transactions are assigned to one rollback segment.

The SET TRANSACTION USE ROLLBACK SEGMENT statement is ignored when parallel DML is used because parallel DML requires more than one rollback segment for performance.

You should create the rollback segments in tablespaces that have enough space for them to extend when necessary. You can then set the MAXEXTENTS storage parameters for the rollback segments to UNLIMITED. Also, set the OPTIMAL value for the rollback segments so that after the parallel DML transactions commit, the rollback segments are shrunk to the OPTIMAL size.

Recovery for Parallel DML

The time required to roll back a parallel DML operation is roughly equal to the time it takes to perform the forward operation.

Oracle supports parallel rollback after transaction and process failures, and after instance and system failures. Oracle can parallelize both the rolling forward stage and the rolling back stage of transaction recovery.

See Also:

Oracle9i Backup and Recovery Concepts for details about parallel rollback 

Transaction Recovery for User-Issued Rollback

A user-issued rollback in a transaction failure due to statement error is performed in parallel by the parallel execution coordinator and the parallel execution servers. The rollback takes approximately the same amount of time as the forward transaction.

Process Recovery

Recovery from the failure of a parallel execution coordinator or parallel execution server is performed by the PMON process. If a parallel execution server or a parallel execution coordinator fails, PMON rolls back the work from that process and all other processes in the transaction roll back their changes.

System Recovery

Recovery from a system failure requuires a new startup. Recovery is performed by the SMON process and any recovery server processes spawned by SMON. Parallel DML statements may be recovered using parallel rollback. If the initialization parameter COMPATIBLE is set to 8.1.3 or greater, Fast-Start On-Demand Rollback enables dead transactions to be recovered, on demand one block at a time.

Instance Recovery (Oracle Real Application Clusters)

Recovery from an instance failure in an Oracle Real Application Cluster is performed by the recovery processes (that is, the SMON processes and any recovery server processes they spawn) of other live instances. Each recovery process of the live instances can recover the parallel execution coordinator or parallel execution server transactions of the failed instance independently.

Space Considerations for Parallel DML

Parallel UPDATE uses the space in the existing object, while direct-path INSERT gets new segments for the data.

Space usage characteristics may be different in parallel than sequential execution because multiple concurrent child transactions modify the object.

Lock and Enqueue Resources for Parallel DML

A parallel DML operation's lock and enqueue resource requirements are very different from the serial DML requirements. Parallel DML holds many more locks, so you should increase the starting value of the ENQUEUE_RESOURCES and DML_LOCKS parameters.

See Also:


Restrictions on Parallel DML

The following restrictions apply to parallel DML (including direct-path INSERT):

Violations of these restrictions cause the statement to execute serially without warnings or error messages (except for the restriction on statements accessing the same table in a transaction, which can cause error messages). For example, an update is serialized if it is on a nonpartitioned table.

See Also:

Oracle9i Application Developer's Guide - Large Objects (LOBs) for more information about LOB restrictions 

Partitioning Key Restriction

You can only update the partitioning key of a partitioned table to a new value if the update does not cause the row to move to a new partition. The update is possible if the table is defined with the row movement clause enabled.

Function Restrictions

The function restrictions for parallel DML are the same as those for parallel DDL and parallel query.

See Also:

"Parallel Execution of Functions" 

Data Integrity Restrictions

This section describes the interactions of integrity constraints and parallel DML statements.


These types of integrity constraints are allowed. They are not a problem for parallel DML because they are enforced on the column and row level, respectively.


These types of integrity constraints are allowed.

FOREIGN KEY (Referential Integrity)

Restrictions for referential integrity occur whenever a DML operation on one table could cause a recursive DML operation on another table. These restrictions also apply when, in order to perform an integrity check, it is necessary to see simultaneously all changes made to the object being modified.

Table 21-1 lists all of the operations that are possible on tables that are involved in referential integrity constraints.

Table 21-1 Referential Integrity Restrictions
DML Statement  Issued on Parent  Issued on Child  Self-Referential 


(Not applicable)  

Not parallelized  

Not parallelized 


(Not applicable) 

Not parallelized 

Not parallelized 

UPDATE No Action 



Not parallelized 

DELETE No Action 



Not parallelized 

DELETE Cascade 

Not parallelized 

(Not applicable)  

Not parallelized 

Delete Cascade

Delete on tables having a foreign key with delete cascade is not parallelized because parallel execution servers will try to delete rows from multiple partitions (parent and child tables).

Self-Referential Integrity

DML on tables with self-referential integrity constraints is not parallelized if the referenced keys (primary keys) are involved. For DML on all other columns, parallelism is possible.

Deferrable Integrity Constraints

If any deferrable constraints apply to the table being operated on, the DML operation will not be parallelized.

Trigger Restrictions

A DML operation will not be parallelized if the affected tables contain enabled triggers that may get fired as a result of the statement. This implies that DML statements on tables that are being replicated will not be parallelized.

Relevant triggers must be disabled in order to parallelize DML on the table. Note that, if you enable or disable triggers, the dependent shared cursors are invalidated.

Distributed Transaction Restrictions

A DML operation cannot be parallelized if it is in a distributed transaction or if the DML or the query operation is against a remote object.

Example 21-1 Distributed Transaction Parallelization: Example 1

In this example, the DML statement queries a remote object:

INSERT /* APPEND PARALLEL (t3,2) */ INTO t3 SELECT * FROM t4@dblink; 

The query operation is executed serially without notification because it references a remote object.

Example 21-2 Distributed Transaction Parallelization: Example 2

In this example, the DML operation is applied to a remote object:

DELETE /*+ PARALLEL (t1, 2) */ FROM t1@dblink;

The DELETE operation is not parallelized because it references a remote object.

Example 21-3 Distributed Transaction Parallelization: Example 3

In this example, the DML operation is in a distributed transaction:

SELECT * FROM t1@dblink; 
DELETE /*+ PARALLEL (t2,2) */ FROM t2; 

The DELETE operation is not parallelized because it occurs in a distributed transaction (which is started by the SELECT statement).

Parallel Execution of Functions

SQL statements can contain user-defined functions written in PL/SQL, in Java, or as external procedures in C that can appear as part of the SELECT list, SET clause, or WHERE clause. When the SQL statement is parallelized, these functions are executed on a per-row basis by the parallel execution server. Any PL/SQL package variables or Java static attributes used by the function are entirely private to each individual parallel execution process and are newly initialized when each row is processed, rather than being copied from the original session. Because of this, not all functions will generate correct results if executed in parallel.

User-written table functions can appear in the statement's FROM list. These functions act like source tables in that they output rows. Table functions are initialized once during the statement at the start of each parallel execution process. As above, any variables are entirely private to the parallel execution process.

Functions in Parallel Queries

In a SELECT statement or a subquery in a DML or DDL statement, a user-written function may be executed in parallel if it has been declared with the PARALLEL_ENABLE keyword, if it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES that indicates all of WNDS, RNPS, and WNPS, or if it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables.

Other parts of a query or subquery can sometimes execute in parallel even if a given function execution must remain serial.

See Also:


Functions in Parallel DML and DDL Statements

In a parallel DML or DDL statement, as in a parallel query, a user-written function may be executed in parallel if it has been declared with the PARALLEL_ENABLE keyword, if it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES that indicates all of RNDS, WNDS, RNPS, and WNPS, or if it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither reads nor writes to the database or reads nor modifies package variables.

For a parallel DML statement, any function call that cannot be executed in parallel causes the entire DML statement to be executed serially.

For an INSERT ... SELECT or CREATE TABLE ... AS SELECT statement, function calls in the query portion are parallelized according to the parallel query rules in the prior paragraph. The query may be parallelized even if the remainder of the statement must execute serially, or vice versa.

Other Types of Parallelism

In addition to parallel SQL execution, Oracle can use parallelism for the following types of operations:

Like parallel SQL, parallel recovery and propagation are performed by a parallel execution coordinator and multiple parallel execution servers. Parallel load, however, uses a different mechanism.

The behavior of the parallel execution coordinator and parallel execution servers may differ, depending on what kind of operation they perform (SQL, recovery, or propagation). For example, if all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started:

For a given session, the parallel execution coordinator coordinates only one kind of operation. A parallel execution coordinator cannot coordinate, for example, parallel SQL and parallel recovery or propagation at the same time.

See Also:


Initializing and Tuning Parameters for Parallel Execution

You can initialize and automatically tune parallel execution by setting the initialization parameter PARALLEL_AUTOMATIC_TUNING to TRUE. Once enabled, automated parallel execution controls values for all parameters related to parallel execution. These parameters affect several aspects of server processing, namely, the DOP, the adaptive multiuser feature, and memory sizing.

With parallel automatic tuning enabled, Oracle determines parameter settings for each environment based on the number of CPUs on your system at database startup and the value set for PARALLEL_THREADS_PER_CPU. The default values Oracle sets for parallel execution processing when PARALLEL_AUTOMATIC_TUNING is TRUE are usually optimal for most environments. In most cases, Oracle's automatically derived settings are at least as effective as manually derived settings.

You can also manually tune parallel execution parameters; however, Oracle recommends using automated parallel execution. Manual tuning of parallel execution is more complex than using automated tuning for two reasons: manual parallel execution tuning requires more attentive administration than automated tuning, and manual tuning is prone to user-load and system-resource miscalculations.

Initializing and tuning parallel execution involves the following steps:

Selecting Automated or Manual Tuning of Parallel Execution

There are several ways to initialize and tune parallel execution. You can make your environment fully automated for parallel execution. As mentioned, by setting PARALLEL_AUTOMATIC_TUNING to TRUE. You can further customize this type of environment by overriding some of the automatically derived values.

You can also leave PARALLEL_AUTOMATIC_TUNING at its default value of FALSE and manually set the parameters that affect parallel execution. For most OLTP environments and other types of systems that would not benefit from parallel execution, do not enable parallel execution.


Well-established, manually tuned systems that achieve desired resource-use patterns might not benefit from automated parallel execution.  

Using Automatically Derived Parameter Settings

When PARALLEL_AUTOMATIC_TUNING is TRUE, Oracle automatically sets other parameters, as shown in Table 21-2. For most systems, you do not need to make further adjustments to have an adequately tuned, fully automated parallel execution environment.

Table 21-2 Parameters Affected by PARALLEL_AUTOMATIC_TUNING
Parameter  Default  Default if






The greater of: 1.2 x PARALLEL_MAX_SERVERS or
+ 6 + 5 + (CPUs x 4) 

Value is forced up to minimum if PARALLEL_AUTOMATIC_TUNING is TRUE.  


(PROCESSES x 1.1) + 5 

(PROCESSES x 1.1) + 5  

Automatic parallel tuning indirectly affects SESSIONS. If you do not set SESSIONS, Oracle sets it based on the value for PROCESSES.  


CPU x 10 

Use this limit to maximize the number of processes that parallel execution uses. The value for this parameter is port-specific so processing can vary from system to system.  



PARALLEL_EXECUTION_POOL + Shared Server heap requirements +
Backup buffer requests +
600 KB 

Oracle does not allocate parallel execution buffers from the SHARED_POOL when PARALLEL_AUTOMATIC_TUNING is set to TRUE.  


2 KB
(port specific) 

4 KB (port specific)  

Default increases because Oracle allocates memory from the LARGE_POOL.  

As mentioned, you can manually adjust the parameters shown in Table 21-2, even if you set PARALLEL_AUTOMATIC_TUNING to TRUE. You might need to do this if you have a highly customized environment or if your system does not perform optimally using the completely automated settings.

Setting the Degree of Parallelism

The parallel execution coordinator may enlist two or more of the instance's parallel execution servers to process a SQL statement. The number of parallel execution servers associated with a single operation is known as the degree of parallelism.

The DOP is specified in the following ways:

The following example shows a statement that sets the DOP to 4 on a table:


This next example sets the DOP on an index to 4:


This last example sets a hint to 4 on a query:

SELECT /*+ PARALLEL(emp, 4) */ COUNT(*) FROM emp;

Note that the DOP applies directly only to intraoperation parallelism. If interoperation parallelism is possible, the total number of parallel execution servers for a statement can be twice the specified DOP. No more than two operations can be performed simultaneously.

Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. When multiple users employ parallel execution at the same time, available CPU, memory, and disk resources may be quickly exhausted. Oracle provides several ways to deal with resource utilization in conjunction with parallel execution, including:

How Oracle Determines the Degree of Parallelism for Operations

The parallel execution coordinator determines the DOP by considering several specifications. The coordinator:

  1. Checks for hints or a PARALLEL clause specified in the SQL statement itself

  2. Checks for a session value set by the ALTER SESSION FORCE PARALLEL statement

  3. Looks at the table's or index's definition

After a DOP is found in one of these specifications, it becomes the DOP for the operation.

Hints, PARALLEL clauses, table or index definitions, and default values only determine the number of parallel execution servers that the coordinator requests for a given operation. The actual number of parallel execution servers used depends upon how many processes are available in the parallel execution server pool and whether interoperation parallelism is possible.

See Also:



You can specify hints in a SQL statement to set the DOP for a table or index and for the caching behavior of the operation.

Table and Index Definitions

You can specify the DOP within a table or index definition by using one of the following statements: CREATE TABLE, ALTER TABLE, CREATE INDEX, or ALTER INDEX.

See Also:

Oracle9i SQL Reference for information about the complete syntax of SQL statements 

Default Degree of Parallelism

The default DOP is used when you ask to parallelize an operation but you do not specify a DOP in a hint or within the definition of a table or index. The default DOP is appropriate for most applications.

The default DOP for a SQL statement is determined by the following factors:

The above factors determine the default number of parallel execution servers to use. However, the actual number of processes used is limited by their availability on the requested instances during run time. The initialization parameter PARALLEL_MAX_SERVERS sets an upper limit on the total number of parallel execution servers that an instance can have.

If a minimum fraction of the desired parallel execution servers is not available (specified by the initialization parameter PARALLEL_MIN_PERCENT), a user error is produced. The user can then retry the query with less parallelism.

See Also:

Oracle9i Database Performance Guide and Reference for information about adjusting the DOP 

Adaptive Multiuser Algorithm

When the adaptive multiuser algorithm is enabled, the parallel execution coordinator varies the DOP according to the system load. The Database Resource Manager determines the load by calculating the number of allocated threads. If the number of threads currently allocated is larger than the optimal number of threads, given the number of available CPUs, the algorithm reduces the DOP. This reduction improves throughput by avoiding overallocation of resources.

Minimum Number of Parallel Execution Servers

Oracle can perform an operation in parallel as long as at least two parallel execution servers are available. If too few parallel execution servers are available, your SQL statement may execute slower than expected. You can specify the minimum percentage of requested parallel execution servers that must be available in order for the operation to execute. This strategy ensures that your SQL statement executes with a minimum acceptable parallel performance. If the minimum percentage of requested parallel execution servers is not available, the SQL statement does not execute and returns an error.

The initialization parameter PARALLEL_MIN_PERCENT specifies the desired minimum percentage of requested parallel execution servers. This parameter affects DML and DDL operations as well as queries.

For example, if you specify 50 for this parameter, then at least 50 percent of the parallel execution servers requested for any parallel operation must be available in order for the operation to succeed. If 20 parallel execution servers are requested, then at least 10 must be available or an error is returned to the user. If PARALLEL_MIN_PERCENT is set to null, then all parallel operations will proceed as long as at least two parallel execution servers are available for processing.

Limiting the Number of Available Instances

In an Oracle Real Application Cluster, instance groups can be used to limit the number of instances that participate in a parallel operation. You can create any number of instance groups, each consisting of one or more instances. You can then specify which instance group is to be used for any or all parallel operations. Parallel execution servers will only be used on instances which are members of the specified instance group.

See Also:

Oracle9i Real Application Clusters Administration and Oracle9i Real Application Clusters Deployment and Performance for more information about instance groups 

Balancing the Workload

To optimize performance, all parallel execution servers should have equal work loads. For SQL statements parallelized by block range or by parallel execution servers, the workload is dynamically divided among the parallel execution servers. This minimizes workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes.

For SQL statements parallelized by partitions, if the workload is evenly distributed among the partitions, you can optimize performance by matching the number of parallel execution servers to the number of partitions or by choosing a DOP in which the number of partitions is a multiple of the number of processes.

For example, suppose a table has 10 partition, and a parallel operation divides the work evenly among them. You can use 10 parallel execution servers (DOP equals 10) to do the work in approximately one-tenth the time that one process would take. You might also use five processes to do the work in one-fifth the time, or two processes to do the work in one-half the time.

If, however, you use nine processes to work on 10 partitions, the first process to finish its work on one partition then begins work on the 10th partition; and as the other processes finish their work, they become idle. This configuration does not provide good performance when the work is evenly divided among partitions. When the work is unevenly divided, the performance varies depending on whether the partition that is left for last has more or less work than the other partitions.

Similarly, suppose you use four processes to work on 10 partitions and the work is evenly divided. In this case, each process works on a second partition after finishing its first partition, but only two of the processes work on a third partition while the other two remain idle.

In general, you cannot assume that the time taken to perform a parallel operation on a given number of partitions (N) with a given number of parallel execution servers (P) will be N/P. This formula does not take into account the possibility that some processes might have to wait while others finish working on the last partitions. By choosing an appropriate DOP, however, you can minimize the workload skew and optimize performance.

See Also:

"Affinity and Parallel DML" for information about balancing the workload with disk affinity 

Parallelization Rules for SQL Statements

A SQL statement can be parallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized by using the PARALLEL clause. However, not all of these methods apply to all types of SQL statements.

Parallelization has two components: the decision to parallelize and the DOP. These components are determined differently for queries, DDL operations, and DML operations.

To determine the DOP, Oracle looks at the reference objects:

Rules for Parallelizing Queries

Decision to Parallelize

A SELECT statement can be parallelized only if the following conditions are satisfied:

  1. The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLEL declaration associated with them.

  2. At least one of the tables specified in the query requires one of the following:

    • A full table scan

    • An index range scan spanning multiple partitions

Degree of Parallelism

The DOP for a query is determined by the following rules:

  1. The query uses the maximum DOP taken from all of the table declarations involved in the query and all of the potential indexes that are candidates to satisfy the query (the reference objects). That is, the table or index that has the greatest DOP determines the query's DOP (maximum query directive).

  2. If a table has both a parallel hint specification in the query and a parallel declaration in its table specification, the hint specification takes precedence over parallel declaration specification. See Table 21-3 for precedence rules.

Rules for Parallelizing UPDATE, MERGE, and DELETE

UPDATE, MERGE, and DELETE operations are parallelized by partition or subpartition. Updates, merges, and deletes can only be parallelized on partitioned tables. Update, merge, and delete parallelism are not possible within a partition, nor on a nonpartitioned table.

You have two ways to specify parallel directives for UPDATE, MERGE, and DELETE operations (assuming that PARALLEL DML mode is enabled):

  1. Use a parallel clause in the definition of the table being updated or deleted (the reference object).

  2. Use an update, merge, or delete parallel hint in the statement.

Parallel hints are placed immediately after the UPDATE, MERGE, or DELETE keywords in UPDATE, MERGE, and DELETE statements. The hint also applies to the underlying scan of the table being changed.

You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel clauses for subsequent UPDATE, MERGE, and DELETE statements in a session. Parallel hints in UPDATE, MERGE, and DELETE statements override the ALTER SESSION FORCE PARALLEL DML statement.

Decision to Parallelize

The following rule determines whether the UPDATE, MERGE, or DELETE operation should be parallelized:

If the statement contains subqueries or updatable views, then they may have their own separate parallel hints or clauses. However, these parallel directives do not affect the decision to parallelize the UPDATE, MERGE, or DELETE.

The parallel hint or clause on the tables is used by both the query and the UPDATE, MERGE, DELETE portions to determine parallelism, the decision to parallelize the UPDATE, MERGE, or DELETE portion is made independently of the query portion, and vice versa.

Degree of Parallelism

The DOP is determined by the same rules as for the queries. Note that in the case of UPDATE and DELETE operations, only the target table to be modified (the only reference object) is involved. Thus, the UPDATE or DELETE parallel hint specification takes precedence over the parallel declaration specification of the target table. In other words, the precedence order is: MERGE, UPDATE, DELETE hint > Session > Parallel declaration specification of target table

See Table 21-3 for precedence rules.

The maximum DOP you can achieve is equal to the number of partitions (or subpartitions in the case of composite subpartitions) in the table. A parallel execution server can update or merge into, or delete from multiple partitions, but each partition can only be updated or deleted by one parallel execution server.

If the DOP is less than the number of partitions, then the first process to finish work on one partition continues working on another partition, and so on until the work is finished on all partitions. If the DOP is greater than the number of partitions involved in the operation, then the excess parallel execution servers will have no work to do.

Example 21-4 Parallelization: Example 1

UPDATE tbl_1 SET c1=c1+1 WHERE c1>100;

If tbl_1 is a partitioned table and its table definition has a parallel clause, then the update operation is parallelized even if the scan on the table is serial (such as an index scan), assuming that the table has more than one partition with c1 greater than 100.

Example 21-5 Parallelization: Example 2

UPDATE /*+ PARALLEL(tbl_2,4) */ tbl_2 SET c1=c1+1;

Both the scan and update operations on tbl_2 will be parallelized with degree four.

Rules for Parallelizing INSERT ... SELECT

An INSERT ... SELECT statement parallelizes its INSERT and SELECT operations independently, except for the DOP.

You can specify a parallel hint after the INSERT keyword in an INSERT ... SELECT statement. Because the tables being queried are usually not the same as the table being inserted into, the hint enables you to specify parallel directives specifically for the insert operation.

You have the following ways to specify parallel directives for an INSERT ... SELECT statement (assuming that PARALLEL DML mode is enabled):

You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel clauses for subsequent INSERT operations in a session. Parallel hints in insert operations override the ALTER SESSION FORCE PARALLEL DML statement.

Decision to Parallelize

The following rule determines whether the INSERT operation should be parallelized in an INSERT ... SELECT statement:

The decision to parallelize the INSERT operation is made independently of the SELECT operation, and vice versa.

Degree of Parallelism

Once the decision to parallelize the SELECT or INSERT operation is made, one parallel directive is picked for deciding the DOP of the whole statement, using the following precedence rule Insert hint directive > Session> Parallel declaration specification of the inserting table > Maximum query directive.

In this context, maximum query directive means that among multiple tables and indexes, the table or index that has the maximum DOP determines the parallelism for the query operation.

The chosen parallel directive is applied to both the SELECT and INSERT operations.

Example 21-6 Parallelization: Example 3

The DOP used is 2, as specified in the INSERT hint:

INSERT /*+ PARALLEL(tbl_ins,2) */ INTO tbl_ins
SELECT /*+ PARALLEL(tbl_sel,4) */ * FROM tbl_sel;

Rules for Parallelizing DDL Statements

Decision to Parallelize

DDL operations can be parallelized if a PARALLEL clause (declaration) is specified in the syntax. In the case of CREATE INDEX and ALTER INDEX ... REBUILD or ALTER INDEX ... REBUILD PARTITION, the parallel declaration is stored in the data dictionary.

You can use the ALTER SESSION FORCE PARALLEL DDL statement to override the parallel clauses of subsequent DDL statements in a session.

Degree of Parallelism

The DOP is determined by the specification in the PARALLEL clause, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. A rebuild of a partitioned index is never parallelized.

Parallel clauses in CREATE TABLE and ALTER TABLE statements specify table parallelism. If a parallel clause exists in a table definition, it determines the parallelism of DDL statements as well as queries. If the DDL statement contains explicit parallel hints for a table, however, those hints override the effect of parallel clauses for that table. You can use the ALTER SESSION FORCE PARALLEL DDL statement to override parallel clauses.



The CREATE INDEX and ALTER INDEX ... REBUILD statements can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

ALTER INDEX ... REBUILD can be parallelized only for a nonpartitioned index, but ALTER INDEX ... REBUILD PARTITION can be parallelized by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

The scan operation for ALTER INDEX ... REBUILD (nonpartitioned), ALTER INDEX ... REBUILD PARTITION, and CREATE INDEX has the same parallelism as the REBUILD or CREATE operation and uses the same DOP. If the DOP is not specified for REBUILD or CREATE, the default is the number of CPUs.


The ALTER INDEX ... MOVE PARTITION and ALTER INDEX ... SPLIT PARTITION statements can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement. Their scan operations have the same parallelism as the corresponding MOVE or SPLIT operations. If the DOP is not specified, the default is the number of CPUs.

Rules for Parallelizing CREATE TABLE AS SELECT

The CREATE TABLE ... AS SELECT statement contains two parts: a CREATE part (DDL) and a SELECT part (query). Oracle can parallelize both parts of the statement. The CREATE part follows the same rules as other DDL operations.

Decision to Parallelize (Query Part)

The query part of a CREATE TABLE ... AS SELECT statement can be parallelized only if the following conditions are satisfied:

  1. The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the CREATE part of the statement has a PARALLEL clause specification or the schema objects referred to in the query have a PARALLEL declaration associated with them.

  2. At least one of the tables specified in the query requires one of the following:

    • A full table scan

    • An index range scan spanning multiple partitions

Degree of Parallelism (Query Part)

The DOP for the query part of a CREATE TABLE ... AS SELECT statement is determined by one of the following rules:

Note that any values specified in a hint for parallelism are ignored.

See Also:

"Rules for Parallelizing Queries" 

Decision to Parallelize (CREATE Part)

The CREATE operation of CREATE TABLE ... AS SELECT can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

When the CREATE operation of CREATE TABLE ... AS SELECT is parallelized, Oracle also parallelizes the scan operation if possible. The scan operation cannot be parallelized if, for example:

When the CREATE operation is not parallelized, the SELECT can be parallelized if it has a PARALLEL hint or if the selected table (or partitioned index) has a parallel declaration.

Degree of Parallelism (CREATE Part)

The DOP for the CREATE operation, and for the SELECT operation if it is parallelized, is specified by the PARALLEL clause of the CREATE statement, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. If the PARALLEL clause does not specify the DOP, the default is the number of CPUs.

Summary of Parallelization Rules

Table 21-3 shows how various types of SQL statements can be parallelized and indicates which methods of specifying parallelism take precedence.

Enabling Parallelism for Tables and Queries

The DOP of tables involved in parallel operations affect the DOP for operations on those tables. Therefore, after setting parallel tuning parameters, you must also enable parallel execution for each table you want parallelized, using the PARALLEL clause of the CREATE TABLE or ALTER TABLE statements. You can also use the PARALLEL hint with SQL statements to enable parallelism for that operation only, or use the FORCE option of the ALTER SESSION statement to enable parallelism for all subsequent operations in the session.

When you parallelize tables, you can also specify the DOP or allow Oracle to use a default DOP. The value of the default DOP is derived automatically, based on the value of PARALLEL_THREADS_PER_CPU and the number of CPUs available to Oracle.

ALTER TABLE emp PARALLEL;    -- uses default DOP
ALTER TABLE emp PARALLEL 4;  -- users DOP of 4

Degree of Parallelism and Adaptive Multiuser: How They Interact

The DOP specifies the number of available processes, or threads, used in parallel operations. Each parallel thread can use one or two query processes, depending on the query's complexity.

The adaptive multiuser feature adjusts the DOP based on user load. For example, you might have a table with a DOP of 5. This DOP might be acceptable with 10 users. However, if 10 more users enter the system and you enable the PARALLEL_ADAPTIVE_MULTI_USER feature, Oracle reduces the DOP to spread resources more evenly according to the perceived system load.


Once Oracle determines the DOP for a query, the DOP does not change for the duration of the query. 

It is best to use the parallel adaptive multiuser feature when users process simultaneous parallel execution operations. If you enable PARALLEL_AUTOMATIC_TUNING, Oracle automatically sets PARALLEL_ADAPTIVE_MULTI_USER to TRUE.


Disable adaptive multiuser for single-user, batch processing systems or if your system already provides optimal performance.  

How the Adaptive Multiuser Algorithm Works

The adaptive multiuser algorithm has several inputs. The algorithm first considers the number of allocated threads as calculated by the Database Resource Manager. The algorithm then considers the default settings for parallelism as set in the initialization parameter file, as well as parallelism options used in CREATE TABLE and ALTER TABLE statements and SQL hints.

When a system is overloaded and the input DOP is larger than the default DOP, the algorithm uses the default degree as input. The system then calculates a reduction factor that it applies to the input DOP. For example, using a 16-CPU system, when the first user enters the system and it is idle, it will be granted a DOP of 32. The next user will be give a DOP of eight, the next four, and so on. If the system settles into a steady state of eight users issuing queries, all the users will eventually be given a DOP of 4, thus dividing the system evenly among all the parallel users.

Forcing Parallel Execution for a Session

If you are sure you want to execute in parallel and want to avoid setting the DOP for a table or modifying the queries involved, you can force parallelism with the following statement:


All subsequent queries will be executed in parallel provided no restrictions are violated. You can also force DML and DDL statements. This clause overrides any parallel clause specified in subsequent statements in the session, but is overridden by a parallel hint.

In typical OLTP environments, for example, the tables are not set parallel, but nightly batch scripts may want to collect data from these tables in parallel. By setting the DOP in the session, the user avoids altering each table in parallel and then altering it back to serial when finished.

See Also:

Oracle9i Database Administrator's Guide for additional information on forcing parallel execution 

Controlling Performance with the Degree of Parallelism

The initialization parameter PARALLEL_THREADS_PER_CPU affects algorithms controlling both the DOP and the adaptive multiuser feature. Oracle multiplies the value of PARALLEL_THREADS_PER_CPU by the number of CPUs per instance to derive the number of threads to use in parallel operations.

The adaptive multiuser feature also uses the default DOP to compute the target number of query server processes that should exist in a system. When a system is running more processes than the target number, the adaptive algorithm reduces the DOP of new queries as required. Therefore, you can also use PARALLEL_THREADS_PER_CPU to control the adaptive algorithm.

PARALLEL_THREADS_PER_CPU enables you to adjust for hardware configurations with I/O subsystems that are slow relative to the CPU speed and for application workloads that perform few computations relative to the amount of data involved. If the system is neither CPU-bound nor I/O-bound, then the PARALLEL_THREADS_PER_CPU value should be increased. This increases the default DOP and allow better utilization of hardware resources. The default for PARALLEL_THREADS_PER_CPU on most platforms is 2. However, the default for machines with relatively slow I/O subsystems can be as high as eight.

Tuning General Parameters for Parallel Execution

This section discusses the following topics:

Parameters Establishing Resource Limits for Parallel Operations

The parameters that establish resource limits are:


The recommended value for the PARALLEL_MAX_SEVERS parameter is:


The PARALLEL_MAX_SEVERS parameter sets a resource limit on the maximum number of processes available for parallel execution. If you set PARALLEL_AUTOMATIC_TUNING to FALSE, you need to manually specify a value for PARALLEL_MAX_SERVERS.

Most parallel operations need at most twice the number of query server processes as the maximum DOP attributed to any table in the operation.

If PARALLEL_AUTOMATIC_TUNING is FALSE, the default value for PARALLEL_MAX_SERVERS is 5. This is sufficient for some minimal operations, but not enough for effective use of parallel execution. If you manually set the PARALLEL_MAX_SERVERS parameter, set it to 16 times the number of CPUs. This is a reasonable starting value that will allow you to run four parallel queries simultaneously, assuming that each query is using a DOP of eight.

If the hardware system is neither CPU bound nor I/O bound, then you can increase the number of concurrent parallel execution users on the system by adding more query server processes. When the system becomes CPU- or I/O-bound, however, adding more concurrent users becomes detrimental to the overall performance. Careful setting of PARALLEL_MAX_SERVERS is an effective method of restricting the number of concurrent parallel operations.

If users initiate too many concurrent operations, Oracle might not have enough query server processes. In this case, Oracle executes the operations sequentially or displays an error if PARALLEL_MIN_PERCENT is set to a value other than the default value of 0 (zero).

This condition can be verified through the GV$SYSSTAT view by comparing the statistics for parallel operations not downgraded and parallel operations downgraded to serial. For example:

SELECT * FROM GV$SYSSTAT WHERE name LIKE 'Parallel operation%';
When Users Have Too Many Processes

When concurrent users have too many query server processes, memory contention (paging), I/O contention, or excessive context switching can occur. This contention can reduce system throughput to a level lower than if parallel execution were not used. Increase the PARALLEL_MAX_SERVERS value only if the system has sufficient memory and I/O bandwidth for the resulting load.

You can use operating system performance monitoring tools to determine how much memory, swap space and I/O bandwidth are free. Look at the runq lengths for both your CPUs and disks, as well as the service time for I/Os on the system. Verify that the machine has sufficient swap space exists on the machine to add more processes. Limiting the total number of query server processes might restrict the number of concurrent users who can execute parallel operations, but system throughput tends to remain stable.

Increasing the Number of Concurrent Users

To increase the number of concurrent users, you can restrict the number of concurrent sessions that resource consumer groups can have. For example:

Limiting the Number of Resources for a User

You can limit the amount of parallelism available to a given user by establishing a resource consumer group for the user. Do this to limit the number of sessions, concurrent logons, and the number of parallel processes that any one user or group of users can have.

Each query server process working on a parallel execution statement is logged on with a session ID. Each process counts against the user's limit of concurrent sessions. For example, to limit a user to 10 parallel execution processes, set the user's limit to 11. One process is for the parallel coordinator and the other 10 consist of two sets of query server servers. This would allow one session for the parallel coordinator and 10 sessions for the parallel execution processes.

See Also:



The recommended value for the PARALLEL_MIN_SERVERS parameter is 0 (zero), which is the default.

This parameter is used at startup and lets you specify in a single instance the number of processes to be started and reserved for parallel operations. The syntax is:


The n variable is the number of processes you want to start and reserve for parallel operations.

Setting PARALLEL_MIN_SERVERS balances the startup cost against memory usage. Processes started using PARALLEL_MIN_SERVERS do not exit until the database is shut down. This way, when a query is issued the processes are likely to be available. It is desirable, however, to recycle query server processes periodically since the memory these processes use can become fragmented and cause the high water mark to slowly increase. When you do not set PARALLEL_MIN_SERVERS, processes exit after they are idle for five minutes.


The following discussion of how to tune the large pool also applies to tuning the shared pool, except as noted in "SHARED_POOL_SIZE". You must also increase the value for this memory setting by the amount you determine.

Parallel execution requires additional memory resources in addition to those required by serial SQL execution. Additional memory is used for communication and passing data between query server processes and the query coordinator.

There is no recommended value for LARGE_POOL_SIZE. Instead, Oracle recommends leaving this parameter unset and having Oracle set it for you by setting the PARALLEL_AUTOMATIC_TUNING parameter to TRUE. The exception to this is when the system-assigned value is inadequate for your processing requirements.


When PARALLEL_AUTOMATIC_TUNING is set to TRUE, Oracle allocates parallel execution buffers from the large pool. When this parameter is FALSE, Oracle allocates parallel execution buffers from the shared pool. 

Oracle automatically computes LARGE_POOL_SIZE if PARALLEL_AUTOMATIC_TUNING is TRUE. To manually set a value for LARGE_POOL_SIZE, query the V$SGASTAT view and increase or decrease the value for LARGE_POOL_SIZE depending on your needs. For example, suppose Oracle displays the following error on startup:

ORA-27102: out of memory 
SVR4 Error: 12: Not enough space 

You should reduce the value for LARGE_POOL_SIZE low enough so your database starts. After reducing the value of LARGE_POOL_SIZE, you might see the error:

ORA-04031: unable to allocate 16084 bytes of shared memory ("large 
pool","unknown object","large pool heap","PX msg pool") 

If so, execute the following query to determine why Oracle could not allocate the 16,084 bytes:


Your output should resemble the following:

NAME                       SUM(BYTES) 
-------------------------- ---------- 
PX msg pool                   1474572 
free memory                    562132
3 rows selected.

If you specify LARGE_POOL_SIZE and the amount of memory you need to reserve is bigger than the pool, Oracle does not allocate all the memory it can get. Instead, it leaves some space. When the query runs, Oracle tries to get what it needs. Oracle uses the 560 KB and needs another 16KB when it fails. The error does not report the cumulative amount that is needed. The best way of determining how much more memory is needed is to use the formulas in "Adding Memory for Message Buffers".

To resolve the problem in the current example, increase the value for LARGE_POOL_SIZE. As shown in the sample output, the LARGE_POOL_SIZE is about 2 MB. Depending on the amount of memory available, you could increase the value of LARGE_POOL_SIZE to 4 MB and attempt to start your database. If Oracle continues to display an ORA-4031 message, gradually increase the value for LARGE_POOL_SIZE until startup is successful.

Computing Additional Memory Requirements for Message Buffers

After you determine the initial setting for the large or shared pool, you must calculate additional memory requirements for message buffers and determine how much additional space you need for cursors.

Adding Memory for Message Buffers

You must increase the value for the LARGE_POOL_SIZE or the SHARED_POOL_SIZE parameters to accommodate message buffers. The message buffers allow query server processes to communicate with each other. If you enable automatic parallel tuning, Oracle allocates space for the message buffer from the large pool. Otherwise, Oracle allocates space from the shared pool.

Oracle uses a fixed number of buffers per virtual connection between producer query servers and consumer query servers. Connections increase as the square of the DOP increases. For this reason, the maximum amount of memory used by parallel execution is bound by the highest DOP allowed on your system. You can control this value by using either the PARALLEL_MAX_SERVERS parameter or by using policies and profiles.

To calculate the amount of memory required, use one of the following formulas:

Each instance uses the memory computed by the formula.

The terms are:

Add this amount to your original setting for the large or shared pool. However, before setting a value for either of these memory structures, you must also consider additional memory for cursors, as explained in the following section.

Calculating Additional Memory for Cursors

Parallel execution plans consume more space in the SQL area than serial execution plans. You should regularly monitor shared pool resource use to ensure that the memory used by both messages and cursors can accommodate your system's processing requirements.

Adjusting Memory After Processing Begins

The formulas in this section are just starting points. Whether you are using automated or manual tuning, you should monitor usage on an on-going basis to make sure the size of memory is not too large or too small. To do this, tune the large and shared pools after examining the size of structures in the large pool, using the following query:


Your output should resemble the following:

POOL        NAME                       SUM(BYTES) 
----------- -------------------------- ---------- 
large pool  PX msg pool                  38092812 
large pool  free memory                    299988 
large pool                               38392800 
shared pool Checkpoint queue                38496 
shared pool KGFF heap                        1964 
shared pool KGK heap                         4372 
shared pool KQLS heap                     1134432 
shared pool LRMPD SGA Table                 23856 
shared pool PLS non-lib hp                   2096 
shared pool PX subheap                     186828 
shared pool SYSTEM PARAMETERS               55756 
shared pool State objects                 3907808 
shared pool character set memory            30260 
shared pool db_block_buffers               200000 
shared pool db_block_hash_buckets           33132 
shared pool db_files                       122984 
shared pool db_handles                      52416 
shared pool dictionary cache               198216 
shared pool dlm shared memory             5387924 
shared pool enqueue_resources               29016 
shared pool event statistics per sess      264768 
shared pool fixed allocation callback        1376 
shared pool free memory                  26329104 
shared pool gc_*                            64000 
shared pool latch nowait fails or sle       34944 
shared pool library cache                 2176808 
shared pool log_buffer                      24576 
shared pool log_checkpoint_timeout          24700 
shared pool long op statistics array        30240 
shared pool message pool freequeue         116232 
shared pool miscellaneous                  267624 
shared pool processes                       76896 
shared pool session param values            41424 
shared pool sessions                       170016 
shared pool sql area                      9549116 
shared pool table columns                  148104 
shared pool trace_buffers_per_process     1476320 
shared pool transactions                    18480 
shared pool trigger inform                  24684 
shared pool                              52248968 
41 rows selected.

Evaluate the memory used as shown in your output, and alter the setting for LARGE_POOL_SIZE based on your processing needs.

To obtain more memory usage statistics, execute the following query:


Your output should resemble the following:

STATISTIC                           VALUE 
-------------------                 ----- 
Buffers Allocated                   23225 
Buffers Freed                       23225 
Buffers Current                         0 
Buffers HWM                          3620 
4 Rows selected.

The amount of memory used appears in the Buffers Current and Buffers HWM statistics. Calculate a value in bytes by multiplying the number of buffers by the value for PARALLEL_EXECUTION_MESSAGE_SIZE. Compare the high water mark to the parallel execution message pool size to determine if you allocated too much memory. For example, in the first output, the value for large pool as shown in px msg pool is 38,092,812 or 38 MB. The Buffers HWM from the second output is 3,620, which when multiplied by a parallel execution message size of 4,096 is 14,827,520, or approximately 15 MB. In this case, the high water mark has reached approximately 40 percent of its capacity.


As mentioned earlier, if PARALLEL_AUTOMATIC_TUNING is FALSE, Oracle allocates query server processes from the shared pool. In this case, tune the shared pool as described under the previous heading for large pool, with the following exceptions:

You must also take into account that using parallel execution generates more cursors. Look at statistics in the V$SQLAREA view to determine how often Oracle recompiles cursors. If the cursor hit ratio is poor, increase the size of the pool. This happens only when you have a large number of distinct queries.

You can then monitor the number of buffers used by parallel execution in the same way as explained previously, and compare the shared pool PX msg pool to the current high water mark reported in output from the view V$PX_PROCESS_SYSSTAT.


The recommended value for the PARALLEL_MIN_PERCENT parameter is 0 (zero).

This parameter allows users to wait for an acceptable DOP, depending on the application in use. Setting this parameter to values other than 0 (zero) causes Oracle to return an error when the requested DOP cannot be satisfied by the system at a given time.

For example, if you set PARALLEL_MIN_PERCENT to 50, which translates to 50 percent, and the DOP is reduced by 50 percent or greater because of the adaptive algorithm or because of a resource limitation, then Oracle returns ORA-12827. For example:

SELECT /*+ PARALLEL(e, 8, 1) */ d.deptno, SUM(SAL)
FROM emp e, dept d WHERE e.deptno = d.deptno
GROUP BY d.deptno ORDER BY d.deptno;

Oracle responds with this message:

ORA-12827: insufficient parallel query slaves available


The CLUSTER_DATABASE_INSTANCES parameter should be set to a value that is equal to the number of instances in your Real Application Cluster environment.

The CLUSTER_DATABASE_INSTANCES parameter specifies the number of instances configured in an Oracle Real Application Cluster environment. Oracle uses the value of this parameter to compute values for LARGE_POOL_SIZE when PARALLEL_AUTOMATIC_TUNING is set to TRUE.

Parameters Affecting Resource Consumption

The first group of parameters discussed in this section affects memory and resource consumption for all parallel operations, in particular, for parallel execution. These parameters are:

A second subset of parameters discussed in this section explains parameters affecting parallel DML and DDL.

To control resource consumption, you should configure memory at two levels:

The SGA is typically part of real physical memory. The SGA is static and of fixed size; if you want to change its size, shut down the database, make the change, and restart the database. Oracle allocates the large and shared pools out of the SGA.

See Also:

Oracle9i Database Concepts for further details regarding the SGA 

A large percentage of the memory used in data warehousing operations is more dynamic. This memory comes from process memory, and both the size of process memory and the number of processes can vary greatly. This memory is controlled by the HASH_AREA_SIZE and SORT_AREA_SIZE parameters. Together, these parameters affect the amount of virtual memory used by Oracle.

Process memory comes from virtual memory. Total virtual memory should be somewhat larger than available real memory, which is the physical memory minus the size of the SGA. Virtual memory generally should not exceed twice the size of the physical memory minus the SGA size. If you set virtual memory to a value several times greater than real memory, the paging rate might increase when the machine is overloaded.

As a general rule for memory sizing, each process requires adequate address space for hash joins. A dominant factor in high volume data warehousing operations is the relationship between memory, the number of processes, and the number of hash join operations. Hash joins and large sorts are memory-intensive operations, so you might want to configure fewer processes, each with a greater limit on the amount of memory it can use.


You can improve hash join performance with a relatively high value for the HASH_AREA_SIZE parameter. If you use a relatively high value, you will increase your memory requirements.

Set HASH_AREA_SIZE using one of two approaches. The first approach examines how much memory is available after configuring the SGA and calculating the amount of memory processes the system uses during normal loads.

The total amount of memory that Oracle processes are allowed to use should be divided by the number of processes during the normal load. These processes include parallel execution servers. This number determines the total amount of working memory per process. This amount then needs to be shared among different operations in a given query. For example, setting HASH_AREA_SIZE or SORT_AREA_SIZE to one-half or one-third of this number is reasonable.

Set these parameters to the highest number that does not cause swapping. After setting these parameters as described, you should watch for swapping and free memory. If swapping occurs, decrease the values for these parameters. If a significant amount of free memory remains, you can increase the values for these parameters.

The second approach to setting HASH_AREA_SIZE requires a thorough understanding of the types of hash joins you execute and an understanding of the amount of data you will be querying against. If the queries and query plans you execute are well understood, this approach is reasonable.

The value for HASH_AREA_SIZE should be approximately half of the square root of S, where S is the size in megabytes of the smaller of the inputs to the join operation. In any case, the value for HASH_AREA_SIZE should not be less than 1 MB.

This relationship can be expressed as follows:

Text description of stu81089.gif follows
Text description of the illustration stu81089.gif

For example, if S equals 16 MB, a minimum appropriate value for HASH_AREA_SIZE might be 2 MB, summed over all parallel processes. Thus, if you have two parallel processes, a minimum value for HASH_AREA_SIZE might be 1 MB. A smaller hash area is not advisable.

For a large data warehouse, HASH_AREA_SIZE can range from 8 MB to 32 MB or more. This parameter provides for adequate memory for hash joins. Each process performing a parallel hash join uses an amount of memory equal to HASH_AREA_SIZE.

Hash join performance is more sensitive to HASH_AREA_SIZE than sort performance is to SORT_AREA_SIZE. As with SORT_AREA_SIZE, too large a hash area can cause the system to run out of memory.

The hash area does not cache blocks in the buffer cache; even low values of HASH_AREA_SIZE will not cause this to occur. Too small a setting, however, could adversely affect performance.

HASH_AREA_SIZE is relevant to parallel execution operations and to the query portion of DML or DDL statements.


The recommended values for this parameter range from 256 KB to 4 MB.

This parameter specifies the amount of memory to allocate per query server process for sort operations. If you have a lot of system memory, you can benefit from setting SORT_AREA_SIZE to a large value. This can dramatically increase the performance of sort operations because the entire process is more likely to be performed in memory. However, if memory is a concern for your system, you might want to limit the amount of memory allocated for sort and hash operations.

If the sort area is too small, an excessive amount of I/O is required to merge a large number of sort runs. If the sort area size is smaller than the amount of data to sort, the sort will move to disk, creating sort runs. These must then be merged again using the sort area. If the sort area size is very small, there will be many runs to merge, and multiple passes might be necessary. The amount of I/O increases as SORT_AREA_SIZE decreases.

If the sort area is too large, the operating system paging rate will be excessive. The cumulative sort area adds up quickly because each query server process can allocate this amount of memory for each sort. For such situations, monitor the operating system paging rate to see if too much memory is being requested.

SORT_AREA_SIZE is relevant to parallel execution operations and to the query portion of DML or DDL statements. All CREATE INDEX statements must do some sorting to generate the index. Commands that require sorting include:


The recommended value for PARALLEL_EXECUTION_MESSAGE_SIZE is 4 KB. If PARALLEL_AUTOMATIC_TUNING is TRUE, the default is 4 KB. If PARALLEL_AUTOMATIC_TUNING is FALSE, the default is slightly greater than 2 KB.

The PARALLEL_EXECUTION_MESSAGE_SIZE parameter specifies the upper limit for the size of parallel execution messages. The default value is operating system specific and this value should be adequate for most applications. Larger values for PARALLEL_EXECUTION_MESSAGE_SIZE require larger values for LARGE_POOL_SIZE or SHARED_POOL_SIZE, depending on whether you have enabled parallel automatic tuning.

While you might experience significantly improved response time by increasing the value for PARALLEL_EXECUTION_MESSAGE_SIZE, memory use also drastically increases. For example, if you double the value for PARALLEL_EXECUTION_MESSAGE_SIZE, parallel execution requires a message source pool that is twice as large.

Therefore, if you set PARALLEL_AUTOMATIC_TUNING to FALSE, you must adjust the SHARED_POOL_SIZE to accommodate parallel execution messages. If you have set PARALLEL_AUTOMATIC_TUNING to TRUE, but have set LARGE_POOL_SIZE manually, then you must adjust the LARGE_POOL_SIZE to accommodate parallel execution messages.


The default value for the PARALLEL_BROADCAST_ENABLE parameter is FALSE.

Set PARALLEL_BROADCAST_ENABLE to TRUE if you are joining a very large join result set with a very small result set (size being measured in bytes, rather than number of rows). In this case, the optimizer has the option of broadcasting the small set's rows to each of the query server processes that are processing the rows of the larger set. The result is enhanced performance. If the result set is large, the optimizer will not broadcast, which avoids excessive communication overhead.

Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL

The parameters that affect parallel DML and parallel DDL resource consumption are:

Parallel inserts, updates, and deletes require more resources than serial DML operations. Similarly, PARALLEL CREATE TABLE ... AS SELECT and PARALLEL CREATE INDEX can require more resources. For this reason, you may need to increase the value of several additional initialization parameters. These parameters do not affect resources for queries.


For parallel DML and DDL, each query server process starts a transaction. The parallel coordinator uses the two-phase commit protocol to commit transactions; therefore, the number of transactions being processed increases by the DOP. As a result, you might need to increase the value of the TRANSACTIONS initialization parameter.

The TRANSACTIONS parameter specifies the maximum number of concurrent transactions. The default assumes no parallelism. For example, if you have a DOP of 20, you will have 20 more new server transactions (or 40, if you have two server sets) and 1 coordinator transaction. In this case, you should increase TRANSACTIONS by 21 (or 41) if the transactions are running in the same instance. If you do not set this parameter, Oracle sets it to a value equal to 1.1 x SESSIONS.


The increased number of transactions for parallel DML and DDL requires more rollback segments. For example, one command with a DOP of five uses 5 server transactions distributed among different rollback segments. The rollback segments should belong to tablespaces that have free space. The rollback segments should also be unlimited, or you should specify a high value for the MAXEXTENTS parameter of the STORAGE clause. In this way, the rollback segments can extend and not run out of space.


If a system crashes when there are uncommitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by using the FAST_START_PARALLEL_ROLLBACK parameter.

This parameter controls the DOP used when recovering dead transactions. Dead transactions are transactions that are active before a system crash. By default, the DOP is chosen to be at most two times the value of the CPU_COUNT parameter.

If the default DOP is insufficient, set the parameter to the HIGH. This gives a maximum DOP of at most four times the value of the CPU_COUNT parameter. This feature is available by default.


Check the statistic redo buffer allocation retries in the V$SYSSTAT view. If this value is high relative to redo blocks written, try to increase the LOG_BUFFER size. A common LOG_BUFFER size for a system generating numerous logs is 3 MB to 5 MB. If the number of retries is still high after increasing LOG_BUFFER size, a problem might exist with the disk on which the log files reside. In that case, tune the I/O subsystem to increase the I/O rates for redo. One way of doing this is to use fine-grained striping across multiple disks. For example, use a stripe size of 16 KB. A simpler approach is to isolate redo logs on their own disk.


This parameter specifies the maximum number of DML locks. Its value should equal the total number of locks on all tables referenced by all users. A parallel DML operation's lock and enqueue resource requirement is very different from serial DML. Parallel DML holds many more locks, so you should increase the value of the ENQUEUE_RESOURCES and DML_LOCKS parameters by equal amounts.

Table 21-4 shows the types of locks acquired by coordinator and parallel execution server processes for different types of parallel DML statements. Using this information, you can determine the value required for these parameters.

Table 21-4 Locks Acquired by Parallel DML Statements
Type of statement  Coordinator process acquires:  Each parallel execution server acquires: 

Parallel UPDATE or DELETE into partitioned table; WHERE clause pruned to a subset of partitions or subpartitions 

1 table lock SX

1 partition lock X per pruned (sub)partition  

1 table lock SX

1 partition lock NULL per pruned (sub)partition owned by the query server process

1 partition-wait lock S per pruned (sub)partition owned by the query server process 

Parallel row-migrating UPDATE into partitioned table; WHERE clause pruned to a subset of (sub)partitions 

1 table lock SX 

1 table lock SX  

1 partition X lock per pruned (sub)partition 

1 partition lock NULL per pruned (sub)partition owned by the query server process

1 partition-wait lock S per pruned partition owned by the query server process 

1 partition lock SX for all other (sub)partitions 

1 partition lock SX for all other (sub)partitions 

Parallel UPDATE, MERGE, DELETE, or INSERT into partitioned table 

1 table lock SX

Partition locks X for all (sub)partitions 

1 table lock SX

1 partition lock NULL per (sub)partition owned by the query server process

1 partition-wait lock S per (sub)partition owned by the query server process 

Parallel INSERT into partitioned table; destination table with partition or subpartition clause 

1 table lock SX

1 partition lock X per specified (sub)partition  

1 table lock SX

1 partition lock NULL per specified (sub)partition owned by the query server process

1 partition-wait lock S per specified (sub)partition owned by the query server process 

Parallel INSERT into nonpartitioned table 

1 table lock X  



Table, partition, and partition-wait DML locks all appear as TM locks in the V$LOCK view. 

Consider a table with 600 partitions running with a DOP of 100. Assume all partitions are involved in a parallel UPDATE or DELETE statement with no row-migrations.

The coordinator acquires: 

1 table lock SX 


600 partition locks X 

Total server processes acquire: 

100 table locks SX 


600 partition locks NULL 


600 partition-wait locks S 


This parameter sets the number of resources that can be locked by the lock manager. Parallel DML operations require many more resources than serial DML. Oracle allocates more enqueue resources as needed.

See Also:


Parameters Related to I/O

The parameters that affect I/O are:

These parameters also affect the optimizer which ensures optimal performance for parallel execution I/O operations.


When you perform parallel updates, merges, and deletes, the buffer cache behavior is very similar to any OLTP system running a high volume of updates.


The recommended value for this parameter is 8 KB or 16 KB.

Set the database block size when you create the database. If you are creating a new database, use a large block size such as 8 KB or 16 KB.


The recommended value for this parameter is eight for 8 KB block size, or four for 16 KB block size. The default is 8.

This parameter determines how many database blocks are read with a single operating system READ call. The upper limit for this parameter is platform-dependent. If you set DB_FILE_MULTIBLOCK_READ_COUNT to an excessively high value, your operating system will lower the value to the highest allowable level when you start your database. In this case, each platform uses the highest value possible. Maximum values generally range from 64 KB to 1 MB.


The recommended value for both of these parameters is TRUE.

These parameters enable or disable the operating system's asynchronous I/O facility. They allow query server processes to overlap I/O requests with processing when performing table scans. If the operating system supports asynchronous I/O, leave these parameters at the default value of TRUE.

Figure 21-6 Asynchronous Read

Text description of stu81074.gif follows
Text description of the illustration stu81074.gif

Asynchronous operations are currently supported for parallel table scans, hash joins, sorts, and serial table scans. However, this feature can require operating system specific configuration and may not be supported on all platforms. Check your Oracle operating system-specific documentation.

Monitoring and Diagnosing Parallel Execution Performance

You should do the following tasks when diagnosing parallel execution performance problems:

Performance expectations are based on either prior performance metrics (for example, the length of time a given query took last week or on the previous version of Oracle) or scaling and extrapolating from serial execution times (for example, serial execution took 10 minutes while parallel execution took 5 minutes). If the performance does not meet your expectations, consider the following questions:

After setting your basic goals and answering these questions, you need to consider the following topics:

Is There Regression?

Does parallel execution's actual performance deviate from what you expected? If performance is as you expected, could there be an underlying performance problem? Perhaps you have a desired outcome in mind to which you are comparing the current outcome. Perhaps you have justifiable performance expectations that the system does not achieve. You might have achieved this level of performance or a particular execution plan in the past, but now, with a similar environment and operation, the system is not meeting this goal.

If performance is not as you expected, can you quantify the deviation? For data warehousing operations, the execution plan is key. For critical data warehousing operations, save the EXPLAIN PLAN results. Then, as you analyze and reanalyze the data, upgrade Oracle, and load new data, over time you can compare new execution plans with old plans. Take this approach either proactively or reactively.

Alternatively, you might find that plan performance improves if you use hints. You might want to understand why hints are necessary and determine how to get the optimizer to generate the desired plan without hints. Try increasing the statistical sample size: better statistics can give you a better plan.

See Also:

Oracle9i Database Performance Guide and Reference for information on preserving plans throughout changes to your system, using plan stability and outlines 

Is There a Plan Change?

If there has been a change in the execution plan, determine whether the plan is or should be parallel or serial.

Is There a Parallel Plan?

If the execution plan is or should be parallel, study the EXPLAIN PLAN output. Did you analyze all the tables? Perhaps you need to use hints in a few cases. Verify that the hint provides better performance.

Is There a Serial Plan?

If the execution plan is or should be serial, consider the following strategies:

Is There Parallel Execution?

If the cause of regression cannot be traced to problems in the plan, the problem must be an execution issue. For data warehousing operations, both serial and parallel, consider how the plan uses memory. Check the paging rate and make sure the system is using memory as effectively as possible. Check buffer, sort, and hash area sizing. After you run a query or DML operation, look at the V$SESSTAT, V$PX_SESSTAT, and V$PQ_SYSSTAT views to see the number of server processes used and other information for the session and system.

Is The Workload Evenly Distributed?

If you are using parallel execution, is there unevenness in workload distribution? For example, if there are 10 CPUs and a single user, you can see whether the workload is evenly distributed across CPUs. This can vary over time, with periods that are more or less I/O intensive, but in general each CPU should have roughly the same amount of activity.

The statistics in V$PQ_TQSTAT show rows produced and consumed per parallel execution server. This is a good indication of skew and does not require single user operation.

Operating system statistics show you the per-processor CPU utilization and per-disk I/O activity. Concurrently running tasks make it harder to see what is going on, however. It may be useful to run in single-user mode and check operating system monitors that show system level CPU and I/O activity.

If I/O problems occur, you might need to reorganize your data by spreading it over more devices. If parallel execution problems occur, check to be sure you have followed the recommendation to spread data over at least as many devices as CPUs.

If there is no skew in workload distribution, check for the following conditions:

Monitoring Parallel Execution Performance with Dynamic Performance Views

After your system has run for a few days, monitor parallel execution performance statistics to determine whether your parallel processing is optimal. Do this using any of the views discussed in this section.

In Oracle Real Application Cluster, global versions of the views described in this section aggregate statistics from multiple instances. The global views have names beginning with G, such as GV$FILESTAT for V$FILESTAT, and so on.


The V$PX_SESSION view shows data about query server sessions, groups, sets, and server numbers. It also displays real-time data about the processes working on behalf of parallel execution. This table includes information about the requested DOP and the actual DOP granted to the operation.


The V$PX_SESSTAT view provides a join of the session information from V$PX_SESSION and the V$SESSTAT table. Thus, all session statistics available to a normal session are available for all sessions performed using parallel execution.


The V$PX_PROCESS view contains information about the parallel processes, including status, session ID, process ID, and other information.


The V$PX_PROCESS_SYSSTAT view shows the status of query servers and provides buffer allocation statistics.


The V$PQ_SESSTAT view shows the status of all current server groups in the system such as data about how queries allocate processes and how the multiuser and load balancing algorithms are affecting the default and hinted values. V$PQ_SESSTAT will be obsolete in a future release.

You might need to adjust some parameter settings to improve performance after reviewing data from these views. In this case, refer to the discussion of "Tuning General Parameters for Parallel Execution". Query these views periodically to monitor the progress of long-running parallel operations.


For many dynamic performance views, you must set the parameter TIMED_STATISTICS to TRUE in order for Oracle to collect statistics for each view. You can use the ALTER SYSTEM or ALTER SESSION statements to turn TIMED_STATISTICS on and off. 


The V$FILESTAT view sums read and write requests, the number of blocks, and service times for every datafile in every tablespace. Use V$FILESTAT to diagnose I/O and workload distribution problems.

You can join statistics from V$FILESTAT with statistics in the DBA_DATA_FILES view to group I/O by tablespace or to find the filename for a given file number. Using a ratio analysis, you can determine the percentage of the total tablespace activity used by each file in the tablespace. If you make a practice of putting just one large, heavily accessed object in a tablespace, you can use this technique to identify objects that have a poor physical layout.

You can further diagnose disk space allocation problems using the DBA_EXTENTS view. Ensure that space is allocated evenly from all files in the tablespace. Monitoring V$FILESTAT during a long-running operation and then correlating I/O activity to the EXPLAIN PLAN output is a good way to follow progress.


The V$PARAMETER view lists the name, current value, and default value of all system parameters. In addition, the view shows whether a parameter is a session parameter that you can modify online with an ALTER SYSTEM or ALTER SESSION statement.


As a simple example, consider a hash join between two tables, with a join on a column with only 2 distinct values. At best, this hash function will have one value hash to parallel execution server A and the other to parallel execution server B. A DOP of two is fine, but, if it is 4, then at least 2 parallel execution servers have no work. To discover this type of skew, use a query similar to the following example:

SELECT dfo_number, tq_id, server_type, process, num_rows
ORDER BY dfo_number DESC, tq_id, server_type, process;

The best way to resolve thie problem might be to choose a different join method; a nested loop join might be the best option. Alternatively, if one of the join tables is small relative to the other, it can be broadcast if PARALLEL_BROADCAST_ENABLED=TRUE or a PQ_DISTRIBUTE hint is used.

Now, assume that you have a join key with high cardinality, but one of the values contains most of the data, for example, lava lamp sales by year. The only year that had big sales was 1968, and thus, the parallel execution server for the 1968 records will be overwhelmed. You should use the same corrective actions as described above.

The V$PQ_TQSTAT view provides a detailed report of message traffic at the table queue level. V$PQ_TQSTAT data is valid only when queried from a session that is executing parallel SQL statements. A table queue is the pipeline between query server groups, between the parallel coordinator and a query server group, or between a query server group and the coordinator. Table queues are represented in EXPLAIN PLAN output by the row labels of PARALLEL_TO_PARALLEL, SERIAL_TO_PARALLEL, or PARALLEL_TO_SERIAL, respectively.

V$PQ_TQSTAT has a row for each query server process that reads from or writes to in each table queue. A table queue connecting 10 consumer processes to 10 producer processes has 20 rows in the view. Sum the bytes column and group by TQ_ID, the table queue identifier, to obtain the total number of bytes sent through each table queue. Compare this with the optimizer estimates; large variations might indicate a need to analyze the data using a larger sample.

Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whether the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.


The V$PQ_TQSTAT view will be renamed in a future release to V$PX_TQSTSAT


The V$SESSTAT view provides parallel execution statistics for each session. The statistics include total number of queries, DML and DDL statements executed in a session and the total number of intrainstance and interinstance messages exchanged during parallel execution during the session.

V$SYSSTAT provides the same statistics as V$SESSTAT, but for the entire system.

Monitoring Session Statistics

These examples use the dynamic performance views described in "Monitoring Parallel Execution Performance with Dynamic Performance Views".

Use GV$PX_SESSION to determine the configuration of the server group executing in parallel. In this example, sessions 9 is the query coordinator, while sessions 7 and 21 are in the first group, first set. Sessions 18 and 20 are in the first group, second set. The requested and granted DOP for this query is 2, as shown by Oracle's response to the following query:

  DEGREE "Degree", REQ_DEGREE "Req Degree"

Your output should resemble the following:

QCSID      SID        Inst       Group      Set        Degree     Req Degree 
---------- ---------- ---------- ---------- ---------- ---------- ---------- 
         9          9          1 
         9          7          1          1          1          2          2 
         9         21          1          1          1          2          2 
         9         18          1          1          2          2          2 
         9         20          1          1          2          2          2 
5 rows selected.


For a single instance, use SELECT FROM V$PX_SESSION and do not include the column name Instance ID

The processes shown in the output from the previous example using
GV$PX_SESSION collaborate to complete the same task. The next example shows the execution of a join query to determine the progress of these processes in terms of physical reads. Use this query to track any specific statistic:

  NAME "Stat Name", VALUE

Your output should resemble the following:

QCSID  SID   Inst   Group  Set    Stat Name          VALUE      
------ ----- ------ ------ ------ ------------------ ---------- 
     9     9      1               physical reads           3863 
     9     7      1      1      1 physical reads              2 
     9    21      1      1      1 physical reads              2 
     9    18      1      1      2 physical reads              2 
     9    20      1      1      2 physical reads              2 
5 rows selected.

Use the previous type of query to track statistics in V$STATNAME. Repeat this query as often as required to observe the progress of the query server processes.

The next query uses V$PX_PROCESS to check the status of the query servers.


Your output should resemble the following:

---- --------- ------ --------- ------ ------ 
P002 IN USE        16     16955     21   7729 
P003 IN USE        17     16957     20   2921 
P004 AVAILABLE     18     16959              
P005 AVAILABLE     19     16962             
P000 IN USE        12      6999     18   4720 
P001 IN USE        13      7004      7    234 
6 rows selected.

See Also:

Oracle9i Database Reference for more information about these views 

Monitoring System Statistics

The V$SYSSTAT and V$SESSTAT views contain several statistics for monitoring parallel execution. Use these statistics to track the number of parallel queries, DMLs, DDLs, data flow operators (DFOs), and operations. Each query, DML, or DDL can have multiple parallel operations and multiple DFOs.

In addition, statistics also count the number of query operations for which the DOP was reduced, or downgraded, due to either the adaptive multiuser algorithm or the depletion of available parallel execution servers.

Finally, statistics in these views also count the number of messages sent on behalf of parallel execution. The following syntax is an example of how to display these statistics:


Your output should resemble the following:

NAME                                               VALUE      
-------------------------------------------------- ---------- 
queries parallelized                                      347 
DML statements parallelized                                 0 
DDL statements parallelized                                 0 
DFO trees parallelized                                    463 
Parallel operations not downgraded                         28 
Parallel operations downgraded to serial                   31 
Parallel operations downgraded 75 to 99 pct               252 
Parallel operations downgraded 50 to 75 pct               128 
Parallel operations downgraded 25 to 50 pct                43 
Parallel operations downgraded 1 to 25 pct                 12 
PX local messages sent                                  74548 
PX local messages recv'd                                74128 
PX remote messages sent                                     0 
PX remote messages recv'd                                   0 

14 rows selected.

Monitoring Operating System Statistics

There is considerable overlap between information available in Oracle and information available though operating system utilities (such as sar and vmstat on UNIX-based systems). Operating systems provide performance statistics on I/O, communication, CPU, memory and paging, scheduling, and synchronization primitives. The V$SESSTAT view provides the major categories of operating system statistics as well.

Typically, operating system information about I/O devices and semaphore operations is harder to map back to database objects and operations than is Oracle information. However, some operating systems have good visualization tools and efficient means of collecting the data.

Operating system information about CPU and memory usage is very important for assessing performance. Probably the most important statistic is CPU usage. The goal of low-level performance tuning is to become CPU bound on all CPUs. Once this is achieved, you can work at the SQL level to find an alternate plan that might be more I/O intensive but use less CPU.

Operating system memory and paging information is valuable for fine tuning the many system parameters that control how memory is divided among memory-intensive warehouse subsystems like parallel communication, sort, and hash join.

Affinity and Parallel Operations


The features described in this section are available only if you have purchased Oracle9i Enterprise Edition with the Real Application Cluster Option. See Oracle9i Database New Features for information about the features and options available with Oracle9i Enterprise Edition.  

In a shared-disk cluster or MPP configuration, an instance of the Oracle Real Application Cluster is said to have affinity for a device if the device is directly accessed from the processors on which the instance is running. Similarly, an instance has affinity for a file if it has affinity for the devices on which the file is stored.

Determination of affinity may involve arbitrary determinations for files that are striped across multiple devices. Somewhat arbitrarily, an instance is said to have affinity for a tablespace (or a partition of a table or index within a tablespace) if the instance has affinity for the first file in the tablespace.

Oracle considers affinity when allocating work to parallel execution servers. The use of affinity for parallel execution of SQL statements is transparent to users.

Affinity and Parallel Queries

Affinity in parallel queries increases the speed of scanning data from disk by doing the scans on a processor that is near the data. This can provide a substantial performance increase for machines that do not naturally support shared disks.

The most common use of affinity is for a table or index partition to be stored in one file on one device. This configuration provides the highest availability by limiting the damage done by a device failure and makes the best use of partition-parallel index scans.

DSS customers might prefer to stripe table partitions over multiple devices (probably a subset of the total number of devices). This configuration allows some queries to prune the total amount of data being accessed using partitioning criteria and still obtain parallelism through rowid-range parallel table (partition) scans. If the devices are configured as a RAID, availability can still be very good. Even when used for DSS, indexes should probably be partitioned on individual devices.

Other configurations (for example, multiple partitions in one file striped over multiple devices) will yield correct query results, but you may need to use hints or explicitly set object attributes to select the correct DOP.

Affinity and Parallel DML

For parallel DML (inserts, updates, and deletes), affinity enhancements improve cache performance by routing the DML operation to the node that has affinity for the partition.

Affinity determines how to distribute the work among the set of instances or parallel execution servers to perform the DML operation in parallel. Affinity can improve performance of queries in several ways:

For partitioned tables and indexes, partition-to-node affinity information determines process allocation and work assignment. For shared-nothing MPP systems, the Oracle Real Application Cluster tries to assign partitions to instances, taking the disk affinity of the partitions into account. For shared-disk MPP and cluster systems, partitions are assigned to instances in a round-robin manner.

Affinity is only available for parallel DML when running in an Oracle Real Application Cluster configuration. Affinity information which persists across statements improves buffer cache hit ratios and reduces block pings between instances.

See Also:

Oracle9i Real Application Clusters Concepts  

Miscellaneous Parallel Execution Tuning Tips

This section contains some ideas for improving performance in a parallel execution environment and includes the following topics:

Formula for Memory, Users, and Parallel Execution Server Processes

A key to the tuning of parallel operations is an understanding of the relationship between memory requirements, the number of users (processes) a system can support, and the maximum number of parallel execution servers. The goal is to obtain the dramatic performance enhancements made possible by parallelizing certain operations and by using hash joins rather than sort merge joins. You must balance this performance goal with the need to support multiple users.

In considering the maximum number of processes a system can support, it is useful to divide the processes into three classes, based on their memory requirements. Table 21-5 defines high, medium, and low memory processes.

Analyze the maximum number of processes that can fit in memory by using the following formula:

Figure 21-7 Formula for Memory/Users/Server Relationship

Text description of stu81093.gif follows
Text description of the illustration stu81093.gif
Table 21-5 Memory Requirements for Three Classes of Process
Class  Description 

Low Memory Processes:

100 KB to 1 MB 

Low memory processes include table scans, index lookups, index nested loop joins; single-row aggregates (such as sum or average with no GROUP BY clauses, or very few groups), and sorts that return only a few rows; and direct loading.

This class of data warehousing process is similar to OLTP processes in the amount of memory required. Process memory may be as low as a few hundred kilobytes of fixed overhead. You could potentially support thousands of users performing this kind of operation. You can take this requirement even lower and support even more users by using the shared server.  

Medium Memory Processes:

1 MB to 10 MB 

Medium Memory Processes include large sorts, sort merge join, GROUP BY or ORDER BY operations returning a large number of rows, parallel insert operations that involve index maintenance, and index creation.

These processes require the fixed overhead needed by a low memory process, plus one or more sort areas, depending on the operation. For example, a typical sort merge join would sort both its inputs--resulting in two sort areas. GROUP BY or ORDER BY operations with many groups or rows also require sort areas.

Look at the EXPLAIN PLAN output for the operation to identify the number and type of joins, and the number and type of sorts. Optimizer statistics in the plan show the size of the operations. When planning joins, remember that you have several choices. The EXPLAIN PLAN statement is described in Oracle9i Database Performance Guide and Reference.  

High Memory Processes:

10 MB to 100 MB 

High memory processes include one or more hash joins, or a combination of one or more hash joins with large sorts.

These processes require the fixed overhead needed by a low memory process, plus hash area. The hash area size required might range from 8 MB to 32 MB, and you might need two of them. If you are performing two or more serial hash joins, each process uses 2 hash areas. In a parallel operation, each parallel execution server does at most 1 hash join at a time; therefore, you would need one hash area size per server.

In summary, the amount of hash join memory for an operation equals the DOP multiplied by hash area size, multiplied by the lesser of either 2 or the number of hash joins in the operation. 


The process memory requirements of parallel DML and parallel DDL operations also depend upon the query portion of the statement. 

The formula to calculate the maximum number of processes your system can support (referred to here as MAX_PROCESSES) is:

Figure 21-8 Formula for Calculating the Maximum Number of Processes

Text description of stu81094.gif follows
Text description of the illustration stu81094.gif

In general, if the value for MAX_PROCESSES is much larger than the number of users, consider using parallel operations. If MAX_PROCESSES is considerably less than the number of users, consider other alternatives, such as those described in the following section on "Balancing the Formula".

Setting Buffer Pool Size for Parallel Operations

With the exception of parallel update and delete, parallel operations do not generally benefit from larger buffer pool sizes. Parallel update and delete benefit from a larger buffer pool when they update indexes. This is because index updates have a random access pattern, and I/O activity can be reduced if an entire index or its interior nodes can be kept in the buffer pool. Other parallel operations can benefit only if you increase the size of the buffer pool and thereby accommodate the inner table or index for a nested loop join.

Balancing the Formula

Use the following technique to balance the formula provided in Figure 21-7.

You can permit the potential workload to exceed the limits recommended in the formula. Total memory required, minus the SGA size, can be multiplied by a factor of 1.2, to allow for 20 percent oversubscription. Thus, if you have 1 GB of memory, you may be able to support 1.2 GB of demand: the other 20 percent could be handled by the paging system.

You must, however, verify that a particular degree of oversubscription is viable on your system. Do this by monitoring the paging rate and making sure you are not spending more than a very small percent of the time waiting for the paging subsystem. Your system might perform acceptably even if oversubscribed by 60 percent, if, on average, not all of the processes are performing hash joins concurrently. Users might then try to access more than the available memory, so you must continually monitor paging activity in such a situation. If paging dramatically increases, consider other alternatives.

On average, no more than 5 percent of the time should be spent simply waiting in the operating system on page faults. A wait time of more than 5 percent indicates your paging subsystem is I/O-bound. Use your operating system monitor to check wait time.

If wait time for paging devices exceeds 5 percent, you can reduce memory requirements in one of the following ways:

If the wait time indicates an I/O bottleneck in the paging subsystem, you could resolve this by striping.

Parallel Execution Space Management Issues

This section describes space management issues that occur when using parallel execution. These issues are:

These problems become particularly important for parallel operations in an Oracle Real Application Cluster environment. The more nodes that are involved, the more critical tuning becomes.

If you can implement locally managed tablespaces, you can avoid these issues altogether.

See Also:

Oracle9i Database Administrator's Guide for more information about locally managed tablespaces 

ST Enqueue for Sorts and Temporary Data

Every space management transaction in the database (such as creation of temporary segments in PARALLEL CREATE TABLE, or parallel direct-path INSERTs of non-partitioned tables) is controlled by a single space transaction enqueue. A high transaction rate, for example, more than two or three transactions per minute, on ST enqueues can result in poor scalability on Oracle Real Application Clusters with many nodes, or a timeout waiting for space management resources. Use the V$ROWCACHE and V$LIBRARYCACHE views to locate this type of contention.

Try to minimize the number of space management transactions, in particular:

To optimize space management for sorts, use locally managed tablespaces for temporary data. This is particularly beneficial on Oracle Real Application Clusters. You can monitor this using V$SORT_SEGMENT.

External Fragmentation

External fragmentation is a concern for parallel load, direct-path INSERT, and PARALLEL CREATE TABLE ... AS SELECT. Memory tends to become fragmented as extents are allocated and data is inserted and deleted. This can result in a fair amount of free space that is unusable because it consists of small, noncontiguous chunks of memory.

To reduce external fragmentation on partitioned tables, set all extents to the same size. Set the value for NEXT equal to the value for INITIAL, and set PERCENT_INCREASE to 0. The system can handle this well with a few thousand extents per object. Therefore, set MAXEXTENTS to, for example, 1,000 to 3,000. Never attempt to use a value for MAXEXTENTS in excess of 10,000. For tables that are not partitioned, the initial extent should be small. In general, the smaller the extent, the better utilization of space. The trade-off is that your system will spend more time getting new extents.

Free Space

Schema objects from an OLTP database are often duplicated in the data warehouse. However, these objects will probably not be subject to the same mix of insert versus update activity in the data warehouse as in the OLTP environment. The PCTFREE storage clause can be reduced in the data warehouse environment if the data is loaded and then very seldomly updated. The default value is 10, which reserves 10 percent of each block that is loaded for future updates. An OLTP environment may use higher values, so care should be taken when importing schema DDL from OLTP systems.

Overriding the Default Degree of Parallelism

The default DOP is appropriate for reducing response time while guaranteeing use of CPU and I/O resources for any parallel operations.

If it is memory-bound, or if several concurrent parallel operations are running, you might want to decrease the default DOP.

Oracle uses the default DOP for tables that have PARALLEL attributed to them in the data dictionary or that have the PARALLEL hint specified. If a table does not have parallelism attributed to it, or has NOPARALLEL (the default) attributed to it, and parallelism is not being forced through ALTER SESSION FORCE PARALLEL, then that table is never scanned in parallel. This override occurs regardless of the default DOP indicated by the number of CPUs, instances, and devices storing that table.

You can adjust the DOP by using the following guidelines:

Rewriting SQL Statements

The most important issue for parallel execution is ensuring that all parts of the query plan that process a substantial amount of data execute in parallel. Use EXPLAIN PLAN to verify that all plan steps have an OTHER_TAG of PARALLEL_TO_PARALLEL, PARALLEL_TO_SERIAL, PARALLEL_COMBINED_WITH_PARENT, or PARALLEL_COMBINED_WITH_CHILD. Any other keyword (or null) indicates serial execution and a possible bottleneck.

You can also use the utlxplp.sql script to present the EXPLAIN PLAN output with all relevant parallel information.

See Also:

Oracle9i Database Performance Guide and Reference for more information on using EXPLAIN PLAN 

You can increase the optimizer's ability to generate parallel plans converting subqueries, especially correlated subqueries, into joins. Oracle can parallelize joins more efficiently than subqueries. This also applies to updates.

See Also:

"Updating the Table in Parallel" 

Creating and Populating Tables in Parallel

Oracle cannot return results to a user process in parallel. If a query returns a large number of rows, execution of the query might indeed be faster. However, the user process can only receive the rows serially. To optimize parallel execution performance for queries that retrieve large result sets, use PARALLEL CREATE TABLE ... AS SELECT or direct-path INSERT to store the result set in the database. At a later time, users can view the result set serially.


Performing the SELECT in parallel does not influence the CREATE statement. If the CREATE is parallel, however, the optimizer tries to make the SELECT run in parallel also. 

When combined with the NOLOGGING option, the parallel version of CREATE TABLE ... AS SELECT provides a very efficient intermediate table facility, for example:

  AS SELECT dim_1, dim_2 ..., SUM (meas_1)
     FROM facts
  GROUP BY dim_1, dim_2;

These tables can also be incrementally loaded with parallel INSERT. You can take advantage of intermediate tables using the following techniques:

Creating Temporary Tablespaces for Parallel Sort and Hash Join

For optimal space management performance, use dedicated temporary tablespaces. As with the TStemp tablespace, first add a single datafile and later add the remainder in parallel, as in this example:


Size of Temporary Extents

Temporary extents are all the same size because the server ignores the PCTINCREASE and INITIAL settings and only uses the NEXT setting for temporary extents. This helps avoid fragmentation.

As a general rule, temporary extents should be smaller than permanent extents because there are more demands for temporary space, and parallel processes or other operations running concurrently must share the temporary tablespace. Normally, temporary extents should be in the range of 1 MB to 10 MB. Once you allocate an extent, it is available for the duration of an operation. If you allocate a large extent but only need to use a small amount of space, the unused space in the extent is tied up.

At the same time, temporary extents should be large enough that processes do not have to wait for space. Temporary tablespaces use less overhead than permanent tablespaces when allocating and freeing a new extent. However, obtaining a new temporary extent still requires the overhead of acquiring a latch and searching through the SGA structures, as well as SGA space consumption for the sort extent pool. Also, if extents are too small, SMON might take a long time dropping old sort segments when new instances start up.

Operating System Striping of Temporary Tablespaces

Operating system striping is an alternative technique you can use with temporary tablespaces. Media recovery, however, offers subtle challenges for large temporary tablespaces. It does not make sense to mirror, use RAID, or back up a temporary tablespace. If you lose a disk in an operating system striped temporary space, you will probably have to drop and re-create the tablespace. This could take several hours for the 120 GB example. With Oracle striping, simply remove the defective disk from the tablespace. For example, if /dev/D50 fails, enter:


Because the dictionary sees the size as 1 KB, which is less than the extent size, the corrupt file is not accessed. Eventually, you might wish to re-create the tablespace.

To make your temporary tablespace available for use, enter:


See Also:

For MPP systems, see your platform-specific documentation regarding the advisability of disabling disk affinity when using operating system striping 

Executing Parallel SQL Statements

After analyzing your tables and indexes, you should see performance improvements based on the DOP used.

As a general process, you should start with simple parallel operations and evaluate their total I/O throughput with a SELECT COUNT(*) FROM facts statement. Then, evaluate total CPU power by adding a complex WHERE clause to the statement. An I/O imbalance might suggest a better physical database layout. After you understand how simple scans work, add aggregation, joins, and other operations that reflect individual aspects of the overall workload. In particular, you should look for bottlenecks.

Besides query performance, you should also monitor parallel load, parallel index creation, and parallel DML, and look for good utilization of I/O and CPU resources.

Using EXPLAIN PLAN to Show Parallel Operations Plans

Use the EXPLAIN PLAN statement to see the execution plans for parallel queries. EXPLAIN PLAN output shows optimizer information in the COST, BYTES, and CARDINALITY columns. You can also use the utlxplp.sql script to present the EXPLAIN PLAN output with all relevant parallel information.

See Also:

Oracle9i Database Performance Guide and Reference for more information on using EXPLAIN PLAN 

There are several ways to optimize the parallel execution of join statements. You can alter system configuration, adjust parameters as discussed earlier in this chapter, or use hints, such as the DISTRIBUTION hint.

The key points when using EXPLAIN PLAN are to:

Additional Considerations for Parallel DML

When you want to refresh your data warehouse database using parallel insert, update, or delete on a data warehouse, there are additional issues to consider when designing the physical database. These considerations do not affect parallel execution operations. These issues are:

PDML and Direct-Path Restrictions

If a parallel restriction is violated, the operation is simply performed serially. If a direct-path INSERT restriction is violated, then the APPEND hint is ignored and a conventional insert is performed. No error message is returned.

Limitation on the Degree of Parallelism

If you are performing parallel UPDATE, MERGE, or DELETE operations, the DOP is equal to or less than the number of partitions in the table.

Using Local and Global Striping

Parallel updates and deletes work only on partitioned tables. They can generate a high number of random I/O requests during index maintenance.

For local index maintenance, local striping is most efficient in reducing I/O contention because one server process only goes to its own set of disks and disk controllers. Local striping also increases availability in the event of one disk failing.

For global index maintenance (partitioned or nonpartitioned), globally striping the index across many disks and disk controllers is the best way to distribute the number of I/Os.


If you have global indexes, a global index segment and global index blocks are shared by server processes of the same parallel DML statement. Even if the operations are not performed against the same row, the server processes can share the same index blocks. Each server transaction needs one transaction entry in the index block header before it can make changes to a block. Therefore, in the CREATE INDEX or ALTER INDEX statements, you should set INITRANS, the initial number of transactions allocated within each data block, to a large value, such as the maximum DOP against this index. Leave MAXTRANS, the maximum number of concurrent transactions that can update a data block, at its default value, which is the maximum your system can support. This value should not exceed 255.

If you run a DOP of 10 against a table with a global index, all 10 server processes might attempt to change the same global index block. For this reason, you must set MAXTRANS to at least 10 so all server processes can make the change at the same time. If MAXTRANS is not large enough, the parallel DML operation fails.

Limitation on Available Number of Transaction Free Lists for Segments in Dictionary-Managed Tablespaces

Once a segment has been created, the number of process and transaction free lists is fixed and cannot be altered. If you specify a large number of process free lists in the segment header, you might find that this limits the number of transaction free lists that are available. You can abate this limitation the next time you re-create the segment header by decreasing the number of process free lists; this leaves more room for transaction free lists in the segment header.

For UPDATE and DELETE operations, each server process can require its own transaction free list. The parallel DML DOP is thus effectively limited by the smallest number of transaction free lists available on any of the global indexes the DML statement must maintain. For example, if you have two global indexes, one with 50 transaction free lists and one with 30 transaction free lists, the DOP is limited to 30.

The FREELISTS parameter of the STORAGE clause is used to set the number of process free lists. By default, no process free lists are created.

The default number of transaction free lists depends on the block size. For example, if the number of process free lists is not set explicitly, a 4 KB block has about 80 transaction free lists by default. The minimum number of transaction free lists is 25.

Using Multiple Archivers

Parallel DDL and parallel DML operations can generate a large amount of redo logs. A single ARCH process to archive these redo logs might not be able to keep up. To avoid this problem, you can spawn multiple archiver processes. This can be done manually or by using a job queue.

Database Writer Process (DBWn) Workload

Parallel DML operations dirty a large number of data, index, and undo blocks in the buffer cache during a short period of time. For example, suppose you see a high number of free_buffer_waits after querying the V$SYSTEM_EVENT view, as in the following syntax:


In this case, you should consider increasing the DBWn processes. If there are no waits for free buffers, the query will not return any rows.


The [NO]LOGGING clause applies to tables, partitions, tablespaces, and indexes. Virtually no log is generated for certain operations (such as direct-path INSERT) if the NOLOGGING clause is used. The NOLOGGING attribute is not specified at the INSERT statement level but is instead specified when using the ALTER or CREATE statement for a table, partition, index, or tablespace.

When a table or index has NOLOGGING set, neither parallel nor serial direct-path INSERT operations generate undo or redo logs. Processes running with the NOLOGGING option set run faster because no redo is generated. However, after a NOLOGGING operation against a table, partition, or index, if a media failure occurs before a backup is taken, then all tables, partitions, and indexes that have been modified might be corrupted.


Direct-path INSERT operations (except for dictionary updates) never generate undo logs. The NOLOGGING attribute does not affect undo, only redo. To be precise, NOLOGGING allows the direct-path INSERT operation to generate a negligible amount of redo (range-invalidation redo, as opposed to full image redo). 

For backward compatibility, [UN]RECOVERABLE is still supported as an alternate keyword with the CREATE TABLE statement. This alternate keyword might not be supported, however, in future releases.

At the tablespace level, the logging clause specifies the default logging attribute for all tables, indexes, and partitions created in the tablespace. When an existing tablespace logging attribute is changed by the ALTER TABLESPACE statement, then all tables, indexes, and partitions created after the ALTER statement will have the new logging attribute; existing ones will not change their logging attributes. The tablespace-level logging attribute can be overridden by the specifications at the table, index, or partition level.

The default logging attribute is LOGGING. However, if you have put the database in NOARCHIVELOG mode, by issuing ALTER DATABASE NOARCHIVELOG, then all operations that can be done without logging will not generate logs, regardless of the specified logging attribute.

Creating Indexes in Parallel

Multiple processes can work together simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle can create the index more quickly than if a single server process created the index sequentially.

Parallel index creation works in much the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel coordinator simply concatenates the pieces (which are ordered) to form the final index.

Parallel local index creation uses a single server set. Each server process in the set is assigned a table partition to scan and for which to build an index partition. Because half as many server processes are used for a given DOP, parallel local index creation can be run with a higher DOP.

You can optionally specify that no redo and undo logging should occur during index creation. This can significantly improve performance but temporarily renders the index unrecoverable. Recoverability is restored after the new index is backed up. If your application can tolerate a window where recovery of the index requires it to be re-created, then you should consider using the NOLOGGING clause.

The PARALLEL clause in the CREATE INDEX statement is the only way in which you can specify the DOP for creating the index. If the DOP is not specified in the parallel clause of CREATE INDEX, then the number of CPUs is used as the DOP. If there is no PARALLEL clause, index creation is done serially.


When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some of the extents might be trimmed, and the resulting index might be smaller than the requested 60 MB. 

When you add or enable a UNIQUE or PRIMARY KEY constraint on a table, you cannot automatically create the required index in parallel. Instead, manually create an index on the desired columns, using the CREATE INDEX statement and an appropriate PARALLEL clause, and then add or enable the constraint. Oracle then uses the existing index when enabling or adding the constraint.

Multiple constraints on the same table can be enabled concurrently and in parallel if all the constraints are already in the ENABLE NOVALIDATE state. In the following example, the ALTER TABLE ... ENABLE CONSTRAINT statement performs the table scan that checks the constraint in parallel:

INSERT INTO a values (1);

See Also:

Oracle9i Database Concepts for more information on how extents are allocated when using parallel execution 

Parallel DML Tips

This section provides an overview of parallel DML functionality. The topics covered include:


Oracle INSERT functionality can be summarized as follows:

Table 21-6 Summary of INSERT Features
Insert Type  Parallel  Serial  NOLOGGING 





Direct-path INSERT (Append) 

Yes: requires:


  • Table PARALLEL attribute or PARALLEL hint

  • APPEND hint (optional)


Yes: requires:

  • APPEND hint


Yes: requires:

  • NOLOGGING attribute set for table or partition


If parallel DML is enabled and there is a PARALLEL hint or PARALLEL attribute set for the table in the data dictionary, then inserts are parallel and appended, unless a restriction applies. If either the PARALLEL hint or PARALLEL attribute is missing, the insert is performed serially.

Direct-path INSERT

Append mode is the default during a parallel insert: data is always inserted into a new block which is allocated to the table. Therefore the APPEND hint is optional. You should use append mode to increase the speed of INSERT operations, but not when space utilization needs to be optimized. You can use NOAPPEND to override append mode.

The APPEND hint applies to both serial and parallel insert: even serial inserts are faster if you use this hint. APPEND, however, does require more space and locking overhead.

You can use NOLOGGING with APPEND to make the process even faster. NOLOGGING means that no redo log is generated for the operation. NOLOGGING is never the default; use it when you wish to optimize performance. It should not normally be used when recovery is needed for the table or partition. If recovery is needed, be sure to take a backup immediately after the operation. Use the ALTER TABLE [NO]LOGGING statement to set the appropriate value.


When the table or partition has the PARALLEL attribute in the data dictionary, that attribute setting is used to determine parallelism of INSERT, UPDATE, and DELETE statements as well as queries. An explicit PARALLEL hint for a table in a statement overrides the effect of the PARALLEL attribute in the data dictionary.

You can use the NOPARALLEL hint to override a PARALLEL attribute for the table in the data dictionary. In general, hints take precedence over attributes.

DML operations are considered for parallelization only if the session is in a PARALLEL DML enabled mode. (Use ALTER SESSION ENABLE PARALLEL DML to enter this mode.) The mode does not affect parallelization of queries or of the query portions of a DML statement.

See Also:

Oracle9i Database Concepts for more information on parallel INSERT, UPDATE and DELETE 

Parallelizing INSERT ... SELECT

In the INSERT ... SELECT statement you can specify a PARALLEL hint after the INSERT keyword, in addition to the hint after the SELECT keyword. The PARALLEL hint after the INSERT keyword applies to the INSERT operation only, and the PARALLEL hint after the SELECT keyword applies to the SELECT operation only. Thus, parallelism of the INSERT and SELECT operations are independent of each other. If one operation cannot be performed in parallel, it has no effect on whether the other operation can be performed in parallel.

The ability to parallelize inserts causes a change in existing behavior if the user has explicitly enabled the session for parallel DML and if the table in question has a PARALLEL attribute set in the data dictionary entry. In that case, existing INSERT ... SELECT statements that have the select operation parallelized can also have their insert operation parallelized.

If you query multiple tables, you can specify multiple SELECT PARALLEL hints and multiple PARALLEL attributes.

Example 21-7 Parallelizing INSERT ... SELECT Example

Add the new employees who were hired after the acquisition of ACME.


The APPEND keyword is not required in this example because it is implied by the PARALLEL hint.

Parallelizing UPDATE and DELETE

The PARALLEL hint (placed immediately after the UPDATE or DELETE keyword) applies not only to the underlying scan operation, but also to the UPDATE or DELETE operation. Alternatively, you can specify UPDATE or DELETE parallelism in the PARALLEL clause specified in the definition of the table to be modified.

If you have explicitly enabled parallel DML for the session or transaction, UPDATE or DELETE statements that have their query operation parallelized can also have their UPDATE or DELETE operation parallelized. Any subqueries or updatable views in the statement can have their own separate PARALLEL hints or clauses, but these parallel directives do not affect the decision to parallelize the update or delete. If these operations cannot be performed in parallel, it has no effect on whether the UPDATE or DELETE portion can be performed in parallel.

Tables must be partitioned in order to support parallel UPDATE and DELETE.

Example 21-8 Parallelizing UPDATE and DELETE: Example 1

Give a 10 percent salary raise to all clerks in Dallas.


The PARALLEL hint is applied to the UPDATE operation as well as to the scan.

Example 21-9 Parallelizing UPDATE and DELETE: Example 2

Remove all products in the grocery category because the grocery business line was recently spun off into a separate company.


Again, the parallelism is applied to the scan as well as UPDATE operation on table emp.

Incremental Data Loading in Parallel

Parallel DML combined with the updatable join views facility provides an efficient solution for refreshing the tables of a data warehouse system. To refresh tables is to update them with the differential data generated from the OLTP production system.

In the following example, assume that you want to refresh a table named customer that has columns c_key, c_name, and c_addr. The differential data contains either new rows or rows that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the production system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named diff_customer, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task.

Once diff_customer is loaded, the refresh process can be started. It can be performed in two phases or with a newer technique:

Updating the Table in Parallel

A straightforward SQL implementation of the update uses subqueries:

UPDATE customer 
SET(c_name, c_addr) = 
  (SELECT c_name, c_addr
   FROM diff_customer
   WHERE diff_customer.c_key = customer.c_key)
   WHERE c_key IN(SELECT c_key FROM diff_customer);

Unfortunately, the two subqueries in this statement affect performance.

An alternative is to rewrite this query using updatable join views. To do this, you must first add a primary key constraint to the diff_customer table to ensure that the modified columns map to a key-preserved table:

CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_key)
ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);

You can then update the customer table with the following SQL statement:

UPDATE /*+ PARALLEL(cust_joinview) */ 
(SELECT /*+ PARALLEL(customer) PARALLEL(diff_customer) */ 
CUSTOMER.c_name as c_name
CUSTOMER.c_addr as c_addr,
diff_customer.c_name as c_newname, diff_customer.c_addr as c_newaddr
   WHERE customer.c_key = diff_customer.c_key) cust_joinview
   SET c_name = c_newname, c_addr = c_newaddr;

The base scans feeding the join view cust_joinview are done in parallel. You can then parallelize the update to further improve performance, but only if the customer table is partitioned.

See Also:


Inserting the New Rows into the Table in Parallel

The last phase of the refresh process consists of inserting the new rows from the diff_customer temporary table to the customer table. Unlike the update case, you cannot avoid having a subquery in the INSERT statement:

INSERT /*+PARALLEL(customer)*/ INTO customer
SELECT * FROM diff_customer
WHERE diff_customer.c_key NOT IN (SELECT /*+ HASH_AJ */ KEY FROM customer);

However, youcan guarantee that the subquery is transformed into an anti-hash join by using the HASH_AJ hint. Doing so enables you to use parallel INSERT to execute the preceding statement efficiently. Parallel INSERT is applicable even if the table is not partitioned.

Merging in Parallel

In Oracle9i, you combine the previous updates and inserts into one statement, commonly known as an upsert or merge. The following statement achieves the same result as all of the statements in "Updating the Table in Parallel" and "Inserting the New Rows into the Table in Parallel":

MERGE INTO customer USING diff_customer
ON (diff_customer.c_key = customer.c_key)
  UPDATE SET (c_name, c_addr) = (SELECT c_name, c_addr 
  FROM diff_customer
  WHERE diff_customer.c_key = customer.c_key) 
  INSERT VALUES (diff_customer.c_key,diff_customer.c_data);

Using Hints with Cost-Based Optimization

Cost-based optimization is a sophisticated approach to finding the best execution plan for SQL statements. Oracle automatically uses cost-based optimization with parallel execution.


You must use the DBMS_STATS package to gather current statistics for cost-based optimization. In particular, tables used in parallel should always be analyzed. Always keep your statistics current by using the DBMS_STATS package. 

Use discretion in employing hints. If used, hints should come as a final step in tuning and only when they demonstrate a necessary and significant performance advantage. In such cases, begin with the execution plan recommended by cost-based optimization, and go on to test the effect of hints only after you have quantified your performance expectations. Remember that hints are powerful. If you use them and the underlying data changes, you might need to change the hints. Otherwise, the effectiveness of your execution plans might deteriorate.

Always use cost-based optimization unless you have an existing application that has been hand-tuned for rule-based optimization. If you must use rule-based optimization, rewriting a SQL statement can greatly improve application performance.


If any table in a query has a DOP greater than one (including the default DOP), Oracle uses the cost-based optimizer for that query, even if OPTIMIZER_MODE is set to RULE or if there is a RULE hint in the query itself. 

Go to previous page Go to next page
Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.
Go To Documentation Library
Go To Product List
Book List
Go To Table Of Contents
Go To Index

Master Index