8 Using Parallel Execution

Parallel execution is the ability to apply multiple CPU and I/O resources to the execution of a single SQL statement by using multiple processes.

This chapter explains how parallel execution works, and how to control, manage, and monitor parallel execution in the Oracle Database.

This chapter contains the following sections:

See Also:

http://www.oracle.com/technetwork/database/database-technologies/parallel-execution/overview/index.html for information about parallel execution with Oracle Database

8.1 Parallel Execution Concepts

Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single SQL statement.

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on an online transaction processing (OLTP) system for batch processing or schema maintenance operations, such as index creation.

Parallel execution is sometimes called parallelism. Parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. For example, when four processes combine to calculate the total sales for a year, each process handles one quarter of the year instead of a single process handling all four quarters by itself. The improvement in performance can be quite significant.

Parallel execution improves processing for:

  • Queries requiring large table scans, joins, or partitioned index scans

  • Creation of large indexes

  • Creation of large tables, including materialized views

  • Bulk insertions, updates, merges, and deletions
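
As a sketch of how such operations are parallelized, the following statements use a statement-level PARALLEL hint and a PARALLEL clause. The table and index names, and the DOP of 8, are illustrative assumptions, not objects from this guide:

```sql
-- Parallel query over a large table (DOP of 8 is illustrative)
SELECT /*+ PARALLEL(8) */ SUM(quantity_sold)
  FROM sales;

-- Parallel creation of a large index
CREATE INDEX sales_cust_idx ON sales (cust_id) PARALLEL 8;

-- Parallel bulk insertion (parallel DML must be enabled in the session)
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(8) */ INTO sales_history
  SELECT * FROM sales;
```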

This section contains the following topics:

8.1.1 When to Implement Parallel Execution

Parallel execution is used to reduce the execution time of queries by exploiting the CPU and I/O capabilities in the hardware.

Parallel execution is a better choice than serial execution when:

  • The query references a large data set.

  • There is low concurrency.

  • Elapsed time is important.

Parallel execution enables many processes working together to execute a single operation, such as a SQL query. Parallel execution benefits systems with all of the following characteristics:

  • Symmetric multiprocessors (SMPs), clusters, or massively parallel systems

  • Sufficient I/O bandwidth

  • Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)

  • Sufficient memory to support additional memory-intensive processes, such as sorting, hashing, and I/O buffers

If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.

The benefits of parallel execution are most evident in DSS and data warehouse environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as index creation. The simple, short DML and SELECT statements that characterize OLTP applications do not benefit from being executed in parallel.

8.1.2 When Not to Implement Parallel Execution

Serial execution differs from parallel execution in that a single process executes a single database operation, such as a SQL query.

Serial execution is a better choice than parallel execution when:

  • The query references a small data set.

  • There is high concurrency.

  • Efficiency is important.

Parallel execution is not typically useful for:

  • Environments in which the typical query or transaction is very short (a few seconds or less).

    This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.

  • Environments in which the CPU, memory, or I/O resources are heavily used.

    Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution does not yield any benefits and indeed may be detrimental to performance.

8.1.3 Fundamental Hardware Requirements

Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly.

It is very I/O intensive by nature. To achieve optimal performance, each component in the hardware configuration must be sized to sustain the same level of throughput: from the CPUs and the Host Bus Adapters (HBAs) in the compute nodes, to the switches, and on into the I/O subsystem, including the storage controllers and the physical disks. If the system is an Oracle Real Application Clusters (Oracle RAC) system, then the interconnect must also be sized appropriately. The weakest link limits the performance and scalability of operations in a configuration.

Oracle recommends measuring the maximum I/O performance that a hardware configuration can achieve without Oracle Database. You can use this measurement as a baseline for future system performance evaluations. Remember, it is not possible for parallel execution to achieve better I/O throughput than the underlying hardware can sustain. Oracle Database provides a free calibration tool called Orion, which is designed to measure the I/O performance of a system by simulating Oracle I/O workloads. Parallel execution typically performs large random I/Os.
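
As a sketch, a basic Orion run might look like the following. The device names, test name, and disk count are assumptions for illustration; consult the Orion documentation shipped with your release for the full option set:

```shell
# List the raw devices or LUNs to test, one per line, in <testname>.lun
echo /dev/sdb >  mytest.lun
echo /dev/sdc >> mytest.lun

# "simple" run: small random I/Os, then large I/Os, reported separately
./orion -run simple -testname mytest -num_disks 2
```

The resulting throughput and latency figures serve as the hardware baseline discussed above.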

See Also:

Oracle Database Performance Tuning Guide for information about I/O configuration and design

8.1.4 How Parallel Execution Works

Parallel execution breaks down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time.

This section contains the following topics:

8.1.4.1 Parallel Execution of SQL Statements

Each SQL statement undergoes an optimization and parallelization process when it is parsed.

If the statement is determined to be executed in parallel, then the following steps occur in the execution plan:

  1. The user session or shadow process takes on the role of a coordinator, often called the query coordinator (QC) or the parallel execution (PX) coordinator. The QC is the session that initiates the parallel SQL statement.

  2. The PX coordinator obtains the necessary number of processes called parallel execution (PX) servers. The PX servers are the individual processes that perform work in parallel on behalf of the initiating session.

  3. The SQL statement is executed as a sequence of operations, such as a full table scan or an ORDER BY clause. Each operation is performed in parallel if possible.

  4. When the PX servers are finished executing the statement, the PX coordinator performs any portion of the work that cannot be executed in parallel. For example, a parallel query with a SUM() operation requires adding the individual subtotals calculated by each PX server.

  5. Finally, the PX coordinator returns the results to the user.
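
While a parallel statement is running, the coordinator and its PX servers from the steps above can be observed in the V$PX_SESSION view, which maps each PX server back to the coordinator session (QCSID). A minimal sketch, run from a second session:

```sql
-- Each row is one PX server (or the QC itself); QCSID identifies the coordinator
SELECT qcsid, sid, server_group, server_set, degree, req_degree
  FROM v$px_session
 ORDER BY qcsid, server_group, server_set;
```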

8.1.4.2 Producer/Consumer Model

Parallel execution uses the producer/consumer model.

A parallel execution plan is carried out as a series of producer/consumer operations. Parallel execution (PX) servers that produce data for subsequent operations are called producers; PX servers that require the output of other operations are called consumers. Each producer or consumer parallel operation is performed by a set of PX servers called a PX server set. The number of PX servers in a PX server set is called the degree of parallelism (DOP). The basic unit of work for a PX server set is called a data flow operation (DFO).

A PX coordinator can have multiple levels of producer/consumer operations (multiple DFOs), but the number of PX server sets for a PX coordinator is limited to two. Therefore, at any point in time only two PX server sets can be active for a PX coordinator. As a result, there is parallelism both within the operations of a DFO and between DFOs. The parallelism of an individual DFO is called intra-operation parallelism, and the parallelism between DFOs is called inter-operation parallelism. To illustrate intra- and inter-operation parallelism, consider the following statement:

SELECT * FROM employees ORDER BY last_name;

The execution plan implements a full scan of the employees table. This operation is followed by a sorting of the retrieved rows, based on the value of the last_name column. For the sake of this example, assume the last_name column is not indexed. Also assume that the DOP for the query is set to 4, which means that four parallel execution servers can be active for any given operation.

Figure 8-1 illustrates the parallel execution of the example query.

Figure 8-1 Inter-operation Parallelism and Dynamic Partitioning

Description of Figure 8-1 follows
Description of "Figure 8-1 Inter-operation Parallelism and Dynamic Partitioning"

As illustrated in Figure 8-1, there are actually eight PX servers involved in the query even though the DOP is 4. This is because a producer and a consumer operation can be performed at the same time (inter-operation parallelism).

All of the PX servers involved in the scan operation send rows to the appropriate PX server performing the SORT operation. If a row scanned by a PX server contains a value for the last_name column between A and G, that row is sent to the first ORDER BY parallel execution server. When the scan operation is complete, the sorting processes can return the sorted results to the query coordinator, which returns the complete query results to the user.

8.1.4.3 Granules of Parallelism

The basic unit of work in parallelism is called a granule.

Oracle Database divides the operation executed in parallel, such as a table scan or index creation, into granules. Parallel execution (PX) servers execute the operation one granule at a time. The number of granules and their sizes correlate with the degree of parallelism (DOP). The number of granules also affects how well the work is balanced across PX servers.

8.1.4.3.1 Block Range Granules

Block range granules are the basic unit of most parallel operations, even on partitioned tables. From an Oracle Database perspective, the degree of parallelism is not related to the number of partitions.

Block range granules are ranges of physical blocks from a table. Oracle Database computes the number and the size of the granules during run-time to optimize and balance the work distribution for all affected parallel execution (PX) servers. The number and size of granules are dependent upon the size of the object and the DOP. Block range granules do not depend on static preallocation of tables or indexes. During the computation of the granules, Oracle Database takes the DOP into account and tries to assign granules from different data files to each of the PX servers to avoid contention whenever possible. Additionally, Oracle Database considers the disk affinity of the granules on massively parallel processing (MPP) systems to take advantage of the physical proximity between PX servers and disks.

8.1.4.3.2 Partition Granules

When partition granules are used, a parallel execution (PX) server works on an entire partition or subpartition of a table or index.

Because partition granules are statically determined by the structure of the table or index when it is created, partition granules do not give you the flexibility in executing an operation in parallel that block granules do. The maximum allowable degree of parallelism (DOP) is the number of partitions. This might limit the utilization of the system and the load balancing across PX servers.

When partition granules are used for parallel access to a table or index, you should use a relatively large number of partitions, ideally three times the DOP, so that Oracle Database can effectively balance work across the PX servers.
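
For example, under this guideline a table intended for access at DOP 16 with partition granules could be created with 48 hash partitions (three times the DOP). The table and column names here are illustrative:

```sql
-- 48 partitions give Oracle Database room to balance partition
-- granules across 16 PX servers (3 partitions per server)
CREATE TABLE sales_hp
  PARTITION BY HASH (cust_id) PARTITIONS 48
  AS SELECT * FROM sales;
```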

Partition granules are the basic unit of parallel index range scans, joins between two equipartitioned tables where the query optimizer has chosen to use partition-wise joins, and parallel operations that modify multiple partitions of a partitioned object. These operations include parallel creation of partitioned indexes, and parallel creation of partitioned tables.

You can tell which types of granules were used by looking at the execution plan of a statement. The line PX BLOCK ITERATOR above the table or index access indicates that block range granules have been used. In the following example, you can see this on line 7 of the explain plan output, just above the TABLE ACCESS FULL on the SALES table.

-------------------------------------------------------------------------------------------------
|Id|      Operation          |  Name  |Rows|Bytes|Cost%CPU|  Time  |Pst|Pst|  TQ |INOUT|PQDistri|
-------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT         |        |  17| 153 |565(100)|00:00:07|   |   |     |     |        |
| 1| PX COORDINATOR          |        |    |     |        |        |   |   |     |     |        |
| 2|  PX SEND QC(RANDOM)     |:TQ10001|  17| 153 |565(100)|00:00:07|   |   |Q1,01|P->S |QC(RAND)|
| 3|   HASH GROUP BY         |        |  17| 153 |565(100)|00:00:07|   |   |Q1,01|PCWP |        |
| 4|    PX RECEIVE           |        |  17| 153 |565(100)|00:00:07|   |   |Q1,01|PCWP |        |
| 5|     PX SEND HASH        |:TQ10000|  17| 153 |565(100)|00:00:07|   |   |Q1,00|P->P | HASH   |
| 6|      HASH GROUP BY      |        |  17| 153 |565(100)|00:00:07|   |   |Q1,00|PCWP |        |
| 7|       PX BLOCK ITERATOR |        | 10M| 85M | 60(97) |00:00:01| 1 | 16|Q1,00|PCWC |        |
|*8|        TABLE ACCESS FULL|  SALES | 10M| 85M | 60(97) |00:00:01| 1 | 16|Q1,00|PCWP |        |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
8 - filter("CUST_ID"<=22810 AND "CUST_ID">=22300)
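
Plan output such as the above is typically produced with EXPLAIN PLAN and DBMS_XPLAN. The query below is an approximation consistent with the predicate shown, not the exact statement used for the output above:

```sql
EXPLAIN PLAN FOR
  SELECT /*+ PARALLEL(8) */ cust_id, SUM(quantity_sold)
    FROM sales
   WHERE cust_id BETWEEN 22300 AND 22810
   GROUP BY cust_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```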

When partition granules are used, you see the line PX PARTITION RANGE above the table or index access in the explain plan output. On line 6 of the example that follows, the plan has PX PARTITION RANGE ALL because this statement accesses all of the 16 partitions in the table. If not all of the partitions are accessed, it simply shows PX PARTITION RANGE.

---------------------------------------------------------------------------------------------------------
|Id|      Operation                    |  Name    |Rows|Byte|Cost%CPU|  Time  |Ps|Ps|  TQ |INOU|PQDistri|
---------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT                   |          |  17| 153|   2(50)|00:00:01|  |  |     |    |        |
| 1| PX COORDINATOR                    |          |    |    |        |        |  |  |     |    |        |
| 2|  PX SEND QC(RANDOM)               |:TQ10001  |  17| 153|   2(50)|00:00:01|  |  |Q1,01|P->S|QC(RAND)|
| 3|   HASH GROUP BY                   |          |  17| 153|   2(50)|00:00:01|  |  |Q1,01|PCWP|        |
| 4|    PX RECEIVE                     |          |  26| 234|    1(0)|00:00:01|  |  |Q1,01|PCWP|        |
| 5|     PX SEND HASH                  |:TQ10000  |  26| 234|    1(0)|00:00:01|  |  |Q1,00|P->P| HASH   |
| 6|      PX PARTITION RANGE ALL       |          |  26| 234|    1(0)|00:00:01|  |  |Q1,00|PCWP|        |
| 7|  TABLE ACCESS BY LOCAL INDEX ROWID|SALES     |  26| 234|    1(0)|00:00:01| 1|16|Q1,00|PCWC|        |
|*8|        INDEX RANGE SCAN           |SALES_CUST|  26|    |    1(0)|00:00:01| 1|16|Q1,00|PCWP|        |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("CUST_ID"<=22810 AND "CUST_ID">=22300)

8.1.4.4 Distribution Methods Between Producers and Consumers

A distribution method is the method by which data is sent (or redistributed) from one parallel execution (PX) server set to another.

The following are the most commonly used distribution methods in parallel execution:

  • Hash Distribution

    The hash distribution method applies a hash function to one or more columns of each row to determine the consumer to which the producer should send the row. This distribution attempts to divide the work equally among consumers based on hash values.

  • Broadcast Distribution

    In the broadcast distribution method, each producer sends all rows to all consumers. This method is used when the result set of the left side in a join operation is small and the cost of broadcasting all rows is not high. The result set from the right side of the join does not need to be distributed in this case; consumer PX servers assigned to the join operation can scan the right side and perform the join.

  • Range Distribution

    Range distribution is mostly used in parallel sort operations. In this method each producer sends rows that have a range of values to the same consumer. This is the method used in Figure 8-1.

  • Hybrid Hash Distribution

    Hybrid hash is an adaptive distribution method used in join operations. The actual distribution method is decided at runtime by the optimizer depending on the size of the result set of the left side of the join. The number of rows returned from the left side is counted and checked against a threshold value. When the number of rows is less than or equal to the threshold value, broadcast distribution is used for the left side of the join, and the right side is not distributed as the same consumer PX servers assigned to the join operation scan the right side and perform the join. When the number of rows returned from the left side is higher than the threshold value, hash distribution is used for both sides of the join.
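
The optimizer's choice among these methods can also be influenced with the PQ_DISTRIBUTE hint. The following sketch forces hash distribution on both sides of a join; the table aliases and DOP are illustrative:

```sql
-- Force hash distribution of both join inputs on the join key
SELECT /*+ PARALLEL(4) PQ_DISTRIBUTE(s HASH HASH) */
       c.cust_last_name, SUM(s.quantity_sold)
  FROM customers c, sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY c.cust_last_name;

-- To broadcast the smaller left side instead, the hint would be:
--   PQ_DISTRIBUTE(s BROADCAST NONE)
```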

To determine the distribution method, the parallel execution (PX) coordinator examines each operation in a SQL statement's execution plan and then determines the way in which the rows operated on by the operation must be redistributed among the PX servers. As an example of a parallel query, consider the query in Example 8-1. Figure 8-2 illustrates the data flow or query plan for the query in Example 8-1, and Example 8-2 shows the explain plan output for the same query.

The query plan shows that an adaptive distribution method was picked by the PX coordinator. Assuming the optimizer picks hash distribution at runtime, the execution proceeds as follows: two sets of PX servers, SS1 and SS2, are allocated for the query; each server set has four PX servers because of the PARALLEL hint that specifies the DOP of the statement.

PX server set SS1 first scans the table customers and sends rows to SS2, which builds a hash table on the rows. In other words, the consumers in SS2 and the producers in SS1 work concurrently: one set scans customers in parallel while the other consumes rows and builds the hash table to enable the hash join in parallel. This is an example of inter-operation parallelism.

After a PX server process in SS1 scans a row from the customers table, which PX server process in SS2 should it send it to? In this case, the redistribution of rows flowing up from SS1 performing the parallel scan of customers into SS2 performing the parallel hash join is done by hash distribution on the join column. That is, a PX server process scanning customers computes a hash function on the value of the column customers.cust_id to decide which PX server process in SS2 to send it to. The redistribution method used is explicitly shown in the PQ Distrib column in the EXPLAIN PLAN of the query. In Example 8-2, this can be seen on lines 5, 9, and 14 of the EXPLAIN PLAN.

After SS1 has finished scanning the entire customers table, it scans the sales table in parallel. It sends its rows to PX servers in SS2, which then perform the probes to finish the hash join in parallel. These PX servers also perform a GROUP BY operation after the join. After SS1 has scanned the sales table in parallel and sent the rows to SS2, it switches to performing the final group by operation in parallel. At this point the PX servers in SS2 send their rows using hash distribution to PX servers on SS1 for the group by operation. This is how two server sets run concurrently to achieve inter-operation parallelism across various operators in the query tree.

Figure 8-2 Data Flow Diagram for Joining Tables

Description of Figure 8-2 follows
Description of "Figure 8-2 Data Flow Diagram for Joining Tables"

Example 8-1 Running an Explain Plan for a Query on Customers and Sales

EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(4) */ customers.cust_first_name, customers.cust_last_name, 
  MAX(QUANTITY_SOLD), AVG(QUANTITY_SOLD)
  FROM sales, customers
  WHERE sales.cust_id=customers.cust_id
  GROUP BY customers.cust_first_name, customers.cust_last_name;

Explained.

Example 8-2 Explain Plan Output for a Query on Customers and Sales

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3260900439
---------------------------------------------------------------------------------------------------------------------------------------
|Id  |Operation                      |Name     |Rows  | Bytes |TempSpc|Cost (%CPU)| Time     |Pstart|Pstop |   TQ  |IN-OUT|PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT               |         |  960 | 26880 |       |    6  (34)| 00:00:01 |      |      |       |      |           |
|  1 | PX COORDINATOR                |         |      |       |       |           |          |      |      |       |      |           |
|  2 |  PX SEND QC (RANDOM)          |:TQ10003 |  960 | 26880 |       |    6  (34)| 00:00:01 |      |      | Q1,03 | P->S |QC (RAND)  |
|  3 |   HASH GROUP BY               |         |  960 | 26880 | 50000 |    6  (34)| 00:00:01 |      |      | Q1,03 | PCWP |           |
|  4 |    PX RECEIVE                 |         |  960 | 26880 |       |    6  (34)| 00:00:01 |      |      | Q1,03 | PCWP |           |
|  5 |     PX SEND HASH              |:TQ10002 |  960 | 26880 |       |    6  (34)| 00:00:01 |      |      | Q1,02 | P->P |HASH       |
|  6 |      HASH GROUP BY            |         |  960 | 26880 | 50000 |    6  (34)| 00:00:01 |      |      | Q1,02 | PCWP |           |
|* 7 |       HASH JOIN               |         |  960 | 26880 |       |    5  (20)| 00:00:01 |      |      | Q1,02 | PCWP |           |
|  8 |        PX RECEIVE             |         |  630 | 12600 |       |    2   (0)| 00:00:01 |      |      | Q1,02 | PCWP |           |
|  9 |         PX SEND HYBRID HASH   |:TQ10000 |  630 | 12600 |       |    2   (0)| 00:00:01 |      |      | Q1,00 | P->P |HYBRID HASH|
| 10 |          STATISTICS COLLECTOR |         |      |       |       |           |          |      |      | Q1,00 | PCWC |           |
| 11 |           PX BLOCK ITERATOR   |         |  630 | 12600 |       |    2   (0)| 00:00:01 |      |      | Q1,00 | PCWC |           |
| 12 |            TABLE ACCESS FULL  |CUSTOMERS|  630 | 12600 |       |    2   (0)| 00:00:01 |      |      | Q1,00 | PCWP |           |
| 13 |        PX RECEIVE             |         |  960 |  7680 |       |    2   (0)| 00:00:01 |      |      | Q1,02 | PCWP |           |
| 14 |         PX SEND HYBRID HASH   |:TQ10001 |  960 |  7680 |       |    2   (0)| 00:00:01 |      |      | Q1,01 | P->P |HYBRID HASH|
| 15 |          PX BLOCK ITERATOR    |         |  960 |  7680 |       |    2   (0)| 00:00:01 |    1 |   16 | Q1,01 | PCWC |           |
| 16 |           TABLE ACCESS FULL   |SALES    |  960 |  7680 |       |    2   (0)| 00:00:01 |    1 |   16 | Q1,01 | PCWP |           |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("SALES"."CUST_ID"="CUSTOMERS"."CUST_ID")
Note
-----
   - Degree of Parallelism is 4 because of hint

8.1.4.5 How Parallel Execution Servers Communicate

To execute a query in parallel, Oracle Database generally creates a set of producer parallel execution servers and a set of consumer parallel execution servers.

The producer servers retrieve rows from tables, and the consumer servers perform operations such as join, sort, DML, and DDL on these rows. Each server in the producer set has a connection to each server in the consumer set. The number of virtual connections between parallel execution servers increases as the square of the degree of parallelism.

Each communication channel has at least one, and sometimes up to four memory buffers, which are allocated from the shared pool. Multiple memory buffers facilitate asynchronous communication among the parallel execution servers.

A single-instance environment uses at most three buffers for each communication channel. An Oracle Real Application Clusters environment uses at most four buffers for each channel. Figure 8-3 illustrates message buffers and how producer parallel execution servers connect to consumer parallel execution servers.

Figure 8-3 Parallel Execution Server Connections and Buffers

Description of Figure 8-3 follows
Description of "Figure 8-3 Parallel Execution Server Connections and Buffers"

When a connection is between two processes on the same instance, the servers communicate by passing the buffers back and forth in memory (in the shared pool). When the connection is between processes in different instances, the messages are sent using external high-speed network protocols over the interconnect. In Figure 8-3, the DOP equals the number of parallel execution servers, which in this case is n. Figure 8-3 does not show the parallel execution coordinator. Each parallel execution server actually has an additional connection to the parallel execution coordinator. It is important to size the shared pool adequately when using parallel execution. If there is not enough free space in the shared pool to allocate the necessary memory buffers for a parallel server, it fails to start.

8.1.5 Parallel Execution Server Pool

When an instance starts, Oracle Database creates a pool of parallel execution servers, which are available for any parallel operation.

The initialization parameter PARALLEL_MIN_SERVERS specifies the number of parallel execution servers that Oracle Database creates at instance startup.

When executing a parallel operation, the parallel execution coordinator obtains parallel execution servers from the pool and assigns them to the operation. If necessary, Oracle Database can create additional parallel execution servers for the operation. These parallel execution servers remain with the operation throughout execution. After the statement has been processed, the parallel execution servers return to the pool.

Note:

Parallel processes can be released proactively before individual statements using parallelism are finished. For example, an uncommitted parallel DML operation or a partially fetched parallel SELECT statement with two parallel server sets (producer and consumer) releases one of the parallel server sets to the server pool as soon as that set has finished working, freeing half of the parallel processes for use by other statements. You can use the V$PQ_SESSTAT view to monitor the number of servers released proactively in this way.

If the number of parallel operations increases, Oracle Database creates additional parallel execution servers to handle incoming requests. However, Oracle Database never creates more parallel execution servers for an instance than the value specified by the initialization parameter PARALLEL_MAX_SERVERS.

If the number of parallel operations decreases, Oracle Database terminates any parallel execution servers that have been idle for a threshold interval. Oracle Database does not reduce the size of the pool less than the value of PARALLEL_MIN_SERVERS, no matter how long the parallel execution servers have been idle.
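
A sketch of the pool parameters and a view that exposes current pool usage follows. The parameter values are illustrative only, not recommendations:

```sql
-- Pool bounds (illustrative values; size to your workload)
ALTER SYSTEM SET parallel_min_servers = 8;
ALTER SYSTEM SET parallel_max_servers = 160;

-- System-wide PX server pool statistics, including servers
-- in use, available, started, and shut down
SELECT * FROM v$px_process_sysstat;
```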

See Also:

V$PQ_SESSTAT in the Oracle Database Reference.

8.1.5.1 Processing without Enough Parallel Execution Servers

Oracle Database can process a parallel operation with fewer than the requested number of processes.

If all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started, the parallel execution coordinator switches to serial processing.

8.1.6 Balancing the Workload to Optimize Performance

To optimize performance, all parallel execution servers should have equal workloads.

For SQL statements run in parallel by block range or by parallel execution servers, the workload is dynamically divided among the parallel execution servers. This minimizes workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes.

For the relatively few SQL statements executed in parallel by partitions, if the workload is evenly distributed among the partitions, you can optimize performance by matching the number of parallel execution servers to the number of partitions or by choosing a DOP in which the number of partitions is a multiple of the number of processes. This applies to partition-wise joins and parallel DML on tables created before Oracle9i. Refer to Limitation on the Degree of Parallelism for more information.

For example, suppose a table has 16 partitions, and a parallel operation divides the work evenly among them. You can use 16 parallel execution servers (DOP equals 16) to do the work in approximately one-sixteenth the time that one process would take. You might also use eight processes to do the work in one-eighth the time, or two processes to do the work in one-half the time.

If, however, you use 15 processes to work on 16 partitions, the first process to finish its work on one partition then begins work on the 16th partition; and as the other processes finish their work, they become idle. This configuration does not provide good performance when the work is evenly divided among partitions. When the work is unevenly divided, the performance varies depending on whether the partition that is left for last has more or less work than the other partitions.

Similarly, suppose you use six processes to work on 16 partitions and the work is evenly divided. In this case, each process works on a second partition after finishing its first partition, but only four of the processes work on a third partition while the other two remain idle.

In general, you cannot assume that the time taken to perform a parallel operation on a given number of partitions (N) with a given number of parallel execution servers (P) equals N divided by P. This formula does not consider the possibility that some processes might have to wait while others finish working on the last partitions. By choosing an appropriate DOP, however, you can minimize the workload skew and optimize performance.
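
The timing argument above reduces to counting rounds of work: with N equal partitions and P PX servers, the elapsed time is roughly CEIL(N/P) rounds, not N/P. For the 16-partition examples, this can be verified directly:

```sql
-- Rounds of work for 16 equal partitions with 16, 15, and 6 PX servers
SELECT CEIL(16/16) AS servers_16,  -- 1 round
       CEIL(16/15) AS servers_15,  -- 2 rounds: one server processes a second partition
       CEIL(16/6)  AS servers_6    -- 3 rounds: four servers process a third partition
  FROM dual;
```

With 15 servers, 16 partitions take as long as 2 full rounds even though 15 is nearly 16, which is why a DOP that evenly divides the partition count balances best.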

8.1.7 Multiple Parallelizers

Each parallel execution (PX) coordinator in an execution plan is called a parallelizer.

The number of PX servers used by a SQL statement is determined by the statement degree of parallelism (DOP) and the number of parallelizers. Because the number of PX server sets for a parallelizer is limited to two, the number of PX servers for most statements is DOP*2. Some statements can have more than one parallelizer. Because each parallelizer can use two PX server sets, the number of PX servers for these statements can be more than DOP*2. You can identify these statements by looking at the EXPLAIN PLAN. If the plan has multiple PX coordinators it means the statement has multiple parallelizers.

A few example cases where SQL statements use multiple parallelizers are subquery factoring, grouping sets, star queries, in-memory aggregation, and noncorrelated subqueries.
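As an illustrative sketch of the subquery factoring case, forcing materialization of a factored subquery is one construct that can produce a plan with more than one PX coordinator. The table and column names below are from the sample SH schema and the query itself is hypothetical:

```sql
-- Hypothetical example: a materialized WITH clause (subquery
-- factoring) can produce a plan with more than one PX COORDINATOR.
EXPLAIN PLAN FOR
WITH totals AS (
  SELECT /*+ MATERIALIZE */ prod_id, SUM(amount_sold) AS total
  FROM   sales
  GROUP  BY prod_id
)
SELECT * FROM totals WHERE total > 10000;

-- Count the PX COORDINATOR rows in the displayed plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer actually materializes the subquery, and therefore whether multiple parallelizers appear, depends on the statistics and release in use.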

Multiple parallelizers in a SQL statement can be active concurrently or one after the other depending on the execution plan.

A statement with a single parallelizer allocates the required number of PX servers at the start of execution and holds these allocated PX servers without releasing them until the statement completes. This ensures that the number of PX servers throughout the execution is constant. Statements with multiple parallelizers are different as they allocate PX servers when each parallelizer starts. Because parallelizers can start at different times during the execution, each parallelizer may be running with a different number of PX servers based on the number of available processes in the system.

If multiple parallelizers are executed concurrently, the statement can use more PX servers than DOP*2.

The view V$PQ_SESSTAT shows the number of parallelizers in the STATISTIC column. The data flow operation statistic, DFO Trees, shows the number of parallelizers. The Server Threads statistic shows the maximum number of PX servers used concurrently for a SQL statement.
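For example, after running a parallel statement in the current session, you might check these statistics with a query such as the following (the values returned depend on the statement just executed):

```sql
-- DFO Trees = number of parallelizers in the last parallel query;
-- Server Threads = maximum number of PX servers used concurrently.
SELECT statistic, last_query, session_total
FROM   v$pq_sesstat
WHERE  statistic IN ('DFO Trees', 'Server Threads');
```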

See Also:

Oracle Database Reference for information about V$PQ_SESSTAT and other dynamic views

8.1.8 Parallel Execution on Oracle RAC

By default in an Oracle RAC environment, a SQL statement executed in parallel can run across all the nodes in the cluster.

For this cross-node or inter-node parallel execution to perform well, the interconnect in the Oracle RAC environment must be sized appropriately because inter-node parallel execution may result in heavy interconnect traffic. Inter-node parallel execution does not scale with an undersized interconnect.

Limiting the Number of Available Instances

In an Oracle RAC environment, you can use services to limit the number of instances that participate in the execution of a parallel SQL statement. The default service includes all available instances. You can create any number of services, each consisting of one or more instances. When a user connects to the database using a service, only PX servers on the instances that are members of the service can participate in the execution of a parallel statement.

To limit parallel execution to a single node, you can set the PARALLEL_FORCE_LOCAL initialization parameter to TRUE. In this case, only PX servers on the instance that a session connects to are used to execute parallel statements from that session. Note that when this parameter is set to TRUE, all parallel statements running on that instance are executed locally, whether the session connects to the instance directly or connects using a service.
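For example, assuming you have the required privileges, the parameter can also be set for an individual session:

```sql
-- Restrict parallel statements issued by this session to PX servers
-- on the local instance only.
ALTER SESSION SET parallel_force_local = TRUE;
```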

Parallel Execution on Flex Clusters

Parallel statements executed on flex clusters can use both hub and leaf nodes. As user sessions are only allowed to connect to the hub nodes, the coordinator process (Query Coordinator or PX Coordinator) resides on hub nodes and can use PX server processes from any node in the cluster. For parallel queries, any PX server on any node can participate in the execution of the statement. For parallel DML operations, only PX servers on hub nodes can participate in the execution of the DML part of the statement, as only hub nodes are allowed to perform DML operations.

When there is data distribution from the leaf nodes to the hub nodes for DML operations, the execution plan indicates this distribution. In the following example, data is distributed to hub nodes in line Id 5, indicating the load operation in line Id 3 is executed only on hub nodes.

--------------------------------------------------------
| Id  | Operation                          | Name      |
--------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |           |
|   1 |  PX COORDINATOR                    |           |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| SALESTEMP |
|   4 |     PX RECEIVE                     |           |
|   5 |      PX SEND ROUND-ROBIN (HUB)     | :TQ10000  |
|   6 |       PX BLOCK ITERATOR            |           |
|   7 |        TABLE ACCESS FULL           | SALES     |
--------------------------------------------------------


8.2 Setting the Degree of Parallelism

The degree of parallelism (DOP) is the number of parallel execution servers associated with a single operation.

Parallel execution is designed to use multiple CPUs effectively. The Oracle Database parallel execution framework enables you either to explicitly choose a specific degree of parallelism or to rely on Oracle Database to control it automatically.

This section contains the following topics:

8.2.1 Manually Specifying the Degree of Parallelism

A specific degree of parallelism (DOP) can be requested from Oracle Database for both tables and indexes.

For example, you can set a fixed DOP at a table level with the following:

ALTER TABLE sales PARALLEL 8;
ALTER TABLE customers PARALLEL 4;

In this example, queries accessing just the sales table request a DOP of 8 and queries accessing the customers table request a DOP of 4. A query accessing both the sales and the customers tables is processed with a DOP of 8 and potentially allocates 16 parallel execution servers (because of the producer/consumer model). Whenever different DOPs are specified, Oracle Database uses the higher DOP.

You can also request a specific DOP by using statement level or object level parallel hints.
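For example, using the sales and customers tables from the statements above, the hints might look like the following (the alias and DOP values are illustrative):

```sql
-- Statement-level hint: request a DOP of 8 for the whole statement.
SELECT /*+ PARALLEL(8) */ COUNT(*) FROM sales;

-- Object-level hint: request a DOP of 4 for the customers table only.
SELECT /*+ PARALLEL(c, 4) */ COUNT(*) FROM customers c;
```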

The DOP specified in the PARALLEL clause of a table or an index takes effect only when PARALLEL_DEGREE_POLICY is set to MANUAL or LIMITED.

The actual runtime DOP of a statement can be limited by Oracle Database Resource Manager.


8.2.2 Default Degree of Parallelism

If the PARALLEL clause is specified but no degree of parallelism (DOP) is listed, then the object gets the default DOP.

For example, you can set a table to the default DOP with the following SQL statement.

ALTER TABLE sales PARALLEL;

Default parallelism uses a formula to determine the DOP based on the system configuration, as in the following:

  • For a single instance, DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT

  • For an Oracle RAC configuration, DOP = PARALLEL_THREADS_PER_CPU x sum(CPU_COUNT)

By default, sum(CPU_COUNT) is the total number of CPUs in the cluster. However, if you have used Oracle RAC services to limit the number of nodes across which a parallel operation can execute, then sum(CPU_COUNT) is the total number of CPUs across the nodes belonging to that service. For example, on a 4-node Oracle RAC cluster, with each node having 8 CPU cores and no Oracle RAC services, the default DOP would be 2 x (8+8+8+8) = 64.

You can also request the default DOP by using statement level or object level parallel hints.
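For example, the following statement-level hint requests the default DOP regardless of the table's PARALLEL setting:

```sql
SELECT /*+ PARALLEL(DEFAULT) */ COUNT(*) FROM sales;
```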

The default DOP specified in the PARALLEL clause of a table or an index takes effect only when PARALLEL_DEGREE_POLICY is set to MANUAL.

The default DOP algorithm is designed to use maximum resources and assumes that the operation finishes faster if it can use more resources. Default DOP targets a single-user workload and is not recommended in a multiuser environment.

The actual runtime DOP of a SQL statement can be limited by Oracle Database Resource Manager.


8.2.3 Automatic Degree of Parallelism

Automatic Degree of Parallelism (Auto DOP) enables Oracle Database to automatically decide if a statement should execute in parallel and what DOP it should use.

The following is a summary of parallel statement processing when Auto DOP is enabled.

  1. A SQL statement is issued.

  2. The statement is parsed and the optimizer determines the execution plan.

  3. The threshold limit specified by the PARALLEL_MIN_TIME_THRESHOLD initialization parameter is checked.

    1. If the expected execution time is less than the threshold limit, the SQL statement is run serially.

    2. If the expected execution time is greater than the threshold limit, the statement is run in parallel based on the DOP that the optimizer calculates, including factoring for any defined resource limitations.

8.2.4 Determining Degree of Parallelism in Auto DOP

With automatic degree of parallelism (DOP), the optimizer automatically determines the DOP for a statement based on the resource requirements of that statement.

The optimizer uses the cost of all scan operations, such as a full table scan or index fast full scan, and the cost of all CPU operations in the execution plan to determine the necessary DOP.

However, the optimizer limits the actual maximum DOP to ensure that parallel execution servers do not overwhelm the system. This limit is set by the parameter PARALLEL_DEGREE_LIMIT. The default value for this parameter is CPU, which means the DOP is limited by the number of CPUs on the system (PARALLEL_THREADS_PER_CPU * sum(CPU_COUNT)), also known as the default DOP. This default ensures that a single user operation cannot overwhelm the system. By adjusting this parameter setting, you can control the maximum DOP the optimizer can choose for a SQL statement. The optimizer can further limit the maximum DOP if Oracle Database Resource Manager is used to limit the DOP.

Note:

The value AUTO for PARALLEL_DEGREE_LIMIT has the same functionality as the value CPU.

To calculate the cost of operations for a SQL statement, Auto DOP uses information about the hardware characteristics of the system. The hardware characteristics include I/O calibration statistics, so these statistics should be gathered.

If I/O calibration is not run to gather the required statistics, a default calibration value is used to calculate the cost of operations and the DOP.

I/O calibration statistics can be gathered with the PL/SQL DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure. I/O calibration is a one-time action if the physical hardware does not change.
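The following anonymous block is a sketch of such a calibration run. The num_physical_disks and max_latency values are placeholders that should reflect your storage, and the procedure must be run by a user with the appropriate privileges on an open database:

```sql
-- One-time I/O calibration; values here are placeholders.
SET SERVEROUTPUT ON
DECLARE
  l_max_iops        PLS_INTEGER;
  l_max_mbps        PLS_INTEGER;
  l_actual_latency  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 8,    -- placeholder: disks backing the database
    max_latency        => 20,   -- placeholder: tolerable latency in ms
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops=' || l_max_iops ||
                       ' max_mbps=' || l_max_mbps ||
                       ' actual_latency=' || l_actual_latency);
END;
/
```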

The DOP determined by the optimizer is shown in the Note section of the explain plan output, as in the following example, which you can display either by using the EXPLAIN PLAN statement or by querying V$SQL_PLAN.

EXPLAIN PLAN FOR
SELECT SUM(AMOUNT_SOLD) FROM SH.SALES;

PLAN TABLE OUTPUT
Plan hash value: 1763145153

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name     | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop|   TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |          |    1 |     4 |     2   (0)| 00:00:01 |       |      |       |      |            |
|  1 |  SORT AGGREGATE        |          |    1 |     4 |            |          |       |      |       |      |            |
|  2 |   PX COORDINATOR       |          |      |       |            |          |       |      |       |      |            |
|  3 |    PX SEND QC (RANDOM) | :TQ10000 |    1 |     4 |            |          |       |      | Q1,00 | P->S | QC (RAND)  |
|  4 |     SORT AGGREGATE     |          |    1 |     4 |            |          |       |      | Q1,00 | PCWP |            |
|  5 |      PX BLOCK ITERATOR |          |  960 |  3840 |     2   (0)| 00:00:01 |     1 |   16 | Q1,00 | PCWC |            |
|  6 |       TABLE ACCESS FULL| SALES    |  960 |  3840 |     2   (0)| 00:00:01 |     1 |   16 | Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 4

See Also:

Oracle Database PL/SQL Packages and Types Reference for information about the DBMS_RESOURCE_MANAGER package

8.2.5 Controlling Automatic Degree of Parallelism

There are several initialization parameters that control automatic degree of parallelism (auto DOP). These initialization parameters include PARALLEL_DEGREE_POLICY, PARALLEL_DEGREE_LIMIT, PARALLEL_MIN_TIME_THRESHOLD, and PARALLEL_MIN_DEGREE.

The initialization parameter PARALLEL_DEGREE_POLICY controls whether Auto DOP, parallel statement queuing, and in-memory parallel execution are enabled. This parameter has the following possible values:

  • MANUAL

    This setting disables Auto DOP, parallel statement queuing, and in-memory parallel execution. This is the default setting, and it reverts the behavior of parallel execution to what it was before Oracle Database 11g Release 2 (11.2).

    With the default setting of MANUAL for PARALLEL_DEGREE_POLICY, the system only uses parallel execution when a DOP has been explicitly set on an object or if a parallel hint is specified in the SQL statement. The DOP used is exactly what was specified. No parallel statement queuing and in-memory parallel execution occurs.

  • LIMITED

    This setting enables Auto DOP for some statements but parallel statement queuing and in-memory parallel execution are disabled. Automatic DOP is applied only to statements that access tables or indexes declared explicitly with the PARALLEL clause without an explicit DOP specified. Tables and indexes that have a DOP specified use that explicit DOP setting.

    If you want Oracle Database to automatically decide the DOP only for a subset of SQL statements that touch a specific subset of objects, then set PARALLEL_DEGREE_POLICY to LIMITED and set the parallel property without specifying an explicit DOP on that subset of objects.

  • AUTO

    This setting enables Auto DOP for all statements, and also enables parallel statement queuing and in-memory parallel execution.

    If you want Oracle Database to automatically decide the DOP for all SQL statements, then set PARALLEL_DEGREE_POLICY to AUTO.

  • ADAPTIVE

    This setting enables Auto DOP, parallel statement queuing, and in-memory parallel execution, similar to the AUTO value. In addition, performance feedback is enabled.

The PARALLEL_DEGREE_LIMIT initialization parameter specifies the maximum DOP that Auto DOP can use systemwide. For a more fine grained control of the maximum DOP, you can use Oracle Database Resource Manager.

The PARALLEL_MIN_TIME_THRESHOLD initialization parameter specifies the minimum estimated execution time for a statement to be considered for Auto DOP. First the optimizer calculates a serial execution plan for the SQL statement. If the estimated execution time is greater than the value of PARALLEL_MIN_TIME_THRESHOLD, the statement becomes a candidate for Auto DOP.

The PARALLEL_MIN_DEGREE initialization parameter controls the minimum degree of parallelism computed by automatic degree of parallelism. However, PARALLEL_MIN_DEGREE has no impact if the value of PARALLEL_MIN_DEGREE is greater than the value of CPU_COUNT or if an object is Oracle-owned, such as a dictionary table or a view created on a dictionary table.

You can also request Auto DOP by specifying the appropriate statement level or object level SQL hints.
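For example, a minimal configuration to enable Auto DOP system-wide might look like the following sketch (the threshold value is illustrative and is specified in seconds):

```sql
-- Enable Auto DOP, parallel statement queuing, and in-memory
-- parallel execution; consider statements estimated to run longer
-- than 10 seconds as candidates for parallel execution.
ALTER SYSTEM SET parallel_degree_policy = AUTO;
ALTER SYSTEM SET parallel_min_time_threshold = '10';

-- Alternatively, request Auto DOP for one statement with a hint.
SELECT /*+ PARALLEL(AUTO) */ COUNT(*) FROM sales;
```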


8.2.6 Adaptive Parallelism

The adaptive multiuser algorithm reduces the degree of parallelism as the load on the system increases.

When using Oracle Database adaptive parallelism capabilities, the database uses an algorithm at SQL execution time to determine whether a parallel operation should receive the requested DOP or have its DOP lowered to ensure the system is not overloaded.

In a system that makes aggressive use of parallel execution by using a high DOP, the adaptive algorithm adjusts the DOP down when only a few operations are running in parallel. While the algorithm still ensures optimal resource utilization, users may experience inconsistent response times. Using solely the adaptive parallelism capabilities in an environment that requires deterministic response times is not advised. Adaptive parallelism is controlled through the database initialization parameter PARALLEL_ADAPTIVE_MULTI_USER.

Note:

Because the initialization parameter PARALLEL_ADAPTIVE_MULTI_USER is deprecated in Oracle Database 12c Release 2 (12.2.0.1) and will be desupported in a future release, Oracle recommends using parallel statement queuing instead.

8.3 In-Memory Parallel Execution

In-memory features provide techniques for parallel execution.

This section discusses in-memory parallel execution.

8.3.1 Buffer Cache Usage in Parallel Execution

By default, parallel execution does not use the SGA (buffer cache) to cache the scanned blocks unless the object is very small or is declared as CACHE.

In-memory parallel execution, enabled by setting the parameter PARALLEL_DEGREE_POLICY to AUTO, enables parallel statements to leverage the SGA to cache object blocks. Oracle Database decides if an object that is accessed using parallel execution would benefit from being cached in the SGA. The decision to cache an object is based on a well-defined set of heuristics, including the size of the object and the frequency with which it is accessed. In an Oracle Real Application Clusters (Oracle RAC) environment, Oracle Database maps pieces of the object into each of the buffer caches on the active instances. By creating this mapping, Oracle Database automatically knows which buffer cache to access to find different parts or pieces of the object. Using this information, Oracle Database prevents multiple instances from reading the same information from disk over and over again, thus maximizing the amount of memory that can cache objects. It does this by using PX servers on the instances where the blocks are cached.

If the size of the object is larger than a specific threshold value based on the total size of the buffer cache (single instance) or the size of the buffer cache multiplied by the number of active instances in an Oracle RAC cluster, then the object is read using direct-path reads and not cached in the SGA.

8.3.2 Automatic Big Table Caching

Automatic big table caching integrates queries with the buffer cache to enhance the in-memory query capabilities of Oracle Database, in both single instance and Oracle RAC environments.

In Oracle Real Application Clusters (Oracle RAC) environments, this feature is supported only with parallel queries. In single instance environments, this feature is supported with both parallel and serial queries.

The cache section reserved for the big table cache is used for caching data for table scans. While the big table cache is primarily designed to enhance performance for data warehouse workloads, it also improves performance in Oracle Database running mixed workloads.

Automatic big table caching uses temperature and object based algorithms to track medium and big tables. Oracle does cache very small tables, but automatic big table caching does not track these tables.

To enable automatic big table caching for serial queries, you must set a value (percentage) for the DB_BIG_TABLE_CACHE_PERCENT_TARGET initialization parameter. Additionally, you must set the PARALLEL_DEGREE_POLICY initialization parameter to AUTO or ADAPTIVE to enable parallel queries to use automatic big table caching. In Oracle RAC environments, automatic big table caching is only supported in parallel queries so both settings are required.
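For example, a sketch of enabling automatic big table caching for both serial and parallel queries might look like the following (the 40 percent target is illustrative):

```sql
-- Reserve 40% of the buffer cache for big table scans.
ALTER SYSTEM SET db_big_table_cache_percent_target = '40';

-- Required for parallel queries (and for Oracle RAC environments).
ALTER SYSTEM SET parallel_degree_policy = AUTO;
```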

If a large table is approximately the size of the combined size of the big table cache of all instances, then the table is partitioned and cached, or mostly cached, on all instances. An in-memory query could eliminate most disk reads for queries on the table, or the database could intelligently read from disk only for that portion of the table that does not fit in the big table cache. If the big table cache cannot cache all the tables to be scanned, only the most frequently accessed tables are cached, and the rest are read through direct reads automatically.

The DB_BIG_TABLE_CACHE_PERCENT_TARGET parameter determines the percentage of the buffer cache size used for scans. If DB_BIG_TABLE_CACHE_PERCENT_TARGET is set to 80 (%), then 80 (%) of the buffer cache is used for scans and the remaining 20 (%) is used for OLTP workloads.

The DB_BIG_TABLE_CACHE_PERCENT_TARGET parameter is only enabled in an Oracle RAC environment if PARALLEL_DEGREE_POLICY is set to AUTO or ADAPTIVE. The default for DB_BIG_TABLE_CACHE_PERCENT_TARGET is 0 (disabled) and the upper limit is 90 (%), reserving at least 10% of the buffer cache for usage besides table scans. When the value is 0, in-memory queries run with the existing least recently used (LRU) mechanism. You can adjust the DB_BIG_TABLE_CACHE_PERCENT_TARGET parameter dynamically.

Use the following guidelines when setting the DB_BIG_TABLE_CACHE_PERCENT_TARGET parameter:

  • If you do not enable automatic degree of parallelism (DOP) in an Oracle RAC environment, then you should not set this parameter because the big table cache section is not used in that situation.

  • When setting this parameter, you should consider the workload mix: how much of the workload is OLTP (inserts, updates, and random access), and how much of the workload involves table scans. Because data warehouse workloads often perform large table scans, you may consider giving the big table cache section a higher percentage of buffer cache space for data warehouses.

  • This parameter can be changed dynamically if the workload changes. The change could take some time to reach the target, depending on the current workload, because buffer cache memory might be actively in use at the time.

When PARALLEL_DEGREE_POLICY is set to AUTO or ADAPTIVE, additional object-level statistics for a data warehouse load and scan buffers are added to represent the number of parallel query (PQ) scans on the object on the particular (helper) instance.

The V$BT_SCAN_CACHE and V$BT_SCAN_OBJ_TEMPS views provide information about the big table cache.
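For example, you can inspect the big table cache with queries such as the following (a sketch; the exact columns available depend on your release):

```sql
-- Overall big table cache configuration and usage.
SELECT * FROM v$bt_scan_cache;

-- Per-object temperature and caching status.
SELECT * FROM v$bt_scan_obj_temps;
```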


8.4 Parallel Statement Queuing

In some situations, parallel statements are queued while waiting for resources.

When the parameter PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database queues SQL statements that require parallel execution if the necessary number of parallel execution server processes are not available. After the necessary resources become available, the SQL statement is dequeued and allowed to execute. The default dequeue order is a simple first in, first out queue based on the time a statement was issued.

The following is a summary of parallel statement processing.

  1. A SQL statement is issued.

  2. The statement is parsed and the DOP is automatically determined.

  3. Available parallel resources are checked.

    1. If there are sufficient parallel execution servers available and there are no statements ahead in the queue waiting for the resources, the SQL statement is executed.

    2. If there are not sufficient parallel execution servers available, the SQL statement is queued based on specified conditions and dequeued from the front of the queue when specified conditions are met.

Parallel statements are queued if running the statements would increase the number of active parallel servers above the value of the PARALLEL_SERVERS_TARGET initialization parameter. For example, if PARALLEL_SERVERS_TARGET is set to 64, the number of current active servers is 60, and a new parallel statement needs 16 parallel servers, it would be queued because 16 added to 60 is greater than 64, the value of PARALLEL_SERVERS_TARGET.

This value is not the maximum number of parallel server processes allowed on the system, but the number available to run parallel statements before parallel statement queuing is used. It is set lower than the maximum number of parallel server processes allowed on the system (PARALLEL_MAX_SERVERS) to ensure each parallel statement gets all of the parallel server resources required and to prevent overloading the system with parallel server processes. Note that all serial (nonparallel) statements execute immediately even if parallel statement queuing has been activated.

If a statement has been queued, it is identified by the resmgr:pq queued wait event.
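For example, you can list the sessions currently held in the parallel statement queue with a query such as the following:

```sql
-- Sessions waiting in the parallel statement queue.
SELECT sid, sql_id, event
FROM   v$session
WHERE  event = 'resmgr:pq queued';
```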

This section discusses the following topics:


8.4.1 About Managing Parallel Statement Queuing with Oracle Database Resource Manager

By default, the parallel statement queue operates as a first-in, first-out queue, but you can modify the default behavior with a resource plan.

Note:

A multitenant container database is the only supported architecture in Oracle Database 21c. While the documentation is being revised, legacy terminology may persist. In most cases, "database" and "non-CDB" refer to a CDB or PDB, depending on context. In some contexts, such as upgrades, "non-CDB" refers to a non-CDB from a previous release.

By configuring and setting a resource plan, you can control the order in which parallel statements are dequeued and the number of parallel execution servers used by each workload or consumer group.

Oracle Database Resource Manager operates based on the concept of consumer groups and resource plans. Consumer groups identify groups of users with identical resource privileges and requirements. A resource plan consists of a collection of directives for each consumer group which specify controls and allocations for various database resources, such as parallel servers. For multitenant container databases (CDBs) and pluggable databases (PDBs), the order of the parallel statement queue is managed by the directive called shares.

A resource plan is enabled by setting the RESOURCE_MANAGER_PLAN parameter to the name of the resource plan.

You can use the directives described in the following sections to manage the processing of parallel statements for consumer groups when the parallel degree policy is set to AUTO.

In all cases, the parallel statement queue of a given consumer group is managed as a single queue on an Oracle RAC database. Limits for each consumer group apply to all sessions across the Oracle RAC database that belong to that consumer group. The queuing of parallel statements occurs based on the sum of the values of the PARALLEL_SERVERS_TARGET initialization parameter across all database instances.

You can also manage parallel statement queuing for multitenant container databases (CDBs) and pluggable databases (PDBs).


8.4.1.1 About Managing the Order of the Parallel Statement Queue

You can use Oracle Database Resource Manager to manage the priority for dequeuing parallel statements from the parallel statement queue across multiple consumer groups.

The parallel statements for a particular consumer group are dequeued in FIFO order by default. With the shares directive, you can determine the order in which the parallel statements of a consumer group are dequeued. You configure this directive with the CREATE_PLAN_DIRECTIVE or UPDATE_PLAN_DIRECTIVE procedure of the DBMS_RESOURCE_MANAGER PL/SQL package. You can also set shares in a CDB resource plan to manage the order of parallel statements among PDBs.

For example, you can create the PQ_HIGH, PQ_MEDIUM, and PQ_LOW consumer groups and map parallel statement sessions to these consumer groups based on priority. You then create a resource plan that sets shares=14 for PQ_HIGH, shares=5 for PQ_MEDIUM, and shares=1 for PQ_LOW. This indicates that PQ_HIGH statements are dequeued 70% of the time (14/(14+5+1) = 0.70), PQ_MEDIUM statements 25% of the time (5/(14+5+1) = 0.25), and PQ_LOW statements 5% of the time (1/(14+5+1) = 0.05).

If a parallel statement has been queued and you determine that the parallel statement must be run immediately, then you can run the DBMS_RESOURCE_MANAGER.DEQUEUE_PARALLEL_STATEMENT PL/SQL procedure to dequeue the parallel statement.
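A plan along the lines of the PQ_HIGH/PQ_MEDIUM/PQ_LOW example above could be sketched as follows. The plan name is illustrative, the three consumer groups are assumed to already exist, and a directive for OTHER_GROUPS is included because every resource plan requires one:

```sql
-- Sketch: dequeue priorities via shares (names are illustrative).
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan => 'PQ_PRIORITY_PLAN', comment => 'PQ dequeue priorities');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'PQ_PRIORITY_PLAN', group_or_subplan => 'PQ_HIGH',
    comment => 'high priority', shares => 14);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'PQ_PRIORITY_PLAN', group_or_subplan => 'PQ_MEDIUM',
    comment => 'medium priority', shares => 5);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'PQ_PRIORITY_PLAN', group_or_subplan => 'PQ_LOW',
    comment => 'low priority', shares => 1);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'PQ_PRIORITY_PLAN', group_or_subplan => 'OTHER_GROUPS',
    comment => 'all other sessions', shares => 1);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```

Enabling the plan (ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'PQ_PRIORITY_PLAN') and mapping sessions to the consumer groups are separate steps.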


8.4.1.2 About Limiting the Parallel Server Resources for a Consumer Group

You can use Oracle Database Resource Manager to limit the number of parallel servers that parallel statements from lower priority consumer groups can use for parallel statement processing.

Using Oracle Database Resource Manager you can map parallel statement sessions to different consumer groups that each have specific limits on the number of the parallel servers that can be used. Every consumer group has its own individual parallel statement queue. When these limits for consumer groups are specified, parallel statements from a consumer group are queued when its limit would be exceeded.

This limitation becomes useful when a database has high priority and low priority consumer groups. Without limits, a user may issue a large number of parallel statements from a low-priority consumer group that uses all parallel servers. When a parallel statement from a high priority consumer group is issued, the resource allocation directives can ensure that the high priority parallel statement is dequeued first. By limiting the number of parallel servers a low-priority consumer group can use, you can ensure that there are always some parallel servers available for a high priority consumer group.

To limit the parallel servers used by a consumer group, use the parallel_server_limit parameter with the CREATE_PLAN_DIRECTIVE procedure or the new_parallel_server_limit parameter with the UPDATE_PLAN_DIRECTIVE procedure in the DBMS_RESOURCE_MANAGER package. The parallel_server_limit parameter specifies the maximum percentage of the Oracle RAC-wide parallel server pool that is specified by PARALLEL_SERVERS_TARGET that a consumer group can use.

For multitenant container database (CDB) resource plans, the parallel server limit applies to pluggable databases (PDBs). For PDB resource plans or non-CDB resource plans, this limit applies to consumer groups.

For example, on an Oracle RAC database in nonmultitenant configuration, the initialization parameter PARALLEL_SERVERS_TARGET is set to 32 on two nodes so there are a total of 32 x 2 = 64 parallel servers that can be used before queuing begins. You can set up the consumer group PQ_LOW to use 50% of the available parallel servers (parallel_server_limit = 50) and low priority statements can then be mapped to the PQ_LOW consumer group. This scenario limits any parallel statements from the PQ_LOW consumer group to 64 x 50% = 32 parallel servers, even though there are more inactive or unused parallel servers. In this scenario, after the statements from the PQ_LOW consumer group have used 32 parallel servers, statements from that consumer group are queued.
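Continuing that example, the 50% limit for PQ_LOW could be applied with a sketch such as the following (the plan and group names are illustrative and assumed to exist):

```sql
-- Sketch: cap PQ_LOW at 50% of the parallel server pool defined
-- by PARALLEL_SERVERS_TARGET across the Oracle RAC database.
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.UPDATE_PLAN_DIRECTIVE(
    plan                      => 'PQ_PRIORITY_PLAN',
    group_or_subplan          => 'PQ_LOW',
    new_parallel_server_limit => 50);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```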

It is possible in one database to have some sessions with the parallelism degree policy set to MANUAL and some sessions set to AUTO. In this scenario, only the sessions with parallelism degree policy set to AUTO can be queued. However, the parallel servers used in sessions where the parallelism degree policy is set to MANUAL are included in the total of all parallel servers used by a consumer group.

See Also:

PARALLEL_SERVERS_TARGET for information about limiting parallel resources for users

8.4.1.3 Specifying a Parallel Statement Queue Timeout for Each Consumer Group

You can use Oracle Database Resource Manager to set specific maximum queue timeouts for consumer groups so that parallel statements do not stay in the queue for long periods of time.

To manage the queue timeout, the parallel_queue_timeout parameter is used with the CREATE_PLAN_DIRECTIVE procedure or the new_parallel_queue_timeout parameter is used with the UPDATE_PLAN_DIRECTIVE procedure in the DBMS_RESOURCE_MANAGER package. The parallel_queue_timeout and new_parallel_queue_timeout parameters specify the time in seconds that a statement can remain in a consumer group parallel statement queue. After the timeout period expires, the statement is either terminated with error ORA-7454 or removed from the parallel statement queue and enabled to run based on the value for the PQ_TIMEOUT_ACTION directive in the resource manager plan.

You can specify queue timeout actions for parallel statements using the PQ_TIMEOUT_ACTION resource manager directive. Setting this directive to CANCEL terminates the statement with the error ORA-7454. Setting this directive to RUN enables the statement to run.
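For example, the following sketch sets a 10-minute queue timeout for a hypothetical PQ_LOW group in a hypothetical plan, and lets timed-out statements run rather than fail with ORA-7454:

```sql
-- Sketch: queue timeout of 600 seconds with a RUN timeout action
-- (plan and group names are illustrative and assumed to exist).
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.UPDATE_PLAN_DIRECTIVE(
    plan                       => 'PQ_PRIORITY_PLAN',
    group_or_subplan           => 'PQ_LOW',
    new_parallel_queue_timeout => 600,
    new_pq_timeout_action      => 'RUN');
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```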

8.4.1.4 Specifying a Degree of Parallelism Limit for Consumer Groups

You can use Oracle Database Resource Manager to limit the degree of parallelism for specific consumer groups.

Using Oracle Database Resource Manager you can map parallel statement sessions to different consumer groups that each have specific limits for the degree of parallelism in a resource plan.

To manage the limit of parallelism in consumer groups, use the parallel_degree_limit_p1 parameter with the CREATE_PLAN_DIRECTIVE procedure in the DBMS_RESOURCE_MANAGER package or the new_parallel_degree_limit_p1 parameter with the UPDATE_PLAN_DIRECTIVE procedure in the DBMS_RESOURCE_MANAGER package. The parallel_degree_limit_p1 and new_parallel_degree_limit_p1 parameters specify a limit on the degree of parallelism for any operation.

For example, you can create the PQ_HIGH, PQ_MEDIUM, and PQ_LOW consumer groups and map parallel statement sessions to these consumer groups based on priority. You then create a resource plan that specifies degree of parallelism limits so that the PQ_HIGH limit is set to 16, the PQ_MEDIUM limit is set to 8, and the PQ_LOW limit is set to 2.

The degree of parallelism limit is enforced, even if PARALLEL_DEGREE_POLICY is not set to AUTO.
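
The PQ_HIGH/PQ_MEDIUM/PQ_LOW example above might be set up as follows. This is a sketch: it assumes the three consumer groups already exist, and the plan name DOP_PLAN is illustrative:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    'DOP_PLAN', 'Plan with per-group degree of parallelism limits');

  /* DOP limits of 16, 8, and 2 by priority */
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'DOP_PLAN', 'PQ_HIGH', 'High priority',
    parallel_degree_limit_p1 => 16);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'DOP_PLAN', 'PQ_MEDIUM', 'Medium priority',
    parallel_degree_limit_p1 => 8);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'DOP_PLAN', 'PQ_LOW', 'Low priority',
    parallel_degree_limit_p1 => 2);

  /* A directive for OTHER_GROUPS is required in every plan */
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'DOP_PLAN', 'OTHER_GROUPS', 'Default group');

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```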

8.4.1.5 Critical Parallel Statement Prioritization

The setting of the PARALLEL_STMT_CRITICAL parameter affects the critical designation of parallel statements in the plan directive with respect to the parallel statement queue.

  • If the PARALLEL_STMT_CRITICAL parameter is set to QUEUE, then parallel statements that have PARALLEL_DEGREE_POLICY set to MANUAL are queued.

  • If the PARALLEL_STMT_CRITICAL parameter is set to BYPASS_QUEUE, then parallel statements bypass the parallel statement queue and execute immediately.

  • If PARALLEL_STMT_CRITICAL is set to FALSE, then the default behavior applies and no statement is designated as critical.

Because critical parallel statements bypass the parallel statement queue, the system may encounter more active parallel servers than specified by the PARALLEL_SERVERS_TARGET parameter. Critical parallel statements are allowed to run even after the number of parallel servers reaches PARALLEL_MAX_SERVERS, so some critical parallel statements may be downgraded.

The PARALLEL_STMT_CRITICAL column in the DBA_RSRC_PLAN_DIRECTIVES view indicates whether parallel statements are from a consumer group that has been marked critical.

8.4.1.6 A Sample Scenario for Managing Statements in the Parallel Queue

This scenario discusses how to manage statements in the parallel queue with consumer groups set up with Oracle Database Resource Manager.

For this scenario, consider a data warehouse workload that consists of three types of SQL statements:

  • Short-running SQL statements

    Short-running identifies statements running less than one minute. You expect these statements to have very good response times.

  • Medium-running SQL statements

    Medium-running identifies statements running more than one minute, but less than 15 minutes. You expect these statements to have reasonably good response times.

  • Long-running SQL statements

    Long-running identifies statements that are ad-hoc or complex queries running more than 15 minutes. You expect these statements to take a long time.

For this data warehouse workload, you want better response times for the short-running statements. To achieve this goal, you must ensure that:

  • Long-running statements do not use all of the parallel server resources, forcing shorter statements to wait in the parallel statement queue.

  • When both short-running and long-running statements are queued, short-running statements should be dequeued ahead of long-running statements.

  • The DOP for short-running queries is limited because the speedup from a very high DOP is not significant enough to justify the use of a large number of parallel servers.

Example 8-3 shows how to set up consumer groups using Oracle Database Resource Manager to set priorities for statements in the parallel statement queue. Note the following for this example:

  • By default, users are assigned to the OTHER_GROUPS consumer group. If the estimated execution time of a SQL statement is longer than 1 minute (60 seconds), then the user switches to MEDIUM_SQL_GROUP. Because switch_for_call is set to TRUE, the user returns to OTHER_GROUPS when the statement has completed. If the user is in MEDIUM_SQL_GROUP and the estimated execution time of the statement is longer than 15 minutes (900 seconds), the user switches to LONG_SQL_GROUP. Similarly, because switch_for_call is set to TRUE, the user returns to OTHER_GROUPS when the query has completed. The directives used to accomplish the switch process are switch_time, switch_estimate, switch_for_call, and switch_group.

  • After the number of active parallel servers reaches the value of the PARALLEL_SERVERS_TARGET initialization parameter, subsequent parallel statements are queued. The shares directives control the order in which parallel statements are dequeued when parallel servers become available. Because shares is set to 100% for SYS_GROUP in this example, parallel statements from SYS_GROUP are always dequeued first. If no parallel statements from SYS_GROUP are queued, then parallel statements from OTHER_GROUPS are dequeued with probability 70%, from MEDIUM_SQL_GROUP with probability 20%, and LONG_SQL_GROUP with probability 10%.

  • Parallel statements issued from OTHER_GROUPS are limited to a DOP of 4 with the setting of the parallel_degree_limit_p1 directive.

  • To prevent parallel statements of the LONG_SQL_GROUP group from using all of the parallel servers, which could potentially cause parallel statements from OTHER_GROUPS or MEDIUM_SQL_GROUP to wait for long periods of time, its parallel_server_limit directive is set to 50%. This setting means that after LONG_SQL_GROUP has used up 50% of the parallel servers set with the PARALLEL_SERVERS_TARGET initialization parameter, its parallel statements are forced to wait in the queue.

  • Because parallel statements of the LONG_SQL_GROUP group may be queued for a significant amount of time, a timeout is configured for 14400 seconds (4 hours). When a parallel statement from LONG_SQL_GROUP has waited in the queue for 4 hours, the statement is terminated with the error ORA-7454.

Example 8-3 Using consumer groups to set priorities in the parallel statement queue

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
 
  /* Create consumer groups.
   * By default, users start in OTHER_GROUPS, which is automatically
   * created for every database.
   */
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    'MEDIUM_SQL_GROUP',
    'Medium-running SQL statements, between 1 and 15 minutes.  Medium priority.');

  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    'LONG_SQL_GROUP',
    'Long-running SQL statements of over 15 minutes.  Low priority.');
 
  /* Create a plan to manage these consumer groups */
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    'REPORTS_PLAN',
    'Plan for daytime that prioritizes short-running queries');
 
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'REPORTS_PLAN', 'SYS_GROUP', 'Directive for sys activity',
    shares => 100);
 
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'REPORTS_PLAN', 'OTHER_GROUPS', 'Directive for short-running queries',
    shares => 70,
    parallel_degree_limit_p1 => 4,
    switch_time => 60, switch_estimate => TRUE, switch_for_call => TRUE,
    switch_group => 'MEDIUM_SQL_GROUP');
 
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'REPORTS_PLAN', 'MEDIUM_SQL_GROUP', 'Directive for medium-running queries',
    shares => 20,
    parallel_server_limit => 80,
    switch_time => 900, switch_estimate => TRUE, switch_for_call => TRUE,
    switch_group => 'LONG_SQL_GROUP');
 
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'REPORTS_PLAN', 'LONG_SQL_GROUP', 'Directive for long-running queries',
    shares => 10,
    parallel_server_limit => 50,
    parallel_queue_timeout => 14400);
 
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

/* Allow all users to run in these consumer groups */
EXEC DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP(
  'public', 'MEDIUM_SQL_GROUP', FALSE);
 
EXEC DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP(
  'public', 'LONG_SQL_GROUP', FALSE);

8.4.2 Grouping Parallel Statements with BEGIN_SQL_BLOCK END_SQL_BLOCK

Often it is important for a report or batch job that consists of multiple parallel statements to complete as quickly as possible.

For example, when many reports are launched simultaneously, you may want all of the reports to complete as quickly as possible. However, you also want some specific reports to complete first, rather than all reports finishing at the same time.

If a report contains multiple parallel statements and PARALLEL_DEGREE_POLICY is set to AUTO, then each parallel statement may be forced to wait in the queue on a busy database. For example, the following steps describe a scenario in SQL statement processing:

serial statement
parallel query - dop 8
  -> wait in queue
serial statement
parallel query - dop 32
  -> wait in queue
parallel query - dop 4
  -> wait in queue

For a report to be completed quickly, the parallel statements must be grouped to produce the following behavior:

start SQL block
serial statement
parallel query - dop 8
  -> first parallel query: ok to wait in queue
serial statement
parallel query - dop 32
  -> avoid or minimize wait
parallel query - dop 4
  -> avoid or minimize wait
end SQL block

To group the parallel statements, you can use the BEGIN_SQL_BLOCK and END_SQL_BLOCK procedures in the DBMS_RESOURCE_MANAGER PL/SQL package. For each consumer group, the parallel statement queue is ordered by the time associated with each of the consumer group's parallel statements. Typically, the time associated with a parallel statement is the time that the statement was enqueued, which means that the queue appears to be FIFO. When parallel statements are grouped in a SQL block with the BEGIN_SQL_BLOCK and END_SQL_BLOCK procedures, the first queued parallel statement also uses the time that it was enqueued. However, the second and all subsequent parallel statements receive special treatment and are enqueued using the enqueue time of the first queued parallel statement within the SQL block. With this functionality, the statements frequently move to the front of the parallel statement queue. This preferential treatment ensures that their wait time is minimized.
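
A minimal sketch of this grouping follows. The table name sales and the hinted DOPs are illustrative; BEGIN_SQL_BLOCK and END_SQL_BLOCK take no arguments:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.BEGIN_SQL_BLOCK;
END;
/

/* All parallel statements of the report run inside the block.
   After the first statement is queued, subsequent statements are
   enqueued with its (earlier) enqueue time and so tend to move to
   the front of the queue. */
SELECT /*+ PARALLEL(8) */ COUNT(*) FROM sales;
SELECT /*+ PARALLEL(32) */ SUM(amount_sold) FROM sales;

BEGIN
  DBMS_RESOURCE_MANAGER.END_SQL_BLOCK;
END;
/
```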

See Also:

Oracle Database PL/SQL Packages and Types Reference for information about the DBMS_RESOURCE_MANAGER package

8.4.3 About Managing Parallel Statement Queuing with Hints

You can use the NO_STATEMENT_QUEUING and STATEMENT_QUEUING hints in SQL statements to influence whether or not a statement is queued with parallel statement queuing.

  • NO_STATEMENT_QUEUING

    When PARALLEL_DEGREE_POLICY is set to AUTO, this hint enables a statement to bypass the parallel statement queue. However, a statement that bypasses the statement queue can potentially cause the system to exceed the maximum number of parallel execution servers defined by the value of the PARALLEL_SERVERS_TARGET initialization parameter, which determines the limit at which parallel statement queuing is initiated.

    There is no guarantee that the statement that bypasses the parallel statement queue receives the number of parallel execution servers requested because only the number of parallel execution servers available on the system, up to the value of the PARALLEL_MAX_SERVERS initialization parameter, can be allocated.

    For example:

    SELECT /*+ NO_STATEMENT_QUEUING */ last_name, department_name 
      FROM employees e, departments d 
      WHERE e.department_id = d.department_id; 
    
  • STATEMENT_QUEUING

    When PARALLEL_DEGREE_POLICY is not set to AUTO, this hint enables a statement to be considered for parallel statement queuing, but to run only when enough parallel processes are available to run at the requested DOP. The number of available parallel execution servers, before queuing is enabled, is equal to the difference between the number of parallel execution servers in use and the maximum number allowed in the system, which is defined by the PARALLEL_SERVERS_TARGET initialization parameter.

    For example:

    SELECT /*+ STATEMENT_QUEUING */ last_name, department_name 
      FROM employees e, departments d 
      WHERE e.department_id = d.department_id; 

8.5 Types of Parallelism

There are multiple types of parallelism.

This section discusses the types of parallelism in the following topics:

8.5.1 About Parallel Queries

You can use parallel queries and parallel subqueries in SELECT statements and execute in parallel the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE).

You can also query external tables in parallel.

The parallelization decision for SQL queries has two components: the decision to parallelize and the degree of parallelism (DOP). These components are determined differently for queries, DDL operations, and DML operations. To determine the DOP, Oracle Database looks at the reference objects:

  • Parallel query looks at each table and index, in the portion of the query to be executed in parallel, to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP.

  • For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified by an insert, update, or delete operation. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the subquery's DOP is equivalent to that for the DML operation.

  • For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the subquery's DOP is equivalent to the DDL operation.

This section contains the following topics:

See Also:

8.5.1.1 Parallel Queries on Index-Organized Tables

There are several parallel scan methods that are supported on index-organized tables.

These parallel scan methods include:

  • Parallel fast full scan of a nonpartitioned index-organized table

  • Parallel fast full scan of a partitioned index-organized table

  • Parallel index range scan of a partitioned index-organized table

You can use these scan methods for index-organized tables with overflow areas and for index-organized tables that contain LOBs.

Nonpartitioned Index-Organized Tables

Parallel query on a nonpartitioned index-organized table uses parallel fast full scan. Work is allocated by dividing the index segment into a sufficiently large number of block ranges and then assigning the block ranges to parallel execution servers in a demand-driven manner. The overflow blocks corresponding to any row are accessed in a demand-driven manner only by the process that owns that row.

Partitioned Index-Organized Tables

Both index range scan and fast full scan can be performed in parallel. For parallel fast full scan, parallelization is the same as for nonpartitioned index-organized tables. Depending on the DOP, each parallel execution server gets one or more partitions, each of which contains the primary key index segment and the associated overflow segment, if any.

8.5.1.2 Parallel Queries on Object Types

Parallel queries can be performed on object type tables and tables containing object type columns.

Parallel query for object types supports all of the features that are available for sequential queries on object types, including:

  • Methods on object types

  • Attribute access of object types

  • Constructors to create object type instances

  • Object views

  • PL/SQL and Oracle Call Interface (OCI) queries for object types

There are no limitations on the size of the object types for parallel queries.

The following restrictions apply to using parallel query for object types:

  • A MAP function is needed to execute queries in parallel for queries involving joins and sorts (through ORDER BY, GROUP BY, or set operations). Without a MAP function, the query is automatically executed serially.

  • Parallel DML and parallel DDL are not supported with object types, and such statements are always performed serially.

In all cases where the query cannot execute in parallel because of any of these restrictions, the whole query executes serially without giving an error message.

8.5.1.3 Rules for Parallelizing Queries

A SQL query can only be executed in parallel under certain conditions.

A SELECT statement can be executed in parallel only if one of the following conditions is satisfied:

  • The query includes a statement level or object level parallel hint specification (PARALLEL or PARALLEL_INDEX).

  • The schema objects referred to in the query have a PARALLEL declaration associated with them.

  • Automatic Degree of Parallelism (Auto DOP) has been enabled.

  • Parallel query is forced using the ALTER SESSION FORCE PARALLEL QUERY statement.

In addition, the execution plan should have at least one of the following:

  • A full table scan

  • An index range scan spanning multiple partitions

  • An index fast full scan

  • A parallel table function
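
Each of the enabling conditions above has a SQL counterpart; a sketch, with an illustrative sales table:

```sql
-- Statement-level hint: this query is a candidate for parallel execution
SELECT /*+ PARALLEL(4) */ COUNT(*) FROM sales;

-- Object-level PARALLEL declaration: subsequent queries referencing
-- the table are candidates for parallel execution
ALTER TABLE sales PARALLEL 8;

-- Force parallel query for the session at a chosen DOP
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;
```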

See Also:

8.5.2 About Parallel DDL Statements

Parallelism for DDL statements is introduced in this topic.

This section about parallelism for DDL statements contains the following topics:

8.5.2.1 DDL Statements That Can Be Parallelized

You can execute DDL statements in parallel for tables and indexes that are nonpartitioned or partitioned.

The parallel DDL statements for nonpartitioned tables and indexes are:

  • CREATE INDEX

  • CREATE TABLE AS SELECT

  • ALTER TABLE MOVE

  • ALTER INDEX REBUILD

  • ALTER TABLE MODIFY

The parallel DDL statements for partitioned tables and indexes are:

  • CREATE INDEX

  • CREATE TABLE AS SELECT

  • ALTER TABLE {MOVE|SPLIT|COALESCE} PARTITION

  • ALTER INDEX {REBUILD|SPLIT} PARTITION

    This statement can be executed in parallel only if the (global) index partition being split is usable.

All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution.

The CREATE TABLE statement for an index-organized table can be executed in parallel either with or without an AS SELECT clause.

Parallel DDL cannot occur on tables with object columns. Parallel DDL cannot occur on nonpartitioned tables with LOB columns.
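
A few of the statements listed above, sketched with illustrative object names:

```sql
-- Parallel index creation on a nonpartitioned table
CREATE INDEX sales_cust_ix ON sales (cust_id) PARALLEL 8;

-- Parallel move of a nonpartitioned table
ALTER TABLE sales_history MOVE PARALLEL 8;

-- Parallel rebuild of one index partition
ALTER INDEX sales_time_ix REBUILD PARTITION sales_q1 PARALLEL 8;
```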

8.5.2.2 About Using CREATE TABLE AS SELECT in Parallel

Parallel execution enables you to execute the query in parallel and to parallelize the create operation when creating a table as a subquery from another table or set of tables.

This parallel functionality can be extremely useful in the creation of summary or rollup tables.

Note that clustered tables cannot be created and populated in parallel.
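
A parallel summary-table creation might look like the following sketch; the table and column names are illustrative. With a statement-level PARALLEL declaration, both the SELECT and the CREATE portions run in parallel:

```sql
CREATE TABLE sales_summary
  PARALLEL 8
AS
  SELECT cust_id, SUM(amount_sold) AS total_sold
  FROM   sales
  GROUP  BY cust_id;
```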

Figure 8-4 illustrates creating a summary table from a subquery in parallel.

Figure 8-4 Creating a Summary Table in Parallel

Description of Figure 8-4 follows
Description of "Figure 8-4 Creating a Summary Table in Parallel"
8.5.2.3 Recoverability and Parallel DDL

Parallel DDL is often used to create summary tables or do massive data loads that are standalone transactions, which do not always need to be recoverable.

By switching off Oracle Database logging, no undo or redo log is generated, so the parallel DDL operation is likely to perform better, but it becomes an all or nothing operation. In other words, if the operation fails, for whatever reason, you must redo the operation; it is not possible to restart it.

If you disable logging during parallel table creation (or any other parallel DDL operation), you should back up the tablespace containing the table after the table is created to avoid loss of the table due to media failure.

Use the NOLOGGING clause of the CREATE TABLE, CREATE INDEX, ALTER TABLE, and ALTER INDEX statements to disable undo and redo log generation.
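
For example, assuming an illustrative index name, logging can be disabled for a parallel index build and rebuild. Remember to back up the containing tablespace afterward:

```sql
-- NOLOGGING parallel index creation
CREATE INDEX sales_cust_ix ON sales (cust_id) NOLOGGING PARALLEL 8;

-- NOLOGGING parallel index rebuild
ALTER INDEX sales_cust_ix REBUILD NOLOGGING PARALLEL 8;
```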

8.5.2.4 Space Management for Parallel DDL

Creating a table or index in parallel has space management implications.

These space management implications affect both the storage space required during a parallel operation and the free space available after a table or index has been created.

8.5.2.5 Storage Space When Using Dictionary-Managed Tablespaces

When creating a table or index in parallel, each parallel execution server uses the values in the STORAGE clause of the CREATE statement to create temporary segments to store the rows.

A table created with a NEXT setting of 4 MB and a PARALLEL DEGREE of 16 consumes at least 64 megabytes (MB) of storage during table creation because each parallel server process starts with an extent of 4 MB. When the parallel execution coordinator combines the segments, some segments may be trimmed, and the resulting table may be smaller than the requested 64 MB.
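
The 4 MB / DOP 16 arithmetic above corresponds to a statement like the following sketch (table names are illustrative): 16 parallel servers x 4 MB initial extents = at least 64 MB consumed during the build, possibly trimmed on merge:

```sql
CREATE TABLE big_copy
  STORAGE (NEXT 4M)
  PARALLEL 16
AS
  SELECT * FROM big_table;
```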

8.5.2.6 Free Space and Parallel DDL

When you create indexes and tables in parallel, each parallel execution server allocates a new extent and fills the extent with the table or index data.

For example, if you create an index with a DOP of 4, then the index has at least four extents initially. This allocation of extents is the same for rebuilding indexes in parallel and for moving, splitting, or rebuilding partitions in parallel.

Serial operations require the schema object to have at least one extent. Parallel creations require that tables or indexes have at least as many extents as there are parallel execution servers creating the schema object.

When you create a table or index in parallel, it is possible to create areas of free space. This occurs when the temporary segments used by the parallel execution servers are larger than what is needed to store the rows.

  • If the unused space in each temporary segment is larger than the value of the MINIMUM EXTENT parameter set at the tablespace level, then Oracle Database trims the unused space when merging rows from all of the temporary segments into the table or index. The unused space is returned to the system free space and can be allocated for new extents, but it cannot be coalesced into a larger segment because it is not contiguous space (external fragmentation).

  • If the unused space in each temporary segment is smaller than the value of the MINIMUM EXTENT parameter, then unused space cannot be trimmed when the rows in the temporary segments are merged. This unused space is not returned to the system free space; it becomes part of the table or index (internal fragmentation) and is available only for subsequent insertions or for updates that require additional space.

For example, if you specify a DOP of 3 for a CREATE TABLE AS SELECT statement, but there is only one data file in the tablespace, then internal fragmentation may occur, as shown in Figure 8-5. The areas of free space within the internal table extents of a data file cannot be coalesced with other free space and cannot be allocated as extents.

Figure 8-5 Unusable Free Space (Internal Fragmentation)

Description of Figure 8-5 follows
Description of "Figure 8-5 Unusable Free Space (Internal Fragmentation)"

See Also:

Oracle Database SQL Tuning Guide for more information about creating tables and indexes in parallel

8.5.2.7 Rules for DDL Statements

DDL operations can be executed in parallel under certain conditions.

DDL operations can be executed in parallel only if at least one of the following conditions is satisfied:

  • A PARALLEL clause (declaration) is specified in the syntax. For CREATE TABLE, CREATE INDEX, ALTER INDEX REBUILD, and ALTER INDEX REBUILD PARTITION, the parallel declaration is stored in the data dictionary.

  • Automatic Degree of Parallelism (Auto DOP) has been enabled.

  • Parallel DDL is forced using the ALTER SESSION FORCE PARALLEL DDL statement.

See Also:

8.5.2.8 Rules for CREATE TABLE AS SELECT

The CREATE operation of the CREATE TABLE AS SELECT statement is parallelized based on the rules for parallelizing DDL statements.

In addition, a statement level PARALLEL hint specified in the SELECT part of the statement can also parallelize the DDL operation. For information about rules for parallelizing DDL statements, refer to Rules for DDL Statements.

When the CREATE operation of CREATE TABLE AS SELECT is parallelized, Oracle Database also parallelizes the scan operation if possible.

Even if the DDL part is not parallelized, the SELECT part can be parallelized based on the rules for parallelizing queries.

Automatic Degree of Parallelism (Auto DOP) parallelizes both the DDL and the query parts of the statement.

For information about the rules for determining the degree of parallelism (DOP), refer to Degree of Parallelism Rules for SQL Statements.

8.5.3 About Parallel DML Operations

Parallel DML operations are introduced in this topic.

Parallel DML (PARALLEL INSERT, UPDATE, DELETE, and MERGE) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes.

Note:

Multiple DML/parallel DML operations can modify the same object in the same session. There is no longer a requirement to commit each operation separately. This reduces the commit overhead for parallel DML.

Although DML generally includes queries, in this chapter the term DML refers only to INSERT, UPDATE, MERGE, and DELETE operations.

This section discusses the following parallel DML topics:

8.5.3.1 When to Use Parallel DML

Parallel DML is useful in a decision support system (DSS) environment where the performance and scalability of accessing large objects are important. Parallel DML complements parallel query in providing you with both querying and updating capabilities for your DSS databases.

The overhead of setting up parallelism makes parallel DML operations not feasible for short OLTP transactions. However, parallel DML operations can speed up batch jobs running in an OLTP database.

Several scenarios where parallel DML is used include:

8.5.3.1.1 Refreshing Tables in a Data Warehouse System

In a data warehouse system, large tables must be refreshed (updated) periodically with new or modified data from the production system.

You can do this efficiently by using the MERGE statement.

8.5.3.1.2 Creating Intermediate Summary Tables

In a DSS environment, many applications require complex computations that involve constructing and manipulating many large intermediate summary tables.

These summary tables are often temporary and frequently do not need to be logged. Parallel DML can speed up the operations against these large intermediate tables. One benefit is that you can put incremental results in the intermediate tables and perform parallel updates.

In addition, the summary tables may contain cumulative or comparative information which has to persist beyond application sessions; thus, temporary tables are not feasible. Parallel DML operations can speed up the changes to these large summary tables.

8.5.3.1.3 Using Scoring Tables

Many DSS applications score customers periodically based on a set of criteria.

The scores are usually stored in large DSS tables. The score information is then used in making a decision, for example, inclusion in a mailing list.

This scoring activity queries and updates a large number of rows in the table. Parallel DML can speed up the operations against these large tables.

8.5.3.1.4 Updating Historical Tables

Historical tables describe the business transactions of an enterprise over a recent time interval.

Periodically, the DBA deletes the set of oldest rows and inserts a set of new rows into the table. Parallel INSERT SELECT and parallel DELETE operations can speed up this rollover task.

Dropping a partition can also be used to delete old rows. However, the table must be partitioned by date with the appropriate time interval.

8.5.3.1.5 Running Batch Jobs

Batch jobs executed in an OLTP database during off hours have a fixed time during which the jobs must complete. A good way to ensure timely job completion is to execute their operations in parallel.

As the workload increases, more computer resources can be added; the scaleup property of parallel operations ensures that the time constraint can be met.

8.5.3.2 Enable Parallel DML Mode

A DML statement can be parallelized only if you have explicitly enabled parallel DML in the session or in the SQL statement.

To enable this mode in a session, run the following SQL statement:

ALTER SESSION ENABLE PARALLEL DML;

To enable parallel DML mode in a specific SQL statement, include the ENABLE_PARALLEL_DML SQL hint. For example:

INSERT /*+ ENABLE_PARALLEL_DML */ …

This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements and parallel DML is disabled for a session by default.

When parallel DML is disabled, no DML is executed in parallel even if the PARALLEL hint is used.

When parallel DML is enabled in a session, all DML statements in this session are considered for parallel execution. When parallel DML is enabled in a SQL statement with the ENABLE_PARALLEL_DML hint, only that specific statement is considered for parallel execution. However, even if parallel DML is enabled, the DML operation may still execute serially if there are no parallel hints or no tables with a parallel attribute or if restrictions on parallel operations are violated.

The session's PARALLEL DML mode does not influence the parallelism of SELECT statements, DDL statements, and the query portions of DML statements. If this mode is not set, the DML operation is not parallelized, but scans or join operations within the DML statement may still be parallelized.

When the parallel DML mode has been enabled for a session, you can disable the mode for a specific SQL statement with the DISABLE_PARALLEL_DML SQL hint.
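
Putting the session mode and the hints together, a sketch with an illustrative sales table:

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- This delete is a candidate for parallel DML
DELETE /*+ PARALLEL(sales, 4) */ FROM sales
 WHERE time_id < DATE '2020-01-01';

-- Opt this one statement out of parallel DML, even though the
-- session mode is enabled
UPDATE /*+ DISABLE_PARALLEL_DML PARALLEL(sales, 4) */ sales
   SET amount_sold = amount_sold * 1.1;
```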

Note:

As of Oracle Database 23ai, ALTER SESSION ENABLE PARALLEL DML not only enables parallel DML within a session, but also allows multiple DML/PDML/QUERY operations on the same object without requiring a separate transaction commit for each operation. This can reduce the overhead of commits in parallel DML.

For more information, refer to Space Considerations for Parallel DML and Restrictions on Parallel DML.

8.5.3.3 Rules for UPDATE, MERGE, and DELETE

An update, merge, or delete operation is parallelized only under certain conditions.

An UPDATE, MERGE, or DELETE operation is parallelized only if at least one of the following conditions is satisfied:

  • The table being updated, merged, or deleted has a PARALLEL declaration set by a previous CREATE TABLE or ALTER TABLE statement.

  • A statement level or object level PARALLEL hint is specified in the DML statement.

  • Automatic Degree of Parallelism (Auto DOP) has been enabled.

  • Parallel DML is forced using the ALTER SESSION FORCE PARALLEL DML statement.

If the statement contains subqueries or updatable views, then they may also be executed in parallel based on the rules for parallelizing queries. The decision to parallelize the UPDATE, MERGE, and DELETE portion is independent of the query portion, and vice versa. Statement level PARALLEL hints or Auto DOP parallelize both the DML and the query portions.
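
Two of the enabling conditions above, sketched with an illustrative accounts table (parallel DML must also be enabled in the session, as described earlier):

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Object-level PARALLEL declaration: subsequent DML on the table
-- is a candidate for parallel execution
ALTER TABLE accounts PARALLEL 8;

UPDATE accounts SET balance = balance * 1.02;

-- Alternatively, force parallel DML for the session at a chosen DOP
ALTER SESSION FORCE PARALLEL DML PARALLEL 8;
```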

See Also:

8.5.3.4 Rules for INSERT SELECT

An insert operation is executed in parallel only under certain conditions.

An INSERT operation is executed in parallel only if at least one of the following conditions is satisfied:

  • The table being inserted into (the reference object) has a PARALLEL declaration set by a previous CREATE TABLE or ALTER TABLE statement.

  • A statement level or object level PARALLEL hint is specified after the INSERT in the DML statement.

  • Automatic Degree of Parallelism (Auto DOP) has been enabled.

  • Parallel DML is forced using the ALTER SESSION FORCE PARALLEL DML statement.

The decision to parallelize the INSERT operation is independent of the SELECT operation, and vice versa. The SELECT operation can be parallelized based on the rules for parallelizing queries. Statement level PARALLEL hints or Auto DOP parallelize both the INSERT and the SELECT operations.
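
For example, a statement level PARALLEL hint placed after the INSERT keyword parallelizes both the INSERT and the SELECT portions (the table names and the DOP of 8 are hypothetical):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ PARALLEL(8) */ INTO sales_history
SELECT * FROM sales WHERE order_date < DATE '2020-01-01';

COMMIT;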

8.5.3.5 Transaction Restrictions for Parallel DML

To execute a DML operation in parallel, the parallel execution coordinator acquires parallel execution servers, and each parallel execution server executes a portion of the work under its own parallel process transaction.

Note the following conditions:

  • Each parallel execution server creates a different parallel process transaction.

  • If you use rollback segments instead of Automatic Undo Management, you may want to reduce contention on the rollback segments by limiting the number of parallel process transactions residing in the same rollback segment. Refer to Oracle Database SQL Language Reference for more information.

The coordinator also has its own coordinator transaction, which can have its own rollback segment. To ensure user-level transactional atomicity, the coordinator uses a two-phase commit protocol to commit the changes performed by the parallel process transactions.

If a PL/SQL procedure or block is executed in a parallel DML-enabled session, then this rule applies to statements in the procedure or block.

8.5.3.6 Rollback Segments

If you use rollback segments instead of Automatic Undo Management, there are some restrictions when using parallel DML.

See Also:

Oracle Database SQL Language Reference for information about restrictions for parallel DML and rollback segments

8.5.3.7 Recovery for Parallel DML

The time required to roll back a parallel DML operation is roughly equal to the time it takes to perform the forward operation.

Oracle Database supports parallel rollback after transaction and process failures, and after instance and system failures. Oracle Database can parallelize both the rolling forward stage and the rolling back stage of transaction recovery.

See Also:

Oracle Database Backup and Recovery User’s Guide for details about parallel rollback

8.5.3.7.1 Transaction Recovery for User-Issued Rollback

A user-issued rollback in a transaction failure due to statement error is performed in parallel by the parallel execution coordinator and the parallel execution servers.

The rollback takes approximately the same amount of time as the forward transaction.

8.5.3.7.2 Process Recovery

Recovery from the failure of a parallel execution coordinator or parallel execution server is performed by the PMON process.

If a parallel execution server or a parallel execution coordinator fails, then PMON rolls back the work from that process and all other processes in the transaction roll back their changes.

8.5.3.7.3 System Recovery

Recovery from a system failure requires a new startup.

Recovery is performed by the SMON process and any recovery server processes spawned by SMON. Parallel DML statements may be recovered using parallel rollback. If the initialization parameter COMPATIBLE is set to 8.1.3 or greater, Fast-Start On-Demand Rollback enables terminated transactions to be recovered, on demand, one block at a time.
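
The degree of parallelism used to roll back terminated transactions can be influenced with the FAST_START_PARALLEL_ROLLBACK initialization parameter, which accepts the values FALSE, LOW, and HIGH; for example:

ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH;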

8.5.3.8 Space Considerations for Parallel DML

Parallel UPDATE uses the existing free space in the object, while direct-path INSERT gets new extents for the data.

Space usage characteristics may be different in parallel than serial execution because multiple concurrent child transactions modify the object.

8.5.3.9 Restrictions on Parallel DML

There are several restrictions that apply to parallel DML.

The following restrictions apply to parallel DML (including direct-path INSERT):

  • Intra-partition parallelism for UPDATE, MERGE, and DELETE operations requires that the COMPATIBLE initialization parameter be set to 23.0 or greater.

  • Parallel DML operations cannot be done on tables with triggers.

  • Replication functionality is not supported for parallel DML.

  • Parallel DML can be done on tables with object columns provided the object columns are not accessed.

  • Parallel DML can be done on tables with LOB columns provided the table is partitioned. However, intra-partition parallelism is not supported.

    For non-partitioned tables with LOB columns, parallel INSERT operations are supported provided that the LOB columns are declared as SecureFiles LOBs. Parallel UPDATE, DELETE, and MERGE operations on such tables are not supported.

  • A DML operation cannot be executed in parallel if it is in a distributed transaction or if the DML or the query operation is on a remote object.

  • Clustered tables are not supported.

  • Parallel UPDATE, DELETE, and MERGE operations are not supported for private temporary tables.

  • Parallel DML is not supported on a table with bitmap indexes if the table is not partitioned.

Note:

As of Oracle Database 23ai, some previous restrictions on DML and parallel DML after direct-path inserts have been lifted. You can now do the following immediately after a direct-path insert:

  • Query a table multiple times after one or more parallel DML operations on the same table within a session.

  • Perform conventional DML (serial and parallel) on the same table within the same session.

  • Perform multiple direct loads (serial and parallel) on the same table in the same session.

These restrictions are lifted only if the following conditions are satisfied:

  • The table is a heap table.

  • The table is in a tablespace that was created with SEGMENT SPACE MANAGEMENT AUTO and AUTOALLOCATE.

  • The initialization parameter COMPATIBLE is set to 23.0 or greater.

8.5.3.9.1 Partitioning Key Restriction

By default, you can update the partitioning key of a partitioned table to a new value only if the update does not cause the row to move to a new partition.

An update that moves a row to a different partition is possible only if the table is defined with row movement enabled (the ENABLE ROW MOVEMENT clause).
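
A sketch of this, with a hypothetical table sales partitioned by sale_date:

ALTER TABLE sales ENABLE ROW MOVEMENT;

-- With row movement enabled, this update may migrate rows
-- to different partitions
UPDATE sales SET sale_date = ADD_MONTHS(sale_date, 12)
WHERE region = 'WEST';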

8.5.3.9.2 Function Restrictions

The function restrictions for parallel DML are the same as those for parallel DDL and parallel query.

See About Parallel Execution of Functions for more information.

8.5.3.10 Data Integrity Restrictions

This topic introduces the interactions between integrity constraints and parallel DML statements.

This section contains the following topics:

8.5.3.10.1 NOT NULL and CHECK

The integrity constraints for NOT NULL and CHECK are discussed in this topic.

NOT NULL and CHECK integrity constraints are allowed. They are not a problem for parallel DML because they are enforced on the column and row level, respectively.

8.5.3.10.2 UNIQUE and PRIMARY KEY

The integrity constraints for UNIQUE and PRIMARY KEY are discussed in this topic.

UNIQUE and PRIMARY KEY integrity constraints are allowed.

8.5.3.10.3 FOREIGN KEY (Referential Integrity)

Restrictions for referential integrity occur whenever a DML operation on one table could cause a recursive DML operation on another table.

These restrictions also apply when, to perform an integrity check, it is necessary to see simultaneously all changes made to the object being modified.

Table 8-1 lists all of the operations that are possible on tables that are involved in referential integrity constraints.

Table 8-1 Referential Integrity Restrictions

DML Statement      Issued on Parent    Issued on Child     Self-Referential
-----------------  ------------------  ------------------  ----------------
INSERT             (Not applicable)    Not parallelized    Not parallelized
MERGE              (Not applicable)    Not parallelized    Not parallelized
UPDATE No Action   Supported           Supported           Not parallelized
DELETE No Action   Supported           Supported           Not parallelized
DELETE Cascade     Not parallelized    (Not applicable)    Not parallelized

8.5.3.10.4 Delete Cascade

The delete cascade data integrity restrictions are discussed in this topic.

Deletion on tables having a foreign key with delete cascade is not parallelized because parallel execution servers attempt to delete rows from multiple partitions (parent and child tables).

8.5.3.10.5 Self-Referential Integrity

DML on tables with self-referential integrity constraints is not parallelized if the referenced keys (primary keys) are involved.

For DML on all other columns, parallelism is possible.

8.5.3.10.6 Deferrable Integrity Constraints

The deferrable integrity constraints are discussed in this topic.

If any deferrable constraints apply to the table being operated on, the DML operation is not executed in parallel.

8.5.3.11 Trigger Restrictions

A DML operation is not executed in parallel if the affected tables contain enabled triggers that may get invoked as a result of the statement.

This implies that DML statements on tables that are being replicated are not parallelized.

Relevant triggers must be disabled to parallelize DML on the table. If you enable or disable triggers, then the dependent shared cursors are invalidated.
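
A typical sequence, using a hypothetical trigger and table, is to disable the trigger, run the parallel DML, and re-enable it:

ALTER TRIGGER sales_audit_trg DISABLE;

ALTER SESSION ENABLE PARALLEL DML;
UPDATE /*+ PARALLEL(sales, 4) */ sales SET status = 'ARCHIVED';
COMMIT;

ALTER TRIGGER sales_audit_trg ENABLE;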

8.5.3.12 Distributed Transaction Restrictions

The distributed transaction restrictions are discussed in this topic.

A DML operation cannot be executed in parallel if it is in a distributed transaction or if the DML or the query operation is on a remote object.

8.5.3.13 Examples of Distributed Transaction Parallelization

Several examples of distributed transaction processing are shown in this topic.

In the first example, the DML statement queries a remote object. The DML operation is executed serially without notification because it references a remote object.

INSERT /*+ APPEND PARALLEL (t3,2) */ INTO t3 SELECT * FROM t4@dblink;

In the next example, the DML operation is applied to a remote object. The DELETE operation is not parallelized because it references a remote object.

DELETE /*+ PARALLEL (t1, 2) */ FROM t1@dblink;

In the last example, the DML operation is in a distributed transaction. The DELETE operation is not executed in parallel because it occurs in a distributed transaction (which is started by the SELECT statement).

SELECT * FROM t1@dblink;
DELETE /*+ PARALLEL (t2,2) */ FROM t2;
COMMIT;

8.5.3.14 Concurrent Execution of Union All

Set operators such as UNION or UNION ALL consist of multiple queries (branches) combined into a single SQL statement.

Traditionally, set operators are processed in a sequential manner. Individual branches can be processed in serial or parallel, but only one branch at a time, one branch after another. While this approach satisfies many use cases, there are situations where the processing of multiple branches of a UNION or UNION ALL statement should occur concurrently. The most typical situation is when several or all branches are remote SQL statements. In this situation, concurrent processing on all participating remote systems is desired to speed up the overall processing time without increasing the workload of any participating system.

The default behavior of concurrent execution for UNION or UNION ALL statements is controlled by the setting of the OPTIMIZER_FEATURES_ENABLE initialization parameter. When set to 12.1.0.1 or higher, concurrent execution is enabled by default. If at least one branch of the statement is local and is considered for parallel processing, then the entire UNION or UNION ALL statement is processed concurrently. The system calculates the DOP for every individual local branch of the statement and chooses the highest DOP for the execution of the entire UNION or UNION ALL statement. The system then works concurrently on as many branches as possible, using the chosen DOP both for parallelization of the branches that are processed in parallel, and as concurrent workers on serial and remote statements.

When the OPTIMIZER_FEATURES_ENABLE initialization parameter is set to a value less than 12.1.0.1, concurrent execution of UNION or UNION ALL statements must be enabled explicitly by using the PQ_CONCURRENT_UNION hint.

However, unlike the sequential processing of one branch after another, the concurrent processing does not guarantee an ordered return of the results of the individual branches. If an ordered return of one branch after another is required, then you either must disable concurrent processing using the NO_PQ_CONCURRENT_UNION hint or you must augment the SQL statement to uniquely identify individual branches of the statement and to sort on this specified identifier.

UNION or UNION ALL statements that consist only of serial or remote branches are not processed concurrently unless the PQ_CONCURRENT_UNION hint is explicitly specified. The DOP of such a SQL statement is at most the number of serial and remote inputs.
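
For example, a statement consisting only of remote branches can be forced to run its branches concurrently with the hint (reusing the remote tables from Example 8-4):

SELECT /*+ PQ_CONCURRENT_UNION */ * FROM sf_customer@san_francisco
UNION ALL
SELECT * FROM la_customer@los_angeles;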

Whether or not concurrent processing of a UNION or UNION ALL statement occurs can be easily identified with the execution plan of the SQL statements. When executed in parallel, the execution of serial and remote branches is managed with a row source identifiable as PX SELECTOR. Statements that are not processed concurrently show the query coordinator (QC) as coordinator of serial and remote branches.

In Example 8-4, the SQL statement consists of local and remote branches. The SQL statement loads information about gold and platinum customers from the local database, and the information about customers from three major cities from remote databases. Because the local select statements occur in parallel, this processing is automatically performed in parallel. Each serial branch is executed by only one parallel execution server process. Because each parallel execution server can execute one serial branch, they are executed concurrently.

Example 8-4 Explain Plan for UNION ALL

SQL> EXPLAIN PLAN FOR INSERT INTO all_customer 
  SELECT * FROM GOLD_customer UNION ALL 
  SELECT * FROM PLATINUM_customer UNION ALL 
  SELECT * FROM SF_customer@san_francisco UNION ALL 
  SELECT * FROM LA_customer@los_angeles UNION ALL 
  SELECT * FROM LV_customer@las_vegas;
-------------------------------------------------------------------------------
| Id | Operation              | Name          | TQ/Ins |IN-OUT | PQ Distrib|
-------------------------------------------------------------------------------
| 0 | INSERT STATEMENT         |              |        |       |           |
| 1 |  LOAD TABLE CONVENTIONAL | ALL_CUSTOMER |        |       |           |
| 2 |   PX COORDINATOR         |              |        |       |           |
| 3 |    PX SEND QC (RANDOM)   | :TQ10003     |        | P->S  | QC (RAND) |
| 4 |     UNION-ALL            |              |        | PCWP  |           |
| 5 |      PX BLOCK ITERATOR   |              |        | PCWC  |           |
| 6 |       TABLE ACCESS FULL  | GOLD_CUSTOMER|        | PCWP  |           |
| 7 |      PX BLOCK ITERATOR   |              |        | PCWC  |           |
| 8 |       TABLE ACCESS FULL  | PLATINUM_CUST|        | PCWP  |           |
| 9 |      PX SELECTOR         |              |        | PCWP  |           |
|10 |       REMOTE             | SF_CUSTOMER  |        | PCWP  |           |
|11 |      PX SELECTOR         |              |        | PCWP  |           |
|12 |       REMOTE             | LA_CUSTOMER  |        | PCWP  |           |
|13 |      PX SELECTOR         |              |        | PCWP  |           |
|14 |       REMOTE             | LV_CUSTOMER  |        | PCWP  |           |
-------------------------------------------------------------------------------

8.5.4 About Parallel Execution of Functions

SQL statements can contain user-defined functions written in PL/SQL, in Java, or as external procedures in C that can appear as part of the SELECT list, SET clause, or WHERE clause.

When the SQL statement is parallelized, these functions are executed on a per-row basis by the parallel execution server process. Any PL/SQL package variables or Java static attributes used by the function are entirely private to each individual parallel execution process and are newly initialized when each row is processed, rather than being copied from the original session. Because of this process, not all functions generate correct results if executed in parallel.

User-written table functions can appear in the statement's FROM list. These functions act like source tables in that they produce row output. Table functions are initialized once during the statement at the start of each parallel execution process. All variables are entirely private to the parallel execution process.

This section contains the following topics:

8.5.4.1 Functions in Parallel Queries

User functions can be executed in parallel in a SQL query statement, or a subquery in a DML or DDL statement.

In a SELECT statement or a subquery in a DML or DDL statement, a user-written function may be executed in parallel in any of the following cases:

  • If it has been declared with the PARALLEL_ENABLE keyword

  • If it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES clause that indicates all of WNDS, RNPS, and WNPS

  • If it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables

Other parts of a query or subquery can sometimes execute in parallel even if a given function execution must remain serial.
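
For example, a function declared with the PARALLEL_ENABLE keyword (the function and table names are hypothetical) can be executed by the parallel execution server processes:

CREATE OR REPLACE FUNCTION to_usd (amount NUMBER, rate NUMBER)
RETURN NUMBER PARALLEL_ENABLE
IS
BEGIN
  RETURN amount * rate;
END;
/

SELECT /*+ PARALLEL(sales, 4) */ to_usd(amount, 1.08) FROM sales;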

8.5.4.2 Functions in Parallel DML and DDL Statements

A user function can be executed in a parallel DML or DDL statement under certain conditions.

In a parallel DML or DDL statement, as in a parallel query, a user-written function may be executed in parallel in any of the following cases:

  • If it has been declared with the PARALLEL_ENABLE keyword

  • If it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES clause that indicates all of RNDS, WNDS, RNPS, and WNPS

  • If it is declared with the CREATE FUNCTION statement and the system can analyze the body of the PL/SQL code and determine that the code neither reads nor writes to the database, nor reads or modifies package variables

For a parallel DML statement, any function call that cannot be executed in parallel causes the entire DML statement to be executed serially. For an INSERT SELECT or CREATE TABLE AS SELECT statement, function calls in the query portion are parallelized according to the parallel query rules described in this section. The query may be parallelized even if the remainder of the statement must execute serially, or vice versa.

8.5.5 About Other Types of Parallelism

An Oracle Database can use parallelism in multiple types of operations.

In addition to parallel SQL execution, Oracle Database can use parallelism for the following types of operations:

  • Parallel recovery

  • Parallel propagation (replication)

  • Parallel load (external tables and the SQL*Loader utility)

Like parallel SQL, parallel recovery, propagation, and external table loads are performed by a parallel execution coordinator and multiple parallel execution servers. Parallel load using SQL*Loader, however, uses a different mechanism.

The behavior of the parallel execution coordinator and parallel execution servers may differ, depending on what kind of operation they perform (SQL, recovery, or propagation). For example, if all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started:

  • In parallel SQL and external table loads, the parallel execution coordinator switches to serial processing.

  • In parallel propagation, the parallel execution coordinator returns an error.

For a given session, the parallel execution coordinator coordinates only one kind of operation. A parallel execution coordinator cannot coordinate, for example, parallel SQL and parallel recovery or propagation at the same time.

8.5.6 Degree of Parallelism Rules for SQL Statements

The parallelization decision for SQL statements has two components: the decision to parallelize and the degree of parallelism (DOP).

These components are determined differently for queries, DDL operations, and DML operations.

The decision to parallelize is discussed in the following sections:

The degree of parallelism for various types of SQL statements can be determined by statement level or object level PARALLEL hints, PARALLEL clauses, ALTER SESSION FORCE PARALLEL statements, automatic degree of parallelism (Auto DOP), or table or index PARALLEL declarations. When more than one of these methods is used, Oracle Database applies precedence rules to determine which method sets the DOP.

Table 8-2 shows the precedence rules for determining the degree of parallelism (DOP) for various types of SQL statements. In the table, the smaller priority number indicates that the method takes precedence over higher numbers. For example, priority (1) takes precedence over priority (2), priority (3), priority (4), and priority (5).

Table 8-2 Parallelization Priority Order

In the following table, (n) denotes Priority (n).

Parallel Operation                                   Statement    Object       PARALLEL   ALTER SESSION               Auto DOP   Parallel
                                                     Level Hint   Level Hint   Clause                                            Declaration
---------------------------------------------------  -----------  -----------  ---------  --------------------------  ---------  --------------------
Parallel query table/index scan                      (1)          (2)          N/A        (3) FORCE PARALLEL QUERY    (4)        (5)
(refer to Rules for Parallelizing Queries)
Parallel UPDATE, DELETE, or MERGE                    (1)          (2)          N/A        (3) FORCE PARALLEL DML      (4)        (5) of the target
(refer to Rules for UPDATE, MERGE, and DELETE)                                                                                   table
INSERT operation of parallel INSERT SELECT           (1)          (2)          N/A        (3) FORCE PARALLEL DML      (4)        (5) of table being
(refer to Rules for INSERT SELECT)                                                                                               inserted into
SELECT operation of INSERT SELECT when INSERT        (1)          (2)          N/A        (3) FORCE PARALLEL QUERY    (4)        (5) of table being
is serial (refer to Rules for INSERT SELECT)                                                                                     selected from
CREATE operation of parallel CREATE TABLE AS         (1)          N/A          (4)        (2) FORCE PARALLEL DDL      (3)        N/A
SELECT (refer to Rules for CREATE TABLE AS SELECT)
SELECT operation of CREATE TABLE AS SELECT when      (1)          (2)          N/A        (3) FORCE PARALLEL QUERY    (4)        (5)
CREATE is serial (refer to Rules for CREATE TABLE
AS SELECT)
Other DDL operations                                 N/A          N/A          (3)        (1) FORCE PARALLEL DDL      (2)        N/A
(refer to Rules for DDL Statements)

8.6 About Initializing and Tuning Parameters for Parallel Execution

You can use parameters to initialize and tune parallel execution.

Oracle Database computes defaults for the parallel execution parameters based on the value at database startup of CPU_COUNT and PARALLEL_THREADS_PER_CPU. The parameters can also be manually tuned, increasing or decreasing their values to suit specific system configurations or performance goals. For example, on systems where parallel execution is never used, PARALLEL_MAX_SERVERS can be set to zero.

You can also manually tune parallel execution parameters. Parallel execution is enabled by default.

Initializing and tuning parallel execution is discussed in the following topics:

8.6.1 Default Parameter Settings

Oracle Database automatically sets parallel execution parameters by default.

The parallel execution parameters are shown in Table 8-3.

Table 8-3 Parameters and Their Defaults

Parameter Default Comments

PARALLEL_ADAPTIVE_MULTI_USER

FALSE

Causes parallel execution SQL to throttle degree of parallelism (DOP) requests to prevent system overload.

PARALLEL_ADAPTIVE_MULTI_USER is deprecated in Oracle Database 12c Release 2 (12.2.0.1) to be desupported in a future release. Oracle recommends using parallel statement queuing instead.

PARALLEL_DEGREE_LIMIT

CPU

Controls the maximum DOP a statement can have when automatic DOP is in use. The maximum DOP is

SUM(CPU_COUNT)*PARALLEL_THREADS_PER_CPU

The value AUTO for PARALLEL_DEGREE_LIMIT has the same functionality as the value CPU.

PARALLEL_DEGREE_POLICY

MANUAL

Controls whether auto DOP, parallel statement queuing and in-memory parallel execution are used. By default, all of these features are disabled.

PARALLEL_EXECUTION_MESSAGE_SIZE

16 KB

Specifies the size of the buffers used by the parallel execution servers to communicate among themselves and with the query coordinator. These buffers are allocated out of the shared pool.

PARALLEL_FORCE_LOCAL

FALSE

Restricts parallel execution to the current Oracle RAC instance.

PARALLEL_INSTANCE_GROUP

None. By default, parallel execution is enabled across all currently active instances.

Lets you restrict parallel query operations to a limited number of instances. Used in conjunction with services and also with the deprecated parameter INSTANCE_GROUPS.

PARALLEL_MAX_SERVERS

See PARALLEL_MAX_SERVERS.

Specifies the maximum number of parallel execution processes and parallel recovery processes for an instance. As demand increases, Oracle Database increases the number of processes from the number created at instance startup up to this value.

If you set this parameter too low, some queries may not have a parallel execution process available to them during query processing. If you set it too high, memory resource shortages may occur during peak periods, which can degrade performance.

PARALLEL_MIN_DEGREE

1

Controls the minimum degree of parallelism computed by automatic degree of parallelism.

PARALLEL_MIN_SERVERS

CPU_COUNT * PARALLEL_THREADS_PER_CPU * 2

Specifies the number of parallel execution processes to be started and reserved for parallel operations, when Oracle Database is started up. Increasing this setting can help balance the startup cost of a parallel statement, but requires greater memory usage as these parallel execution processes are not removed until the database is shut down.

PARALLEL_MIN_PERCENT

0

Specifies the minimum percentage of requested parallel execution processes required for parallel execution. With the default value of 0, a parallel statement executes serially if no parallel server processes are available.

PARALLEL_MIN_TIME_THRESHOLD

AUTO

Specifies the execution time, as estimated by the optimizer, above which a statement is considered for automatic parallel query and automatic derivation of DOP. Note that the interpretation of AUTO is dependent on whether or not Database In-Memory is used.

PARALLEL_SERVERS_TARGET

See PARALLEL_SERVERS_TARGET.

Specifies the number of parallel execution server processes available to run queries before parallel statement queuing is used. Note that parallel statement queuing is only active if PARALLEL_DEGREE_POLICY is set to AUTO.

PARALLEL_THREADS_PER_CPU

1

Describes the number of parallel execution processes or threads that a CPU can handle during parallel execution.

You can set some parameters in such a way that Oracle Database is constrained. For example, if you set PROCESSES to 20, you are not able to get 25 child processes.

See Also:

Oracle Database Reference for more information about initialization parameters

8.6.2 Forcing Parallel Execution for a Session

You can force parallelism for a session.

If you are sure you want to execute in parallel and want to avoid setting the DOP for a table or modifying the queries involved, you can force parallelism with the following statement:

ALTER SESSION FORCE PARALLEL QUERY;

All subsequent queries are executed in parallel provided no restrictions are violated. You can also force DML and DDL statements. This clause overrides any parallel clause specified in subsequent statements in the session, but is overridden by a parallel hint.

In typical OLTP environments, for example, the tables are not set parallel, but nightly batch scripts may want to collect data from these tables in parallel. By setting the DOP in the session, the user avoids altering each table in parallel and then altering it back to serial when finished.
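
A nightly batch session might therefore look like the following sketch (the DOP of 8 and the query are illustrative):

ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;

SELECT region, SUM(amount) FROM sales GROUP BY region;

ALTER SESSION DISABLE PARALLEL QUERY;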

8.6.3 Tuning General Parameters for Parallel Execution

The discussion about tuning general parameters for parallel execution is introduced in the topic.

This section discusses the following topics:

8.6.3.1 Parameters Establishing Resource Limits for Parallel Operations

You can set initialization parameters to determine resource limits.

The parameters that establish resource limits are discussed in the following topics:

See Also:

Oracle Database Reference for information about initialization parameters

8.6.3.1.1 PARALLEL_FORCE_LOCAL

The PARALLEL_FORCE_LOCAL parameter specifies whether a SQL statement executed in parallel is restricted to a single instance in an Oracle RAC environment.

By setting this parameter to TRUE, you restrict the scope of the parallel server processes to the single Oracle RAC instance where the query coordinator is running.

The recommended value for the PARALLEL_FORCE_LOCAL parameter is FALSE.

See Also:

Oracle Database Reference for information about the PARALLEL_FORCE_LOCAL initialization parameter

8.6.3.1.2 PARALLEL_MAX_SERVERS

The PARALLEL_MAX_SERVERS parameter specifies the maximum number of parallel execution processes and parallel recovery processes for an instance.

As demand increases, Oracle Database increases the number of processes from the number created at instance startup up to this value.

For example, setting the value to 64 enables you to run four parallel queries simultaneously, if each query is using two worker sets with a DOP of 8 for each set.

When Users Have Too Many Processes

When concurrent users have too many query server processes, memory contention (paging), I/O contention, or excessive context switching can occur.

This contention can reduce system throughput to a level lower than if parallel execution were not used. Increase the PARALLEL_MAX_SERVERS value only if the system has sufficient memory and I/O bandwidth for the resulting load.

You can use performance monitoring tools of the operating system to determine how much memory, swap space and I/O bandwidth are free. Look at the run queue lengths for both your CPUs and disks, and the service time for I/O operations on the system. Verify that the system has sufficient swap space to add more processes. Limiting the total number of query server processes might restrict the number of concurrent users who can execute parallel operations, but system throughput tends to remain stable.

When to Limit the Number of Resources for a User using a Consumer Group

When necessary, you can limit the amount of parallelism available to a given user by establishing a resource consumer group for the user.

Do this to limit the number of sessions, concurrent logons, and the number of parallel processes that any one user or group of users can have.

Each query server process working on a parallel execution statement is logged on with a session ID. Each process counts against the user's limit of concurrent sessions. For example, to limit a user to 10 parallel execution processes, set the user's limit to 11. One process is for the parallel execution coordinator and the other 10 consist of two sets of query servers. This would allow one session for the parallel execution coordinator and 10 sessions for the parallel execution processes.
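
One way to sketch the session limit described above uses a profile (the profile and user names are hypothetical; resource consumer groups are an alternative mechanism):

-- Profile resource limits take effect only when RESOURCE_LIMIT is TRUE
ALTER SYSTEM SET RESOURCE_LIMIT = TRUE;

CREATE PROFILE batch_users LIMIT SESSIONS_PER_USER 11;
ALTER USER scott PROFILE batch_users;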

8.6.3.1.3 PARALLEL_MIN_PERCENT

The PARALLEL_MIN_PERCENT parameter enables users to wait for an acceptable DOP, depending on the application in use.

The recommended value for the PARALLEL_MIN_PERCENT parameter is 0 (zero). Setting this parameter to values other than 0 (zero) causes Oracle Database to return an error when the requested DOP cannot be satisfied by the system at a given time. For example, if you set PARALLEL_MIN_PERCENT to 50, which translates to 50 percent, and the DOP is reduced by 50 percent or greater because of the adaptive algorithm or because of a resource limitation, then Oracle Database returns ORA-12827. For example:

SELECT /*+ FULL(e) PARALLEL(e, 8) */ d.department_id, SUM(SALARY)
  FROM employees e, departments d WHERE e.department_id = d.department_id
  GROUP BY d.department_id ORDER BY d.department_id; 

Oracle Database responds with this message:

ORA-12827: insufficient number of parallel execution servers
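As a rough sketch of this behavior (an illustrative model, not Oracle's actual implementation), the check can be expressed as follows; the function name and values are hypothetical:

```python
def min_percent_check(requested_dop, available_servers, parallel_min_percent):
    """Illustrative model of the PARALLEL_MIN_PERCENT check, not Oracle's code.

    Raises an error analogous to ORA-12827 when the parallel execution
    servers actually available fall below the requested DOP scaled by
    the minimum acceptable percentage.
    """
    granted = min(requested_dop, available_servers)
    required = requested_dop * parallel_min_percent / 100.0
    if granted < required:
        raise RuntimeError(
            "ORA-12827: insufficient number of parallel execution servers")
    return granted

# With PARALLEL_MIN_PERCENT = 50, a request for DOP 8 needs at least 4 servers.
print(min_percent_check(8, 5, 50))  # 5: granted DOP meets the 50 percent floor
```

With only 3 servers available, the same request would fail the 50 percent floor and raise the error instead.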

See Also:

Oracle Database Reference for information about the PARALLEL_MIN_PERCENT initialization parameter

8.6.3.1.4 PARALLEL_MIN_SERVERS

The PARALLEL_MIN_SERVERS parameter specifies the number of processes to be started in a single instance that are reserved for parallel operations.

Setting PARALLEL_MIN_SERVERS balances the startup cost against memory usage. Processes started using PARALLEL_MIN_SERVERS do not exit until the database is shut down. This way, when a query is issued, the processes are likely to be available.

See Also:

Oracle Database Reference for information about the PARALLEL_MIN_SERVERS initialization parameter

8.6.3.1.5 PARALLEL_MIN_TIME_THRESHOLD

The PARALLEL_MIN_TIME_THRESHOLD parameter specifies the minimum execution time a statement should have before the statement is considered for automatic degree of parallelism.

See Also:

Oracle Database Reference for information about the PARALLEL_MIN_TIME_THRESHOLD initialization parameter

8.6.3.1.6 PARALLEL_SERVERS_TARGET

The PARALLEL_SERVERS_TARGET parameter specifies the number of parallel server processes allowed to run parallel statements before statement queuing is used.

When PARALLEL_DEGREE_POLICY is set to AUTO, statements that require parallel execution are queued if the number of parallel processes currently in use on the system equals or is greater than PARALLEL_SERVERS_TARGET. This is not the maximum number of parallel server processes allowed on a system (that is controlled by PARALLEL_MAX_SERVERS). However, PARALLEL_SERVERS_TARGET and parallel statement queuing are used to ensure that each statement that requires parallel execution is allocated the necessary parallel server resources and that the system is not flooded with too many parallel server processes.

See Also:

Oracle Database Reference for information about the PARALLEL_SERVERS_TARGET initialization parameter

8.6.3.1.7 SHARED_POOL_SIZE

The SHARED_POOL_SIZE parameter specifies the memory size of the shared pool.

Parallel execution requires memory resources in addition to those required by serial SQL execution. Additional memory is used for communication and passing data between query server processes and the query coordinator.

Oracle Database allocates memory for query server processes from the shared pool. Tune the shared pool as follows:

  • Allow for other clients of the shared pool, such as shared cursors and stored procedures.

  • Remember that larger values improve performance in multiuser systems, but smaller values use less memory.

  • You can then monitor the number of buffers used by parallel execution and compare the shared pool PX msg pool to the current high water mark reported in output from the view V$PX_PROCESS_SYSSTAT.

    Note:

    If you do not have enough memory available, error message 12853 occurs (insufficient memory for PX buffers: current stringK, max needed stringK). This is caused by having insufficient SGA memory available for PX buffers. You must reconfigure the SGA to have at least (MAX - CURRENT) bytes of additional memory.
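The additional SGA to configure can be computed directly from the current and maximum kilobyte figures reported in the ORA-12853 message. A minimal sketch, with hypothetical figures:

```python
def additional_sga_kb(current_kb, max_needed_kb):
    # Extra SGA (in KB) to configure for PX buffers,
    # per the (MAX - CURRENT) guidance for ORA-12853.
    return max(0, max_needed_kb - current_kb)

# Hypothetical message figures: current 2048K, max needed 14480K.
print(additional_sga_kb(2048, 14480))  # 12432
```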

By default, Oracle Database allocates parallel execution buffers from the shared pool.

If Oracle Database displays the following error on startup, reduce the value of SHARED_POOL_SIZE until your database starts:

ORA-27102: out of memory 
SVR4 Error: 12: Not enough space 

After reducing the value of SHARED_POOL_SIZE, you might see the error:

ORA-04031: unable to allocate 16084 bytes of shared memory 
   ("SHARED pool","unknown object","SHARED pool heap","PX msg pool") 

If so, execute the following query to determine why Oracle Database could not allocate the 16,084 bytes:

SELECT NAME, SUM(BYTES) FROM V$SGASTAT WHERE UPPER(POOL)='SHARED POOL' 
  GROUP BY ROLLUP (NAME); 

Your output should resemble the following:

NAME                       SUM(BYTES) 
-------------------------- ---------- 
PX msg pool                   1474572 
free memory                    562132
                              2036704 

If you specify SHARED_POOL_SIZE and the amount of memory you specify to reserve is bigger than the pool, Oracle Database does not allocate all the memory it can get. Instead, it leaves some space. When the query runs, Oracle Database tries to get what it needs. In this example, Oracle Database uses the approximately 560 KB of free memory and fails when it needs another 16 KB. The error does not report the cumulative amount that is needed. The best way of determining how much more memory is needed is to use the formulas in Additional Memory Requirements for Message Buffers.

To resolve the problem in the current example, increase the value for SHARED_POOL_SIZE. As shown in the sample output, the SHARED_POOL_SIZE is about 2 MB. Depending on the amount of memory available, you could increase the value of SHARED_POOL_SIZE to 4 MB and attempt to start your database. If Oracle Database continues to display an ORA-4031 message, gradually increase the value for SHARED_POOL_SIZE until startup is successful.

See Also:

Oracle Database Reference for information about the SHARED_POOL_SIZE initialization parameter

8.6.3.1.8 Additional Memory Requirements for Message Buffers

Additional memory requirements for message buffers and cursors when using parallel execution plans are discussed in this topic.

After you determine the initial setting for the shared pool, you must calculate additional memory requirements for message buffers and determine how much additional space you need for cursors.

Required Memory for Message Buffers

You must increase the value for the SHARED_POOL_SIZE parameter to accommodate message buffers. The message buffers allow query server processes to communicate with each other.

Oracle Database uses a fixed number of buffers for each virtual connection between producer query servers and consumer query servers. Connections increase as the square of the DOP increases. For this reason, the maximum amount of memory used by parallel execution is bound by the highest DOP allowed on your system. You can control this value by using either the PARALLEL_MAX_SERVERS parameter or policies and profiles.

To calculate the amount of memory required, use one of the following formulas:

  • For SMP systems:

    mem in bytes = (3 x size x users x groups x connections)
    
  • For Oracle Real Application Clusters and MPP systems:

    mem in bytes = ((3 x local) + (2 x remote)) x (size x users x groups) 
      / instances
    

Each instance uses the memory computed by the formula.

The terms are:

  • SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE

  • USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP

  • GROUPS = the number of query server process groups used for each query

    A simple SQL statement requires only one group. However, if your queries involve subqueries which are processed in parallel, then Oracle Database uses an additional group of query server processes.

  • CONNECTIONS = (DOP^2 + 2 x DOP)

    If your system is a cluster or MPP, then you should account for the number of instances because this increases the DOP. In other words, using a DOP of 4 on a two-instance cluster results in a DOP of 8. A value of PARALLEL_MAX_SERVERS times the number of instances divided by four is a conservative estimate to use as a starting point.

  • LOCAL = CONNECTIONS/INSTANCES

  • REMOTE = CONNECTIONS - LOCAL

Add this amount to your original setting for the shared pool. However, before setting a value for either of these memory structures, you must also consider additional memory for cursors, as explained in the following section.
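The formulas and terms above can be sketched in code; the parameter values used below (message size, users, groups, DOP, instances) are illustrative only:

```python
def connections(dop):
    # CONNECTIONS = (DOP^2 + 2 x DOP)
    return dop * dop + 2 * dop

def smp_buffer_bytes(size, users, groups, dop):
    # For SMP systems: 3 x size x users x groups x connections
    return 3 * size * users * groups * connections(dop)

def rac_buffer_bytes(size, users, groups, dop, instances):
    # For RAC/MPP: ((3 x local) + (2 x remote)) x (size x users x groups)
    # / instances, with LOCAL = CONNECTIONS/INSTANCES and
    # REMOTE = CONNECTIONS - LOCAL (integer division used here for clarity).
    conn = connections(dop)
    local = conn // instances
    remote = conn - local
    return ((3 * local) + (2 * remote)) * (size * users * groups) // instances

# 16 KB messages, 10 concurrent parallel users, 1 group, DOP 8:
print(smp_buffer_bytes(16384, 10, 1, 8))     # 39321600 bytes (~37.5 MB)
print(rac_buffer_bytes(16384, 10, 1, 8, 2))  # 16384000 bytes per instance
```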

Additional Memory for Cursors

Parallel execution plans consume more space in the SQL area than serial execution plans. You should regularly monitor shared pool resource use to ensure that the memory used by both messages and cursors can accommodate your system's processing requirements.

8.6.3.1.9 Monitor Memory Usage After Processing Begins

Whether you are using automated or manual tuning, you should monitor usage on an on-going basis to ensure the size of memory is not too large or too small.

The formulas in this section are just starting points. To ensure the correct memory size, tune the shared pool using the following query:

SELECT POOL, NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL LIKE '%pool%'
  GROUP BY ROLLUP (POOL, NAME);

Your output should resemble the following:

POOL        NAME                       SUM(BYTES) 
----------- -------------------------- ---------- 
shared pool Checkpoint queue                38496 
shared pool KGFF heap                        1964 
shared pool KGK heap                         4372 
shared pool KQLS heap                     1134432 
shared pool LRMPD SGA Table                 23856 
shared pool PLS non-lib hp                   2096 
shared pool PX subheap                     186828 
shared pool SYSTEM PARAMETERS               55756 
shared pool State objects                 3907808 
shared pool character set memory            30260 
shared pool db_block_buffers               200000 
shared pool db_block_hash_buckets           33132 
shared pool db_files                       122984 
shared pool db_handles                      52416 
shared pool dictionary cache               198216 
shared pool dlm shared memory             5387924 
shared pool event statistics per sess      264768 
shared pool fixed allocation callback        1376 
shared pool free memory                  26329104 
shared pool gc_*                            64000 
shared pool latch nowait fails or sle       34944 
shared pool library cache                 2176808 
shared pool log_buffer                      24576 
shared pool log_checkpoint_timeout          24700 
shared pool long op statistics array        30240 
shared pool message pool freequeue         116232 
shared pool miscellaneous                  267624 
shared pool processes                       76896 
shared pool session param values            41424 
shared pool sessions                       170016 
shared pool sql area                      9549116 
shared pool table columns                  148104 
shared pool trace_buffers_per_process     1476320 
shared pool transactions                    18480 
shared pool trigger inform                  24684 
shared pool                              52248968 
                                         90641768 

Evaluate the memory used as shown in your output, and alter the setting for SHARED_POOL_SIZE based on your processing needs.

To obtain more memory usage statistics, execute the following query:

SELECT * FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE 'Buffers%';

Your output should resemble the following:

STATISTIC                           VALUE 
-------------------                 ----- 
Buffers Allocated                   23225 
Buffers Freed                       23225 
Buffers Current                         0 
Buffers HWM                          3620 

The amount of memory used appears in the Buffers Current and Buffers HWM statistics. Calculate a value in bytes by multiplying the number of buffers by the value for PARALLEL_EXECUTION_MESSAGE_SIZE. Compare the high water mark to the parallel execution message pool size to determine if you allocated too much memory. For example, in the first output, the value for the parallel execution message pool (PX msg pool) is 38,092,812 bytes, or 38 MB. The Buffers HWM from the second output is 3,620, which when multiplied by a parallel execution message size of 4,096 is 14,827,520 bytes, or approximately 15 MB. In this case, the high water mark has reached approximately 40 percent of its capacity.
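The arithmetic in this example can be reproduced as follows:

```python
# Buffers HWM x PARALLEL_EXECUTION_MESSAGE_SIZE gives peak buffer memory.
buffers_hwm = 3620
message_size = 4096
pool_bytes = 38_092_812  # PX msg pool size from the example

hwm_bytes = buffers_hwm * message_size
print(hwm_bytes)                            # 14827520 (~15 MB)
print(round(100 * hwm_bytes / pool_bytes))  # 39: roughly 40 percent of capacity
```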

8.6.3.2 Parameters Affecting Resource Consumption

The parameters affecting resource consumption are discussed in this topic.

Note:

Before considering the following section, you should read the descriptions of the MEMORY_TARGET and MEMORY_MAX_TARGET initialization parameters for details. The PGA_AGGREGATE_TARGET initialization parameter need not be set as MEMORY_TARGET autotunes the SGA and PGA components.

The first group of parameters discussed in this section affects memory and resource consumption for all parallel operations, in particular, for parallel execution. These parameters are:

A second subset of parameters are discussed in Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL.

To control resource consumption, you should configure memory at two levels:

  • At the database level, so the system uses an appropriate amount of memory from the operating system.

  • At the operating system level for consistency.

    On some platforms, you might need to set operating system parameters that control the total amount of virtual memory available, totalled across all processes.

A large percentage of the memory used in data warehousing operations (compared to OLTP) is more dynamic. This memory comes from Process Global Area (PGA), and both the size of process memory and the number of processes can vary greatly. Use the PGA_AGGREGATE_TARGET initialization parameter to control both the process memory and the number of processes in such cases. Explicitly setting PGA_AGGREGATE_TARGET along with MEMORY_TARGET ensures that autotuning still occurs but PGA_AGGREGATE_TARGET is not tuned below the specified value.

See Also:

8.6.3.2.1 PGA_AGGREGATE_TARGET

You can enable automatic PGA memory management with the setting of initialization parameters, such as PGA_AGGREGATE_TARGET.

You can simplify and improve the way PGA memory is allocated by enabling automatic PGA memory management. In this mode, Oracle Database dynamically adjusts the size of the portion of the PGA memory dedicated to work areas, based on an overall PGA memory target explicitly set by the DBA. To enable automatic PGA memory management, you must set the initialization parameter PGA_AGGREGATE_TARGET. For new installations, PGA_AGGREGATE_TARGET and SGA_TARGET are set automatically by the Database Configuration Assistant (DBCA), and MEMORY_TARGET is zero; that is, automatic memory management is disabled. Automatic tuning of the aggregate PGA is therefore enabled by default. However, the aggregate PGA does not grow unless you enable automatic memory management by setting MEMORY_TARGET to a nonzero value.

See Also:

8.6.3.2.1.1 HASH_AREA_SIZE

This parameter has been deprecated.

HASH_AREA_SIZE has been deprecated and you should use PGA_AGGREGATE_TARGET instead. For information, refer to PGA_AGGREGATE_TARGET.

8.6.3.2.1.2 SORT_AREA_SIZE

This parameter has been deprecated.

SORT_AREA_SIZE has been deprecated and you should use PGA_AGGREGATE_TARGET instead. For information, refer to PGA_AGGREGATE_TARGET.

8.6.3.2.2 PARALLEL_EXECUTION_MESSAGE_SIZE

The PARALLEL_EXECUTION_MESSAGE_SIZE parameter specifies the size of the buffer used for parallel execution messages.

The default value of PARALLEL_EXECUTION_MESSAGE_SIZE is operating system-specific, but is typically 16 KB. This value should be adequate for most applications.

See Also:

Oracle Database Reference for information about the PARALLEL_EXECUTION_MESSAGE_SIZE initialization parameter

8.6.3.2.3 Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL

The parameters affecting resource consumption for parallel DML and parallel DDL operations are introduced in this topic.

The parameters that affect parallel DML and parallel DDL resource consumption are:

Parallel insert, update, and delete operations require more resources than serial DML operations. Similarly, PARALLEL CREATE TABLE AS SELECT and PARALLEL CREATE INDEX can require more resources. For this reason, you may need to increase the value of several additional initialization parameters. These parameters do not affect resources for queries.

See Also:

Oracle Database Reference for information about initialization parameters

8.6.3.2.3.1 TRANSACTIONS

The TRANSACTIONS parameter affects the number of transactions under parallel DML and DDL.

For parallel DML and DDL, each query server process starts a transaction. The parallel execution coordinator uses the two-phase commit protocol to commit transactions; therefore, the number of transactions being processed increases by the DOP. Consequently, you might need to increase the value of the TRANSACTIONS initialization parameter.

The TRANSACTIONS parameter specifies the maximum number of concurrent transactions. The default value of TRANSACTIONS assumes no parallelism. For example, if you have a DOP of 20, you have 20 more new server transactions (or 40, if you have two server sets) and 1 coordinator transaction. In this case, you should increase TRANSACTIONS by 21 (or 41) if the transactions are running in the same instance. If you do not set this parameter, Oracle Database sets it to a value equal to 1.1 x SESSIONS. This discussion does not apply if you are using server-managed undo.
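The sizing arithmetic described above can be sketched as follows; the SESSIONS value is hypothetical:

```python
def transactions_increase(dop, server_sets=1):
    # Each query server starts its own transaction, plus one
    # transaction for the parallel execution coordinator.
    return dop * server_sets + 1

print(transactions_increase(20))     # 21: DOP 20, one server set
print(transactions_increase(20, 2))  # 41: DOP 20, two server sets

# Default when TRANSACTIONS is unset: 1.1 x SESSIONS (SESSIONS is hypothetical).
sessions = 300
print(round(1.1 * sessions))  # 330
```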

8.6.3.2.3.2 FAST_START_PARALLEL_ROLLBACK

If a system fails when there are uncommitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by using the FAST_START_PARALLEL_ROLLBACK parameter.

The FAST_START_PARALLEL_ROLLBACK parameter controls the DOP used when recovering terminated transactions. Terminated transactions are transactions that were active before a system failure. By default, the DOP is chosen to be at most two times the value of the CPU_COUNT parameter.

If the default DOP is insufficient, set the parameter to HIGH. This gives a maximum DOP of at most four times the value of the CPU_COUNT parameter. This feature is available by default.
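The DOP ceilings implied by the LOW (default) and HIGH settings can be expressed in a simplified model:

```python
def max_recovery_dop(cpu_count, setting="LOW"):
    # Simplified model: LOW caps recovery DOP at 2 x CPU_COUNT (the default),
    # HIGH at 4 x CPU_COUNT.
    multipliers = {"LOW": 2, "HIGH": 4}
    return multipliers[setting] * cpu_count

print(max_recovery_dop(16))          # 32: default ceiling on a 16-CPU system
print(max_recovery_dop(16, "HIGH"))  # 64: ceiling with the HIGH setting
```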

8.6.3.2.3.3 DML_LOCKS

The DML_LOCKS parameter should be set to account for the number of locks held by a parallel DML operation.

The DML_LOCKS parameter specifies the maximum number of DML locks. Its value should equal the total number of locks on all tables referenced by all users. A parallel DML operation's lock requirement is very different from serial DML. Parallel DML holds many more locks, so you should increase the value of the DML_LOCKS parameter by equal amounts.

Note:

Parallel DML operations are not performed when the table lock of the target table is disabled.

Table 8-4 shows the types of locks acquired by coordinator and parallel execution server processes for different types of parallel DML statements. Using this information, you can determine the value required for these parameters.

Table 8-4 Locks Acquired by Parallel DML Statements

Type of Statement Coordinator Process Acquires: Each Parallel Execution Server Acquires:

Parallel UPDATE or DELETE into partitioned table; WHERE clause pruned to a subset of partitions or subpartitions

1 table lock SX

1 partition lock X for each pruned partition or subpartition

1 table lock SX

1 partition lock NULL for each pruned partition or subpartition owned by the query server process

1 partition-wait lock S for each pruned partition or subpartition owned by the query server process

Parallel row-migrating UPDATE into partitioned table; WHERE clause pruned to a subset of partitions or subpartitions

1 table lock SX

1 partition X lock for each pruned partition or subpartition

1 partition lock SX for all other partitions or subpartitions

1 table lock SX

1 partition lock NULL for each pruned partition or subpartition owned by the query server process

1 partition-wait lock S for each pruned partition owned by the query server process

1 partition lock SX for all other partitions or subpartitions

Parallel UPDATE, MERGE, DELETE, or INSERT into partitioned table

1 table lock SX

Partition locks X for all partitions or subpartitions

1 table lock SX

1 partition lock NULL for each partition or subpartition

1 partition-wait lock S for each partition or subpartition

Parallel INSERT into partitioned table; destination table with partition or subpartition clause

1 table lock SX

1 partition lock X for each specified partition or subpartition

1 table lock SX

1 partition lock NULL for each specified partition or subpartition

1 partition-wait lock S for each specified partition or subpartition

Parallel INSERT into nonpartitioned table

1 table lock X

None

Note:

Table, partition, and partition-wait DML locks all appear as TM locks in the V$LOCK view.

Consider a table with 600 partitions running with a DOP of 100. Assume all partitions are involved in a parallel UPDATE or DELETE statement with no row-migrations.

The coordinator acquires:

  • 1 table lock SX

  • 600 partition locks X

Total server processes acquire:

  • 100 table locks SX

  • 600 partition locks NULL

  • 600 partition-wait locks S
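The lock counts in this example can be tallied programmatically, following Table 8-4; this is an illustrative sketch for budgeting DML_LOCKS, not an Oracle API:

```python
def parallel_update_lock_counts(partitions, dop):
    # Per Table 8-4, for a parallel UPDATE or DELETE touching every partition
    # (no row migration): the coordinator takes 1 table lock SX plus one
    # partition lock X per partition; the servers collectively take one
    # table lock SX each, plus a partition NULL lock and a partition-wait
    # S lock per partition.
    coordinator = 1 + partitions
    servers = dop + partitions + partitions
    return coordinator, servers, coordinator + servers

coord, srv, total = parallel_update_lock_counts(600, 100)
print(coord, srv, total)  # 601 1300 1901
```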

8.6.3.3 Parameters Related to I/O

The parameters that affect I/O are introduced in this topic.

The parameters that affect I/O are:

These parameters also affect the optimizer, which ensures optimal performance for parallel execution of I/O operations.

See Also:

Oracle Database Reference for information about initialization parameters

8.6.3.3.1 DB_CACHE_SIZE

The DB_CACHE_SIZE parameter sets the size of the DEFAULT buffer pool for buffers with the primary block size.

When you perform parallel update, merge, and delete operations, the buffer cache behavior is very similar to any OLTP system running a high volume of updates.

8.6.3.3.2 DB_BLOCK_SIZE

The DB_BLOCK_SIZE parameter sets the size of Oracle database blocks.

The recommended value for this parameter is 8 KB or 16 KB.

Set the database block size when you create the database. If you are creating a new database, use a large block size such as 8 KB or 16 KB.

8.6.3.3.3 DB_FILE_MULTIBLOCK_READ_COUNT

The DB_FILE_MULTIBLOCK_READ_COUNT parameter determines how many database blocks are read with a single operating system READ call.

The default value corresponds to the maximum I/O size that can be performed efficiently. This maximum I/O size is platform-dependent and is 1 MB for most platforms. If you set DB_FILE_MULTIBLOCK_READ_COUNT to an excessively high value, your operating system lowers the value to the highest allowable level when you start your database.
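The capping behavior described above can be sketched as follows, assuming a 1 MB maximum I/O size:

```python
def effective_mbrc(requested, block_size, max_io=1024 * 1024):
    # The multiblock read count is capped so that one read never exceeds
    # the platform's maximum efficient I/O size.
    return min(requested, max_io // block_size)

print(effective_mbrc(256, 8192))  # 128: capped at 1 MB / 8 KB blocks
print(effective_mbrc(64, 8192))   # 64: under the cap, kept as requested
```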

8.6.3.3.4 DISK_ASYNCH_IO and TAPE_ASYNCH_IO

The DISK_ASYNCH_IO and TAPE_ASYNCH_IO parameters enable or disable the operating system's asynchronous I/O facility.

The recommended value for both the DISK_ASYNCH_IO and TAPE_ASYNCH_IO parameters is TRUE. These parameters enable query server processes to overlap I/O requests with processing when performing table scans. If the operating system supports asynchronous I/O, leave these parameters at the default value of TRUE. Figure 8-6 illustrates how asynchronous read works.

Asynchronous operations are currently supported for parallel table scans, hash joins, sorts, and serial table scans. However, this feature can require operating system-specific configuration and may not be supported on all platforms.

8.7 Monitoring Parallel Execution Performance

You should perform the following types of monitoring when trying to diagnose parallel execution performance problems.

These types of monitoring include:

See Also:

Oracle Database Reference for information about dynamic views

8.7.1 Monitoring Parallel Execution Performance with Dynamic Performance Views

You can monitor parallel execution performance with dynamic performance views.

The Oracle Database real-time SQL monitoring feature enables you to monitor the performance of SQL statements while they are executing. SQL monitoring starts automatically when a SQL statement runs in parallel or when it has consumed at least 5 seconds of CPU or I/O time in a single execution.

After your system has run for a few days, you should monitor parallel execution performance statistics to determine whether your parallel processing is optimal. Do this using any of the views discussed in this section.

In Oracle Real Application Clusters, global versions of the views described in this section aggregate statistics from multiple instances. The global views have names beginning with G, such as GV$FILESTAT for V$FILESTAT, and so on.

See Also:

Oracle Database SQL Tuning Guide for more information about monitoring performance

8.7.1.1 V$PX_BUFFER_ADVICE

You can monitor parallel execution performance with the V$PX_BUFFER_ADVICE dynamic performance view.

The V$PX_BUFFER_ADVICE view provides statistics on historical and projected maximum buffer usage by all parallel queries. You can consult this view to reconfigure SGA size in response to insufficient memory problems for parallel queries.

8.7.1.2 V$PX_SESSION

You can monitor parallel execution performance with the V$PX_SESSION dynamic performance view.

The V$PX_SESSION view shows data about query server sessions, groups, sets, and server numbers. It also displays real-time data about the processes working on behalf of parallel execution. This table includes information about the requested degree of parallelism (DOP) and the actual DOP granted to the operation.

8.7.1.3 V$PX_SESSTAT

You can monitor parallel execution performance with the V$PX_SESSTAT dynamic performance view.

The V$PX_SESSTAT view provides a join of the session information from V$PX_SESSION and the V$SESSTAT table. Thus, all session statistics available to a standard session are available for all sessions performed using parallel execution.

8.7.1.4 V$PX_PROCESS

You can monitor parallel execution performance with the V$PX_PROCESS dynamic performance view.

The V$PX_PROCESS view contains information about the parallel processes, including status, session ID, process ID, and other information.

8.7.1.5 V$PX_PROCESS_DETAIL

V$PX_PROCESS_DETAIL lists statistics for each of the active PX (Parallel Execution) servers on an instance.

This dynamic performance view is similar to V$PX_PROCESS, but provides more detailed information. It may offer additional insight into PX process management in the database, which can help you to better adjust the environment to the actual workload.

8.7.1.6 V$PX_PROCESS_SYSSTAT

You can monitor parallel execution performance with the V$PX_PROCESS_SYSSTAT dynamic performance view.

The V$PX_PROCESS_SYSSTAT view shows the status of query servers and provides buffer allocation statistics.

8.7.1.7 V$PQ_SESSTAT

You can monitor parallel execution performance with the V$PQ_SESSTAT dynamic performance view.

The V$PQ_SESSTAT view shows the status of all current server groups in the system, such as data about how queries allocate processes and how the multiuser and load balancing algorithms are affecting the default and hinted values.

You might need to adjust some parameter settings to improve performance after reviewing data from these views. In this case, refer to the discussion of Tuning General Parameters for Parallel Execution. Query these views periodically to monitor the progress of long-running parallel operations.

For many dynamic performance views, you must set the parameter TIMED_STATISTICS to TRUE in order for Oracle Database to collect statistics for each view. You can use the ALTER SYSTEM or ALTER SESSION statements to turn TIMED_STATISTICS on and off.

8.7.1.8 V$PQ_TQSTAT

You can monitor parallel execution performance with the V$PQ_TQSTAT dynamic performance view.

As a simple example, consider a hash join between two tables, with a join on a column with only two distinct values. At best, this hash function distributes rows with one join value to parallel execution server A and rows with the other value to parallel execution server B. A DOP of two is fine, but if it is four, then at least two parallel execution servers have no work. To discover this type of deviation, use a query similar to the following example:

SELECT dfo_number, tq_id, server_type, process, num_rows
FROM V$PQ_TQSTAT ORDER BY dfo_number DESC, tq_id, server_type, process;

The best way to resolve this problem might be to choose a different join method; a nested loop join might be the best option. Alternatively, if one join table is small relative to the other, a BROADCAST distribution method can be hinted using the PQ_DISTRIBUTE hint. The optimizer considers the BROADCAST distribution method, but requires OPTIMIZER_FEATURES_ENABLE to be set to 9.0.2 or higher.

Now, assume that you have a join key with high cardinality, but one value contains most of the data, for example, lava lamp sales by year. The only year that had big sales was 1968, and the parallel execution server for the 1968 records is overwhelmed. You should use the same corrective actions as described in the previous paragraph.

The V$PQ_TQSTAT view provides a detailed report of message traffic at the table queue level. V$PQ_TQSTAT data is valid only when queried from a session that is executing parallel SQL statements. A table queue is the pipeline between query server groups, between the parallel execution coordinator and a query server group, or between a query server group and the coordinator. The table queues are represented explicitly in the operation column by PX SEND <partitioning type> (for example, PX SEND HASH) and PX RECEIVE.

V$PQ_TQSTAT has a row for each query server process that reads from or writes to each table queue. A table queue connecting 10 consumer processes to 10 producer processes has 20 rows in the view. Total the bytes column and group by TQ_ID, the table queue identifier, to obtain the total number of bytes sent through each table queue. Compare this to the optimizer estimates; large variations might indicate a need to analyze the data using a larger sample.

Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whether the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.
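The skew check described above can be sketched as follows, using hypothetical (TQ_ID, process, bytes) rows in place of a live V$PQ_TQSTAT query:

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical rows standing in for V$PQ_TQSTAT output: (tq_id, process, bytes).
rows = [(0, "P000", 500_000), (0, "P001", 480_000),   # balanced table queue
        (1, "P002", 900_000), (1, "P003", 100_000)]   # skewed table queue

by_tq = defaultdict(list)
for tq_id, _process, nbytes in rows:
    by_tq[tq_id].append(nbytes)

# Total bytes and variance per table queue; a large variance flags
# a workload imbalance worth investigating.
for tq_id, counts in sorted(by_tq.items()):
    print(tq_id, sum(counts), pvariance(counts))
```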

8.7.1.9 V$RSRC_CONS_GROUP_HISTORY

You can monitor parallel execution performance with the V$RSRC_CONS_GROUP_HISTORY dynamic performance view.

The V$RSRC_CONS_GROUP_HISTORY view displays a history of consumer group statistics for each entry in V$RSRC_PLAN_HISTORY that has a non-NULL plan, including information about parallel statement queuing.

8.7.1.10 V$RSRC_CONSUMER_GROUP

You can monitor parallel execution performance with the V$RSRC_CONSUMER_GROUP dynamic performance view.

The V$RSRC_CONSUMER_GROUP view displays data related to currently active resource consumer groups, including information about parallel statements.

8.7.1.11 V$RSRC_PLAN

You can monitor parallel execution performance with the V$RSRC_PLAN dynamic performance view.

The V$RSRC_PLAN view displays the names of all currently active resource plans, including the state of parallel statement queuing.

8.7.1.12 V$RSRC_PLAN_HISTORY

You can monitor parallel execution performance with the V$RSRC_PLAN_HISTORY dynamic performance view.

The V$RSRC_PLAN_HISTORY view displays a history of when a resource plan was enabled, disabled, or modified on the instance. The history includes the state of parallel statement queuing.

8.7.1.13 V$RSRC_SESSION_INFO

You can monitor parallel execution performance with the V$RSRC_SESSION_INFO dynamic performance view.

The V$RSRC_SESSION_INFO view displays resource manager statistics per session, including parallel statement queue statistics. Columns include PQ_SERVERS and PQ_STATUS.

The PQ_SERVERS column of the V$RSRC_SESSION_INFO view contains the number of active parallel servers if the session is active and running the parallel query. If the query is queued, the number of parallel servers that this query is trying to run with is shown.

The PQ_STATUS column records the reason that a parallel statement is queued.

See Also:

Oracle Database Reference for information about the V$RSRC_SESSION_INFO view

8.7.1.14 V$RSRCMGRMETRIC

You can monitor parallel execution performance with the V$RSRCMGRMETRIC dynamic performance view.

The V$RSRCMGRMETRIC view displays statistics related to parallel statement queuing.

Statistics related to parallel statement queuing are added to the resource manager metrics, which take statistics for a given one-minute window and retain them for approximately one hour.

Columns include AVG_ACTIVE_PARALLEL_STMTS, AVG_QUEUED_PARALLEL_STMTS, AVG_ACTIVE_PARALLEL_SERVERS, AVG_QUEUED_PARALLEL_SERVERS, and PARALLEL_SERVERS_LIMIT.
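Inspecting the queuing metrics for the current window can be sketched as follows (a sketch; CONSUMER_GROUP_NAME is assumed to be among the view's columns, per Oracle Database Reference):

```sql
SELECT consumer_group_name,
       avg_active_parallel_stmts, avg_queued_parallel_stmts,
       avg_active_parallel_servers, avg_queued_parallel_servers,
       parallel_servers_limit
  FROM V$RSRCMGRMETRIC;
```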

See Also:

Oracle Database Reference for information about the V$RSRCMGRMETRIC view

8.7.2 Monitoring Session Statistics

You can monitor session statistics with the dynamic performance views to diagnose parallel execution performance.

Use GV$PX_SESSION to determine the configuration of the server group executing in parallel. In this example, session 9 is the query coordinator, while sessions 7 and 21 are in the first group, first set. Sessions 18 and 20 are in the first group, second set. The requested and granted DOP for this query is 2, as shown by the output from the following query:

SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET "Set",
  DEGREE "Degree", REQ_DEGREE "Req Degree"
FROM GV$PX_SESSION ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;

Your output should resemble the following:

QCSID      SID        Inst       Group      Set        Degree     Req Degree 
---------- ---------- ---------- ---------- ---------- ---------- ---------- 
         9          9          1 
         9          7          1          1          1          2          2 
         9         21          1          1          1          2          2 
         9         18          1          1          2          2          2 
         9         20          1          1          2          2          2 

For a single instance, query V$PX_SESSION and do not include the INST_ID column.
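The single-instance form of the query can be sketched as:

```sql
SELECT QCSID, SID, SERVER_GROUP "Group", SERVER_SET "Set",
  DEGREE "Degree", REQ_DEGREE "Req Degree"
FROM V$PX_SESSION ORDER BY QCSID, SERVER_GROUP, SERVER_SET;
```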

The processes shown in the output from the previous example using GV$PX_SESSION collaborate to complete the same task. The next example shows the execution of a join query to determine the progress of these processes in terms of physical reads. Use this query to track any specific statistic:

SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET "Set",
  NAME "Stat Name", VALUE
FROM GV$PX_SESSTAT A, V$STATNAME B
WHERE A.STATISTIC# = B.STATISTIC# AND NAME LIKE 'physical reads'
  AND VALUE > 0 ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;

Your output should resemble the following:

QCSID  SID   Inst   Group  Set    Stat Name          VALUE      
------ ----- ------ ------ ------ ------------------ ---------- 
     9     9      1               physical reads           3863 
     9     7      1      1      1 physical reads              2 
     9    21      1      1      1 physical reads              2 
     9    18      1      1      2 physical reads              2 
     9    20      1      1      2 physical reads              2 

Use the previous type of query to track statistics in V$STATNAME. Repeat this query as often as required to observe the progress of the query server processes.

The next query uses V$PX_PROCESS to check the status of the query servers.

SELECT * FROM V$PX_PROCESS;

Your output should resemble the following:

SERV STATUS    PID    SPID      SID    SERIAL# IS_GV   CON_ID
---- --------- ------ --------- ------ ------- -----  -------
P002 IN USE        16     16955     21   7729  FALSE        0
P003 IN USE        17     16957     20   2921  FALSE        0
P004 AVAILABLE     18     16959                FALSE        0
P005 AVAILABLE     19     16962                FALSE        0
P000 IN USE        12      6999     18   4720  FALSE        0
P001 IN USE        13      7004      7    234  FALSE        0

See Also:

Monitoring Parallel Execution Performance with Dynamic Performance Views for descriptions of the dynamic performance views used in the examples

8.7.3 Monitoring System Statistics

You can monitor system statistics with the dynamic performance views to diagnose parallel execution performance.

The V$SYSSTAT and V$SESSTAT views contain several statistics for monitoring parallel execution. Use these statistics to track the number of parallel queries, DMLs, DDLs, data flow operators (DFOs), and operations. Each query, DML, or DDL can have multiple parallel operations and multiple DFOs.

In addition, statistics also count the number of query operations for which the DOP was reduced, or downgraded, due to either the adaptive multiuser algorithm or the depletion of available parallel execution servers.

Finally, statistics in these views also count the number of messages sent on behalf of parallel execution. The following syntax is an example of how to display these statistics:

SELECT NAME, VALUE FROM GV$SYSSTAT
  WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
  OR UPPER (NAME) LIKE '%PARALLELIZED%' OR UPPER (NAME) LIKE '%PX%';

Your output should resemble the following:

NAME                                               VALUE      
-------------------------------------------------- ---------- 
queries parallelized                                      347 
DML statements parallelized                                 0 
DDL statements parallelized                                 0 
DFO trees parallelized                                    463 
Parallel operations not downgraded                         28 
Parallel operations downgraded to serial                   31 
Parallel operations downgraded 75 to 99 pct               252 
Parallel operations downgraded 50 to 75 pct               128 
Parallel operations downgraded 25 to 50 pct                43 
Parallel operations downgraded 1 to 25 pct                 12 
PX local messages sent                                  74548 
PX local messages recv'd                                74128 
PX remote messages sent                                     0 
PX remote messages recv'd                                   0 

The following query shows the current wait state of each worker (child process) and query coordinator process on the system:

SELECT px.SID "SID", p.PID, p.SPID "SPID", px.INST_ID "Inst",
       px.SERVER_GROUP "Group", px.SERVER_SET "Set",
       px.DEGREE "Degree", px.REQ_DEGREE "Req Degree", w.event "Wait Event"
FROM GV$SESSION s, GV$PX_SESSION px, GV$PROCESS p, GV$SESSION_WAIT w
WHERE s.sid (+) = px.sid AND s.inst_id (+) = px.inst_id AND
      s.sid = w.sid (+) AND s.inst_id = w.inst_id (+) AND
      s.paddr = p.addr (+) AND s.inst_id = p.inst_id (+)
ORDER BY DECODE(px.QCINST_ID,  NULL, px.INST_ID,  px.QCINST_ID), px.QCSID, 
DECODE(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP), px.SERVER_SET, px.INST_ID;

8.7.4 Monitoring Operating System Statistics

There is considerable overlap between information available in Oracle Database and information available through operating system utilities, such as sar and vmstat on UNIX-based systems.

Operating systems provide performance statistics on I/O, communication, CPU, memory and paging, scheduling, and synchronization primitives. The V$SESSTAT view provides the major categories of operating system statistics as well.

Typically, operating system information about I/O devices and semaphore operations is harder to map back to database objects and operations than is Oracle Database information. However, some operating systems have good visualization tools and efficient means of collecting the data.

Operating system information about CPU and memory usage is very important for assessing performance. Probably the most important statistic is CPU usage. The goal of low-level performance tuning is to become CPU bound on all CPUs. After this is achieved, you can work at the SQL level to find an alternate plan that might be more I/O intensive but use less CPU.

Operating system memory and paging information is valuable for fine tuning the many system parameters that control how memory is divided among memory-intensive data warehouse subsystems like parallel communication, sort, and hash join.

8.8 Tips for Tuning Parallel Execution

This section discusses various ideas for improving performance in a parallel execution environment.

This section contains the following topics:

8.8.1 Implementing a Parallel Execution Strategy

Implementing a good parallel execution strategy is important to ensure high performance.

Recommendations for a good strategy include:

  • Implement a simple setup to understand what is happening in your system.

  • Use resource manager to specify the maximum degree of parallelism (DOP) for consumer groups so that each group is allotted a specific amount of processing resources without overwhelming the system. A resource management policy is needed when using parallel execution to keep the system under control, and to ensure SQL statements are able to execute in parallel.

  • Base your strategy on the amount of system resources you want to make available for parallel execution. Adjust the values of the parameters PARALLEL_MAX_SERVERS and PARALLEL_SERVERS_TARGET to limit the number of parallel execution (PX) servers running in the system.

  • Consider taking an ELT (Extract, Load, and Transform) strategy rather than an ETL (Extract, Transform, and Load) strategy.

  • Use external tables with a parallel SQL statement, such as CTAS or IAS, for faster data loads.
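The resource-related recommendations above can be sketched as follows (all parameter values and object names are illustrative, not prescriptive; sales_ext is a hypothetical external table):

```sql
-- Limit the parallel execution server pool system-wide
ALTER SYSTEM SET PARALLEL_MAX_SERVERS = 128;
ALTER SYSTEM SET PARALLEL_SERVERS_TARGET = 64;

-- ELT-style load: parallel CTAS directly from an external table
CREATE TABLE sales_stage PARALLEL NOLOGGING AS
  SELECT * FROM sales_ext;
```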

8.8.2 Optimizing Performance by Creating and Populating Tables in Parallel

To optimize parallel execution performance for queries that retrieve large result sets, create and populate tables in parallel.

Oracle Database cannot return results to a user process in parallel. If a query returns a large number of rows, execution of the query might indeed be faster. However, the user process can receive the rows only serially. To optimize parallel execution performance for queries that retrieve large result sets, use PARALLEL CREATE TABLE AS SELECT or direct-path INSERT to store the result set in the database. At a later time, users can view the result set serially.

Performing the SELECT in parallel does not influence the CREATE statement. If the CREATE statement is executed in parallel, however, the optimizer tries to make the SELECT run in parallel also.

When combined with the NOLOGGING option, the parallel version of CREATE TABLE AS SELECT provides a very efficient intermediate table facility, for example:

CREATE TABLE summary PARALLEL NOLOGGING AS SELECT dim_1, dim_2 ..., 
SUM (meas_1)
FROM facts GROUP BY dim_1, dim_2;

These tables can also be incrementally loaded with parallel INSERT. You can take advantage of intermediate tables using the following techniques:

  • Common subqueries can be computed once and referenced many times. This can allow some queries against star schemas (in particular, queries without selective WHERE-clause predicates) to be better parallelized. Star queries with selective WHERE-clause predicates using the star-transformation technique can be effectively parallelized automatically without any modification to the SQL.

  • Decompose complex queries into simpler steps to provide application-level checkpoint or restart. For example, a complex multitable join on a one terabyte database could run for dozens of hours. A failure during this query would mean starting over from the beginning. Using CREATE TABLE AS SELECT or PARALLEL INSERT AS SELECT, you can rewrite the query as a sequence of simpler queries that run for a few hours each. If a system failure occurs, the query can be restarted from the last completed step.

  • Implement manual parallel delete operations efficiently by creating a new table that omits the unwanted rows from the original table, and then dropping the original table. Alternatively, you can use the convenient parallel delete feature, which directly deletes rows from the original table.

  • Create summary tables for efficient multidimensional drill-down analysis. For example, a summary table might store the sum of revenue grouped by month, brand, region, and salesperson.

  • Reorganize tables, eliminating chained rows, compressing free space, and so on, by copying the old table to a new table. This is much faster than export/import and easier than reloading.

Be sure to use the DBMS_STATS package to gather optimizer statistics on newly created tables. To avoid I/O bottlenecks, specify a tablespace that is striped across at least as many physical disks as CPUs. To avoid fragmentation in allocating space, the number of files in a tablespace should be a multiple of the number of CPUs.
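Gathering optimizer statistics on a newly created table can be sketched as follows (the schema and table names are illustrative):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DW', tabname => 'SUMMARY');
END;
/
```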

See Also:

Oracle Database Data Warehousing Guide for information about parallel execution in data warehouses

8.8.3 Using EXPLAIN PLAN to Show Parallel Operations Plans

Use the EXPLAIN PLAN statement to see the execution plans for parallel queries.

The EXPLAIN PLAN output shows optimizer information in the COST, BYTES, and CARDINALITY columns. You can also use the utlxplp.sql script to present the EXPLAIN PLAN output with all relevant parallel information.

There are several ways to optimize the parallel execution of join statements. You can alter system configuration, adjust parameters as discussed earlier in this chapter, or use hints, such as the PQ_DISTRIBUTE hint.

The key points when using EXPLAIN PLAN are to:

  • Verify optimizer selectivity estimates. If the optimizer thinks that only one row is produced from a query, it tends to favor using a nested loop. This could be an indication that the tables are not analyzed or that the optimizer has made an incorrect estimate about the correlation of multiple predicates on the same table. Extended statistics or a hint may be required to provide the optimizer with the correct selectivity or to force the optimizer to use another join method.

  • Watch for hash joins on low cardinality join keys. If a join key has few distinct values, then a hash join may not be optimal. If the number of distinct values is less than the degree of parallelism (DOP), then some parallel query servers may be unable to work on the particular query.

  • Consider data skew. If a join key involves excessive data skew, a hash join may require some parallel query servers to work more than others. Consider using a hint to cause a BROADCAST distribution method if the optimizer did not choose it. The optimizer considers the BROADCAST distribution method only if the OPTIMIZER_FEATURES_ENABLE is set to 9.0.2 or higher. See V$PQ_TQSTAT for more information.
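A distribution hint can be sketched as follows (the table and column names are illustrative; the choice of BROADCAST assumes the dimension table is small relative to the fact table):

```sql
-- Broadcast the small dimension table to all PX servers scanning sales
SELECT /*+ PARALLEL(8) PQ_DISTRIBUTE(d BROADCAST NONE) */
       s.amount, d.dept_name
  FROM sales s, departments d
 WHERE s.dept_id = d.dept_id;
```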

8.8.3.1 Example: Using EXPLAIN PLAN to Show Parallel Operations

You can use EXPLAIN PLAN to show parallel operations.

The following example illustrates how the optimizer intends to execute a parallel query:

explain plan for 
  SELECT /*+ PARALLEL */ cust_first_name, cust_last_name 
   FROM customers c, sales s WHERE c.cust_id = s.cust_id;

----------------------------------------------------------
| Id  | Operation                       |  Name          |
----------------------------------------------------------
|   0 | SELECT STATEMENT                |                |
|   1 |  PX COORDINATOR                 |                |
|   2 |   PX SEND QC (RANDOM)           | :TQ10000       |
|   3 |    NESTED LOOPS                 |                |
|   4 |     PX BLOCK ITERATOR           |                |
|   5 |      TABLE ACCESS FULL          | CUSTOMERS      |
|   6 |     PARTITION RANGE ALL         |                |
|   7 |      BITMAP CONVERSION TO ROWIDS|                |
|   8 |       BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
----------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2

8.8.4 Additional Considerations for Parallel DML

Additional considerations when using parallel DML operations are introduced in this topic.

When you refresh your data warehouse using parallel insert, update, or delete operations, there are additional issues to consider when designing the physical database. These considerations do not affect parallel execution operations. These issues are:

8.8.4.1 Parallel DML and Direct-Path Restrictions

The restrictions for parallel DML and direct-path operations are identified in this topic.

If a parallel restriction is violated, then the operation is simply performed serially. If a direct-path INSERT restriction is violated, then the APPEND hint is ignored and a conventional insert operation is performed. No error message is returned.

8.8.4.2 Limitation on the Degree of Parallelism

There are certain limitations on the degree of parallelism based on the software level of Oracle Database in use.

For tables that do not have the parallel DML itl invariant property (tables created before Oracle9i Release 2 (9.2) or tables that were created with the COMPATIBLE initialization parameter set to less than 9.2), the degree of parallelism (DOP) equals the number of partitions or subpartitions. That means that, if the table is not partitioned, the query runs serially. To determine which tables do not have this property, issue the following statement:

SELECT u.name, o.name FROM obj$ o, tab$ t, user$ u
 WHERE o.obj# = t.obj# AND o.owner# = u.user#
 AND bitand(t.property,536870912) != 536870912;

See Also:

Oracle Database Concepts for information about the interested transaction list (ITL), also called the transaction table

8.8.4.3 When to Increase INITRANS

You should increase the value of INITRANS under certain situations.

If you have global indexes, a global index segment and global index blocks are shared by server processes of the same parallel DML statement. Even if the operations are not performed against the same row, the server processes can share the same index blocks. Each server transaction needs one transaction entry in the index block header before it can make changes to a block.

In this situation, when using the CREATE INDEX or ALTER INDEX statements, you should set INITRANS, the initial number of transactions allocated within each data block, to a large value, such as the maximum DOP against this index.
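Setting INITRANS at index creation can be sketched as follows (the index name, column, and values are illustrative):

```sql
CREATE INDEX sales_cust_gidx ON sales (cust_id)
  INITRANS 16   -- roughly the maximum DOP expected against this index
  PARALLEL 16;
```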

8.8.4.4 Limitation on Available Number of Transaction Free Lists for Segments

There is a limitation on the available number of transaction free lists for segments in dictionary-managed tablespaces.

After a segment has been created, the number of process and transaction free lists is fixed and cannot be altered. If you specify a large number of process free lists in the segment header, you might find that this limits the number of transaction free lists that are available. You can abate this limitation the next time you re-create the segment header by decreasing the number of process free lists; this leaves more room for transaction free lists in the segment header.

For UPDATE and DELETE operations, each server process can require its own transaction free list. The parallel DML DOP is thus effectively limited by the smallest number of transaction free lists available on the table and on any of the global indexes the DML statement must maintain. For example, if the table has 25 transaction free lists and the table has two global indexes, one with 50 transaction free lists and one with 30 transaction free lists, the DOP is limited to 25. If the table had 40 transaction free lists, the DOP would have been limited to 30.

The FREELISTS parameter of the STORAGE clause is used to set the number of process free lists. By default, no process free lists are created.

The default number of transaction free lists depends on the block size. For example, if the number of process free lists is not set explicitly, a 4 KB block has about 80 transaction free lists by default. The minimum number of transaction free lists is 25.

8.8.4.5 Multiple Archivers for Large Numbers of Redo Logs

Multiple archiver processes are needed for archiving large numbers of redo logs.

Parallel DDL and parallel DML operations can generate a large number of redo logs. A single ARCH process to archive these redo logs might not be able to keep up. To avoid this problem, you can spawn multiple archiver processes manually or by using a job queue.
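Increasing the number of archiver processes can be sketched as follows (the value is illustrative):

```sql
ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES = 4;
```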

8.8.4.6 Database Writer Process (DBWn) Workload

There are situations when you should increase the number of database writer processes.

Parallel DML operations use a large number of data, index, and undo blocks in the buffer cache during a short interval. For example, suppose you see a high number of free_buffer_waits after querying the V$SYSTEM_EVENT view, as in the following syntax:

SELECT TOTAL_WAITS FROM V$SYSTEM_EVENT WHERE EVENT = 'FREE BUFFER WAITS';

In this case, you should consider increasing the DBWn processes. If there are no waits for free buffers, the query does not return any rows.
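Increasing the database writer count can be sketched as follows (DB_WRITER_PROCESSES is a static parameter, so the change takes effect at the next restart; the value is illustrative):

```sql
ALTER SYSTEM SET DB_WRITER_PROCESSES = 4 SCOPE = SPFILE;
```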

8.8.4.7 [NO]LOGGING Clause

Understand the considerations when setting the [NO]LOGGING clause.

The [NO]LOGGING clause applies to tables, partitions, tablespaces, and indexes. Virtually no log is generated for certain operations (such as direct-path INSERT) if the NOLOGGING clause is used. The NOLOGGING attribute is not specified at the INSERT statement level but is instead specified when using the ALTER or CREATE statement for a table, partition, index, or tablespace.

When a table or index has NOLOGGING set, neither parallel nor serial direct-path INSERT operations generate redo logs. Processes running with the NOLOGGING option set run faster because no redo is generated. However, after a NOLOGGING operation against a table, partition, or index, if a media failure occurs before a backup is performed, then all tables, partitions, and indexes that have been modified might be corrupted.

Direct-path INSERT operations (except for dictionary updates) never generate redo logs if the NOLOGGING clause is used. The NOLOGGING attribute does not affect undo, only redo. To be precise, NOLOGGING allows the direct-path INSERT operation to generate a negligible amount of redo (range-invalidation redo, as opposed to full image redo).

For backward compatibility, [UN]RECOVERABLE is still supported as an alternate keyword with the CREATE TABLE statement. This alternate keyword might not be supported, however, in future releases.

At the tablespace level, the logging clause specifies the default logging attribute for all tables, indexes, and partitions created in the tablespace. When an existing tablespace logging attribute is changed by the ALTER TABLESPACE statement, then all tables, indexes, and partitions created after the ALTER statement have the new logging attribute; existing ones do not change their logging attributes. The tablespace-level logging attribute can be overridden by the specifications at the table, index, or partition level.

The default logging attribute is LOGGING. However, if you have put the database in NOARCHIVELOG mode, by issuing ALTER DATABASE NOARCHIVELOG, then all operations that can be done without logging do not generate logs, regardless of the specified logging attribute.
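Setting the logging attribute can be sketched as follows (the table name is illustrative):

```sql
ALTER TABLE sales_hist NOLOGGING;
-- ... perform direct-path loads, then back up the table and restore logging:
ALTER TABLE sales_hist LOGGING;
```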

8.8.5 Optimizing Performance by Creating Indexes in Parallel

You can optimize performance by creating indexes in parallel.

Multiple processes can work simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index serially.

Parallel index creation works in much the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on a key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel execution coordinator simply concatenates the pieces (which are ordered) to form the final index.

Parallel local index creation uses a single server set. Each server process in the set is assigned a table partition to scan and for which to build an index partition. Because half as many server processes are used for a given DOP, parallel local index creation can be run with a higher DOP. However, the DOP is restricted to be less than or equal to the number of index partitions you want to create. To avoid this limitation, you can use the DBMS_PCLXUTIL package.

You can optionally specify that no redo and undo logging should occur during index creation. This can significantly improve performance but temporarily renders the index unrecoverable. Recoverability is restored after the new index is backed up. If your application can tolerate a window where recovery of the index requires it to be re-created, then you should consider using the NOLOGGING clause.

The PARALLEL clause in the CREATE INDEX statement is the only way in which you can specify the DOP for creating the index. If the DOP is not specified in the parallel clause of the CREATE INDEX statement, then the number of CPUs is used as the DOP. If there is no PARALLEL clause, index creation is done serially.
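A parallel, minimally logged index creation can be sketched as follows (the names and DOP are illustrative):

```sql
CREATE INDEX sales_time_idx ON sales (time_id)
  PARALLEL 8
  NOLOGGING;
```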

When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some extents might be trimmed, and the resulting index might be smaller than the requested 60 MB.

When you add or enable a UNIQUE or PRIMARY KEY constraint on a table, you cannot automatically create the required index in parallel. Instead, manually create an index on the desired columns, using the CREATE INDEX statement and an appropriate PARALLEL clause, and then add or enable the constraint. Oracle Database then uses the existing index when enabling or adding the constraint.
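The index-then-constraint sequence can be sketched as follows (the names are illustrative):

```sql
-- Build the index in parallel first
CREATE UNIQUE INDEX emp_id_uix ON employees (employee_id) PARALLEL 8;
-- Oracle Database then uses the existing index for the constraint
ALTER TABLE employees ADD CONSTRAINT emp_id_pk PRIMARY KEY (employee_id);
```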

Multiple constraints on the same table can be enabled concurrently and in parallel if all the constraints are in the ENABLE NOVALIDATE state. In the following example, the ALTER TABLE ENABLE CONSTRAINT statement performs the table scan that checks the constraint in parallel:

CREATE TABLE a (a1 NUMBER CONSTRAINT ach CHECK (a1 > 0) ENABLE NOVALIDATE)
PARALLEL; 
INSERT INTO a values (1);
COMMIT;
ALTER TABLE a ENABLE CONSTRAINT ach;

8.8.6 Parallel DML Tips

The tips for parallel DML functionality are introduced in this topic.

The topics covered include:

See Also:

8.8.6.1 Parallel DML Tip 1: INSERT

Parallel DML when using the SQL INSERT statement is discussed in this topic.

The functionality available using an INSERT statement can be summarized as shown in Table 8-5:

Table 8-5 Summary of INSERT Features

Insert Type: Conventional

  • Parallel: No. See the text in this section for information about using the NOAPPEND hint with parallel DML enabled to perform a parallel conventional insert.

  • Serial: Yes

  • NOLOGGING: No

Insert Type: Direct-path INSERT (APPEND)

  • Parallel: Yes, but requires ALTER SESSION ENABLE PARALLEL DML or the ENABLE_PARALLEL_DML SQL hint to enable parallel DML mode, and one of the following:

      – Table PARALLEL attribute or PARALLEL hint to explicitly set parallelism

      – APPEND hint to explicitly set mode

    Alternatively, ALTER SESSION FORCE PARALLEL DML forces parallel DML mode.

  • Serial: Yes, but requires the APPEND hint.

  • NOLOGGING: Yes, but requires the NOLOGGING attribute set for the partition or table.

If parallel DML is enabled and there is a PARALLEL hint or PARALLEL attribute set for the table in the data dictionary, then insert operations are parallel and appended, unless a restriction applies. If either the PARALLEL hint or PARALLEL attribute is missing, the insert operation is performed serially. Automatic DOP parallelizes the DML part of a SQL statement only if parallel DML is enabled or forced.

If parallel DML is enabled, then you can use the NOAPPEND hint to perform a parallel conventional insert operation. For example, you can use /*+ noappend parallel */ with the SQL INSERT statement to perform a parallel conventional insert.

SQL> INSERT /*+ NOAPPEND PARALLEL */ INTO sales_hist SELECT * FROM sales;

The advantage of the parallel conventional insert operation is the ability to perform online operations with none of the restrictions of direct-path INSERT. The disadvantage of the parallel conventional insert operation is that this process may be slower than direct-path INSERT.

8.8.6.2 Parallel DML Tip 2: Direct-Path INSERT

Parallel DML when using Direct-Path INSERT operations is discussed in this topic.

The append mode is the default during a parallel insert operation. Data is always inserted into a new block, which is allocated to the table. Using the APPEND hint is optional. You should use append mode to increase the speed of INSERT operations, but not when space utilization must be optimized. You can use NOAPPEND to override append mode.

The APPEND hint applies to both serial and parallel insert operations. Serial insertions are also faster if you use this hint. The APPEND hint, however, does require more space and locking overhead.

You can use NOLOGGING with APPEND to make the process even faster. NOLOGGING means that no redo log is generated for the operation. NOLOGGING is never the default; use it when you want to optimize performance. It should not typically be used when recovery is needed for the table or partition. If recovery is needed, be sure to perform a backup immediately after the operation. Use the ALTER TABLE [NO]LOGGING statement to set the appropriate value.
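Combining the two options can be sketched as follows (the table names are illustrative; back up sales_hist immediately after the load):

```sql
ALTER TABLE sales_hist NOLOGGING;
INSERT /*+ APPEND */ INTO sales_hist SELECT * FROM sales;
COMMIT;
```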

8.8.6.3 Parallel DML Tip 3: Parallelizing INSERT, MERGE, UPDATE, and DELETE

Parallel DML when using insert, merge, update, and delete operations is discussed in this topic.

When the table or partition has the PARALLEL attribute in the data dictionary, that attribute setting is used to determine parallelism of INSERT, UPDATE, and DELETE statements and queries. An explicit PARALLEL hint for a table in a statement overrides the effect of the PARALLEL attribute in the data dictionary.

You can use the NO_PARALLEL hint to override a PARALLEL attribute for the table in the data dictionary. In general, hints take precedence over attributes.

DML operations are considered for parallelization if the session has been enabled in the PARALLEL DML mode with the ALTER SESSION ENABLE PARALLEL DML statement or a specific SQL statement has been enabled in the PARALLEL DML mode with the ENABLE_PARALLEL_DML hint. The mode does not affect parallelization of queries or of the query portions of a DML statement.
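Enabling parallel DML can be sketched either per session or per statement (the table and column names are illustrative):

```sql
-- Session-level: all subsequent DML is considered for parallelization
ALTER SESSION ENABLE PARALLEL DML;

-- Statement-level: enable parallel DML for one statement only
UPDATE /*+ ENABLE_PARALLEL_DML PARALLEL(sales) */ sales
   SET amount_sold = amount_sold * 1.05;
```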

8.8.6.3.1 Parallelizing INSERT SELECT

In the INSERT ... SELECT statement, you can specify a PARALLEL hint after the INSERT keyword, in addition to the hint after the SELECT keyword.

The PARALLEL hint after the INSERT keyword applies to the INSERT operation only, and the PARALLEL hint after the SELECT keyword applies to the SELECT operation only. Thus, parallelism of the INSERT and SELECT operations are independent of each other. If one operation cannot be performed in parallel, it has no effect on whether the other operation can be performed in parallel.

The ability to parallelize insert operations causes a change in existing behavior if the user has explicitly enabled the session for parallel DML and if the table in question has a PARALLEL attribute set in the data dictionary entry. In that case, existing INSERT SELECT statements that have the select operation parallelized can also have their insert operation parallelized.

If you query multiple tables, you can specify multiple SELECT PARALLEL hints and multiple PARALLEL attributes.

Example 8-5 shows the addition of the new employees who were hired after the acquisition of ACME.

Example 8-5 Parallelizing INSERT SELECT

INSERT /*+ PARALLEL(employees) */ INTO employees
SELECT /*+ PARALLEL(ACME_EMP) */ *  FROM ACME_EMP;

The APPEND keyword is not required in this example because it is implied by the PARALLEL hint.

8.8.6.3.2 Parallelizing UPDATE and DELETE

The PARALLEL hint (placed immediately after the UPDATE or DELETE keyword) applies not only to the underlying scan operation, but also to the UPDATE or DELETE operation.

Alternatively, you can specify UPDATE or DELETE parallelism in the PARALLEL clause specified in the definition of the table to be modified.
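As a sketch of the table-definition alternative, assuming parallel DML has been enabled for the session, a PARALLEL attribute on the table lets the UPDATE run in parallel without any statement-level hint (the degree of 4 is illustrative):

```sql
ALTER TABLE employees PARALLEL 4;   -- PARALLEL clause on the table definition
ALTER SESSION ENABLE PARALLEL DML;

-- Both the scan and the UPDATE operation are candidates for parallel execution
UPDATE employees SET salary = salary * 1.1 WHERE job_id = 'CLERK';
```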

If you have explicitly enabled parallel DML for the session or transaction, UPDATE or DELETE statements that have their query operation parallelized can also have their UPDATE or DELETE operation parallelized. Any subqueries or updatable views in the statement can have their own separate PARALLEL hints or clauses, but these parallel directives do not affect the decision to parallelize the UPDATE or DELETE. If the subqueries or views cannot be processed in parallel, that has no effect on whether the UPDATE or DELETE portion can be performed in parallel.

Example 8-6 shows the update operation to give a 10 percent salary raise to all clerks in Dallas.

Example 8-6 Parallelizing UPDATE and DELETE

UPDATE /*+ PARALLEL(employees) */ employees
 SET salary=salary * 1.1 WHERE job_id='CLERK' AND department_id IN
  (SELECT department_id FROM DEPARTMENTS WHERE location_id = 'DALLAS');

The PARALLEL hint is applied to the UPDATE operation and to the scan.

Example 8-7 shows the removal of all products of category 39 because that business line was recently spun off into a separate company.

Example 8-7 Parallelizing UPDATE and DELETE

DELETE /*+ PARALLEL(PRODUCTS) */ FROM PRODUCTS 
  WHERE category_id = 39;

Again, the parallelism is applied to the scan and DELETE operations on the products table.

8.8.7 Incremental Data Loading in Parallel

Parallel DML combined with the updatable join views facility provides an efficient solution for refreshing the tables of a data warehouse system.

To refresh tables is to update them with the differential data generated from the OLTP production system.

In the following example, assume a refresh of a table named customers that has columns c_key, c_name, and c_addr. The differential data contains either new rows or rows that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the production system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named diff_customer, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task. You can use the APPEND hint when loading in parallel as well.
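A minimal SQL*Loader setup for this load might look as follows; the control file name, data file name, and field delimiter are illustrative assumptions:

```
-- diff_customer.ctl
LOAD DATA
INFILE 'diff_customer.dat'
APPEND
INTO TABLE diff_customer
FIELDS TERMINATED BY ','
(c_key, c_name, c_addr)
```

The load can then be invoked with the direct and parallel options, for example `sqlldr userid=scott control=diff_customer.ctl direct=true parallel=true` (a parallel direct-path load requires the APPEND loading method shown in the control file).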

After diff_customer is loaded, the refresh process can be started. It can be performed in two phases or by merging in parallel, as demonstrated in the following:

8.8.7.1 Optimizing Performance for Updating the Table in Parallel

This topic explains how to optimize performance when updating a table in parallel.

The following statement is a straightforward SQL implementation of the update using subqueries:

UPDATE customers SET (c_name, c_addr) = (SELECT c_name, c_addr
  FROM diff_customer WHERE diff_customer.c_key = customers.c_key)
  WHERE c_key IN (SELECT c_key FROM diff_customer);

Unfortunately, the two subqueries in this statement affect performance.

An alternative is to rewrite this query using updatable join views. To rewrite the query, you must first add a primary key constraint to the diff_customer table to ensure that the modified columns map to a key-preserved table:

CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_key) PARALLEL NOLOGGING;

ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);

You can then update the customers table with the following SQL statement:

UPDATE /*+ PARALLEL(cust_joinview) */
  (SELECT /*+ PARALLEL(customers) PARALLEL(diff_customer) */
   customers.c_name AS c_name, customers.c_addr AS c_addr,
   diff_customer.c_name AS c_newname, diff_customer.c_addr AS c_newaddr
   FROM customers, diff_customer
   WHERE customers.c_key = diff_customer.c_key) cust_joinview
  SET c_name = c_newname, c_addr = c_newaddr;

The underlying scans feeding the join view cust_joinview are done in parallel. You can then parallelize the update to further improve performance, but only if the customers table is partitioned.

8.8.7.2 Efficiently Inserting the New Rows into the Table in Parallel

This topic explains how to efficiently insert new rows into a table in parallel.

The last phase of the refresh process consists of inserting the new rows from the diff_customer temporary table to the customers table. Unlike the update case, you cannot avoid having a subquery in the INSERT statement:

INSERT /*+ PARALLEL(customers) */ INTO customers
  SELECT * FROM diff_customer
  WHERE diff_customer.c_key NOT IN (SELECT /*+ HASH_AJ */ c_key FROM customers);

However, you can direct the optimizer to transform the subquery into a hash anti-join by using the HASH_AJ hint, as shown above. Doing so enables you to use parallel INSERT to execute the statement efficiently. Parallel INSERT is applicable even if the table is not partitioned.
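To verify that the anti-join transformation took place, you can inspect the execution plan; this sketch uses the standard EXPLAIN PLAN facility and the DBMS_XPLAN package:

```sql
EXPLAIN PLAN FOR
  INSERT /*+ PARALLEL(customers) */ INTO customers
    SELECT * FROM diff_customer
    WHERE c_key NOT IN (SELECT /*+ HASH_AJ */ c_key FROM customers);

-- Look for a HASH JOIN ANTI step in the displayed plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```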

8.8.7.3 Optimizing Performance by Merging in Parallel

This topic explains how to optimize performance by merging in parallel.

You can combine update and insert operations into one statement, commonly known as a merge, as shown in the following example.

MERGE INTO customers USING diff_customer
  ON (diff_customer.c_key = customers.c_key)
WHEN MATCHED THEN
  UPDATE SET c_name = diff_customer.c_name, c_addr = diff_customer.c_addr
WHEN NOT MATCHED THEN
  INSERT VALUES (diff_customer.c_key, diff_customer.c_name, diff_customer.c_addr);

The SQL statement in the previous example achieves the same result as all of the statements in Optimizing Performance for Updating the Table in Parallel and Efficiently Inserting the New Rows into the Table in Parallel.
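To run the merge itself in parallel rather than serially, the same parallel DML machinery applies. The following is a sketch, assuming the customers columns described earlier (c_key, c_name, c_addr):

```sql
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(customers) */ INTO customers USING diff_customer
  ON (diff_customer.c_key = customers.c_key)
WHEN MATCHED THEN
  UPDATE SET c_name = diff_customer.c_name, c_addr = diff_customer.c_addr
WHEN NOT MATCHED THEN
  INSERT VALUES (diff_customer.c_key, diff_customer.c_name, diff_customer.c_addr);
```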