Oracle® Database VLDB and Partitioning Guide 11g Release 2 (11.2) Part Number E16541-05 |
|
|
View PDF |
Parallel execution is the ability to apply multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on OLTP system for batch processing or schema maintenance operations such as index creation. Parallel execution is sometimes called parallelism. Parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. An example of this is when four processes combine to calculate the total sales for a year, each process handle one quarter of the year instead of a single processing handling all four quarters by itself. The improvement in performance can be quite significant. Parallel execution improves processing for:
Queries requiring large table scans, joins, or partitioned index scans
Creation of large indexes
Creation of large tables (including materialized views)
Bulk inserts, updates, merges, and deletes
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects (LOBs).
This section contains the following topics:
Parallel execution benefits systems with all of the following characteristics:
Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers
If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.
The benefits of parallel execution can be seen in DSS and data warehousing environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple DML or SELECT
statements that characterize OLTP applications would not see any benefit from being executed in parallel.
Parallel execution is not normally useful for:
Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.
Environments in which the CPU, memory, or I/O resources are heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution does not yield any benefits and indeed may be detrimental to performance.
Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. It is very I/O intensive by nature. To achieve optimal performance, each component in the hardware configuration must be sized to sustain the same level of throughput: from the CPUs and the Host Bus Adapters (HBAs) in the compute nodes, to the switches, and on into the I/O subsystem, including the storage controllers and the physical disks. If the system is an Oracle Real Application Cluster (Oracle RAC) then the interconnect also has to be size appropriately. The weakest link is going to limit the performance and scalability of operations in a configuration.
It is recommended to measure the maximum I/O performance a hardware configuration can achieve without the Oracle database. This measurement can be used as a baseline for the future system performance evaluations. Remember, it is not possible for parallel execution to achieve better I/O throughput then the underlying hardware can sustain. Oracle Database provides a free calibration tool called Orion, which is designed to measure the I/O performance of a system by simulating Oracle I/O workloads. A parallel execution typically performs large random I/Os.
You can use parallel execution for any of the following:
Access methods
Some examples are table scans, index fast full scans, and partitioned index range scans.
Join methods
Some examples are nested loop, sort merge, hash, and star transformation.
DDL statements
Some examples are CREATE
TABLE
AS
SELECT
, CREATE
INDEX
, REBUILD
INDEX
, REBUILD
INDEX
PARTITION
, and MOVE
/SPLIT
/COALESCE
PARTITION
.
You can normally use parallel DDL where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB
columns.
All of these DDL operations can be performed in NOLOGGING
mode for either parallel or serial execution.
The CREATE
TABLE
statement for an index-organized table can be parallelized either with or without an AS
SELECT
clause.
Different parallelism is used for different operations. Parallel CREATE
(partitioned) TABLE
AS
SELECT
and parallel CREATE
(partitioned) INDEX
statements run with a degree of parallelism (DOP) equal to the number of partitions.
DML statements
Some examples are INSERT
AS
SELECT
, updates, deletes, and MERGE
operations.
Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT
... SELECT
statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML.
Although data manipulation language normally includes queries, the term parallel DML refers only to inserts, updates, merges, and deletes done in parallel.
Parallel query
You can parallelize queries and subqueries in SELECT
statements, plus the query portions of DDL statements and DML statements (INSERT
, UPDATE
, DELETE
, and MERGE
).
Miscellaneous SQL operations
Some examples are GROUP
BY
, NOT
IN
, SELECT
DISTINCT
, UNION
, UNION
ALL
, CUBE
, and ROLLUP
, plus aggregate and table functions.
SQL*Loader
You can parallelize the use of SQL*Loader, where large amounts of data are routinely encountered. To speed up your loads, you can use a parallel direct-path load as in the following example:
sqlldr CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE sqlldr CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE sqlldr CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
Where you provide your userid and password on the command line. You can also use a parameter file to achieve the same result.
An important point to remember is that indexes are not maintained during a parallel load.