5
Parallelism and Partitioning

Data warehouses often contain large tables, and require techniques for both managing these large tables and providing good query performance across these large tables. This chapter discusses two key techniques for addressing these needs.

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS). You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems.

Note:
Parallel execution is available only with the Oracle8i Enterprise Edition.

This chapter explains how to implement parallel execution and tune your system to optimize parallel execution performance. The following topics are discussed:

Overview of Parallel Execution Tuning

Parallel execution is useful for many types of operations accessing significant amounts of data. Parallel execution improves processing for:

Large table scans and joins
Creation of large indexes
Partitioned index scans
Bulk inserts, updates, and deletes
Aggregations and copying

You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access LOBs (large binary objects).

Parallel execution benefits systems if they have all of the following characteristics:

Symmetric multi-processors (SMP), clusters, or massively parallel systems
Sufficient I/O bandwidth
Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
Sufficient memory to support additional memory-intensive processes such as sorts, hashing, and I/O buffers

If your system lacks any of these characteristics, parallel execution may not significantly improve performance. In fact, parallel execution can reduce system performance on overutilized systems or systems with small I/O bandwidth.

Note:
The term parallel execution server designates server processes, or threads on NT systems, that perform parallel operations. This is not the same as the Oracle Parallel Server option, which refers to multiple Oracle instances accessing the same database.

When to Implement Parallel Execution

Parallel execution provides the greatest performance improvements in decision support systems (DSS). Online transaction processing (OLTP) systems also benefit from parallel execution, but usually only during batch processing.

During the day, most OLTP systems should probably not use parallel execution. During off-hours, however, parallel execution can effectively process high-volume batch operations. For example, a bank might use parallelized batch programs to perform millions of updates to apply interest to accounts.

The more common example of using parallel execution is for DSS. Complex queries, such as those involving joins of several tables or searches of very large tables, are often best executed in parallel.

See Chapter 18, "Tuning Parallel Execution", for further information regarding parallel execution.

Tuning Physical Database Layouts

This section describes how to tune the physical database layout for optimal performance of parallel execution. The following topics are discussed:

Types of Parallelism

Different parallel operations use different types of parallelism. The optimal physical database layout depends on what parallel operations are most prevalent in your application.

The basic unit of parallelism is a called a granule. The operation being parallelized (a table scan, table update, or index creation, for example) is divided by Oracle into granules. Parallel execution processes execute the operation one granule at a time. The number of granules and their size affect the degree of parallelism (DOP) you can use. It also affects how well the work is balanced across query server processes.

Block Range Granules

Block range granules are the basic unit of most parallel operations. This is true even on partitioned tables; it is the reason why, on Oracle, the parallel degree is not related to the number of partitions.

Block range granules are ranges of physical blocks from a table. Because they are based on physical data addresses, Oracle can size block range granules to allow better load balancing. Block range granules permit dynamic parallelism that does not depend on static preallocation of tables or indexes. On SMP (symmetric multi-processors) systems, granules are located on different devices to drive as many disks as possible. On many MPP (massively parallel processing) systems, block range granules are preferentially assigned to query server processes that have physical proximity to the disks storing the granules. Block range granules are also used with global striping.

When block range granules are used predominantly for parallel access to a table or index, administrative considerations (such as recovery or using partitions for deleting portions of data) may influence partition layout more than performance considerations. The number of disks that you stripe partitions over should be at least equal to the value of the DOP so that parallel execution efficiency is not reduced when or if partition pruning occurs.

Partition Granules

When partition granules are used, a query server process works on an entire partition or subpartition of a table or index. Because partition granules are statically determined when a table or index is created, partition granules do not allow as much flexibility in parallelizing an operation. This means that the allowable DOP might be limited, and that load might not be well balanced across query server processes.

Partition granules are the basic unit of parallel index range scans and parallel operations that modify multiple partitions of a partitioned table or index. These operations include parallel update, parallel delete, parallel direct-load insert into partitioned tables, parallel creation of partitioned indexes, and parallel creation of partitioned tables.

When partition granules are used for parallel access to a table or index, it is important that there be a relatively large number of partitions (ideally, three times the DOP), so Oracle can effectively balance work across the query server processes.

Partitioning Data

This section describes the partitioning features that significantly enhance data access and greatly improve overall applications performance. This is especially true for applications accessing tables and indexes with millions of rows and many gigabytes of data.

Partitioned tables and indexes facilitate administrative operations by allowing these operations to work on subsets of data. For example, you can add a new partition, organize an existing partition, or drop a partition with less than a second of interruption to a read-only application.

Using the partitioning methods described in this section can help you tune SQL statements to avoid unnecessary index and table scans (using partition pruning). You can also improve the performance of massive join operations when large amount of data (for example, several million rows) are joined together by using partition-wise joins. Finally, partitioning data greatly improves manageability of very large databases and dramatically reduces the time required for administrative tasks such as backup and restore.

Types of Partitioning

Oracle offers three partitioning methods:

Range
Hash
Composite

Each partitioning method has a different set of advantages and disadvantages. Thus, each method is appropriate for a particular situation.

Range Partitioning

Range partitioning maps data to partitions based on boundaries identified by ranges of column values that you establish for each partition. This method is often useful for applications that manage historical data, especially data warehouses.

Hash Partitioning

Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to a partitioning key identified by the user. The hashing algorithm evenly distributes rows among partitions. Therefore, the resulting set of partitions should be approximately of the same size. This makes hash partitioning ideal for distributing data evenly across devices. Hash partitioning is also a good and easy-to-use alternative to range partitioning when data is not historical in content.

Note:
You cannot define alternate hashing algorithms for partitions.

Composite Partitioning

Composite partitioning combines the features of range and hash partitioning. With composite partitioning, Oracle first distributes data into partitions according to boundaries established by the partition ranges. Then Oracle further divides the data into subpartitions within each range partition. Oracle uses a hashing algorithm to distribute data into the subpartitions.

Index Partitioning

You can create both local and global indexes on a table partitioned by range, hash, or composite. Local indexes inherit the partitioning attributes of their related tables. For example, if you create a local index on a composite table, Oracle automatically partitions the local index using the composite method.

Oracle supports only range partitioning for global indexes. You cannot partition global indexes using the hash or composite partitioning methods.

Performance Issues for Range, Hash, and Composite Partitioning

The following section describes performance issues for range, hash, and composite partitioning.

Performance Considerations for Range Partitioning

Range partitioning is a convenient method for partitioning historical data. The boundaries of range partitions define the ordering of the partitions in the tables or indexes.

The most common use of range partitioning leverages the partitioning of data into time intervals on a column of type DATE. Because of this, SQL statements accessing range partitions tend to focus on time frames. An example of this is a SQL statement similar to "select data from a particular period in time". In such a scenario, if each partition represents one month's worth of data, the query "find data of month 98-DEC" needs to access only the December partition of year 98. This reduces the amount of data scanned to a fraction of the total data available. This optimization method is called 'partition pruning'.

Range partitioning is also ideal when you periodically load new data and purge old data. This adding or dropping of partitions is a major manageability enhancement.

It is common to keep a rolling window of data, for example keeping the past 36 months of data online. Range partitioning simplifies this process. To add a new month's data you load it into a separate table, clean the data, index it, and then add it to the range-partitioned table using the EXCHANGE PARTITION command, all while the table remains online. Once you add the new partition, you can drop the trailing month with the DROP PARTITION command.

In conclusion, consider using range partitioning when:

Very large tables are frequently scanned by a range predicate on a column that is a good partitioning column, such as ORDER_DATE or PURCHASE_DATE. Partitioning the table on that column would enable partitioning pruning.
You want to maintain a rolling window of data
You cannot complete administrative operations on large tables, such as backup and restore, in an allotted time frame
You need to implement parallel DML (PDML) operations

The following SQL example creates the table Sales for a period of two years, 1994 and 1995, and partitions it by range according to the column s_saledate to separate the data into eight quarters, each corresponding to a partition:

CREATE TABLE sales
  (s_productid NUMBER,
   s_saledate DATE,
   s_custid NUMBER,
   s_totalprice NUMBER)
PARTITION BY RANGE(s_saledate)
 (PARTITION sal94q1 VALUES LESS THAN TO_DATE (01-APR-1994, DD-MON-YYYY),
  PARTITION sal94q2 VALUES LESS THAN TO_DATE (01-JUL-1994, DD-MON-YYYY),
  PARTITION sal94q3 VALUES LESS THAN TO_DATE (01-OCT-1994, DD-MON-YYYY),
  PARTITION sal94q4 VALUES LESS THAN TO_DATE (01-JAN-1995, DD-MON-YYYY),
  PARTITION sal95q1 VALUES LESS THAN TO_DATE (01-APR-1995, DD-MON-YYYY),
  PARTITION sal95q2 VALUES LESS THAN TO_DATE (01-JUL-1995, DD-MON-YYYY),
  PARTITION sal95q3 VALUES LESS THAN TO_DATE (01-OCT-1995, DD-MON-YYYY),
  PARTITION sal95q4 VALUES LESS THAN TO_DATE (01-JAN-1996, DD-MON-YYYY));

Performance Considerations for Hash Partitioning

Unlike range partitioning, the way in which Oracle distributes data in hash partitions does not correspond to a business, or logical, view of the data. Therefore, hash partitioning is not an effective way to manage historical data. However, hash partitions share some performance characteristics of range partitions, such as using partition pruning is limited to equality predicates. You can also use partition-wise joins, parallel index access and PDML.

See Also:
Partition-wise joins are described later in this chapter under the heading "Partition-wise Joins".

As a general rule, use hash partitioning:

To improve the availability and manageability of large tables or to enable PDML, in tables that do not store historical data (where range partitioning is not appropriate).
To avoid data skew among partitions. Hash partitioning is an effective means of distributing data, because Oracle hashes the data into a number of partitions, each of which can reside on a separate device. Thus, data is evenly spread over as many devices as required to maximize I/O throughput. Similarly, you can use hash partitioning to evenly distribute data among the nodes of an MPP platform that uses the Oracle Parallel Server.
If it is important to use partition pruning and partition-wise joins according to a partitioning key.

Note:
In hash partitioning, partition pruning is limited to using equality or IN-list predicates.

If you add or merge a hashed partition, Oracle automatically rearranges the rows to reflect the change in the number of partitions and subpartitions. The hash function that Oracle uses is especially designed to limit the cost of this reorganization. Instead of reshuffling all the rows in the table, Oracles uses an 'add partition' logic that splits one and only one of the existing hashed partitions. Conversely, Oracle coalesces a partition by merging two existing hashed partitions.

Although this dramatically improves the manageability of hash partitioned tables, it means that the hash function can cause a skew if the number of partitions of a hash partitioned table, or the number of subpartitions in each partition of a composite table, is not a power of 2. If you do not quantify the number of partitions by a power of 2, in the worst case the largest partition can be twice the size of the smallest. So for optimal performance, create partitions, or subpartitions per partition, using a power of two. For example, 2, 4, 8, 16, 32, 64, 128, and so on.

The following example creates four hashed partitions for the table Sales using the column s_productid as the partition key:

CREATE TABLE sales
  (s_productid NUMBER,
   s_saledate DATE,
   s_custid NUMBER,
   s_totalprice NUMBER)
PARTITION BY HASH(s_productid)
PARTITIONS 4;

Specify the partition names only if you want some of the partitions to have different properties than the table. Otherwise, Oracle automatically generates internal names for the partitions. Also, you can use the STORE IN clause to assign partitions to tablespaces in a round-robin manner.

Performance Considerations for Composite Partitioning

Composite partitioning offers the benefits of both range and hash partitioning. With composite partitioning, Oracle first partitions by range, and then within each range Oracle creates subpartitions and distributes data within them using a hashing algorithm. Oracle uses the same hashing algorithm to distribute data among the hash subpartitions of composite partitioned tables as it does for hash partitioned tables.

Data placed in composite partitions is logically ordered only in terms of the partition boundaries you use to define the range level partitions. The partitioning of data within each partition has no logical organization beyond the identity of the partition to which the subpartitions belong.

Consequently, tables and local indexes partitioned using the composite method:

Support historical data at the partition level
Support the use of subpartitions as units of parallelism for parallel operations such as PDML, for example, space management and backup and recovery
Are eligible for partition pruning and partition-wise joins on the range and hash dimensions

Using Composite Partitioning

Use the composite partitioning method for tables and local indexes if:

Partitions must have a logical meaning to efficiently support historical data
The contents of a partition may be spread across multiple tablespaces, devices, or nodes (of an MPP system)
You need to use both partition pruning and partition-wise joins even when the pruning and join predicates use different columns of the partitioned table
You want to use a degree of parallelism that is greater than the number of partitions for backup, recovery, and parallel operations

Most large tables in a data warehouse should use range partitioning. Composite partitioning should be used for very large tables, or for data warehouses with a well-defined need for the conditions above. When using the composite method, Oracle stores each subpartition on a different segment. Thus, the subpartitions may have properties that are different from the properties of the table or the partition to which the subpartitions belong.

The following SQL example partitions the table Sales by range on the column s_saledate to create four partitions. This takes advantage of ordering data by a time frame. Then within each range partition, the data is further subdivided into four subpartitions by hash on the column s_productid.

CREATE TABLE sales(
  s_productid NUMBER,
  s_saledate DATE,
  s_custid NUMBER,
  s_totalprice)
   PARTITION BY RANGE (s_saledate)
   SUBPARTITION BY HASH (s_productid) SUBPARTITIONS 4
  (PARTITION sal94q1 VALUES LESS THAN TO_DATE (01-APR-1994, DD-MON-YYYY),
   PARTITION sal94q2 VALUES LESS THAN TO_DATE (01-JUL-1994, DD-MON-YYYY),
   PARTITION sal94q3 VALUES LESS THAN TO_DATE (01-OCT-1994, DD-MON-YYYY),
   PARTITION sal94q4 VALUES LESS THAN TO_DATE (01-JAN-1995, DD-MON-YYYY));

Each hashed subpartition contains sales of a single quarter ordered by product code. The total number of subpartitions is 16.

Partition Pruning

Partition pruning is a very important performance feature for data warehouses. In partition pruning, the cost-based optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. This allows Oracle to perform operations only on partitions relevant to the SQL statement. Oracle does this when you use range, equality, and IN-list predicates on the range partitioning columns, and equality and IN-list predicates on the hash partitioning columns.

Partition pruning can also dramatically reduce the amount of data retrieved from disk and reduce processing time. This results in substantial improvements in query performance and resource utilization. If you partition the index and table on different columns (with a global, partitioned index), partition pruning also eliminates index partitions even when the underlying table's partitions cannot be eliminated.

On composite partitioned objects, Oracle can prune at both the range partition level and hash subpartition level using the relevant predicates. For example, referring to the table Sales from the previous example, partitioned by range on the column
s_saledate and subpartitioned by hash on column s_productid, consider the following SQL statement:

SELECT * FROM sales 
WHERE s_saledate BETWEEN TO_DATE(01-JUL-1994, DD-MON-YYYY) AND
TO_DATE(01-OCT-1994, DD-MON-YYYY) AND s_productid = 1200;

Oracle uses the predicate on the partitioning columns to perform partition pruning as follows:

When using range partitioning, Oracle accesses only partitions sal94q2 and sal94q3
When using hash partitioning, Oracle accesses only the third partition, h3, where rows with s_productid equal to 1200 are mapped

Pruning Using DATE Columns

In the previous example, the date value was fully specified, 4 digits for year, using the TO_DATE function. While this is the recommended format for specifying date values, the optimizer can prune partitions using the predicates on s_saledate when you use other formats, as in the following examples:

SELECT * FROM sales 
WHERE s_saledate BETWEEN TO_DATE(01-JUL-1994, DD-MON-YY) AND
TO_DATE(01-OCT-1994, DD-MON-YY) AND s_productid = 1200;
   
SELECT * FROM sales 
WHERE s_saledate BETWEEN '01-JUL-1994' AND
'01-OCT-1994' AND s_productid = 1200;

However, you will not be able to see which partitions Oracle is accessing as is usually shown on the partition_start and partition_stop columns of the EXPLAIN PLAN command output on the SQL statement. Instead, you will see the keyword 'KEY' for both columns.

Avoiding I/O Bottlenecks

To avoid I/O bottlenecks, when Oracle is not scanning all partitions because some have been eliminated by pruning, spread each partition over several devices. On MPP systems, spread those devices over multiple nodes.

Partition-wise Joins

Partition-wise joins reduce query response time by minimizing the amount of data exchanged among query servers when joins execute in parallel. This significantly reduces response time and resource utilization, both in terms of CPU and memory. In Oracle Parallel Server (OPS) environments, it also avoids or at least limits the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations.

There are two variations of partition-wise join, full and partial, as discussed under the following headings.

Full Partition-wise Joins

A full partition-wise join divides a large join into smaller joins between a pair of partitions from the two joined tables. To use this feature, you must equi-partition both tables on their join keys. For example, consider a large join between a sales table and a customer table on the column customerid. The query "find the records of all customers who bought more than 100 articles in Quarter 3 of 1994" is a typical example of a SQL statement performing such a join. The following is an example of this:

SELECT c_customer_name, COUNT(*)
FROM sales, customer 
  WHERE s_customerid = c_customerid 
     AND s_saledate BETWEEN TO_DATE(01-jul-1994, DD-MON-YYYY) AND 
  TO_DATE(01-oct-1994, DD-MON-YYYY)
GROUP BY c_customer_name HAVING
COUNT(*) > 100;

This is a very large join typical in data warehousing environments. The entire customer table is joined with one quarter of the sales data. In large data warehouse applications, it might mean joining millions of rows. The join method to use in that case is obviously a hash join. But you can reduce the processing time for this hash join even more if both tables are equi-partitioned on the customerid column. This enables a full partition-wise join.

When you execute a full partition-wise join in parallel, the granule of parallelism, as described under "Types of Parallelism", is a partition. As a result, the degree of parallelism is limited to the number of partitions. For example, you should have at least 16 partitions to set the degree of parallelism of the query to 16.

You can use various partitioning methods to equi-partition both tables on the column customerid with 16 partitions. These methods are described in the following subsections.

Hash - Hash

This is the simplest method: the Customer and Sales tables are both partitioned by hash into 16 partitions, on s_customerid and c_customerid respectively. This partitioning method should enable full partition-wise join when the tables are joined on the customerid column.

In serial, this join is performed between a pair of matching hash partitions at a time: when one partition pair has been joined, the join of another partition pair begins. The join completes when the 16 partition pairs have been processed.

Note:
A pair of matching hash partitions is defined as one partition from each table with the same partition number. For example, with full partition-wise joins we join partition 0 of Sales with partition 0 of customer, partition 1 of Sales with partition 1 of Customer, and so on.

Parallel execution of a full partition-wise join is a straightforward parallelization of the serial execution. Instead of joining one partition pair at a time, 16 partition pairs are joined in parallel by the 16 query servers. Figure 5-1 illustrates the parallel execution of a full partition-wise join.

Figure 5-1 Parallel Execution of A Full Partition-wise Join

In Figure 5-1, we assume that the degree of parallelism and the number of partitions are the same, in other words, 16 for both. It is possible to have more partitions than the degree of parallelism to improve load balancing and limit possible skew in the execution. If you have more partitions than query servers, when one query server is done with the join of one pair of partitions, it requests that the query coordinator give it another pair to join. This process repeats until all pairs have been processed. This method allows dynamic load balancing when the number of partition pairs is greater than the degree of parallelism, for example, 64 partitions with a degree of parallelism of 16.

Note:
Always use a number of partitions that is a multiple of the degree of parallelism.

In Oracle Parallel Server environments running on shared-nothing platforms or MPPs, partition placements on nodes is critical to achieving good scalability. To avoid remote I/O, both matching partitions should have affinity to the same node. Partition pairs should be spread over all nodes to avoid bottlenecks and to use all CPU resources available on the system.

You can, however, have node host multiple pairs when there are more pairs than nodes. For example, with an 8-node system and 16 partition pairs, each node should receive two pairs.

See Also:
For more information on data affinity, please refer to Oracle8i Parallel Server Concepts.

Composite - Hash

This method is a variation of the hash-hash method. The sales table is a typical example of a table storing historical data. For all the reasons mentioned under the heading "Performance Considerations for Range Partitioning", a more logical partitioning method for sales is probably the range method, not the hash method.

For example, assume you want to partition the Sales table by range on the column s_saledate into 8 partitions. Also assume you have two years' of data and each partition represents a quarter. Instead of range partitioning you can use composite to enable a full partition-wise join while preserving the partitioning on s_saledate. Do this by partitioning the Sales table by range on s_saledate and then by subpartitioning each partition by hash on s_customerid using 16 subpartitions per partition, for a total of 128 subpartitions. The customer table can still use hash partitioning with 16 partitions.

With that new partitioning method, a full partition-wise join works similarly to the hash/hash method. The join is still divided into 16 smaller joins between hash partition pairs from both tables. The difference is that now each hash partition in the Sales table is composed of a set of 8 subpartitions, one from each range partition.

Figure 5-2 illustrates how the hash partitions are formed in the Sales table. In it, each cell represents a subpartition. Each row corresponds to one range partition, for a total of 8 range partitions. Each range partition has 16 subpartitions. Symmetrically, each column on the figure corresponds to one hash partition for a total of 16 hash partitions; each hash partition has 8 subpartitions. Note that hash partitions can be defined only if all partitions have the same number of subpartitions, in this case, 16.

Hash partitions in a composite table are implicit. However, Oracle does not record them in the data dictionary, and you cannot manipulate them with DDL commands as you can range partitions.

Figure 5-2 Range and Hash Partitions of A Composite Table

This partitioning method is effective because it allows you to combine pruning (on s_salesdate) with a full partition-wise join (on customerid). In the previous example query, pruning is achieved by only scanning the subpartitions corresponding to Q3 of 1994, in other words, row number 3 on Figure 5-2. Oracle them joins these subpartitions with the customer table using a full partition-wise join.

All characteristics of the hash-hash method also apply to the composite-hash method. In particular for this example, these two points are common to both methods:

The degree of parallelism for this full partition-wise join cannot exceed 16. This is because even though the Sales table has 128 subpartitions, it has only 16 hash partitions.
The same rules for data placement on MPP systems apply here. The only difference is that a hash partition is now a collection of subpartitions. You must ensure that all these subpartitions are placed on the same node with the matching hash partition from the other table. For example, in Figure 5-2, you should store hash partition 9 of the Sales table shown by the eight circled subpartitions, on the same node as hash partition 9 of the Customer table.

Composite - Composite (Hash Dimension)

If needed, you can also partition the Customer table by composite. For example, you can partition it by range on a zip code column to enable pruning based on zip code. You should then subpartition it by hash on customerid to enable a partition-wise join on the hash dimension.

Range - Range

You can also use partition-wise joins for range partitioning. However, this is more complex to implement because you must know your data's distribution before performing the join. Furthermore, this can lead to data skew during the execution if you do not correctly identify the partition bounds so that you have partitions of equal size.

The basic principle for using range-range is the same as for hash-hash: you must equi-partition both tables. This means that the number of partitions must be the same and the partition bounds must be identical. For example, assume that you know in advance that you have 10 million customers, and the values for customerid vary from 1 to 10000000. In other words, you have possibly 10 million different values. To create 16 partitions, you can range partition both tables, Sales on
s_customerid and Customer on c_customerid. You should define partition bounds for both tables to generate partitions of the same size. In this example, partition bounds should be defined as 625001, 1250001, 1875001, ..., 10000001, so each partition contains 625000 rows.

Range - Composite, Composite - Composite (Range Dimension)

Finally, you can also subpartition one or both tables on another column. Therefore, the range/composite and composite/composite methods on the range dimension are also valid for enabling a full partition-wise join on the range dimension.

Partial Partition-wise Joins

Oracle can only perform partial partition-wise joins in parallel. Unlike full partition-wise joins, partial partition-wise joins require you to partition only one table on the join key, not both. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partition-wise joins are more common than full partition-wise joins, because they require that you partition only one of the joined tables on the join key.

To execute a partial partition-wise join, Oracle dynamically repartitions the other table based on the partitioning of the reference table. Once the other table is repartitioned, the execution is similar to a full partition-wise join.

The performance advantage that partial partition-wise joins have over conventional parallel joins is that the reference table is not moved during the join operation. Conventional parallel joins require both input tables to be redistributed on the join key. This redistribution operation involves exchanging rows between query servers. This is a very CPU-intensive operation and can lead to excessive interconnect traffic in OPS environments. Partitioning large tables on a join key, either a foreign or primary key, prevents this re-distribution every time the table is joined on that key. Of course, if you choose a foreign key to partition the table, which is the most common scenario, select a foreign key that is involved in many queries.

To illustrate partial partition-wise joins, consider the previous Sales/Customer example. Assume that Customer is not partitioned or partitioned on a column other than c_customerid. Because Sales is often joined with Customer on customerid, and because this join dominates our application workload, partition Sales on
s_customerid to enable partial partition-wise join every time Customer and Sales are joined. As in full partition-wise join, we have several alternatives:

Hash

The simplest method to enable a partial partition-wise join is to partition Sales by hash on c_customerid. The number of partitions determines the maximum degree of parallelism, because the partition is the smallest granule of parallelism for partial partition-wise join operations.

The parallel execution of a partial partition-wise join is illustrated in Figure 5-3, "Partial Partition-wise Join", which assumes that both the degree of parallelism and the number of partitions of Sales are 16. The execution involves two sets of query servers: one set, labeled set 1 on the figure, scans the customer table in parallel. The granule of parallelism for the scan operation is a range of blocks.

Rows from Customer that are selected by the first set, in this case all rows, are redistributed to the second set of query servers by hashing customerid. For example, all rows in Customer that could have matching rows in partition H1 of Sales are sent to query server 1 in the second set. Rows received by the second set of query servers are joined with the rows from the corresponding partitions in Sales. Query server number 1 in the second set joins all Customer rows that it receives with partition H1 of Sales.

Figure 5-3 Partial Partition-wise Join

Considerations for full partition-wise joins also apply to partial partition-wise joins:

The degree of parallelism does not need to equal the number of partitions. In Figure 5-3, the query executes with 8 query server sets. In this case, Oracle assigns 2 partitions to each query server of the second set. Again, the number of partitions should always be a multiple of the degree of parallelism.
In Oracle Parallel Server environments on shared-nothing platforms (MPPs), each hash partition of sales should preferably have affinity to only one node to avoid remote I/Os. Also, spread partitions over all nodes to avoid bottlenecks and use all CPU resources available on the system. It is adequate for a node to host multiple partitions when there are more partitions than nodes.

See Also:
For more information on data affinity, please refer to Oracle8i Parallel Server Concepts.

Composite

As with full partition-wise joins, the prime partitioning method for the Sales table is to use the range method on column s_salesdate. This is because Sales is a typical example of a table that stores historical data. To enable a partial partition-wise join while preserving this range partitioning, you can subpartition Sales by hash on column s_customerid using 16 subpartitions per partition. Pruning and partial partition-wise joins can be used together if a query joins Customer and Sales and if the query has a selection predicate on s_salesdate.

When Sales is composite, the granule of parallelism for a partial-partition wise join is a hash partition and not a subpartition. Refer to Figure 5-2 for the definition of a hash partition in a composite table. Again, the number of hash partitions should be a multiple of the degree of parallelism. Also, on an MPP system, ensure that each hash partition has affinity to a single node. In the previous example, the 8 subpartitions composing a hash partition should have affinity to the same node.

Range

Finally, you can use range partitioning on s_customerid to enable a partial partition-wise join. This works similarly to the hash method, although it is not recommended. The resulting data distribution could be skewed if the size of the partitions differs. Moreover, this method is more complex to implement because it requires prior knowledge of the values of the partitioning column which is also a join key.

Benefits of Partition-wise Joins

Partition-wise joins offer benefits as described in this section:

Reduction of Communications Overhead

Partition-wise joins reduce communications overhead when they are executed in parallel. This is because in the default case, parallel execution of a join operation by a set of parallel execution servers requires the redistribution of each table on the join column into disjoint subsets of rows. These disjoint subsets of rows are then joined pair-wise by a single parallel execution server.

Oracle can avoid redistributing the partitions because the two tables are already partitioned on the join column. This allows each parallel execution server to join a pair of matching partitions.

This performance enhancement is even more noticeable in OPS configurations with internode parallel execution. This is because partition-wise joins can dramatically reduce interconnect traffic. Using this feature is an almost mandatory optimization measure for large DSS configurations that use OPS.

Currently, most OPS platforms, such as MPP and SMP clusters, provide limited interconnect bandwidths compared to their processing powers. Ideally, interconnect bandwidth should be comparable to disk bandwidth, but this is seldom the case. As a result, most join operations in OPS experience high interconnect latencies without this optimization.

Reduction of Memory Requirements

Partition-wise joins require less memory. In the case of serial joins, the join is performed on a pair of matching partitions at the same time. Thus, if data is evenly distributed across partitions, the memory requirement is divided by the number of partitions. In this case, there is no skew.

In the parallel case, it depends on the number of partition pairs that are joined in parallel. For example, if the degree of parallelism is 20 and the number of partitions is 100, 5 times less memory is required because only 20 joins of two partitions are performed at the same time. The fact that partition-wise joins require less memory has a direct effect on performance. For example, the join does not need to write blocks to disk during the build phase of a hash join.

Performance Considerations for Parallel Partition-wise Joins

The performance improvements from parallel partition-wise joins also come with disadvantages. The cost-based optimizer weighs the advantages and disadvantages when deciding whether to use partition-wise joins.

In the case of range partitioning, data skew may increase response time if the partitions are of different sizes. This is because some parallel execution servers take longer than others to finish their joins. Oracle recommends the use of hash (sub)partitioning to enable partition-wise joins because hash partitioning limits the risk of skew, assuming that the number of partitions is a power of 2.
The number of partitions used for partition-wise joins should, if possible, be a multiple of the number of query servers. With a degree of parallelism of 16, for example, it is acceptable to have 16, 32, or even 64 partitions (or subpartitions). But Oracle will serially execute the last phase of the join if the degree of parallelism is, for example, 17. This is because in the beginning of the execution, each parallel execution server works on a different partition pair. At the end of this first phase, only one pair is left. Thus, a single query server joins this remaining pair while all other query servers are idle.
Sometimes, parallel joins can cause remote I/Os. For example, on Oracle Parallel Server environments running on MPP configurations, if a pair of matching partitions is not collocated on the same node, a partition-wise join requires extra inter-node communication due to remote I/Os. This is because Oracle must transfer at least one partition to the node where the join is performed. In this case, it is better to explicitly redistribute the data than to use a partition-wise join.

5 Parallelism and Partitioning