Parallel DML combined with the updatable join views facility provides an efficient solution for refreshing the tables of a data warehouse system. To refresh tables is to update them with the differential data generated from the OLTP production system.
In the following example, assume a refresh of a table named
customers that has columns
c_addr. The differential data contains either new rows or rows that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the production system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named
diff_customer, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task. You can use the
APPEND hint when loading in parallel as well.
diff_customer is loaded, the refresh process can be started. It can be performed in two phases or by merging in parallel, as demonstrated in the following:
The following statement is a straightforward SQL implementation of the update using subqueries:
UPDATE customers SET(c_name, c_addr) = (SELECT c_name, c_addr FROM diff_customer WHERE diff_customer.c_key = customer.c_key) WHERE c_key IN(SELECT c_key FROM diff_customer);
Unfortunately, the two subqueries in this statement affect performance.
An alternative is to rewrite this query using updatable join views. To rewrite the query, you must first add a primary key constraint to the
diff_customer table to ensure that the modified columns map to a key-preserved table:
CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_key) PARALLEL NOLOGGING; ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);
You can then update the
customers table with the following SQL statement:
UPDATE /*+ PARALLEL(cust_joinview) */ (SELECT /*+ PARALLEL(customers) PARALLEL(diff_customer) */ CUSTOMER.c_name AS c_name CUSTOMER.c_addr AS c_addr, diff_customer.c_name AS c_newname, diff_customer.c_addr AS c_newaddr FROM diff_customer WHERE customers.c_key = diff_customer.c_key) cust_joinview SET c_name = c_newname, c_addr = c_newaddr;
The underlying scans feeding the join view
cust_joinview are done in parallel. You can then parallelize the update to further improve performance, but only if the
customers table is partitioned.
The last phase of the refresh process consists of inserting the new rows from the
diff_customer temporary table to the
customers table. Unlike the update case, you cannot avoid having a subquery in the
INSERT /*+PARALLEL(customers)*/ INTO customers SELECT * FROM diff_customer s);
However, you can guarantee that the subquery is transformed into an anti-hash join by using the
HASH_AJ hint. Doing so enables you to use parallel
INSERT to execute the preceding statement efficiently. Parallel
INSERT is applicable even if the table is not partitioned.
You can combine update and insert operations into one statement, commonly known as a merge. The following statement achieves the same result as all of the statements in "Optimizing Performance for Updating the Table in Parallel" and "Efficiently Inserting the New Rows into the Table in Parallel":
MERGE INTO customers USING diff_customer ON (diff_customer.c_key = customer.c_key) WHEN MATCHED THEN UPDATE SET (c_name, c_addr) = (SELECT c_name, c_addr FROM diff_customer WHERE diff_customer.c_key = customers.c_key) WHEN NOT MATCHED THEN INSERT VALUES (diff_customer.c_key,diff_customer.c_data);