|Oracle8i Data Warehousing Guide
Release 2 (8.1.6)
Part Number A76994-01
The following topics provide information about transporting data into a data warehouse:
Transportation is, literally, the act of moving data from one system to another system. In a data warehouse environment, the most common requirements for transportation are in moving data from a source system to a staging database or a data warehouse database; from a staging database to a data warehouse; or from a data warehouse to a data mart.
Transportation is often one of the simplest portions of the ETT process, and is commonly integrated with other portions of the process. For example, as shown in Chapter 11, "Extraction", distributed query technology provides a mechanism for both extracting and transporting data.
There are various techniques for transporting data, each with its own distinct advantages. This chapter introduces some of these techniques, and includes:
The most common method for transporting data is by the transfer of flat files, using mechanisms such as FTP or other remote file-system access protocols. Data is unloaded or exported from the source system into flat files using techniques discussed in Chapter 11, "Extraction", and is then transported to the target platform using FTP or similar mechanisms.
Because source systems and data warehouses often use different operating systems and database systems, flat files are often the simplest mechanism to exchange data between heterogeneous systems with minimal transformations. However, even when transporting data between homogeneous systems, flat files are often the most efficient and most easy-to-manage mechanism for data transfer.
Distributed queries and gateways can be an effective mechanism for extracting data. These mechanisms also transport the data directly to the target systems, thus providing both extracting and transformation in a single step. For relatively small volumes of data, these mechanisms are very well-suited for both extraction and transformation. See Chapter 11, "Extraction", for further details.
Oracle8i introduced an important mechanism for transporting data: transportable tablespaces. This feature is the fastest mechanism for moving large volumes of data between two Oracle databases.
Previous to Oracle8i, the most scalable data-transportation mechanisms relied on moving flat files containing raw data. These mechanisms required that data be unloaded or exported into files from the source database, and, after transportation, that these files be loaded or imported into the target database. Transportable tablespaces entirely bypass the unload and reload steps.
Using transportable tablespaces, Oracle data files (containing table data, indexes, and almost every other Oracle database object) can be directly transported from one database to another. Furthermore, like import and export, transportable tablespaces provide a mechanism for transporting metadata in addition to transporting data.
Transportable tablespaces have some notable limitations: source and target systems must be running Oracle8i (or higher), must be running the same OS, must use the same character set, and must have the same block size. Nevertheless, despite these limitations, transportable tablespaces can be an invaluable data-transportation technique in many warehouse environments.
The most common applications of transportable tablespaces in data warehouses are in moving data from a staging database to a data warehouse, or in moving data from a data warehouse to a data mart.
See Oracle8i Concepts for more information on transportable tablespaces.
Suppose that you have a data warehouse containing sales data, and several data marts which are refreshed monthly. Also suppose that you are going to move one month of sales data from the data warehouse to the data mart.
The current month's data needs to be placed into a separate tablespace in order to be transported. In this example, you have a tablespace TS_SALES_TEMP, which will hold a copy of the current month's data. Using the CREATE TABLE AS SELECT statement, the current month's data can be efficiently copied to this tablespace:
CREATE TABLE temp_jan_sales UNRECOVERABLE TABLESPACE ts_temp_sales AS SELECT * FROM sales WHERE sales_date BETWEEN '31-DEC-1999' AND '01-FEB-2000';
Following this create-table operation, the tablespace TS_TEMP_SALES is set to read only:
A tablespace cannot be transported unless there are no active transactions modifying the tablespace; setting the tablespace to read only enforces this.
The tablespace TS_TEMP_SALES may be a tablespace that has been especially created to temporarily store data for use by the transportable tablespace features. Following "Step 3: Copy the Datafiles and Export File to the Target System", this tablespace can be set to read-write, and, if desired, the table TEMP_JAN_SALES can be deleted, and the tablespace can be re-used for other transportations or for other purposes.
In a given transportable tablespace operation, all of the objects in a given tablespace are transported. Although only one table is being transported in this example, the tablespace TS_TEMP_SALES could contain multiple tables. For example, perhaps the data mart is refreshed not only with the new month's worth of sales transactions, but also with a new copy of the customer table. Both of these tables could be transported in the same tablespace. Moreover, this tablespace could also contain other database objects such as indexes, which would also be transported.
Additionally, in a given transportable-tablespace operation, multiple tablespaces can be transported at the same time. This makes it easier to move very large volumes of data between databases. Note, however, that the transportable tablespace feature can only transport a set of tablespaces which contain an complete set of database objects without dependencies on other tablespaces. For example, an index cannot be transported without its table, nor can a partition be transported without the rest of the table.
In this step, we have copied the January sales data into a separate tablespace; however, in some cases, it may be possible to leverage the transportable tablespace feature without even moving data to a separate tablespace. If the sales table has been partitioned by month in the data warehouse and if each partition is in its own tablespace, then it may be possible to directly transport the tablespace containing the January data. Suppose the January partition, sales_jan2000, is located in the tablespace TS_SALES_JAN2000. Then the tablespace TS_SALES_JAN2000 could potentially be transported, rather than creating a temporary copy of the January sales data in the TS_TEMP_SALES.
However, two conditions must be satisfied in order to transport the tablespace TS_SALES_JAN2000. First, this tablespace must be set to read only. Second, since a single partition of a partitioned table cannot be transported without the remainder of the partitioned table also being transported, it is necessary to exchange the January partition into a separate table (using the ALTER TABLE statement), in order to transport the January data. The EXCHANGE operation is very quick; however, the January data will no longer be a part of the underlying SALES table, and thus may be unavailable to users until this data is exchanged back into the SALES table. The January data can be exchanged back into the SALES table following step 3.
The EXPORT utility is used to export the metadata describing the objects contained in the transported tablespace. For our example scenario, the EXPORT command could be:
This operation will generate an export file, jan_sales.dmp. The export file will be small, because it only contains metadata. In this case, the export file will contain information describing the table TEMP_JAN_SALES, such as the column names, column datatype, and all other information which the target Oracle database will need in order to access the objects in TS_TEMP_SALES.
The data files that make up TS_TEMP_SALES, as well as the export file jan_sales.dmp, should be copied to the data mart platform, using any transportation mechanism for flat files.
Once the datafiles have been copied, the tablespace TS_TEMP_SALES can be set to READ WRITE mode if desired.
Once the files have been copied to the data mart, the metadata should be imported into the data mart:
Like the export operation in step 2, the import operation in this step should be very fast.
At this point, the tablespace TS_TEMP_SALES and the table TEMP_SALES_JAN are accessible in the data mart.
There are two potential scenarios for incorporating this new data into the data mart's tables.
First, the data from the TEMP_SALES_JAN table could be inserted into the data mart's sales table:
Following this operation, the temp_sales_jan table (and even the entire ts_temp_sales tablespace) could be deleted.
Second, if the data mart's sales table is partitioned by month, then the new transported tablespace and the TEMP_SALES_JAN table could become a permanent part of the data mart. The TEMP_SALES_JAN table could become a partition of the data mart's sales table:
ALTER TABLE sales ADD PARTITION sales_00jan VALUES LESS THAN (TO_DATE('01-feb-2000','dd-mon-yyyy')); ALTER TABLE sales EXCHANGE PARTITION sales_00jan WITH TABLE temp_sales_jan INCLUDING INDEXES WITH VALIDATION;
The above example illustrated a typical scenario for transporting data in a data warehouse. However, transportable tablespaces can be used for many other purposes. In a data warehousing environment, transportable tablespaces should be viewed as a utility (much like import/export or SQL*Loader), whose purpose is to move large volumes of data between Oracle databases. When used in conjunction with parallel data-movement operations such as the CREATE TABLE AS SELECT and INSERT AS SELECT statements, transportable tablespaces provide an important mechanism for quickly transporting data for many purposes.