Using the Database Loader
The Commerce services include a DBLoader program that you can use to bulk load data into any table in a database, which is especially helpful for tasks like populating your product catalog. While you could use the browser-based Administration Tools to add new item or category data, one record at a time, this is impractical when you need to load hundreds or thousands of records. The DBLoader program is also useful if you want to load legacy data from an existing database into the Commerce database.
You can also use a database vendor's specific loader program such as Oracle SQL*Loader, or a data loader by a third-party company, to populate the database.
The topic includes the following sections:
The Input File for DBLoader
The dbloader.properties File
Running the DBLoader Program
DBLoader Log Files
DBLoader Validations
Important Database Considerations
Using Database-Specific Data Loaders
Using Third-Party Data Loaders
The Input File for DBLoader
The Commerce services DBLoader program loads data that you provide in a text file into a database. Data is loaded one table at a time; create a separate input file for each table that you want to update.
The input data file is, by default, a comma-separated value (CSV) text file. The input file has the following structure:
First Row
The header of the file must identify the number of data records in the file and the name of the database table to load. For example, the header line might contain:
130,WLCS_PRODUCT
Second Row
The second row identifies the table column names into which you are loading data. You must include the primary key column or columns in the input file; preface each primary key column name with an asterisk (*). All other columns default to NULL, so you may omit column names for which NULL is an acceptable value and specify only those columns that have non-NULL values.
For example, the second line of the input data file might contain:
*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE
Third Row
The third row specifies the data type of each column being loaded. See The Product Catalog Database Schema, for information about the Product Catalog schema and the datatypes used.
For example, the third line of the input data file might contain:
VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,DATE
Note: On the data type line of the input file, it is not necessary to include the length of the data type, such as VARCHAR(20) or VARCHAR2(20). Simply use VARCHAR for strings, NUMBER instead of (for example) NUMBER(16,4), and DOUBLE instead of DOUBLE PRECISION.
Fourth Through N Rows
All subsequent lines in the input data file contain the data values. The following is an example of a simple input file:
3,WLCS_PRODUCT
*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE
VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,DATE
P123,CoolKid,N,Out of stock until further notice,Special order only,02-Oct-2000
P124,FastKid,Y,One week,No special order,02-Oct-2000
P125,RadSneakers,Y,,regular stock,02-Oct-2000
Note: DATE column values must always be entered in the format DD-MMM-YYYY. A DATE column cannot be an empty string; its value must be either NULL or a valid date.
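In Java terms, the DD-MMM-YYYY format corresponds to the SimpleDateFormat pattern dd-MMM-yyyy. As a sketch (the DateCheck class is hypothetical, not part of DBLoader), you could validate date values before loading:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DateCheck {
    // DBLoader's DD-MMM-YYYY format, e.g. 02-Oct-2000, as a SimpleDateFormat pattern.
    private static final String PATTERN = "dd-MMM-yyyy";

    public static boolean isValidDate(String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        fmt.setLenient(false); // reject impossible dates such as 45-Oct-2000
        try {
            fmt.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```

Setting lenient parsing to false makes the check strict, so an empty string or a date in a different format (such as 2000-10-02) is rejected.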
Empty input strings from the data file are inserted into the database as empty strings. You must account for each unspecified column in the input record by including the delimiter character (by default, a comma) in the correct position (matching the position of the columns you listed in line 2, the column names). For example:
P125,RadSneakers,Y,,regular stock,02-Oct-2000
In the previous example, a value for the fourth identified column (EST_SHIP_TIME) was not specified. This is acceptable because the column is not a primary key for the database record; the column's value is stored as an empty string.
Note: If your intention is to store a null value in the database for a non-primary-key column, you should enter NULL in the correct position for the column in that record. Do not enclose NULL in quotes; enclosing the word NULL in quotes will cause the column to be stored as a string.
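The header conventions above can be sketched in Java as follows. The InputFileHeader class is hypothetical; it only illustrates how the first three rows of an input file might be interpreted:

```java
import java.util.ArrayList;
import java.util.List;

public class InputFileHeader {
    public int recordCount;                              // row 1: number of data records
    public String tableName;                             // row 1: target table name
    public List<String> columns = new ArrayList<>();     // row 2: names, '*' stripped
    public List<Boolean> isPrimaryKey = new ArrayList<>();
    public String[] types;                               // row 3: e.g. VARCHAR, DATE

    public InputFileHeader(String row1, String row2, String row3) {
        String[] first = row1.split(",");
        recordCount = Integer.parseInt(first[0].trim());
        tableName = first[1].trim();
        for (String col : row2.split(",")) {
            col = col.trim();
            boolean pk = col.startsWith("*");            // '*' marks a primary-key column
            isPrimaryKey.add(pk);
            columns.add(pk ? col.substring(1) : col);
        }
        types = row3.split(",");
    }
}
```

For the sample file above, this would yield a record count of 3, a table name of WLCS_PRODUCT, and SKU flagged as the primary-key column.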
The dbloader.properties File
The Commerce services DBLoader program uses a properties file named dbloader.properties to determine which JDBC driver, database connection, and login to use.
This file resides in the PORTAL_HOME directory. PORTAL_HOME is the directory where you installed WebLogic Portal.
Comment lines are prefixed with the # character. Both comment lines and blank lines are allowed.
The properties you can set in this file are jdbcdriver, connection, dblogin, dbpassword, delimiter, primarykeyidentifier, timestamptable, timestampfield, encoding, and commitTxn; each is described by the comments in the sample file.
Listing 3-1 shows a sample dbloader.properties file.
Listing 3-1 Sample dbloader.properties File
# THE PROPERTIES FILE FOR BEA CUSTOM DATA LOADER
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Examples of using JDBC drivers and establishing connections to your database
# 'Commerce' and 'CATALOG' are examples of database names
# Cloudscape driver connects to the sample database shipped with WebLogic
jdbcdriver=COM.cloudscape.core.JDBCDriver
connection=jdbc:cloudscape:Commerce
dblogin=none
dbpassword=none
# WebLogic jDriver for Oracle (OCI Driver). Uncomment to use:
#jdbcdriver=weblogic.jdbc.oci.Driver
#connection=jdbc:weblogic:oracle:sturney
#dblogin=FLAGSTAFF
#dbpassword=FLAGSTAFF
# Delimiter used in your data file
delimiter=,
# Character used to specify the primary key in your data file.
# This value will default to '*' if not specified.
primarykeyidentifier=*
# Table and column name for autoinsertion of current DATE.
timestamptable=wlcs_category
timestampfield=modified_date
# Allowable encoding values:
# If encoding=UCS2, the Unicode encoding used is UCS2
# If encoding=UTF8, the Unicode encoding used is UTF8
# If no encoding is specified, it defaults to the platform encoding of the JDK
# Any other specified value is reset to the default
# Note:
# 1. If encoding is UCS2, no change is needed to the database or sql API
#    (accessor/mutator) using the WLS OCI driver.
# 2. If encoding is UTF8, no change is needed to the database or mutator sql API.
#    However, you have to use getBinaryStream() with the WLS OCI driver or Oracle thin driver.
# Cloudscape doesn't have a proper sql API to get multibyte codes out of Cloudview
encoding=UTF8
# Commit transactions to the database after parsing every N data records.
# If the value is less than or equal to one, every transaction is committed.
commitTxn=50
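Because dbloader.properties follows the standard Java properties format, a program can read it with java.util.Properties, which skips comment and blank lines automatically. The following sketch is hypothetical (the LoaderConfig class is not part of DBLoader; only the property names come from the listing above):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class LoaderConfig {
    // Load a dbloader.properties-style file; '#' comment lines and blank
    // lines are ignored by Properties.load() automatically.
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```

A caller could then read, for example, props.getProperty("delimiter", ",") and props.getProperty("primarykeyidentifier", "*"), supplying the documented defaults as fallbacks.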
Running the DBLoader Program
You use the loaddata script to run the DBLoader program.
Depending on the platform you are using, the script is in one of the following directories:
Note: PORTAL_HOME is the directory where you installed WebLogic Portal.
The loaddata script performs the following:
Before you can run the loaddata script, make sure that the set-environment script specifies the same database as the dbloader.properties file. The set-environment script resides in the same directory as the loaddata script. For example, if the dbloader.properties file uses the 'jdbc:cloudscape:Commerce' connection, then the set-environment script should contain SET DATABASE=CLOUDSCAPE.
As we mentioned earlier, DBLoader runs independently of the WebLogic Portal server. Therefore you do not need to stop the server if you are planning to run the loader. However, if you are running the WebLogic Portal server with a Cloudscape database, the database itself does not allow more than one connection at a time. In that case, you would need to stop the server.
If you are running the WebLogic Portal server with Oracle, the only drawback might be slower performance while the data is being loaded into the database.
Note: You might want to back up the particular tables that you are about to update before running DBLoader. The DBLoader program does not keep history records in the database.
To Run the Program
The command to run the program has the following format:
prompt> loaddata { -insert | -update | -delete } input-file.csv
On UNIX systems, the loaddata.sh file must have execute permission. A typical way to set it is with the command:
$ chmod +x loaddata.sh
You must select one of the three possible operations: -insert, -update, or -delete.
For example:
prompt> loaddata -update category.csv
In the previous example, the DBLoader program will update rows in the product catalog database that match the primary keys specified in the category.csv input file.
To insert, update, or delete data in several tables, run the loaddata script separately for each table, providing the corresponding input filename as a parameter. The order in which you load tables should follow the same data integrity rules as any other SQL statements: for example, insert rows into a parent table that holds the primary key constraint before inserting rows into a child table that holds the foreign key constraint.
DBLoader Log Files
The Commerce services DBLoader creates two audit trail logs: dbloader.log and dbloader.err.
If these files do not already exist, they are created. Otherwise, the existing audit trails are overwritten by each DBLoader operation. Both files reside in the same directory where you run the loaddata script.
The dbloader.log file contains the following information:
If any errors occurred during the attempted database load operation, the dbloader.err file captures the following information:
DBLoader Validations
The DBLoader program checks the number of columns to be loaded (as specified in the second line of the input data file) against the number of input columns in each data record. Because the column delimiter is a comma by default, this character is not allowed within a string input column. If extra commas are supplied inadvertently, such as punctuation in a LONG_DESC (Long Description) column, an error results and is noted in the dbloader.err file. To avoid this type of error, carefully check the number of commas you use to separate the input data column values, or select a different delimiter character and specify it in the dbloader.properties file. For more information, see The dbloader.properties File.
All errors and exceptions are displayed in the console where the DBLoader program is running. Records with errors in them will be skipped, and the processing continues until the end of the file. (The program does not roll back a transaction if an error has occurred.)
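The field-count validation described above amounts to splitting each record on the delimiter and comparing the result against the number of declared columns. A hypothetical sketch (not the actual DBLoader implementation):

```java
import java.util.regex.Pattern;

public class FieldCountCheck {
    // A record is valid only if splitting on the delimiter yields exactly
    // the number of columns declared on line 2 of the input file.
    // The -1 limit keeps trailing empty fields, so "P125,...,," still counts them.
    public static boolean hasExpectedFields(String record, char delimiter, int expected) {
        String[] fields = record.split(Pattern.quote(String.valueOf(delimiter)), -1);
        return fields.length == expected;
    }
}
```

A record such as "P1,desc, with, extra, commas,x,02-Oct-2000" splits into seven fields against six declared columns, which is the kind of mismatch DBLoader reports in dbloader.err.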
Important Database Considerations
This section describes some important database considerations that you should keep in mind while using the DBLoader program.
Primary keys and unique indexes prevent duplicate entries in a table. Foreign key constraints ensure referential integrity by making certain that the parent key already exists before the child record can be written to the database. For example, a foreign key constraint exists between the WLCS_PRODUCT_CATEGORY table and its parent tables, WLCS_PRODUCT and WLCS_CATEGORY. This ensures that each SKU value and each CATEGORY_ID value in the WLCS_PRODUCT_CATEGORY table has a corresponding entry in its respective parent table (WLCS_PRODUCT and WLCS_CATEGORY). For example, a category record with a CATEGORY_ID value of "Men's Apparel" in the WLCS_PRODUCT_CATEGORY table requires that an entry exist in WLCS_CATEGORY; otherwise, an exception is thrown. At the same time, a SKU of "Khakis" must exist in WLCS_PRODUCT.
Note: For every WLCS_PRODUCT and WLCS_CATEGORY table entry, a corresponding entry in the CATALOG_ENTITY table must also be made.
Because Cloudscape does not provide this functionality, run the loaddata script to delete records from child tables before deleting records from their parent tables.
For related information, please see The Product Catalog Database Schema.
You can identify the default file encoding by checking the System property named file.encoding, as follows:
System.out.println(System.getProperty("file.encoding"));
If the file.encoding property differs from the encoding of the text data you want to process, then you must perform the conversion yourself.
Currently, the Java 2 SDK 1.2.2 and higher can convert several file encodings to Unicode 2.0 and higher. For details, see http://java.sun.com/products/jdk/1.2/docs/guide/internat/encoding.doc.html.
What this means to your development group:
Normally, you enter or extract data using the default encoding used by your operating system. Therefore, the case shown in item "a" in the previous list is the usual behavior.
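If your input data uses an encoding other than the platform default, one way to perform the conversion yourself is to read the bytes through an InputStreamReader constructed with an explicit charset. A minimal sketch (the EncodingRead class is hypothetical):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class EncodingRead {
    // Decode raw bytes using an explicit charset name (for example, "UTF-8")
    // instead of relying on the JDK's platform default (file.encoding).
    public static String readAll(byte[] data, String charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

The same pattern works with a FileInputStream when reading a data file from disk.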
Using Database-Specific Data Loaders
Most database management systems provide a data loader utility. In the case of Oracle, the data load utility is known as SQL*Loader. This section summarizes the capabilities of SQL*Loader. For details about SQL*Loader, see the Oracle 8i Utilities Guide. An online copy is available at http://technet.oracle.com/.
The examples used in this section are based on a simple ASCII file containing a few comma-separated values (SKU, IN_STOCK, VISIBLE, NAME) taken from a sample WLCS_PRODUCT table, which is described in The Product Catalog Database Schema.
Note: The example shown in this section does not use all of the columns from WLCS_PRODUCT. Your actual comma-separated values (CSV) file may contain more columns than we show here. We are merely attempting to show you how to conduct the import operation once you have the CSV file ready.
The name we will use for our sample data file is sample.csv. The contents of that file might look like the following (based on the columns we mentioned earlier - SKU, IN_STOCK, VISIBLE, NAME).
"0,3,4",0,0,"Growing Herbs from Seed, Cutting, and Root"
"0,2,1",0,0,"The Perfect Storm: A True Story of Men Against the Sea"
"0,2,2",0,0,"The Worst-Case Scenario Survival Handbook"
"0,4,0",0,0,"Acute Asthma: Assessment and Management"
"0,4,1",0,0,"Communications Technology Explained"
"0,4,2",0,0,"Modern Plastics Handbook"
Once you have your data file ready to populate the WLCS_PRODUCT table, you must create a control file which will be used by SQL*Loader. The control file identifies the data file to be read in, how the pieces of information are delimited in the file, and, of course, the actual column locations of the destination table. We will name our control file sample.ctl.
LOAD DATA
INFILE 'sample.csv'
APPEND INTO TABLE WLCS_PRODUCT
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
(SKU, IN_STOCK, VISIBLE, NAME)
At a system prompt, invoke SQL*Loader with a command such as:
sqlldr userid=bea_systems/bea_systems control=sample.ctl log=sample.log
To review the results of your data load, see sample.log.
Using Third-Party Data Loaders
There are a variety of data loaders available on the market today to assist in the extraction and loading of information. Please be sure to research the use of these tools to ensure success within your environment.
The Data Movement module of DataStage provides a comprehensive data extraction, transformation, and loading toolset designed for building Operational Data Stores (ODS), data marts and enterprise data warehouses. For more information, see http://www.informix.com.
PowerConnect is Informatica's family of packaged software products that helps customers easily extract data and metadata from hard-to-access ERP and other legacy applications. This data is then delivered to PowerCenter, Informatica's data integration software hub, which provides robust capabilities for transforming the data and delivering it to downstream data warehouses, data marts and analytic applications. For more information, see http://www.informatica.com.
Data Junction is a visual design tool for rapidly integrating and transforming data between hundreds of applications and structured data formats. For more information, see http://www.datajunction.com.
ETI·EXTRACT moves and integrates data across the value chain and multiple business processes. The product automates the writing of programs that retrieve the data needed from any system, transform it and load it into any other system while capturing a complete history of that process. For more information, see http://www.eti.com.
Copyright © 2001 BEA Systems, Inc. All rights reserved.