BEA Logo BEA WebLogic Portal Release 4.0

  BEA Home  |  Events  |  Solutions  |  Partners  |  Products  |  Services  |  Download  |  Developer Center  |  WebSUPPORT

 

   WebLogic Portal Documentation   |   Product Catalog Management   |   Previous Topic   |   Next Topic   |   Contents   |   Index

Using the Database Loader

 

The Commerce services include a DBLoader program that you can use to bulk load data into any table in a database, which is especially helpful for tasks like populating your product catalog. While you could use the browser-based Administration Tools to add new item or category data, one record at a time, this is impractical when you need to load hundreds or thousands of records. The DBLoader program is also useful if you want to load legacy data from an existing database into the Commerce database.

You can also use a database vendor's specific loader program such as Oracle SQL*Loader, or a data loader by a third-party company, to populate the database.

The topic includes the following sections:

 


The Input File for DBLoader

The Commerce services DBLoader program loads data that you provide in a text file into a database. Data is loaded one table at a time; create a separate input file for each table that you want to update.

The input data file is, by default, a comma-separated value (CSV) text file. The input file has the following structure:

First Row

The header of the file must identify:

For example, the header line might contain:

130,WLCS_PRODUCT

Second Row

The second row identifies the table column names into which you are loading data. You must include the primary key column or columns in the input file. Preface each primary key column name with an asterisk (*). Apart from primary keys in tables, all other columns are defaulted as NULL. Thus, you may omit column names where NULL is an acceptable value, and specify only those with non-NULL values.

For example, the second line of the input data file might contain:

*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE

Third Row

The third row specifies the data type of each column being loaded. See The Product Catalog Database Schema, for information about the Product Catalog schema and the datatypes used.

For example, the third line of the input data file might contain:

VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,DATE

Notes: On the data type line of the input file, it is not necessary to include the length of the data type, such as VARCHAR(20) or VARCHAR2(20). Simply use VARCHAR for strings. Use NUMBER instead of (for example) NUMBER(16,4). Use DOUBLE instead of DOUBLE PRECISION.

Fourth Through N Rows

All subsequent lines in the input data file contain the data values. The following is an example of a simple input file:

3,WLCS_PRODUCT
*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE
VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,DATE
P123,CoolKid,N,Out of stock until further notice,Special order only,02-Oct-2000
P124,FastKid,Y,One week,No special order,02-Oct-2000
P125,RadSneakers,Y,,regular stock,02-Oct-2000

Note: DATE column values should always be entered in the format DD-MMM-YYYY. It cannot be an empty string. Its values are either NULL or a valid date.

Empty input strings from the data file are inserted into database as empty strings. You must account for each unspecified column in the input record by including the delimiter character (by default, a comma) in the correct position (matching the position of the columns you listed in line 2, the column names). For example:

P125,RadSneakers,Y,,regular stock,02-Oct-2000

In the previous example a value for the fourth identified column (EST_SHIP_TIME) was not specified. This condition is fine because this column is not a primary key for the database record. The column's value is stored as an empty string.

Note: If your intention is to store a null value in the database for a non-primary-key column, you should enter NULL in the correct position for the column in that record. Do not enclose NULL in quotes; enclosing the word `NULL' in quotes will cause the column to be stored as a string.

 


The dbloader.properties File

The Commerce services DBLoader program uses a properties file named dbloader.properties to decide what driver, database, or login to use.

This file resides in the PORTAL_HOME directory. PORTAL_HOME is the directory where you installed WebLogic Portal.

Comment lines are prefixed with the # character. Both comment lines and blank lines are allowed.

The following table describes the values you can set in this property file.

Property Name

Default Value

Description

jdbcdriver

COM.cloudscape.core.
JDBCDriver

Specify which JDBC driver to use to connect to your database. The default driver is the Cloudscape JDBC driver that ships with WebLogic.

connection

jdbc:cloudscape:Commerce

Database name where loaded data should go. Commerce is the name of the default database that ships with the WebLogic Portal. The location of this database is specified by the system property: cloudscape.system.home.

dblogin

None

The database username. The default Cloudscape database does not require a username to be specified. The login name must have read/write privileges on the affected tables.

dbpassword

None

The database user password. The default Cloudscape database does not require a user password to be specified.

delimiter

,

You can change the recognized delimiter character that is used to separate values in the input data file. For example, this might be necessary if you use commas as punctuation in an item's Long Description (LONG_DESC). Choose another character, such as the circumflex (^) as a delimiter.

timestamptable

WLCS_CATEGORY,
WLCS_PRODUCT

Identifies the database tables to which DBLoader will track updates (for these two tables). The column name is fixed in the schema provided by the Commerce services. However, if you are using DBLoader for other tables (not WLCS tables), you can specify other column names of your own.

timestampfield

MODIFIED_DATE

Specifies the column in the WLCS_CATEGORY and WLCS_PRODUCT tables that identifies the last time this record in the table was modified. The value of the column specified is used by DBLoader to learn when the most recent update was made in each record in the tables identified in the timestamptable property. The column name is fixed in the schema provided by the Commerce services. However, if you are using DBLoader for other tables (not WLCS tables), you can specify other column names of your own.

commitTxn

50

Sets how many records are loaded before committing the updates in the database. If the value is less than or equal to one, DBLoader will commit after loading each record.

encoding

Not specified in the dbloader.properties file; therefore, the default is the Java 2 SDK's platform default.

Sets the multibyte character encoding type. The property value supplied can be UCS2 or UTF8.

When writing data into and reading data out of the database, Java will transparently convert from the native character encoding used by your systems and Unicode 2.0. There is nothing special that you must do.

However, if you need to write/read data to/from the database that is encoded differently than your system's native encoding, you will have to explicitly perform the translation. For more information, see Important Database Considerations.


 

Listing 3-1 shows a sample dbloader.properties file.

Listing 3-1 Sample dbloader.properties File

#    THE PROPERTIES FILE FOR BEA CUSTOM DATA LOADER
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Examples of using JDBC drivers and establishing connections to your database
# 'Commerce' and 'CATALOG' are examples of database names
# 	Cloudscape driver connects to the sample database shipped with WebLogic  
jdbcdriver=COM.cloudscape.core.JDBCDriver
connection=jdbc:cloudscape:Commerce
dblogin=none
dbpassword=none
# 	WebLogic jDriver for Oracle (OCI Driver). Uncomment to use:
#jdbcdriver=weblogic.jdbc.oci.Driver
#connection=jdbc:weblogic:oracle:sturney
#dblogin=FLAGSTAFF
#dbpassword=FLAGSTAFF
# 	Delimiter used in your data file
delimiter=,
# Character used to specify the primary key in your data file.
# This value will default to '*' if not specified.
primarykeyidentifier=*
# 	Table and column name for autoinsertion of current DATE.
timestamptable=wlcs_category
timestampfield=modified_date
# 	Allowable encoding values:
# If encoding=UCS2, unicode encoding used is Unicode
# If encoding=UTF8, unicode encoding used is UTF8
# If no encoding is specified, it's default to the platform encoding of the JDK
# For any other value specified, it is reset to default
# Note:
# 1. If encoding is UCS2, no change is needed to database or sql API
# (accessor/mutator)using WLS OCI driver.
# 2. If encoding is UTF8, no change is needed to database or mutator sql API.
# However, you have to getBinaryStream() with WLS OCI driver or Oracle thin driver.
# Cloudscape doesn't have proper sql API to get multibyte codes out of Cloudview
encoding=UTF8
# Committing transactions to the database after parsing every N number of data 
# records. If the value is less than or equal to one, it'll be committing every # transaction.
commitTxn=50

 


Running the DBLoader Program

You use the loaddata script to run the DBLoader program.

Depending on the platform you are using, the script is in one of the following directories:

Note: PORTAL_HOME is the directory where you installed WebLogic Portal.

The loaddata script performs the following:

Before you can run the loaddata script, make sure that the set-environment script specifies the same database as the dbloader.properties file. The set-environment script resides in the same directory as the loaddata script. For example, if the dbloader.properties file uses `jdbc:cloudscape:Commerce' connections, then set-environment script should have SET DATABASE=CLOUDSCAPE.

As we mentioned earlier, DBLoader runs independently of the WebLogic Portal server. Therefore you do not need to stop the server if you are planning to run the loader. However, if you are running the WebLogic Portal server with a Cloudscape database, the database itself does not allow more than one connection at a time. In that case, you would need to stop the server.

If you are running the WebLogic Portal server with Oracle, then the drawback might be a slower performance for the time the data is being loaded into the database.

Note: You might want to back up the particular tables that you are about to update before running DBLoader. The DBLoader program does not keep history records in the database.

To Run the Program

The command to run the program has the following format:

prompt> loaddata { -insert | -update | -delete } input-file.csv

On UNIX systems, the loaddata.sh file needs to have its default protections set to include execute privilege. A typical way to do this is with the command:

$ chmod +x loaddata.sh

You must select one of the three possible operations: -insert, -update, or -delete.

For example:

prompt> loaddata -update category.csv

In the previous example, the DBLoader program will update rows in the product catalog database that match the primary keys specified in the category.csv input file.

To insert, update, or delete data in several tables, run the loaddata script separately for each table, providing the corresponding input filename as a parameter. The order of tables being updated should use the same data integrity rules as all other SQL statements. For example, insert rows into the parent table with the primary key constraint before inserting rows into the child table with the foreign key constraint.

 


DBLoader Log Files

The Commerce services DBLoader creates two audit trail logs:

If these files do not already exist, they are created. Otherwise, the existing audit trails are overwritten by each DBLoader operation. Both files reside in the same directory where you run the loaddata script.

The dbloader.log file contains the following information:

If any errors occurred during the attempted database load operation, the dbloader.err file captures the following information:

 


DBLoader Validations

The DBLoader program checks the number of columns affected by the load (as specified in the second line of the input data file) against the number of input columns in each record. Because the column delimiter is a comma (by default), this character is not allowed in a string input column. If extra commas are supplied inadvertently, such as punctuation in a LONG_DESC (Long Description) column, an error will result and is noted in the dbloader.err file. To avoid this type of error, carefully check the number of commas you are using to separate the input data column values. Or select a different delimiter character and specify it in the dbloader.properties file. For more information, see The dbloader.properties File.

All errors and exceptions are displayed in the console where the DBLoader program is running. Records with errors in them will be skipped, and the processing continues until the end of the file. (The program does not roll back a transaction if an error has occurred.)

 


Important Database Considerations

This section describes some important database considerations that you should keep in mind while using the DBLoader program.

 


Using Database-Specific Data Loaders

Most database management systems provide a data loader utility. In the case of Oracle, the data load utility is known as SQL*Loader. This section summarizes the capabilities of SQL*Loader. For details about SQL*Loader, see the Oracle 8i Utilities Guide. An online copy is available at http://technet.oracle.com/.

The examples used in this section are based on a simple ASCII file containing a few comma-separated values (SKU, IN_STOCK, VISIBLE, NAME) taken from a sample WLCS_PRODUCT table, which is described in The Product Catalog Database Schema.

Note: The example shown in this section does not use all of the columns from WLCS_PRODUCT. Your actual comma-separate values (CSV) file may contain more columns than we show here. We are merely attempting to show you how to conduct the import operation once you have the CSV file ready.

The name we will use for our sample data file is sample.csv. The contents of that file might look like the following (based on the columns we mentioned earlier - SKU, IN_STOCK, VISIBLE, NAME).

"0,3,4",0,0,"Growing Herbs from Seed, Cutting, and Root"
"0,2,1",0,0,"The Perfect Storm: A True Story of Men Against the Sea"
"0,2,2",0,0,"The Worst-Case Scenario Survival Handbook"
"0,4,0",0,0,"Acute Asthma: Assessment and Management"
"0,4,1",0,0,"Communications Technology Explained"
"0,4,2",0,0,"Modern Plastics Handbook"

Once you have your data file ready to populate the WLCS_PRODUCT table, you must create a control file which will be used by SQL*Loader. The control file identifies the data file to be read in, how the pieces of information are delimited in the file, and, of course, the actual column locations of the destination table. We will name our control file sample.ctl.

LOAD DATA
INFILE `sample.csv'
APPEND INTO TABLE WLCS_PRODUCT
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
(SKU, IN_STOCK, VISIBLE, NAME)

At a system prompt, invoke SQL*Loader with a command such as:

sqlldr  userid=bea_systems/bea_systems  control=sample.ctl  log=sample.log

To review the results of your data load, see sample.log.

 


Using Third-Party Data Loaders

There are a variety of data loaders available on the market today to assist in the extraction and loading of information. Please be sure to research the use of these tools to ensure success within your environment.

 

back to top previous page next page