© 2002 BEA Systems, Inc.


com.bea.p13n.content.document.ref.loader
Class BulkLoader

java.lang.Object
  |
  +--com.bea.p13n.content.document.ref.loader.BulkLoader

public class BulkLoader
extends java.lang.Object
implements java.io.FilenameFilter

The reference document repository bulk loader application.

This class is designed mainly to run as a command-line application, via the "java com.bea.p13n.content.document.ref.loader.BulkLoader" command line. To see usage information, pass the -h flag or read the Usage.txt file in this package.

Additionally, BulkLoader objects can be created and used to provide the same functionality elsewhere. The lifecycle of a BulkLoader is as follows:

Call parseArgs(), validateArgs(), and initialize(), in that order; skipping any of them will most likely cause the BulkLoader object to fail ungracefully. Once those methods have been invoked, the utility methods (such as go(), doLoad(), printSchema(), deleteDoc(), and insertDoc()) can be invoked as needed. The commit() and rollback() methods control the BulkLoader's transaction with the database. When finished, be sure to call the shutdown() method to release internal resources.
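The call order described above can be sketched as follows. This is only an illustration: LoaderLifecycle is a hypothetical stand-in interface that mirrors the method names and checked exceptions listed on this page, and LifecycleSketch.run is an invented helper, not part of the BulkLoader API. Calling shutdown() even when initialize() has failed, and rolling back only after a successful initialize(), are assumptions.

```java
import java.io.IOException;
import java.sql.SQLException;

/** Hypothetical stand-in for the BulkLoader methods named on this page. */
interface LoaderLifecycle {
    void parseArgs(String[] args);
    void validateArgs();
    void initialize() throws SQLException;
    void go() throws SQLException, IOException;
    void commit();
    void rollback();
    void shutdown();
}

final class LifecycleSketch {
    /** Drives one load in the documented order; returns 0 on success, 1 on failure. */
    static int run(LoaderLifecycle loader, String[] args) {
        boolean initialized = false;
        try {
            // Setup must happen in exactly this order.
            loader.parseArgs(args);
            loader.validateArgs();
            loader.initialize();
            initialized = true;

            // Do the work, then make it permanent.
            loader.go();
            loader.commit();
            return 0;
        } catch (SQLException | IOException | RuntimeException e) {
            // Assumption: a rollback only makes sense once a connection exists.
            if (initialized) {
                loader.rollback();
            }
            return 1;
        } finally {
            // Always release internal resources when finished.
            loader.shutdown();
        }
    }
}
```

With a real BulkLoader the same sequence would be called on the object directly; the interface exists only so that the sketch compiles on its own.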

If manually constructing and utilizing a BulkLoader object, be certain to synchronize all access to the object. Since the command-line program is single-threaded, BulkLoader objects are not thread-safe by design.

To load the default LoaderFilters, the BulkLoader looks for the com/bea/p13n/content/document/loader/loader.properties file in the CLASSPATH. From that file it reads the list of default LoaderFilter class names from the "loader.defFilters" property. To disable all of the default filters, specify +filters in the command-line arguments.
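As a rough illustration of the lookup described above, the sketch below reads a properties resource from the class path and extracts the class names from the "loader.defFilters" property. The DefaultFilterNames class and its methods are hypothetical, and the separator is an assumption: this page does not say how the class names in the property are delimited, so commas and whitespace are both accepted here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.StringTokenizer;

final class DefaultFilterNames {
    /** Property name taken from the class description above. */
    static final String PROPERTY = "loader.defFilters";

    /** Returns the class names in the property (assumed comma or whitespace separated). */
    static List<String> fromProperties(Properties props) {
        List<String> names = new ArrayList<>();
        String raw = props.getProperty(PROPERTY, "");
        StringTokenizer tokens = new StringTokenizer(raw, ", \t\r\n");
        while (tokens.hasMoreTokens()) {
            names.add(tokens.nextToken());
        }
        return names;
    }

    /** Loads a properties resource from the class path; empty if the resource is absent. */
    static Properties loadFromClasspath(String resource) throws IOException {
        Properties props = new Properties();
        try (InputStream in =
                 DefaultFilterNames.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }
}
```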

Since:
2.0
See Also:
MetaParser, FileCache

Inner Class Summary
static class BulkLoader.ShowUsageException
          Quick inner exception thrown on parseArgs() to say we should just print a usage report.
 
Field Summary
 java.util.Map addlColumnMap
          The map of additional DOCUMENT table column name to collection of property names that map onto the column name.
 java.util.Collection addlColumnNames
          The list of additional DOCUMENT table column names.
 int commitAfter
          The number of documents loaded after which this should commit (0 or less for only at the end).
protected  java.sql.Connection connection
          The connection this loader uses.
 java.lang.String conPoolName
          The connection pool name.
static java.lang.String DEF_MD_FILE_EXT
          The default file extension for metadata property files.
static java.lang.String DEF_MIME_TYPE
          The default mime type.
static java.lang.String DEF_SCHEMA_NAME
          The default schema name.
static java.lang.String DEF_SCHEMA_PATH
          The default path for the schema file.
static java.lang.String DEF_WLS_PROPS_PATH
          The default weblogic properties file path.
static java.lang.String deleteDocSql
          The preparable sql to remove a document from the database.
protected  java.sql.PreparedStatement deleteDocStmt
          The delete document statement.
static java.lang.String deleteMDSql
          The preparable sql to remove a document's implicit metadata from the database.
protected  java.sql.PreparedStatement deleteMDStmt
          The delete metadata statement.
static java.lang.String DOC_MD_TABLE
          The document_metadata table name.
static java.lang.String DOC_TABLE
          The document table name.
 java.lang.String docBase
          The docBase.
 boolean doCleanUp
          Are we supposed to do a cleanup.
 boolean doDelete
          Are we deleting entries.
 boolean doMetaParse
          Are we supposed to parse '*.htm' and '*.html' files for META tags.
static java.util.Collection explicitAttrNames
          The set of explicit document metadata attribute names.
static java.util.Collection explicitColumnNames
          The default set of DOCUMENT table column names.
 java.lang.String fileEncoding
          The file encoding (null for VM default).
 java.util.List fileList
          The list of files/directories to scan over.
 java.util.List htmlMatchList
          The list of patterns that represent HTML file names.
 boolean ignoreErrors
          Do we ignore errors.
 java.util.List ignoreList
          The list of file name patterns to ignore.
 boolean includeHidden
          Are we supposed to include hidden files and directories.
 boolean inheritProps
          Are we supposed to inherit metadata properties when recursing directories?
static java.lang.String insertDocSql
          Deprecated. No longer used.
protected  java.sql.PreparedStatement insertDocStmt
          The insert document statement.
static java.lang.String insertMDSql
          The preparable sql to insert a document metadata into the database.
protected  java.sql.PreparedStatement insertMDStmt
          The insert document metadata statement.
protected  java.util.Properties jdbcProps
          The JDBC connection properties.
protected  java.lang.String jdbcUrl
          The JDBC connection url.
 java.util.List loaderFilters
          The list of LoaderFilters to try.
 java.util.List matchList
          The list of file name patterns to include.
 java.lang.String mdFileExt
          The file extension of metadata property files.
protected  java.util.Collection metadataNames
          The metadata properties we find along the way.
protected  java.lang.String myInsertDocSql
          The SQL statement used to insert into the DOCUMENT table.
protected  java.lang.String myUpdateDocSql
          The SQL statement used to update the DOCUMENT table.
protected  long numDocsLoaded
          The number of documents we've loaded so far.
 boolean recurse
          Do we recurse over directories?
 java.lang.String schemaName
          The name of the schema in the schema file.
 boolean schemaOnly
          Are we doing only the schema file generation.
 java.lang.String schemaPath
          The path to the schema file to output.
 boolean testOnly
          Are we running in test mode?
 boolean truncate
          Do we try to truncate.
static java.lang.String updateDocSql
          Deprecated. No longer used.
protected  java.sql.PreparedStatement updateDocStmt
          The update document statement.
 boolean verbose
          Do we spew out messages.
 java.lang.String wlsPropsPath
          The weblogic.properties file path.
 
Constructor Summary
BulkLoader()
          Construct a BulkLoader without command-line arguments.
BulkLoader(java.lang.String[] args)
          Construct a BulkLoader from the given command-line arguments.
 
Method Summary
 boolean accept(java.io.File dir, java.lang.String name)
          Implement the FilenameFilter interface method to use our match and ignore lists.
static void close(java.lang.Object o)
          Close an object which has a close() method, ignoring any exceptions.
 void commit()
          Commit the transaction.
 void debug(java.lang.String mesg)
          Output a debug message.
 int deleteDoc(java.lang.String path)
          This will remove the document with id of path from the database, including all of its implicit metadata.
 void deleteDocMetadata(java.lang.String path)
          This will remove the implicit metadata for the specified document.
 void doLoad()
          Do the actual bulk load logic on the file list.
 void doLoad(java.io.File baseDir, java.lang.String path, java.util.Properties dirProps)
          Load the given path into the database.
 void error(java.lang.String mesg)
          Output an error message.
 void error(java.lang.String mesg, java.lang.Throwable ex)
          Output an error message.
protected  void finalize()
          Called when this object is about to be finalized.
 java.lang.String fixPath(java.lang.String path)
          Fix up a path to be forward-slash style and to not have empty path parts.
static java.lang.String fixString(java.lang.String in)
          Fix empty strings to be nulls.
 java.lang.String fixString(java.lang.String in, java.lang.String tableName, java.lang.String colName)
          Fix up strings by checking for, and potentially truncating, values that are too large.
 java.util.Properties getLoaderFilterProperties(java.io.File f, java.util.Properties p)
          Get the properties from the BulkLoader's LoaderFilters for the given file.
 java.util.Properties getMetadataProperties(java.io.File base, java.util.Properties p)
          Get the metadata properties for the given file or directory.
 java.lang.String getPropertyForColumn(java.lang.String colName, java.util.Properties p)
          Get the property value for the specified additional column name.
 void go()
          Let the bulkloader go.
 void initialize()
          Initialize the bulkloader from the current state.
 int insertDoc(java.lang.String path, java.io.File f, java.util.Properties p, java.lang.String mimeType)
          Update or insert a document and metadata into the database.
static boolean isHidden(java.io.File f)
          Check if the specified file is a hidden file.
 boolean isHtmlFile(java.lang.String name)
          Tell if the specified file name is an HTML file to the loader.
static boolean isReadableDirectory(java.lang.String name)
          Check if the specified file name is a directory that we can get into.
 void loadPropertyColumnInfo(java.util.Properties p)
          Load the set of jdbc.column.<columnName>=propName,... properties into our property to column information.
static int main(BulkLoader loader, java.lang.String[] args)
          The main method invoked on a BulkLoader instance.
static void main(java.lang.String[] args)
          Command-line entry point.
 void parseArgs(java.lang.String[] args)
          Parse the given input arguments.
 void printSchema()
          Print the schema xml for all the metadata we've loaded so far.
 void printSchema(java.io.PrintWriter out, java.lang.String enc)
          Print the schema xml for all the metadata we've loaded so far to the given output stream.
 void rollback()
          Rollback the transaction.
protected  void setJDBCInfo()
          Set the jdbc connection url and properties from the current weblogic.properties file.
 boolean shouldIgnore(java.lang.String name)
          Tell if the loader should ignore the specified file name.
 boolean shouldInclude(java.lang.String name)
          Tell if the loader should include the specified file name.
 void shutdown()
          Shutdown this bulk loader.
static java.util.List split(java.lang.String str, java.lang.String on)
          Split a delimited list into a List.
static java.util.Properties splitToProperties(java.lang.String str, java.lang.String on)
          Split a WLS connection pool style string into a Properties object of the name=value pairs.
 void usage(java.io.PrintStream out)
          Print the usage of the application.
 void usage(java.io.PrintWriter out)
          Print the usage of the application.
 void validateArgs()
          Validate that we have been passed correct arguments.
 void warning(java.lang.String mesg)
          Output a warning message.
 void warning(java.lang.String mesg, java.lang.Throwable ex)
          Output a warning message.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

explicitAttrNames

public static final java.util.Collection explicitAttrNames
The set of explicit document metadata attribute names.

explicitColumnNames

public static final java.util.Collection explicitColumnNames
The default set of DOCUMENT table column names.

DOC_TABLE

public static java.lang.String DOC_TABLE
The document table name.

DOC_MD_TABLE

public static java.lang.String DOC_MD_TABLE
The document_metadata table name.

DEF_MD_FILE_EXT

public static final java.lang.String DEF_MD_FILE_EXT
The default file extension for metadata property files.

DEF_SCHEMA_PATH

public static final java.lang.String DEF_SCHEMA_PATH
The default path for the schema file.

DEF_SCHEMA_NAME

public static final java.lang.String DEF_SCHEMA_NAME
The default schema name.

DEF_WLS_PROPS_PATH

public static final java.lang.String DEF_WLS_PROPS_PATH
The default weblogic properties file path.

DEF_MIME_TYPE

public static final java.lang.String DEF_MIME_TYPE
The default mime type.

deleteMDSql

public static final java.lang.String deleteMDSql
The preparable sql to remove a document's implicit metadata from the database.

deleteDocSql

public static final java.lang.String deleteDocSql
The preparable sql to remove a document from the database.

insertDocSql

public static final java.lang.String insertDocSql
Deprecated. No longer used.
The default preparable sql to insert a document into the database.

updateDocSql

public static final java.lang.String updateDocSql
Deprecated. No longer used.
The default preparable sql to update a document in the database.

insertMDSql

public static final java.lang.String insertMDSql
The preparable sql to insert a document metadata into the database.

verbose

public boolean verbose
Do we spew out messages.

testOnly

public boolean testOnly
Are we running in test mode?

schemaOnly

public boolean schemaOnly
Are we doing only the schema file generation.

recurse

public boolean recurse
Do we recurse over directories?

doDelete

public boolean doDelete
Are we deleting entries.

doMetaParse

public boolean doMetaParse
Are we supposed to parse '*.htm' and '*.html' files for META tags.

doCleanUp

public boolean doCleanUp
Are we supposed to do a cleanup.

includeHidden

public boolean includeHidden
Are we supposed to include hidden files and directories.

inheritProps

public boolean inheritProps
Are we supposed to inherit metadata properties when recursing directories?

truncate

public boolean truncate
Do we try to truncate.

ignoreErrors

public boolean ignoreErrors
Do we ignore errors.

docBase

public java.lang.String docBase
The docBase.

mdFileExt

public java.lang.String mdFileExt
The file extension of metadata property files.

This should start with a ".".


schemaPath

public java.lang.String schemaPath
The path to the schema file to output.

schemaName

public java.lang.String schemaName
The name of the schema in the schema file.

fileEncoding

public java.lang.String fileEncoding
The file encoding (null for VM default).

commitAfter

public int commitAfter
The number of documents loaded after which this should commit (0 or less for only at the end).
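The commit interval described above can be pictured with a small counter. This is a sketch under assumptions: CommitPolicy is a hypothetical helper, and the page does not state exactly when the real loader checks the counter. Here a commit is considered due after every commitAfter documents, and a value of 0 or less means the only commit is the one at the end.

```java
final class CommitPolicy {
    private final int commitAfter;
    private long numDocsLoaded;

    CommitPolicy(int commitAfter) {
        this.commitAfter = commitAfter;
    }

    /** Records one loaded document; returns true when an intermediate commit is due. */
    boolean documentLoaded() {
        numDocsLoaded++;
        // 0 or less disables intermediate commits entirely.
        return commitAfter > 0 && numDocsLoaded % commitAfter == 0;
    }

    long getNumDocsLoaded() {
        return numDocsLoaded;
    }
}
```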

addlColumnNames

public java.util.Collection addlColumnNames
The list of additional DOCUMENT table column names.

addlColumnMap

public java.util.Map addlColumnMap
The map of additional DOCUMENT table column name to collection of property names that map onto the column name.

matchList

public java.util.List matchList
The list of file name patterns to include.

Empty to include all.


ignoreList

public java.util.List ignoreList
The list of file name patterns to ignore.

htmlMatchList

public java.util.List htmlMatchList
The list of patterns that represent HTML file names.

fileList

public java.util.List fileList
The list of files/directories to scan over.

wlsPropsPath

public java.lang.String wlsPropsPath
The weblogic.properties file path.

conPoolName

public java.lang.String conPoolName
The connection pool name.

loaderFilters

public java.util.List loaderFilters
The list of LoaderFilters to try.

myInsertDocSql

protected java.lang.String myInsertDocSql
The SQL statement used to insert into the DOCUMENT table.

This will be constructed in initialize(), together with any additional column information.


myUpdateDocSql

protected java.lang.String myUpdateDocSql
The SQL statement used to update the DOCUMENT table.

This will be constructed in initialize(), together with any additional column information.


jdbcUrl

protected java.lang.String jdbcUrl
The JDBC connection url.

jdbcProps

protected java.util.Properties jdbcProps
The JDBC connection properties.

connection

protected java.sql.Connection connection
The connection this loader uses.

deleteMDStmt

protected java.sql.PreparedStatement deleteMDStmt
The delete metadata statement.

deleteDocStmt

protected java.sql.PreparedStatement deleteDocStmt
The delete document statement.

updateDocStmt

protected java.sql.PreparedStatement updateDocStmt
The update document statement.

insertDocStmt

protected java.sql.PreparedStatement insertDocStmt
The insert document statement.

insertMDStmt

protected java.sql.PreparedStatement insertMDStmt
The insert document metadata statement.

metadataNames

protected java.util.Collection metadataNames
The metadata properties we find along the way.

numDocsLoaded

protected long numDocsLoaded
The number of documents we've loaded so far.

Constructor Detail

BulkLoader

public BulkLoader()
Construct a BulkLoader without command-line arguments.

BulkLoader

public BulkLoader(java.lang.String[] args)
           throws java.lang.IllegalArgumentException
Construct a BulkLoader from the given command-line arguments.

Throws:
java.lang.IllegalArgumentException - thrown on invalid args
See Also:
parseArgs(java.lang.String[])

Method Detail

finalize

protected void finalize()
                 throws java.lang.Throwable
Called when this object is about to be finalized.

Overrides:
finalize in class java.lang.Object

accept

public boolean accept(java.io.File dir,
                      java.lang.String name)
Implement the FilenameFilter interface method to use our match and ignore lists.
Specified by:
accept in interface java.io.FilenameFilter
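A FilenameFilter built on a match list and an ignore list might look like the sketch below. Several details are assumptions, since this page does not specify them: the pattern syntax (a plain "*" wildcard is used here), the precedence (a name must be included and must not be ignored), and the MatchIgnoreFilter class itself, which is hypothetical. The rule that an empty match list includes everything comes from the matchList field description.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.List;
import java.util.regex.Pattern;

final class MatchIgnoreFilter implements FilenameFilter {
    private final List<String> matchList;
    private final List<String> ignoreList;

    MatchIgnoreFilter(List<String> matchList, List<String> ignoreList) {
        this.matchList = matchList;
        this.ignoreList = ignoreList;
    }

    @Override
    public boolean accept(File dir, String name) {
        // Assumption: the ignore list wins over the match list.
        return shouldInclude(name) && !shouldIgnore(name);
    }

    boolean shouldInclude(String name) {
        // An empty match list includes everything.
        return matchList.isEmpty() || matchesAny(matchList, name);
    }

    boolean shouldIgnore(String name) {
        return matchesAny(ignoreList, name);
    }

    private static boolean matchesAny(List<String> patterns, String name) {
        for (String pattern : patterns) {
            if (globToRegex(pattern).matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Converts a pattern where '*' matches any run of characters into a regex. */
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        String[] literals = glob.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }
}
```

Such a filter can be passed to File.list(FilenameFilter) or File.listFiles(FilenameFilter) when scanning a directory.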


parseArgs

public void parseArgs(java.lang.String[] args)
               throws java.lang.IllegalArgumentException
Parse the given input arguments.

Parameters:
args - the input arguments.
Throws:
BulkLoader.ShowUsageException - thrown if the caller should show a usage report.
java.lang.IllegalArgumentException - thrown on bad arguments.

usage

public void usage(java.io.PrintStream out)
Print the usage of the application.


usage

public void usage(java.io.PrintWriter out)
Print the usage of the application.


validateArgs

public void validateArgs()
                  throws java.lang.IllegalStateException
Validate that we have been passed correct arguments.

This does not validate that the arguments are valid. That will be done in initialize().


initialize

public void initialize()
                throws java.sql.SQLException,
                       java.lang.IllegalStateException
Initialize the bulkloader from the current state.

This calls setJDBCInfo() and then creates a JDBC connection. It additionally configures any known column size limitations and constructs any SQL statements it needs.

Throws:
java.sql.SQLException - thrown on an error connecting to the database.
java.lang.IllegalStateException - thrown on an initialization error.
See Also:
setJDBCInfo()

go

public void go()
        throws java.sql.SQLException,
               java.io.IOException
Let the bulkloader go.

Throws:
java.sql.SQLException - if the load/cleanup fails.
java.io.IOException - if printing the schema fails.

shutdown

public void shutdown()
Shutdown this bulk loader.


doLoad

public void doLoad()
            throws java.sql.SQLException
Do the actual bulk load logic on the file list.


doLoad

public void doLoad(java.io.File baseDir,
                   java.lang.String path,
                   java.util.Properties dirProps)
            throws java.sql.SQLException
Load the given path into the database.

If path is a directory, all files underneath it that match our patterns will be included. If path is a file, it will be loaded.

Parameters:
baseDir - the base directory (can be used to get absolute file paths).
path - the path to the file or directory (this can be multi-part, not just name).
dirProps - the base metadata properties for the file (this should be a clone that this method can modify as needed).
Throws:
java.sql.SQLException - thrown on a database error.

printSchema

public void printSchema()
                 throws java.io.IOException
Print the schema xml for all the metadata we've loaded so far.

Throws:
java.io.IOException - thrown on an error outputting the schema xml.

printSchema

public void printSchema(java.io.PrintWriter out,
                        java.lang.String enc)
                 throws java.io.IOException
Print the schema xml for all the metadata we've loaded so far to the given output stream.

Parameters:
out - the output stream.
enc - the file encoding (will go in xml head if not null).
Throws:
java.io.IOException - thrown on an I/O error.

getMetadataProperties

public java.util.Properties getMetadataProperties(java.io.File base,
                                                  java.util.Properties p)
                                           throws java.io.IOException
Get the metadata properties for the given file or directory.

This does not parse HTML META tags.

Parameters:
base - the file or directory base path.
p - the properties to load into (null to create new).
Returns:
the properties (p if p was not null).
Throws:
java.io.IOException - on an error reading the properties file.

getLoaderFilterProperties

public java.util.Properties getLoaderFilterProperties(java.io.File f,
                                                      java.util.Properties p)
Get the properties from the BulkLoader's LoaderFilters for the given file.

Parameters:
f - the file.
p - the properties object to add to (null to create new one).
Returns:
p.

deleteDoc

public int deleteDoc(java.lang.String path)
              throws java.sql.SQLException
This will remove the document with id of path from the database, including all of its implicit metadata.

Parameters:
path - the document path
Returns:
number of documents deleted
Throws:
java.sql.SQLException - thrown on a database error.
See Also:
deleteDocMetadata(java.lang.String)

deleteDocMetadata

public void deleteDocMetadata(java.lang.String path)
                       throws java.sql.SQLException
This will remove the implicit metadata for the specified document.

Parameters:
path - the document path
Throws:
java.sql.SQLException - thrown on a database error.
See Also:
deleteDoc(java.lang.String)

insertDoc

public int insertDoc(java.lang.String path,
                     java.io.File f,
                     java.util.Properties p,
                     java.lang.String mimeType)
              throws java.sql.SQLException
Update or insert a document and metadata into the database.

Parameters:
path - the document path id.
f - the file of the document (can be null).
p - the implicit properties.
mimeType - the preferred mime type of the document (null for default).
Returns:
number of documents inserted
Throws:
java.sql.SQLException - thrown on a database error.

setJDBCInfo

protected void setJDBCInfo()
                    throws java.lang.IllegalStateException
Set the jdbc connection url and properties from the current weblogic.properties file.

Throws:
java.lang.IllegalStateException - thrown if we cannot get the information.

commit

public void commit()
Commit the transaction.


rollback

public void rollback()
Rollback the transaction.


shouldInclude

public boolean shouldInclude(java.lang.String name)
Tell if the loader should include the specified file name.


shouldIgnore

public boolean shouldIgnore(java.lang.String name)
Tell if the loader should ignore the specified file name.


isHtmlFile

public boolean isHtmlFile(java.lang.String name)
Tell if the specified file name is an HTML file to the loader.


fixPath

public java.lang.String fixPath(java.lang.String path)
Fix up a path to be forward-slash style and to not have empty path parts.
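One way to picture this normalization is the sketch below. PathFixer is a hypothetical helper, not the BulkLoader code. It converts backslashes to forward slashes and drops empty path parts; keeping a leading slash and dropping a trailing one are assumptions, because the description above does not cover those cases.

```java
final class PathFixer {
    /** Converts to forward-slash style and removes empty path parts. */
    static String fixPath(String path) {
        String forward = path.replace('\\', '/');
        // Assumption: a leading slash is meaningful and is preserved.
        boolean absolute = forward.startsWith("/");
        StringBuilder out = new StringBuilder();
        for (String part : forward.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty parts produced by "//"
            }
            if (out.length() > 0) {
                out.append('/');
            }
            out.append(part);
        }
        return absolute ? "/" + out : out.toString();
    }
}
```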


getPropertyForColumn

public java.lang.String getPropertyForColumn(java.lang.String colName,
                                             java.util.Properties p)
Get the property value for the specified additional column name.

This will loop over the property names mapped to the column name and return the first found value in the properties.
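The lookup described above amounts to a first-match search. The sketch below shows the idea with a hypothetical ColumnLookup helper that takes the column map as a parameter; the real method presumably reads the addlColumnMap field instead, and returning null when nothing is found is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

final class ColumnLookup {
    /** Returns the first property value found for the column, or null if none is set. */
    static String getPropertyForColumn(Map<String, List<String>> columnMap,
                                       String colName,
                                       Properties p) {
        List<String> propNames = columnMap.get(colName);
        if (propNames == null) {
            return null; // unknown column
        }
        for (String propName : propNames) {
            String value = p.getProperty(propName);
            if (value != null) {
                return value; // earlier names in the list take priority
            }
        }
        return null;
    }
}
```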


loadPropertyColumnInfo

public void loadPropertyColumnInfo(java.util.Properties p)
Load the set of jdbc.column.<columnName>=propName,... properties into our property to column information.
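To make the property format concrete, the sketch below parses jdbc.column.<columnName>=propName,... entries into a map from column name to property names. ColumnInfoParser is a hypothetical helper; trimming whitespace around names and skipping empty entries are assumptions, and the real method stores its result in the loader's own fields rather than returning it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class ColumnInfoParser {
    static final String PREFIX = "jdbc.column.";

    /** Maps each column name to the property names listed for it. */
    static Map<String, List<String>> parse(Properties p) {
        Map<String, List<String>> columns = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            // Skip unrelated keys and a bare prefix with no column name.
            if (!key.startsWith(PREFIX) || key.length() == PREFIX.length()) {
                continue;
            }
            List<String> propNames = new ArrayList<>();
            for (String name : p.getProperty(key).split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    propNames.add(trimmed);
                }
            }
            columns.put(key.substring(PREFIX.length()), propNames);
        }
        return columns;
    }
}
```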


debug

public void debug(java.lang.String mesg)
Output a debug message.

Subclasses can override this method to change where messages go.


warning

public void warning(java.lang.String mesg,
                    java.lang.Throwable ex)
Output a warning message.

Subclasses can override this method to change where messages go.


warning

public void warning(java.lang.String mesg)
Output a warning message.


error

public void error(java.lang.String mesg,
                  java.lang.Throwable ex)
Output an error message.

Subclasses can override this method to change where messages go.


error

public void error(java.lang.String mesg)
Output an error message.


isReadableDirectory

public static boolean isReadableDirectory(java.lang.String name)
Check if the specified file name is a directory that we can get into.


isHidden

public static boolean isHidden(java.io.File f)
Check if the specified file is a hidden file.

Under UNIX, File.isHidden() reports that "/weblogicCommerce/dmsBase/." is a hidden file, which it is not. So, this fixes that problem by getting canonical paths for directories before calling isHidden(). That seems to do the trick.
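The workaround described above can be sketched as follows. HiddenCheck is a hypothetical helper written from that description, not the BulkLoader source; falling back to the plain isHidden() call when the canonical path cannot be resolved is an assumption.

```java
import java.io.File;
import java.io.IOException;

final class HiddenCheck {
    /** Checks hidden status, resolving directories to canonical form first. */
    static boolean isHidden(File f) {
        if (f.isDirectory()) {
            try {
                // "/some/dir/." becomes "/some/dir", so the trailing "."
                // is no longer mistaken for a hidden file name.
                return f.getCanonicalFile().isHidden();
            } catch (IOException e) {
                // Assumption: fall back to the plain check below.
            }
        }
        return f.isHidden();
    }
}
```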


splitToProperties

public static java.util.Properties splitToProperties(java.lang.String str,
                                                     java.lang.String on)
Split a WLS connection pool style string into a Properties object of the name=value pairs.
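As an illustration, a string such as "user=scott;password=tiger" split on ";" would yield two properties. The sketch below is a hypothetical re-creation, not the BulkLoader code. It assumes that every character of the delimiter argument acts as a separator (the behavior of StringTokenizer), that whitespace around names and values is trimmed, and that entries without a name are skipped.

```java
import java.util.Properties;
import java.util.StringTokenizer;

final class PropertySplitter {
    /** Splits "name=value" pairs separated by the given delimiter characters. */
    static Properties splitToProperties(String str, String on) {
        Properties props = new Properties();
        if (str == null) {
            return props;
        }
        StringTokenizer tokens = new StringTokenizer(str, on);
        while (tokens.hasMoreTokens()) {
            String pair = tokens.nextToken().trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // Everything after the first '=' belongs to the value.
                props.setProperty(pair.substring(0, eq).trim(),
                                  pair.substring(eq + 1).trim());
            }
        }
        return props;
    }
}
```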


split

public static java.util.List split(java.lang.String str,
                                   java.lang.String on)
Split a delimited list into a List.


fixString

public java.lang.String fixString(java.lang.String in,
                                  java.lang.String tableName,
                                  java.lang.String colName)
Fix up strings by checking for, and potentially truncating, values that are too large.
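A size check of this kind could look like the sketch below. Everything here is an assumption apart from the idea that known column size limits exist and that a truncate option exists: ColumnTruncator is a hypothetical class, sizes are counted in characters, and a value that is too large is passed through unchanged when truncation is off (this page does not say what the real loader does in that case).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ColumnTruncator {
    private final Map<String, Integer> columnSizes = new HashMap<>();
    private final boolean truncate;

    ColumnTruncator(boolean truncate) {
        this.truncate = truncate;
    }

    /** Registers a known size limit for a column. */
    void setColumnSize(String tableName, String colName, int size) {
        columnSizes.put(key(tableName, colName), size);
    }

    /** Returns the value, cut to the column size when truncation is enabled. */
    String fixString(String in, String tableName, String colName) {
        if (in == null) {
            return null;
        }
        Integer max = columnSizes.get(key(tableName, colName));
        if (truncate && max != null && in.length() > max) {
            return in.substring(0, max);
        }
        return in; // no known limit, value fits, or truncation is off
    }

    private static String key(String tableName, String colName) {
        return tableName.toUpperCase(Locale.ROOT) + "." + colName.toUpperCase(Locale.ROOT);
    }
}
```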


fixString

public static java.lang.String fixString(java.lang.String in)
Fix empty strings to be nulls.


close

public static void close(java.lang.Object o)
Close an object which has a close() method, ignoring any exceptions.
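A helper like this is typically used in cleanup code where a failure to close should not mask an earlier error. The sketch below is a hypothetical modern equivalent, not the original implementation: it uses AutoCloseable where available (an interface that did not exist when this class was written) and otherwise looks up a public no-argument close() method by reflection.

```java
import java.lang.reflect.Method;

final class QuietCloser {
    /** Closes the object if it has a close() method; never throws an exception. */
    static void close(Object o) {
        if (o == null) {
            return;
        }
        try {
            if (o instanceof AutoCloseable) {
                ((AutoCloseable) o).close();
            } else {
                // Fall back to any public no-argument close() method.
                Method closeMethod = o.getClass().getMethod("close");
                closeMethod.invoke(o);
            }
        } catch (Exception e) {
            // Deliberately ignored: covers a missing method, access
            // problems, and exceptions thrown by close() itself.
        }
    }
}
```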


main

public static int main(BulkLoader loader,
                       java.lang.String[] args)
The main method invoked on a BulkLoader instance.

This will take a BulkLoader through the bulk loading steps. Output will be sent via the BulkLoader's debug(), warning(), and error() methods.

This will not call System.exit().

Parameters:
loader - the BulkLoader instance to run.
args - the command-line args.
Returns:
the exit code (0 for success, non-zero for failure).
See Also:
parseArgs(java.lang.String[]), validateArgs(), initialize(), go(), commit()

main

public static void main(java.lang.String[] args)
Command-line entry point.

This will call System.exit() on invalid args or error. To invoke a bulk load from your own code, create and manipulate a BulkLoader object. You can use the other main method, which does not exit.

Parameters:
args - the command-line args.
See Also:
main(BulkLoader, String[])

Copyright © 2002 BEA Systems, Inc. All Rights Reserved