© 2005 BEA Systems, Inc.

com.bea.p13n.content.document.ref.loader
Class BulkLoader

java.lang.Object
  extended by com.bea.p13n.content.document.ref.loader.BulkLoader
All Implemented Interfaces:
FilenameFilter

public class BulkLoader
extends Object
implements FilenameFilter

The reference document repository bulk loader application.

This class is designed mainly to run as a command-line application, invoked as "java com.bea.p13n.content.document.ref.loader.BulkLoader". To see usage information, pass the -h flag or read the Usage.txt file in this package.

Additionally, BulkLoader objects can be created and used to provide the functionality in other places. The lifecycle of a BulkLoader is as follows:

Failing to call parseArgs(), validateArgs(), and initialize(), in that order, will most likely cause the BulkLoader object to fail ungracefully. Once those methods have been invoked, the utility methods (such as go(), doLoad(), printSchema(), deleteDoc(), and insertDoc()) can be invoked as needed. The commit() and rollback() methods control the BulkLoader's transaction with the database. When finished, be sure to call the shutdown() method to release internal resources.
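
The call order described above can be sketched as follows. This is an illustration only: LoaderLifecycle is a hypothetical stand-in interface that mirrors the documented BulkLoader method names, and the LifecycleSketch class, the 0/1 return codes, and the decision to roll back on failure are assumptions, not part of the BEA API.

```java
import java.io.IOException;
import java.sql.SQLException;

// Hypothetical stand-in that mirrors the documented BulkLoader method names.
interface LoaderLifecycle {
    void parseArgs(String[] args);
    void validateArgs();
    void initialize() throws SQLException;
    void go() throws SQLException, IOException;
    void commit();
    void rollback();
    void shutdown();
}

class LifecycleSketch {
    // Runs the documented call order; returns 0 on success and 1 on failure.
    static int run(LoaderLifecycle loader, String[] args) {
        // The loader is not thread-safe, so all access is synchronized.
        synchronized (loader) {
            boolean initialized = false;
            try {
                loader.parseArgs(args);   // 1. read the options
                loader.validateArgs();    // 2. check the options
                loader.initialize();      // 3. connect to the database
                initialized = true;
                loader.go();              // 4. do the work
                loader.commit();          // 5. make the changes permanent
                return 0;
            } catch (SQLException | IOException | RuntimeException e) {
                if (initialized) {
                    loader.rollback();    // assumption: undo partial work on failure
                }
                return 1;
            } finally {
                loader.shutdown();        // always release internal resources
            }
        }
    }
}
```

With a real BulkLoader the same sequence would be written against the object directly; the try/finally shape is what guarantees that shutdown() runs even when a step fails.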

If manually constructing and utilizing a BulkLoader object, be certain to synchronize all access to the object. Since the command-line program is single-threaded, BulkLoader objects are not thread-safe by design.

To load the default LoaderFilters, the BulkLoader looks for the com/bea/p13n/content/document/loader/loader.properties file in the CLASSPATH. From that file it reads the list of default LoaderFilter class names in the "loader.defFilters" property. To skip all of the default filters, specify +filters in the command-line args.
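
As a rough illustration of reading that property, the sketch below pulls the class names out of a properties file's text. The DefaultFilterNames class and the com.example filter names are hypothetical, and the assumption that names are separated by commas or whitespace is not confirmed by this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class DefaultFilterNames {
    // Returns the class names listed in the "loader.defFilters" property.
    // Assumption: names are separated by commas and/or whitespace.
    static List<String> parse(Properties props) {
        List<String> names = new ArrayList<>();
        String raw = props.getProperty("loader.defFilters", "");
        for (String name : raw.split("[,\\s]+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Convenience overload that parses the text of a properties file.
    static List<String> parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(props);
    }
}
```

For example, DefaultFilterNames.parse("loader.defFilters=com.example.FilterA, com.example.FilterB") yields a two-element list; a file without the property yields an empty list.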

Since:
2.0
See Also:
MetaParser, FileCache

Nested Class Summary
static class BulkLoader.ShowUsageException
          Deprecated. Quick inner exception thrown on parseArgs() to say we should just print a usage report.
 
Field Summary
 Map addlColumnMap
          Deprecated. The map of additional DOCUMENT table column name to collection of property names that map onto the column name.
 Collection addlColumnNames
          Deprecated. The list of additional DOCUMENT table column names.
 int commitAfter
          Deprecated. The number of documents loaded after which this should commit (0 or less for only at the end).
protected  Connection connection
          Deprecated. The connection this loader uses.
 String conPoolName
          Deprecated. The connection pool name.
static String DEF_MD_FILE_EXT
          Deprecated. The default file extension for metadata property files.
static String DEF_MIME_TYPE
          Deprecated. The default mime type.
static String DEF_SCHEMA_NAME
          Deprecated. The default schema name.
static String DEF_SCHEMA_PATH
          Deprecated. The default path for the schema file.
static String DEF_WLS_PROPS_PATH
          Deprecated. The default weblogic properties file path.
static String deleteDocSql
          Deprecated. The preparable sql to remove a document from the database.
protected  PreparedStatement deleteDocStmt
          Deprecated. The delete document statement.
static String deleteMDSql
          Deprecated. The preparable sql to remove a document's implicit metadata from the database.
protected  PreparedStatement deleteMDStmt
          Deprecated. The delete metadata statement.
static String DOC_MD_TABLE
          Deprecated. The document_metadata table name.
static String DOC_TABLE
          Deprecated. The document table name.
 String docBase
          Deprecated. The docBase.
 boolean doCleanUp
          Deprecated. Are we supposed to do a cleanup.
 boolean doDelete
          Deprecated. Are we deleting entries.
 boolean doMetaParse
          Deprecated. Are we supposed to parse '*.htm' and '*.html' files for META tags.
static Collection explicitAttrNames
          Deprecated. The set of explicit document metadata attribute names.
static Collection explicitColumnNames
          Deprecated. The default set of DOCUMENT table column names.
 String fileEncoding
          Deprecated. The file encoding (null for VM default).
 List fileList
          Deprecated. The list of files/directories to scan over.
 List htmlMatchList
          Deprecated. The list of patterns that represent HTML file names.
 boolean ignoreErrors
          Deprecated. Do we ignore errors.
 List ignoreList
          Deprecated. The list of file name patterns to ignore.
 boolean includeHidden
          Deprecated. Are we supposed to include hidden files and directories.
 boolean inheritProps
          Deprecated. Are we supposed to inherit metadata properties when recursing directories?
static String insertDocSql
          Deprecated. No longer used.
protected  PreparedStatement insertDocStmt
          Deprecated. The insert document statement.
static String insertMDSql
          Deprecated. The preparable sql to insert a document metadata into the database.
protected  PreparedStatement insertMDStmt
          Deprecated. The insert document metadata statement.
protected  Properties jdbcProps
          Deprecated. The JDBC connection properties.
protected  String jdbcUrl
          Deprecated. The JDBC connection url.
 List loaderFilters
          Deprecated. The list of LoaderFilters to try.
 List matchList
          Deprecated. The list of file name patterns to include.
 String mdFileExt
          Deprecated. The file extension of metadata property files.
protected  Collection metadataNames
          Deprecated. The metadata properties we find along the way.
protected  String myInsertDocSql
          Deprecated. The SQL statement used to insert into the DOCUMENT table.
protected  String myUpdateDocSql
          Deprecated. The SQL statement used to update the DOCUMENT table.
protected  long numDocsLoaded
          Deprecated. The number of documents we've loaded so far.
 boolean recurse
          Deprecated. Do we recurse over directories?
 String schemaName
          Deprecated. The name of the schema in the schema file.
 boolean schemaOnly
          Deprecated. Are we doing only the schema file generation.
 String schemaPath
          Deprecated. The path to the schema file to output.
 boolean testOnly
          Deprecated. Are we running in test mode?
 boolean truncate
          Deprecated. Do we try to truncate.
static String updateDocSql
          Deprecated. No longer used.
protected  PreparedStatement updateDocStmt
          Deprecated. The update document statement.
 boolean verbose
          Deprecated. Do we spew out messages.
 String wlsPropsPath
          Deprecated. The weblogic.properties file path.
 
Constructor Summary
BulkLoader()
          Deprecated. Construct a BulkLoader without command-line arguments.
BulkLoader(String[] args)
          Deprecated. Construct a BulkLoader from the given command-line arguments.
 
Method Summary
 boolean accept(File dir, String name)
          Deprecated. Implement the FilenameFilter interface method to use our match and ignore lists.
static void close(Object o)
          Deprecated. Close an object which has a close() method, ignoring any exceptions.
 void commit()
          Deprecated. Commit the transaction.
 void debug(String mesg)
          Deprecated. Output a debug message.
 int deleteDoc(String path)
          Deprecated. This will remove the document with id of path from the database, including all of its implicit metadata.
 void deleteDocMetadata(String path)
          Deprecated. This will remove the implicit metadata for the specified document.
 void doLoad()
          Deprecated. Do the actual bulk load logic on the file list.
 void doLoad(File baseDir, String path, Properties dirProps)
          Deprecated. Load the given path into the database.
 void error(String mesg)
          Deprecated. Output an error message.
 void error(String mesg, Throwable ex)
          Deprecated. Output an error message.
protected  void finalize()
          Deprecated. Called when this is to be finalized.
 String fixPath(String path)
          Deprecated. Fix up a path to be forward-slash style and to not have empty path parts.
static String fixString(String in)
          Deprecated. Fix empty strings to be nulls.
 String fixString(String in, String tableName, String colName)
          Deprecated. Fix up strings by checking for, and potentially truncating, values that are too large.
 Properties getLoaderFilterProperties(File f, Properties p)
          Deprecated. Get the properties from the BulkLoader's LoaderFilters for the given file.
 Properties getMetadataProperties(File base, Properties p)
          Deprecated. Get the metadata properties for the given file or directory.
 String getPropertyForColumn(String colName, Properties p)
          Deprecated. Get the property value for the specified additional column name.
 void go()
          Deprecated. Let the bulkloader go.
 void initialize()
          Deprecated. Initialize the bulkloader from the current state.
 int insertDoc(String path, File f, Properties p, String mimeType)
          Deprecated. Update or insert a document and metadata into the database.
static boolean isHidden(File f)
          Deprecated. Check if the specified file is a hidden file.
 boolean isHtmlFile(String name)
          Deprecated. Tell if the specified file name is an HTML file to the loader.
static boolean isReadableDirectory(String name)
          Deprecated. Check if the specified file name is a directory that we can get into.
 void loadPropertyColumnInfo(Properties p)
          Deprecated. Load the set of jdbc.column.<columnName>=propName,... properties into our property to column information.
static int main(BulkLoader loader, String[] args)
          Deprecated. The main method invoked on a BulkLoader instance.
static void main(String[] args)
          Deprecated. Command-line entry point.
 void parseArgs(String[] args)
          Deprecated. Parse the given input arguments.
 void printSchema()
          Deprecated. Print the schema xml for all the metadata we've loaded so far.
 void printSchema(PrintWriter out, String enc)
          Deprecated. Print the schema xml for all the metadata we've loaded so far to the given output stream.
 void rollback()
          Deprecated. Rollback the transaction.
protected  void setJDBCInfo()
          Deprecated. Set the jdbc connection url and properties from the current weblogic.properties file.
 boolean shouldIgnore(String name)
          Deprecated. Tell if the loader should ignore the specified file name.
 boolean shouldInclude(String name)
          Deprecated. Tell if the loader should include the specified file name.
 void shutdown()
          Deprecated. Shutdown this bulk loader.
static List split(String str, String on)
          Deprecated. Split a delimited list into a List.
static Properties splitToProperties(String str, String on)
          Deprecated. Split a WLS connection pool style string into a Properties object of the name=value pairs.
 void usage(PrintStream out)
          Deprecated. Print the usage of the application.
 void usage(PrintWriter out)
          Deprecated. Print the usage of the application.
 void validateArgs()
          Deprecated. Validate that we have been passed correct arguments.
 void warning(String mesg)
          Deprecated. Output a warning message.
 void warning(String mesg, Throwable ex)
          Deprecated. Output a warning message.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

addlColumnMap

public Map addlColumnMap
Deprecated. 
The map of additional DOCUMENT table column name to collection of property names that map onto the column name.


addlColumnNames

public Collection addlColumnNames
Deprecated. 
The list of additional DOCUMENT table column names.


commitAfter

public int commitAfter
Deprecated. 
The number of documents loaded after which this should commit (0 or less for only at the end).
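
The commit interval can be pictured with the small counter below. It is a sketch of the rule stated above, not BulkLoader's own code: the CommitCounter class is hypothetical, and its private commit() method only counts calls where a real loader would commit the JDBC connection.

```java
class CommitCounter {
    private final int commitAfter;
    private long numDocsLoaded = 0;
    private int commits = 0;

    CommitCounter(int commitAfter) {
        this.commitAfter = commitAfter;
    }

    // Call once per loaded document; commits every commitAfter documents.
    // A value of 0 or less means "commit only at the end".
    void documentLoaded() {
        numDocsLoaded++;
        if (commitAfter > 0 && numDocsLoaded % commitAfter == 0) {
            commit();
        }
    }

    // Call when the load is finished to commit whatever is still pending.
    void finish() {
        commit();
    }

    // Stand-in for committing the database connection.
    private void commit() {
        commits++;
    }

    int commits() {
        return commits;
    }
}
```

With commitAfter set to 3, seven documents trigger two intermediate commits plus the final one; with 0, only the final commit happens.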


connection

protected Connection connection
Deprecated. 
The connection this loader uses.


conPoolName

public String conPoolName
Deprecated. 
The connection pool name.


DEF_MD_FILE_EXT

public static final String DEF_MD_FILE_EXT
Deprecated. 
The default file extension for metadata property files.

See Also:
Constant Field Values

DEF_MIME_TYPE

public static final String DEF_MIME_TYPE
Deprecated. 
The default mime type.

See Also:
Constant Field Values

DEF_SCHEMA_NAME

public static final String DEF_SCHEMA_NAME
Deprecated. 
The default schema name.

See Also:
Constant Field Values

DEF_SCHEMA_PATH

public static final String DEF_SCHEMA_PATH
Deprecated. 
The default path for the schema file.

See Also:
Constant Field Values

DEF_WLS_PROPS_PATH

public static final String DEF_WLS_PROPS_PATH
Deprecated. 
The default weblogic properties file path.

See Also:
Constant Field Values

deleteDocSql

public static final String deleteDocSql
Deprecated. 
The preparable sql to remove a document from the database.


deleteDocStmt

protected PreparedStatement deleteDocStmt
Deprecated. 
The delete document statement.


deleteMDSql

public static final String deleteMDSql
Deprecated. 
The preparable sql to remove a document's implicit metadata from the database.


deleteMDStmt

protected PreparedStatement deleteMDStmt
Deprecated. 
The delete metadata statement.


DOC_MD_TABLE

public static String DOC_MD_TABLE
Deprecated. 
The document_metadata table name.


DOC_TABLE

public static String DOC_TABLE
Deprecated. 
The document table name.


docBase

public String docBase
Deprecated. 
The docBase.


doCleanUp

public boolean doCleanUp
Deprecated. 
Are we supposed to do a cleanup.


doDelete

public boolean doDelete
Deprecated. 
Are we deleting entries.


doMetaParse

public boolean doMetaParse
Deprecated. 
Are we supposed to parse '*.htm' and '*.html' files for META tags.


explicitAttrNames

public static final Collection explicitAttrNames
Deprecated. 
The set of explicit document metadata attribute names.


explicitColumnNames

public static final Collection explicitColumnNames
Deprecated. 
The default set of DOCUMENT table column names.


fileEncoding

public String fileEncoding
Deprecated. 
The file encoding (null for VM default).


fileList

public List fileList
Deprecated. 
The list of files/directories to scan over.


htmlMatchList

public List htmlMatchList
Deprecated. 
The list of patterns that represent HTML file names.


ignoreErrors

public boolean ignoreErrors
Deprecated. 
Do we ignore errors.


ignoreList

public List ignoreList
Deprecated. 
The list of file name patterns to ignore.


includeHidden

public boolean includeHidden
Deprecated. 
Are we supposed to include hidden files and directories.


inheritProps

public boolean inheritProps
Deprecated. 
Are we supposed to inherit metadata properties when recursing directories?


insertDocSql

public static final String insertDocSql
Deprecated. No longer used.

The default preparable sql to insert a document into the database.


insertDocStmt

protected PreparedStatement insertDocStmt
Deprecated. 
The insert document statement.


insertMDSql

public static final String insertMDSql
Deprecated. 
The preparable sql to insert a document metadata into the database.


insertMDStmt

protected PreparedStatement insertMDStmt
Deprecated. 
The insert document metadata statement.


jdbcProps

protected Properties jdbcProps
Deprecated. 
The JDBC connection properties.


jdbcUrl

protected String jdbcUrl
Deprecated. 
The JDBC connection url.


loaderFilters

public List loaderFilters
Deprecated. 
The list of LoaderFilters to try.


matchList

public List matchList
Deprecated. 
The list of file name patterns to include.

Empty to include all.


mdFileExt

public String mdFileExt
Deprecated. 
The file extension of metadata property files.

This should start with a ".".


metadataNames

protected Collection metadataNames
Deprecated. 
The metadata properties we find along the way.


myInsertDocSql

protected String myInsertDocSql
Deprecated. 
The SQL statement used to insert into the DOCUMENT table.

This will be constructed in initialize() using any additional column information.


myUpdateDocSql

protected String myUpdateDocSql
Deprecated. 
The SQL statement used to update the DOCUMENT table.

This will be constructed in initialize() using any additional column information.


numDocsLoaded

protected long numDocsLoaded
Deprecated. 
The number of documents we've loaded so far.


recurse

public boolean recurse
Deprecated. 
Do we recurse over directories?


schemaName

public String schemaName
Deprecated. 
The name of the schema in the schema file.


schemaOnly

public boolean schemaOnly
Deprecated. 
Are we doing only the schema file generation.


schemaPath

public String schemaPath
Deprecated. 
The path to the schema file to output.


testOnly

public boolean testOnly
Deprecated. 
Are we running in test mode?


truncate

public boolean truncate
Deprecated. 
Do we try to truncate.


updateDocSql

public static final String updateDocSql
Deprecated. No longer used.

The default preparable sql to update a document in the database.


updateDocStmt

protected PreparedStatement updateDocStmt
Deprecated. 
The update document statement.


verbose

public boolean verbose
Deprecated. 
Do we spew out messages.


wlsPropsPath

public String wlsPropsPath
Deprecated. 
The weblogic.properties file path.

Constructor Detail

BulkLoader

public BulkLoader()
Deprecated. 
Construct a BulkLoader without command-line arguments.


BulkLoader

public BulkLoader(String[] args)
           throws IllegalArgumentException
Deprecated. 
Construct a BulkLoader from the given command-line arguments.

Throws:
IllegalArgumentException - thrown on invalid args
See Also:
parseArgs(java.lang.String[])
Method Detail

accept

public boolean accept(File dir,
                      String name)
Deprecated. 
Implement the FilenameFilter interface method to use our match and ignore lists.

Specified by:
accept in interface FilenameFilter
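
A FilenameFilter driven by match and ignore lists might look like the sketch below. The PatternFileFilter class is hypothetical, and two points are assumptions rather than documented behavior: patterns are treated as simple globs in which only '*' is a wildcard, and an ignore pattern wins over a match pattern. An empty match list includes everything, as stated for the matchList field.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.List;
import java.util.regex.Pattern;

class PatternFileFilter implements FilenameFilter {
    private final List<String> matchList;
    private final List<String> ignoreList;

    PatternFileFilter(List<String> matchList, List<String> ignoreList) {
        this.matchList = matchList;
        this.ignoreList = ignoreList;
    }

    // Tests a name against a glob; only '*' is treated as a wildcard (assumption).
    static boolean matches(String glob, String name) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    @Override
    public boolean accept(File dir, String name) {
        // Assumption: the ignore list takes precedence over the match list.
        for (String pattern : ignoreList) {
            if (matches(pattern, name)) {
                return false;
            }
        }
        if (matchList.isEmpty()) {
            return true; // empty match list includes everything
        }
        for (String pattern : matchList) {
            if (matches(pattern, name)) {
                return true;
            }
        }
        return false;
    }
}
```

An instance can be passed to File.list(FilenameFilter) to restrict a directory listing in the same way.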

close

public static void close(Object o)
Deprecated. 
Close an object which has a close() method, ignoring any exceptions.
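
One plausible way to close an arbitrary object is reflection, sketched below. The QuietCloser class is hypothetical, and this page does not say whether BulkLoader uses reflection or explicit type checks.

```java
import java.lang.reflect.Method;

class QuietCloser {
    // Invokes a public no-argument close() method if the object has one.
    // Every failure is swallowed, matching the "ignoring any exceptions" contract.
    static void close(Object o) {
        if (o == null) {
            return;
        }
        try {
            Method closeMethod = o.getClass().getMethod("close");
            closeMethod.invoke(o);
        } catch (Exception ignored) {
            // No close() method, not accessible, or close() itself failed.
        }
    }
}
```

In modern code, try-with-resources on AutoCloseable types is usually the simpler choice; a helper like this is mainly useful when the static type of the object is unknown.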


commit

public void commit()
Deprecated. 
Commit the transaction.


debug

public void debug(String mesg)
Deprecated. 
Output a debug message.

Subclasses can override this method to change where messages go.


deleteDoc

public int deleteDoc(String path)
              throws SQLException
Deprecated. 
This will remove the document with id of path from the database, including all of its implicit metadata.

Parameters:
path - the document path
Returns:
number of documents deleted
Throws:
SQLException - thrown on a database error.
See Also:
deleteDocMetadata(java.lang.String)

deleteDocMetadata

public void deleteDocMetadata(String path)
                       throws SQLException
Deprecated. 
This will remove the implicit metadata for the specified document.

Parameters:
path - the document path
Throws:
SQLException - thrown on a database error.
See Also:
deleteDocMetadata(java.lang.String)

doLoad

public void doLoad()
            throws SQLException
Deprecated. 
Do the actual bulk load logic on the file list.

Throws:
SQLException

doLoad

public void doLoad(File baseDir,
                   String path,
                   Properties dirProps)
            throws SQLException
Deprecated. 
Load the given path into the database.

If path is a directory, all files underneath it that match our patterns will be included. If path is a file, it will be loaded.

Parameters:
baseDir - the base directory (can be used to get absolute file paths).
path - the path to the file or directory (this can be multi-part, not just name).
dirProps - the base md properties for file (this should be a clone this method can modify as needed).
Throws:
SQLException - thrown on a database error.

error

public void error(String mesg)
Deprecated. 
Output an error message.


error

public void error(String mesg,
                  Throwable ex)
Deprecated. 
Output an error message.

Subclasses can override this method to change where messages go.


finalize

protected void finalize()
                 throws Throwable
Deprecated. 
Called when this is to be finalized.

Throws:
Throwable

fixPath

public String fixPath(String path)
Deprecated. 
Fix up a path to be forward-slash style and to not have empty path parts.
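
The normalization described here can be approximated in one line, as in the sketch below. The PathFixer class is hypothetical, and the handling of leading and trailing slashes is not specified on this page, so the sketch leaves a single leading or trailing slash in place.

```java
class PathFixer {
    // Converts backslashes to forward slashes, then collapses runs of
    // slashes so that no empty path parts remain between them.
    static String fixPath(String path) {
        return path.replace('\\', '/').replaceAll("/{2,}", "/");
    }
}
```

For example, a Windows-style path such as docs\\img//logo.gif (two backslashes, then two forward slashes) comes out as docs/img/logo.gif.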


fixString

public static String fixString(String in)
Deprecated. 
Fix empty strings to be nulls.


fixString

public String fixString(String in,
                        String tableName,
                        String colName)
Deprecated. 
Fix up strings by checking for, and potentially truncating, values that are too large.
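
The two fixString variants can be illustrated with the sketch below. The StringFixer class and the maxLen parameter are hypothetical: the real method looks up the limit from a table and column name, and this page does not say how that limit is obtained or measured. A non-positive maxLen is treated here as "no known limit".

```java
class StringFixer {
    // Returns null for null or empty input, otherwise the input unchanged.
    static String emptyToNull(String in) {
        return (in == null || in.isEmpty()) ? null : in;
    }

    // Applies emptyToNull, then cuts the value down to maxLen characters
    // when it is longer; maxLen <= 0 means no known limit (assumption).
    static String fit(String in, int maxLen) {
        String s = emptyToNull(in);
        if (s == null || maxLen <= 0 || s.length() <= maxLen) {
            return s;
        }
        return s.substring(0, maxLen);
    }
}
```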


getLoaderFilterProperties

public Properties getLoaderFilterProperties(File f,
                                            Properties p)
Deprecated. 
Get the properties from the BulkLoader's LoaderFilters for the given file.

Parameters:
f - the file.
p - the properties object to add to (null to create new one).
Returns:
p.

getMetadataProperties

public Properties getMetadataProperties(File base,
                                        Properties p)
                                 throws IOException
Deprecated. 
Get the metadata properties for the given file or directory.

This does not do a META data parse.

Parameters:
base - the file or directory base path.
p - the properties to load into (null to create new).
Returns:
the properties (p if p was not null).
Throws:
IOException - on an error reading the properties file.

getPropertyForColumn

public String getPropertyForColumn(String colName,
                                   Properties p)
Deprecated. 
Get the property value for the specified additional column name.

This will loop over the property names mapped to the column name and return the first found value in the properties.
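
The lookup loop can be sketched as below. The ColumnLookup class is hypothetical, the column map is passed in as a parameter rather than read from the addlColumnMap field, and returning null when nothing is found is an assumption.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

class ColumnLookup {
    // Walks the property names mapped to colName, in order, and returns
    // the first value present in p; returns null if none is set (assumption).
    static String getPropertyForColumn(Map<String, Collection<String>> columnMap,
                                       String colName,
                                       Properties p) {
        Collection<String> names =
                columnMap.getOrDefault(colName, Collections.<String>emptyList());
        for (String name : names) {
            String value = p.getProperty(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

Because the first match wins, the order of property names within a column's collection decides which property takes priority.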


go

public void go()
        throws SQLException,
               IOException
Deprecated. 
Let the bulkloader go.

Throws:
SQLException - if the load/cleanup fails.
IOException - if printing the schema fails.

initialize

public void initialize()
                throws SQLException,
                       IllegalStateException
Deprecated. 
Initialize the bulkloader from the current state.

This calls setJDBCInfo() and then creates a JDBC connection. It additionally configures any known column size limitations and constructs any SQL statements it needs.

Throws:
SQLException - thrown on an error connecting to the database.
IllegalStateException - thrown on an initialization error.
See Also:
setJDBCInfo()

insertDoc

public int insertDoc(String path,
                     File f,
                     Properties p,
                     String mimeType)
              throws SQLException
Deprecated. 
Update or insert a document and metadata into the database.

Parameters:
path - the document path id.
f - the file of the document (can be null).
p - the implicit properties.
mimeType - the preferred mime type of the document (null for default).
Returns:
number of documents inserted
Throws:
SQLException - thrown on a database error.

isHidden

public static boolean isHidden(File f)
Deprecated. 
Check if the specified file is a hidden file.

Under UNIX, File.isHidden() reports that "/weblogicCommerce/dmsBase/." is a hidden file, which it is not. This method works around that problem by getting the canonical path of a directory before calling isHidden().
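
A minimal sketch of that workaround follows. The HiddenCheck class is hypothetical, and falling back to the plain check when the canonical path cannot be resolved is an assumption.

```java
import java.io.File;
import java.io.IOException;

class HiddenCheck {
    // For directories, resolves the canonical path first so that a trailing
    // "." segment is not mistaken for a hidden (dot-prefixed) name.
    static boolean isHidden(File f) {
        if (f.isDirectory()) {
            try {
                return f.getCanonicalFile().isHidden();
            } catch (IOException e) {
                // Assumption: fall back to the plain check if resolution fails.
            }
        }
        return f.isHidden();
    }
}
```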


isHtmlFile

public boolean isHtmlFile(String name)
Deprecated. 
Tell if the specified file name is an HTML file to the loader.


isReadableDirectory

public static boolean isReadableDirectory(String name)
Deprecated. 
Check if the specified file name is a directory that we can get into.


loadPropertyColumnInfo

public void loadPropertyColumnInfo(Properties p)
Deprecated. 
Load the set of jdbc.column.<columnName>=propName,... properties into our property to column information.
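
Parsing properties of that shape could look like the sketch below. The ColumnInfoParser class is hypothetical, it returns a new map instead of filling the addlColumnMap and addlColumnNames fields, and trimming whitespace around each property name is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class ColumnInfoParser {
    static final String PREFIX = "jdbc.column.";

    // Builds a map of column name to the property names listed for it,
    // using every "jdbc.column.<columnName>" entry found in p.
    static Map<String, Collection<String>> parse(Properties p) {
        Map<String, Collection<String>> map = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            if (!key.startsWith(PREFIX) || key.length() == PREFIX.length()) {
                continue; // not a column entry, or no column name given
            }
            String column = key.substring(PREFIX.length());
            List<String> names = new ArrayList<>();
            for (String name : p.getProperty(key).split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
            map.put(column, names);
        }
        return map;
    }
}
```

An entry such as jdbc.column.AUTHOR=author,creator therefore maps the AUTHOR column to the property names author and creator, in that order.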


main

public static int main(BulkLoader loader,
                       String[] args)
Deprecated. 
The main method invoked on a BulkLoader instance.

This will take a BulkLoader through the bulk loading steps. Output will be sent via the BulkLoader's debug(), warning(), and error() methods.

This will not call System.exit().

Parameters:
args - the command-line args.
Returns:
the exit code (0 for success, non-zero for failure).
See Also:
parseArgs(java.lang.String[]), validateArgs(), initialize(), go(), commit()

main

public static void main(String[] args)
Deprecated. 
Command-line entry point.

This will call System.exit() on invalid args or error. To invoke a bulk load from your own code, create and manipulate a BulkLoader object. You can use the other main method, which does not exit.

Parameters:
args - the command-line args.
See Also:
main(BulkLoader, String[])

parseArgs

public void parseArgs(String[] args)
               throws IllegalArgumentException
Deprecated. 
Parse the given input arguments.

Parameters:
args - the input arguments.
Throws:
BulkLoader.ShowUsageException - thrown if the caller should show a usage report.
IllegalArgumentException - thrown on bad arguments.

printSchema

public void printSchema()
                 throws IOException
Deprecated. 
Print the schema xml for all the metadata we've loaded so far.

Throws:
IOException - thrown on an error outputting the schema xml.

printSchema

public void printSchema(PrintWriter out,
                        String enc)
                 throws IOException
Deprecated. 
Print the schema xml for all the metadata we've loaded so far to the given output stream.

Parameters:
out - the output stream.
enc - the file encoding (will go in xml head if not null).
Throws:
IOException - thrown on an I/O error.

rollback

public void rollback()
Deprecated. 
Rollback the transaction.


setJDBCInfo

protected void setJDBCInfo()
                    throws IllegalStateException
Deprecated. 
Set the jdbc connection url and properties from the current weblogic.properties file.

Throws:
IllegalStateException - thrown if we cannot get the information.

shouldIgnore

public boolean shouldIgnore(String name)
Deprecated. 
Tell if the loader should ignore the specified file name.


shouldInclude

public boolean shouldInclude(String name)
Deprecated. 
Tell if the loader should include the specified file name.


shutdown

public void shutdown()
Deprecated. 
Shutdown this bulk loader.


split

public static List split(String str,
                         String on)
Deprecated. 
Split a delimited list into a List.


splitToProperties

public static Properties splitToProperties(String str,
                                           String on)
Deprecated. 
Split a WLS connection pool style string into a Properties object of the name=value pairs.
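
A sketch of this kind of split is shown below. The PairSplitter class and the sample user/password/server values are hypothetical, and skipping tokens that contain no '=' is an assumption. Note that StringTokenizer treats every character of the delimiter string as a separate delimiter.

```java
import java.util.Properties;
import java.util.StringTokenizer;

class PairSplitter {
    // Splits str on the delimiter characters in "on" and stores each
    // name=value token in a Properties object.
    static Properties splitToProperties(String str, String on) {
        Properties props = new Properties();
        if (str == null) {
            return props;
        }
        StringTokenizer tokens = new StringTokenizer(str, on);
        while (tokens.hasMoreTokens()) {
            String pair = tokens.nextToken().trim();
            int eq = pair.indexOf('=');
            if (eq > 0) { // assumption: tokens without a name or '=' are skipped
                props.setProperty(pair.substring(0, eq).trim(),
                                  pair.substring(eq + 1).trim());
            }
        }
        return props;
    }
}
```

For example, splitting "user=scott;password=tiger;server=DEMO" on ";" produces three properties named user, password, and server.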


usage

public void usage(PrintStream out)
Deprecated. 
Print the usage of the application.


usage

public void usage(PrintWriter out)
Deprecated. 
Print the usage of the application.


validateArgs

public void validateArgs()
                  throws IllegalStateException
Deprecated. 
Validate that we have been passed correct arguments.

This does not validate that the arguments are valid. That will be done in initialize().

Throws:
IllegalStateException

warning

public void warning(String mesg)
Deprecated. 
Output a warning message.


warning

public void warning(String mesg,
                    Throwable ex)
Deprecated. 
Output a warning message.

Subclasses can override this method to change where messages go.


Copyright © 2005 BEA Systems, Inc. All Rights Reserved