Oracle® Fusion Middleware System Administrator's Guide for Content Server
11g Release 1 (11.1.1)
E10792-01

3.9 Batchloading Content

3.9.1 About Batch Loading

This section describes how to use the Batch Loader utility to check in (insert), delete, or update a large number of files on your Content Server system simultaneously. The Batch Loader can save you time and effort by automating the batch loading process. The following are examples of when to use the Batch Loader:

  • You just purchased the Content Server software, and you want to check in all of your existing files with metadata that exists in a database.

  • You have documents checked into the Content Server repository, and you just created a new custom metadata field. You can use the Batch Loader to add the values you specify for the new metadata field to each existing content item.

  • You want to remove a large number of specific files from the system.


Note:

For the Batch Loader utility to function correctly with Oracle WebLogic Server, you must have JDBC connection settings configured. See "Configuring System Database Provider for Standalone Mode".

The Batch Loader performs the actions specified in a batch load file, a text file that tells it which action to perform and what metadata to assign to each content item in the batch.

3.9.1.1 File Records

A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.


Important:

Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.

  • Each file record ends with an <<EOD>> (end of data) marker.

  • A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example: # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.

  • The following is an example of a file record:

    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
    

3.9.1.2 Actions

Valid actions for batch loading are Insert, Delete, and Update.

  • If no action is specified for a file, the system tries to perform an update.

  • Each file record can have only one action, but file records with different actions can be present in the same batch load file.

  • The logic process for each action is different.
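
The file record rules above (name/value pairs, a comment marker followed by a space, one action per record, and a closing end-of-data marker) can be sketched in Python. This is only an illustration for generating batch load files with a script. The function name `format_record` is hypothetical, and the lowercase action values are taken from the examples in this section rather than from any official list.

```python
# Hypothetical helper for writing file records; not part of the Batch Loader.
VALID_ACTIONS = {"insert", "delete", "update"}  # lowercase, as in the examples
EOD = "<<EOD>>"


def format_record(fields, comment=None):
    """Render one file record as batch load file text.

    fields is a dict, so a record can hold at most one Action entry.
    """
    lines = []
    if comment:
        # The comment character must be followed by a space.
        lines.append("# " + comment)
    for name, value in fields.items():
        # Field names are case sensitive: reject 'action', 'ACTION', and so on.
        if name.lower() == "action" and name != "Action":
            raise ValueError("'Action' is case sensitive")
        if name == "Action" and value not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {value!r}")
        lines.append(f"{name}={value}")
    lines.append(EOD)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_record(
        {"Action": "insert", "dDocName": "Sample1", "dDocType": "Document"},
        comment="This is a comment",
    ), end="")
```

Several records can be concatenated into one batch load file, and each record may use a different action.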

3.9.1.3 Insert

The insert action checks a new file into the content server repository. If the Content ID (dDocName) already exists in the content server, no action is performed. Figure 3-11 illustrates the insert action.

Figure 3-11 The Insert Action Sequence for Checking In a New File

3.9.1.3.1 Insert Requirements

The following table defines the fields required for successful performance of an insert action.


Note:

Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.

  • Field Length: Maximum number of characters permitted in the field.

  • Carried Over: If the next record does not contain this field, the value of this field will be taken from the previous record.


    Important:

    If you have defined any custom metadata fields as required fields, those fields also need to be defined for an insert action.

    Required Items | Field Length | Carried Over | Definition
    Action=insert | N/A | Yes | The command to insert a file. The term Action is case sensitive and must begin with a capital letter.
    dDocName | 30 | No | The metadata field named Content ID.
    dDocType | 30 | Yes | The metadata field named Type.
    dDocTitle | 80 | No | The metadata field named Title.
    dDocAuthor | 30 | Yes | The metadata field named Author.
    dSecurityGroup | 30 | Yes | The metadata field named Security Group.
    primaryFile | N/A | N/A | The metadata field named Primary File. The Primary File name can be a complete path or just the file name. If only a file name is specified, the file is located as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; otherwise, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader Application.)
    dInDate | N/A | No | The metadata field named Release Date. The value must use the date format of the locale of the user executing the Batch Loader; for example, the US English date format is mm/dd/yy hh:mm:ss am/pm. Time information is optional. If you specify the time, only the hh:mm part is required; the ss and am/pm parts are optional.
    <<EOD>> | N/A | N/A | Indicates the end of data for the file record.

3.9.1.3.2 Insert Example

The following code fragments show the batch load file syntax for inserting files. This example shows two file records.

The first file record includes all required fields and the action statement, Action=insert. The second file record does not list the required fields dDocType, dDocAuthor, or dSecurityGroup. However, the information for these items is taken from the previous record. Also, the second record does not specify an action, so the insert action is carried over. Therefore, if the Content ID HR003 does not exist, the file will be inserted. However, if the Content ID does exist, it will not be inserted because the action is insert and not update.

  • First record:

    Action=insert
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Information Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=3/15/97
    <<EOD>>
    
  • Second record:

    dDocName=HR003
    dDocTitle=Performance Review
    primaryFile=hr003.doc
    dInDate=3/15/97
    <<EOD>>
    

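The carry-over behavior described in the insert example can be illustrated with a small Python parser. This is a sketch of the documented behavior only, not the Batch Loader's own implementation. The function name `parse_records` is hypothetical, and the list of carried-over fields is copied from the Carried Over column of the Insert Requirements table.

```python
# Fields marked "Carried Over: Yes" in the Insert Requirements table.
CARRIED_OVER = ("Action", "dDocType", "dDocAuthor", "dSecurityGroup")


def parse_records(text):
    """Split batch load text into records, filling in carried-over fields."""
    records, current, carried = [], {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("# "):
            continue  # skip blank lines and comments
        if line == "<<EOD>>":
            # Copy any carried-over field that this record left out.
            for name in CARRIED_OVER:
                if name not in current and name in carried:
                    current[name] = carried[name]
            # Remember this record's values for the next record.
            carried = {n: current[n] for n in CARRIED_OVER if n in current}
            records.append(current)
            current = {}
            continue
        name, _, value = line.partition("=")
        current[name] = value
    return records


if __name__ == "__main__":
    sample = """\
Action=insert
dDocName=HR001
dDocType=Form
dDocTitle=New Employee Information Form
dDocAuthor=Olson
dSecurityGroup=Public
primaryFile=hr001.doc
dInDate=3/15/97
<<EOD>>
dDocName=HR003
dDocTitle=Performance Review
primaryFile=hr003.doc
dInDate=3/15/97
<<EOD>>
"""
    for record in parse_records(sample):
        print(record["dDocName"], record["Action"], record["dDocType"])
```

In this sketch the second record inherits Action, dDocType, dDocAuthor, and dSecurityGroup from the first, while dDocName, dDocTitle, and dInDate must be supplied in every record.
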
3.9.1.4 Delete

The delete action deletes one or all revisions of an existing file from the content server repository. If the specified Content ID (dDocName) does not exist in the content server, no action is performed. Figure 3-12 illustrates the delete action.

Figure 3-12 The Delete Action Sequence

3.9.1.4.1 Delete Requirements

The following table defines the fields required for successful performance of a delete action.

Required Items | Definition
Action=delete | The command to delete a file. The term Action is case sensitive and must begin with a capital letter.
dDocName | The metadata field named Content ID.
<<EOD>> | Indicates the end of data for the file record.

3.9.1.4.2 Delete Example

The following example shows the batch load file syntax for deleting files. This example shows two file records. The first file record will delete all revisions of the Content ID HR001. The second file record will delete revision 2 of the content item HR002.

Action=delete
dDocName=HR001
<<EOD>>
Action=delete
dDocName=HR002
dRevLabel=2
<<EOD>>

3.9.1.5 Update

The update action updates existing content items. One of the following actions occurs, depending on what items are present in the file record and what content exists in the system:

  • A new revision of an existing content item is created.

  • An existing file's metadata is updated.

  • A new content item is inserted (Action=insert is performed).


    Note:

    Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.

A new revision is created when one of the following scenarios occurs:

Scenario | Content ID (dDocName) | Revision (dRevLabel) | Release Date in Batch Load File (dInDate)
Scenario 1 | Exists in Content Server | Not specified in the batch load file. | After the release date of the latest revision of the file in the system.
Scenario 2 | Exists in Content Server | Specified in the batch load file, but does not exist in the Content Server. | After the release date of the latest revision of the file in the system.
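
As a rough aid to reading the table, the two scenarios can be written as a Python predicate. This sketch encodes only the two table rows exactly as written. The Batch Loader's full decision sequence is the one shown in Figure 3-13 and may take other conditions into account. The function and parameter names are hypothetical.

```python
from datetime import date


def creates_new_revision(exists, rev_label, existing_labels,
                         batch_date, latest_date):
    """Return True when a record matches Scenario 1 or Scenario 2.

    exists          -- the Content ID is already in Content Server
    rev_label       -- dRevLabel from the batch load file, or None
    existing_labels -- revision labels already present for the item
    batch_date      -- dInDate from the batch load file
    latest_date     -- release date of the latest revision in the system
    """
    if not exists:
        return False  # both scenarios require an existing Content ID
    if batch_date <= latest_date:
        return False  # both scenarios require a later release date
    if rev_label is None:
        return True  # Scenario 1: no revision specified
    return rev_label not in existing_labels  # Scenario 2: unknown revision


if __name__ == "__main__":
    # Existing item released 9/26/98, batch record dated 2/20/99, no dRevLabel.
    print(creates_new_revision(True, None, {"1"},
                               date(1999, 2, 20), date(1998, 9, 26)))
```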

Figure 3-13 The Update Action Sequence

3.9.1.5.1 Update Requirements

The following table defines the fields required for successful performance of an update action.

Required Items | Field Length | Carried Over | Definition
Action=update | N/A | Yes | The command to update a file. The term Action is case sensitive and must begin with a capital letter.
dDocName | 30 | No | The metadata field named Content ID.
dDocType | 30 | Yes | The metadata field named Type.
dDocTitle | 80 | No | The metadata field named Title.
dDocAuthor | 30 | Yes | The metadata field named Author.
dSecurityGroup | 30 | Yes | The metadata field named Security Group.
primaryFile | N/A | N/A | The metadata field named Primary File. If only the metadata is being updated, primaryFile is not required, but dRevLabel is. If the optional dRevLabel field is specified and matches a revision label that exists in the content server, primaryFile is not required; the primary file specified for that revision is used. In other words, dRevLabel is normally optional but becomes required whenever primaryFile is absent. The Primary File name can be a complete path or just the file name. If only a file name is specified, the file is located as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; otherwise, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader Application.)
dInDate | N/A | No | The metadata field named Release Date. The value must use the date format of the locale of the user executing the Batch Loader; for example, the US English date format is mm/dd/yy hh:mm:ss am/pm. Time information is optional. If you specify the time, only the hh:mm part is required; the ss and am/pm parts are optional.
<<EOD>> | N/A | N/A | Indicates the end of data for the file record.

3.9.1.5.2 Update Example 1

This example assumes that two files are already checked into the system with the following metadata:

  • HR001 has a Release Date of 9/26/98 and Revision of 1

  • HR002 has a Release Date of 3/15/99 and Revision of 2

The first file record, Content ID HR001, exists in the system, but it does not have a Revision (dRevLabel) specified in the batch load file. Therefore, the Batch Loader will compare the Release Date of the latest revision in the system with the Release Date specified in the batch load file. Since 2/20/99 is after 9/26/98, a new revision 2 for HR001 is added.

The second file record, Content ID HR002, exists in the system and has a Revision (dRevLabel) specified, but Revision 3 does not exist in the system. Therefore, a new revision 3 for HR002 is added.

Action=update
dDocName=HR001
dDocType=Form
dDocTitle=New Employee Form
dDocAuthor=Olson
dSecurityGroup=Public
primaryFile=hr001.doc
dInDate=2/20/99
<<EOD>>
dDocName=HR002
dDocTitle=Payroll Change Form
primaryFile=hr002.doc
dInDate=2/20/99
dRevLabel=3
<<EOD>>

3.9.1.5.3 Update Example 2

This example assumes that one file is already checked into the system with the following metadata:

  • Content ID = HR003

  • Release Date = 3/15/97

  • Revision = 1

  • Title = Performance Review

  • Author = Smith

Because Revision 1 of the Content ID HR003 exists in the system (and is not in an active workflow), the revision will be updated with the new Title, Author, and Release Date metadata.

Action=update
dDocName=HR003
dDocType=Form
dDocTitle=Performance Review Template
dDocAuthor=Smith
primaryFile=hr003.doc
dInDate=2/20/99
dRevLabel=1
<<EOD>>

3.9.1.6 Optional Parameters

The following table lists the optional parameters you can use in any file record in a batch load file.

In a batch load file, there are two methods you can use to override the primary and alternate formats assigned to a content item checkin:

  • Specifying a value for the primaryFile:format parameter, the alternateFile:format parameter, or both. However, these values can themselves be overridden by the primaryOverrideFormat or alternateOverrideFormat parameters. Some components may also force specific formats on certain types of checkins.

  • Specifying a value for the primaryOverrideFormat parameter, the alternateOverrideFormat parameter, or both. These work as parameters in the batch load file only if you enable the IsOverrideFormat configuration variable. Using this method overrides any values that you set for the primaryFile:format and alternateFile:format parameters.

    Optional Parameters Definition
    dRevLabel The metadata field named Revision.

    Maximum field length is 10 characters.

    Values must be an integer or comply with the Major/Minor Revision Label Sequence established by the System Properties settings (see "Configuring General Options").

    dDocAccount The metadata field named Accounts.

    Maximum field length is 30 characters.

    This field is not carried over to the next file record.

    Do not specify this field if accounts are not enabled.

    If accounts are enabled and this field is not specified, dDocAccount will be set to an empty value.

    xComments The metadata field named Comments. Maximum field length is 255 characters.
    dOutDate The metadata field named Expiration Date.

    The dOutDate must use the date format of the locale of the user executing the Batch Loader. For example, the English-US date format is mm/dd/yy hh:mm:ss am/pm.

    Time information is optional. If you specify the time, only the hh:mm part is required. The ss and am/pm parts are optional.

    primaryFile:path Specifies the location of the file. If a primaryFile:path value is specified, the value overrides the value specified for the primaryFile parameter. However, the primaryFile:path value is not used to determine the file conversion format. If a value for primaryFile:path is not specified, the location is determined from the primaryFile value.

    This parameter uses the following syntax:

    primaryFile:path=complete_path

    primaryFile:format Specifies the file format to use for the Primary File. This file format overrides the one specified by the file extension of the file and the value specified for the primaryFile parameter. If a primaryFile:format value is not specified, the file format is determined from the file extension for the primaryFile value.

    This parameter uses the following syntax:

    primaryFile:format=application/conversion_type

    alternateFile The metadata field named Alternate File. The Alternate File name can be a complete path or just the file name. If a file name only is specified, the location of the file is determined as follows:

    If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used.

    If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader Application.)

    alternateFile:path Specifies the location of the alternate file. If an alternateFile:path value is specified, the value overrides the value specified for the alternateFile parameter. However, the alternateFile:path value is not used to determine the file conversion format. If an alternateFile:path value is not specified, the location is determined from the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation.

    This parameter uses the following syntax:

    alternateFile:path=complete_path

    alternateFile:format Specifies the file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file and the value specified for the alternateFile parameter. If an alternateFile:format value is not specified, the file format is determined from the file extension for the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation.

    This parameter uses the following syntax:

    alternateFile:format=application/conversion_type

    webViewableFile The webViewableFile name can be a complete path or just the file name. If a webViewableFile value is specified, then the conversion process is not performed. If a file name only is specified, the location of the file is determined as follows:

    If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used.

    If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader Application.)

    webViewableFile:path Specifies the location of the Web viewable file. If a webViewableFile:path value is specified, the value overrides the value specified for the webViewableFile parameter. However, the webViewableFile:path value is not used to determine the file conversion format. If a webViewableFile:path value is not specified, the location is determined from the webViewableFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation.

    This parameter uses the following syntax:

    webViewableFile:path=complete_path

    webViewableFile:format Specifies the file format to use for the Web viewable file. This file format overrides the one specified by the file extension of the file and the value specified for the webViewableFile parameter. If a webViewableFile:format value is not specified, the file format is determined from the file extension for the webViewableFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation.

    This parameter uses the following syntax:

    webViewableFile:format=application/conversion_type

    primaryOverrideFormat Specifies which file format to use for the Primary File. This file format overrides the one specified by the file extension of the file. This option works as a parameter only if you enable the IsOverrideFormat configuration variable, which you can set by selecting Allow Override Format in the System Properties application. However, the recommended alternative is to use the primaryFile:format parameter.
    alternateOverrideFormat Specifies which file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file. This option works as a parameter only if you enable the IsOverrideFormat configuration variable, which you can set by selecting Allow Override Format in the System Properties application. However, the recommended alternative is to use the alternateFile:format parameter.
    SetFileDir Specifies the directory where the Primary Files and Alternate Files are located. This field is carried over to the next file record.
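
The lookup order for the primary file (primaryFile:path first, then a complete path in primaryFile, then SetFileDir, then the batch load file path) can be sketched as follows. This is an illustration under stated assumptions, not the Batch Loader's code: it uses POSIX-style paths, treats any non-absolute primaryFile value as relative to the chosen directory, and reads "the batch load file path" as the directory that contains the batch load file. The function name `resolve_primary_file` is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_primary_file(record, batch_load_file, set_file_dir=None):
    """Return the path the primary file would be read from.

    record          -- dict of name/value pairs for one file record
    batch_load_file -- path of the batch load file itself
    set_file_dir    -- SetFileDir carried over from an earlier record, if any
    """
    # primaryFile:path, when present, overrides the primaryFile location.
    explicit = record.get("primaryFile:path")
    if explicit:
        return PurePosixPath(explicit)
    name = PurePosixPath(record["primaryFile"])
    if name.is_absolute():
        return name  # a complete path was given
    # SetFileDir in this record takes precedence over a carried-over value.
    directory = record.get("SetFileDir", set_file_dir)
    if directory:
        return PurePosixPath(directory) / name
    # Assumption: fall back to the directory holding the batch load file.
    return PurePosixPath(batch_load_file).parent / name


if __name__ == "__main__":
    print(resolve_primary_file({"primaryFile": "hr001.doc"},
                               "/batching/batchinsert.txt"))
```

The same order would apply to alternateFile and webViewableFile with their own :path parameters.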

3.9.1.7 Custom Metadata Fields

Any custom metadata field that has been defined in the Configuration Manager can be included in a file record.

  • If you have defined any custom metadata fields as required fields, those fields must be defined for an insert action or an update action.

  • If a custom metadata field is not a required field, but it has a default value (even if blank), then the default value will be used if the value is not specified in the batch load file.

  • When specifying a custom metadata field value, the field name must be preceded by an x. For example, if you have a custom metadata field called Location, then the batch load file entry will be xLocation=value.

  • Keep in mind that some add-on products use custom metadata fields. For example, if you have PDF Watermark, you will have created a field called Watermark. To include this field in a batch load file, precede it with an x just like any other custom metadata field (that is, xWatermark).

3.9.2 Preparing a Batch Load File

3.9.2.1 About Preparing a Batch Load File

You can use any method you prefer to create a batch load file, if the resulting text file conforms to the batch load file syntax requirements. However, the Batch Loader provides a tool called the BatchBuilder to assist you in creating batch load files.

  • The BatchBuilder creates a batch load file based on the files in a specified directory. The BatchBuilder reads recursively through all the sub-directories to create the batch load file.

  • A mapping file tells the BatchBuilder how to determine the metadata for each file record. You can use the BatchBuilder to create and save custom Mapping Files.

  • You can run the BatchBuilder from the standalone application interface or from the command line.

  • The BatchBuilder can also be used to create external collections of content, which are indexed and stored in a separate search collection rather than in the Content Server database. You can set up read-only external collections, where users can search for content but cannot update metadata or delete content. This option is recommended when external content is also included in another Content Server instance.

3.9.2.2 Mapping Files

Mapping files are text files that have a .hda extension, which identifies them as a type of data file used by the content server.

See Oracle Fusion Middleware Developer's Guide for Content Server for more information on HDA files, LocalData properties, and ResultSets.

3.9.2.2.1 Mapping File Formats

The metadata mapping can be defined in one of two formats:

  • As name/value pairs in a LocalData definition, a mapping file would look like the following:

    @Properties LocalData
    dDocName=<$filename$>.<$extension$>
    dInDate=<$filetimestamp$>
    @end
    
  • As a BatchBuilderMapping ResultSet, a mapping file would look like the following:

    @ResultSet SpiderMapping
    2
    mapField
    mapValue
    dDocName
    <$filename$>.<$extension$>
    dInDate
    <$filetimestamp$>
    @end
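
If you prefer to generate mapping definitions with a script, the LocalData form shown above is simple to produce. The following Python sketch writes only the @Properties LocalData block exactly as it appears in this section. A complete .hda file may need additional content that is not shown here, and the function name `local_data_mapping` is hypothetical.

```python
def local_data_mapping(fields):
    """Render a field-to-value mapping as an @Properties LocalData block."""
    lines = ["@Properties LocalData"]
    # One name=value line per metadata field, in insertion order.
    lines += [f"{name}={value}" for name, value in fields.items()]
    lines.append("@end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(local_data_mapping({
        "dDocName": "<$filename$>.<$extension$>",
        "dInDate": "<$filetimestamp$>",
    }), end="")
```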
    

3.9.2.2.2 Mapping File Values

The following values can be used in a mapping file:

Value | Description | Example
Normal string | All files will have the specified metadata value. | dDocType=Document (all files will be the Document content type).
Idoc script | Any supported Idoc script. See the Oracle Fusion Middleware Idoc Script Reference Guide for more information. | xLanguage=<$if strEquals(dir2, "EN")$>English<$elseif strEquals(dir2, "SP")$>Spanish<$else$>French<$endif$>
<$dir1$>, <$dir2$> | The directory name at the specified level in the file's path. <$dir1$> refers to the root directory specified in the "Directory" field, <$dir2$> refers to the next level directory, and so on. | dDocType=<$dir1$>, dSecurityGroup=<$dir2$>, dDocAccount=<$dir3$>. If the file path is "f:/docs/public/sales/march.doc" and you have specified the "Directory" value as "f:/docs", the values would be <$dir1$> = "docs", <$dir2$> = "public", and <$dir3$> = "sales".
<$dUser$> | The user currently logged in. | dDocAuthor=<$dUser$>. If sysadmin is logged in, then <$dUser$> would equal "sysadmin".
<$extension$> | The file extension of the file. | dDocTitle=<$filename$>.<$extension$>. If the file path is "d:/salesdocs/sample.doc", then <$extension$> would equal "doc".
<$filename$> | The name of the file. | dDocName=<$filename$>. If the file path is "d:/salesdocs/sample.doc", then <$filename$> would equal "sample".
<$filepath$> | The entire directory path of the file, including the file name. | xPath=<$filepath$>. If the file path is "c:/docs/public/acct/sample.doc", then <$filepath$> is "c:/docs/public/acct/sample.doc".
<$filesize$> | The size of the file (in bytes). | xFileSize=<$filesize$>. For a 42KB file, <$filesize$> would be 43008.
<$filetimestamp$> | The date and time the file was last modified. | dInDate=<$filetimestamp$>. If the last modified date is September 13, 2001 at 4:03 pm, then <$filetimestamp$> would equal "9/13/01 4:03 PM" for an English-US locale.
<$URL$> | The URL of the file, based on the values of the physical file root and relative Web root. |

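To check what the path-based variables would evaluate to before building a batch load file, the table's definitions can be reproduced in a few lines of Python. This is a sketch of the documented rules only, using forward-slash paths as in the examples. The function name `mapping_variables` is hypothetical, and it does not call the BatchBuilder.

```python
from pathlib import PurePosixPath


def mapping_variables(file_path, directory):
    """Compute dir1..dirN, filename, extension, and filepath for one file."""
    path = PurePosixPath(file_path)
    root = PurePosixPath(directory)
    # dir1 is the root directory name; dir2, dir3, ... are the levels below it.
    levels = [root.name, *path.relative_to(root).parent.parts]
    values = {f"dir{i}": name for i, name in enumerate(levels, start=1)}
    values["filename"] = path.stem                   # name without extension
    values["extension"] = path.suffix.lstrip(".")   # extension without the dot
    values["filepath"] = str(path)                   # full path with file name
    return values


if __name__ == "__main__":
    for key, value in mapping_variables(
            "f:/docs/public/sales/march.doc", "f:/docs").items():
        print(f"<${key}$> = {value!r}")
```
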
3.9.2.3 Creating a Batch Load File from the BatchBuilder Screen

Use the following procedure to create a batch load file from the BatchBuilder screen:

  1. Start the Batch Loader:

    Win32: Select Start, then Programs, then Content Server, then instance_name, then Utilities, then BatchLoader.

    UNIX: Change to the DomainHome/ucm/cs/bin/ directory, type BatchLoader in a shell window, and press the RETURN key.

    The login screen is displayed.

  2. Enter the sysadmin user name and password, and click OK.

    The Batch Loader Application is displayed.

  3. Select Options, Build Batch File.

    The BatchBuilder Screen is displayed.

  4. In the Directory field, enter the location of the files to be included in the batch load file.

  5. In the Batch Load File field, enter the path and file name for the batch load file. You can click the Browse button to navigate to and select the directory and file.

  6. From the Mapping list, select a mapping file. To create a new mapping file or edit an existing one, see "Creating a Mapping File".

  7. Optional: In the File Filter field, enter filter settings to include or exclude particular files from the batch load file.

  8. Optional: To batch load a read-only external collection, select the External check box and select the external collection options.

  9. Click Build.

  10. When the build process is complete, click OK.

  11. Open the batch load file in a text editor and double-check the file records.

  12. To save the current batch load file settings as the default, select Options, then Save Configuration.

3.9.2.4 Creating a Mapping File

Use the following procedure to create a mapping file.

  1. Display the BatchBuilder Screen.

  2. Click Edit next to the Mapping field.

    The BatchBuilder Mapping List Screen is displayed.

  3. Click Add.

    The Add BatchBuilder Mapping Screen is displayed.

  4. Enter a name and description for the mapping file, and click OK.

    The Edit BatchBuilder Mapping Screen is displayed.

  5. Click Add.

    The Add/Edit BatchBuilder Mapping Field Screen is displayed.

  6. Enter a metadata field name to be defined. For example, enter dDocName for the Content ID field, or xComments for the Comments field.

  7. Enter the value for the metadata field.

    • Type any constant text and Idoc script directly in the Value field. For example, to set Document as the Type for all documents in the batch load file, enter dDocType in the Field field, and enter Document in the Value field. See the Oracle Fusion Middleware Idoc Script Reference Guide for more information on Idoc Script.

    • To add a predefined variable to the Value field, select the variable in the right column and click the << button. For example, to set each document's second-level directory as the Security Group, enter dSecurityGroup in the Field field, and insert the <$dir2$> variable in the Value field.


      Caution:

      Be careful when choosing predefined variables. Many metadata fields have length limitations and cannot contain certain characters (such as spaces or punctuation marks). See "Managing Repository Content" in the Oracle Fusion Middleware Application Administrator's Guide for Content Server for more information.

  8. Click OK.

  9. Repeat steps 5 through 8 for as many metadata fields as you want to define.

  10. Click OK to save changes and close the Edit BatchBuilder Mapping screen.

    The mapping file is saved as MapFileName.hda in the IntradocDir/search/external/mapping/ directory.

  11. Click Close to close the BatchBuilder Mapping List screen.

3.9.2.5 Creating a Batch Load File from the Command Line

You can create a batch load file by entering the BatchBuilder parameters from a command line rather than entering them in the BatchBuilder screen. Use the following procedure to create a batch load file from the command line:

  1. Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line:

    BatchLoaderUserName=sysadmin
    

    This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader and BatchBuilder applications.

  2. Save and close the file.

  3. Open a command line window and change to the DomainHome/ucm/cs/bin/ directory.


    Caution:

    Run the BatchBuilder using the same operating system account that runs the content server. Otherwise, the software might not process your data due to permissions problems.

  4. Enter the following command:

    Win32:

    BatchLoader.exe /spider /q /ddirectory /mmappingfile /nbatchloadfile
    

    UNIX:

    BatchLoader -spider -q -ddirectory -mmappingfile -nbatchloadfile
    

The following flags can be used with the BatchLoader command to run the BatchBuilder from the command line:

Flag | Required? | Description
-spider or /spider | Yes | Runs the BatchBuilder application.
-q or /q | No | Runs the BatchBuilder in quiet mode in the background. (If the BatchBuilder is run from the command line without this flag, the BatchBuilder screen will be displayed.)
-d or /d | Yes | Directory field value.
-m or /m | Yes | Mapping field value.
-n or /n | Yes | Batch Load File field value.
-e or /e | No | Exclude specified files (Exclude check box selected).
-i or /i | No | Include specified files (Exclude check box clear).

3.9.2.5.1 Win32 Example

The following example shows the correct syntax to run the BatchBuilder from a Win32 command line, where:

  • Directory = c:/myfiles

  • Mapping File = MyMappingFile

  • Batch Load File = c:/batching/batchinsert.txt

  • Excluded files = *.exe and *.zip

BatchLoader.exe /spider /q /dc:/myfiles /mMyMappingFile /nc:/batching/batchinsert.txt /eexe,zip

3.9.2.5.2 UNIX Example

The following example shows the correct syntax to run the BatchBuilder from a UNIX command line, where:

  • Directory = /myfiles

  • Mapping File = MyMappingFile

  • Batch Load File = /batching/batchinsert.txt

  • Excluded files = index.htm and index.html

BatchLoader -spider -q -d/myfiles -mMyMappingFile -n/batching/batchinsert.txt -eindex.htm,index.html
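
When the BatchBuilder is launched from a script, the argument list can be assembled from the flags above. The following Python sketch only builds the arguments; it does not run anything. The comma-separated filter syntax is taken from the two examples in this section, and using the same syntax for the include flag is an assumption. The function name `batchbuilder_args` is hypothetical.

```python
def batchbuilder_args(directory, mapping, batch_file, exclude=None,
                      include=None, quiet=True, win32=False):
    """Build the BatchLoader argument list that runs the BatchBuilder."""
    prefix = "/" if win32 else "-"
    args = ["BatchLoader.exe" if win32 else "BatchLoader", prefix + "spider"]
    if quiet:
        args.append(prefix + "q")
    # Each value follows its flag letter directly, with no space in between.
    args += [
        prefix + "d" + directory,
        prefix + "m" + mapping,
        prefix + "n" + batch_file,
    ]
    if exclude:
        args.append(prefix + "e" + ",".join(exclude))
    if include:
        # Assumption: include filters use the same comma-separated form.
        args.append(prefix + "i" + ",".join(include))
    return args


if __name__ == "__main__":
    print(" ".join(batchbuilder_args(
        "/myfiles", "MyMappingFile", "/batching/batchinsert.txt",
        exclude=["index.htm", "index.html"],
    )))
```

The returned list could be passed to a process launcher from the DomainHome/ucm/cs/bin/ directory, under the same operating system account that runs the content server.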

3.9.3 Running the Batch Loader

3.9.3.1 About Running the Batch Loader

The Batch Loader uses the information from a batch load file to check in (insert), delete, or update a large number of files on your Content Server system simultaneously.

  • You can run the Batch Loader from the standalone application interface or from the command line.

  • After you run the Batch Loader, the content server processes files through the Inbound Refinery and the Indexer as it would for any other content item.

3.9.3.2 Batch Loading from the Batch Loader Screen

Use the following procedure to batch load content using the Batch Loader screen:

  1. Display the Batch Loader Application.

  2. Click Browse, and navigate to and select the batch load file.

  3. To change the number of errors that can occur before the Batch Loader stops processing, enter the number in the Maximum errors allowed field.

  4. To delete files from the hard drive after they are successfully checked in or updated, select the Clean up files after successful check in check box.

  5. To create a text file containing the file records that failed during batch loading, select the Enable error file for failed revision classes check box.

  6. Click Load Batch File to start the Batch Loader process.

    When the batch load process is complete, a Batch Loader message screen is displayed, indicating the number of errors that occurred, if any.

  7. If you enabled the error file, write down the file name shown in the message box.

  8. Click OK.

  9. Correct any problems with the batch load.

  10. To save the current Batch Loader settings as the default, select Options, Save Configuration.

3.9.3.3 Batch Loading from the Command Line

You can batch load content by entering the Batch Loader parameters from a command line rather than entering them in the Batch Loader screen. Use the following procedure to run the Batch Loader from the command line:

  1. Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line:

    BatchLoaderUserName=sysadmin
    

    This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader application.

  2. Save and close the file.

  3. Open a command line window and change to the DomainHome/ucm/cs/bin/ directory.


    Caution:

    Run the Batch Loader using the same operating system account that runs the content server. Otherwise, the software might not process your files due to permissions problems.

  4. Enter the following command:

    Win32: BatchLoader.exe /q /nbatchloadfile
    Unix: BatchLoader -q -nbatchloadfile
    

    The Batch Loader processes the batch load file, but no message boxes are displayed.

  5. Correct any problems with the batch load.

The following flags can be used with the BatchLoader command from the command line:

Flag Required? Description
-q or /q No Runs the Batch Loader in quiet mode in the background. (If the Batch Loader is run from the command line without this flag, the Batch Loader screen will be displayed.)
-n or /n Yes Batch Load File field value.
-console No Echoes all output to the HTML Content Server log and to the console window that is running the Batch Loader. See "Batch Loader -console Command Line Switch" for details.

3.9.3.3.1 Win32 Example

The following example shows the correct syntax to run the Batch Loader from a Win32 command line, where the batch load file is c:/batching/batchinsert.txt:

BatchLoader.exe /q /nc:/batching/batchinsert.txt

3.9.3.3.2 UNIX Example

The following example shows the correct syntax to run the Batch Loader from a UNIX command line, where the batch load file is /batching/batchinsert.txt:

BatchLoader -q -n/batching/batchinsert.txt

3.9.3.4 Using the IdcCommand Utility and Remote Access

Occasionally, you may need to use remote access when managing your Content Server system. This does not necessarily mean that remote terminal access is required. However, you must have the ability to submit commands to the server from a remote location.

Combining remote access with the IdcCommand utility provides a powerful toolset and an easy way to check in a large number of files to your Content Server. To take advantage of this functionality, you must set up the workstation to submit commands and be able to use the IdcCommand utility with a batch load command file. This section covers the following topics:

3.9.3.4.1 Batch Load Command Files

A batch load command file contains a set of commands for each file that is loaded. If you are loading a large number of files, the command file may contain hundreds of lines. Using an editing tool can simplify the task of creating the numerous required lines. For example, the procedure for Preparing for Remote Batch Loading shows how you can prepare a batch load command file using the editing and mail merge features of Microsoft Office.

The following is an example Batch Load Command File:

@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=thisfile
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=thisfile.xls
xComments=Initial Check In
@end
<<EOD>>
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=99.tif
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=v:\99.tif
xComments=Initial Check In
@end
<<EOD>>
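
Because a command file can run to hundreds of lines, it can help to check its structure before submitting it. The following Python sketch is illustrative only and is not part of the IdcCommand utility. It assumes the layout shown in the example above (each record is an @Properties LocalData block closed by @end and followed by <<EOD>>), and the function name parse_command_file is hypothetical.

```python
def parse_command_file(text):
    """Split a batch load command file into a list of name/value dicts."""
    records = []
    for chunk in text.split("<<EOD>>"):
        fields = {}
        inside = False
        for line in chunk.splitlines():
            line = line.strip()
            if line.startswith("@Properties"):
                inside = True
            elif line == "@end":
                inside = False
            elif inside and "=" in line:
                # Split on the first "=" only, so values may contain "="
                name, value = line.split("=", 1)
                fields[name] = value
        if fields:
            records.append(fields)
    return records


sample = r"""@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=thisfile
primaryFile=thisfile.xls
@end
<<EOD>>
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=99.tif
primaryFile=v:\99.tif
@end
<<EOD>>
"""

records = parse_command_file(sample)
# Report any record that does not name a service or a primary file
for number, record in enumerate(records, start=1):
    missing = [name for name in ("IdcService", "primaryFile") if name not in record]
    print(number, record.get("dDocTitle"), "missing:", missing)
```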

3.9.3.4.2 Preparing for Remote Batch Loading

To perform batch loading from remote locations, complete the following procedure:

Log In to the Local PC

  1. Open Windows Explorer.

  2. Create a working directory (for example, c:\working_dir).

  3. In the working directory, create one or more directories for the various content servers you will be accessing (for example, c:\working_dir\development and c:\working_dir\contribution).

  4. In each of these directories, create a cmdfiles subdirectory.

  5. From the remote Content Server instance, copy the following directories (and their files) to your working directory:

    • working_dir\idcm1\bin

    • working_dir\idcm1\config

    • working_dir\idcm1\shared\config\resources\lang

    • working_dir\idcm1\shared\config\resources\lang\en

    • working_dir\idcm1\weblayout\groups\secure\logs

  6. In a text editor, open the DomainHome/ucm/cs/bin/intradoc.cfg file and update the IntradocDir configuration variable to match your directory structure (for example, IntradocDir=C:/working_dir/xxS/development/).

  7. In a text editor, open the IntradocDir/config/config.cfg file and ensure the following settings are correct for the server you are accessing:

    IntradocServerPort=4444
    IntradocServerHostName=xxsicmsd

  8. On the remote server, add the IP address of the local PC to the Security Filter using the System Properties utility, and then restart the server.

Test the Configuration for the Remote Workstation

  1. In the cmdfiles directory, create a file named pingservertest.hda and add the following lines:

    @Properties LocalData
    IdcService=PING_SERVER
    @end
    

  2. Open a command prompt and change to your working bin directory (for example, cd C:\working_dir\development\bin).

  3. Issue the following command:

    IdcCommand -f ..\cmdfiles\pingservertest.hda -u sysadmin -l ..\pingservertest.log -c server
    

  4. Confirm the output. If the test is successful, the server returns the following message:

    3/24/04: Success executing service PING_SERVER.
    You have completed your setup for remote commands.
    

Create a Batch Load Command File

This procedure uses the editing and mailmerge features of Microsoft Office to create a batch load command file.

  1. Create a file listing of your directory contents:

    1. Open a command prompt and change to the root directory representing the files you intend to load.

    2. Create a file listing, using the following command to redirect the output into a file:

      dir /s /b > filelisting.txt

    3. Check your filelisting.txt file; it will look something like this:

      V:\policies\ADMIN\working_dir_Admin\AbbreviationList.doc
      V:\policies\ADMIN\working_dir_Admin\Abbreviations.doc
      V:\policies\ADMIN\working_dir_Admin\AbsencePres.doc
      V:\policies\ADMIN\working_dir_Admin\AdmPatientCare.doc
      V:\policies\ADMIN\working_dir_Admin\AdmRounds.doc
      V:\policies\ADMIN\working_dir_Admin\AdverseEvents.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesPermanent.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesRetrieval.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesStandardReq.doc
      

      Note:

      When working with batch loads, note that each file must exist on the server at the path given by the primaryFile entry in the batch load command file. Ideally, use the same drive letter to map the directory of files on the server and on your local system. Alternatively, you can temporarily copy the directory of files to the server.

  2. Edit the file listing to create your filename and title data:

    1. Open your filelisting.txt file in Excel.

    2. Using Replace, remove all the directory information leaving only the file name. Also look for and remove the line for filelisting.txt.

    3. Copy column A (containing the file names) to column B. In this example, the file name is also used as the title, so column B becomes the title column.

    4. Using Replace, remove the file extension from the names in column B.

    5. Insert a new first line and enter filename in the first column and title in the second.

    6. Save the file.

  3. Create an hda file from the file listing using Mail Merge features:

    1. Open Word and create a new document with your set of batch load commands. The following example shows basic batch load commands. You must match your configuration settings when you create your batch load commands.

      @Properties LocalData
      IdcService=CHECKIN_UNIVERSAL
      doFileCopy=1
      dDocTitle=
      dDocType=Native
      dSecurityGroup=Internal
      dDocAccount=Policy/Admin
      dDocAuthor=sysadmin
      primaryFile=d:/temp/working_dir_Admin/
      xComments=Initial Check In
      @end
      <<EOD>>
      
    2. Select Tools / Letters and Mailing / Mail Merge Wizard and advance through the wizard. Choose the selections below to use your filelisting.txt file as input to the mail merge.

      • Letter Document (step 1)

      • Current document (step 2)

      • Existing List (step 3) and select your Excel spreadsheet as the data source

      • More Items (step 4), place the title and filename fields into the Word document so that it looks like the following:

        @Properties LocalData
        IdcService=CHECKIN_UNIVERSAL
        doFileCopy=1
        dDocTitle="title"
        dDocType=Native
        dSecurityGroup=Internal
        dDocAccount=Policy/Admin
        dDocAuthor=sysadmin
        primaryFile=d:/temp/working_dir_Admin/"filename"
        xComments=Initial Check In
        @end
        <<EOD>>

    3. Complete the mail merge (steps 5 and 6). The result is a new Word document with one merge record per page.

    4. In the merged document, select all of the text and use the Replace feature to remove all of the section breaks.

    5. Save the file as a plain text file in the cmdfiles directory with the file extension hda (for example, filelisting.hda).
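
As an alternative to the spreadsheet and mail merge steps, the same command file can be generated with a short script. The following Python sketch is not part of the documented procedure. It assumes the example metadata values and the d:/temp/working_dir_Admin/ server path shown above, and the names TEMPLATE and build_hda are hypothetical. Adjust the field values to match your configuration.

```python
import ntpath  # Windows path rules, usable on any platform

# One record per file, using the example field values from this procedure
TEMPLATE = """@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle={title}
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile={server_dir}{filename}
xComments=Initial Check In
@end
<<EOD>>
"""


def build_hda(listing_lines, server_dir="d:/temp/working_dir_Admin/"):
    """Turn 'dir /s /b' output lines into batch load command file text."""
    records = []
    for line in listing_lines:
        path = line.strip()
        if not path:
            continue
        filename = ntpath.basename(path)      # drop the directory information
        if filename == "filelisting.txt":     # skip the listing file itself
            continue
        title = ntpath.splitext(filename)[0]  # file name without extension
        records.append(TEMPLATE.format(
            title=title, server_dir=server_dir, filename=filename))
    return "".join(records)


listing = [
    r"V:\policies\ADMIN\working_dir_Admin\AbbreviationList.doc",
    r"V:\policies\ADMIN\working_dir_Admin\Abbreviations.doc",
    r"V:\policies\ADMIN\working_dir_Admin\filelisting.txt",
]
hda_text = build_hda(listing)
print(hda_text)
```

To use the result, write hda_text to a file such as filelisting.hda in the cmdfiles directory.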

Execute the Upload

  1. Open a command prompt.

  2. Navigate to the working bin directory.

  3. Issue the command:

    IdcCommand -f ../cmdfiles/filelisting.hda -u sysadmin -l ../filelisting.log -c server
    

Your files are checked in to the content server, and a message is displayed in the command window as each file is checked in.

3.9.3.5 Batch Loading Content as Metadata Only

Depending on the action you plan to perform using the Batch Loader, certain fields are required in the batch load file. If you are updating only the metadata in existing content items, the primaryFile field is not required in the batch load file; see "Update Requirements".

However, if you want to load (insert action) content into the Content Server as metadata only, then the primaryFile field is required in the batch load file. Although the field is ignored by the import, the Batch Loader expects it to be defined. If the primaryFile field is missing, you will get an error as follows (or similar):

Please check record number <number>. BatchLoader: unable to check in '<record>' because the required field 'primaryFile' is missing.

To batch load content as metadata only:

  1. Open Content Server's config.cfg file:

    IntradocDir/config/config.cfg

  2. Add the following configuration variables:

    createPrimaryMetaFile=true
    AllowPrimaryMetaFile=true
    
  3. Save and close the config.cfg file.

  4. In the batch load file, add the following field for each record:

    primaryFile=
    

    Note that leaving the field blank is acceptable. The field is ignored but must be included.

  5. Continue to batch load your content using the Batch Loader procedure or the command line procedure. See "Batch Loading from the Batch Loader Screen" or "Batch Loading from the Command Line".
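
For a large batch load file, adding the empty primaryFile field by hand (step 4) can be tedious. The following Python sketch adds a primaryFile= line to every record that lacks one. It assumes that each file record ends with an <<EOD>> line, as in the batch load file format described earlier in this chapter. The function name ensure_primary_file and the sample field values are hypothetical.

```python
def ensure_primary_file(text):
    """Add an empty primaryFile= line to each record that has none."""
    out = []
    has_primary = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("primaryFile="):
            has_primary = True
        if stripped == "<<EOD>>":
            # End of record: insert the field just before the marker if needed
            if not has_primary:
                out.append("primaryFile=")
            has_primary = False
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative records only; the field values are placeholders
batch = (
    "Action=insert\n"
    "dDocName=meta_only_001\n"
    "dDocTitle=Metadata only item\n"
    "<<EOD>>\n"
    "Action=insert\n"
    "dDocName=meta_only_002\n"
    "primaryFile=\n"
    "<<EOD>>\n"
)
fixed = ensure_primary_file(batch)
print(fixed)
```

Records that already contain the field are left unchanged, so the function can be run on the same file more than once.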

3.9.3.6 Batch Loader -console Command Line Switch

Adding the -console switch to the Batch Loader command line causes all output to be echoed to the HTML content server log and to the console window that is running the Batch Loader. Alternatively, you can use operating system redirects to send the output to a separate log file.


Important:

The -console switch does not follow standard Windows command line syntax (although this may be corrected in later versions). You must use the -console syntax usually associated with UNIX instead of the /console syntax. With most other command line utilities, both syntaxes will work on both platforms.

3.9.3.6.1 Examples

Win32 command line:

BatchLoader.exe /q -console /nc:/batching/batchinsert.txt

UNIX command line:

BatchLoader -q -console -n/u2/apps/batching/batchinsert.txt

Sample output:

Processed 1 of 4 record.
Processed 2 of 4 records.
Processed 3 of 4 records.
Processed 4 of 4 records.
Done processing batch file 'c:/batching/batchinsert.txt'. Out of 4 records processed, 4 succeeded and 0 errors occurred.
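
A script that runs the Batch Loader unattended can read the final summary line to decide whether follow-up is needed. The following Python sketch extracts the three counts from that line. The pattern is based only on the wording of the sample output above, which may differ in other releases, and the names SUMMARY and parse_summary are hypothetical.

```python
import re

# Pattern taken from the sample summary line; treat the wording as an assumption
SUMMARY = re.compile(
    r"Out of (\d+) records? processed, (\d+) succeeded and (\d+) errors? occurred"
)


def parse_summary(output):
    """Return the processed/succeeded/errors counts, or None if not found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    processed, succeeded, errors = (int(group) for group in match.groups())
    return {"processed": processed, "succeeded": succeeded, "errors": errors}


line = ("Done processing batch file 'c:/batching/batchinsert.txt'. "
        "Out of 4 records processed, 4 succeeded and 0 errors occurred.")
result = parse_summary(line)
print(result)
```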

3.9.3.7 Adding a Redirect

You can use a redirect symbol on the command line to send the Batch Loader output to a separate log file. The symbol works on both UNIX and Windows. By default, the -console switch sends the Batch Loader's output to stderr. To redirect the output to a different file, use the special redirect symbol 2>.

In the following examples, each command must be entered all on one line.

Win32 command line with redirect:

BatchLoader.exe /q -console /nc:/batching/batchinsert.txt 2> batchlog.txt

UNIX command line with redirect:

BatchLoader -q -console -n/u2/apps/batching/batchinsert.txt 2> /logs/CSbatchload.log
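
The same redirect can be set up when the Batch Loader is started from a script instead of a shell. The following Python sketch sends the stderr stream of a child process to a log file, which is what the 2> symbol does on the command line. The function name run_with_stderr_log is hypothetical, and the demonstration uses a stand-in command so that the sketch runs without a Content Server installation.

```python
import os
import subprocess
import sys
import tempfile


def run_with_stderr_log(command, log_path):
    """Run command, write its stderr stream to log_path, return the exit code."""
    with open(log_path, "w") as log:
        completed = subprocess.run(command, stderr=log)
    return completed.returncode


# Stand-in for the real command, which would be for example:
# ["BatchLoader", "-q", "-console", "-n/u2/apps/batching/batchinsert.txt"]
demo_command = [sys.executable, "-c",
                "import sys; sys.stderr.write('demo stderr line')"]

log_file = os.path.join(tempfile.gettempdir(), "batchlog.txt")
exit_code = run_with_stderr_log(demo_command, log_file)
print("exit code:", exit_code, "log written to:", log_file)
```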

3.9.3.8 Correcting Batch Load Errors

Use the following procedure to correct any errors that occur during batch loading.

  1. Open the content server log. Select Administration, then Log Files, then click Content Server Logs.

  2. Look through the Type column for the word Error.

  3. Read the description to determine the problem.

  4. Fix the error in one of these files:

    • Batch load file

    • The error file for the failed content. (This option is available only if you enabled it on the Batch Loader Application.) The error file is located in the same directory as the batch load file, with several digits appended to the batch load file name.


      Tip:

      If you rerun an entire batch load file, content items that have already been checked in will usually fail. This occurs because the release dates of the existing content items will be the same as the ones you are trying to insert.

Figure 3-14 Content Server log file


3.9.4 Optimizing Batch Loader Performance

This section provides some basic guidelines that you can use to improve Batch Loader performance. These suggestions can minimize potentially slow batch load performance when you are checking in a large number of content items. In many cases, proper tuning for batch loading can significantly speed up a slow server.

To minimize batch loading slowdowns, try implementing the following Batch Loader adjustments:

  • Temporarily disable other activities, for example by shutting down Inbound Refinery (see the Oracle Fusion Middleware Administrator's Guide for Conversion) and by suspending the automatic update cycle feature of the Repository Manager. See "Repository Manager: Indexer Tab".

  • Analyze your database usage during a batch load to help the database query optimizer. Databases have built-in optimizer utilities that can help make database queries more efficient. However, to maximize the efficiency of optimizers, it is necessary to update or re-create the statistics about the physical characteristics of a table and the associated indexes. These characteristics include number of records, number of pages, and the average record length. The optimizers use these statistics to access data.

    Each database has a proprietary command that you can use to invoke the statistic update or recreation process. For example:

    • For Oracle, use the ANALYZE TABLE table_name COMPUTE STATISTICS statement, where table_name is the table to analyze

    • For SQL Server, use the CREATE STATISTICS statement

    • For DB2, use the RUNSTATS command

3.9.4.1 Example: Best Practice Case Study

This case study describes a case of very slow batch load performance and the steps taken to diagnose and correct it. This information can serve as a model for isolating underlying issues and resolving batch loading performance problems.

3.9.4.1.1 Background Information

A user wanted to load 27,000 content items into Content Server that was running on an AIX server. The DB2 database was running on a separate AIX server. The content items included TIFs as the native files and corresponding PDFs as the Web-viewable files. Inbound Refinery generated thumbnails from the native files.

Initially during the batch load, the performance was acceptable with sub-second insert times. However, after a few thousand content items were loaded, the performance began to degrade. Content items started to require a few seconds to load and, eventually, the load time was over 10 seconds per content item.

3.9.4.1.2 Preliminary Troubleshooting

While the batch load was running, nothing seemed to be wrong with the Content Server system. It had sufficient memory, the CPU utilization was low (less than 5%), and there were no disk bottlenecks. The Inbound Refinery server was busy, but was processing thumbnails at an acceptable rate.

Two issues were found with the database server:

  • Two processes were taking turns to update the database. While one process was executing, the second process waited for the other process to release database locks. When the first process completed, the second process executed while the first process waited. The processes in this execute/wait cycle included:

    • The actual batch load process that was updating the database tables after inserting a content item.

    • The Content Server process that was updating the database tables, changing the status from GENWWW to DONE after receiving notification that a thumbnail had been completed.

    The two processes should not have been contending with each other because they were not updating the same content items. It seemed that the two processes were locking each other out because DB2 had performed lock escalation and was now locking entire database pages instead of single rows.

  • There were a large number of tablespace scans being performed by both processes.

3.9.4.1.3 Solution

A two-step solution was used:

  1. Inbound Refinery was shut down to prevent the status update process from competing with the batch loading process. The performance did improve because there was a 2000+ backlog of content items from the completed thumbnails.

  2. A RUNSTATS command was issued on all the Content Server database tables to update the table statistics. This dramatically improved the performance of the batch load. The insert time returned to sub-second and the batch load completed within a short amount of time. It took 21 hours to insert the first 22,000 content items. After updating the table statistics, the remaining 5,000 content items were inserted in 13 minutes.
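
To put the figures from the case study in per-item terms, the following Python snippet computes the average insert time before and after the table statistics were updated. It uses only the numbers quoted above. The averages hide the gradual degradation described in the background information, so treat the result as a rough comparison.

```python
# Figures from the case study
seconds_before = 21 * 3600   # 21 hours for the first 22,000 items
items_before = 22000
seconds_after = 13 * 60      # 13 minutes for the remaining 5,000 items
items_after = 5000

avg_before = seconds_before / items_before
avg_after = seconds_after / items_after

print(f"Before: {avg_before:.2f} s per item")   # Before: 3.44 s per item
print(f"After:  {avg_after:.3f} s per item")    # After:  0.156 s per item
print(f"Speedup: about {avg_before / avg_after:.0f}x")  # Speedup: about 22x
```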