4 Batch Loading Content

This chapter describes how to use the Batch Loader utility to check in (insert), delete, and update a large number of files simultaneously on an Oracle WebCenter Content Server instance.

4.1 About Batch Loading

Batch loading a large number of files can be automated with the Batch Loader utility to save time and effort. The following are examples of when to use the Batch Loader:

  • You just purchased the WebCenter Content software, and you want to check in all of your existing files with metadata that exists in a database.

  • You have documents checked in to the Content Server repository, and you just created a new custom metadata field. You can use Batch Loader to add the values you specify for the new metadata field to each existing content item.

  • You want to remove a large number of specific files from the system.

Batch Loader performs actions that are specified in a batch load file, which is a text file that tells Batch Loader which actions to perform and what metadata to assign to each content item in the batch.

Note:

For the Batch Loader utility to function correctly with an Oracle WebLogic Server instance, you must have JDBC connection settings configured. For instructions, see Running Administration Applications in Standalone Mode.

4.1.1 About Batch Load File Records

A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.

Note:

Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.
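
Because a miscased field name is easy to miss, it can help to check a batch load file before loading it. The following Python sketch is not part of WebCenter Content: find_miscased_fields and KNOWN_FIELDS are hypothetical names, and KNOWN_FIELDS holds only an illustrative subset of the fields described in this chapter. It reports names that match a known field only when letter case is ignored:

```python
# Illustrative subset of field names from this chapter (an assumption, not a
# complete list); add your own custom metadata fields, such as xLocation.
KNOWN_FIELDS = {
    "Action", "dDocName", "dDocType", "dDocTitle", "dDocAuthor",
    "dSecurityGroup", "primaryFile", "dInDate", "dOutDate",
    "dRevLabel", "dDocAccount", "alternateFile", "SetFileDir",
}


def find_miscased_fields(field_names):
    """Return {name_as_written: expected_name} for case-only mismatches."""
    expected_by_lower = {name.lower(): name for name in KNOWN_FIELDS}
    mismatches = {}
    for name in field_names:
        expected = expected_by_lower.get(name.lower())
        # Same letters, different case: the Batch Loader treats it as a different field.
        if expected is not None and expected != name:
            mismatches[name] = expected
    return mismatches


print(find_miscased_fields(["dDocName", "dIndate", "DDOCNAME"]))
# {'dIndate': 'dInDate', 'DDOCNAME': 'dDocName'}
```

Names that are not in KNOWN_FIELDS at all are ignored, so extend the set before relying on a check like this for real files.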

  • Each file record ends with an <<EOD>> (end of data) marker.

  • A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example: # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.

  • The following is an example of a file record:

    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
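
The record rules above (name=value pairs, comments that start with a pound sign and a space, and an <<EOD>> marker closing each record) are simple enough to parse for a quick review of a batch load file. This is a minimal Python sketch based on those rules only; parse_batch_load is a hypothetical helper, not the way the Batch Loader itself reads the file:

```python
def parse_batch_load(text):
    """Split batch load file text into a list of {field: value} dicts."""
    records, current = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ("#" followed by a space).
        if not line or line.startswith("# "):
            continue
        if line == "<<EOD>>":  # end of the current file record
            records.append(current)
            current = {}
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {line!r}")
        current[name] = value
    if current:
        raise ValueError("last file record has no <<EOD>> marker")
    return records
```

One difference to be aware of: the Batch Loader reports an error for a line such as #primaryFile=test.txt, whereas this sketch would read it as a field named #primaryFile.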

4.1.2 About Batch Load Actions

Valid actions for batch loading are Insert, Delete, and Update.

  • If no action is specified for a file record and no action is carried over from a previous file record, the system tries to perform an update.

  • Each file record can have only one action, but file records with different actions can be present in the same batch load file.

  • Each action is processed with different logic, as described in the following sections.

4.1.3 About Batch Load Insert Action

The Insert action checks a new file in to the Content Server repository. Figure 4-1 illustrates the insert action.

  • If the Content ID (dDocName) does not exist in the Content Server database, then a new file is created.

  • If the Content ID (dDocName) exists in the Content Server database, and no revision (dRevLabel) is specified, then a new revision is created.

  • If the Content ID (dDocName) and the specified revision (dRevLabel) exist in the Content Server database, then no action is performed.
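
The three documented outcomes of the insert action can be summarized as a small decision function. This Python sketch is illustrative only; insert_outcome is a hypothetical name. The case where the Content ID exists and a revision is specified but does not exist is not described in this section, so the sketch returns "undocumented" for it rather than guessing:

```python
def insert_outcome(doc_name_exists, rev_label=None, rev_label_exists=False):
    """Model the three documented outcomes of Action=insert."""
    if not doc_name_exists:
        return "new content item"  # dDocName not in the database
    if rev_label is None:
        return "new revision"      # dDocName exists, no dRevLabel given
    if rev_label_exists:
        return "no action"         # dDocName and dRevLabel both exist
    # dDocName exists and dRevLabel is given but unknown: not described above.
    return "undocumented"
```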

Figure 4-1 The Insert Action Sequence for Checking In a New File

4.1.3.1 Insert Requirements

The following list defines the fields required for successful performance of an insert action.

Note:

Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.

Each required item is listed with two properties:

  • Field Length: Maximum number of characters permitted in the field.

  • Carried Over: If the next record does not contain this field, the value of this field is taken from the previous record.

Important:

If you have defined any custom metadata fields as required fields, those fields must also be defined for an insert action.

  • Action=insert (Field Length: N/A; Carried Over: Yes): The command to insert a file. The term Action is case sensitive and must be initial capitalized.

  • dDocName (Field Length: 30; Carried Over: No): The metadata field named Content ID.

  • dDocType (Field Length: 30; Carried Over: Yes): The metadata field named Type.

  • dDocTitle (Field Length: 80; Carried Over: No): The metadata field named Title.

  • dDocAuthor (Field Length: 30; Carried Over: Yes): The metadata field named Author.

  • dSecurityGroup (Field Length: 30; Carried Over: Yes): The metadata field named Security Group.

  • primaryFile (Field Length: N/A; Carried Over: N/A): The metadata field named Primary File. The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows:

    • If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used.

    • If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.)

    By default, the Primary File name cannot exceed 80 characters, of which the extension can be at most 8 characters.

  • dInDate (Field Length: N/A; Carried Over: No): The metadata field named Release Date.

    • The dInDate value must use the date format of the locale of the user running the Batch Loader. For example, the US English date format is mm/dd/yy hh:mm:ss am/pm.

    • Time information is optional. If you specify the time, only the hh:mm part is required; the ss and am/pm parts are optional.

  • <<EOD>> (Field Length: N/A; Carried Over: N/A): Indicates the end of data for the file record.

4.1.3.2 Insert Example

The following code fragments show the batch load file syntax for inserting files. This example shows two file records.

The first file record includes all required fields and the action statement, Action=insert. The second file record does not list the required fields: dDocType, dDocAuthor, or dSecurityGroup. However, the information for these items is taken from the previous record. Also, the second record does not specify an action, so the insert action is carried over. Therefore, if the Content ID HR003 does not exist, the file will be inserted. However, if the Content ID does exist, it will not be inserted because the action is insert and not update.

  • First record:

    Action=insert
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Information Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=3/15/97
    <<EOD>>
    
  • Second record:

    dDocName=HR003
    dDocTitle=Performance Review
    primaryFile=hr003.doc
    dInDate=3/15/97
    <<EOD>>
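
The carry-over behavior in this example can be sketched as follows. The Python function apply_carry_over is hypothetical; it assumes that the carried-over fields are those marked Carried Over: Yes in the insert requirements (Action, dDocType, dDocAuthor, dSecurityGroup) plus SetFileDir, which the optional parameters describe as carried over. Applied to the two records above, it fills in the second record's missing fields from the first:

```python
# Assumed set of carried-over fields, taken from the requirements tables.
CARRIED_OVER = ("Action", "dDocType", "dDocAuthor", "dSecurityGroup", "SetFileDir")


def apply_carry_over(records):
    """Return copies of records with carried-over fields filled in."""
    carried, resolved = {}, []
    for record in records:
        merged = dict(record)
        for field in CARRIED_OVER:
            if field in record:
                carried[field] = record[field]   # remember the latest value
            elif field in carried:
                merged[field] = carried[field]   # reuse the previous value
        resolved.append(merged)
    return resolved


records = [
    {"Action": "insert", "dDocName": "HR001", "dDocType": "Form",
     "dDocTitle": "New Employee Information Form", "dDocAuthor": "Olson",
     "dSecurityGroup": "Public", "primaryFile": "hr001.doc", "dInDate": "3/15/97"},
    {"dDocName": "HR003", "dDocTitle": "Performance Review",
     "primaryFile": "hr003.doc", "dInDate": "3/15/97"},
]
second = apply_carry_over(records)[1]
print(second["Action"], second["dDocType"], second["dDocAuthor"])  # insert Form Olson
```

Fields such as dDocTitle and dInDate are not carried over, so each record keeps its own values for them.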

4.1.4 About Batch Load Delete Action

The delete action deletes one or all revisions of an existing file from the Content Server repository. If the specified Content ID (dDocName) does not exist in the Content Server database, no action is performed. Figure 4-2 illustrates the delete action.

Figure 4-2 The Delete Action Sequence

4.1.4.1 Delete Requirements

The following list defines the fields required for successful performance of a delete action.

  • Action=delete: The command to delete a file. The term Action is case sensitive and must be initial capitalized.

  • dDocName: The metadata field named Content ID.

  • <<EOD>>: Indicates the end of data for the file record.

4.1.4.2 Delete Example

The following example shows the batch load file syntax for deleting files. This example shows two file records. The first file record will delete all revisions of the Content ID HR001. The second file record will delete revision 2 of the content item HR002.

Action=delete
dDocName=HR001
<<EOD>>
Action=delete
dDocName=HR002
dRevLabel=2
<<EOD>>
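
When the list of items to delete comes from another system, the records can be generated rather than typed by hand. In this Python sketch, delete_records is a hypothetical helper that takes (Content ID, revision) pairs, where a revision of None means all revisions; with the two items from the example above, it produces the same text:

```python
def delete_records(items):
    """Build delete records from (content_id, revision_or_None) pairs."""
    lines = []
    for content_id, revision in items:
        lines.append("Action=delete")
        lines.append(f"dDocName={content_id}")
        if revision is not None:
            lines.append(f"dRevLabel={revision}")  # delete only this revision
        lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"


print(delete_records([("HR001", None), ("HR002", 2)]), end="")
```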

4.1.5 About Batch Load Update Action

The update action updates existing content items. One of the following actions occurs, depending on what items are present in the file record and what content exists in the system:

  • A new revision of an existing content item is created.

  • An existing file's metadata is updated.

  • A new content item is inserted (Action=insert is performed).

    Note:

    Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.

A new revision is created when one of the following scenarios occurs:

  • Scenario 1: The Content ID (dDocName) exists in the Content Server instance, the revision (dRevLabel) is not specified in the batch load file, and the release date in the batch load file (dInDate) is after the release date of the latest revision of the file in the system.

  • Scenario 2: The Content ID (dDocName) exists in the Content Server instance, the revision (dRevLabel) is specified in the batch load file but does not exist in the Content Server instance, and the release date in the batch load file (dInDate) is after the release date of the latest revision of the file in the system.
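
The two scenarios can be expressed as a single check. In this Python sketch, creates_new_revision and parse_us_date are hypothetical helpers; the sketch assumes US English mm/dd/yy dates without a time part and treats "after" as strictly later. The usage line uses the values from the first record of Update Example 1 (an HR001 revision released 9/26/98 and a batch load date of 2/20/99):

```python
from datetime import datetime


def parse_us_date(value):
    """Parse an mm/dd/yy date (US English locale, date part only)."""
    return datetime.strptime(value, "%m/%d/%y")


def creates_new_revision(doc_exists, rev_label, rev_exists, batch_in_date, latest_in_date):
    """Return True if an update record matches Scenario 1 or Scenario 2."""
    if not doc_exists or batch_in_date <= latest_in_date:
        return False
    if rev_label is None:
        return True          # Scenario 1: no dRevLabel in the batch load file
    return not rev_exists    # Scenario 2: dRevLabel given but not in the system


print(creates_new_revision(True, None, False,
                           parse_us_date("2/20/99"), parse_us_date("9/26/98")))  # True
```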

Figure 4-3 The Update Action Sequence

4.1.5.1 Update Requirements

The following list defines the fields required for successful performance of an update action. Field Length is the maximum number of characters permitted in the field; Carried Over indicates whether the value is taken from the previous record when the next record does not contain the field.

  • Action=update (Field Length: N/A; Carried Over: Yes): The command to update a file. The term Action is case sensitive and must be initial capitalized.

  • dDocName (Field Length: 30; Carried Over: No): The metadata field named Content ID.

  • dDocType (Field Length: 30; Carried Over: Yes): The metadata field named Type.

  • dDocTitle (Field Length: 80; Carried Over: No): The metadata field named Title.

  • dDocAuthor (Field Length: 30; Carried Over: Yes): The metadata field named Author.

  • dSecurityGroup (Field Length: 30; Carried Over: Yes): The metadata field named Security Group.

  • primaryFile (Field Length: N/A; Carried Over: N/A): The metadata field named Primary File.

    • If only the metadata is being updated, the primaryFile field is not required, but dRevLabel is. In other words, although dRevLabel is optional in general, it becomes a required field whenever primaryFile is not present.

    • If the optional dRevLabel field is specified and matches a revision label that exists in the Content Server instance, the primaryFile field is not required; the primary file already specified for that revision is used.

    • The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; otherwise, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.)

  • dInDate (Field Length: N/A; Carried Over: No): The metadata field named Release Date.

    • The dInDate value must use the date format of the locale of the user running the Batch Loader. For example, the US English date format is mm/dd/yy hh:mm:ss am/pm.

    • Time information is optional. If you specify the time, only the hh:mm part is required; the ss and am/pm parts are optional.

  • <<EOD>> (Field Length: N/A; Carried Over: N/A): Indicates the end of data for the file record.
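
The primaryFile and dRevLabel rule for update records can be checked before a batch is loaded. This Python sketch is a hypothetical pre-flight check: check_update_file_fields only tests whether at least one of the two fields is present, because whether a given dRevLabel exists is known only to the Content Server instance:

```python
def check_update_file_fields(record):
    """Return an error message, or None if the primaryFile/dRevLabel rule is met.

    Only checks presence; whether dRevLabel names an existing revision can
    only be determined by the Content Server instance.
    """
    if "primaryFile" in record or "dRevLabel" in record:
        return None
    content_id = record.get("dDocName", "<no dDocName>")
    return f"{content_id}: dRevLabel is required when primaryFile is omitted"


print(check_update_file_fields({"dDocName": "HR003", "dRevLabel": "1"}))  # None
# HR004 is a made-up Content ID for illustration.
print(check_update_file_fields({"dDocName": "HR004", "dDocTitle": "Draft"}))
```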

4.1.5.2 Update Example 1

This example assumes that two files are already checked in to the system with the following metadata:

  • HR001 has a Release Date of 9/26/98 and Revision of 1

  • HR002 has a Release Date of 3/15/99 and Revision of 2

The first file record, Content ID HR001, exists in the system, but it does not have a Revision (dRevLabel) specified in the batch load file. Therefore, the Batch Loader will compare the Release Date of the latest revision in the system with the Release Date specified in the batch load file. Since 2/20/99 is after 9/26/98, a new revision 2 for HR001 is added.

The second file record, Content ID HR002, exists in the system and has a Revision (dRevLabel) specified, but Revision 3 does not exist in the system. Therefore, a new revision 3 for HR002 is added.

Action=update
dDocName=HR001
dDocType=Form
dDocTitle=New Employee Form
dDocAuthor=Olson
dSecurityGroup=Public
primaryFile=hr001.doc
dInDate=2/20/99
<<EOD>>
dDocName=HR002
dDocTitle=Payroll Change Form
primaryFile=hr002.doc
dInDate=2/20/99
dRevLabel=3
<<EOD>>

4.1.5.3 Update Example 2

This example assumes that one file is already checked in to the system with the following metadata:

  • Content ID = HR003

  • Release Date = 3/15/97

  • Revision = 1

  • Title = Performance Review

  • Author = Smith

Because Revision 1 of the Content ID HR003 exists in the system (and is not in an active workflow), the revision is updated with the metadata in the file record: the new Title (Performance Review Template) and the new Release Date (2/20/99). The Author value, Smith, is unchanged.

Action=update
dDocName=HR003
dDocType=Form
dDocTitle=Performance Review Template
dDocAuthor=Smith
primaryFile=hr003.doc
dInDate=2/20/99
dRevLabel=1
<<EOD>>

4.1.6 About Optional Batch Load File Parameters

In a batch load file, there are two methods you can use to override the primary and alternate formats assigned to a content item check-in:

  • Specify a value for the primaryFile:format parameter, the alternateFile:format parameter, or both. These values can in turn be overridden by the primaryOverrideFormat or alternateOverrideFormat parameters. In addition, certain components might force specific formats on certain types of check-ins, or include application functionality that forces a different format.

  • Specify a value for the primaryOverrideFormat parameter, the alternateOverrideFormat parameter, or both. These parameters work in a batch load file only if you enable the IsOverrideFormat configuration variable. Using this method overrides any values that you set for the primaryFile:format and alternateFile:format parameters.

The following optional parameters can be used in any file record in a batch load file:

  • dRevLabel: The metadata field named Revision. Maximum field length is 10 characters. Values must be an integer or comply with the Major/Minor Revision Label Sequence established by the System Properties settings.

  • dDocAccount: The metadata field named Accounts. Maximum field length is 30 characters. This field is not carried over to the next file record. Do not specify this field if accounts are not enabled. If accounts are enabled and this field is not specified, dDocAccount is set to an empty value.

  • xComments: The metadata field named Comments. Maximum field length is 255 characters.

  • dOutDate: The metadata field named Expiration Date. The dOutDate value must use the date format of the locale of the user running the Batch Loader. For example, the US English date format is mm/dd/yy hh:mm:ss am/pm. Time information is optional. If you specify the time, only the hh:mm part is required; the ss and am/pm parts are optional.

  • primaryFile:path: Specifies the location of the file. If a primaryFile:path value is specified, it overrides the value specified for the primaryFile parameter. However, the primaryFile:path value is not used to determine the file conversion format. If no primaryFile:path value is specified, the location is determined from the primaryFile value. This parameter uses the following syntax:

      primaryFile:path=complete_path/primary_file_name

  • primaryFile:format: Specifies the file format to use for the Primary File. This file format overrides the one implied by the file extension and by the value specified for the primaryFile parameter. If no primaryFile:format value is specified, the file format is determined from the file extension of the primaryFile value. This parameter uses the following syntax:

      primaryFile:format=application/conversion_type

  • alternateFile: The metadata field named Alternate File. The Alternate File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows:

    • If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used.

    • If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.)

  • alternateFile:path: Specifies the location of the alternate file. If an alternateFile:path value is specified, it overrides the value specified for the alternateFile parameter. However, the alternateFile:path value is not used to determine the file conversion format. If no alternateFile:path value is specified, the location is determined from the alternateFile parameter, if a value is specified; otherwise, the primaryFile value is used by default. This parameter uses the following syntax:

      alternateFile:path=complete_path

  • alternateFile:format: Specifies the file format to use for the Alternate File. This file format overrides the one implied by the file extension and by the value specified for the alternateFile parameter. If no alternateFile:format value is specified, the file format is determined from the file extension of the alternateFile value, if one is specified; otherwise, the primaryFile value is used by default. This parameter uses the following syntax:

      alternateFile:format=application/conversion_type

  • webViewableFile: The webViewableFile name can be a complete path or just the file name. If a webViewableFile value is specified, the conversion process is not performed. If only a file name is specified, the location of the file is determined as follows:

    • If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used.

    • If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.)

  • webViewableFile:path: Specifies the location of the web viewable file. If a webViewableFile:path value is specified, it overrides the value specified for the webViewableFile parameter. However, the webViewableFile:path value is not used to determine the file conversion format. If no webViewableFile:path value is specified, the location is determined from the webViewableFile value. This parameter uses the following syntax:

      webViewableFile:path=complete_path

  • webViewableFile:format: Specifies the file format to use for the web viewable file. This file format overrides the one implied by the file extension and by the value specified for the webViewableFile parameter. Specify the webViewableFile:format value explicitly; it is not determined from the webViewableFile value. This parameter uses the following syntax:

      webViewableFile:format=application/conversion_type

  • primaryOverrideFormat: Specifies which file format to use for the Primary File. This file format overrides the one implied by the file extension. This option works as a parameter only if you enable the IsOverrideFormat configuration variable, which you can set by selecting Allow Override Format in the System Properties utility. The recommended alternative is the primaryFile:format parameter.

  • alternateOverrideFormat: Specifies which file format to use for the Alternate File. This file format overrides the one implied by the file extension. This option works as a parameter only if you enable the IsOverrideFormat configuration variable, which you can set by selecting Allow Override Format in the System Properties utility. The recommended alternative is the alternateFile:format parameter.

  • SetFileDir: Specifies the directory where the Primary Files and Alternate Files are located. This field is carried over to the next file record.
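
The location and format rules for the primary file can be sketched as one function. In this Python sketch, resolve_primary_file is hypothetical; it assumes POSIX-style paths and interprets "the batch load file path is used" as the directory that contains the batch load file. When primaryFile:format is absent, it returns the file extension as a format hint:

```python
import posixpath


def resolve_primary_file(record, set_file_dir, batch_file_path):
    """Return (file_location, format_hint) for a record's primary file.

    Assumes POSIX-style paths. set_file_dir is the SetFileDir value in effect
    (from this or a previous record), or None if it was never set.
    """
    name = record["primaryFile"]
    if "primaryFile:path" in record:
        location = record["primaryFile:path"]     # explicit path wins
    elif posixpath.isabs(name):
        location = name                           # complete path given
    else:
        # File name only: SetFileDir, else the batch load file's directory.
        base = set_file_dir or posixpath.dirname(batch_file_path)
        location = posixpath.join(base, name)
    # primaryFile:format overrides the extension; primaryFile:path is never
    # used to determine the format.
    fmt = record.get("primaryFile:format") or posixpath.splitext(name)[1].lstrip(".")
    return location, fmt
```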

4.1.7 About Custom Metadata Fields

Any custom metadata field that has been defined in the Configuration Manager can be included in a file record.

  • If you have defined any custom metadata fields as required fields, those fields must be defined for an insert action or an update action.

  • If a custom metadata field is not a required field, but it has a default value (even if blank), then the default value will be used if the value is not specified in the batch load file.

  • When specifying a custom metadata field value, precede the field name with an x. For example, if you have a custom metadata field called Location, then the batch load file entry is xLocation=value.

  • Keep in mind that some add-on products use custom metadata fields. For example, if you have PDF Watermark, you will have created a field called Watermark. To include this field in a batch load file, precede it with an x just like any other custom metadata field (for example, xWatermark).
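
The required-field and default-value rules for custom metadata can be previewed before loading. This Python sketch is an illustration under assumptions: apply_custom_fields is hypothetical, and the field definitions passed to it (name, whether it is required, and its default) stand in for what is actually configured in the Configuration Manager. The Content Server instance applies defaults itself; the sketch only shows which x-prefixed fields would be defaulted and which required ones are missing:

```python
def apply_custom_fields(record, custom_fields):
    """Return (record_with_defaults, missing_required_fields).

    custom_fields maps a field name without the "x" prefix to a
    (required, default) pair; default None means "no default defined".
    """
    result = dict(record)
    missing = []
    for name, (required, default) in custom_fields.items():
        key = "x" + name  # batch load files use the x-prefixed name
        if key in result:
            continue
        if required:
            missing.append(key)
        elif default is not None:
            result[key] = default  # a blank default ("") still counts


    return result, missing


# Hypothetical field definitions for illustration.
fields = {"Location": (True, None), "Region": (False, ""), "Watermark": (False, None)}
record, missing = apply_custom_fields({"dDocName": "HR001"}, fields)
print(record, missing)  # {'dDocName': 'HR001', 'xRegion': ''} ['xLocation']
```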

4.2 Preparing a Batch Load File

4.2.1 About Preparing a Batch Load File

You can use any method you prefer to create a batch load file, as long as the resulting text file conforms to the batch load file syntax requirements. However, the Batch Loader provides a tool called the BatchBuilder to assist you in creating batch load files.

  • The BatchBuilder creates a batch load file based on the files in a specified directory, reading recursively through all of its subdirectories.

  • A mapping file tells the BatchBuilder how to determine the metadata for each file record. You can use the BatchBuilder to create and save custom mapping files.

  • You can run the BatchBuilder from the standalone utility interface or from the command line.

  • The BatchBuilder can also be used to create external collections of content, which are indexed and stored in a separate search collection rather than in the Content Server database. You can set up read-only external collections, where users can search for content but cannot update metadata or delete content. This option is recommended when external content is also included in another Content Server instance.

If you plan to use the Batch Loader utility to update and insert a large number of files on your Content Server instance simultaneously, you must create a batch load file. Two of the optional parameters that you can include in your batch load file are primaryOverrideFormat and alternateOverrideFormat. However, these options only work as parameters in the batch load file if you enable the IsOverrideFormat configuration variable. You can set this variable using the System Properties utility.

4.2.2 Mapping Files

Mapping files are text files that have an .hda extension, which identifies them as a type of data file used by the Content Server instance.

For more information on HDA files, LocalData properties, and ResultSets, see Elements in HDA Files in Developing with Oracle WebCenter Content.

4.2.2.1 Mapping File Formats

The metadata mapping can be defined in one of two formats:

  • As name/value pairs in a LocalData definition, a mapping file would look like the following:

    @Properties LocalData
    dDocName=<$filename$>.<$extension$>
    dInDate=<$filetimestamp$>
    @end
    
  • As a ResultSet named SpiderMapping, a mapping file would look like the following:

    @ResultSet SpiderMapping
    2
    mapField
    mapValue
    dDocName
    <$filename$>.<$extension$>
    dInDate
    <$filetimestamp$>
    @end
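
Because both formats are plain text, a mapping file can also be generated from a script. This Python sketch writes the two fragments shown above from a field-to-value dictionary; localdata_mapping and resultset_mapping are hypothetical helpers, and the output mirrors only the fragments in this section (a complete HDA file saved by the BatchBuilder may contain additional lines):

```python
def localdata_mapping(mapping):
    """Render {field: value} as a LocalData properties section."""
    lines = ["@Properties LocalData"]
    lines += [f"{field}={value}" for field, value in mapping.items()]
    lines.append("@end")
    return "\n".join(lines) + "\n"


def resultset_mapping(mapping, name="SpiderMapping"):
    """Render {field: value} as a two-column mapField/mapValue ResultSet."""
    lines = [f"@ResultSet {name}", "2", "mapField", "mapValue"]  # column count, then names
    for field, value in mapping.items():
        lines += [field, value]  # one row: field name, then its value
    lines.append("@end")
    return "\n".join(lines) + "\n"


mapping = {"dDocName": "<$filename$>.<$extension$>", "dInDate": "<$filetimestamp$>"}
print(localdata_mapping(mapping), end="")
print(resultset_mapping(mapping), end="")
```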

4.2.2.2 Mapping File Values

The following values can be used in a mapping file:

  • Normal string: All files will have the specified metadata value. For example, with the following entry, all files will be the Document content type:

      dDocType=Document

  • Idoc Script: Any supported Idoc Script. See Introduction to the Idoc Script Custom Scripting Language in Developing with Oracle WebCenter Content. For example:

      xLanguage=<$if strEquals(dir2, "EN")$>English<$elseif strEquals(dir2, "SP")$>Spanish<$else$>French<$endif$>

  • <$dir1$>, <$dir2$>, and so on: The directory name at the specified level in the file's path. <$dir1$> refers to the root directory specified in the Directory field, <$dir2$> refers to the next-level directory, and so on. For example:

      dDocType=<$dir1$>
      dSecurityGroup=<$dir2$>
      dDocAccount=<$dir3$>

    If the file path is f:/docs/public/sales/march.doc and you have specified the Directory value as f:/docs, the values are:

      <$dir1$> = "docs"
      <$dir2$> = "public"
      <$dir3$> = "sales"

  • <$dUser$>: The user currently logged in. For example, with dDocAuthor=<$dUser$>, if administrator is logged in, then <$dUser$> equals administrator.

  • <$extension$>: The file extension of the file. For example, with dDocTitle=<$filename$>.<$extension$>, if the file path is d:/salesdocs/sample.doc, then <$extension$> equals doc.

  • <$filename$>: The name of the file, without the extension. For example, with dDocName=<$filename$>, if the file path is d:/salesdocs/sample.doc, then <$filename$> equals sample.

  • <$filepath$>: The entire directory path of the file, including the file name. For example, with xPath=<$filepath$>, if the file path is c:/docs/public/acct/sample.doc, then <$filepath$> is c:/docs/public/acct/sample.doc.

  • <$filesize$>: The size of the file in bytes. For example, with xFileSize=<$filesize$>, a 42 KB file gives a <$filesize$> value of 43008.

  • <$filetimestamp$>: The date and time the file was last modified. For example, with dInDate=<$filetimestamp$>, if the file was last modified on September 13, 2001 at 4:03 pm, then <$filetimestamp$> equals 9/13/01 4:03 PM for an English-US locale.

  • <$URL$>: The URL of the file, based on the values of the physical file root and relative web root.
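
To see how the directory and file name variables relate to a path, the following Python sketch computes the values from the examples above. It is not Idoc Script and not the BatchBuilder's implementation; mapping_variables is a hypothetical helper that treats paths as forward-slash strings and omits <$filesize$>, <$filetimestamp$>, <$dUser$>, and <$URL$>, which depend on the file system or the server:

```python
from pathlib import PurePosixPath


def mapping_variables(file_path, directory):
    """Approximate dirN, filename, extension, and filepath for one file."""
    path = PurePosixPath(file_path)
    root = PurePosixPath(directory)
    # relative_to raises ValueError if the file is not under the directory.
    rel_dirs = path.parent.relative_to(root).parts
    dirs = (root.name,) + rel_dirs  # <$dir1$> is the root directory itself
    values = {f"dir{i}": name for i, name in enumerate(dirs, start=1)}
    values["filename"] = path.stem                 # name without extension
    values["extension"] = path.suffix.lstrip(".")  # extension without the dot
    values["filepath"] = str(path)
    return values


print(mapping_variables("f:/docs/public/sales/march.doc", "f:/docs"))
```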

4.2.3 Creating a Batch Load File from the BatchBuilder Window

To create a batch load file from the BatchBuilder window:

  1. Start the Batch Loader utility:
    • Windows: Choose Start, then Programs, then Content Server, then instance_name, then Utilities, then BatchLoader.

    • UNIX: Go to the DomainHome/ucm/cs/bin/ directory, type ./BatchLoader in a shell window, and press the RETURN key on your keyboard.

  2. In the login window, enter the Content Server administrator user name and password, then click OK.
  3. In the Batch Loader window, choose Options, then Build Batch File.
  4. In the Directory field on the BatchBuilder window, enter the location of the files to be included in the batch load file.
  5. In the Batch Load File field, enter the path and file name for the batch load file. You can click the Browse button to navigate to and select the directory and file.
  6. From the Mapping list, select a mapping file. To create a new mapping file or edit an existing one, see Creating a Mapping File.
  7. Optional: In the File Filter field, enter filter settings to include or exclude particular files from the batch load file.
  8. Optional: To batch load a read-only external collection, choose External, and select the external collection options.
  9. Click Build.
  10. When the build process is complete, click OK.
  11. Open the batch load file in a text editor and double-check the file records.
  12. To save the current batch load file settings as the default, choose Options, then Save Configuration.
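
Conceptually, the Build step walks the directory tree and writes one file record per file. The following Python sketch imitates that behavior under simplifying assumptions: build_batch_file is hypothetical, every record uses Action=insert, the mapping templates use Python {filename} and {extension} placeholders instead of Idoc Script, and exclusion is by file extension only. It is meant for experimenting with record layout, not as a replacement for the BatchBuilder:

```python
import os


def build_batch_file(directory, mapping, out_path, exclude_exts=()):
    """Write an insert record for every file under directory (recursively).

    mapping: {field: template}; templates may contain {filename} and
    {extension} placeholders (a simplified stand-in for Idoc Script).
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(directory):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                stem, ext = os.path.splitext(name)
                ext = ext.lstrip(".")
                if ext in exclude_exts:
                    continue  # similar in spirit to the File Filter field
                out.write("Action=insert\n")
                for field, template in mapping.items():
                    value = template.format(filename=stem, extension=ext)
                    out.write(f"{field}={value}\n")
                out.write(f"primaryFile={os.path.join(dirpath, name)}\n")
                out.write("<<EOD>>\n")
```

As with BatchBuilder output, open the generated file in a text editor and double-check the file records before loading it.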

4.2.4 Creating a Mapping File

To create a mapping file:

  1. Open the BatchBuilder window.
  2. Click Edit next to the Mapping field.
  3. In the BatchBuilder Mapping List window, click Add.
  4. In the Add BatchBuilder Mapping window, enter a name and description for the mapping file, and click OK.
  5. In the Edit BatchBuilder Mapping window, click Add.
  6. In the Add/Edit BatchBuilder Mapping Field window, enter a metadata field name to be defined. For example, enter dDocName for the Content ID field, or xComments for the Comments field.
  7. Enter the value for the metadata field.
    • Type any constant text and Idoc script directly in the Value field. For example, to set Document as the Type for all documents in the batch load file, enter dDocType in the Field field, and enter Document in the Value field. See Introduction to the Idoc Script Custom Scripting Language in Developing with Oracle WebCenter Content.

    • To add a predefined variable to the Value field, select the variable in the right column and click the << button. For example, to set each document's second-level directory as the Security Group, enter dSecurityGroup in the Field field, and insert the <$dir2$> variable in the Value field.

      Note:

      Be careful when choosing predefined variables. Many metadata fields have length limitations and cannot contain certain characters (such as spaces or punctuation marks). See Managing Content in Managing Oracle WebCenter Content.

  8. Click OK.
  9. Repeat steps 5 through 8 for as many metadata fields as you want to define.
  10. Click OK to save changes and close the Edit BatchBuilder Mapping window.

    The mapping file is saved as MapFileName.hda in the IntradocDir/search/external/mapping/ directory.

  11. Click Close to close the BatchBuilder Mapping List window.

4.2.5 Creating a Batch Load File from the Command Line

You can create a batch load file by entering the BatchBuilder parameters from a command line rather than entering them in the BatchBuilder window. To create a batch load file from the command line:

  1. Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line, where sysadmin is the user name of the Content Server system administrator:
    BatchLoaderUserName=sysadmin
    

    This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader and BatchBuilder utilities.

  2. Save and close the file.
  3. Open a command line window and change to the DomainHome/ucm/cs/bin/ directory.

    Caution:

    Run the BatchBuilder using the same operating system account that runs the Content Server instance. Otherwise, the software might not process your data due to permissions problems.

  4. Enter the following command:
    • Windows:

      BatchLoader.exe -spider -q -ddirectory -mmappingfile -nbatchloadfile
      
    • UNIX:

      BatchLoader -spider -q -ddirectory -mmappingfile -nbatchloadfile
      

The following flags can be used with the BatchLoader command to run the BatchBuilder from the command line:

  • -spider or /spider (required): Runs the BatchBuilder utility.

  • -q or /q (optional): Runs the BatchBuilder in quiet mode in the background. If the BatchBuilder is run from the command line without this flag, the BatchBuilder window appears.

  • -d or /d (required): Directory field value.

  • -m or /m (required): Mapping field value.

  • -n or /n (required): Batch Load File field value.

  • -e or /e (optional): Exclude specified files (Exclude check box selected).

  • -i or /i (optional): Include specified files (Exclude check box deselected).

4.2.5.1 Windows Example

The following example shows the correct syntax to run the BatchBuilder from a Windows command line, where:

  • Directory = c:/myfiles

  • Mapping File = MyMappingFile

  • Batch Load File = c:/batching/batchinsert.txt

  • Excluded files = *.exe and *.zip

BatchLoader.exe -spider -q -dc:/myfiles -mMyMappingFile -nc:/batching/batchinsert.txt -eexe,zip

4.2.5.2 UNIX Example

The following example shows the correct syntax to run the BatchBuilder from a UNIX command line, where:

  • Directory = /myfiles

  • Mapping File = MyMappingFile

  • Batch Load File = /batching/batchinsert.txt

  • Excluded files = index.htm and index.html

BatchLoader -spider -q -d/myfiles -mMyMappingFile -n/batching/batchinsert.txt -eindex.htm,index.html
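
If you script the BatchBuilder, building the argument list in one place keeps the flag syntax consistent. In this Python sketch, spider_args is a hypothetical helper: each flag value is attached directly to its flag with no space, as in the two examples above, and only the -spider, -q, -d, -m, -n, and -e flags are covered. The function only builds the arguments; it does not run the BatchLoader:

```python
def spider_args(directory, mapping_file, batch_file, exclude=(), quiet=True, windows=False):
    """Build a BatchBuilder command line as a list of arguments."""
    args = ["BatchLoader.exe" if windows else "BatchLoader", "-spider"]
    if quiet:
        args.append("-q")  # run in the background without the window
    # Flag and value are joined with no space, as in the documented examples.
    args += [f"-d{directory}", f"-m{mapping_file}", f"-n{batch_file}"]
    if exclude:
        args.append("-e" + ",".join(exclude))
    return args


print(" ".join(spider_args("/myfiles", "MyMappingFile", "/batching/batchinsert.txt",
                           exclude=("index.htm", "index.html"))))
# BatchLoader -spider -q -d/myfiles -mMyMappingFile -n/batching/batchinsert.txt -eindex.htm,index.html
```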

4.3 Running the Batch Loader

4.3.1 About Running the Batch Loader

The Batch Loader uses the information from a batch load file to check in (insert), delete, or update a large number of files on your Content Server instance simultaneously.

  • You can run the Batch Loader from the standalone utility interface or from the command line.

  • After you run the Batch Loader, the Content Server instance processes files through the Inbound Refinery instance and the Indexer as it would for any other content item.

4.3.2 Batch Loading from the Batch Loader Window

To batch load content using the Batch Loader window:

  1. Open the Batch Loader window.
  2. Click Browse, then navigate to and select the batch load file.
  3. To change the number of errors that can occur before the Batch Loader stops processing, enter the number in the Maximum errors allowed field.
  4. To delete files from the hard drive after they are successfully checked in or updated, select Clean up files after successful check in.
  5. To create a text file containing the file records that failed during batch loading, select Enable error file for failed revision classes.
  6. Click Load Batch File to start the Batch Loader process.

    When the batch load process is complete, a Batch Loader message window opens, indicating the number of errors that occurred, if any.

  7. If you enabled the error file, write down the file name shown in the message box.
  8. Click OK.
  9. Correct any problems with the batch load.
  10. To save the current Batch Loader settings as the default, choose Options, then Save Configuration.

4.3.3 Batch Loading from the Command Line

You can batch load content by entering the Batch Loader parameters from a command line rather than entering them in the Batch Loader window. To run the Batch Loader from the command line:

  1. Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line, where sysadmin is the user name of the Content Server system administrator:
    BatchLoaderUserName=sysadmin
    

    This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader utility.

  2. Save and close the file.
  3. Open a command line window and go to the DomainHome/ucm/cs/bin/ directory.

    Note:

    Run the Batch Loader using the same operating system account that runs the Content Server instance. Otherwise, the software might not process your files due to permissions problems.

  4. Enter the following command:
    • Windows:

      BatchLoader.exe -q -nbatchloadfile
      
    • UNIX:

      BatchLoader -q -nbatchloadfile
      

    The Batch Loader processes the batch load file without displaying any message boxes.

  5. Correct any problems with the batch load.
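
Step 1 can also be scripted. The following Python sketch, using a hypothetical helper named set_cfg_entry, sets a name=value entry in a .cfg file such as intradoc.cfg and replaces an existing entry for the same name instead of adding a duplicate. It assumes the file holds one name=value pair per line, as the entries in this section do, and leaves all other lines unchanged. Back up the file before running anything like this against a live instance.

```python
from pathlib import Path


def set_cfg_entry(cfg_path, key, value):
    """Set key=value in a name=value .cfg file, appending it if absent.

    Lines for the same key are replaced; comments and all other lines
    are written back unchanged.
    """
    path = Path(cfg_path)
    lines = path.read_text().splitlines() if path.exists() else []
    entry = f"{key}={value}"
    found = False
    for i, line in enumerate(lines):
        # Compare only the name part, before the first "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            found = True
    if not found:
        lines.append(entry)
    path.write_text("\n".join(lines) + "\n")
```

For example, set_cfg_entry("DomainHome/ucm/cs/bin/intradoc.cfg", "BatchLoaderUserName", "sysadmin"), where DomainHome is your actual domain home directory. Running it a second time leaves a single BatchLoaderUserName line.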

The following flags can be used with the BatchLoader command from the command line:

Flag      Required?  Description
-q or /q  No         Runs the Batch Loader in quiet mode in the background. (If the Batch Loader is run from the command line without this flag, the Batch Loader window appears.)
-n or /n  Yes        Value of the Batch Load File field.
-console  No         Echoes all output to the HTML Content Server log and to the console window that is running the Batch Loader. For details, see Batch Loader -console Command Line Switch.

4.3.3.1 Windows Example

The following example shows the correct syntax to run the Batch Loader from a Windows command line, where the batch load file is c:/batching/batchinsert.txt:

BatchLoader.exe -q -nc:/batching/batchinsert.txt

4.3.3.2 UNIX Example

The following example shows the correct syntax to run the Batch Loader from a UNIX command line, where the batch load file is /batching/batchinsert.txt:

BatchLoader -q -n/batching/batchinsert.txt
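
For scripted runs, the Batch Loader command can be assembled in the same way. In the following Python sketch, batchloader_args is a hypothetical name. It always writes -console with a hyphen, because the /console form is not accepted on Windows (see Batch Loader -console Command Line Switch). Like the command itself, it expects the batch load file path attached directly to -n.

```python
import os


def batchloader_args(batch_file, quiet=True, console=False,
                     windows=(os.name == "nt")):
    """Build a Batch Loader argument list for one batch load file."""
    args = ["BatchLoader.exe" if windows else "BatchLoader"]
    if console:
        # Only the hyphen form works for this switch on both platforms.
        args.append("-console")
    if quiet:
        args.append("-q")  # no Batch Loader window or message boxes
    args.append("-n" + batch_file)  # value attached with no space
    return args
```

For example, " ".join(batchloader_args("c:/batching/batchinsert.txt", windows=True)) gives the Windows command shown above.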

4.3.4 Using the IdcCommand Utility and Remote Access

Occasionally, you may need to use remote access when managing your Content Server instance. This does not necessarily mean that remote terminal access is required. However, you must have the ability to submit commands to the server from a remote location.

Combining remote access with the IdcCommand utility provides a powerful toolset and an easy way to check in a large number of files to your instance. To take advantage of this functionality, you will need to properly set up the workstation to submit commands and be able to use the IdcCommand utility with a batch load command file.

4.3.4.1 Batch Load Command Files

A batch load command file contains a set of commands for each file that is loaded. If you are loading a large number of files, the command file may contain hundreds of lines. Using an editing tool can simplify the task of creating the numerous required lines. For example, the procedure for Preparing for Remote Batch Loading shows how you can prepare a batch load command file using the editing and mail merge features of Microsoft Office.

The following is an example Batch Load command file:

@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=thisfile
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=filename
primaryFile:path=pathtothefile/primaryfilename
xComments=Initial Check In
@end
<<EOD>>
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=99.tif
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=350.afp
primaryFile:path=/lofs/invoices/350.afp
xComments=Initial Check In
@end
<<EOD>>
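
Because field names are case sensitive and every record must be closed correctly, it can be worth checking a command file before you submit it. The following Python sketch splits a command file into one dictionary per record, using the @Properties LocalData, @end, and <<EOD>> markers shown above. The CHECKIN_FIELDS tuple is an assumption for illustration, not the authoritative list of fields that CHECKIN_UNIVERSAL requires; adjust it to your configuration.

```python
# Hypothetical minimum set of fields checked for each check-in record.
CHECKIN_FIELDS = ("dDocTitle", "dDocType", "dSecurityGroup", "primaryFile")


def parse_command_file(text):
    """Split a batch load command file into one dict per record."""
    records = []
    for chunk in text.split("<<EOD>>"):  # <<EOD>> ends each record
        fields, inside = {}, False
        for raw in chunk.splitlines():
            line = raw.strip()
            if line == "@Properties LocalData":
                inside = True
            elif line == "@end":
                inside = False
            elif inside and "=" in line:
                name, value = line.split("=", 1)  # values may contain "="
                fields[name] = value
        if fields:
            records.append(fields)
    return records


def missing_checkin_fields(record):
    """List the CHECKIN_FIELDS absent from a CHECKIN_UNIVERSAL record."""
    if record.get("IdcService") != "CHECKIN_UNIVERSAL":
        return []
    return [name for name in CHECKIN_FIELDS if name not in record]
```

Because the comparison is exact, a field written with the wrong case (for example, dDoctype) is reported as missing, which matches how the server treats it.
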
4.3.4.2 Preparing for Remote Batch Loading

You can perform batch loading from remote locations. The following procedure is written for a Microsoft Windows operating system and contains these main stages:

  • Configure the local computer

  • Test the configuration for the remote workstation

  • Create a batch load command file

  • Execute the upload

4.3.4.2.1 Configuring the Local Computer

To configure the local computer:

  1. Open Windows Explorer.
  2. Create a working directory (for example, C:\working_dir).
  3. In the working directory, create one or more directories for the Content Server instances you will be accessing (for example, C:\working_dir\development and C:\working_dir\contribution). This procedure refers to each of these directories as DomainHomeName.
  4. In each DomainHomeName directory, create a cmdfiles subdirectory.
  5. From the remote Content Server instance, copy the following directories from MW_HOME\user_projects\domains\Domain_Name\ucm\cs into the corresponding DomainHomeName directory (in this case, C:\working_dir\development and C:\working_dir\contribution):
    • working_dir\DomainHomeName\ucm\cs\bin

    • working_dir\DomainHomeName\ucm\cs\config

  6. From the remote Content Server instance, copy the following directories (and their files) to your working directory:
    • working_dir\idc\bin

    • working_dir\idc\components

      (copying the CSDms and NativeOsUtils component files should be sufficient)

    • working_dir\idc\config

    • working_dir\idc\jlib

    • working_dir\idc\resources\core\lang

    • working_dir\idc\resources\core\table

    • working_dir\idc\resources\core\config

  7. Using a text editor, open the DomainHomeName\ucm\cs\bin\intradoc.cfg file on your local system and update the IntradocDir, IdcHomeDir, and WeblayoutDir configuration variables to match your directory structure. For example:
    IntradocDir=working_dir\DomainHomeName\ucm\cs
    IdcHomeDir=working_dir\idc
    WeblayoutDir=working_dir\DomainHomeName\ucm\cs\weblayout
    
  8. Using a text editor, open the working_dir\DomainHomeName\ucm\cs\config\config.cfg file on your local system and verify that the following settings are correct:
    IntradocServerPort=4444
    IntradocServerHostName=HostMachineName
    
  9. In the remote Content Server instance, use the System Properties utility to add the IP address of the local computer to the security filter.
  10. Restart the remote Content Server instance.
4.3.4.2.2 Testing the Configuration for the Remote Workstation

To test the configuration for the remote workstation:

  1. In the cmdfiles directory, create a file named pingservertest.hda and add the following lines:
    @Properties LocalData
    IdcService=PING_SERVER
    @end
    
  2. Open a command prompt and change to your working bin directory (for example, cd C:\working_dir\development\bin).
  3. Issue the following command:
    IdcCommand -f ..\cmdfiles\pingservertest.hda -u sysadmin -l ..\pingservertest.log -c server
    
  4. Check the output. If the test succeeds, the server returns a message similar to the following:
    3/24/04: Success executing service PING_SERVER.

    This completes the setup for remote commands.
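
The test can also be driven from a script. The following Python sketch writes pingservertest.hda, builds the same IdcCommand argument list as step 3, and looks for the success message from step 4 in captured output or log text. The helper names are hypothetical, and the success check assumes the message text appears as shown above.

```python
from pathlib import Path

# Same content as the pingservertest.hda file in step 1.
PING_HDA = "@Properties LocalData\nIdcService=PING_SERVER\n@end\n"


def write_ping_file(cmdfiles_dir):
    """Write pingservertest.hda into the given cmdfiles directory."""
    path = Path(cmdfiles_dir) / "pingservertest.hda"
    path.write_text(PING_HDA)
    return path


def idccommand_args(cmd_file, log_file, user="sysadmin"):
    """IdcCommand arguments: command file, user, log file, server mode."""
    return ["IdcCommand", "-f", str(cmd_file), "-u", user,
            "-l", str(log_file), "-c", "server"]


def ping_succeeded(output):
    """True if the text reports a successful PING_SERVER call."""
    return "Success executing service PING_SERVER" in output
```

The same idccommand_args helper also fits the upload command in Executing the Batch Load Upload, with filelisting.hda and filelisting.log in place of the ping files.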
4.3.4.2.3 Creating a Batch Load Command File

This procedure uses the editing and mail merge features of Microsoft Office to create a batch load command file. To create a batch load command file:

  1. Create a file listing of your directory contents:

    1. Open a command prompt and change to the root directory representing the files you intend to load.

    2. Create a file listing by redirecting the output of the following command into a file:

      dir /s /b > filelisting.txt
      
    3. Check your filelisting.txt file; it will look something like this:

      V:\policies\ADMIN\working_dir_Admin\AbbreviationList.doc
      V:\policies\ADMIN\working_dir_Admin\Abbreviations.doc
      V:\policies\ADMIN\working_dir_Admin\AbsencePres.doc
      V:\policies\ADMIN\working_dir_Admin\AdmPatientCare.doc
      V:\policies\ADMIN\working_dir_Admin\AdmRounds.doc
      V:\policies\ADMIN\working_dir_Admin\AdverseEvents.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesPermanent.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesRetrieval.doc
      V:\policies\ADMIN\working_dir_Admin\ArchivesStandardReq.doc

      Note:

      When working with batch loads, each file must exist on the server at the location given by the primaryFile value in the batch load command file. Ideally, map the directory of files to the same drive letter on the server and on your local system. Alternatively, you can copy the directory of files to the server temporarily.

  2. Edit the file listing to create your file name and title data:

    1. Open your filelisting.txt file in Excel.

    2. Using Replace, remove all the directory information leaving only the file name. Also look for and remove the line for filelisting.txt.

    3. Copy column A (containing the file names) to column B. In this example the file name is also used for the title and Column B will become the title.

    4. Using Replace, remove the file extension from the names in column B.

    5. Insert a new first line and enter filename in the first column and title in the second.

    6. Save the file.

  3. Create an .hda file from the file listing using Mail Merge features:

    1. Open Microsoft Word and create a new document containing your set of batch load commands. The following example shows basic batch load commands; adjust the values to match your configuration settings.

      @Properties LocalData
      IdcService=CHECKIN_UNIVERSAL
      doFileCopy=1
      dDocTitle=
      dDocType=Native
      dSecurityGroup=Internal
      dDocAccount=Policy/Admin
      dDocAuthor=sysadmin
      primaryFile=d:/temp/working_dir_Admin/
      xComments=Initial Check In
      @end
      <<EOD>>
      
    2. Select Tools / Letters and Mailing / Mail Merge Wizard and advance through the wizard. Choose the selections below to use your filelisting.txt file as input to the mail merge.

      • Letter Document (step 1)

      • Current document (step 2)

      • Existing List (step 3) and select your Excel spreadsheet as the data source

      • More Items (step 4): place the title and filename merge fields into the Word document so that it looks like the following:

        @Properties LocalData
        IdcService=CHECKIN_UNIVERSAL
        doFileCopy=1
        dDocTitle="title"
        dDocType=Native
        dSecurityGroup=Internal
        dDocAccount=Policy/Admin
        dDocAuthor=sysadmin
        primaryFile=d:/temp/working_dir_Admin/"filename"
        xComments=Initial Check In
        @end
        <<EOD>>
        
    3. Complete the mail merge (Steps 5 and 6) and you will have a new Word document with one merge record per page.

    4. Select all of the merged letters and use the Replace feature to remove all of the section breaks.

    5. Save the file as a plain text file in the cmdfiles directory with the .hda file extension (for example, filelisting.hda).
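
If you prefer not to use Excel and Word, a similar command file can be generated directly from the directory tree. The following Python sketch walks a directory, uses each file name without its extension as the title (as in the Excel steps), and writes one record per file from the template in step 3.1. Unlike the mail merge, it keeps subdirectory names in primaryFile. The function name build_command_file and the server_root parameter are assumptions: server_root must be the path at which the server sees the same files, because each file must exist on the server at the location given by primaryFile.

```python
import os
from pathlib import Path

# Record template from step 3.1; {title} and {path} are filled per file.
TEMPLATE = """@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle={title}
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile={path}
xComments=Initial Check In
@end
<<EOD>>
"""


def build_command_file(root, server_root):
    """Return one CHECKIN_UNIVERSAL record per file under root."""
    records = []
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            rel = Path(dirpath, name).relative_to(root).as_posix()
            title = Path(name).stem  # file name without its extension
            path = f"{server_root.rstrip('/')}/{rel}"
            records.append(TEMPLATE.format(title=title, path=path))
    return "".join(records)
```

For example: Path("C:/working_dir/development/cmdfiles/filelisting.hda").write_text(build_command_file("V:/policies/ADMIN/working_dir_Admin", "d:/temp/working_dir_Admin")).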

4.3.4.2.4 Executing the Batch Load Upload

To execute the upload:

  1. Open a command prompt.
  2. Navigate to the working bin directory.
  3. Issue the command:
    IdcCommand -f ../cmdfiles/filelisting.hda -u sysadmin -l ../filelisting.log -c server
    

    Your files will be checked in to the Content Server repository and a message appears in the command window as each file is checked in.

4.3.5 Batch Loading Content as Metadata Only

Depending on the action you plan to perform using the Batch Loader, certain fields are required in the batch load file. If you are updating only the metadata in existing content items, the primaryFile field is not required in the batch load file; for more information see Update Requirements.

However, if you want to load (insert action) content into the Content Server instance as metadata only, then the primaryFile field is required in the batch load file. Although the import ignores the field, the Batch Loader expects it to be defined. If the primaryFile field is missing, you get an error similar to the following:

Please check record number <number>. BatchLoader: unable to check in '<record>' because the required field 'primaryFile' is missing.

To batch load content as metadata only:

  1. Open the Content Server instance config.cfg file:

    IntradocDir/config/config.cfg

  2. Add the following configuration variables:
    createPrimaryMetaFile=true
    AllowPrimaryMetaFile=true
    
  3. Save and close the config.cfg file.
  4. In the batch load file, add the following fields for each record:
    primaryFile=
    createPrimaryMetaFile=true
    

    Note that leaving the primaryFile field blank is acceptable. The field is ignored but must be included.

  5. Continue to batch load your content using the Batch Loader procedure or the command line procedure. For more information, see Batch Loading from the Batch Loader Window or Batch Loading from the Command Line.
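
When you generate metadata-only records programmatically, the easiest mistake is to drop the empty primaryFile field. The following Python sketch, using a hypothetical helper named metadata_only_record, appends primaryFile= and createPrimaryMetaFile=true to the metadata fields you pass in and closes the record with the <<EOD>> marker. The field names and values in the usage line are placeholders; use the fields that your batch load file actually requires.

```python
def metadata_only_record(fields):
    """Return one batch load file record that loads metadata only.

    primaryFile is written with an empty value: it is ignored, but an
    insert record that omits it is rejected.
    """
    lines = [f"{name}={value}" for name, value in fields.items()
             if name not in ("primaryFile", "createPrimaryMetaFile")]
    lines += ["primaryFile=", "createPrimaryMetaFile=true", "<<EOD>>"]
    return "\n".join(lines) + "\n"
```

For example, metadata_only_record({"Action": "insert", "dDocName": "DOC001"}) produces a record whose last three lines are primaryFile=, createPrimaryMetaFile=true, and <<EOD>>.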

4.3.6 Batch Loader -console Command Line Switch

Adding the -console switch to the Batch Loader command line causes all output to be echoed to the HTML Content Server log and to the console window that is running the Batch Loader. Alternatively, you can use operating system redirects to send the output to a separate log file.

Note:

The -console switch does not follow standard Windows command line syntax (although this may be corrected in later versions). You must use the -console syntax usually associated with UNIX instead of the /console syntax. With most other command line utilities, both syntaxes will work on both platforms.

Command Line Example

  • Windows command line:

    BatchLoader.exe -console -q -nc:/batching/batchinsert.txt
    
  • UNIX command line:

    BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt

Sample Output

Processed 1 of 4 record.
Processed 2 of 4 records.
Processed 3 of 4 records.
Processed 4 of 4 records.
Done processing batch file 'c:/batching/batchinsert.txt'. Out of 4 records processed, 4 succeeded and 0 errors occurred.
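
If you run the Batch Loader from a script, the final summary line is a convenient place to detect failures. The following Python sketch extracts the file name and record counts from that line. The regular expression is written against the sample output above only; it tolerates singular and plural wording, but other versions may phrase the summary differently.

```python
import re

SUMMARY = re.compile(
    r"Done processing batch file '(?P<file>[^']*)'\. "
    r"Out of (?P<total>\d+) records? processed, "
    r"(?P<ok>\d+) succeeded and (?P<errors>\d+) errors? occurred\."
)


def parse_summary(output):
    """Return (file, total, succeeded, errors), or None if not found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    return (match["file"], int(match["total"]),
            int(match["ok"]), int(match["errors"]))
```

A wrapper script could, for example, treat a missing summary or a nonzero error count as a failed run.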

4.3.7 Adding a Redirect

You can use a redirect on the command line to send the Batch Loader output to a separate log file. By default, the -console switch sends the Batch Loader output to standard error (stderr), so use the 2> redirect symbol, which works on both UNIX and Windows, to write that output to a file.

In the following examples, each command must be entered all on one line.

  • Windows command line with redirect:

    BatchLoader.exe -console -q -nc:/batching/batchinsert.txt 2> batchlog.txt
    
  • UNIX command line with redirect:

    BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt 2> /logs/CSbatchload.log
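
When the Batch Loader is started from a script rather than from a shell, the 2> redirect corresponds to passing an open file as the process's standard error. The following Python sketch shows this with the standard subprocess module. The name run_with_log is hypothetical, and the sketch makes no assumption about the exit codes that BatchLoader returns, so check the summary line in the log as well.

```python
import subprocess


def run_with_log(args, log_path):
    """Run a command with its stderr written to log_path (like 2>).

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log:
        completed = subprocess.run(args, stderr=log)
    return completed.returncode
```

For example, run_with_log(["BatchLoader", "-console", "-q", "-n/u2/apps/batching/batchinsert.txt"], "/logs/CSbatchload.log") mirrors the UNIX command above.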

4.3.8 Correcting Batch Load Errors

To correct errors that occur during batch loading:

  1. Choose Administration, then Log Files, then Content Server Logs.
  2. In the Content Server log file, look through the Type column for the word Error.
  3. Read the description to determine the problem.
  4. Fix the error in one of these files:
    • The batch load file.

    • The error file for the failed content. (This option is available only if you enabled it on the Batch Loader window.) The error file is located in the same directory as the batch load file, with several digits appended to the batch load file name.

      Note:

      If you rerun an entire batch load file, content items that have already been checked in will usually fail. This occurs because the release dates of the existing content items will be the same as the ones you are trying to insert.

Figure 4-4 Sample Content Server Log File

4.4 Optimizing Batch Loader Performance

This section provides some basic guidelines for improving Batch Loader performance when you are checking in a large number of content items. In many cases, proper tuning for batch loading can significantly speed up a slow server.

To minimize batch loading slowdowns, try the following adjustments:

  • Temporarily suspend other activities. For example, shut down Inbound Refinery (see Starting and Stopping Oracle WebCenter Content Server and Inbound Refinery Instances in Managing Oracle WebCenter Content) and suspend the automatic update cycle of the Repository Manager.

  • Update your database statistics during a batch load to help the database query optimizer. Databases have built-in optimizer utilities that can help make database queries more efficient. However, to maximize the efficiency of optimizers, you must update or re-create the statistics about the physical characteristics of a table and its indexes. These characteristics include the number of records, the number of pages, and the average record length. The optimizers use these statistics to decide how to access data.

    Each database has a proprietary command that you can use to invoke the statistic update or recreation process. For example:

    • For Oracle, use the ANALYZE TABLE table_name COMPUTE STATISTICS statement

    • For SQL Server, use the CREATE STATISTICS statement (or UPDATE STATISTICS to refresh existing statistics)

    • For DB2, use the RUNSTATS command
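
The statements differ by database, so a small helper can produce them for the tables you want to refresh. The following Python sketch is illustrative only. It emits ANALYZE TABLE ... COMPUTE STATISTICS for Oracle, UPDATE STATISTICS for SQL Server (which refreshes existing statistics, whereas CREATE STATISTICS creates a new statistics object), and RUNSTATS ON TABLE ... AND INDEXES ALL for DB2. The table names in the usage line (Revisions, Documents, DocMeta) are examples of Content Server tables; list the tables in your own schema. Run the output with your database's own tools; for DB2, RUNSTATS is a command processor command rather than an SQL statement, and schema-qualified names are the safest choice.

```python
def statistics_commands(db, tables, schema=None):
    """Return one statistics command per table for the given database."""
    names = [f"{schema}.{t}" if schema else t for t in tables]
    if db == "oracle":
        return [f"ANALYZE TABLE {t} COMPUTE STATISTICS" for t in names]
    if db == "sqlserver":
        return [f"UPDATE STATISTICS {t}" for t in names]
    if db == "db2":
        return [f"RUNSTATS ON TABLE {t} AND INDEXES ALL" for t in names]
    raise ValueError(f"unsupported database: {db}")
```

For example, statistics_commands("db2", ["Revisions", "Documents", "DocMeta"], schema="UCM") returns three RUNSTATS commands, one per table, where UCM stands in for your own schema name.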

4.5 Best Practice Case Study

This case study describes very slow batch load performance and the steps taken to diagnose and correct it. This information can serve as a model for isolating underlying issues and resolving batch loading performance problems.

4.5.1 Background Information

A user wanted to load 27,000 content items into a Content Server instance that was running on an AIX server. The DB2 database was running on a separate AIX server. The content items included TIF files as the native files and corresponding PDF files as the web-viewable files. Inbound Refinery generated thumbnails from the native files.

Initially during the batch load, the performance was acceptable with sub-second insert times. However, after a few thousand content items were loaded, the performance began to degrade. Content items started to require a few seconds to load and, eventually, the load time was over 10 seconds per content item.

4.5.2 Preliminary Troubleshooting

While the batch load was running, nothing seemed to be wrong with the Content Server instance. It had sufficient memory, the CPU utilization was low (less than 5%), and there were no disk bottlenecks. The Inbound Refinery server was busy, but was processing thumbnails at an acceptable rate.

Two issues were found with the database server:

  • Two processes were taking turns updating the database. While one process was executing, the other process waited for it to release its database locks. When the first process completed, the second process executed while the first process waited. The processes in this execute/wait cycle were:

    • The actual batch load process that was updating the database tables after inserting a content item.

    • The Content Server process that updated the database tables, changing the status of each content item from GENWWW to DONE after receiving notification that its thumbnail had been completed.

    The two processes should not have been contending with each other because they were not updating the same content items. It seemed that the two processes were locking each other out because DB2 had performed lock escalation and was now locking entire database pages instead of single rows.

  • There were a large number of tablespace scans being performed by both processes.

4.5.3 Solution

A two-step solution was used:

  1. Inbound Refinery was shut down to prevent the status update process from competing with the batch loading process. The performance did not improve, because there was still a backlog of more than 2,000 content items with completed thumbnails whose status had to be updated.

  2. A RUNSTATS command was issued on all the Content Server database tables to update the table statistics. This dramatically improved the performance of the batch load. The insert time returned to sub-second and the batch load completed quickly: it had taken 21 hours to insert the first 22,000 content items, but after the table statistics were updated, the remaining 5,000 content items were inserted in 13 minutes.
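
The per-item figures behind this result can be worked out from the numbers above. The first figure is an average over the whole 21 hours, including the early period with sub-second inserts, so the slowest individual inserts (over 10 seconds) were considerably worse than the average suggests.

```latex
\begin{aligned}
\text{before: } & \frac{21\ \text{h} \times 3600\ \text{s/h}}{22{,}000\ \text{items}}
  = \frac{75{,}600\ \text{s}}{22{,}000\ \text{items}} \approx 3.44\ \text{s per item} \\
\text{after: } & \frac{13\ \text{min} \times 60\ \text{s/min}}{5{,}000\ \text{items}}
  = \frac{780\ \text{s}}{5{,}000\ \text{items}} = 0.156\ \text{s per item} \\
\text{speedup: } & \frac{3.44}{0.156} \approx 22
\end{aligned}
```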