Oracle® Fusion Middleware Content Management Guide for Oracle WebLogic Portal
10g Release 3 (10.3.6)

Part Number E14230-03

18 Importing Third-Party Content

The BulkLoader is a command-line application that loads content and metadata from a file system into a WLP Virtual Content Repository.

Specifically, the BulkLoader scans a directory structure containing content and loads it into a specified content repository. In addition to loading content, BulkLoader reads prepared metadata files and associates the metadata with each loaded content item. Metadata files can be prepared for specific content items, or more broadly for directories and subdirectories of items.

Note:

The BulkLoader only supports uploading of simple metadata and binary files. It does not support all WLP repository features, such as nested types and link properties.

If you use the BulkLoader to load content into a database repository, both the metadata and binary files are transferred to the repository. If you load into a file system repository, then only the metadata is transferred to the database—the actual content files remain in place on the file system.

Note:

You cannot use BulkLoader to update existing content (or its metadata) within a file system repository; it can only add new content to the repository. To modify existing content, use the WebLogic Portal Administration Console, which can also add new content.

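This chapter explains how to use BulkLoader and includes these topics:

  • Section 18.1, "Preparing to Use BulkLoader"

  • Section 18.2, "Configuring and Running BulkLoader"

  • Section 18.3, "Editing the BulkLoader Script"

  • Section 18.4, "BulkLoader Parameter Reference"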

18.1 Preparing to Use BulkLoader

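Before running BulkLoader, you need to:

  • Create a repository

  • Create appropriate types

  • Prepare a content directory

  • Prepare metadata files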

18.1.1 Creating a Repository

BulkLoader loads content and metadata into a pre-established content repository. For information on creating a repository, see Chapter 3, "Configuring WLP Repositories."

18.1.2 Creating Appropriate Types

Each piece of content stored in a repository is associated with a type. A type is a definition that includes specific metadata fields that can be used to identify and describe content items associated with that type. The WLP Repository contains several predefined default types. For example, the predefined image type contains three metadata fields:

  • Description

  • File

  • Image Name

You can create your own types or use the provided types. For information on creating types, see Chapter 6, "Using Content Types in Your WLP Repository."

Note:

A type associated with a content item must be created with binary as its primary property.

18.1.3 Preparing a Content Directory

BulkLoader loads all the content from a specified directory (and, by default, subdirectories) into the content repository. Directories are automatically re-created as hierarchy nodes (folders) in the content repository. The directory structure you load into the repository should only contain the content you wish to add to the repository. BulkLoader loads all files within this directory structure.

Tip:

You can configure BulkLoader, using command-line flags, to ignore or include particular files or folders based on filename pattern matching.

18.1.4 Preparing Metadata Files

Each piece of content in the repository is mapped to a specific type. A type includes default and user-defined properties. These properties, also known as metadata, allow content items in the repository to be identified and searched.

BulkLoader allows you to automatically associate individual files and/or directories of files with specific types. This section describes these associations and how to add metadata when library services are enabled for your repository, plus how to properly name and store metadata files.

18.1.4.1 Defining Metadata for a Directory of Files

If you know that an entire directory (and, by default, its subdirectories) contains files of the same type, you can specify that type to be associated with all of those files when BulkLoader stores them in the repository. To do this, place a file called dir.md.properties in the root directory containing the related content. This file must contain a single line:

nodeType=type

where type is the name of the type to associate with the content. For example:

nodeType=image

By default, all content in the directory and its subdirectories will be associated with the type. If a subdirectory contains another dir.md.properties file, then the type defined in that file overrides the original one for that directory and any of its subdirectories. Furthermore, if a filename.md.properties file is encountered, it also overrides the dir.md.properties file for that specific file. The filename.md.properties file is described next.

18.1.4.2 Defining Metadata for Specific Files

You can also define metadata for specific files loaded by BulkLoader. To do this, create a file called filename.md.properties for each piece of content, where filename is the name of the file with which the metadata is associated. This file must contain all of the name-value pairs associated with a type. For example, the following entries are associated with the Ad type:

nodeType=Ad
height=65
width=115
adTargetUrl=
adTargetContent=
adWinClose=
adWinTarget=
adWinTitle=
adClickTarget=
adUseXhtml=
adAltText=Oracle Logo
adMapName=
adMap=
adBorder=
audience=internal

You can then add values for some or all of these properties and save the file. Place the saved file in the same directory as the content item with which it is associated. When BulkLoader runs, the metadata is stored and permanently associated with the specified content item.
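When many content items need metadata, the files could also be generated rather than written by hand. The following is a minimal sketch, not part of BulkLoader: the class MetadataFileWriter and its writeMetadata method are hypothetical names, and only the filename.md.properties naming convention and the nodeType key come from this chapter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not a WLP class): writes <filename>.md.properties
// next to a content file, following the naming convention in this chapter.
final class MetadataFileWriter {

    static Path writeMetadata(Path contentFile, String nodeType,
                              Map<String, String> values) throws IOException {
        Properties props = new Properties();
        props.setProperty("nodeType", nodeType);   // type to associate with the item
        values.forEach(props::setProperty);        // remaining name-value pairs

        // logo.gif -> logo.gif.md.properties, in the same directory
        Path mdFile = contentFile.resolveSibling(
                contentFile.getFileName() + ".md.properties");

        // Properties.store also writes a timestamp comment line at the top
        try (OutputStream out = Files.newOutputStream(mdFile)) {
            props.store(out, null);
        }
        return mdFile;
    }
}
```

For a content file named logo.gif, a call such as MetadataFileWriter.writeMetadata(contentFile, "Ad", values) would produce logo.gif.md.properties beside it. The order of the entries in the generated file is not guaranteed, which is assumed here not to matter for a properties file.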

18.1.4.3 Metadata Guidelines

  • A content type may have only one binary property, and it must be marked as the primary property.

  • If a type has required fields, those fields must be given values in the filename.md.properties file.

  • If you are bulkloading for a file system repository, only one binary property is allowed, and it must be the primary property.

  • When uploading DATE/TIME properties, use the java.text.DateFormat.SHORT format, which in a US locale is MM/DD/YY HH:MM AM/PM. The order of the day and month in the date depends on the locale of the JVM.
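Because the SHORT pattern depends on the JVM locale, it can help to print a sample value before preparing DATE/TIME metadata. The following sketch uses only the standard java.text.DateFormat API; the class name ShortDateExample and its methods are illustrative. No fixed output is shown because the exact pattern varies with the locale and the JDK version.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

// Illustrative helper: shows what DateFormat.SHORT looks like for a locale.
final class ShortDateExample {

    // Formats a date and time using the SHORT style for the given locale.
    static String formatShort(Date date, Locale locale) {
        return DateFormat
                .getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, locale)
                .format(date);
    }

    // Returns true if the value can be parsed as a SHORT date/time in that locale.
    static boolean isParsable(String value, Locale locale) {
        try {
            DateFormat
                .getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, locale)
                .parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date now = new Date();
        // Month/day order differs between these two locales
        System.out.println("US: " + formatShort(now, Locale.US));
        System.out.println("UK: " + formatShort(now, Locale.UK));
    }
}
```

Running the main method on the machine that hosts BulkLoader shows the form that its JVM is likely to expect, assuming BulkLoader runs with the same default locale.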

18.1.4.4 Creating Metadata for a Library Services Enabled Repository

If you are storing content in a library services enabled repository, you must include the lifecyclestatus key in the filename.md.properties file for each content item. The lifecyclestatus key takes the following integer values that indicate the status of the content item, as shown in Table 18-1:

Table 18-1 Workflow Status

Workflow Status    Integer Used
Draft              1
Ready              2
Published          4
Retired            5


For example, the following md.properties entries are associated with the Ad type and include the lifecyclestatus entry, with the status value set to 2 (Ready).

nodeType=Ad
height=65
width=115
adTargetUrl=
adTargetContent=
adWinClose=
adWinTarget=
adWinTitle=
adClickTarget=
lifecyclestatus=2
adUseXhtml=
adAltText=Oracle Logo
adMapName=
adMap=
adBorder=
audience=internal

You can then add values for some or all of the other properties and save the file. Place the saved file in the same directory as the content item with which it is associated.

Note:

If you are bulkloading content into a library services-enabled repository, you can only add content. You cannot update existing content with the BulkLoader when using library services.

18.1.4.5 Naming and Storing Metadata Files

When BulkLoader encounters a directory to process, it attempts to load metadata property files. First, BulkLoader looks for a file called dir.md.properties in the directory. If there are no overriding metadata files, these properties are applied to all content items in the directory and, unless overridden, its subdirectories. Metadata files can be associated with specific content files, and these metadata files override the directory-level file. Metadata files associated with specific content files must be named according to the following convention:

filename.md.properties

where filename is the name of the associated content item file. For example:

logo.gif.md.properties

In this case, the metadata file is associated with an image file called logo.gif.
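The naming and override rules can be pictured as a lookup order. The following sketch is an illustration only and is not BulkLoader's actual implementation: the class MetadataLookup and the method candidates are hypothetical, default settings are assumed (md.properties extension, property inheritance on), and it is an assumption that the base directory itself may hold a dir.md.properties file. It lists the metadata files that could apply to one content item, with the most specific first.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of metadata precedence, most specific first.
final class MetadataLookup {

    static List<Path> candidates(Path baseDir, Path contentFile) {
        List<Path> result = new ArrayList<>();

        // 1. The file-specific metadata file overrides everything else.
        result.add(contentFile.resolveSibling(
                contentFile.getFileName() + ".md.properties"));

        // 2. dir.md.properties in the item's directory, then in each
        //    parent directory up to (and including) the base directory.
        Path base = baseDir.normalize();
        for (Path dir = contentFile.normalize().getParent();
             dir != null && dir.startsWith(base);
             dir = dir.getParent()) {
            result.add(dir.resolve("dir.md.properties"));
        }
        return result;
    }

    public static void main(String[] args) {
        Path base = Paths.get("media");
        Path item = Paths.get("media", "Images", "logos", "logo.gif");
        candidates(base, item).forEach(System.out::println);
    }
}
```

For the item media/Images/logos/logo.gif, the list starts with logo.gif.md.properties in the logos directory, followed by the dir.md.properties files in logos, Images, and media, in that order.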

Note:

You can change the default extension from md.properties to anything you like, using BulkLoader's -mdext parameter.

Tip:

By default, BulkLoader recurses into subdirectories, and the properties in a dir.md.properties file are inherited by content in subdirectories. To override this behavior, specify the +recurse flag, which turns off recursion, and the +inheritProps flag, which turns off metadata property inheritance in subdirectories.

18.1.4.6 Metadata Summary

In summary, BulkLoader gathers content metadata from the following sources, in the order shown:

18.2 Configuring and Running BulkLoader

Typically, you run BulkLoader from a script. You need to edit this script to run in your environment, and to customize parameters that are passed to the BulkLoader program itself.

Note:

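If BulkLoader fails with an out-of-memory error, increase your Java heap size. You can do this in the BulkLoader script by passing -Xms<xxx>m as a JVM option on the java command line, before the BulkLoader class name, where <xxx> is the number of megabytes. For example, -Xms1000m. Note that -Xms sets the initial heap size; the standard JVM option for the maximum heap size is -Xmx<xxx>m.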

The following script is provided with WebLogic Portal:

Windows: <WLPORTAL_HOME>\content-mgmt\bin\load_cm_data.cmd

Unix: <WLPORTAL_HOME>/content-mgmt/bin/load_cm_data.sh

Note:

WebLogic Server must be running when you use BulkLoader.

18.3 Editing the BulkLoader Script

You need to modify the default script to match your needs. The following script is provided with WebLogic Portal:

Windows: <WLPORTAL_HOME>\content-mgmt\bin\load_cm_data.cmd

Unix: <WLPORTAL_HOME>/content-mgmt/bin/load_cm_data.sh

  1. Open the script file for editing.

  2. Set the WL_HOME variable to point to your WebLogic Server installation. For example:

    WL_HOME=C:\bea\wlserver_10.3
    
  3. Set the CM_DATA variable to point to the parent directory of the directory containing the content you wish to load into the content repository. For example, if the content you want to store is located in D:\myContent\images, set CM_DATA to:

    CM_DATA=D:\myContent
    
  4. Configure the BulkLoader command parameters in the command script. For information about BulkLoader command parameters, see Section 18.4, "BulkLoader Parameter Reference."

Tip:

You can run the BulkLoader script from the command line or by double-clicking the file icon.

18.3.1 BulkLoader Command Examples

Note:

The BulkLoader command does not support wildcards or regular expressions in its parameter list.

The following command recursively loads all files in the directories Images, Audio, and Doc in D:\media. Note that Images, Audio, and Doc must each contain a dir.md.properties file, or there must be a filename.md.properties file defined for each content item in those directories. For a description of BulkLoader parameters, see Section 18.4, "BulkLoader Parameter Reference."

Example 18-1 Example of a BulkLoader Command Script

%JAVA_HOME%\bin\java -classpath %CLASSPATH% ^
  com.bea.content.loader.bulk.BulkLoader -verbose -repository "MyRepository" ^
  -application portalApp -d D:\media Images Audio Doc

The command shown in Example 18-2 loads all files in D:\media\images. The command does not recurse into subdirectories. Metadata files with a *.info.properties naming convention are recognized.

Example 18-2 Example of a BulkLoader Command Script

%JAVA_HOME%\bin\java -classpath %CLASSPATH% ^
  com.bea.content.loader.bulk.BulkLoader -verbose -repository "MyRepository" ^
  -application portalApp -mdext info.properties +recurse -d D:\media images

18.4 BulkLoader Parameter Reference

Table 18-2 describes the required parameters for executing the BulkLoader script.

Table 18-2 Required BulkLoader Parameters

Required Parameters Description
-repository <repository name>

The name of the repository to run the loader against.

-application <app name>

The name of the application to run the loader against.

-url <wls url>

The WebLogic Server instance host where the content manager is running. This parameter defaults to t3s://localhost:7002. An SSL (t3s) URL is required; a non-SSL (t3) URL would be invalid.

Note: By default, the SSL Listen Port is not enabled. You can enable it by configuring server settings in the WebLogic Server console.

-d <dir>

Specifies the base directory that contains the directories and files you wish to load into the content repository. If you do not specify this option, the current directory (.) is used. This directory must match the cm_fileSystem_path property that was defined when the repository was created. The path can be relative or absolute.


Table 18-3 describes the optional parameters that you can include while executing the BulkLoader script.

Table 18-3 Optional BulkLoader Parameters

Optional Parameters Description
-verbose

Prints messages while BulkLoader is running.

+verbose

Enables BulkLoader to run quietly, without printing messages. (Default)

-recurse

Recurse into directories. (Default)

+recurse

Do not recurse into directories.

-metaparse

Parse HTML files for META tags. (Default)

+metaparse

Do not parse HTML files for META tags.

-hidden

Ignore hidden files and directories. (Default)

+hidden

Include hidden files and directories.

-inheritProps

Inherit metadata properties when recursing. (Default)

+inheritProps

Do not inherit metadata properties when recursing.

-ignoreErrors

Ignore errors while loading content; errors are still reported.

+ignoreErrors

If an error is encountered, stop processing. (Default)

-htmlPat <pattern>

Specifies file extensions that represent HTML files. If the -metaparse flag is set, the values of <meta> and <title> tags are read from these files and stored as content metadata. You can specify this flag multiple times to define multiple file extensions. By default, *.htm and *.html are used.

-encoding <encoding>

Specifies the file encoding to use. See your JDK documentation for the valid encoding names. (Default: the system's default file encoding)

-match <pattern>

Specifies a file/directory name pattern for BulkLoader to load. All files matching this pattern are loaded. You can specify this flag multiple times to define more patterns. If this flag is omitted, all files and directories are loaded.

-ignore <pattern>

Specifies a file/directory name pattern for BulkLoader to ignore. All files matching this pattern are ignored. You can specify this flag multiple times to define more patterns.

-mdext <ext>

Specifies the file name extension for metadata property files. The value should start with a ".". This defaults to .md.properties.

-filter <filter class>

Although not commonly used, you can write a custom filter to set metadata values based on specific characteristics of a type of content. For instance, a filter might compute the width and height of an image file and set the values in metadata.

This flag sets the class name of a LoaderFilter to run files through. To add to the list of LoaderFilters, you can specify this flag multiple times. The LoaderFilter may assign additional metadata to the file. When BulkLoader starts up, it looks for a content\com\bea\content\loader\bulk file in the class path. From there, it looks for a loader.defFilters property, which is the colon-separated list of LoaderFilter class names that BulkLoader should always load. Unless this file is modified, BulkLoader loads an ImageLoaderFilter, which pulls the width and height from *.gif, *.jpg, *.png, and *.xbm image files.

+filters

Clears the current list of LoaderFilters, including the default filters.

-batch <batch properties file>

This parameter has been deprecated; use -pwdFile.

-user <principal username>

This argument has been deprecated; use -pwdFile.

-password <principal password>

This argument has been deprecated; use -pwdFile.

-pwdFile <encrypted username/password properties file>

Specifies the file that stores the encrypted user name and password. This file resides in the same directory from which the bulkloader script is run.

For information about generating passwords, see Section 18.4.1, "Generating Encrypted Strings for the User Name and Password Properties File." For more information about the -pwdFile parameter, see the BulkLoader class Javadoc.

-fileSystem

Indicates that the repository is a file system repository.

For more information, see Section 3.3, "Working with a WLP File System Repository."

-deletePath <path to delete>

Specifies the full path, without the repository name, of the hierarchy to delete, starting with "/". The repository must be specified in the -repository parameter.

--

Everything after this parameter is considered to be a file or directory to be uploaded. No other parameters should follow, only file names.

file1...filen 

Specifies the name(s) of the files and/or folders to be loaded into the content management system. These files and folders are assumed to be located relative to the base directory specified by the -d parameter.

The list of file names, if specified, should be provided after the -- parameter.

-Xms<xxx>m

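Increases the Java heap size, where <xxx> is the number of megabytes. For example, -Xms1000m. This is a JVM option rather than a BulkLoader option, so it belongs on the java command line before the BulkLoader class name. Try using this flag if BulkLoader fails with an out-of-memory error. Note that -Xms sets the initial heap size; the standard JVM option for the maximum heap size is -Xmx<xxx>m.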

-h 

Display command line usage.
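The toggle convention in Table 18-3 is easy to get backwards: the -name form selects the behavior listed under -name, and the +name form selects the opposite. As a minimal sketch for callers that assemble the argument list programmatically, the following hypothetical class (BulkLoaderArgs is not part of WLP) builds a list using only parameters documented in this section and places the file names after the -- separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles BulkLoader arguments from documented flags.
final class BulkLoaderArgs {

    // "-name" switches a toggle on, "+name" switches it off.
    static String toggle(String name, boolean enabled) {
        return (enabled ? "-" : "+") + name;
    }

    static List<String> build(String repository, String application,
                              String baseDir, boolean verbose,
                              boolean recurse, List<String> paths) {
        List<String> args = new ArrayList<>();
        args.add(toggle("verbose", verbose));
        args.add(toggle("recurse", recurse));
        args.add("-repository");
        args.add(repository);
        args.add("-application");
        args.add(application);
        args.add("-d");
        args.add(baseDir);
        args.add("--");          // everything after this is a file or directory
        args.addAll(paths);
        return args;
    }

    public static void main(String[] args) {
        // Verbose output, no recursion, load only the images directory
        System.out.println(String.join(" ", build(
                "MyRepository", "portalApp", "D:\\media",
                true, false, Arrays.asList("images"))));
    }
}
```

The resulting list would still need to be appended to a java command that names the com.bea.content.loader.bulk.BulkLoader class and sets the class path, as in the command examples in Section 18.3.1.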


18.4.1 Generating Encrypted Strings for the User Name and Password Properties File

For security purposes, passwords sent over a network must be encrypted. By using the -pwdFile parameter, you can pass credentials through a properties file containing the user name and password. This file contains two entries, starting with username= and password=. The string values in these entries may be plain text or encrypted. If the values are not encrypted, then the first time the bulkloader script is run, they will be overwritten with encrypted values. Alternatively, you can encrypt plain text values by running the weblogic.security.Encrypt script and pasting the encrypted string values into the properties file manually.

To generate encrypted strings:

  1. Open a command window.

  2. Run FM_HOME/wlserver_10.3/common/bin/commEnv.cmd. This command sets up the required environment variables.

  3. Go to your DOMAIN_HOME directory.

  4. Run the command: java weblogic.security.Encrypt.

If the specified properties file or the default pwd.properties file does not exist and no user name and password are found, then the bulkloader script prompts for a user name and password. The credentials you specify are encrypted and stored in the default pwd.properties file in the same directory from which the bulkloader script is run. If the properties file exists, but the user name and password values are unencrypted, then the credentials are encrypted and the properties file is overwritten with the encrypted values.

Note:

Support for entering a password on the command line is deprecated and is subject to being removed from a future release.

18.4.2 Example BulkLoader Script

Example 18-3 configures the appropriate paths and runs the BulkLoader program. You can modify this script to suit your specific environment and requirements.

Example 18-3 BulkLoader Script (Windows)

@ECHO OFF
REM ############################################################################
REM # (c) Oracle Corporation All rights reserved
REM #
REM # NOTE: WL_HOME and PORTAL_HOME must be set before running this script
REM ############################################################################

SETLOCAL

IF NOT "%WL_HOME%"=="" (
CALL %WL_HOME%\common\bin\commEnv.cmd
) ELSE (
CALL ..\..\..\wlserver_10.3\common\bin\commEnv.cmd
)

SET PORTAL_HOME=%WL_HOME%\..\wlportal_10.3

IF "%WL_HOME%"=="" (
echo ****** NOTE: The environment variables WL_HOME and PORTAL_HOME must be set before running this script
)
echo.
echo ***** WL_HOME is currently set as: %WL_HOME%
echo ***** PORTAL_HOME is currently set as: %PORTAL_HOME%
echo.

SET CONTENT_HOME=%PORTAL_HOME%\content-mgmt\lib
SET P13N_HOME=%PORTAL_HOME%\p13n\lib\system
SET P13N_JARS=%P13N_HOME%\p13n_system.jar
SET CONTENT_JARS=%CONTENT_HOME%\content-client.jar;%CONTENT_HOME%\content.jar;%CONTENT_HOME%\system\content_system.jar

CALL %WL_HOME%\common\bin\commEnv.cmd

@rem ***************************************************************************
@rem Set any additional CLASSPATH information below
@rem ***************************************************************************

set CLASSPATH=%WEBLOGIC_CLASSPATH%;%P13N_JARS%;%CONTENT_JARS%
echo Running script with CLASSPATH: %CLASSPATH%
echo.
REM Set some defaults
if "%CM_DATA%"=="" set CM_DATA=..\db\data\sample\cm_data
if "%DOMAIN_DIR%"=="" set DOMAIN_DIR=..\..\samples\domains\portal
@rem These are the deployed app settings
%JAVA_HOME%\bin\java -classpath %CLASSPATH% ^
  -Dweblogic.RootDirectory=%DOMAIN_DIR% com.bea.content.loader.bulk.BulkLoader ^
  -verbose -repository "Shared Content Repository" -application "portalApp" -d %CM_DATA% Ads%*
ENDLOCAL