63 Archiving to EMC Documentum

This chapter provides instructions on archiving WebCenter Sites assets to EMC Documentum.

63.1 Overview of the Archival Process

Archiving WebCenter Sites assets to EMC Documentum requires the components shown in Figure 63-1, "System Architecture for Archiving". Users who are familiar with the process of publishing Documentum objects to WebCenter Sites will recognize archiving to be a similar process.

Figure 63-1 System Architecture for Archiving

The main differences between publishing and archiving are the following:

  • In the archival process, Content Integration Agent treats the Content Server (CS) DataStore, rather than WebCenter Sites, as its source of data. The CS DataStore contains assets in folders and files that map to Documentum folders of type dm_folder and files of type fw_document. The CS DataStore is created when assets in the WebCenter Sites system are published to a RealTime destination while archiving is enabled.

  • Content Integration Agent archives data to Documentum directly. It does not require the Sites Agent Services component to store data on the target system.

63.1.1 System Architecture and Process Flow

Content Integration Agent synchronizes the CS DataStore and target Documentum folder via the publish command, the synchronization engine, and the mappings.xml file, which provides the metadata map. Following the initial archival session, the synchronization process runs automatically. Details of the implementation are described below.

Initial Synchronization

Issuing the publish command invokes the synchronization engine to archive the CS DataStore to Documentum and thereby initialize the synchronization process. The complete set of steps is outlined below (and shown in Figure 63-1):

  1. Assets in WebCenter Sites are RealTime published with archiving enabled.

  2. The CS DataStore is created.

  3. Issuing the publish command invokes the synchronization engine, which then:

    • refers to the CS DataStore (which is specified in the publish command)

    • reads the files in the CS DataStore, retrieves their metadata, and

    • converts the metadata to a Documentum-compliant format, using mappings.xml

  4. The synchronization engine then stores the CS DataStore files to the target Documentum folder.

Monitoring the CS DataStore

Following the initialization process, the synchronization engine monitors the archived CS DataStore and automatically replicates changes to the Documentum target folder. When an asset based on the archived metadata is created or modified in the monitored CS DataStore (during a RealTime publishing process), the synchronization engine replicates the new or modified asset to the target folder on Documentum. If metadata is modified, mappings.xml must be reconfigured and the CS DataStore must be republished.
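The monitoring behavior described above can be pictured with a small sketch. This is not the synchronization engine's implementation, which this chapter does not document. It only assumes that changes can be detected by comparing file modification times against the time of the last synchronization, and that deletions are ignored. The function name changed_since and the sample file names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def changed_since(datastore_root: Path, last_sync: float) -> list[Path]:
    """Return files under datastore_root modified after last_sync (epoch seconds).

    Assumption: modification time is the change signal. Deleted files are
    never reported, so deletions would not be replicated by this sketch.
    """
    changed = []
    for path in sorted(datastore_root.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_sync:
            changed.append(path)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        asset_dir = root / "458"          # hypothetical asset folder
        asset_dir.mkdir()

        archived = asset_dir / "main.xml"
        archived.write_text("<asset/>")
        # Pretend this file was written (and archived) an hour ago.
        hour_ago = time.time() - 3600
        os.utime(archived, (hour_ago, hour_ago))

        last_sync = time.time() - 60      # last synchronization: a minute ago
        updated = asset_dir / "body.html"
        updated.write_text("<p>updated</p>")

        # Only the file touched after last_sync is reported.
        print([p.name for p in changed_since(root, last_sync)])
```

A polling loop that calls such a function once per synchronization interval would replicate only new and modified files, which matches the replication mode described later in Table 63-2.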

Tuning the Integration

Content Integration Agent contains a configuration file named catalog.xml, which stores information about the archival session and allows tuning of the synchronization interval. For more information about tuning, see Section 63.4, "Tuning the Synchronization Process."

63.1.2 CS DataStore

The CS DataStore contains WebCenter Sites assets in folders and files that map to Documentum folders of type dm_folder and files of type fw_document. A CS DataStore is created when assets are published to a RealTime destination with archiving enabled. The export path is:

<CS DataStore = cs.pgexportfolder>/CIP_DataStore/<CS DataStore for Publishing Destination>

Figure 63-2 CS DataStore Archival Example

Figure 63-3 illustrates the archival of a sample CS DataStore.

Figure 63-3 Sample CS DataStore Archived to Documentum

Figure 63-4, "Sample Asset Folder" shows the contents of a typical asset folder (the same folder, named 458, is also shown in Figure 63-3, "Sample CS DataStore Archived to Documentum"). Note that the administrative files shown in Figure 63-4 are hidden files. They are shown here only for illustration purposes.

Figure 63-4 Sample Asset Folder

63.1.3 Mapping Framework

The CIP mapping framework determines the success of the archival and synchronization processes. A CS DataStore can be archived to Documentum as long as its metadata is mapped. The basic mapping framework involves two configurable components: object types in the Documentum workspace and the mappings.xml file.

63.1.3.1 Object Types in the Documentum Workspace

The default Documentum target types are fw_asset and fw_document, which can be modified or replaced with custom types.

63.1.3.2 mappings.xml

The default mappings.xml file contains a csds2documentum section, which specifies the following mappings:

  • csds_Folder is the data type for all folders in the CS DataStore. Folders of type csds_Folder map to Documentum folders of type dm_folder:

    <assettype-mapping sourceid="csds_Folder" targetid="dm_folder" id="Folder" extends="Item" />
    
  • csds_Document is the data type for all files (except main.xml) in the CS DataStore. Files of type csds_Document map to Documentum documents of type fw_document:

    <assettype-mapping sourceid="csds_Document" targetid="fw_document" id="Document" extends="Item" />
    
  • csds_Asset is the data type of the main.xml file, which defines the asset (see Figure 63-4, "Sample Asset Folder"). Files of type csds_Asset map to Documentum documents of type fw_asset:

    <assettype-mapping sourceid="csds_Asset" targetid="fw_asset" id="Asset" extends="Document">
    
  • Attributes of the csds_Asset document type map to Documentum attributes as shown. All WebCenter Sites attributes are system-defined.

    Table 63-1 Document Type to Documentum Attribute Mapping

    source id      target id         source id      target id
    id             fw_id             publist        fw_publist
    name           fw_name           status         fw_status
    createdby      fw_createdby      subtype        fw_subtype
    createddate    fw_createddate    updatedby      fw_updatedby
    description    fw_description    updateddate    fw_updateddate

  • The datemodified attribute is a special attribute that stores the last publication date. The date is taken from the last modification time of the corresponding asset file in the CS DataStore. The datemodified attribute maps to the Documentum attribute fw_publisheddate.
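As an illustration of the attribute mappings listed above, the following sketch renames asset metadata keys to their Documentum counterparts. The source and target names are taken from Table 63-1 and from the datemodified mapping. The function name, the sample metadata, and the decision to drop unmapped keys are assumptions made for this example, not documented engine behavior.

```python
# Source-to-target attribute names from Table 63-1, plus the special
# datemodified attribute.
ATTRIBUTE_MAP = {
    "id": "fw_id",
    "name": "fw_name",
    "createdby": "fw_createdby",
    "createddate": "fw_createddate",
    "description": "fw_description",
    "publist": "fw_publist",
    "status": "fw_status",
    "subtype": "fw_subtype",
    "updatedby": "fw_updatedby",
    "updateddate": "fw_updateddate",
    "datemodified": "fw_publisheddate",
}


def to_documentum_attributes(asset_metadata: dict) -> dict:
    """Rename mapped attributes to their Documentum names.

    Assumption: keys without a mapping are dropped. How the real engine
    treats unmapped attributes is not described in this chapter.
    """
    return {
        ATTRIBUTE_MAP[key]: value
        for key, value in asset_metadata.items()
        if key in ATTRIBUTE_MAP
    }


if __name__ == "__main__":
    # Hypothetical metadata for a single asset.
    sample = {"id": "458", "name": "HomePage", "status": "ED", "custom": "x"}
    print(to_documentum_attributes(sample))
```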

63.1.3.3 Mappings, Publishing, and Synchronization

When archiving, bear in mind the following mapping specifications:

  • Default mapping: Any CS DataStore can be archived to Documentum without your having to modify the default mappings.xml file. Running the publish command archives the CS DataStore to the target Documentum folder. The source CS DataStore is then monitored by the synchronization engine. When a RealTime publishing session updates the CS DataStore, the new assets and modified assets are automatically replicated to the target Documentum folder by the synchronization engine.

  • Custom mapping: You can map selected asset types and definitions to your own Documentum object type. Instructions are available in Section 63.2.3.

63.2 Steps for Archiving WebCenter Sites Assets to Documentum

Section 63.2.1, "Prepare the Documentum System to Store WebCenter Sites Assets"

Section 63.2.2, "Configure the Path to the CS DataStore"

Section 63.2.3, "Add Metadata to mappings.xml"

Section 63.2.4, "RealTime Publish the Site"

Section 63.2.5, "Archive the CS DataStore on Documentum"

Section 63.2.6, "Archive Visitor-Generated Content"

63.2.1 Prepare the Documentum System to Store WebCenter Sites Assets

Follow these steps to prepare the Documentum system to store WebCenter Sites assets.

  1. Create a cabinet for WebCenter Sites assets (FWCabinet, for example).

  2. Create a folder for the given assets (WebSite1, for example).

  3. If you configured event notification, store the associated workflow processes anywhere on the Documentum system.

63.2.2 Configure the Path to the CS DataStore

In this step, you will enable the WebCenter Sites RealTime publishing system to export your selected site in a CS DataStore along the following path (see also Section 63.1.2, "CS DataStore"):

<CS DataStore = cs.pgexportfolder>/CIP_DataStore/<CS DataStore for Publishing Destination>

where:

  • <cs.pgexportfolder> defines the root directory of the CS DataStore. (The cs.pgexportfolder property is located in the futuretense.ini file on the WebCenter Sites delivery system.)

  • CIP_DataStore is a default subdirectory of <cs.pgexportfolder>. Its subfolders will be archived on Documentum.

  • <CS DataStore for Publishing Destination> is a subfolder that holds the assets of the published site.
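The path components above can be combined programmatically, for example to check where a published site should appear. The sketch below assumes that futuretense.ini uses simple key=value lines and ignores any escaping rules the real file may apply. The function names and the sample values (/u01/cs/export, DCTM) are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def read_property(ini_text: str, key: str) -> Optional[str]:
    """Return the value of key from key=value text, or None if it is absent.

    Assumption: one property per line, '#' starts a comment, no escaping.
    """
    for line in ini_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


def datastore_path(pgexportfolder: str, destination_folder: str) -> PurePosixPath:
    """Build <cs.pgexportfolder>/CIP_DataStore/<destination folder>."""
    return PurePosixPath(pgexportfolder) / "CIP_DataStore" / destination_folder


if __name__ == "__main__":
    ini_text = "# sample futuretense.ini excerpt\ncs.pgexportfolder=/u01/cs/export\n"
    root = read_property(ini_text, "cs.pgexportfolder")
    print(datastore_path(root, "DCTM"))
```

PurePosixPath keeps the example independent of the operating system it runs on. On a Windows delivery system, PureWindowsPath would be the analogous choice.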

To configure the path to the CS DataStore

  1. Configure the root directory of the CS DataStore by setting the cs.pgexportfolder property in the futuretense.ini file of the delivery system.

  2. Configure the RealTime publishing process to support archiving.

    1. Configure publishing as shown in Chapter 20, "Configuring the RealTime Publishing Process." If RealTime publishing is already configured, start with Chapter 20, "Create a RealTime Destination Definition on the Source System."

    2. On the Add New Destination form, set the More Arguments field as follows:

      ARCHIVETO=<CS DataStore for Publishing Destination>
      

      Note:

      If you are publishing a small number of assets (fewer than thousands) and want to store their files in the same folder in the CS DataStore, specify USEHASHDIRS=false to disable hash folders.

    3. Complete the remaining steps up to and including mirroring site configuration data to the destination database.

63.2.3 Add Metadata to mappings.xml

To add custom mappings to mappings.xml, include the following information, depending on whether you are mapping a flex or basic asset type:

For flex asset types, sourceid takes the form <asset type>;<Definition>. For basic asset types, sourceid takes the form <asset type>. The targetid specifies the corresponding Documentum object type.

For example:

  • To map all flex assets of type Content_C with asset definition FSII Article to the fw_content object type, add the following line to mappings.xml:

    <assettype-mapping sourceid="Content_C;FSII Article" targetid="fw_content" id="Content"></assettype-mapping>
    
  • To map all basic assets of type FW_Article to the fw_article object type, add the following line to mappings.xml:

    <assettype-mapping sourceid="FW_Article" targetid="fw_article" id="Article"></assettype-mapping>
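If many custom mappings are needed, the lines can be generated rather than typed by hand. The following sketch builds an assettype-mapping element using the sourceid rules described above (<asset type>;<Definition> for flex asset types, <asset type> alone for basic asset types). The helper name assettype_mapping is hypothetical, and the sketch does not insert the line into mappings.xml, because the surrounding structure of that file is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def assettype_mapping(asset_type: str, target_type: str, mapping_id: str,
                      definition: Optional[str] = None) -> str:
    """Return one <assettype-mapping> line as a string.

    Flex asset types pass a definition, giving "<asset type>;<Definition>".
    Basic asset types omit it, giving "<asset type>".
    """
    source_id = f"{asset_type};{definition}" if definition else asset_type
    element = ET.Element(
        "assettype-mapping",
        {"sourceid": source_id, "targetid": target_type, "id": mapping_id},
    )
    # short_empty_elements=False writes an explicit closing tag, matching
    # the style of the examples in this section.
    return ET.tostring(element, encoding="unicode", short_empty_elements=False)


if __name__ == "__main__":
    print(assettype_mapping("Content_C", "fw_content", "Content", "FSII Article"))
    print(assettype_mapping("FW_Article", "fw_article", "Article"))
```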
    

63.2.4 RealTime Publish the Site

Publish the content management site and verify that the CS DataStore was created in the specified path (in Section 63.2.2, "Configure the Path to the CS DataStore").

63.2.5 Archive the CS DataStore on Documentum

In this step, you will run the CIP publish command to archive the <CS DataStore for Publishing Destination> and initialize the synchronization process.

Note:

If you changed the port specified in the Oracle Fusion Middleware WebCenter Sites Installation Guide, make sure that the new port is set in facilities.xml, and add -p <port> to the command in step 2, below (which starts CIPCommander).

  1. Start Content Integration Agent.

  2. Run the CIPCommander executable (located in the bin folder of the system where Content Integration Agent is installed):

    cipcommander publish <source_providerid> <target_providerid> -source_repid <path to data store> -target_repname <cabinet name> -target_path <path within the cabinet> -mapping <mapping_id> -bulk_resynch_interval <seconds> -handlerset csdatastore -create false -replic_mode updated
    

    For example:

    cipcommander publish 7833d862-4f8b-4285-84f2-731d5af81865 d7a96a63-e78c-407c-8d7f-e84988806e49 -source_repid c:\temp\csdatastore -target_repname Archive -target_path /fatwire -mapping csds2documentum -bulk_resynch_interval 60 -handlerset csdatastore -create false -replic_mode updated
    

    Table 63-2 Publishing Parameters

    Publishing Parameter      Value

    <source_providerid>       Provider ID for the WebCenter Sites system:
                              7833d862-4f8b-4285-84f2-731d5af81865

    <target_providerid>       Provider ID for the Documentum system:
                              d7a96a63-e78c-407c-8d7f-e84988806e49

    -source_repid             <path to data store>: Path to the <CS DataStore = cs.pgexportfolder>
                              folder on the file system.

    -target_repname           <cabinet name>: Name of the Documentum cabinet to which
                              <CS DataStore for Publishing Destination> will be archived.

    -target_path              <path within the cabinet>: Path to the Documentum folder in the
                              cabinet specified by target_repname. To publish to the cabinet
                              itself, skip this parameter.

    -mapping                  <mapping_id>: Mapping identifier from the mappings.xml file.
                              Default value: csds2documentum

    -bulk_resynch_interval    <seconds>: Number of seconds between two successive
                              synchronization events. For optimal performance, set this value
                              to a number that correlates with the frequency of RealTime
                              publishing sessions.
                              Default value: 600

    -handlerset               References the handlerset element from handlers.xml.
                              Allowed value: csdatastore

    -create                   Specifies whether to create the target repository.
                              Allowed value: false

    -replic_mode              Specifies which types of changes will be replicated.
                              Allowed value: updated (only new and updated items will be
                              replicated; deletions will not be replicated).


  3. Verify on Documentum the directory structure of the archived assets. For background information, see Section 63.1.2, "CS DataStore."
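When the publish command in step 2 is issued from a script, assembling it as an argument list avoids the spacing mistakes that are easy to make in a long command line. The sketch below uses only the parameters and values documented in Table 63-2. The function name build_publish_command is hypothetical, and the -p <port> option mentioned in the note is left out because its position in the command is not specified here.

```python
from typing import List, Optional


def build_publish_command(source_providerid: str,
                          target_providerid: str,
                          source_repid: str,
                          target_repname: str,
                          target_path: Optional[str] = None,
                          mapping: str = "csds2documentum",
                          bulk_resynch_interval: int = 600) -> List[str]:
    """Return the cipcommander publish command as a list of arguments."""
    args = [
        "cipcommander", "publish", source_providerid, target_providerid,
        "-source_repid", source_repid,
        "-target_repname", target_repname,
    ]
    # -target_path is skipped when publishing to the cabinet itself.
    if target_path:
        args += ["-target_path", target_path]
    args += [
        "-mapping", mapping,
        "-bulk_resynch_interval", str(bulk_resynch_interval),
        # The remaining parameters each have a single allowed value.
        "-handlerset", "csdatastore",
        "-create", "false",
        "-replic_mode", "updated",
    ]
    return args


if __name__ == "__main__":
    command = build_publish_command(
        "7833d862-4f8b-4285-84f2-731d5af81865",
        "d7a96a63-e78c-407c-8d7f-e84988806e49",
        r"c:\temp\csdatastore",
        "Archive",
        target_path="/fatwire",
        bulk_resynch_interval=60,
    )
    print(" ".join(command))
```

The resulting list could be passed to subprocess.run from the bin folder of the Content Integration Agent installation. That step is omitted so that the sketch runs without the product installed.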

63.2.6 Archive Visitor-Generated Content

If your WebCenter Sites delivery system runs the WebCenter Sites: Community application and you wish to archive its visitor-generated content (comments and reviews), publish the content from the delivery system to a RealTime destination on a separate WebCenter Sites system (Figure 63-5). Procedures for archiving visitor-generated content are identical to those for archiving assets.

Figure 63-5 Archiving Visitor-Generated Content

63.3 Testing Synchronization

After the publish command executes, the archive session ends and the synchronization engine starts monitoring the source CS DataStore. Verify that modifications are replicated to Documentum.

To verify that modifications are replicated to Documentum

  1. Create and modify assets on the published content management site.

  2. Republish the site to export the same <CS DataStore for Publishing Destination> folder.

  3. Look for updates in the target Documentum folder.

63.4 Tuning the Synchronization Process

When the publish command executes, the CS DataStore is archived to Documentum and catalog.xml is updated with data points from the publish command. The data points identify the CS DataStore and Documentum system (in the <workspace> tags) and specify replication settings for the CS DataStore (in the <replication> tag).

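Following the archival session, the synchronization engine starts monitoring the published CS DataStore. The synchronization interval can be reset in catalog.xml by changing the BulkResynchInterval parameter, which corresponds to the -bulk_resynch_interval parameter in Table 63-2, "Publishing Parameters". (The catalog.xml file is located in the conf folder.) The following sample shows the entries written by the publish command: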

<workspace id="776a0536-1af8-4e10-9b55-ecb9cfd715b8">
  <provider-ref refid="7833d862-4f8b-4285-84f2-731d5af81865" />
  <init-params>
    <param name="repid">C:\cs\export\CIP_datastore\DCTM</param>
    <param name="repname"></param>
  </init-params>
</workspace>
<workspace id="ae679aa2-572c-492a-b553-b7a866b60a2f">
  <provider-ref refid="d7a96a63-e78c-407c-8d7f-e84988806e49" />
  <init-params>
    <param name="repname">Archive</param>
    <param name="path">/FirstSiteII</param>
    <param name="repid">0c0000018002bd87</param>
    <param name="itemid">0b0000018002bd91</param>
  </init-params>
</workspace>
<replication>
  <link id="b5be4bc4-a8dc-4928-b3df-e0ac2851f4e4">
    <source-ref refid="776a0536-1af8-4e10-9b55-ecb9cfd715b8" />
    <target-ref refid="ae679aa2-572c-492a-b553-b7a866b60a2f" />
    <mapping-ref refid="csds2documentum" />
    <handlerset-ref refid="csdatastore" />
    <init-params>
      <param name="BulkResynchInterval">3</param>
      <param name="ReplicMode">updated</param>
      <param name="IncrementalSyncDelay">10</param>
    </init-params>
  </link>
</replication>
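The synchronization interval can also be changed with a short script instead of a manual edit. The sketch below works on a <replication> fragment shaped like the sample above, because the root element and overall layout of catalog.xml are not shown in this chapter. The function name set_link_param is hypothetical. Whether Content Integration Agent must be restarted to pick up the change is not stated here, so treat that as something to verify.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the <replication> entry from the sample above.
REPLICATION_FRAGMENT = """\
<replication>
  <link id="b5be4bc4-a8dc-4928-b3df-e0ac2851f4e4">
    <mapping-ref refid="csds2documentum" />
    <handlerset-ref refid="csdatastore" />
    <init-params>
      <param name="BulkResynchInterval">3</param>
      <param name="ReplicMode">updated</param>
      <param name="IncrementalSyncDelay">10</param>
    </init-params>
  </link>
</replication>
"""


def set_link_param(replication: ET.Element, link_id: str,
                   name: str, value: object) -> bool:
    """Set <param name=...> inside the <link> with the given id.

    Returns True if the parameter was found and updated, False otherwise.
    """
    for link in replication.iter("link"):
        if link.get("id") != link_id:
            continue
        for param in link.iter("param"):
            if param.get("name") == name:
                param.text = str(value)
                return True
    return False


replication = ET.fromstring(REPLICATION_FRAGMENT)
updated = set_link_param(
    replication,
    "b5be4bc4-a8dc-4928-b3df-e0ac2851f4e4",
    "BulkResynchInterval",
    600,
)

if __name__ == "__main__":
    print(updated)
    print(ET.tostring(replication, encoding="unicode"))
```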