24 Enabling Publishing-Triggered Site Capture

You can enable publishing-triggered site capture once the Site Capture application is installed.

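This chapter contains the following topics:

  • Section 24.1, "Integrating Site Capture with Oracle WebCenter Sites' Publishing Process"

  • Section 24.2, "Next Step"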

24.1 Integrating Site Capture with Oracle WebCenter Sites' Publishing Process

Site capture can be triggered by the completion of a RealTime publishing session. To enable publishing-triggered site capture, complete the steps in this section after you have installed the Site Capture application (Chapter 23, "Procedures for Installing Site Capture"). The steps integrate the Site Capture application with the WebCenter Sites publishing system so that the publishing system can communicate with Site Capture. For examples of possible system configurations, see Figure 21-2, "Single-Server Installation Enabled for Publishing-Triggered Site Capture" and Figure 21-3, "Clustered Installation Enabled for Publishing-Triggered Site Capture".

To integrate Site Capture with WebCenter Sites' publishing process

  1. On the WebCenter Sites source system:

    1. Deploy the fw-crawler-publish-listener-1.0.jar file to the <cs_deploy>/WEB-INF/lib folder.

    2. Unzip the fw-crawler-publish-listener-1.0-elements.zip file and import FW_PublishingEventRegistry.html using CatalogMover.

      This step creates a RemoteElementInvokingPublishingEventListener record in the FW_PublishingEventRegistry table in the WebCenter Sites database. The record allows publishing events to call the InvokeCrawler element on the WebCenter Sites target system.

    3. Restart the WebCenter Sites source system.

  2. On the WebCenter Sites target system:

    1. Using CatalogMover, import ElementCatalog.html and SiteCatalog.html from the fw-crawler-publish-listener-1.0-elements.zip file that you unzipped on the WebCenter Sites source system (step 1, substep 2).

      This step imports InvokeCrawler.jsp, which is used to start the crawler(s) in the Site Capture application.

      Note:

      The crawler(s) must be defined in the publishing destination definition for Site Capture and in the Site Capture application. For more information, see Section 24.2, "Next Step."

    2. Copy the crawler.properties file (in the /<cs_deploy>/WEB-INF/classes folder) and configure the properties listed below:

      • sc.url: Do one of the following:

        For a single-server installation, specify the URL of the Site Capture application:

        sc.url=http://<sitecapturehost:sitecaptureport>/__admin

        For a clustered installation, specify the URL of the load balancer:

        sc.url=http://<loadbalancerhost:loadbalancerport>/__admin

      • cas.url: Specify the CAS application that is pointed to by the Site Capture application:

        cas.url=http://<cashost:casport>/cas

      • cs.username: Specify the user name of the WebCenter Sites general administrator exactly as it was specified during the Site Capture installation process:

        cs.username=<RestAdmin User>

      • cs.password: Specify the above user's password exactly as it was specified during the Site Capture installation process:

        cs.password=<Password>

    3. Deploy the fw-crawler-publish-listener-1.1.jar file to the <cs_deploy>/WEB-INF/lib folder on the target WebCenter Sites system.

  3. You have completed the integration process. Continue to Section 24.2, "Next Step" for a summary of how to set up a publishing-triggered site capture operation.
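As a quick sanity check on the settings in step 2, the following Python sketch parses the text of a crawler.properties file and reports missing values, unreplaced <...> placeholders, and URLs that do not follow the patterns shown in that step. It is not part of WebCenter Sites or Site Capture. The function names, the sample host names, and the choice to accept https as well as http are assumptions made for illustration, and the parser handles only plain key=value lines.

```python
from urllib.parse import urlparse

# The four properties that step 2 asks you to configure.
REQUIRED_KEYS = ("sc.url", "cas.url", "cs.username", "cs.password")


def parse_properties(text):
    """Parse plain key=value lines, skipping blank lines and # or ! comments.

    Simplified on purpose: no line continuations and no ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_crawler_properties(props):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value:
            problems.append(f"missing or empty: {key}")
        elif "<" in value or ">" in value:
            problems.append(f"placeholder not replaced: {key}")
    # Expected path endings, taken from the examples in step 2.
    for key, suffix in (("sc.url", "/__admin"), ("cas.url", "/cas")):
        value = props.get(key, "")
        if not value or "<" in value or ">" in value:
            continue  # already reported above
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"{key} is not an http(s) URL: {value}")
        elif not parsed.path.rstrip("/").endswith(suffix):
            problems.append(f"{key} should end with {suffix}: {value}")
    return problems


# Hypothetical sample values; substitute the contents of your own file.
sample = """
sc.url=http://sitecapture.example.com:8080/__admin
cas.url=http://cas.example.com:8080/cas
cs.username=RestAdminUser
cs.password=
"""
print(check_crawler_properties(parse_properties(sample)))
# prints ['missing or empty: cs.password']
```

The check covers only the shape of the values. It cannot confirm that the user name and password match what was entered during the Site Capture installation.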

24.2 Next Step

At this point, you have completed integrating Site Capture with the WebCenter Sites RealTime publishing process. However, for publishing-triggered site capture to work, the following conditions must also be satisfied:

  • A RealTime publishing destination definition must be configured on the source system to name the crawler(s) that will be invoked to capture the newly published site. The definition must also specify the crawlers' capture mode.

  • The crawler(s) named in the publishing destination definition must exist in the Site Capture application. In addition, the CrawlerConfigurator.groovy file for each crawler must specify at least a valid starting URI and link extraction logic for the crawler.

    Note:

    After the above configuration steps are completed, you publish the site from the source WebCenter Sites system to the target WebCenter Sites system. When publishing ends, site capture begins and proceeds as follows:

    1. The source WebCenter Sites system calls the InvokeCrawler element on the target system.

    2. The target WebCenter Sites system communicates with the Site Capture application and invokes the crawler(s).

    3. The WebCenter Sites target system communicates crawler invocation status to the WebCenter Sites source system. Both the source and target systems record the status information in their own log files (futuretense.txt, by default).

    At the same time, the invoked crawlers capture site resources in either static mode or archive mode, depending on your settings in the publishing destination definition.
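The two conditions above amount to a simple consistency rule: every crawler named in a publishing destination definition must also be defined in Site Capture. The Python sketch below illustrates that rule on plain lists of names. The function name and crawler names are hypothetical, the lists would have to be gathered by hand from the two interfaces, and the exact, case-sensitive comparison is an assumption.

```python
def missing_crawlers(destination_crawlers, site_capture_crawlers):
    """Return the names listed in the destination definition that are not
    defined in Site Capture, in the order they were listed."""
    defined = set(site_capture_crawlers)
    return [name for name in destination_crawlers if name not in defined]


# Hypothetical crawler names, for illustration only.
named_in_destination = ["MainSiteCrawler", "MobileSiteCrawler"]
defined_in_site_capture = ["MainSiteCrawler", "Sample"]

print(missing_crawlers(named_in_destination, defined_in_site_capture))
# prints ['MobileSiteCrawler']
```

An empty result means every named crawler has a counterpart in Site Capture. It does not verify that each crawler's CrawlerConfigurator.groovy file supplies a valid starting URI or link extraction logic.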

When setting up publishing-triggered site capture, you can configure as many publishing destination definitions and invoke as many crawlers as necessary. When you are ready to proceed with the configuration steps above, refer to the Oracle Fusion Middleware WebCenter Sites Administrator's Guide for instructions. In the same guide, you will find information about navigating the Site Capture interface, setting up a site capture operation, and coding a crawler's configuration file to control the site capture process.