Running the sample Change Tracking data source

The sample Change Tracking data source illustrates an implementation of the IncrementalDataSourceRuntime interface. This interface supports checking whether a full acquisition from the Change Tracking data source is required. If a full acquisition is not required, the data source's implementation of runIncrementalAcquisition() acquires only the changed records.
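
The choice between a full and an incremental acquisition can be sketched as follows. This is a conceptual illustration in Python, not the IAS Java API. The names `Change`, `ChangeTrackingSource`, `full_acquisition_required`, `run_full_acquisition`, `run_incremental_acquisition`, and `acquire` are hypothetical and invented for this example. The rule that a full acquisition is needed only when no earlier run exists is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Change:
    """One change-history entry (hypothetical shape)."""
    key: str
    change_type: str  # "CREATE", "UPDATE", or "DELETE"
    time: datetime


class ChangeTrackingSource:
    """Hypothetical stand-in for a data source that records its own changes."""

    def __init__(self, rows: Dict[str, str], history: List[Change]):
        self.rows = rows
        self.history = history

    def full_acquisition_required(self, last_run: Optional[datetime]) -> bool:
        # Assumption: without a previous acquisition there is no baseline to update.
        return last_run is None

    def run_full_acquisition(self) -> List[Tuple[str, str]]:
        # Emit every current row.
        return [("UPSERT", key) for key in sorted(self.rows)]

    def run_incremental_acquisition(self, last_run: datetime) -> List[Tuple[str, str]]:
        # Emit only the changes recorded after the previous acquisition.
        actions = []
        for change in self.history:
            if change.time > last_run:
                action = "DELETE" if change.change_type == "DELETE" else "UPSERT"
                actions.append((action, change.key))
        return actions


def acquire(source: ChangeTrackingSource, last_run: Optional[datetime]) -> List[Tuple[str, str]]:
    """Choose between a full and an incremental acquisition."""
    if source.full_acquisition_required(last_run):
        return source.run_full_acquisition()
    return source.run_incremental_acquisition(last_run)


if __name__ == "__main__":
    first_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
    source = ChangeTrackingSource(
        rows={"1": "baseline data...", "5": "some incremental data..."},
        history=[Change("5", "UPDATE", datetime(2024, 1, 2, tzinfo=timezone.utc))],
    )
    print(acquire(source, None))       # full acquisition of rows 1 and 5
    print(acquire(source, first_run))  # incremental: only the update to row 5
```

In this sketch, a deleted key is reported as a DELETE action and every other change as an UPSERT. That mapping is a simplification chosen for the example and may differ from what the sample extension does.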

After you install the extensions into the Integrator Acquisition System, you can configure and then run the sample Change Tracking data source.

To run the sample Change Tracking data source:

  1. In your IAS installation, locate the default crawl configuration files provided in <install path>\IAS\<version>\sample\crawlConfigFiles.
  2. Make a copy of fileSystemCrawl.xml, rename the file with a unique name for your environment, and save it to a local directory.

    For example, copy fileSystemCrawl.xml and save it as sampleChgTracking.xml within <install path>\IAS\<version>\sample\crawlConfigFiles.

  3. Open the new crawl configuration file in a text editor.
  4. Configure the following settings:
    Option      Description
    crawlId     Specify a unique name to distinguish the crawl from others in IAS. For example, sampleChgTracking.
    moduleId    Specify the module ID for a Change Tracking data source. This value must be set to com.endeca.ias.extension.sample.datasource.incremental.ChangeTrackingDataSource.
  5. Create the following setting as a new moduleProperty element in the sourceConfig XML:
    Option      Description
    path        Specify the path to the documents: <install path>\IAS\<version>\sample\ias-extensions\data\change-tracking-db.xml.
  6. Delete the other moduleProperty elements within sourceConfig.
    After these steps, the sourceConfig XML should be as follows:
    ...
    		<sourceConfig>
    			<moduleId>
    				<id>com.endeca.ias.extension.sample.datasource.incremental.ChangeTrackingDataSource</id>
    			</moduleId>
    			<moduleProperties>
    				<moduleProperty>
    					<key>path</key>
    					<value>C:\Oracle\Endeca\IAS\3.1.0\sample\ias-extensions\data\change-tracking-db.xml</value>
    				</moduleProperty>
    			</moduleProperties>
    			<excludeFilters />
    			<includeFilters />
    		</sourceConfig>
    ...
  7. In the textExtractionConfig XML, change the value of enabled to false.
  8. Configure the following settings within the outputConfig XML:
    Option         Description
    moduleId       Specify the output type for the crawl. Set the id to Record Store.
    host           Specify the fully qualified name of the host running the Record Store instance. The default value is localhost.
    port           Specify the port of the Endeca IAS Service running the Record Store instance. The default value is 8401.
    contextPath    If you installed IAS into WebLogic Server and modified the default WebLogic context path, specify the revised context path without including a forward slash. In WebLogic Server installations, the default value of contextPath is ias-server. If you installed IAS into Jetty, you can remove contextPath or specify an empty value.

    For example:

    <moduleProperty>
      <key>contextPath</key>
      <value>ias-server</value>
    </moduleProperty>

  9. Delete the other moduleProperty elements within outputConfig.
    For example, after these steps, the outputConfig XML should be similar to the following:
    ...
    		<outputConfig>
    			<moduleId>
    				<id>Record Store</id>
    			</moduleId>
    			<moduleProperties>
    				<moduleProperty>
    					<key>host</key>
    					<value>mymachine.endeca.com</value>
    				</moduleProperty>
    				<moduleProperty>
    					<key>port</key>
    					<value>8401</value>
    				</moduleProperty>
    			</moduleProperties>
    		</outputConfig>
    ...
  10. Save and close the crawl configuration file.
  11. Run the createCrawls task of ias-cmd to upload the crawl configuration file to IAS.
    For example:
    C:\Oracle\Endeca\IAS\3.1.0\bin>ias-cmd.bat createCrawls -f C:\Oracle\Endeca\IAS\3.1.0\sample\crawlConfigFiles\sampleChgTracking.xml
    Created crawl sampleChgTracking
  12. Run the startCrawl task of ias-cmd to acquire data from the sample documents.
    For example:
    C:\Oracle\Endeca\IAS\3.1.0\bin>ias-cmd.bat startCrawl -id sampleChgTracking
  13. After the crawl has completed, you can confirm that the new records exist in the Record Store instance by running the read-baseline task of recordstore-cmd. In this sample, IAS Server created three Endeca records.
    For example:
    C:\Oracle\Endeca\IAS\3.1.0\bin>recordstore-cmd.bat read-baseline -a sampleChgTracking
    [Endeca.Id=1, Endeca.Action=UPSERT, Endeca.SourceId=sampleChgTracking, DATA=base
     line data...]
    [Endeca.Id=3, Endeca.Action=UPSERT, Endeca.SourceId=sampleChgTracking, DATA=some
     incremental data...]
    [Endeca.Id=5, Endeca.Action=UPSERT, Endeca.SourceId=sampleChgTracking, DATA=some
     incremental data...]
  14. Navigate to <install path>\IAS\<version>\sample\ias-extensions\data and open change-tracking-db.xml in a text editor.
  15. Update one record in the change-tracking-db.xml file by doing the following:
    1. Add a new <changeHistory> entry to the file as shown in the example below.
    2. Ensure that the <key> value corresponds to an existing <row> entry in the <database>.
    3. Modify the <time> value to indicate a time after the acquisition in step 12 and before the current time. (The time uses the W3C date and time format with a UTC offset, for example -05:00. See http://www.w3.org/TR/NOTE-datetime for guidance about the syntax.)
    For example:
    <changeHistory>
            <key>5</key>
            <changeType>UPDATE</changeType>
            <time>2010-02-02T19:19:43.471-05:00</time>
    </changeHistory>

    Acquiring data from this file results in an incremental update to record 5.

  16. Add one record in the change-tracking-db.xml file by doing the following:
    1. Add a <row> entry to the file and ensure the <key> value is unique, as shown:
      <row> 
              <key>7</key> 
              <data>some incremental data...</data> 
      </row>
    2. Add a <changeHistory> entry for the <row> as shown:
      <changeHistory>
              <key>7</key>
              <changeType>CREATE</changeType>
              <time>2010-02-02T19:19:43.471-05:00</time>
      </changeHistory>
    3. Ensure that the <key> value corresponds to a <row> entry in the <database>.
    4. Ensure that the <changeType> value is set to CREATE.
    5. Modify the <time> value to indicate a time after the acquisition in step 12 and before the current time. (The time uses the W3C date and time format with a UTC offset, for example -05:00. See http://www.w3.org/TR/NOTE-datetime for guidance about the syntax.)
    Acquiring data from this file results in an incremental change that adds record 7.
  17. Delete one record in the change-tracking-db.xml file by doing the following:
    1. Add a <changeHistory> entry for the <row> that has been removed as shown:
      <changeHistory>
              <key>8</key>
              <changeType>DELETE</changeType>
              <time>2010-02-02T19:19:43.471-05:00</time>
      </changeHistory>
    2. Ensure that the <key> value corresponds to a <row> that does not exist in the <database>.
    3. Ensure that the <changeType> value is set to DELETE.
    4. Modify the <time> value to indicate a time after the acquisition in step 12 and before the current time. (The time uses the W3C date and time format with a UTC offset, for example -05:00. See http://www.w3.org/TR/NOTE-datetime for guidance about the syntax.)
    Acquiring data from this file results in an incremental change that removes record 8.
  18. Save and close change-tracking-db.xml.
  19. Run the startCrawl task of ias-cmd to acquire data from the revised sample documents.
    For example:
    C:\Oracle\Endeca\IAS\3.1.0\bin>ias-cmd.bat startCrawl -id sampleChgTracking
  20. After the crawl has completed, you can confirm that IAS Server updated, added, and deleted the records you modified by running the read-baseline task of recordstore-cmd.
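
The `<time>` values in steps 15 to 17 and the check in step 20 lend themselves to a small helper script. The following Python sketch uses only the standard library and is not part of IAS. It builds a `<changeHistory>` entry stamped in the W3C date and time syntax, and it extracts record IDs and actions from read-baseline output in the format shown in step 13. The function names `w3c_timestamp`, `change_history_entry`, and `parse_read_baseline` are hypothetical. The parser assumes that every record starts with `[Endeca.Id=` followed directly by `Endeca.Action`, as in the sample output.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional
from xml.sax.saxutils import escape

CHANGE_TYPES = ("CREATE", "UPDATE", "DELETE")  # the values used in steps 15 to 17
# Assumption: each record begins with "[Endeca.Id=<id>, Endeca.Action=<action>".
RECORD_START = re.compile(r"\[Endeca\.Id=([^,\]]+),\s*Endeca\.Action=([^,\]]+)")


def w3c_timestamp(when: Optional[datetime] = None) -> str:
    """Format a time with milliseconds and a UTC offset; defaults to local time now."""
    if when is None:
        when = datetime.now().astimezone()  # local time, including its UTC offset
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return when.isoformat(timespec="milliseconds")


def change_history_entry(key: str, change_type: str, when: Optional[datetime] = None) -> str:
    """Build a <changeHistory> element laid out like the examples in steps 15 to 17."""
    if change_type not in CHANGE_TYPES:
        raise ValueError(f"unknown change type: {change_type}")
    return "\n".join([
        "<changeHistory>",
        f"        <key>{escape(key)}</key>",
        f"        <changeType>{change_type}</changeType>",
        f"        <time>{w3c_timestamp(when)}</time>",
        "</changeHistory>",
    ])


def parse_read_baseline(output: str) -> Dict[str, str]:
    """Map each Endeca.Id found in read-baseline output to its Endeca.Action."""
    return dict(RECORD_START.findall(output))


if __name__ == "__main__":
    # Entry for step 15, stamped with the current local time.
    print(change_history_entry("5", "UPDATE"))

    # The timestamp from the examples, rebuilt from its parts.
    example_time = datetime(2010, 2, 2, 19, 19, 43, 471000,
                            tzinfo=timezone(timedelta(hours=-5)))
    print(w3c_timestamp(example_time))  # 2010-02-02T19:19:43.471-05:00

    sample = "[Endeca.Id=1, Endeca.Action=UPSERT, Endeca.SourceId=sampleChgTracking, DATA=...]"
    print(parse_read_baseline(sample))
```

Generate the entries after the crawl in step 12 has finished, so that each time falls after that acquisition. By the time the crawl in step 19 starts, the generated time is also in the past. Comparing the `parse_read_baseline` results from steps 13 and 20 shows which record IDs were added or removed.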