Oracle Commerce Guided Search - Running the sample Web Crawler

Running the sample Web Crawler

The sample Web Crawler application writes output to a Record Store instance.

The sample Web Crawler application is located in the sample\webcrawler-to-recordstore directory. The directory contains the run-sample scripts that run the sample Web Crawler.

The directory also contains a recordstore-configuration.xml file that is configured for records produced by the Web Crawler. In particular, the file has these two configuration properties:

<changePropertyNames/>
<idPropertyName>Endeca.Id</idPropertyName>

Setting idPropertyName is important because the Record Store instance generates a unique record ID based on the value of that property (here, Endeca.Id).

The sample Web Crawler is configured to write its output directly to a Record Store instance. Specifically, the site.xml file in the conf directory has these three output properties:

<property>
	   <name>output.recordStore.host</name>
	   <value>localhost</value>
	   <description>
	   The host of the record store service.
	   Default: localhost
	   </description>
</property>

<property>
	   <name>output.recordStore.port</name>
	   <value>8500</value>
	   <description>
	   The port of the record store service.
	   Default: 8500
	   </description>
</property>

<property>
	   <name>output.recordStore.instanceName</name>
	   <value>rs-web</value>
	   <description>
	   The name of the record store service.
	   Default: rs-web
	   </description>
</property>

Be sure to change these values if you create a Record Store instance with a different host name, port, or instance name.
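Before a crawl, it can help to confirm which Record Store the crawler will write to. The following is a minimal Python sketch, not part of the product, that reads the three output.recordStore properties from text shaped like the fragments above. The <configuration> root element and the read_record_store_settings helper are assumptions made for illustration; check the actual layout of your site.xml.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the three properties shown above.
# The <configuration> root element is an assumption, not taken from the guide.
SAMPLE_SITE_XML = """<configuration>
<property>
  <name>output.recordStore.host</name>
  <value>localhost</value>
</property>
<property>
  <name>output.recordStore.port</name>
  <value>8500</value>
</property>
<property>
  <name>output.recordStore.instanceName</name>
  <value>rs-web</value>
</property>
</configuration>"""


def read_record_store_settings(xml_text):
    """Return the output.recordStore.* properties as a dict of name -> value."""
    root = ET.fromstring(xml_text)
    settings = {}
    # iter() finds <property> elements at any depth, so the root tag name
    # does not need to match the assumption above.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith("output.recordStore."):
            settings[name] = value
    return settings


settings = read_record_store_settings(SAMPLE_SITE_XML)
for name, value in sorted(settings.items()):
    print(f"{name} = {value}")
```

To inspect a real file, read its text from the conf directory and pass it to the same function.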

To run the sample Web Crawler:

1. Start the CAS Service if it is not already running.
   - Windows: Start the CAS Service from the Windows Services console.
   - UNIX: Run the cas-service.sh script.
2. From a command prompt window, change to the sample\webcrawler-to-recordstore directory.
3. Run the run-sample script.
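If you are unsure whether the CAS Service is already running, one rough check is to see whether the configured host and port accept TCP connections. The sketch below uses only the Python standard library; the is_port_open helper is hypothetical, and the localhost and 8500 values are the defaults from site.xml shown earlier. A successful connection shows only that something is listening on that port, not that the Record Store instance itself exists.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Defaults taken from the sample site.xml; adjust to match your settings.
    host, port = "localhost", 8500
    if is_port_open(host, port):
        print(f"Something is listening on {host}:{port}")
    else:
        print(f"Nothing is listening on {host}:{port}; start the CAS Service first")
```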

When the Web Crawler finishes, its output is written to the Record Store rather than to a file on disk. If you check cas-service.log, you should see messages similar to this example:

Starting new transaction with generation Id 1
Started transaction 1 of type READ_WRITE
Marking generation committed: 1
Committed transaction 1

In the example, the Record Store is storing the record generation with an ID of 1.
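To check for these messages without reading the log by hand, a short script can scan for the commit lines. This is a sketch that assumes the message wording matches the example above; real log lines may carry timestamps or other prefixes, so the pattern searches within each line instead of matching whole lines. The committed_generations helper is a hypothetical name.

```python
import re

# Sample lines copied from the example above.
SAMPLE_LOG = """Starting new transaction with generation Id 1
Started transaction 1 of type READ_WRITE
Marking generation committed: 1
Committed transaction 1
"""

# Unanchored pattern, so any prefix on the log line is tolerated.
COMMITTED_GENERATION = re.compile(r"Marking generation committed: (\d+)")


def committed_generations(log_text):
    """Return the generation IDs reported as committed, in log order."""
    return [int(m.group(1)) for m in COMMITTED_GENERATION.finditer(log_text)]


print(committed_generations(SAMPLE_LOG))  # prints [1]
```

To check a real log, read the contents of cas-service.log and pass the text to committed_generations. An empty list suggests that no generation was committed in the portion of the log you read.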


Content Acquisition System Developer's Guide