Running a sample Web crawl that writes to a Record Store

In this topic, you run a sample Web crawl that writes output to a Record Store instance instead of to a file on disk. This sample is stored in <install path>\IAS\<version>\sample\webcrawler-to-recordstore. The run-sample script runs the sample Web Crawler.

The directory also contains a recordstore-configuration.xml file that configures the Record Store to generate unique record IDs based on the value of idPropertyName.

The site.xml file, in the <install path>\IAS\<version>\sample\webcrawler-to-recordstore\conf directory, has the following output properties that specify the Record Store information:
<property>
	<name>output.recordStore.host</name>
	<value>localhost</value>
	<description>
	The host of the record store service.
	Default: localhost
	</description>
</property>

<property>
	<name>output.recordStore.port</name>
	<value>8401</value>
	<description>
	The port of the record store service.
	Default: 8401
	</description>
</property>

<property>
	<name>output.recordStore.contextPath</name>
	<value></value>
	<description>
	The context path of the record store service. If the property is not set, the value is empty (i.e. the root context path).
	</description>
</property>

<property>
	<name>output.recordStore.instanceName</name>
	<value>rs-web</value>
	<description>
	The name of the record store service.
	Default: rs-web
	</description>
</property>
Be sure to change the host and port values to match the machine and port on which the Endeca IAS Service is running. If you are running the Endeca Web Crawler on WebLogic Server, ensure that the output.recordStore.contextPath setting is correct. If you are running the Endeca Web Crawler on Jetty, leave output.recordStore.contextPath empty.
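
Before running the crawl, it can help to confirm which Record Store settings the crawler will pick up. The following Python sketch is not part of the product. It reads the output.recordStore.* properties from XML text shaped like the listing above. The function name record_store_settings, the SAMPLE text, and the wrapping configuration root element are assumptions made for illustration; check them against your own site.xml.

```python
import xml.etree.ElementTree as ET

# Prefix shared by the Record Store output properties shown in site.xml.
PREFIX = "output.recordStore."


def record_store_settings(xml_text):
    """Return the output.recordStore.* properties in xml_text as a dict.

    Keys are the property names with the prefix removed, for example
    "host" or "port". Values are returned as stripped strings.
    """
    root = ET.fromstring(xml_text)
    settings = {}
    # iter() walks the whole tree, so the name of the root element
    # does not matter to this function.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(PREFIX):
            value = (prop.findtext("value") or "").strip()
            settings[name[len(PREFIX):]] = value
    return settings


# Hypothetical stand-in for the relevant part of site.xml. The
# <configuration> root element is an assumption, not taken from the sample.
SAMPLE = """<configuration>
  <property>
    <name>output.recordStore.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>output.recordStore.port</name>
    <value>8401</value>
  </property>
  <property>
    <name>output.recordStore.contextPath</name>
    <value></value>
  </property>
  <property>
    <name>output.recordStore.instanceName</name>
    <value>rs-web</value>
  </property>
</configuration>"""

settings = record_store_settings(SAMPLE)
for key in sorted(settings):
    print(key, "=", repr(settings[key]))
```

To check a real file, read its contents and pass the text to record_store_settings. An empty contextPath value corresponds to the root context path described in the property description.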

To run the sample Web crawl:

  1. Open a command prompt window.
  2. Change to the <install path>\IAS\<version>\sample\webcrawler-to-recordstore directory.
  3. Run the run-sample script.
When the Web Crawler finishes, the output is written to the Record Store instead of to a file on disk. If you check IAS\workspace\ias-service.log, you should see messages similar to this example:
Starting new transaction with generation Id 1
Started transaction 1 of type READ_WRITE
Marking generation committed: 1
Committed transaction 1

In the example, the Record Store is storing the record generation with an ID of 1.
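
The log check described above can also be scripted. The following Python sketch is one possible approach and not a product utility. It scans log lines for the "Marking generation committed" message and returns the generation IDs it finds. The function name committed_generations and the SAMPLE_LOG lines are illustrative assumptions. Real log lines may carry timestamps or other prefixes, so the pattern searches within each line rather than matching the whole line.

```python
import re

# Matches the commit message shown in the example log output.
COMMITTED = re.compile(r"Marking generation committed: (\d+)")


def committed_generations(lines):
    """Return the generation IDs reported as committed, in log order."""
    generations = []
    for line in lines:
        match = COMMITTED.search(line)
        if match:
            generations.append(int(match.group(1)))
    return generations


# The four messages from the example, used here as stand-in log content.
SAMPLE_LOG = [
    "Starting new transaction with generation Id 1",
    "Started transaction 1 of type READ_WRITE",
    "Marking generation committed: 1",
    "Committed transaction 1",
]

print(committed_generations(SAMPLE_LOG))
```

To run it against the actual log, open IAS\workspace\ias-service.log and pass the file object to committed_generations, because iterating over a file object yields its lines.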