The sample Web Crawler application writes output to a Record Store instance.
The sample Web Crawler application is located in the
sample\webcrawler-to-recordstore
directory.
The directory contains the
run-sample
scripts that run the sample Web
Crawler.
The directory also contains a
recordstore-configuration.xml
file that is
configured for records produced by the Web Crawler. In particular, the file has
these two configuration properties:
<changePropertyNames/>
<idPropertyName>Endeca.Id</idPropertyName>
Setting the
idPropertyName
is important because the Record Store
instance generates a unique record ID based on the property value.
The sample Web Crawler is configured to write its output directly to a
Record Store instance. Specifically, the
site.xml
file in the
conf
directory has these three output properties:
<property>
  <name>output.recordStore.host</name>
  <value>localhost</value>
  <description>
    The host of the record store service.
    Default: localhost
  </description>
</property>
<property>
  <name>output.recordStore.port</name>
  <value>8500</value>
  <description>
    The port of the record store service.
    Default: 8500
  </description>
</property>
<property>
  <name>output.recordStore.instanceName</name>
  <value>rs-web</value>
  <description>
    The name of the record store service.
    Default: rs-web
  </description>
</property>
Be sure to change these values if your Record Store instance uses a different host name, port, or instance name.
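Before running the crawl, it can help to confirm which Record Store settings the crawler will actually pick up. The following is a minimal sketch, not part of the product: it uses only the Python standard library, the function name read_record_store_settings is hypothetical, and it assumes that site.xml wraps its property elements in a single root element (shown here as configuration, which is an assumption about the file layout).

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not part of the Web Crawler): collects the
# output.recordStore.* properties from the text of a site.xml file.
def read_record_store_settings(xml_text):
    root = ET.fromstring(xml_text)
    settings = {}
    # iter() finds <property> elements at any depth, so the name of the
    # root element does not matter for this sketch.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith("output.recordStore."):
            settings[name] = value
    return settings

# Sample input mirroring the three properties shown above.
# The <configuration> root element is an assumption.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>output.recordStore.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>output.recordStore.port</name>
    <value>8500</value>
  </property>
  <property>
    <name>output.recordStore.instanceName</name>
    <value>rs-web</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, value in read_record_store_settings(SAMPLE_SITE_XML).items():
        print(key, "=", value)
```

To check a real installation, read the contents of the site.xml file in the conf directory and pass the text to the function in place of the sample string.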
To run the sample Web Crawler:
When the Web Crawler finishes, its output is written to the Record
Store instead of to a file on disk. If you check
cas-service.log
, you should see messages
similar to this example:
Starting new transaction with generation Id 1
Started transaction 1 of type READ_WRITE
Marking generation committed: 1
Committed transaction 1
In the example, the Record Store is storing the record generation with an ID of 1.
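If the log is long, the committed generations can be picked out with a short script. This is a sketch under stated assumptions rather than a supported tool: the function name committed_generations is hypothetical, and the pattern relies only on the "Marking generation committed" wording shown in the example messages. Real log lines may carry timestamps or other prefixes, which the search tolerates.

```python
import re

# Matches the commit message shown in the example log excerpt.
COMMITTED = re.compile(r"Marking generation committed: (\d+)")

# Hypothetical helper: returns the generation IDs reported as committed,
# in the order they appear in the log text.
def committed_generations(log_text):
    return [int(match.group(1)) for match in COMMITTED.finditer(log_text)]

# Sample input copied from the example messages.
SAMPLE_LOG = """Starting new transaction with generation Id 1
Started transaction 1 of type READ_WRITE
Marking generation committed: 1
Committed transaction 1
"""

if __name__ == "__main__":
    print(committed_generations(SAMPLE_LOG))
```

An empty result after a crawl would suggest that no write transaction was committed, which is a cue to recheck the host, port, and instance name in site.xml.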