The Deployment Template supports running CAS crawls through the CAS Server component, which is implemented as a custom-component. You configure the component according to the output type of the crawl. The sections below describe the common configuration properties and the output-type configuration properties, and then provide an example for each output type: Record Store output, MDEX-compatible output, and record file output.
Note
The Deployment Template cannot create a new CAS crawl. You create a crawl using CAS and run it using the Deployment Template. For details about creating a crawl, see the Content Acquisition System Developer's Guide.
The custom-component element identifies the CAS server in the Servers/hosts section of AppConfig.xml. Its id, host-id, and class attributes appear in each of the examples in this section.
The common configuration properties describe the host and port on which CAS is running, and the HTTP socket timeout. The properties are defined as follows:
casHost - Host name of the server on which the Content Acquisition System is running.
casPort - Port on which the Endeca CAS Service listens. If the application is running in SSL mode, casPort is the SSL port of the Endeca CAS Service. The port number must match the com.endeca.cas.port value that is used in the CAS Service configuration script or, if the Endeca CAS Service is configured for SSL, the com.endeca.cas.ssl.port value. The configuration script is <install path>\CAS\workspace\conf\jetty.xml.
httpSocketTimeout - Maximum period of inactivity, in milliseconds, between two consecutive data packets before the HTTP connection times out.
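To make the common properties concrete, the following Python sketch reads them from a custom-component element like the ones in the examples in this section and checks the constraints described above: casPort must be a valid TCP port, and httpSocketTimeout is a positive number of milliseconds. The names read_common_properties and CasConnection are hypothetical and not part of the Deployment Template. The sketch does not compare casPort against com.endeca.cas.port or com.endeca.cas.ssl.port in jetty.xml; that check remains manual.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CasConnection:
    host: str
    port: int
    socket_timeout_ms: int


def read_common_properties(component_xml: str) -> CasConnection:
    """Hypothetical helper: extract casHost, casPort and httpSocketTimeout
    from a <custom-component> element (not a Deployment Template API)."""
    component = ET.fromstring(component_xml.strip())
    # Collect every <property name="..." value="..."/> under the component.
    props = {p.get("name"): p.get("value") for p in component.iter("property")}

    missing = [n for n in ("casHost", "casPort", "httpSocketTimeout") if n not in props]
    if missing:
        raise ValueError(f"missing common properties: {missing}")

    port = int(props["casPort"])
    if not 0 < port < 65536:
        raise ValueError(f"casPort out of range: {port}")

    timeout = int(props["httpSocketTimeout"])  # milliseconds of inactivity
    if timeout <= 0:
        raise ValueError("httpSocketTimeout must be a positive number of milliseconds")

    return CasConnection(props["casHost"], port, timeout)


example = """
<custom-component id="CAS" host-id="CASHost"
    class="com.endeca.eac.toolkit.component.cas.ContentAcquisitionServerComponent">
  <properties>
    <property name="casHost" value="localhost" />
    <property name="casPort" value="8500" />
    <property name="httpSocketTimeout" value="180000" />
  </properties>
</custom-component>
"""

conn = read_common_properties(example)
print(conn.host, conn.port, conn.socket_timeout_ms / 1000)  # localhost 8500 180.0
```

With the values used throughout this section, the 180000 ms timeout corresponds to three minutes of inactivity.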
The configuration properties for MDEX-compatible output are defined as follows:
numPartialsBackups - Indicates the number of backups to keep for the cumulative partials directory (cumulativePartialsDir). If this property is not configured, no backups are retained.
cumulativePartialsDir - Indicates the directory on the CAS host where partial MDEX output is accumulated. This allows partial updates to be reapplied if a failure occurs while partial updates are being applied.
numDvalIdMappingsBackups - Indicates the number of backups to keep for the dimension value ID mappings file. This allows you to restore dimension value ID mappings if the CAS host fails. If this property is not configured, five backups are retained. If it is set to zero, no backup is performed.
dvalIdMappingsArchiveDir - Indicates the directory where the dimension value ID mappings files are stored. If this property is not configured, mappings are written to ./data/dvalid_mappings_archive. However, to provide more secure backups, Oracle recommends that you specify a network drive that is available to CAS but does not reside on the CAS host.
As the example below shows, cumulativePartialsDir and dvalIdMappingsArchiveDir are specified as directory elements, while the two backup counts are specified as property elements.
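The default rules for MDEX-compatible output can be summarized in a short Python sketch. It is a hypothetical illustration of the documented behavior, not the Deployment Template's own logic: a missing numPartialsBackups means no backups, a missing numDvalIdMappingsBackups means five, zero means none, and a missing dvalIdMappingsArchiveDir falls back to ./data/dvalid_mappings_archive. The names resolve_mdex_settings and MdexBackupSettings are illustrative only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

DEFAULT_DVAL_ARCHIVE_DIR = "./data/dvalid_mappings_archive"


@dataclass
class MdexBackupSettings:
    partials_backups: int
    cumulative_partials_dir: Optional[str]
    dval_mappings_backups: int
    dval_archive_dir: str


def resolve_mdex_settings(component_xml: str) -> MdexBackupSettings:
    """Hypothetical helper that applies the documented defaults for
    MDEX-compatible output to a <custom-component> element."""
    component = ET.fromstring(component_xml.strip())
    # Backup counts live in <property> elements, directories in <directory> elements.
    props = {p.get("name"): p.get("value") for p in component.iter("property")}
    dirs = {d.get("name"): (d.text or "").strip() for d in component.iter("directory")}

    return MdexBackupSettings(
        # Not configured -> no backups of the cumulative partials directory.
        partials_backups=int(props.get("numPartialsBackups", "0")),
        cumulative_partials_dir=dirs.get("cumulativePartialsDir"),
        # Not configured -> five backups; "0" -> no backing up at all.
        dval_mappings_backups=int(props.get("numDvalIdMappingsBackups", "5")),
        # Not configured -> the default archive directory.
        dval_archive_dir=dirs.get("dvalIdMappingsArchiveDir", DEFAULT_DVAL_ARCHIVE_DIR),
    )


minimal = """
<custom-component id="CAS" host-id="ITLHost"
    class="com.endeca.eac.toolkit.component.cas.ContentAcquisitionServerComponent">
  <properties>
    <property name="casHost" value="localhost" />
    <property name="casPort" value="8500" />
  </properties>
  <directories>
    <directory name="cumulativePartialsDir">./data/partials/cumulative_partials</directory>
  </directories>
</custom-component>
"""

settings = resolve_mdex_settings(minimal)
print(settings.partials_backups, settings.dval_mappings_backups, settings.dval_archive_dir)
# 0 5 ./data/dvalid_mappings_archive
```

Because this minimal configuration omits both backup counts and the archive directory, every default applies.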
This example CAS Server component is configured for MDEX-compatible output:
<!--
    ########################################################################
    # Content Acquisition System Server
    #
-->
<custom-component id="CAS" host-id="ITLHost"
    class="com.endeca.eac.toolkit.component.cas.ContentAcquisitionServerComponent">
  <properties>
    <property name="casHost" value="localhost" />
    <property name="casPort" value="8500" />
    <property name="httpSocketTimeout" value="180000" />
    <property name="numPartialsBackups" value="5" />
    <property name="numDvalIdMappingsBackups" value="5" />
  </properties>
  <directories>
    <directory name="cumulativePartialsDir">./data/partials/cumulative_partials</directory>
    <directory name="dvalIdMappingsArchiveDir">./data/dvalid_mappings_archive</directory>
  </directories>
</custom-component>
There are no additional configuration properties required for crawls that write to a Record Store instance. Only the custom-component element and the common configuration properties are required.
This example CAS Server component is configured for Record Store output:
<!--
    ########################################################################
    # Content Acquisition System Server
    #
-->
<custom-component id="CAS" host-id="CASHost"
    class="com.endeca.eac.toolkit.component.cas.ContentAcquisitionServerComponent">
  <properties>
    <property name="casHost" value="localhost" />
    <property name="casPort" value="8500" />
    <property name="httpSocketTimeout" value="180000" />
  </properties>
</custom-component>
The configuration properties for record file output are defined as follows:
casCrawlFullOutputDestDir - Indicates the destination directory to which the crawl output file is copied after a baseline crawl. This is not the directory to which the CAS crawl writes its output; that output directory is set as part of the crawl configuration.
casCrawlIncrementalOutputDestDir - Indicates the destination directory to which the crawl output file is copied after an incremental crawl. As with casCrawlFullOutputDestDir, this is not the directory to which the CAS crawl writes its output. If you run incremental crawls, the default settings assume that the output format is compressed binary files.
casCrawlOutputDestHost - Indicates the ID of the host on which the destination directories (specified by casCrawlFullOutputDestDir and casCrawlIncrementalOutputDestDir) reside.
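Which destination directory is used depends only on whether the crawl was a baseline or an incremental crawl. The following hypothetical Python sketch expresses that choice; the function copy_target, the CopyTarget type, and the incremental flag are illustrative and not part of the Deployment Template. The directory that the crawl itself writes to is deliberately absent, because it is set in the crawl configuration rather than in this component.

```python
from typing import Mapping, NamedTuple


class CopyTarget(NamedTuple):
    host_id: str    # casCrawlOutputDestHost: a host ID, not a host name
    directory: str  # one of the two destination directories


def copy_target(props: Mapping[str, str], incremental: bool) -> CopyTarget:
    """Hypothetical helper: pick where the crawl output file is copied
    after a baseline (full) or incremental crawl."""
    key = "casCrawlIncrementalOutputDestDir" if incremental else "casCrawlFullOutputDestDir"
    return CopyTarget(props["casCrawlOutputDestHost"], props[key])


# Values taken from the record file output example in this section.
record_file_props = {
    "casCrawlFullOutputDestDir": "./data/complete_cas_crawl_output/full",
    "casCrawlIncrementalOutputDestDir": "./data/complete_cas_crawl_output/incremental",
    "casCrawlOutputDestHost": "CASHost",
}

print(copy_target(record_file_props, incremental=False))
print(copy_target(record_file_props, incremental=True).directory)
# ./data/complete_cas_crawl_output/incremental
```

Both destination directories share one host ID, which matches the single casCrawlOutputDestHost property.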
This example CAS Server component is configured for record file output:
<!--
    ########################################################################
    # Content Acquisition System Server
    #
-->
<custom-component id="CAS" host-id="CASHost"
    class="com.endeca.soleng.eac.toolkit.component.ContentAcquisitionServerComponent">
  <properties>
    <property name="casHost" value="localhost" />
    <property name="casPort" value="8500" />
    <property name="httpSocketTimeout" value="180000" />
    <property name="casCrawlFullOutputDestDir"
        value="./data/complete_cas_crawl_output/full" />
    <property name="casCrawlIncrementalOutputDestDir"
        value="./data/complete_cas_crawl_output/incremental" />
    <property name="casCrawlOutputDestHost" value="CASHost" />
  </properties>
</custom-component>