You set output properties in the default.xml file. You can direct crawl output either to an output file (the default) or to a Record Store instance.
The properties in the table below specify the attributes of a crawl output file, such as its name, location, and output type. By default, the output is written as a compressed binary file; the default file name is listed under output.file.name.
| Property Name | Description |
|---|---|
| output.file.directory | Directory name (default is workspace). Specifies the directory for the output file. The name is case-sensitive and is relative to the directory from which you run the crawl. You can specify a multi-level path. This setting can be overridden with the -w command-line flag. |
| output.file.name | File name (default is webcrawler-output). Specifies the filename of the output file. The name is case-sensitive. |
| output.file.is-xml | Boolean value (default is false). Specifies whether the output type is XML (true) or binary (false). XML is useful if you want to visually inspect the Endeca records after crawling. |
| output.file.is-compressed | Boolean value (default is true). Specifies whether to compress the Endeca records in a .gz file. Setting this property to true is useful when storing and transferring large files. |
| output.file.binary.file-size-max | Integer value (default is -1). Sets the maximum file size for binary output files. Output is written to a new file once the maximum size is reached. If the value is set to -1, no limits are imposed on the file size. |
| output.dom.include | Boolean value (default is false). Specifies whether to include the DOM for the Web page in the output Endeca records. |
| output.records.properties.excludes | Space-delimited list of output record properties (default is empty). Specifies the properties to exclude from the records. Property names are case-insensitive. Wildcard names are not supported. |
| log.interval | Integer value in seconds (default is 60). Writes crawl metrics information (broken down by depth) to the log each time this number of seconds elapses. |
| log.interval.summary | Integer value in seconds (default is 300). Outputs detailed crawl progress information (organized by host) every time this number of seconds has elapsed. |
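
As an illustration of how several of these properties might be set together, the fragment below switches the output to uncompressed XML so that records can be inspected by eye. It is a minimal sketch, not a copy of the shipped file: it assumes that default.xml uses the `<configuration>` / `<property>` / `<name>` / `<value>` layout, and the file name `inspect-crawl` is a hypothetical value chosen for the example. Check the element names against your own default.xml before copying anything.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the configuration/property/name/value layout is an
     assumption about default.xml; the property names come from the table. -->
<configuration>
  <!-- Write output under the workspace directory (the documented default);
       the -w command-line flag overrides this value. -->
  <property>
    <name>output.file.directory</name>
    <value>workspace</value>
  </property>
  <!-- Hypothetical file name used only for this example. -->
  <property>
    <name>output.file.name</name>
    <value>inspect-crawl</value>
  </property>
  <!-- Emit XML instead of binary so the records are human-readable. -->
  <property>
    <name>output.file.is-xml</name>
    <value>true</value>
  </property>
  <!-- Skip .gz compression so the file opens directly in an editor. -->
  <property>
    <name>output.file.is-compressed</name>
    <value>false</value>
  </property>
  <!-- Log per-depth metrics every 30 seconds instead of the default 60. -->
  <property>
    <name>log.interval</name>
    <value>30</value>
  </property>
</configuration>
```

For large production crawls, leaving output.file.is-xml at false and output.file.is-compressed at true keeps the output smaller, as the table notes.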