The name of a crawl output file is set by the output.file.name property in the default.xml configuration file (which can be overridden by the site.xml file). Assuming the default name of endecaOut, the full name of the output file depends on the configuration settings:
For compressed binary files (the default), the name will be endecaOut-sgmt000.bin.gz. If more than one output file is generated, the second file will be endecaOut-sgmt001.bin.gz, and so on.
For uncompressed binary files, the name will be endecaOut-sgmt000.bin for the first file, endecaOut-sgmt001.bin for the second file, and so on.
For XML files, the name will be either endecaOut.xml.gz (if compression is specified) or endecaOut.xml (if compression is turned off). Note that unlike the binary format, only one XML file is output, regardless of its size.
The format of the file is set with the output.file.is-xml property, while the output.file.is-compressed property turns compression on or off.
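
The naming rules above can be summarized in a short Python sketch. This is only an illustration, not part of the crawler: the function crawl_output_names and its parameters are hypothetical, with base, is_xml, and is_compressed standing in for the values of output.file.name, output.file.is-xml, and output.file.is-compressed. The three-digit segment counter is inferred from the sgmt000 and sgmt001 examples.

```python
def crawl_output_names(base="endecaOut", is_xml=False, is_compressed=True, segments=1):
    """Return the output file names expected under the naming rules described above.

    Hypothetical helper for illustration only. The parameters mirror the
    output.file.name, output.file.is-xml, and output.file.is-compressed
    properties; `segments` is the number of binary files generated.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")

    # Compressed output gets a .gz extension in both formats.
    suffix = ".gz" if is_compressed else ""

    if is_xml:
        # XML output is always a single file, regardless of its size.
        return [f"{base}.xml{suffix}"]

    # Binary output is numbered sgmt000, sgmt001, and so on
    # (three-digit zero padding assumed from the documented examples).
    return [f"{base}-sgmt{i:03d}.bin{suffix}" for i in range(segments)]


if __name__ == "__main__":
    print(crawl_output_names(segments=2))
    print(crawl_output_names(is_xml=True, is_compressed=False))
```

With the default arguments the sketch models the default configuration (compressed binary), so the first call lists the two compressed segment names and the second lists the single uncompressed XML name.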