The first time a crawl is run in a given workspace directory, the output file is named as described in the previous section.
For example, if you run a full crawl, the output filename might be endecaOut-sgmt000.bin.gz.
If you then run a second crawl (full or resumable), the Web Crawler works as follows:
1. A directory named archive is created under the output directory.
2. The original endecaOut-sgmt000.bin.gz file is moved to the archive directory and is renamed by adding a timestamp to the name; for example: endecaOut-20091015173554-sgmt000.bin.gz
3. The output file from the second run is named endecaOut-sgmt000.bin.gz and is stored in the output directory.
For every subsequent crawl using the same workspace directory, steps 2 and 3 are repeated.
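The archiving steps can be illustrated with a short Python sketch. It is not the Web Crawler's own code; it only mimics the described behavior. The function names archived_name and archive_previous_output are hypothetical, and the timestamp is passed in by the caller because the source does not say which point in time the Web Crawler uses.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archived_name(filename: str, stamp: datetime) -> str:
    """Insert a YYYYMMDDHHmmSS timestamp after the first hyphen in the name."""
    prefix, rest = filename.split("-", 1)
    return f"{prefix}-{stamp.strftime('%Y%m%d%H%M%S')}-{rest}"


def archive_previous_output(output_dir: Path, filename: str,
                            stamp: datetime) -> Optional[Path]:
    """Move an existing output file into output_dir/archive under a timestamped name.

    Returns the new path, or None if there was no earlier output to archive
    (as on the first crawl in a workspace).
    """
    current = output_dir / filename
    if not current.exists():
        return None
    archive_dir = output_dir / "archive"   # step 1: create the archive directory
    archive_dir.mkdir(exist_ok=True)
    target = archive_dir / archived_name(filename, stamp)
    shutil.move(str(current), str(target))  # step 2: move and rename
    return target                           # step 3 is the crawl writing a new file
```

Calling archive_previous_output before each crawl after the first would leave only the newest endecaOut-sgmt000.bin.gz in the output directory, with every earlier run kept under archive.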
The timestamp format used for renaming is:
YYYYMMDDHHmmSS
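where:
YYYY is the four-digit year
MM is the two-digit month
DD is the two-digit day of the month
HH is the two-digit hour
mm is the two-digit minutes
SS is the two-digit seconds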
Note that the timestamp format is hard-coded and cannot be reconfigured.
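Because the format is fixed, a script that manages archived files can recover the timestamp from a filename. The following Python sketch is one way to do it; the helper name parse_archive_timestamp and the regular expression are illustrative assumptions, and the hour is treated as a 24-hour value, which is inferred from the 17 in the example filename rather than stated in the source.

```python
import re
from datetime import datetime

# Assumed archive name layout: <prefix>-<14-digit timestamp>-<rest>
ARCHIVE_NAME = re.compile(r"^(?P<prefix>[^-]+)-(?P<stamp>\d{14})-(?P<rest>.+)$")


def parse_archive_timestamp(filename: str) -> datetime:
    """Return the YYYYMMDDHHmmSS timestamp embedded in an archived filename."""
    match = ARCHIVE_NAME.match(filename)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {filename!r}")
    # %H assumes a 24-hour clock (an inference from the documented example).
    return datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(parse_archive_timestamp("endecaOut-20091015173554-sgmt000.bin.gz"))
    # prints 2009-10-15 17:35:54
```

Sorting archived files by the parsed value gives the same order as sorting by filename, since the fields run from most to least significant.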

