Oracle Commerce Guided Search - Archived output files

Archived output files

The first time a crawl is run in a given workspace directory, the output file is named as described in the previous section. For example, if you run a full crawl, the output filename might be endecaOut-sgmt000.bin.gz. If you then run a second crawl (full or resumable), the Web Crawler works as follows:

1. A directory named archive is created under the output directory.
2. The original endecaOut-sgmt000.bin.gz file is moved to the archive directory and is renamed by adding a timestamp to the name; for example:
   ```
   endecaOut-20091015173554-sgmt000.bin.gz
   ```
3. The output file from the second run is named endecaOut-sgmt000.bin.gz and is stored in the output directory.

For every subsequent crawl using the same workspace directory, steps 2 and 3 are repeated.
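The following Python sketch models the rotation steps above so the effect on the output directory is easy to see. It is not the Web Crawler's own code: the helper names `archived_name` and `rotate_output` are hypothetical, the timestamp is passed in explicitly (the text does not say which moment the crawler records), and the rule for where the timestamp goes (after the first hyphen-delimited token) is an assumption inferred from the single example filename.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def archived_name(filename: str, when: datetime) -> str:
    """Insert a YYYYMMDDHHmmSS timestamp after the first hyphen-delimited token.

    Hypothetical helper; the insertion rule is inferred from the documented
    example (endecaOut-sgmt000.bin.gz -> endecaOut-20091015173554-sgmt000.bin.gz).
    """
    prefix, rest = filename.split("-", 1)
    return f"{prefix}-{when.strftime('%Y%m%d%H%M%S')}-{rest}"


def rotate_output(output_dir: Path, filename: str, when: datetime) -> Optional[Path]:
    """Archive an existing output file before a new crawl writes its own (hypothetical)."""
    current = output_dir / filename
    if not current.exists():
        return None  # first crawl in this workspace: nothing to archive
    archive_dir = output_dir / "archive"
    archive_dir.mkdir(exist_ok=True)  # step 1: created once, reused by later crawls
    target = archive_dir / archived_name(filename, when)
    shutil.move(str(current), str(target))  # step 2: move and rename
    return target


# Simulate two crawls in a throwaway output directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    name = "endecaOut-sgmt000.bin.gz"
    (out / name).write_bytes(b"first crawl")
    moved = rotate_output(out, name, datetime(2009, 10, 15, 17, 35, 54))
    (out / name).write_bytes(b"second crawl")  # step 3: new output keeps the plain name
    print(moved.name)  # endecaOut-20091015173554-sgmt000.bin.gz
```

Step 3 is the crawler's own job; the sketch only writes a placeholder file to stand in for the new crawl's output.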

The timestamp format used for renaming is:

YYYYMMDDHHmmSS

where:

- YYYY is a four-digit year, such as 2009.
- MM is the month as a number (01-12), such as 10 for October.
- DD is the day of the month (01-31), such as 15 (for October 15th).
- HH is the hour of the day in 24-hour format (00-23), such as 17 (for 5 p.m.).
- mm is the minute of the hour (00-59), such as 35.
- SS is the second of the minute (00-59), such as 54.
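In Python's `datetime` directives, this format corresponds to `%Y%m%d%H%M%S`. The sketch below formats and parses such timestamps. The helper names are hypothetical, and parsing assumes the timestamp is the second hyphen-delimited field of an archived filename, as in the example endecaOut-20091015173554-sgmt000.bin.gz.

```python
from datetime import datetime

# Python equivalent of YYYYMMDDHHmmSS. Note the letter case: in Python %m is
# the month and %M is the minute, the reverse of the documented MM and mm.
ARCHIVE_TS_FORMAT = "%Y%m%d%H%M%S"


def format_archive_ts(when: datetime) -> str:
    """Render a datetime as a 14-digit archive timestamp (hypothetical helper)."""
    return when.strftime(ARCHIVE_TS_FORMAT)


def parse_archive_ts(archived_filename: str) -> datetime:
    """Extract the timestamp from an archived name such as
    endecaOut-20091015173554-sgmt000.bin.gz (assumed layout)."""
    stamp = archived_filename.split("-")[1]
    return datetime.strptime(stamp, ARCHIVE_TS_FORMAT)


print(format_archive_ts(datetime(2009, 10, 15, 17, 35, 54)))  # 20091015173554
print(parse_archive_ts("endecaOut-20091015173554-sgmt000.bin.gz"))  # 2009-10-15 17:35:54
```

Because every field is fixed-width, zero-padded, and ordered from year down to second, sorting archived filenames that share the same prefix as plain strings also orders them chronologically.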

Note that the timestamp format is hard-coded and cannot be reconfigured.
