Oracle Commerce Guided Search - How archive files are handled

The following is a detailed view of how the CAS Server handles archive files:

An Endeca record is created for the archive file itself. This record has the Endeca.File.IsArchive property set to true.
In addition to the top-level documents (files or directories), nested archive files are also processed.
Document conversion (if enabled) is performed on all files within the archive, in accordance with document conversion filtering.
A separate Endeca record is created for each document (including nested archives) found in the archive. The record is processed as follows:
Note: In the case of nested archives, the Endeca.File.PathWithinSourceArchive property takes the following format:
```
//path/to/nested/archive//path/within/nested/archive
```
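As an illustration of the format above, the following minimal Python sketch splits such a value back into its segments. The function name `split_nested_path` is hypothetical and not part of any CAS API, and treating a value without a leading `//` as a non-nested path is an assumption, since the note describes only the nested case.

```python
def split_nested_path(value):
    """Split '//outer/archive//inner/path' into its '//'-delimited segments."""
    if not value.startswith("//"):
        # Assumption: a value without the leading '//' is a non-nested path.
        return [value]
    # Splitting on '//' yields an empty first element, which is dropped here.
    return [segment for segment in value.split("//") if segment]


segments = split_nested_path(
    "//path/to/nested/archive//path/within/nested/archive"
)
print(segments)
```

The first segment identifies the nested archive and the last segment identifies the entry inside it.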
Although the properties of archived entries are captured in Endeca records, the entries themselves are not physically extracted from the archive (that is, no new files are permanently saved to disk).
If an archive has entries with identical names, the first entry that is processed is kept (that is, an Endeca record is created for it) and the duplicate entry is ignored.
Seeds are restricted to actual files or directories. That is, seeds cannot point to files, directories, or entries inside an archive.
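The behavior described above can be approximated with a short Python sketch. It is not the CAS Server implementation. It only mimics the documented rules for ZIP archives using the standard `zipfile` module: one record per entry, recursion into nested archives, first-entry-wins for duplicate names, and in-memory reads with nothing written to disk. The function `archive_records` and the use of plain dictionaries as records are hypothetical. Two further points are assumptions: that entries directly inside the archive carry a plain path, and that nested archives are flagged with Endeca.File.IsArchive.

```python
import io
import zipfile


def archive_records(data, chain=()):
    """Build one record (a plain dict) per entry of a ZIP archive held in memory.

    `chain` lists the names of the enclosing nested archives, outermost first.
    """
    records = []
    seen = set()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.filename in seen:
                continue  # duplicate name: only the first entry is kept
            seen.add(info.filename)

            if chain:
                # Nested format: //path/to/nested/archive//path/within/it
                path = "".join("//" + part for part in (*chain, info.filename))
            else:
                # Assumption: direct entries use their plain path.
                path = info.filename

            # Read the entry into memory; no file is extracted to disk.
            payload = b"" if info.is_dir() else archive.read(info)
            nested = zipfile.is_zipfile(io.BytesIO(payload))
            records.append({
                "Endeca.File.PathWithinSourceArchive": path,
                "Endeca.File.IsArchive": nested,  # assumption for nested archives
            })
            if nested:
                # Recurse so that entries of the nested archive get records too.
                records.extend(archive_records(payload, (*chain, info.filename)))
    return records
```

Calling `archive_records` on the bytes of a ZIP file returns the records in processing order, with the record for each nested archive followed by the records for its entries.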

The above behavior is the default for all archives crawled. To avoid processing archives, disable the Expand archives option for the data source.
