In the context of an Endeca Crawler application, Endeca records represent both the data in the source documents and metadata about the source documents.
Crawlers provide the means to get records representing the source documents into a pipeline. The source documents themselves may reside on a file system or on an HTTP or HTTPS host, and may be in a wide variety of file formats (common examples include PDF, HTML, DOC, and TXT). As with non-crawler pipelines, the source documents themselves are not modified in any way by the pipeline processing.
In the following example, the Endeca Crawler crawls an HTML source document from the Endeca Web site. The document has a title, text that describes Endeca solutions, and links to other areas of the Web site.
During Data Foundry processing, Forge generates an Endeca record for the document. Among other things, that record contains the body of the source document and the title of the document.
Suppose only the properties for the body of the source document and the title of the document are mapped using the property mapper. The Endeca record for this document looks like this:
This example illustrates the simplest relationship between a source document and an Endeca record: a record with two properties (title and text). Such a record is not very useful to application users, however, because a user-facing application is unlikely to keep all of a document's data in a single property.
In a more user-oriented application, a crawler pipeline might include Perl code to parse properties from the document text and use those to build Endeca properties and dimensions. Alternatively, the pipeline might build dimensions based on any of the metadata properties that are generated for a record.
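The idea of parsing properties out of the document text can be sketched as follows. This is an illustration only, written in Python rather than Perl, and it does not use any Endeca API: the record is modeled as a plain dictionary, and the function name `derive_properties`, the derived property names `year` and `word_count`, and the sample text are all hypothetical.

```python
import re


def derive_properties(record):
    """Return a copy of the record with extra properties parsed from its text.

    The record is a plain dict standing in for an Endeca record; the derived
    property names below are hypothetical examples, not Endeca names.
    """
    enriched = dict(record)
    text = record.get("text", "")

    # Hypothetical rule: expose the first four-digit year found in the text.
    match = re.search(r"\b(?:19|20)\d{2}\b", text)
    if match:
        enriched["year"] = match.group(0)

    # Hypothetical rule: expose the number of whitespace-separated words.
    enriched["word_count"] = str(len(text.split()))
    return enriched


# Made-up sample record with the two properties from the example above.
sample = {"title": "Sample page", "text": "Updated 2008. Search and navigation."}
enriched = derive_properties(sample)
```

Derived values such as `year` could then be mapped to dimensions so that users can navigate by them instead of searching one large text property.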
Suppose the example mapped several more properties. In addition to title and text, it might also map metadata properties. From the metadata properties available, the pipeline could be set up to expose properties such as encoding, date modified, application type, fetch status, and so on, for use as dimensions. These properties would be mapped with a property mapper component to provide both record details and navigation controls in the application.
Re-running a baseline to map all available properties would produce an Endeca record that looks like this:
To build a record page that displays all properties, the property mapper must be configured to map all of the source properties to Endeca properties.
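As a rough sketch of what such a mapping does conceptually, the following Python models a mapping table that sends each source property to either an Endeca property or a dimension. It is not the property mapper component itself. The function name `map_record`, the source and target names, and the choice to drop unmapped source properties are assumptions made for illustration.

```python
def map_record(source_record, mappings):
    """Split a source record into mapped properties and dimensions.

    mappings: dict of source name -> (target name, kind), where kind is
    "property" or "dimension". Source properties without a mapping are
    dropped in this sketch; a real property mapper may behave differently.
    """
    result = {"properties": {}, "dimensions": {}}
    for source_name, value in source_record.items():
        if source_name not in mappings:
            continue  # assumption: unmapped source properties are ignored
        target_name, kind = mappings[source_name]
        if kind == "dimension":
            result["dimensions"][target_name] = value
        else:
            result["properties"][target_name] = value
    return result


# Hypothetical source record and mapping table (names are illustrative only).
source = {
    "title": "Sample page",
    "text": "Body of the document",
    "encoding": "UTF-8",
    "fetch_status": "OK",
    "internal_id": "42",
}
mappings = {
    "title": ("Title", "property"),
    "text": ("Text", "property"),
    "encoding": ("Encoding", "dimension"),
    "fetch_status": ("Fetch Status", "dimension"),
}
mapped = map_record(source, mappings)
```

Here `internal_id` has no mapping and so does not appear in the output, which mirrors the point above: a property appears on the record page only if it has been mapped.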