The parse-plugins.xml file provides mappings of MIME types to parsers.

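The parse-plugins.xml file has two purposes: it maps each MIME type to the parser plugins that can handle it, and it sets the order in which those plugins are invoked.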

Note that the name of this file is specified to the Web Crawler via the parse.plugin.file property in the default.xml configuration file.

This entry from the file shows how these parsing rules are set:

<mimeType name="text/xml">
   <plugin id="parse-html" />
   <plugin id="endeca-searchexport-converter-parser" />
</mimeType>

In this entry, the parse-html plugin (HtmlParser) is invoked first for a text/xml MIME type. If that plugin succeeds, parsing is finished. If it fails, the endeca-searchexport-converter-parser plugin is invoked next.

Note that this entry:

<mimeType name="*">
   <plugin id="endeca-searchexport-converter-parser" />
</mimeType>

indicates that the endeca-searchexport-converter-parser plugin is invoked for any unmatched MIME type.
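The lookup and fallback behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the Web Crawler's actual implementation: the <parse-plugins> root element wrapping the entries, and the helper names load_rules, plugins_for, and parse_with_fallback, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the two entries shown above. The wrapping
# <parse-plugins> root element is an assumption made for this sketch.
SAMPLE = """
<parse-plugins>
  <mimeType name="text/xml">
    <plugin id="parse-html" />
    <plugin id="endeca-searchexport-converter-parser" />
  </mimeType>
  <mimeType name="*">
    <plugin id="endeca-searchexport-converter-parser" />
  </mimeType>
</parse-plugins>
"""


def load_rules(xml_text):
    """Return {mime_type: [plugin ids in document order]}."""
    root = ET.fromstring(xml_text.strip())
    return {
        mt.get("name"): [p.get("id") for p in mt.findall("plugin")]
        for mt in root.iter("mimeType")
    }


def plugins_for(rules, mime_type):
    """Plugin ids to try, in order; unmatched types use the "*" entry."""
    return rules.get(mime_type, rules.get("*", []))


def parse_with_fallback(rules, mime_type, try_plugin):
    """Call try_plugin(plugin_id) in order; return the first id that succeeds.

    try_plugin is a hypothetical callback that returns True on success.
    Returns None if every listed plugin fails.
    """
    for plugin_id in plugins_for(rules, mime_type):
        if try_plugin(plugin_id):
            return plugin_id
    return None


rules = load_rules(SAMPLE)
```

For example, plugins_for(rules, "text/xml") yields the two plugins in the order listed in the file, while a MIME type with no entry of its own, such as application/pdf, resolves to the single plugin listed under the "*" entry.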

In general, you should not modify the contents of this file unless you have written your own parser plugin.

