The parse-plugins.xml file provides mappings of MIME types to parsers.
The mime-types.xml file has two purposes:
Note that the name of the parse-plugins.xml file is specified to the Web Crawler via the parse.plugin.file property in the default.xml configuration file.
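As a rough illustration of how that property can be looked up, the following Python sketch reads parse.plugin.file from a default.xml-style document. It assumes default.xml uses the Hadoop/Nutch-style <configuration>/<property>/<name>/<value> layout; the sample content, including the value parse-plugins.xml, is an assumption for illustration and is not copied from a shipped file.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ASSUMPTION: default.xml follows the Hadoop/Nutch <property> layout.
# This sample is illustrative only, not the shipped default.xml.
SAMPLE_DEFAULT_XML = """
<configuration>
  <property>
    <name>parse.plugin.file</name>
    <value>parse-plugins.xml</value>
  </property>
</configuration>
"""


def get_property(config_xml: str, name: str) -> Optional[str]:
    """Return the <value> text of the <property> whose <name> matches, or None."""
    root = ET.fromstring(config_xml.strip())
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


if __name__ == "__main__":
    # Prints the file name the crawler would load parser mappings from.
    print(get_property(SAMPLE_DEFAULT_XML, "parse.plugin.file"))
```

Changing the <value> in such a file is how a different mapping file name would be supplied, under the layout assumed above.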
This entry from the file shows how these parsing rules are set:
<mimeType name="text/xml">
  <plugin id="parse-html" />
  <plugin id="endeca-searchexport-converter-parser" />
</mimeType>
In this entry, the HtmlParser plugin (parse-html) is invoked first for the text/xml MIME type. If that plugin parses the content successfully, parsing is finished. If it fails, the endeca-searchexport-converter-parser plugin is invoked.
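The ordered fallback described above can be sketched in Python. The plugin ids come from the text/xml entry, but the two parser functions are hypothetical stand-ins: here "success" simply means returning a non-None result, and the HTML stub only succeeds when it sees an <html> tag. The real plugins decide success in their own way.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A parser takes raw content and returns parsed text, or None on failure.
Parser = Callable[[bytes], Optional[str]]


def parse_html_stub(content: bytes) -> Optional[str]:
    # HYPOTHETICAL stand-in for parse-html: succeeds only if an <html> tag is present.
    text = content.decode("utf-8", errors="replace")
    return text if "<html" in text.lower() else None


def converter_stub(content: bytes) -> Optional[str]:
    # HYPOTHETICAL stand-in for endeca-searchexport-converter-parser: always succeeds.
    return content.decode("utf-8", errors="replace")


PLUGINS: Dict[str, Parser] = {
    "parse-html": parse_html_stub,
    "endeca-searchexport-converter-parser": converter_stub,
}

# Ordered plugin list mirroring the <mimeType name="text/xml"> entry.
CHAIN: List[str] = ["parse-html", "endeca-searchexport-converter-parser"]


def parse_with_fallback(
    content: bytes, chain: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each plugin in order and stop at the first one that succeeds.

    Returns (plugin_id, parsed_text), or (None, None) if every plugin fails.
    """
    for plugin_id in chain:
        result = PLUGINS[plugin_id](content)
        if result is not None:
            return plugin_id, result
    return None, None
```

For example, parse_with_fallback(b"<html><body>hi</body></html>", CHAIN) stops at parse-html, while parse_with_fallback(b"<doc>hi</doc>", CHAIN) falls through to endeca-searchexport-converter-parser.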
Note that the following entry uses the * wildcard:
<mimeType name="*">
  <plugin id="endeca-searchexport-converter-parser" />
</mimeType>
It indicates that the endeca-searchexport-converter-parser plugin is invoked for any MIME type that is not matched by another entry.
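A minimal Python sketch of how the mappings and the * catch-all could be resolved follows. It loads the two entries quoted in this section, wrapped in a <parse-plugins> root element (the root name is an assumption), and uses exact-match lookup only; the crawler's actual matching rules may be more elaborate.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The two entries quoted in this section, wrapped in an ASSUMED
# <parse-plugins> root element so that the snippet is well-formed XML.
PARSE_PLUGINS_XML = """<parse-plugins>
  <mimeType name="text/xml">
    <plugin id="parse-html" />
    <plugin id="endeca-searchexport-converter-parser" />
  </mimeType>
  <mimeType name="*">
    <plugin id="endeca-searchexport-converter-parser" />
  </mimeType>
</parse-plugins>"""


def load_mappings(xml_text: str) -> Dict[str, List[str]]:
    """Map each mimeType name to its ordered list of plugin ids."""
    root = ET.fromstring(xml_text)
    return {
        mt.get("name"): [p.get("id") for p in mt.findall("plugin")]
        for mt in root.findall("mimeType")
    }


def plugins_for(mime_type: str, mappings: Dict[str, List[str]]) -> List[str]:
    """Return the plugins for an exact match, else those of the "*" entry."""
    return mappings.get(mime_type, mappings.get("*", []))


if __name__ == "__main__":
    mappings = load_mappings(PARSE_PLUGINS_XML)
    print(plugins_for("text/xml", mappings))         # matched entry
    print(plugins_for("application/pdf", mappings))  # falls back to "*"
```

Keeping the plugin order from the file matters, because the first plugin in each list is the one tried first.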
In general, you should not modify the contents of this file unless you have written your own parser plugin.