Overview of the plugin.xml file

The plugin.xml file describes the plugin to the Web Crawler. The file resides in the plugin directory along with the JAR file.

The following is the plugin.xml file that is included with the HTMLMetatagFilter project:
<?xml version="1.0" encoding="UTF-8"?>
<plugin
    id="filter-htmlmetatags"
    name=""
    version="1.0"
    provider-name="com.endeca.eidi.web">
    <runtime>
        <library name="filter-htmlmetatags.jar">
            <export name="*"/>
        </library>
    </runtime>
    <requires>
        <import plugin="nutch-extensionpoints"/>
    </requires>
    <extension id="com.endeca.eidi.web.parse.HTMLMetatagFilter"
        name="HTML Metatag filter"
        point="org.apache.nutch.parse.ParseFilter">
        <implementation id="filter-htmlmetatags"
            class="com.endeca.eidi.web.parse.HTMLMetatagFilter">
        </implementation>
    </extension>
</plugin>
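The file defines the name of the JAR (filter-htmlmetatags.jar, in the <library> element), the extension point that the plugin extends (org.apache.nutch.parse.ParseFilter, in the point attribute of the <extension> element), and the implementing class (com.endeca.eidi.web.parse.HTMLMetatagFilter, in the class attribute of the <implementation> element). It also sets the ID of the plugin (the id attribute of the <plugin> element); you specify this ID in the configuration file, as shown later.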
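
To make the mapping between the descriptor and these four values concrete, the following is a minimal sketch that reads a plugin.xml document with the standard JDK DOM parser and prints the plugin ID, JAR name, extension point, and implementing class. The class PluginDescriptorSummary and its summarize method are hypothetical names introduced for illustration only; they are not part of the Web Crawler or Nutch APIs, and the Web Crawler does not necessarily read the file this way. The embedded sample is a shortened copy of the listing above, written with single-quoted attributes to avoid escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the Web Crawler or Nutch APIs.
class PluginDescriptorSummary {

    // Shortened copy of the HTMLMetatagFilter plugin.xml shown above.
    static final String SAMPLE =
          "<?xml version='1.0' encoding='UTF-8'?>"
        + "<plugin id='filter-htmlmetatags' name='' version='1.0'"
        + "        provider-name='com.endeca.eidi.web'>"
        + "  <runtime>"
        + "    <library name='filter-htmlmetatags.jar'><export name='*'/></library>"
        + "  </runtime>"
        + "  <requires><import plugin='nutch-extensionpoints'/></requires>"
        + "  <extension id='com.endeca.eidi.web.parse.HTMLMetatagFilter'"
        + "             name='HTML Metatag filter'"
        + "             point='org.apache.nutch.parse.ParseFilter'>"
        + "    <implementation id='filter-htmlmetatags'"
        + "                    class='com.endeca.eidi.web.parse.HTMLMetatagFilter'/>"
        + "  </extension>"
        + "</plugin>";

    // Parses the descriptor and collects the four values the text describes.
    static Map<String, String> summarize(String pluginXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
            Element plugin = doc.getDocumentElement();

            Map<String, String> summary = new LinkedHashMap<>();
            summary.put("pluginId", plugin.getAttribute("id"));
            summary.put("jar", firstAttribute(plugin, "library", "name"));
            summary.put("extensionPoint", firstAttribute(plugin, "extension", "point"));
            summary.put("implementationClass",
                        firstAttribute(plugin, "implementation", "class"));
            return summary;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse plugin.xml", e);
        }
    }

    // Returns the named attribute of the first matching descendant element,
    // or an empty string if no such element exists.
    private static String firstAttribute(Element root, String tag, String attr) {
        NodeList nodes = root.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return ((Element) nodes.item(0)).getAttribute(attr);
    }

    public static void main(String[] args) {
        summarize(SAMPLE).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running the sketch against your own plugin.xml (by passing the file contents to summarize) is a quick way to confirm that the plugin ID you intend to put in the configuration file matches the id attribute of the <plugin> element.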