The plugin.xml file describes the plug-in to the Web Crawler. The file resides in the plug-in directory along with the JAR file.

The following is the plugin.xml file that is included with the HTMLMetatagFilter project:

<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="filter-htmlmetatags"
   name=""
   version="1.0"
   provider-name="com.endeca.itl.web">
   <runtime>
      <library name="filter-htmlmetatags.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension id="com.endeca.itl.web.parse.HTMLMetatagFilter"
              name="HTML Metatag filter"
              point="org.apache.nutch.parse.ParseFilter">
      <implementation id="filter-htmlmetatags"
                      class="com.endeca.itl.web.parse.HTMLMetatagFilter">
      </implementation>
   </extension>
</plugin>

The file defines the name of the JAR file (filter-htmlmetatags.jar), the extension point (ParseFilter), and the implementing class (HTMLMetatagFilter). It also sets the ID of the plug-in through the id attribute of the <plugin> element; you specify this ID in the configuration file, as shown later.
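
To check these values before deploying a plug-in, you can read them back from the descriptor with the standard Java XML parser. The following is only a minimal sketch: the PluginDescriptorCheck class and its summarize method are hypothetical names used for illustration and are not part of the Web Crawler or Nutch APIs, and the embedded string is a condensed copy of the plugin.xml file shown above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class (not part of the Web Crawler or Nutch APIs).
class PluginDescriptorCheck {

    // Condensed copy of the plugin.xml listing above, used as sample input.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<plugin id=\"filter-htmlmetatags\" name=\"\" version=\"1.0\""
      + " provider-name=\"com.endeca.itl.web\">"
      + "<runtime><library name=\"filter-htmlmetatags.jar\">"
      + "<export name=\"*\"/></library></runtime>"
      + "<requires><import plugin=\"nutch-extensionpoints\"/></requires>"
      + "<extension id=\"com.endeca.itl.web.parse.HTMLMetatagFilter\""
      + " name=\"HTML Metatag filter\""
      + " point=\"org.apache.nutch.parse.ParseFilter\">"
      + "<implementation id=\"filter-htmlmetatags\""
      + " class=\"com.endeca.itl.web.parse.HTMLMetatagFilter\"/>"
      + "</extension></plugin>";

    // Extracts the four values the descriptor is expected to define:
    // plug-in ID, JAR name, extension point, and implementing class.
    static Map<String, String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        Element plugin = doc.getDocumentElement();
        Element library =
            (Element) plugin.getElementsByTagName("library").item(0);
        Element extension =
            (Element) plugin.getElementsByTagName("extension").item(0);
        Element implementation =
            (Element) extension.getElementsByTagName("implementation").item(0);

        Map<String, String> summary = new LinkedHashMap<>();
        summary.put("pluginId", plugin.getAttribute("id"));
        summary.put("jar", library.getAttribute("name"));
        summary.put("extensionPoint", extension.getAttribute("point"));
        summary.put("implementationClass", implementation.getAttribute("class"));
        return summary;
    }

    public static void main(String[] args) throws Exception {
        // Print each extracted value on its own line.
        summarize(SAMPLE).forEach((key, value) ->
            System.out.println(key + " = " + value));
    }
}
```

In practice you would pass the contents of the plugin.xml file from your plug-in directory instead of the embedded sample, and compare the reported plug-in ID with the one you enter in the configuration file.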

