The plugin.xml file describes the plug-in to the Web Crawler. The file resides in the plug-in directory along with the JAR file.
The following is the plugin.xml file that is included with the HTMLMetatagFilter project:
<?xml version="1.0" encoding="UTF-8"?>
<plugin id="filter-htmlmetatags" name="" version="1.0" provider-name="com.endeca.itl.web">
   <runtime>
      <library name="filter-htmlmetatags.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension id="com.endeca.itl.web.parse.HTMLMetatagFilter"
              name="HTML Metatag filter"
              point="org.apache.nutch.parse.ParseFilter">
      <implementation id="filter-htmlmetatags"
                      class="com.endeca.itl.web.parse.HTMLMetatagFilter">
      </implementation>
   </extension>
</plugin>
The file defines the name of the JAR (filter-htmlmetatags.jar), the name of the extension point (ParseFilter), and the name of the implementing class (HTMLMetatagFilter). It also sets the ID of the plug-in (with the id attribute of the <plugin> element); you set this ID in the configuration file, as shown later.
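To check which values a given plugin.xml actually declares, the descriptor can be read with the standard Java XML parser. The following is only a minimal sketch: the class name PluginDescriptorSummary and its summarize method are hypothetical helpers written for illustration, not part of the Web Crawler or Nutch APIs, and the sketch assumes that each of the elements it reads appears once in the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper (not part of the Web Crawler API): extracts the
// values from a plugin.xml descriptor that the text above describes.
public class PluginDescriptorSummary {

    // The plugin.xml shipped with the HTMLMetatagFilter project.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<plugin id=\"filter-htmlmetatags\" name=\"\" version=\"1.0\""
        + " provider-name=\"com.endeca.itl.web\">"
        + "<runtime><library name=\"filter-htmlmetatags.jar\">"
        + "<export name=\"*\"/></library></runtime>"
        + "<requires><import plugin=\"nutch-extensionpoints\"/></requires>"
        + "<extension id=\"com.endeca.itl.web.parse.HTMLMetatagFilter\""
        + " name=\"HTML Metatag filter\""
        + " point=\"org.apache.nutch.parse.ParseFilter\">"
        + "<implementation id=\"filter-htmlmetatags\""
        + " class=\"com.endeca.itl.web.parse.HTMLMetatagFilter\">"
        + "</implementation></extension></plugin>";

    // Returns the named attribute of the first element with the given tag.
    // Assumes the element exists; a missing element raises an exception.
    private static String attr(Document doc, String tag, String attribute) {
        Element element = (Element) doc.getElementsByTagName(tag).item(0);
        return element.getAttribute(attribute);
    }

    // Parses the descriptor text and collects the plug-in ID, JAR name,
    // extension point, and implementing class in a fixed order.
    public static Map<String, String> summarize(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> summary = new LinkedHashMap<>();
        summary.put("pluginId", doc.getDocumentElement().getAttribute("id"));
        summary.put("jar", attr(doc, "library", "name"));
        summary.put("extensionPoint", attr(doc, "extension", "point"));
        summary.put("implementationClass", attr(doc, "implementation", "class"));
        return summary;
    }

    public static void main(String[] args) throws Exception {
        // Print one "key = value" line per extracted field.
        summarize(SAMPLE).forEach((key, value) ->
            System.out.println(key + " = " + value));
    }
}
```

Note that the descriptor stores the fully qualified names (org.apache.nutch.parse.ParseFilter and com.endeca.itl.web.parse.HTMLMetatagFilter), whereas the text above refers to them by their short names.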