The plugin.xml file describes the plugin to the Web Crawler. The file resides in the plugin directory along with the JAR file.
<?xml version="1.0" encoding="UTF-8"?>
<plugin id="filter-htmlmetatags" name="" version="1.0" provider-name="com.endeca.eidi.web">
   <runtime>
      <library name="filter-htmlmetatags.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension id="com.endeca.eidi.web.parse.HTMLMetatagFilter"
              name="HTML Metatag filter"
              point="org.apache.nutch.parse.ParseFilter">
      <implementation id="filter-htmlmetatags"
                      class="com.endeca.eidi.web.parse.HTMLMetatagFilter">
      </implementation>
   </extension>
</plugin>

The file defines the name of the JAR (filter-htmlmetatags.jar), the name of the extension point (ParseFilter), and the name of the implementing class (HTMLMetatagFilter). It also sets the ID of the plugin (with the <plugin id> attribute); you set this ID in the configuration file, as shown later.
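
To make the link between the plugin ID and the configuration file concrete, here is a minimal sketch of how the ID might be referenced. It assumes the Web Crawler follows the Apache Nutch convention of listing active plugins in a plugin.includes property whose value is a regular expression of plugin IDs. The neighboring entries (protocol-http, parse-html) are illustrative placeholders, not the Web Crawler's actual defaults, and the exact file and default value in your installation may differ.

```xml
<!-- Sketch only: assumes a Nutch-style plugin.includes property. -->
<property>
   <name>plugin.includes</name>
   <!-- Other entries are illustrative; filter-htmlmetatags is the ID
        taken from the id attribute of <plugin> in plugin.xml. -->
   <value>protocol-http|parse-html|filter-htmlmetatags</value>
</property>
```

The value must match the id attribute of the <plugin> element exactly. If the two strings differ, a Nutch-style plugin loader does not activate the plugin.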