[/map {"- map/map "}) [/map/topicref {"- map/topicref "}) [/map/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicmeta/navtitle {"- topic/navtitle "}) Running the Sample Web Crawler Plugin (navtitle][/map/topicref/topicmeta/linktext {"- map/linktext "}) Running the Sample Web Crawler Plugin (linktext][/map/topicref/topicmeta/shortdesc {"- map/shortdesc "}) This section provides instructions for running the sample Web Crawler plugin, a custom parse filter plugin that adds HTML meta tags as additional properties to the output records. (shortdesc] (topicmeta][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) About the Web Crawler plugin framework (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) About the Web Crawler plugin framework (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) The Endeca Web Crawler is based on the Apache Nutch open-source project. As a result, its major functionality is implemented as plugins. Its framework allows you to write your own plugins, such as plugins that extract additional content from Web pages. (shortdesc] (topicmeta][/map/topicref/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) How the Web Crawler processes URLs (navtitle][/map/topicref/topicref/topicref/topicmeta/linktext {"- map/linktext "}) How the Web Crawler processes URLs (linktext][/map/topicref/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) Knowing how the Web Crawler processes URLs helps you understand where a new plugin fits in, because the URL processing is accomplished by a series of plugins. 
(shortdesc] (topicmeta] (topicref] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) About the sample custom filter plugin (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) About the sample custom filter plugin (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) Custom filters (ParseFilter) implement content extensions. These filters can examine the contents of a page (either the raw page contents or the parsed DOM) and add additional properties to records that are produced. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Adding a custom plugin to the Endeca Web Crawler (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Adding a custom plugin to the Endeca Web Crawler (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) This topic provides an overview of how to add a custom plugin to the Endeca Web Crawler. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Opening the sample plugin project (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Opening the sample plugin project (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) For the purpose of this sample, you load the sample parse filter plugin project. If you were creating your own plugin, you would create your own Eclipse project. 
(shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Overview of the sample HTMLMetatagFilter plugin (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Overview of the sample HTMLMetatagFilter plugin (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) For the purpose of this sample, we use the source for the HTMLMetatagFilter class that is in the HTMLMetatagFilter.java source file (in the IAS\<version>\sample\custom-web-crawler-plugin\src directory). If you were writing your own plugin, you would write the code for your custom plugin. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Overview of the plugin.xml file (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Overview of the plugin.xml file (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) The plugin.xml file describes the plugin to the Web Crawler. The file resides in the plugin directory along with the JAR file. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Building the sample plugin (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Building the sample plugin (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) For the purpose of this sample, use Eclipse to build a JAR of the sample Web Crawler parse plugin. 
(shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Adding the plugin to the IAS lib directory (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Adding the plugin to the IAS lib directory (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) After you build the JAR for your custom plugin, create a directory for the plugin and copy it to the Web Crawler's plugin directory. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Activating the plugin for the Web Crawler (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Activating the plugin for the Web Crawler (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) Oracle recommends that you modify the crawl-specific site.xml file rather than the global default.xml file, because the site.xml settings override the global default.xml settings. (shortdesc] (topicmeta] (topicref][/map/topicref/topicref {"- map/topicref "}) [/map/topicref/topicref/topicmeta {"- map/topicmeta "}) [/map/topicref/topicref/topicmeta/navtitle {"- topic/navtitle "}) Running the Web Crawler with a new plugin (navtitle][/map/topicref/topicref/topicmeta/linktext {"- map/linktext "}) Running the Web Crawler with a new plugin (linktext][/map/topicref/topicref/topicmeta/shortdesc {"- map/shortdesc "}) After you activate the new plugin, you can run new crawls exactly as before. (shortdesc] (topicmeta] (topicref] (topicref] (map]