The Oracle Commerce Web Crawler is based on the Apache Nutch open-source project, so its major functionality is implemented as plug-ins. The same framework lets you write your own plug-ins, for example plug-ins that extract additional content from Web pages.

The sample plug-in demonstrates how to integrate custom plug-ins into the Web Crawler. The Oracle Commerce Web Crawler APIs contain sample code and documentation to help you create your own plug-ins.

All plug-ins (including the default plug-ins and user-created plug-ins) reside in the CAS/version/lib/web-crawler/plugins directory. Each individual plug-in directory contains one or more JAR files and a plug-in descriptor file (named plugin.xml).
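The directory layout described above can be checked with a few lines of Java. The following is a minimal sketch, not part of the Web Crawler APIs: the class name `PluginLayoutCheck` and its methods are hypothetical, and the only facts it relies on are that each plug-in directory should hold a `plugin.xml` descriptor and at least one JAR file. The path to the plugins directory is passed as an argument, because it depends on your installed CAS version.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: verifies that each plug-in directory follows the
// documented layout (a plugin.xml descriptor plus one or more JAR files).
class PluginLayoutCheck {

    // True if the directory has a plugin.xml file and at least one *.jar file.
    static boolean isCompletePluginDir(Path dir) throws IOException {
        if (!Files.isRegularFile(dir.resolve("plugin.xml"))) {
            return false;
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            return jars.iterator().hasNext();
        }
    }

    // Returns the sorted names of subdirectories that lack a descriptor or a JAR.
    static List<String> findIncompletePlugins(Path pluginsRoot) throws IOException {
        List<String> incomplete = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(pluginsRoot)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry) && !isCompletePluginDir(entry)) {
                    incomplete.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(incomplete);
        return incomplete;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java PluginLayoutCheck <path-to-plugins-directory>");
            return;
        }
        // The argument is the web-crawler/plugins directory of your installation.
        for (String name : findIncompletePlugins(Paths.get(args[0]))) {
            System.out.println("Incomplete plug-in directory: " + name);
        }
    }
}
```

Running the check after copying a custom plug-in into the plugins directory is a quick way to catch a missing descriptor or JAR before starting a crawl. It does not validate the contents of `plugin.xml` itself.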

