The Oracle Commerce Web Crawler is based on the Apache Nutch open-source project. As a result, most of its major functionality is implemented as plug-ins. The plug-in framework also lets you write your own plug-ins, for example, plug-ins that extract additional content from Web pages.
The sample plug-in demonstrates how to integrate custom plug-ins into the Web Crawler. The Oracle Commerce Web Crawler APIs contain sample code and documentation to help you create your own plug-ins.
All plug-ins (including the default plug-ins and user-created plug-ins) reside in the CAS/version/lib/web-crawler/plugins directory. Each individual plug-in directory contains one or more JAR files and a plug-in descriptor file (named plugin.xml).
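Because each plug-in is expected to occupy its own directory holding a plugin.xml descriptor and at least one JAR file, a custom plug-in can be checked against that layout before the crawler is restarted. The following is a minimal Java sketch of such a check. It is not part of the Oracle Commerce Web Crawler APIs: the class PluginLayoutCheck and its methods are hypothetical names, and the sketch assumes only the layout described above. It does not validate the contents of plugin.xml.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not an Oracle Commerce API): reports plug-in
 * directories that lack a plugin.xml descriptor or any JAR file.
 */
public class PluginLayoutCheck {

    /** Returns true if dir contains a plugin.xml file and at least one regular *.jar file. */
    static boolean hasValidLayout(Path dir) throws IOException {
        if (!Files.isRegularFile(dir.resolve("plugin.xml"))) {
            return false;
        }
        // Accept only regular files whose names end in ".jar".
        DirectoryStream.Filter<Path> jarFilter =
                p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".jar");
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, jarFilter)) {
            return jars.iterator().hasNext();
        }
    }

    /** Returns the sorted names of subdirectories of pluginsDir whose layout is incomplete. */
    static List<String> findInvalidPlugins(Path pluginsDir) throws IOException {
        List<String> invalid = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(pluginsDir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry) && !hasValidLayout(entry)) {
                    invalid.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(invalid); // directory iteration order is unspecified
        return invalid;
    }

    public static void main(String[] args) throws IOException {
        // Expects the plugins directory, e.g. CAS/version/lib/web-crawler/plugins
        if (args.length != 1) {
            System.err.println("usage: java PluginLayoutCheck <plugins-dir>");
            System.exit(2);
        }
        List<String> invalid = findInvalidPlugins(Paths.get(args[0]));
        if (invalid.isEmpty()) {
            System.out.println("All plug-in directories contain plugin.xml and a JAR file.");
        } else {
            System.out.println("Incomplete plug-in directories: " + invalid);
        }
    }
}
```

A directory-level check like this is deliberately shallow: it catches a missing descriptor or a forgotten JAR, but whether the crawler actually loads the plug-in still depends on what plugin.xml declares.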