You can examine the configuration and operation of the Web Crawler by running a sample Web crawl. The sample is located in the <install path>\IAS\workspace\conf\web-crawler\polite-crawl directory.
The sample crawls http://www.oracle.com with a pre-configured seed file (endeca.lst) in the <install path>\IAS\workspace\conf\web-crawler\default directory.
The sample crawl is configured to write its records as uncompressed XML, so you can open the output file and confirm that the crawl collected records. The site.xml file also specifies polite-crawl-workspace as the name of the workspace directory.
To run the sample crawl:
When the crawl finishes, the Web Crawler displays the message: Crawl complete. The output file, polite-crawl.xml, is written to the <install path>\IAS\<version>\bin\polite-crawl-workspace\output directory.
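Because the output is uncompressed XML, one quick way to confirm that the crawl collected records is to tally the elements in polite-crawl.xml. The following Python sketch is only an illustration and is not part of the Web Crawler. The function name count_tags is hypothetical, and because the element names used in the output file are not described here, the sketch makes no assumption about them and simply counts every tag it encounters.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_path):
    """Stream through an XML file and tally how often each element tag occurs.

    Hypothetical helper for inspecting crawl output; it assumes nothing
    about the element names the Web Crawler writes.
    """
    counts = Counter()
    # iterparse reads the file incrementally instead of loading it all at once.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children, text, and attributes once counted
    return counts
```

Calling count_tags with the full path to polite-crawl.xml and printing the result of .most_common() lists the tags by frequency. A file that contains only the root element suggests that the crawl wrote no records.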