This topic provides an overview of full crawls.
A full crawl means that the crawler processes all the pages in the seeds, except pages that are excluded by filters. As part of the full crawl, a crawl history database is created in the workspace directory of the crawl; it stores metadata about the URLs that the crawl processes.
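As a minimal sketch, a crawl history database of this kind could be created with SQLite as shown below. The file name crawl_history.db, the table schema, and the status values are assumptions for illustration, not the crawler's actual layout.

    import os
    import sqlite3

    def create_crawl_db(workspace_dir):
        """Create a crawl history database in the crawl's workspace
        directory. The schema here is hypothetical, for illustration only."""
        db_path = os.path.join(workspace_dir, "crawl_history.db")
        conn = sqlite3.connect(db_path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS crawl_history (
                   url     TEXT PRIMARY KEY,  -- the URL itself
                   status  TEXT NOT NULL,     -- e.g. 'pending' or 'fetched'
                   fetched TIMESTAMP          -- when the page was processed
               )"""
        )
        conn.commit()
        return conn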
The crawl database provides persistence, so its history can later be used to resume a crawl. For example, if the user stops a full crawl by pressing Control-C in the command window, the crawler closes the database files before exiting. If the crawl is later resumed (with the -r flag), the resumed crawl begins with the first URL whose status is pending.
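The stop-and-resume behavior described above could work roughly as follows. In this sketch, a SIGINT handler stands in for Control-C, and the function and callback names (run_crawl, fetch_page) are hypothetical:

    import signal
    import sys

    def run_crawl(conn, fetch_page):
        """Process URLs whose status is 'pending'. On Control-C (SIGINT),
        close the database files cleanly so a later run with -r can
        resume. fetch_page is a hypothetical per-URL download callback."""

        def on_interrupt(signum, frame):
            conn.close()  # close the database files before exiting
            sys.exit(0)

        signal.signal(signal.SIGINT, on_interrupt)

        while True:
            # A resumed crawl begins with the first pending URL.
            row = conn.execute(
                "SELECT url FROM crawl_history WHERE status = 'pending' "
                "ORDER BY rowid LIMIT 1"
            ).fetchone()
            if row is None:
                break  # nothing left to process; the crawl is complete
            url = row[0]
            fetch_page(url)
            conn.execute(
                "UPDATE crawl_history SET status = 'fetched', "
                "fetched = CURRENT_TIMESTAMP WHERE url = ?",
                (url,),
            )
            conn.commit()
        conn.close()

Because each URL's status change is committed as it is processed, interrupting the run at any point leaves a consistent history on disk, and a later invocation of run_crawl simply picks up at the first remaining pending URL.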