This topic provides an overview of resumable crawls.

A resumable crawl (also called a restartable crawl) is a crawl that reuses the seed URLs of a previous full or resumed crawl, but runs with a greater depth level, a different set of configuration settings, or both.

Use the -r (or --resume) command-line flag to resume a crawl. A resumed crawl reads the crawl history database that the previous crawl created in the workspace directory, because that database provides the seed and the list of URLs that have already been crawled. URLs that have a status of complete in the history database are not recrawled.
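
The skip behavior described above can be illustrated with a minimal sketch. It is not the crawler's actual implementation: the real format of the history database is not described in this topic, so the HistoryEntry record, the plan_resume function, the "pending" status name, and the example URLs are all hypothetical. Only the "complete" status and the role of the seed come from the text.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """Hypothetical in-memory stand-in for one row of the crawl history database."""
    url: str
    status: str            # "complete" comes from the text; other values are assumed
    is_seed: bool = False  # assumed marker for URLs that were seeds of the earlier crawl


def plan_resume(history):
    """Return (seeds, urls_to_crawl) for a resumed crawl.

    Seeds are taken from the earlier crawl's history rather than supplied again,
    and any URL already marked "complete" is left out of the work list.
    """
    seeds = [entry.url for entry in history if entry.is_seed]
    urls_to_crawl = [entry.url for entry in history if entry.status != "complete"]
    return seeds, urls_to_crawl


# Example history left behind by a previous crawl (hypothetical data).
history = [
    HistoryEntry("http://example.com/", "complete", is_seed=True),
    HistoryEntry("http://example.com/a", "complete"),
    HistoryEntry("http://example.com/b", "pending"),
]

seeds, urls_to_crawl = plan_resume(history)
```

In this sketch only http://example.com/b remains in the work list, because the other two URLs are already marked complete.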

Among the possible use-case scenarios for resumable crawls are the following:

The rules for resumed crawls are the following:

