About removing invalid content

The differential crawl output contains a record for each document that existed in a previous crawl but has since disappeared (or is no longer valid because the parameters of the spider have changed). These records have an Endeca.Document.Status property equal to “Fetch Failed” or “Fetch Aborted” and must be removed from the output.

It is recommended to remove these records after the join, so that every reference to a document that no longer exists is eliminated and the final output can be used as input for the next run. To do this, place another record manipulator after the join.
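
The filtering rule applied by that record manipulator can be sketched in Python. This is only an illustration of the logic, not Endeca's record manipulator syntax or API. It assumes each joined record is modeled as a plain dictionary of property names to string values; the "id" key and the "Fetch Succeeded" value are placeholders chosen for the example.

```python
# Status values that mark a document as no longer valid (from the text above).
REMOVED_STATUSES = {"Fetch Failed", "Fetch Aborted"}


def drop_invalid(records):
    """Return only the records whose status does not mark them for removal."""
    return [
        record
        for record in records
        if record.get("Endeca.Document.Status") not in REMOVED_STATUSES
    ]


# Hypothetical output of the join: one dict per record.
joined = [
    {"id": "a", "Endeca.Document.Status": "Fetch Succeeded"},  # placeholder value
    {"id": "b", "Endeca.Document.Status": "Fetch Failed"},
    {"id": "c", "Endeca.Document.Status": "Fetch Aborted"},
]

kept = drop_invalid(joined)
kept_ids = [record["id"] for record in kept]
```

Because the filter runs on the joined records, no record for a removed document is carried into the output that feeds the next run.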

It is also recommended to remove records where the Endeca.Document.IsRedirection property exists and is true; these records typically have no value in a search index. This recommendation applies to all crawlers and is not a requirement for differential crawling. These records should also be removed after the join.
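
The redirection rule can be sketched the same way. Again, this is an illustration in Python rather than Endeca's own syntax. It assumes records are plain dictionaries and that the property value is stored as the string "true" (compared here without regard to case); both are assumptions, and the "id" key is a placeholder.

```python
def is_redirection(record):
    """True only when the property exists and its value is 'true'."""
    value = record.get("Endeca.Document.IsRedirection")
    return value is not None and str(value).strip().lower() == "true"


def drop_redirections(records):
    """Return the records that are not redirections."""
    return [record for record in records if not is_redirection(record)]


# Hypothetical joined records.
joined_records = [
    {"id": "a"},                                            # property absent: kept
    {"id": "b", "Endeca.Document.IsRedirection": "true"},   # removed
    {"id": "c", "Endeca.Document.IsRedirection": "false"},  # kept
]

remaining_ids = [record["id"] for record in drop_redirections(joined_records)]
```

A record that lacks the property is kept, which matches the condition that the property must both exist and be true.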