Refreshing Content from Content Crawlers

You can refresh metadata and import new content from content crawlers that have previously imported content.

If you are editing an existing content crawler, the Importing Documents section is displayed. In this section, specify whether to import only new documents. By default, the content crawler attempts to import only new documents (those that have not been previously imported by this content crawler or by other content crawlers that access the same content source). You can change this setting to import multiple copies of each document, which might be useful while testing your content crawlers. You can also specify whether the content metadata should be updated.

1. To import only new documents, select Import only new links. New options display.
   If you want to import all content again the next time this content crawler runs, leave the option unselected and skip the rest of the steps.
2. Specify what new links means:
   - To import only those documents that have not been previously imported by this content crawler, choose by this Content Crawler.
   - To import only those documents that have not been imported from the associated content source (either by this content crawler, another content crawler, or manually by a user), choose from this Content Source.
   Note: The option you choose here also applies to the rejection history and deletion history. For example, if you select from this Content Source, the rejection history includes content rejected by any content crawler that has crawled the content source.
3. To refresh the previously imported documents as specified on the Document Settings page, select refresh them. Generally, refreshing documents is the job of the Document Refresh Agent; refreshing documents slows the content crawler down. However, if you changed the document settings for this content crawler or changed the property mappings in the associated content types, refreshing documents updates these settings for the previously imported documents.
   Note: If you are crawling an RSS feed, the refresh them option refreshes the properties (such as the title and description) with the values from the target documents, not the RSS feed. If you want to retain the properties from the RSS feed, do not select refresh them.
4. If you created additional folders or applied different filters to destination folders, select try to sort them into additional folders to sort the previously imported documents into new Knowledge Directory folders. Another content crawler might have imported documents from the same content source but into different folders than the destination folders specified for this content crawler. Make sure you really want to re-sort those documents into the destination folders specified for this content crawler.
5. To re-import documents that were previously deleted (manually, due to expiration, or due to missing source documents), select regenerate deleted links.
   Note: This might re-import documents that were at one time deemed inappropriate for your portal.
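
The way these options interact can be easier to follow as a small decision function. The following Python sketch is only an illustrative model of the behavior described above, not the product's implementation or API: the names HistoryEntry, Options, Scope, matching_history, and decide are hypothetical, and the rule that a document counts as deleted when every matching history entry is marked deleted is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    # The two meanings of "new links" offered in the editor.
    BY_THIS_CRAWLER = "by this Content Crawler"
    FROM_THIS_SOURCE = "from this Content Source"


@dataclass(frozen=True)
class HistoryEntry:
    # Hypothetical record of one earlier import.
    url: str
    source_id: str
    imported_by: str  # crawler id, or "manual" for a user import
    deleted: bool = False


@dataclass(frozen=True)
class Options:
    # Hypothetical stand-ins for the check boxes described in the steps.
    import_only_new: bool = True
    scope: Scope = Scope.FROM_THIS_SOURCE
    refresh: bool = False
    resort: bool = False
    regenerate_deleted: bool = False


def matching_history(url, history, crawler_id, source_id, scope):
    """Return earlier imports of url that count under the chosen scope."""
    hits = [h for h in history if h.url == url and h.source_id == source_id]
    if scope is Scope.BY_THIS_CRAWLER:
        hits = [h for h in hits if h.imported_by == crawler_id]
    return hits


def decide(url, history, crawler_id, source_id, opts):
    """Return the set of actions taken for one document in this model."""
    if not opts.import_only_new:
        return {"import"}  # option unselected: everything is imported again
    hits = matching_history(url, history, crawler_id, source_id, opts.scope)
    if not hits:
        return {"import"}  # a new link under the chosen scope
    if all(h.deleted for h in hits):
        # Assumption: the document is only known through the deletion history.
        return {"import"} if opts.regenerate_deleted else set()
    actions = set()
    if opts.refresh:
        actions.add("refresh")
    if opts.resort:
        actions.add("sort")
    return actions


# A document already imported from source "src1" by another crawler.
history = [HistoryEntry("http://example.com/a.html", "src1", "crawlerB")]
by_crawler = decide("http://example.com/a.html", history, "crawlerA", "src1",
                    Options(scope=Scope.BY_THIS_CRAWLER))
by_source = decide("http://example.com/a.html", history, "crawlerA", "src1",
                   Options(scope=Scope.FROM_THIS_SOURCE))
```

In this model, by_crawler contains "import" because crawlerA has never imported the document itself, while by_source is empty because the document has already been imported from the same content source.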

If absolutely necessary, you can delete the history of documents that have been deleted from the portal. Remember that the scope of the deletion history depends on how you defined new links in Step 2 (by this Content Crawler or from this Content Source).

- If you chose by this Content Crawler, the history includes all documents imported by this content crawler that have been deleted.
- If you chose from this Content Source, the history includes all documents imported from this content source that have been deleted. Therefore, you are deleting the history for all content crawlers that import documents from this content source.

If you are still sure that you must delete the record of documents deleted from the portal, click Clear Deletion History.
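
To make the difference in scope concrete, the following Python sketch models which deletion records would be cleared under each choice. It is a simplified illustration under stated assumptions, not the portal's actual data model: Record, clear_deletion_history, and the scope strings "crawler" and "source" are hypothetical names.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical history record for one imported document.
    url: str
    source_id: str
    imported_by: str
    deleted: bool


def clear_deletion_history(records, crawler_id, source_id, scope):
    """Return the records that remain after clearing the deletion history.

    scope is "crawler" (by this Content Crawler) or "source"
    (from this Content Source); both strings are illustrative.
    """
    def in_scope(r):
        # Only deletion records for this content source are candidates.
        if not r.deleted or r.source_id != source_id:
            return False
        # "source" clears records from every crawler on the source.
        return scope == "source" or r.imported_by == crawler_id

    return [r for r in records if not in_scope(r)]


records = [
    Record("http://example.com/a.html", "src1", "crawlerA", True),
    Record("http://example.com/b.html", "src1", "crawlerB", True),
    Record("http://example.com/c.html", "src1", "crawlerB", False),
]
left_by_crawler = clear_deletion_history(records, "crawlerA", "src1", "crawler")
left_by_source = clear_deletion_history(records, "crawlerA", "src1", "source")
```

In this model, clearing with the "crawler" scope leaves crawlerB's deletion record in place, whereas the "source" scope also removes it, which is why the second choice affects every content crawler that imports from the content source.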

AquaLogic Interaction Administrator Guide
