Refreshing Content from Content Crawlers
You can refresh metadata and import new content from content
crawlers that have previously imported content.
When you edit an existing content crawler, the Importing Documents
section appears.
Under Importing Documents, specify whether to import only new documents.
By default, the content crawler attempts to import only new documents:
those that have not been previously imported by this content crawler or
by other content crawlers that access the same content source. You can
change this setting to import multiple copies of each document, which
might be useful while testing your content crawlers. You can also
specify whether the content metadata should be updated.
- To import only new documents, select Import only new links.
New options appear.
If you want to import all content again the next time this content
crawler runs, leave the option unselected and skip the rest of the steps.
- Specify what new links means:
  - To import only those documents that have not been previously
    imported by this content crawler, choose by this Content Crawler.
  - To import only those documents that have not been imported from the
    associated content source (either by this content crawler, another
    content crawler, or manually by a user), choose from this Content
    Source.
Note: The option you choose here also applies to the rejection history
and deletion history. For example, if you select from this Content
Source, the rejection history includes content rejected by any content
crawler that has crawled the content source.
- To refresh the previously imported documents as specified on the
Document Settings page, select refresh them.
Refreshing documents is generally the job of the Document Refresh Agent,
and refreshing them here slows the content crawler down. However, if you
changed the document settings for this content crawler or changed the
property mappings in the associated content types, refreshing documents
applies these changes to the previously imported documents.
Note: If you are crawling an RSS feed, the refresh them option refreshes
the properties (such as the title and description) with the values from
the target documents, not the RSS feed. If you want to retain the
properties from the RSS feed, do not select refresh them.
- If you created additional folders or applied different filters to
destination folders, select try to sort them into additional folders to
sort the previously imported documents into new Knowledge Directory
folders.
Another content crawler might have imported documents from the same
content source but into different folders than the destination folders
specified for this content crawler. Confirm that you want to re-sort
those documents into the destination folders specified for this content
crawler.
- To re-import documents that were previously deleted (manually, due to
expiration, or due to missing source documents), select regenerate
deleted links.
Note: This might re-import documents that were at one time deemed
inappropriate for your portal.
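
The way the new links scope interacts with the regenerate deleted links option can be sketched in a few lines of Python. This is only an illustrative model, not the portal's implementation: the names Scope, HistoryEntry, in_scope, and should_import are hypothetical, and the sketch assumes that a document found in the in-scope import, rejection, or deletion history is skipped unless deleted documents are being regenerated.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """The two choices offered for what "new links" means."""
    BY_THIS_CRAWLER = "by this Content Crawler"
    FROM_THIS_SOURCE = "from this Content Source"


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical history record; one entry per document and crawler."""
    url: str      # location of the document in the content source
    source: str   # content source the document came from
    crawler: str  # crawler that handled it ("" for a manual import)
    state: str    # "imported", "rejected", or "deleted"


def in_scope(entry, crawler, source, scope):
    """Return True if a history entry counts under the chosen scope."""
    if entry.source != source:
        return False
    if scope is Scope.BY_THIS_CRAWLER:
        return entry.crawler == crawler
    return True  # FROM_THIS_SOURCE: any crawler or a manual import


def should_import(url, crawler, source, scope, history,
                  regenerate_deleted=False):
    """Decide whether a crawled document is treated as a new link."""
    for entry in history:
        if entry.url != url or not in_scope(entry, crawler, source, scope):
            continue
        if entry.state == "deleted" and regenerate_deleted:
            continue  # assumption: deleted documents may be re-imported
        return False  # already imported, rejected, or deleted in scope
    return True
```

Under this model, a document imported by another content crawler counts as new when the scope is by this Content Crawler, but not when the scope is from this Content Source.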
If absolutely necessary, you can delete the history of documents that
have been deleted from the portal. The scope of the deletion history is
determined by what you chose when you specified what new links means:
- If you chose by this Content Crawler, the history includes all
documents imported by this content crawler that have been deleted.
- If you chose from this Content Source, the history includes all
documents imported from this content source that have been deleted.
Clearing it therefore deletes the history for all content crawlers that
import documents from this content source.
If you are still sure that you must delete the record of documents
deleted from the portal, click Clear Deletion History.
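
To see how far Clear Deletion History reaches under each scope, the following Python sketch may help. It is a simplified model under stated assumptions, not the portal's code: the Deleted record and the clear_deletion_history function are hypothetical names, and the scope is passed as "crawler" (by this Content Crawler) or "source" (from this Content Source).

```python
from typing import NamedTuple


class Deleted(NamedTuple):
    """Hypothetical record of a document deleted from the portal."""
    url: str      # document that was deleted
    source: str   # content source it was imported from
    crawler: str  # content crawler that imported it


def clear_deletion_history(records, crawler, source, scope):
    """Return (kept, cleared) deletion records for the chosen scope."""
    if scope not in ("crawler", "source"):
        raise ValueError(f"unknown scope: {scope!r}")
    kept, cleared = [], []
    for rec in records:
        # A record is cleared if it belongs to this content source and,
        # for the narrower scope, was imported by this crawler.
        hit = rec.source == source and (
            scope == "source" or rec.crawler == crawler
        )
        (cleared if hit else kept).append(rec)
    return kept, cleared
```

With the "source" scope, records belonging to every content crawler that imports from the content source end up in the cleared list, which is why the wider scope deserves extra care.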