Data Synchronization
The table on this page displays information about all schedules.
Primary Schedule
The primary schedule is created at install time and cannot be removed.
The primary schedule must be run at least once before any other schedule
that has one or more web sources assigned to it is run. If the primary
schedule is not run first, synchronization of the other web sources will
be ineffective.
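The ordering rule can be expressed as a simple precondition check. The following is a minimal Python sketch under stated assumptions: the `Schedule` class, its `source_types` and `run_count` fields, and the `can_launch()` function are hypothetical names used for illustration only. They are not part of the Administration Tool or any Ultra Search API.

```python
from dataclasses import dataclass, field


@dataclass
class Schedule:
    """Hypothetical in-memory stand-in for a crawler schedule."""
    name: str
    source_types: list[str] = field(default_factory=list)  # e.g. ["web"]
    run_count: int = 0  # how many times this schedule has been run


def can_launch(schedule: Schedule, primary: Schedule) -> bool:
    """Return True if launching `schedule` respects the primary-first rule."""
    has_web_source = "web" in schedule.source_types
    # Only schedules with at least one web source depend on the primary run.
    return not has_web_source or primary.run_count >= 1
```

For example, `can_launch(Schedule("news", ["web"]), Schedule("primary"))` returns `False` until the primary schedule's `run_count` is at least 1.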
The web access parameters you define, such as Seed URLs and inclusion
domains, tell the crawler's primary schedule where to start. To define
web access parameters, use the Web Access page in the Administration Tool.
Altering the Primary Schedule
You can alter the primary schedule in the following ways:
- Change its frequency by clicking on the schedule interval text.
- View its detailed status by clicking on the schedule status.
- Alter its status by clicking on the schedule status.
Launching the Primary Schedule
You can launch the primary schedule in the following ways:
- Set a schedule frequency and wait for the predetermined
launch time.
- Execute it immediately by clicking its status and then "Execute immediately."
Note: Launching the primary schedule can take a very long time. If the
primary schedule has been launched before, then each subsequent launch
copies all web URLs that do not belong to any other web source into a
queue table. Depending on the number of URLs to be copied, this operation
can take a long time, and the Administration Tool displays the schedule
state as 'Launching' for its entire duration.
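The copy step on a repeat launch amounts to a set difference: start from the known web URLs and drop every URL owned by another web source. The Python sketch below assumes that in-memory collections stand in for the crawler's URL tables and queue table; `urls_to_queue()` and its parameters are hypothetical, and the first launch is deliberately not modeled because the text does not describe it.

```python
from collections import deque
from typing import Iterable


def urls_to_queue(
    all_urls: Iterable[str],
    other_source_urls: dict[str, set[str]],
    launched_before: bool,
) -> deque[str]:
    """Return the URLs a repeat launch of the primary schedule would enqueue."""
    if not launched_before:
        # First launch: the copy step described above does not apply here.
        return deque()
    # Union of every URL owned by some other web source.
    owned = set().union(*other_source_urls.values())
    # Keep the existing URL order while filtering out owned URLs.
    return deque(url for url in all_urls if url not in owned)
```

Because every unowned URL is copied, the cost of this step grows with the number of URLs, which is why the schedule can stay in the 'Launching' state for a long time.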
Synchronization Schedules
A synchronization schedule has one or more data sources assigned to
it. The schedule's frequency specifies when those data sources are
synchronized after the primary schedule has run.
Synchronization schedules are sorted by name. Within each synchronization
schedule, the individual data sources are listed and can be sorted by
source name or source type.
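The list ordering can be sketched as follows. This is an illustrative Python model only: `DataSource`, `sort_schedules()`, and `sort_sources()` are hypothetical names, and the plain case-sensitive string ordering used here is an assumption, since the text does not state how the Administration Tool collates names.

```python
from typing import NamedTuple


class DataSource(NamedTuple):
    name: str
    source_type: str  # e.g. "web", "email"


def sort_schedules(schedules: dict[str, list[DataSource]]) -> list[str]:
    """Return schedule names in the order the list would show them."""
    return sorted(schedules)


def sort_sources(sources: list[DataSource], key: str = "name") -> list[DataSource]:
    """Sort one schedule's data sources by source name or by source type."""
    if key not in ("name", "source_type"):
        raise ValueError("key must be 'name' or 'source_type'")
    # Break ties on name so sources of the same type have a stable order.
    return sorted(sources, key=lambda s: (getattr(s, key), s.name))
```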
Creating Synchronization Schedules
To create a new schedule, click the "Create New Schedule" button and
follow Steps 1 through 3, in which you name the schedule, set its
frequency, and assign data sources to it. Optionally, you can also
associate the schedule with a remote crawler profile.
Editing Synchronization Schedules
Once a synchronization schedule has been defined, you can do the following
in the Synchronization Schedules List:
- Change its frequency by clicking on the schedule interval text.
- Change the remote crawler association
by clicking on the hostname text.
- View its detailed status description by clicking on the schedule
status.
- Alter its status by clicking on the schedule status.
- Edit its name or data source assignments by clicking on the edit
icon.
- Delete it by clicking on the delete icon.
Launching Synchronization Schedules
You can launch a synchronization schedule in the following ways:
- Set a schedule frequency and wait for the predetermined
launch time.
- Execute it immediately by clicking its status and then "Execute immediately."
Note: Launching a synchronization schedule can take a very long time.
If the schedule has been launched before, then each subsequent launch
copies all URLs that belong to the data source(s) crawled by the schedule
into a queue table. Depending on the number of URLs associated with those
data sources, this operation can take a long time, and the Administration
Tool displays the schedule state as 'Launching' for its entire duration.
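For a synchronization schedule, the copy step gathers the URLs owned by the schedule's own data sources rather than the URLs that no other source owns. The Python sketch below assumes a hypothetical mapping from source name to its URLs; `urls_for_schedule()` is an illustrative name, and queuing a URL shared by two sources only once is an assumption, not documented behavior.

```python
from collections import deque


def urls_for_schedule(
    schedule_sources: list[str],
    source_urls: dict[str, list[str]],
    launched_before: bool,
) -> deque[str]:
    """Return the URLs a repeat launch of a synchronization schedule would enqueue."""
    queue: deque[str] = deque()
    if not launched_before:
        return queue  # the copy step applies only to repeat launches
    seen: set[str] = set()
    for source in schedule_sources:
        for url in source_urls.get(source, []):
            # Assumption: a URL shared by two sources is queued only once.
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return queue
```

URLs belonging to data sources that are not assigned to the schedule are never touched, so the copy time depends only on the size of the assigned sources.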
Email Schedule
The email schedule is created by default when you create an instance.
You cannot delete the email schedule. The crawler uses this schedule
to crawl all email sources.
Index Optimization Schedule
The crawler synchronization schedule maintains an active index of all
documents crawled across all data sources. To ensure fast query results,
the Ultra Search index must be optimized after it has been substantially
updated. This page lets you schedule when the index is optimized.
Optimize the index during hours of low usage to minimize disruption to
users.
Optimization Process Duration
Specify a maximum duration for the index optimization process. The
actual optimization time does not exceed this limit, but it can be
shorter. Specifying a longer duration results in a more optimized index.
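One common way to honor a maximum duration is a time-budget loop that stops starting new work once the budget is spent and returns early when no work remains. The Python sketch below illustrates that idea under stated assumptions: `optimize_with_limit()`, the notion of discrete index "segments", and the `merge_segment` callback are hypothetical, and the text does not say how Ultra Search enforces the limit internally. This simple loop checks the clock only between units, so one long unit could run past the deadline; a real implementation would need finer-grained checks.

```python
import time
from typing import Callable, Iterable


def optimize_with_limit(
    segments: Iterable[str],
    merge_segment: Callable[[str], None],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Optimize index segments until done or until the time budget is spent.

    Returns the number of segments processed. A larger max_seconds lets more
    segments be processed, which models a more optimized index.
    """
    deadline = clock() + max_seconds
    processed = 0
    for segment in segments:
        # Stop starting new work once the budget is used up; in this model,
        # unprocessed segments are simply left for a later run.
        if clock() >= deadline:
            break
        merge_segment(segment)
        processed += 1
    # If the segments ran out first, we return before the deadline.
    return processed
```

The injectable `clock` parameter exists only to make the sketch testable with a fake clock; by default it uses `time.monotonic`, which is unaffected by wall-clock adjustments.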