You can assign one or more data sources to a synchronization schedule.
To do so, use the "Data Synchronization" subtab on the Schedules
Page.
You can also assign data sources to data groups to enable restrictive
querying. To do so, use the Queries Page.
In this section, you can define and edit data sources.
Table Source
A table source is a data source that represents content in a database
table or view. The database table or view can reside in the Ultra Search database instance
or a remote database. Ultra Search accesses all remote databases using
database links. (Note: some limitations apply when
using a database link to a remote database. See "Limitations when using
remote database links" later in this section.)
You can create as many new
table sources as you wish. You can also edit the name of a table source
by clicking on the "Details" icon.
To create a table source, click on "Create new table source". Follow
steps 1 to 4.
During the process of defining a table source, you will be
asked to specify a table or view column as the column containing text to
index. We will refer to this column as the text column.
You will also be asked to map other table or view columns to Ultra
Search attributes.
Make sure that you do not map the text
column to an attribute. If you do, a search returns no hits
unless the query also searches on that mapped attribute.
Limitations when using remote database links
The following restrictions apply to base tables or views that reside
on a remote database and are hence accessed over a database link
by the crawler.
- The base table or view must have a ROWID column.
A table or view might not have a ROWID column for various
reasons, some of which are as follows:
- A view that consists of a join of one or more tables
might not always have a ROWID column.
- A view which is based on a single table but uses a group
by clause might not have a ROWID column.
The surest way to know if a remote table or view can be safely
crawled by Ultra Search is to check for the existence of the
ROWID column. To do so, execute the following
SQL command against that table or view using SQL*Plus:
select min(ROWID) from <table or view name>;
- Base tables or views cannot have text columns of type BFILE.
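The ROWID check above can be scripted when many tables or views need to be tested. The following is a minimal sketch in Python that only builds the statement text; it does not connect to a database. The function name build_rowid_check and the identifier pattern are assumptions made for illustration and are not part of Ultra Search. The table@link form is standard Oracle syntax for referencing an object through a database link.

```python
import re

# Conservative pattern for unquoted identifiers. This is an assumption for
# this sketch and does not cover every name Oracle accepts.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def _check_name(name):
    """Raise ValueError unless every dot-separated part looks like an identifier."""
    for part in name.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unexpected identifier: {name!r}")


def build_rowid_check(table_or_view, db_link=None):
    """Return the ROWID probe statement for a table or view.

    table_or_view may be schema-qualified (schema.name). If db_link is
    given, the object is referenced through that database link.
    """
    _check_name(table_or_view)
    target = table_or_view
    if db_link is not None:
        _check_name(db_link)
        target = f"{table_or_view}@{db_link}"
    # No trailing semicolon: add one when pasting into SQL*Plus.
    return f"select min(ROWID) from {target}"


if __name__ == "__main__":
    # Hypothetical schema, view, and link names.
    print(build_rowid_check("scott.emp_view", "remote_db"))
```

If the generated statement fails with an error when run, the table or view most likely has no ROWID column and should not be used as a remote table source.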
Web Source
A web source represents HTML content located at a specific
web site. Web sources differ from other data source types
because they exist specifically to facilitate maintenance crawling of
specific websites. Crawling a web source will not result in
any work unless the Primary Schedule is first run.
The Primary Schedule implicitly crawls all URLs reachable via
the specified Seed URLs. After the Primary Schedule has
been completed, you can recrawl specific web hosts for
maintenance purposes. A Web Source allows you to do this.
You can create as many web sources as you wish.
To create a new web source, do the following:
- Click on "Create new web source".
- Enter an arbitrary name for the web source.
- Enter a URL in the following form: http://<hostname>
All discovered URLs that have the same host as that defined in the
web source URL will be assigned to that web source. The port
number is irrelevant.
- Click on "Apply" to create the source.
- Click on "Done" to return to the Web Source List.
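The host-matching rule described above (same host, port ignored) can be illustrated with a short Python sketch. This is not the crawler's actual implementation; the function names and the example hosts are hypothetical, and the case-insensitive host comparison is an assumption.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host of a URL, ignoring scheme, port, and path."""
    return (urlsplit(url).hostname or "").lower()


def assign_to_web_sources(web_sources, discovered_urls):
    """Group discovered URLs under the web source that shares their host.

    web_sources maps a source name to its URL (http://<hostname>).
    URLs whose host matches no web source are left unassigned.
    """
    by_host = {host_of(url): name for name, url in web_sources.items()}
    assignment = {name: [] for name in web_sources}
    for url in discovered_urls:
        name = by_host.get(host_of(url))
        if name is not None:
            assignment[name].append(url)
    return assignment


if __name__ == "__main__":
    sources = {"docs": "http://docs.example.com"}
    urls = [
        "http://docs.example.com:8080/a.html",  # different port, same host
        "http://other.example.com/index.html",  # different host
    ]
    print(assign_to_web_sources(sources, urls))
```

In this sketch, the URL on port 8080 is still assigned to the "docs" source because only the host is compared.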
Email Source
An email source is a data source which derives its content from emails
sent to a specific email address. When the Ultra Search Crawler crawls
an email source, the crawler collects all emails that have the specific
email address in any of the "To:" or "Cc:" email header fields.
The most popular application of an email source is where an email source
represents all emails sent to a mailing list. In such a scenario, multiple
email sources will be defined where each email source represents an
email list.
To crawl email sources, you need an IMAP account. At present, the Ultra
Search Crawler can only crawl one IMAP account. Therefore, all emails
to be crawled must be found in the inbox of that IMAP account. For example,
in the case of mailing lists, the IMAP account should be subscribed
to all desired mailing list(s). All new postings to the mailing lists
will be sent to the IMAP email account and subsequently crawled. The
Ultra Search crawler is IMAP4 compliant.
When the Ultra Search Crawler retrieves an email message, it does three
things. First, it deletes the email message from the IMAP server. Second,
it converts the email message content to HTML and temporarily stores that
HTML content in the cache directory for indexing. Third, it stores the
retrieved message in a directory known as the archive directory. The email
files stored in this directory are displayed to the search end-user when
referenced by a query hit.
To crawl email sources, you must specify the username and password
of the email account on the IMAP server. Specify also the IMAP server
hostname and the archive directory.
You may create as many email sources as necessary. An email source
entry consists of an email address as well as an arbitrary description
of the email source. Note that this description is viewed by all search
end-users. Therefore, you should specify a short but meaningful name.
Finally, you can specify email address aliases for an email source.
Specifying an alias for an email source causes all emails sent to the
main email address as well as the alias address to be gathered by the
crawler.
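The matching rule for email sources (the main address or any alias appearing in a "To:" or "Cc:" header) can be sketched in Python with the standard email package. This is an illustration only, not the crawler's code; the function name belongs_to_source and the example addresses are hypothetical, and comparing addresses case-insensitively is an assumption.

```python
from email import message_from_string
from email.utils import getaddresses


def belongs_to_source(raw_message, address, aliases=()):
    """Return True if the message was sent to the address or one of its aliases.

    Only the To: and Cc: header fields are examined, as described above.
    """
    msg = message_from_string(raw_message)
    wanted = {a.lower() for a in (address, *aliases)}
    # Collect every To: and Cc: header value (a message may repeat them).
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return not wanted.isdisjoint(recipients)


if __name__ == "__main__":
    raw = (
        "From: alice@example.com\n"
        "To: Bob <bob@example.com>\n"
        "Cc: dev-list@example.com\n"
        "Subject: build status\n"
        "\n"
        "All tests passed.\n"
    )
    print(belongs_to_source(raw, "dev-list@example.com"))
```

A message that names the address only in another header, such as "Bcc:" or "From:", does not match in this sketch.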
File Source
A file data source is the set of documents that can be accessed through
the file protocol on the Ultra Search database machine or a remote crawler
machine. You can create as many new file sources as you wish. You can also
edit the name of a file source by clicking on the "Details" icon.
To create a new file source, do the following:
- Click on "Create new file source".
- In the Create file source page, enter a source name.
- Click "Proceed to step 2" and follow the instructions on how to
specify your file URLs.