
Data Sources Page

The Ultra Search Crawler retrieves data from one or more data sources.

There are four types of data sources:

  • Table sources
  • Web sources
  • Email sources
  • File sources

Related Topics

You can assign one or more data sources to a synchronization schedule. To do so, use the "Data Synchronization" subtab on the Schedules Page.

You can also assign data sources to data groups to enable restrictive querying. To do so, use the Queries Page.

In this section, you can define and edit data sources.

Table Source

A table source is a data source that represents content in a database table or view. The database table or view can reside in the Ultra Search database instance or in a remote database. Ultra Search accesses all remote databases using database links. (Note: some limitations apply when a database link is used to reach a remote database. See "Limitations when using remote database links" at the end of this section.)

You can create as many new table sources as you wish. You can also edit the name of a table source by clicking on the "Details" icon.

To create a table source, click on "Create new table source". Follow steps 1 to 4.

While defining a table source, you will be asked to specify a table or view column as the column containing the text to index. We will refer to this column as the text column.

You will also be asked to map other table or view columns to Ultra Search attributes. Make sure that you do not map the text column to an attribute. If you do, a search returns no hits unless the query also searches on that mapped attribute.

Limitations when using remote database links

The following restrictions apply to base tables or views that reside on a remote database and are hence accessed over a database link by the crawler.

  1. The base table or view must have a ROWID column. A table or view might not have a ROWID column for various reasons, some of which are as follows:
    • A view that consists of a join of one or more tables might not always have a ROWID column.
    • A view that is based on a single table but uses a GROUP BY clause might not have a ROWID column.
    The surest way to know if a remote table or view can be safely crawled by Ultra Search is to check for the existence of the ROWID column. To do so, execute the following SQL command against that table or view using SQL*Plus:
    select min(ROWID) from <table or view name>;
  2. Base tables or views cannot have text columns of type BFILE.

Web Source

A web source represents HTML content located at a specific web site. Web sources differ from other data source types because they exist specifically to facilitate maintenance crawling of specific web sites. Crawling a web source does no work until the Primary Schedule has been run.

The Primary Schedule implicitly crawls all URLs reachable from the specified Seed URLs. After the Primary Schedule has completed, you can recrawl specific web hosts for maintenance purposes. A web source allows you to do this.

You can create as many web sources as you wish.

To create a new web source, do the following:

  1. Click on "Create new web source".
  2. Enter an arbitrary name for the web source.
  3. Enter a URL in the following form: http://<hostname>
     All discovered URLs that have the same host as the web source URL are assigned to that web source. The port number is irrelevant.
  4. Click on "Apply" to create the source.
  5. Click on "Done" to return to the Web Source List.
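
The host-matching rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (same host, port ignored), not the crawler's actual code. The function names and the example host names are hypothetical, and the case-insensitive host comparison is an assumption.

```python
from urllib.parse import urlsplit


def same_host(source_url: str, discovered_url: str) -> bool:
    """Return True when both URLs name the same host, ignoring the port."""
    # urlsplit(...).hostname drops the port and lower-cases the host name.
    source_host = urlsplit(source_url).hostname
    discovered_host = urlsplit(discovered_url).hostname
    return source_host is not None and source_host == discovered_host


def assign_to_sources(web_sources: dict[str, str],
                      discovered_urls: list[str]) -> dict[str, list[str]]:
    """Group discovered URLs under the first web source whose host matches.

    web_sources maps a (hypothetical) source name to its http://<hostname> URL.
    URLs that match no web source are left unassigned.
    """
    assigned: dict[str, list[str]] = {name: [] for name in web_sources}
    for url in discovered_urls:
        for name, source_url in web_sources.items():
            if same_host(source_url, url):
                assigned[name].append(url)
                break
    return assigned
```

For example, with a web source defined as http://docs.example.com, a discovered URL such as http://docs.example.com:8080/a.html would be assigned to that source, while a URL on other.example.com would not.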

Email Source

An email source is a data source which derives its content from emails sent to a specific email address. When the Ultra Search Crawler crawls an email source, the crawler collects all emails that have the specific email address in any of the "To:" or "Cc:" email header fields.

The most popular application of an email source is to represent all emails sent to a mailing list. In such a scenario, you define multiple email sources, each representing one mailing list.

To crawl email sources, you need an IMAP account. At present, the Ultra Search Crawler can only crawl one IMAP account. Therefore, all emails to be crawled must be found in the inbox of that IMAP account. For example, in the case of mailing lists, the IMAP account should be subscribed to all desired mailing lists. All new postings to the mailing lists will be sent to the IMAP email account and subsequently crawled. The Ultra Search Crawler is IMAP4 compliant.

When the Ultra Search Crawler retrieves an email message, it first deletes the message from the IMAP server. Second, it converts the message content to HTML and temporarily stores that HTML content in the cache directory for indexing. Third, it stores all retrieved messages in a directory known as the archive directory. The email files stored in this directory are displayed to the search end-user when referenced by a query hit.

To crawl email sources, you must specify the username and password of the email account on the IMAP server. Specify also the IMAP server hostname and the archive directory.

You may create as many email sources as necessary. An email source entry consists of an email address and an arbitrary description of the email source. Note that this description is visible to all search end-users, so you should specify a short but meaningful name.

Finally, you can specify email address aliases for an email source. Specifying an alias for an email source causes all emails sent to the main email address as well as the alias address to be gathered by the crawler.
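
The selection rule for an email source (main address or any alias appearing in a "To:" or "Cc:" header) can be illustrated with Python's standard email module. This is a sketch of the rule as described here, not the crawler's implementation. The function name and the addresses are hypothetical, and the case-insensitive comparison is an assumption.

```python
from email import message_from_string
from email.utils import getaddresses


def belongs_to_source(raw_message: str, source_addresses: set[str]) -> bool:
    """Return True if any To: or Cc: recipient is the source address or an alias.

    source_addresses holds the main email address plus any aliases
    defined for the email source.
    """
    message = message_from_string(raw_message)
    # Only the To: and Cc: headers are considered; From: is ignored.
    headers = message.get_all("To", []) + message.get_all("Cc", [])
    recipients = {addr.lower() for _name, addr in getaddresses(headers) if addr}
    wanted = {addr.lower() for addr in source_addresses}
    return not recipients.isdisjoint(wanted)
```

Passing the set {"dev-list@example.com", "dev@example.com"} would model an email source whose main address is dev-list@example.com with dev@example.com defined as an alias.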

File Source

A file data source is the set of documents that can be accessed through the file protocol on the Ultra Search database machine or a remote crawler machine. You can create as many new file sources as you wish. You can also edit the name of a file source by clicking on the "Details" icon.

To create a new file source, do the following:

  1. Click on "Create new file source".
  2. In the Create file source page, enter a source name.
  3. Click "Proceed to step 2" and follow the instructions on how to specify your file URLs.
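
As an aid for step 3, the following Python sketch turns an absolute path into a file-protocol URL. The exact URL form that Ultra Search expects is not stated here, so treat the output format as an assumption and follow the on-screen instructions where they differ. The function name and the example path are hypothetical.

```python
from urllib.parse import quote


def to_file_url(absolute_path: str) -> str:
    """Build a file: URL from an absolute POSIX-style path.

    Characters that are not URL-safe (spaces, for example) are
    percent-encoded; path separators are kept as-is.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    return "file://" + quote(absolute_path)
```

With this sketch, the path /data/docs/annual report.pdf becomes file:///data/docs/annual%20report.pdf.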