Sun Java System Portal Server 6 2005Q1 Technical Reference Guide 

Chapter 6
Search Attributes: Database

The Database attributes are divided into the following sections: Management, Import Agents, Resource Descriptions, Schema, Analysis, and Schedule.


Management

The initial Management page lists the available databases. You can create a new database, or reindex, purge, or expire an existing one. Use the checkbox to select a database on which to perform an action. Use the small icons above the checkbox to select or deselect all the databases. When you select Reindex, Purge, or Expire, a prompt lists the selected database names and asks you to confirm the action. To perform the action, select OK.

You should reindex the database if you have edited the schema to add or remove an indexed field (such as Author), or if a disk error has corrupted the index. You need to restart the server after you change the schema.

Because the time required to reindex the database is proportional to the number of RDs in the database, a large database should be reindexed when the server is not in high demand.

When you purge the contents of the database, disk space used for indexes will be recovered, but disk space used by the main database will not be recovered; instead, it is reused as new data is added to the database.

Expiring a database deletes all RDs that are deemed out-of-date. It does not decrease the size of the database. By default, an RD is scheduled to expire in 90 days from the time of creation.
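The default expiration rule can be illustrated with a short sketch. The function names and the choice to treat an RD as expired exactly at the 90-day mark are assumptions made for illustration; they are not part of the product.

```python
from datetime import datetime, timedelta

# Default RD lifetime stated in this guide.
DEFAULT_EXPIRY_DAYS = 90


def expiry_date(created, lifetime_days=DEFAULT_EXPIRY_DAYS):
    """Return the date on which an RD created at `created` is scheduled to expire."""
    return created + timedelta(days=lifetime_days)


def is_expired(created, now, lifetime_days=DEFAULT_EXPIRY_DAYS):
    """Assumption: an RD counts as out-of-date once its expiry date is reached."""
    return now >= expiry_date(created, lifetime_days)


created = datetime(2005, 1, 1)
print(expiry_date(created).date())  # 2005-04-01
```

An RD whose Expires attribute has been edited by hand would follow that date instead of the default.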

You can also edit a database by selecting the Edit link, which takes you to a page where you define the database attributes.

Table 6-1 lists the Database Management attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-1  Database Management Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Name | Default | Name for the database used by Search. |
| Title | Blank | A title for the database. |
| Description | Blank | A description of the database for your own reference. |


Import Agents

Import agents are the processes that bring resource descriptions from other servers or databases and merge them into your search database.
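Import agents read resource descriptions in SOIF (Summary Object Interchange Format), as noted for the Local File Path attribute in Table 6-2. The following is a minimal sketch of how one RD might be written in that format. It assumes the commonly documented SOIF layout (a template line with the URL, then one `name{size}:<TAB>value` line per attribute, where size is the byte length of the value). The function name `to_soif`, the `DOCUMENT` template type, and the sample URL are assumptions for illustration.

```python
def to_soif(url, fields, template="DOCUMENT", charset="utf-8"):
    """Serialize one resource description as a SOIF record (illustrative sketch)."""
    lines = ["@%s { %s" % (template, url)]
    for name, value in fields.items():
        # The number in braces is the size of the value in bytes, not characters.
        size = len(value.encode(charset))
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


record = to_soif("http://www.example.com/", {"title": "Home", "author": "Jane Doe"})
print(record)
```

Because the size is counted in bytes, the Charset attribute of the import agent has to match the encoding actually used in the file.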

The initial Import page lists the available import agents. You can create a new one, or run, edit or delete an existing one. Use the checkbox to select an agent to delete. Use the small icons above the checkbox to select or deselect all import agents. Use the radio buttons to turn an Agent Action On or Off. To schedule the import agents, select Schedule on the lower menu bar.

If you edit an existing import agent or create a new one, the following attributes are displayed.

Table 6-2 lists the Database Import Agent attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-2  Database Import Agent Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Charset | Blank for new | Specifies the character set of the input SOIF stream, for example ISO8859-1, UTF-8, or UTF-16. Character sets ISO8859-1 through ISO8859-15 are supported. |
| Import From | Local File | Select either Local File or Search Server (if one is enabled). |
| Local File Path | Blank for new | Gives the full path name of the local file that contains valid resource descriptions in SOIF (Summary Object Interchange Format). This can be a file on another server, as long as the path is addressable as if it were locally mounted. |
| Database Name | Default | Name of the destination database. |
| Remote Server | Blank for new | Gives the URL of the search server from which to retrieve resource descriptions, in the format http://www.sesta.com:80 |
| Instance Name | Blank for new | Server instance name used by the search server. You can find this instance name in the Server Preferences for the server you are importing from. Value must be 3.01C or 3.01C SP1. |
| Search URI | Blank for new | Enter full path and file names. Use /portal/search. |
| Is Compass Server 3.01X? | False (unchecked) | Select this option if the server you are importing from is a Compass Server 3.01X. |
| Enable SSL | False (unchecked) | If this is a server-to-server transaction, select this option if the servers should use the SSL (Secure Sockets Layer) protocol. |
| Authentication | None (default) | None (default) or Use User/Password. This specifies how the import agent should identify itself to the system it imports from. By default, no authentication is used. If the server you want to import from requires authentication, you can specify a user name and password for the import agent to use. Importing from 3.01C does not require authentication. Importing data from 3.01C SP1 requires authentication. |
| User | Blank for new or none | If you selected Use User/Password, enter a user. |
| Password | Blank for new or none | If you selected Use User/Password, enter a password (shown as *). |
| Content Transfer | Use Incremental Gathering of Full Contents (default) | Choice of Use Incremental Gathering of Full Contents (default) or Use Search Query. These options specify which resource descriptions to import from the source. By default, an import agent asks for all resource descriptions added or changed since its last import from the same source. The search query specifies that the import agent should request only certain resource descriptions from the source, in much the same way that users request listings of resources from the search database. Use the Scope, View-Attributes, and View-Hits fields to specify the query. Use Collect All RDs to collect new resource descriptions since the last run and remove the timestamp in Newest Resource Description. |
| Scope | Blank for new | The text of the query. The query syntax is identical to that used for end-user queries from the server. |
| View-Attributes | Blank for new | Lists which fields (not case sensitive) you want to import in each resource description, for example title and author. The default is all. |
| View-Hits | Blank for new | The maximum number of matching resource descriptions to import. If no hits are specified, it defaults to 20. |
| Agent Description | Blank for new | Appears in the list of available import agents on the initial Import page. It is ignored by the program. If this field is blank, the Resource Description Source file name or server name is used to identify the import agent. Note here if a user name and password are needed. |
| Newest Resource Description | Blank for new | The creation date of the newest resource description previously imported by this import agent. This date is used by the Use Incremental Gathering of Full Contents option to determine which resources are new and should be imported. |
| Network Timeout in seconds | Blank for new | Specifies the number of seconds the import agent will allow before timing out the connection over the network. You can adjust this to allow for varying network traffic and quality. |
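The two Content Transfer choices in Table 6-2 can be pictured as two selection rules. The sketch below is an illustration only: RDs are modeled as plain dictionaries, the `created` key and both function names are hypothetical, and a Python predicate stands in for the Scope query because the real query syntax is not modeled here.

```python
from datetime import datetime

# Default for View-Hits stated in Table 6-2.
DEFAULT_VIEW_HITS = 20


def select_incremental(rds, newest_imported):
    """Incremental gathering: keep RDs newer than the newest one already imported.

    Passing None models a cleared Newest Resource Description timestamp,
    in which case every RD is selected.
    """
    if newest_imported is None:
        return list(rds)
    return [rd for rd in rds if rd["created"] > newest_imported]


def select_by_query(rds, matches, view_attributes=None, view_hits=None):
    """Search query: keep matching RDs, trimmed to the requested fields and hit count."""
    limit = DEFAULT_VIEW_HITS if view_hits is None else view_hits
    # Field names are compared case-insensitively; no list means all fields.
    wanted = {a.lower() for a in view_attributes} if view_attributes else None
    selected = []
    for rd in rds:
        if len(selected) >= limit:
            break
        if not matches(rd):
            continue
        if wanted is None:
            selected.append(dict(rd))
        else:
            selected.append({k: v for k, v in rd.items() if k.lower() in wanted})
    return selected


rds = [
    {"URL": "a", "Title": "A", "Author": "x", "created": datetime(2005, 1, 1)},
    {"URL": "b", "Title": "B", "Author": "y", "created": datetime(2005, 2, 1)},
]
print(select_incremental(rds, datetime(2005, 1, 15)))
print(select_by_query(rds, lambda rd: True, view_attributes=["title", "AUTHOR"], view_hits=1))
```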


Resource Descriptions

The initial Resource Descriptions page allows you to search for and edit the resource descriptions (RDs) in the database. For example, you can correct a typographical error in an RD or manually assign RDs discovered by the robot to categories.

Table 6-3 lists the Resource Descriptions attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-3  Resource Descriptions Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Search For | All RDs | All RDs, Uncategorized RDs, Categorized RDs, RDs by category, Specific RD by URL, RDs that contain... |
| Text box | Blank | Enter a unique text string to identify the RDs searched for. Use with the RDs by category, Specific RD by URL, and RDs that contain attribute values. |
| Database | Default | Name of the database to search. |
| Select Category | | Browse and select a category from the category tree. |
| Delete | | Delete one or more selected RDs that are returned from an RD search. |
| Next | | Display the next set of RDs returned from an RD search. |
| Previous | | Display the previous set of RDs returned from an RD search. |
| Edit Selected | | Edit the attributes of one or more RDs that are returned from an RD search. |
| Edit All | | Edit the attributes of the current set of RDs that are returned from an RD search. |

To limit the search by category, select Select Category. The Category Editor page displays, where you specify the category from the taxonomy for the search. You can specify the category in the Selected Category text box or browse the taxonomy to select it. After specifying the category, select OK to return to the RD search page.

Table 6-4 lists the Category Editor attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-4  Category Editor Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Selected Categories | Blank | Text field that displays the selected categories. |
| Expand All | | Expands the taxonomy so that all entries in the hierarchy display for browsing. |
| Collapse All | Blank | Collapses the taxonomy so that only categories within the first two levels of the hierarchy display for browsing. |
| Categories per page | 25 | Drop-down list of the number of categories to display per page. Values are 25, 50, 100, 250, 500, and all. |

A successful search displays the number of RDs found and a list box with the RDs found. When you select the Edit link of an RD, the following editable attributes and the partial text of the RD are displayed. All these attributes except Classification are set to editable in the Database/Schema page.

Table 6-5 lists the Database RD Editable attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-5  Database RD Editable Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Author | Blank | Author(s) of the document. |
| Author e-mail | Blank | Email address to contact the Author(s) of the document. |
| Classification | Category name of selected RD. | Category name if classified; No Classification if not classified. |
| ReadACL | Blank | Related to document level security. |
| Content-Charset | | Content-Charset information from HTTP Server. |
| Content-Encoding | Blank | Content-Encoding information from HTTP Server. |
| Content-Language | Blank | Content-Language information from HTTP Server. |
| Content-Length | Blank | Content-Length information from HTTP Server. |
| Content-Type | Blank | Content-Type information from HTTP Server. |
| Description | Description from the selected RD. | Description from RD. |
| Expires | Valid date. | Date on which resource description is no longer valid. |
| Full-Text | Blank | Entire contents of the document. |
| Keywords | Keywords, if any, from the selected RD. | Keywords taken from meta tags. |
| Last-Modified | Last modification date. | Date when the document was last modified. |
| Partial-text | Partial text of the document. | Partial selection of text from the document. |
| Phone | Blank | Phone number for Author contact. |
| Title | Title of the selected RD. | Title of RD. |
| URL | Blank | Uniform Resource Locator for the document. |


Schema

The schema determines what information is in a resource description and what form that information is in. You can add new attributes or fields to an RD and set which ones can be edited and which ones can be indexed. When importing new RDs, you can convert schemas embedded in new RDs into your own schema.

Table 6-6 lists the Database Schema Edit attributes. The table contains two columns: the first column identifies the attribute, and the second column describes the attribute.

Table 6-6  Database Schema Edit Attributes  

| Attribute | Description |
|---|---|
| Author | Author(s) of the document. |
| Author-EMail | Email address to contact the Author(s) of the document. |
| Content-Charset | Content-Charset information from HTTP Server. |
| Content-Encoding | Content-Encoding information from HTTP Server. |
| Content-Language | Content-Language information from HTTP Server. |
| Content-Length | Content-Length information from HTTP Server. |
| Content-Type | Content-Type information from HTTP Server. |
| Description | Brief one-line description for document. |
| Expires | Date on which resource description is no longer valid. |
| Full-Text | Entire contents of the document. |
| Keywords | Keywords that best describe the document. |
| Last-modified | Date when the document was last modified. |
| Partial-Text | Partial selection of text from the document. |
| Phone | Phone number for Author contact. |
| ReadACL | Used by Search servers to enforce security. |
| Title | Title of the document. |
| URL | Uniform Resource Locator for the document. |
| Aliases | When you import new RDs, you can convert schemas embedded in new RDs into your own schema. Use this conversion when the field names in the import database schema differ from the field names used for RDs in your database. For example, if the imported RDs use Writer as the field for the author and your RDs use Author, the conversion is Writer to Author, so you enter Writer in this text box. |
| Name | |
| Description | |
| Data Type | Defines the data type. |
| Editable | If true (checked), the selected attribute (field) appears in the Database RD Editor, so you can change its values. Description, Keywords, Title and ReadACL are editable. |
| Indexable | If true (checked), the selected attribute (field) can be used as a basis for indexing. Author, Title and URL appear in the menu in the Advanced Search screen for the end user. This allows end users to search for values in those particular fields. Author, Expires, Keywords, Last Modified, Title, URL and ReadACL can be used as the basis for indexing. |
| Score Multiplier | A weighting field for scoring a particular element. Any positive value is valid. |
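The alias conversion described for the Aliases attribute amounts to renaming fields as RDs are imported. A minimal sketch follows, using the Writer to Author example from the table. The function name and the case-insensitive matching of alias names are assumptions for illustration, not documented behavior.

```python
def apply_aliases(rd, aliases):
    """Rename imported fields to the local schema names.

    `aliases` maps a lower-cased foreign field name to the local field name.
    Fields without an alias are kept unchanged.
    """
    converted = {}
    for name, value in rd.items():
        local_name = aliases.get(name.lower(), name)
        converted[local_name] = value
    return converted


# Alias entered for the local Author field: imported RDs call it Writer.
ALIASES = {"writer": "Author"}

imported = {"Writer": "Jane Doe", "Title": "Release Notes"}
print(apply_aliases(imported, ALIASES))
# {'Author': 'Jane Doe', 'Title': 'Release Notes'}
```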


Analysis

The Analysis page shows a sorted list of all sites and the number of resources from each site currently in the search database. Select Update Analysis to update the analysis on file.

Table 6-7 lists the Database Analysis attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-7  Database Analysis Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Total number of RDs | Current number of RDs in database. | Lists current total number of resource descriptions in the database. |
| Number of servers | Current number of servers that the database is partitioned across. | The database can be partitioned and placed on a number of servers. |
| Site | URL or domain that the robot has successfully searched. | A URL or domain that has added resource descriptions to the database. |
| Number of RDs | Current number of RDs from that site. | Lists current number of RDs from that site. |
| Type | Type of RD | Resource descriptions can be of many different types, for example, http. |
| Percentage | Type of RD / Total number of RDs | Percentage of this type of document compared to the total number of resource descriptions. |
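The Percentage value in Table 6-7 is the count for one RD type divided by the total number of RDs. A short sketch of that arithmetic follows. The function name and the sample type `file` are hypothetical; only `http` is named in this guide.

```python
from collections import Counter


def type_percentages(rd_types):
    """Return the share of each RD type as a percentage of all RDs."""
    counts = Counter(rd_types)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty database has no percentages to report
    return {rd_type: 100.0 * n / total for rd_type, n in counts.items()}


print(type_percentages(["http", "http", "http", "file"]))
# {'http': 75.0, 'file': 25.0}
```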


Schedule

This page is where you set up the schedule for running the import agents. Table 6-8 lists the Database Import Schedule attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.

Table 6-8  Database Import Schedule Attributes  

| Attribute | Default Value | Description |
|---|---|---|
| Start Import Time in hours and minutes | 00:00 | Time that the import agent starts to import. |
| Days | none selected | Sun-Sat. Check at least one day. |
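The two schedule attributes together determine when the import agents next run. The sketch below shows one way to compute that moment. The function name is hypothetical, and days are numbered with Python's convention (Monday is 0, Sunday is 6) rather than the Sun-Sat order shown on the page.

```python
from datetime import datetime, time, timedelta


def next_import(now, start, days):
    """Return the next scheduled import time strictly after `now`.

    `start` is the Start Import Time; `days` is the set of checked weekdays.
    """
    if not days or not all(0 <= d <= 6 for d in days):
        raise ValueError("check at least one valid day (0=Monday .. 6=Sunday)")
    # Look at today and the following seven days; one of them must match.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), start)
        if candidate > now and candidate.weekday() in days:
            return candidate
    raise RuntimeError("unreachable: a valid day always occurs within a week")


# Monday 7 March 2005, 09:30; imports scheduled for 00:00 on Mondays and Thursdays.
print(next_import(datetime(2005, 3, 7, 9, 30), time(0, 0), {0, 3}))
# 2005-03-10 00:00:00
```

With the default start time of 00:00, a run on a checked day begins at midnight at the start of that day.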





Part No: 817-7696.   Copyright 2005 Sun Microsystems, Inc. All rights reserved.