Sun Java System Portal Server 7.1 Technical Reference

Chapter 5 Search Attributes: Database

This chapter explains the attributes provided for the search database. The Database attributes are divided into the following sections: Management, Import Agents, Resource Descriptions, Schema, and Analysis.

Management

Before you work with the Search database, you need to know how to partition it. Partitioning requires the search server to be stopped, so use the run-cs-cli rdmgr -G command to partition the database.

The initial Manage Databases page lists the available databases. To select a database, select the checkbox next to it. Click the New, Reindex, Purge, Analyze, Manage, or Expire resource descriptions button to perform the corresponding action on the selected database.

You should reindex the database if you have edited the schema to add or remove an indexed field (such as author), or if a disk error has corrupted the index. You need to restart the server after you change the schema.

Because the time required to reindex the database is proportional to the number of RDs in the database, a large database should be reindexed when the server is not in high demand.

When you purge the contents of the database, disk space used for indexes will be recovered, but disk space used by the main database will not be recovered; instead, it is reused as new data is added to the database.

Expiring a database deletes all RDs that are deemed out-of-date. It does not decrease the size of the database. By default, an RD is scheduled to expire in 90 days from the time of creation.
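The expiry rule described above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the ResourceDescription class, its field names, and the expire function are hypothetical, and only the 90-day default comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Default lifetime stated in this chapter: 90 days from the time of creation.
DEFAULT_LIFETIME = timedelta(days=90)


@dataclass
class ResourceDescription:
    """Hypothetical stand-in for an RD; not the server's data model."""
    url: str
    created: datetime
    expires: Optional[datetime] = None  # None means "use the default lifetime"

    def expiry_date(self) -> datetime:
        # An explicit Expires value wins; otherwise fall back to created + 90 days.
        return self.expires or self.created + DEFAULT_LIFETIME


def expire(rds, now):
    """Return only the RDs that are still valid at `now`; the rest are dropped."""
    return [rd for rd in rds if rd.expiry_date() > now]
```

In this sketch, expiring removes RDs from the returned list only. It does not model disk usage, which matches the note above that expiring does not decrease the size of the database.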

The table below lists the Database Management attributes and their description.

Table 5–1 Database Management Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Name | True or False | Name for the database used by Search. |
| Federated | True or False | For a Federated database, this value is True. Otherwise, the value is False. |

Import Agents

Import agents are the processes that bring resource descriptions from other servers or databases and merge them into your search database.

The initial Manage Import Agents page lists the available import agents. To select an import agent, select the checkbox next to it. Click the New, Enable, Disable, Delete, or Run All Enabled Import Agents button to perform the corresponding action on the selected import agent. To schedule the import agents, select Scheduling on the lower menu bar.

If you create a new import agent or edit an existing one, the following database import agent attributes are displayed.

The table below lists the Database Import Agent attributes and their description.

Table 5–2 Database Import Agent Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Import agent source | Local File | Select either Local File or Remote Server (if one is enabled). |
| Local File Path | Blank for new | Full path name of a local file that contains valid resource descriptions in SOIF (Summary Object Interchange Format). This can be a file on another server, as long as the path is addressable as if it were locally mounted. |
| Destination Database | Blank | Name of the destination database. |
| Remote Server Host | Blank for new | Host name of the search server from which to retrieve resource descriptions, in the format www.sesta.com. |
| Remote Server Port | Blank for new | Port number for the given remote server host, for example 8080. |
| Search URI | Blank for new | Enter the full path and file name. Use /search1/search. |
| Enable SSL | False | For a server-to-server transaction, select this option if the servers should use the SSL (Secure Sockets Layer) protocol. |
| User | Blank for new or none | If you selected Use User/Password, enter a user. |
| Password | Blank for new or none | If you selected Use User/Password, enter a password (shown as *). |
| Content Transfer | All | By default, an import agent asks for all resource descriptions added or changed since its last import from the same source. Alternatively, a search query specifies that the import agent should request only certain resource descriptions from the source, in much the same way that users request listings of resources from the search database. Use the Scope, View-Attributes, and View-Hits fields to specify the query. |
| Scope | Blank for new | The text of the query. The query syntax is identical to that used for end-user queries from the server. |
| View-Attributes | Blank for new | Lists the fields (not case sensitive) that you want to import in each resource description, for example, title and author. The default is all. |
| View-Hits | Blank for new | The maximum number of matching resource descriptions to import. If no hits are specified, it defaults to 20. |
| Network Timeout in seconds | Blank for new | Number of seconds the import agent allows before timing out the connection over the network. You can adjust this value to allow for varying network traffic and quality. |
| Title | Blank for new | Title of the import agent. |
| Remote Database | Blank | Name of the database on the remote server. |
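To show how the remote-server attributes in Table 5–2 fit together, the following Python sketch collects them in one object. The ImportAgentConfig class and its method names are hypothetical and are not part of Portal Server. The sketch only assembles a base address from the host, port, Search URI, and SSL setting, and applies the documented View-Hits default of 20. It does not reproduce the actual request that an import agent sends.

```python
from dataclasses import dataclass


@dataclass
class ImportAgentConfig:
    """Hypothetical container for the remote-server attributes in Table 5-2."""
    host: str                  # Remote Server Host, for example www.sesta.com
    port: int                  # Remote Server Port, for example 8080
    search_uri: str            # Search URI, for example /search1/search
    enable_ssl: bool = False   # Enable SSL (documented default: False)
    scope: str = ""            # Scope: query text, blank for a new agent
    view_attributes: str = ""  # View-Attributes: comma-separated field names
    view_hits: int = 20        # View-Hits: documented default when unspecified

    def base_url(self) -> str:
        # Assumption: SSL maps to the https scheme, otherwise http.
        scheme = "https" if self.enable_ssl else "http"
        return f"{scheme}://{self.host}:{self.port}{self.search_uri}"

    def fields(self):
        """Field names are not case sensitive, so normalize them to lower case.

        An empty list stands for the documented default of all fields.
        """
        return [f.strip().lower() for f in self.view_attributes.split(",") if f.strip()]
```

For example, ImportAgentConfig("www.sesta.com", 8080, "/search1/search").base_url() gives the address of the remote search service under these assumptions.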

Resource Descriptions

The initial Resource Descriptions page lets you search the resource descriptions (RDs) in the database so that you can review and edit them. For example, you can correct a typographical error in an RD or manually assign RDs discovered by the robot to categories.

The table below lists the Resource Descriptions attributes and their description.

Table 5–3 Resource Descriptions Attributes

| Attribute | Default Value | Description |
|---|---|---|
| New | | Opens the New Resource Description page, where you can enter the URL to create a new search RD. |
| Edit | | Opens the Edit URL page, where you can modify only those attributes of a search RD that are editable. |
| Edit All | | Opens the Edit Resource Descriptions page, where you can modify a group of search RDs. |
| Delete | | Deletes the selected search RD. |
| Filter | All | The available options are All, Categorized (to list categorized RDs), Uncategorized (to list uncategorized RDs), and Custom Filter. |
| Custom Filter | | Provides the following options: Query (selected by default), URL, Category, and a text box in which to enter the search string. When you select the Category option, the Choose button appears. Click the Choose button to go to the Select a Category page, where you can select the category. |
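The Categorized and Uncategorized filter options can be illustrated with a short Python sketch. It assumes that each RD is a dictionary with an optional classification key, which is a hypothetical shape chosen for the example. The value No Classification for unclassified RDs is taken from Table 5–4.

```python
# Value shown for an RD that has not been classified (see Table 5-4).
NO_CLASSIFICATION = "No Classification"


def is_categorized(rd):
    """An RD counts as categorized when it carries a real category name."""
    return rd.get("classification", NO_CLASSIFICATION) != NO_CLASSIFICATION


def filter_rds(rds, mode="All"):
    """Mimic the Filter options All, Categorized, and Uncategorized.

    `rds` is a list of dictionaries; this shape is an assumption made for
    the example and is not the server's data model.
    """
    if mode == "All":
        return list(rds)
    if mode == "Categorized":
        return [rd for rd in rds if is_categorized(rd)]
    if mode == "Uncategorized":
        return [rd for rd in rds if not is_categorized(rd)]
    raise ValueError(f"unsupported filter mode: {mode}")
```

The Custom Filter option is left out of the sketch because it depends on the query syntax of the search server.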

A successful search displays the number of RDs found and a list box with the RDs found. If you navigate to the Edit page for a resource description, you can modify only those attributes of the resource description that are editable. By default, you cannot edit some of the RD attributes listed in the table below. To make any of these attributes editable, except the Classification attribute, change the settings in the Database/Schema/Edit schema attribute page.

The table below lists the Database RD Editable attributes and their description. The default value for these attributes depends on the selected RD.

Table 5–4 Database RD Editable Attributes

| Attribute | Description |
|---|---|
| Author | Author(s) of the document. |
| Author e-mail | Email address to contact the Author(s) of the document. |
| Classification | Category name if classified; No Classification if not classified. |
| ReadACL | Related to document level security. |
| Content-Charset | Content-Charset information from HTTP Server. |
| Content-Encoding | Content-Encoding information from HTTP Server. |
| Content-Language | Content-Language information from HTTP Server. |
| Content-Length | Content-Length information from HTTP Server. |
| Content-Type | Content-Type information from HTTP Server. |
| Description | Description from RD. |
| Expires | Date on which resource description is no longer valid. |
| Full-Text | Entire contents of the document. |
| Keywords | Keywords taken from meta tags. |
| Last-Modified | Date when the document was last modified. |
| Partial-text | Partial selection of text from the document. |
| Phone | Phone number for Author contact. |
| Title | Title of RD. |
| URL | Uniform Resource Locator for the document. |
| virtual-db | Used to implement virtual database. |
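The restriction that only editable attributes can be modified can be sketched as a simple check. In the following Python example, the apply_edits function and the dictionary representation of an RD are hypothetical. The set of editable fields (Description, Keywords, Title, and ReadACL) is the one listed as editable in Table 5–6.

```python
# Fields listed as editable in Table 5-6, stored in lower case.
DEFAULT_EDITABLE = frozenset({"description", "keywords", "title", "readacl"})


def apply_edits(rd, edits, editable=DEFAULT_EDITABLE):
    """Return a copy of `rd` with `edits` applied.

    Raises ValueError if any edited field is not marked as editable.
    The original `rd` dictionary is left unchanged.
    """
    rejected = sorted(name for name in edits if name.lower() not in editable)
    if rejected:
        raise ValueError("not editable: " + ", ".join(rejected))
    updated = dict(rd)
    updated.update(edits)
    return updated
```

Passing a larger editable set corresponds to marking more attributes as Editable in the schema. The sketch does not model the special case that Classification cannot be made editable there.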

Schema

When you click the Schema tab under Databases, the Manage Search Schema page appears. This page lists the available Search Schema attributes. The schema determines what information is in a resource description and what form that information takes. You can add new attributes or fields to an RD and set which ones can be edited and which ones can be indexed. When importing new RDs, you can convert schemas embedded in new RDs into your own schema.

The table below lists the Search Schema attributes and their description.

Table 5–5 Search Schema Attributes

| Attribute | Description |
|---|---|
| Author | Author(s) of the document. |
| Author-EMail | Email address to contact the Author(s) of the document. |
| Content-Charset | Content-Charset information from HTTP Server. |
| Content-Encoding | Content-Encoding information from HTTP Server. |
| Content-Language | Content-Language information from HTTP Server. |
| Content-Length | Content-Length information from HTTP Server. |
| Content-Type | Content-Type information from HTTP Server. |
| Description | Brief one-line description for document. |
| Expires | Date on which resource description is no longer valid. |
| Full-Text | Entire contents of the document. |
| Keywords | Keywords that best describe the document. |
| Last-modified | Date when the document was last modified. |
| Partial-Text | Partial selection of text from the document. |
| Phone | Phone number for Author contact. |
| ReadACL | Used by Search servers to enforce security. |
| Title | Title of the document. |
| URL | Uniform Resource Locator for the document. |
| virtual-db | Used to implement virtual database. |

When you select the checkbox next to a search schema attribute and click the attribute, the Edit search schema name page appears. This page displays all the attributes that you can set for a search schema attribute. The table below lists these attributes and their descriptions.

Table 5–6 Edit Search Schema Attribute Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Name | Author | |
| Description | Author(s) of the document | |
| Aliases | Blank | When you import new RDs, you can convert schemas embedded in new RDs into your own schema. You would use this conversion when there are discrepancies between the names used for fields in the import database schema and the schema used for RDs in your database. An example would be if you imported RDs that used Writer as a field for the author and you used Author in your RDs as the field for the author. The conversion would be Writer to Author, so you would enter Writer in this text box. |
| Editable | false | If true (checked), the selected attribute (field) appears as an editable attribute in the Edit page for a resource description. Description, Keywords, Title and ReadACL are editable. |
| Indexable | true | If true (checked), the selected attribute (field) can be used as a basis for indexing. Author, Title and URL appear in the menu in the Advanced Search screen for the end user. This allows end users to search for values in those particular fields. Author, Expires, Keywords, Last Modified, Title, URL and ReadACL can be used as the basis for indexing. |
| Score Multiplier | Blank | A weighting field for scoring a particular element. Any positive value is valid. |
| Data Type | String | Defines the data type. Choose the data type from the list box. |
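The alias conversion described for the Aliases attribute, where an imported Writer field becomes Author, can be sketched as a renaming step applied to each imported RD. The convert_fields function and the dictionary shapes below are hypothetical. Matching alias names without regard to case is also an assumption made for this example.

```python
def convert_fields(imported_rd, aliases):
    """Rename imported fields to the local schema names.

    `aliases` maps a local schema field to the foreign names it should
    absorb, for example {"Author": ["Writer"]}. Fields with no alias
    entry are kept unchanged.
    """
    # Build a reverse lookup: foreign name (lower case) -> local name.
    lookup = {
        alias.lower(): local_name
        for local_name, alias_list in aliases.items()
        for alias in alias_list
    }
    return {
        lookup.get(field.lower(), field): value
        for field, value in imported_rd.items()
    }
```

With aliases set to {"Author": ["Writer"]}, an imported RD that carries a Writer field is stored with that value under Author, which is the conversion described in the Aliases row.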

Analysis

The Analysis page shows a sorted list of all sites and the number of resources from each site currently in the search database. Select Update Analysis to update the analysis on file.

The table below lists the Database Analysis attributes and their description.

Table 5–7 Database Analysis Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Number of RDs | Current number of RDs retrieved from the URL. | Lists current number of RDs from that URL. |
| URL | URL that the robot has successfully searched. | A URL that has been added. |
| Protocol | Protocol it uses to retrieve the RDs from that URL. | Lists the protocol used while collecting the RDs from a web site. |
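The kind of per-site summary shown on the Analysis page can be approximated with a short Python sketch that groups RD URLs by site. The analyze function is hypothetical, and the sort order (largest count first, then site name) is an assumption, because the text says only that the list is sorted.

```python
from collections import Counter
from urllib.parse import urlsplit


def analyze(urls):
    """Summarize RD URLs as (number of RDs, site URL, protocol) rows."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Group by protocol and host so each site is counted once per protocol.
        counts[(parts.scheme, parts.netloc)] += 1
    rows = [
        (number, f"{scheme}://{host}", scheme)
        for (scheme, host), number in counts.items()
    ]
    # Assumed ordering: most RDs first, ties broken by site URL.
    return sorted(rows, key=lambda row: (-row[0], row[1]))
```

Each returned row corresponds to the three columns of Table 5–7: Number of RDs, URL, and Protocol.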