Chapter 6
Search Attributes: Database
The Database attributes are divided into the following sections: Management, Import Agents, Resource Descriptions, Schema, Analysis, and Schedule.
Note: To partition the database, you must use the command-line function because stopping the search server is required.
Management
The initial Management page lists the available databases. You can create a new database, or reindex, purge, or expire an existing one. Use the checkbox to select a database on which to perform an action. Use the small icons above the checkbox to select or deselect all the databases. When you select Reindex, Purge, or Expire, a prompt lists the selected database names and asks you to confirm the action. To perform the action, select OK.
You should reindex the database if you have edited the schema to add or remove an indexed field (such as author), or if a disk error has corrupted the index. You need to restart the server after you change the schema.
Because the time required to reindex the database is proportional to the number of RDs in the database, a large database should be reindexed when the server is not in high demand.
When you purge the contents of the database, disk space used for indexes will be recovered, but disk space used by the main database will not be recovered; instead, it is reused as new data is added to the database.
Expiring a database deletes all RDs that are deemed out-of-date. It does not decrease the size of the database. By default, an RD is scheduled to expire in 90 days from the time of creation.
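The expiry rule can be illustrated with a minimal Python sketch. It assumes each RD is a dictionary with a `created` timestamp; the function names, the dictionary layout, and the example URLs are hypothetical and are not part of the product. Only the 90-day default comes from the text above.

```python
from datetime import datetime, timedelta

DEFAULT_EXPIRY_DAYS = 90  # default lifetime stated in the text


def expiry_date(created, lifetime_days=DEFAULT_EXPIRY_DAYS):
    """Return the date on which an RD created at `created` would expire."""
    return created + timedelta(days=lifetime_days)


def expire(rds, now):
    """Keep only the RDs whose expiry date is still in the future.

    `rds` is a hypothetical list of dicts, each with a 'created' datetime.
    """
    return [rd for rd in rds if expiry_date(rd["created"]) > now]


rds = [
    {"url": "http://www.example.com/old", "created": datetime(2024, 1, 1)},
    {"url": "http://www.example.com/new", "created": datetime(2024, 5, 1)},
]
remaining = expire(rds, now=datetime(2024, 6, 1))
print([rd["url"] for rd in remaining])
```

In this sketch the first RD expires on 31 March 2024 and is dropped, while the second remains until the end of July.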
You can also edit the database by selecting the Edit link, which takes you to a page where you define the database attributes.
Table 6-1 lists the Database Management attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-1 Database Management Attributes
Attribute
|
Default Value
|
Description
|
Name
|
Default
|
Name for the database used by Search.
|
Title
|
Blank
|
A title for the database.
|
Description
|
Blank
|
Describe the database for yourself.
|
Import Agents
Import agents are the processes that bring resource descriptions from other servers or databases and merge them into your search database.
The initial Import page lists the available import agents. You can create a new one, or run, edit or delete an existing one. Use the checkbox to select an agent to delete. Use the small icons above the checkbox to select or deselect all import agents. Use the radio buttons to turn an Agent Action On or Off. To schedule the import agents, select Schedule on the lower menu bar.
If you edit an existing import agent or create a new one, the following attributes are displayed.
Table 6-2 lists the Database Import Agent attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-2 Database Import Agent Attributes
Attribute
|
Default Value
|
Description
|
Charset
|
Blank for new
|
Specifies the character set of the input SOIF stream. For example, ISO8859-1, UTF-8, UTF-16. Character sets ISO8859-1 through ISO8859-15 are supported.
|
Import From
|
Local File
|
Select either Local File or Search Server (if one is enabled).
|
Local File Path
|
Blank for new
|
Gives the full path name of the local file that contains valid resource descriptions in SOIF (Summary Object Interchange Format). This can be a file on another server, as long as the path is addressable as if it were locally mounted.
|
Database Name
|
Default
|
Name of the destination database.
|
Remote Server
|
Blank for new
|
Gives the URL of the search server from which to retrieve resource descriptions, in the format http://www.sesta.com:80
|
Instance Name
|
Blank for new
|
Server instance name used by the search server. You can find this instance name in the Server Preferences for the server you are importing from. Value must be 3.01C or 3.01C SP1.
|
Search URI
|
Blank for new
|
Enter full path and file names. Use /portal/search.
|
Is Compass Server 3.01X?
|
False (unchecked)
|
Is the server you are importing from a Compass Server 3.01X?
|
Enable SSL
|
False (unchecked)
|
If this is a server-to-server transaction, select this option if the servers should use the SSL (Secure Sockets Layer) protocol.
|
Authentication
|
None (default)
|
None (default) or Use User/Password
This specifies how the import agent should identify itself to the system it imports from. By default, no authentication is used. If the server you want to import from requires authentication, you can specify a user name and password for the import agent to use. Importing from 3.01C does not require authentication. Importing data from 3.01C SP1 requires authentication.
|
User
|
Blank for new or none
|
If you selected Use User/Password, enter a user.
|
Password
|
Blank for new or none
|
If you selected Use User/Password, enter a password (shown as *).
|
Content Transfer
|
Use Incremental Gathering of Full Contents (default)
|
Choice of Use Incremental Gathering of Full Contents (default) or Use Search Query
These specify which resource descriptions to import from the source.
By default, an import agent asks for all resource descriptions added or changed since its last import from the same source.
The search query specifies that the import agent should request only certain resource descriptions from the source. This is much the same way that users request listings of resources from the search database.
Use Scope, View-Attributes and View-Hits fields to specify the query.
Use Collect All RDs to collect new resource descriptions since the last run and remove the timestamp in Newest Resource Description.
|
Scope
|
Blank for new
|
The text of the query. The query syntax is identical to that used for end-user queries from the server.
|
View-Attributes
|
Blank for new
|
Lists which fields (not case sensitive) you want to import in each resource description. For example, title and author. The default is all.
|
View-Hits
|
Blank for new
|
The maximum number of matching resource descriptions to import. If no hits are specified, it defaults to 20.
|
Agent Description
|
Blank for new
|
Appears in the list of available import agents on the initial Import page. It is ignored by the program. If this field is blank, the Resource Description Source file name or server name is used to identify the import agent. Note here if user name and password are needed.
|
Newest Resource Description
|
Blank for new
|
The date of the creation of the newest resource description previously imported by this import agent. This date is used by the Use Incremental Gathering of Full Contents option to determine which resources are new and should be imported.
|
Network Timeout in seconds
|
Blank for new
|
Specifies the number of seconds the import agent will allow before timing out the connection over the network. You can adjust this to allow for varying network traffic and quality.
|
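The two Content Transfer modes described in Table 6-2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the import agent's actual implementation: RDs are modeled as dictionaries with a hypothetical `modified` timestamp, and because the real query syntax is not reproduced here, the Scope is stood in for by a hypothetical `matches` predicate. The case-insensitive field matching and the default of 20 hits follow the View-Attributes and View-Hits descriptions.

```python
from datetime import datetime


def select_incremental(source_rds, newest_imported):
    """Incremental gathering: RDs added or changed since the last import.

    With no timestamp (a new agent, or after the timestamp is removed),
    every RD is selected.
    """
    if newest_imported is None:
        return list(source_rds)
    return [rd for rd in source_rds if rd["modified"] > newest_imported]


def project(rd, view_attributes=None):
    """Keep only the requested fields; names are not case sensitive."""
    if not view_attributes:
        return dict(rd)  # default: all fields
    wanted = {name.lower() for name in view_attributes}
    return {k: v for k, v in rd.items() if k.lower() in wanted}


def select_by_query(source_rds, matches, view_attributes=None, view_hits=20):
    """Search-query mode: `matches` is a hypothetical stand-in for Scope."""
    hits = [rd for rd in source_rds if matches(rd)]
    return [project(rd, view_attributes) for rd in hits[:view_hits]]


source = [
    {"URL": "http://www.example.com/a", "Title": "Alpha", "Author": "Kim",
     "modified": datetime(2024, 1, 10)},
    {"URL": "http://www.example.com/b", "Title": "Beta", "Author": "Lee",
     "modified": datetime(2024, 3, 5)},
]

new_rds = select_incremental(source, datetime(2024, 2, 1))
by_query = select_by_query(source, lambda rd: rd["Author"] == "Kim",
                           view_attributes=["title", "AUTHOR"])
print([rd["URL"] for rd in new_rds])
print(by_query)
```

Here the incremental call selects only the RD modified after 1 February 2024, and the query call returns the Title and Author fields of the single matching RD.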
Resource Descriptions
The initial Resource Descriptions page allows you to search the Resource Descriptions in the database. For example, you can correct a typographical error in an RD or manually assign RDs discovered by the robot to categories.
Table 6-3 lists the Resource Descriptions attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-3 Resource Descriptions Attributes
Attribute
|
Default Value
|
Description
|
Search For
|
All RDs
|
All RDs, Uncategorized RDs, Categorized RDs, RDs by category, Specific RD by URL, RDs that contain
|
Text box
|
Blank
|
Enter a unique text string to identify the RDs searched for. Use with the RDs by category, Specific RD by URL, and RDs that contain attribute values.
|
Database
|
Default
|
Name of the database to search.
|
Select Category
|
|
Browse and select a category from the category tree.
|
Delete
|
|
Delete one or more selected RDs that are returned from an RD search.
|
Next
|
|
Display the next set of RDs returned from an RD search.
|
Previous
|
|
Display the previous set of RDs returned from an RD search.
|
Edit Selected
|
|
Edit the attributes of one or more RDs that are returned from an RD search.
|
Edit All
|
|
Edit the attributes of the current set of RDs that are returned from an RD search.
|
To limit the search by category, select Select Category. A Category Editor page displays allowing you to specify the category from the taxonomy for the search. You can specify the category in the Selected Category text box or browse the taxonomy to select it. After specifying the category, select OK to return to the RD search page.
Table 6-4 lists the Category Editor attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-4 Category Editor Attributes
Attribute
|
Default Value
|
Description
|
Selected Categories
|
Blank
|
Text field that displays the selected categories
|
Expand All
|
|
Expands the taxonomy so that all entries in the hierarchy display for browsing.
|
Collapse All
|
Blank
|
Collapses the taxonomy so that only categories within the first two levels of the hierarchy display for browsing.
|
Categories per page
|
25
|
Drop down list of the number of categories to display per page. Values are 25, 50, 100, 250, 500, and all.
|
A successful search displays the Number of RDs found and a list box with the RDs found. After you select the Edit link of an RD, the following editable attributes and partial text of the RD are displayed. All of these attributes except Classification are set to editable on the Database/Schema page.
Table 6-5 lists the Database RD Editable attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-5 Database RD Editable Attributes
Attribute
|
Default Value
|
Description
|
Author
|
Blank
|
Author(s) of the document.
|
Author e-mail
|
Blank
|
Email address to contact the Author(s) of the document.
|
Classification
|
Category name of selected RD.
|
Category name if classified; No Classification if not classified.
|
ReadACL
|
Blank
|
Related to document level security.
|
Content-Charset
|
|
Content-Charset information from HTTP Server.
|
Content-Encoding
|
Blank
|
Content-Encoding information from HTTP Server.
|
Content-Language
|
Blank
|
Content-Language information from HTTP Server.
|
Content-Length
|
Blank
|
Content-Length information from HTTP Server.
|
Content-Type
|
Blank
|
Content-Type information from HTTP Server.
|
Description
|
Description from the selected RD.
|
Description from RD.
|
Expires
|
Valid date.
|
Date on which resource description is no longer valid.
|
Full-Text
|
Blank
|
Entire contents of the document.
|
Keywords
|
Keywords, if any, from the selected RD.
|
Keywords taken from meta tags.
|
Last-Modified
|
Last modification date
|
Date when the document was last modified.
|
Partial-text
|
Partial text of the document
|
Partial selection of text from the document
|
Phone
|
Blank
|
Phone number for Author contact
|
Title
|
Title of the selected RD.
|
Title of RD
|
URL
|
Blank
|
Uniform Resource Locator for the document
|
Schema
The schema determines what information is in a resource description and what form that information is in. You can add new attributes or fields to an RD and set which ones can be edited and which ones can be indexed. When importing new RDs, you can convert schemas embedded in new RDs into your own schema.
Table 6-6 lists the Database Schema Edit attributes. The table contains two columns: the first column identifies the attribute, and the second column describes the attribute.
Table 6-6 Database Schema Edit Attributes
Attribute
|
Description
|
Author
|
Author(s) of the document.
|
Author-EMail
|
Email address to contact the Author(s) of the document.
|
Content-Charset
|
Content-Charset information from HTTP Server.
|
Content-Encoding
|
Content-Encoding information from HTTP Server.
|
Content-Language
|
Content-Language information from HTTP Server.
|
Content-Length
|
Content-Length information from HTTP Server.
|
Content-Type
|
Content-Type information from HTTP Server.
|
Description
|
Brief one-line description for document.
|
Expires
|
Date on which resource description is no longer valid.
|
Full-Text
|
Entire contents of the document.
|
Keywords
|
Keywords that best describe the document.
|
Last-modified
|
Date when the document was last modified.
|
Partial-Text
|
Partial selection of text from the document.
|
Phone
|
Phone number for Author contact.
|
ReadACL
|
Used by Search servers to enforce security.
|
Title
|
Title of the document.
|
URL
|
Uniform Resource Locator for the document
|
Aliases
Name
Description
|
When you import new RDs, you can convert schemas embedded in new RDs into your own schema. You would use this conversion when there are discrepancies between the names used for fields in the import database schema and the schema used for RDs in your database. An example would be if you imported RDs that used Writer as a field for the author and you used Author in your RDs as the field for the author. The conversion would be Writer to Author, so you would enter Writer in this text box.
|
Data Type
|
Defines the data type.
|
Editable
|
If true (checked), the selected attribute (field) appears in the Database RD Editor, so you can change its values.
Description, Keywords, Title and ReadACL are editable.
|
Indexable
|
If true (checked), the selected attribute (field) can be used as a basis for indexing.
Author, Title and URL appear in the menu in the Advanced Search screen for the end user. This allows end users to search for values in those particular fields.
Author, Expires, Keywords, Last Modified, Title, URL and ReadACL can be used as the basis for indexing.
|
Score Multiplier
|
A weighting field for scoring a particular element. Any positive value is valid.
|
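The alias conversion described for the Aliases attribute can be sketched as a simple field rename. The Python below is a minimal illustration, assuming RDs are plain dictionaries and that alias names match exactly; the `ALIASES` table and the `apply_aliases` function are hypothetical names, and the Writer-to-Author mapping is the example given in the table.

```python
# Hypothetical alias table: field name in the imported RD -> field name
# in the local schema (the Writer/Author pair is the example from the text).
ALIASES = {"Writer": "Author"}


def apply_aliases(rd, aliases=ALIASES):
    """Return a copy of `rd` with aliased field names converted."""
    return {aliases.get(field, field): value for field, value in rd.items()}


imported = {
    "URL": "http://www.example.com/report",
    "Writer": "J. Smith",
    "Title": "Report",
}
converted = apply_aliases(imported)
print(converted)
```

Fields without an alias, such as URL and Title, pass through unchanged.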
Analysis
The Analysis page shows a sorted list of all sites and the number of resources from that site currently in the search database. Select Update Analysis to update the analysis on file.
Table 6-7 lists the Database Analysis attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-7 Database Analysis Attributes
Attribute
|
Default Value
|
Description
|
Total number of RDs
|
Current number of RDs in database.
|
Lists current total number of resource descriptions in the database.
|
Number of servers
|
Current number of servers that the database is partitioned across.
|
The database can be partitioned and placed on a number of servers.
|
Site
|
URL or domain that the robot has successfully searched.
|
A URL or domain that has added resource descriptions to the database.
|
Number of RDs
|
Current number of RDs from that site.
|
Lists current number of RDs from that site.
|
Type
|
Type of RD
|
Resource descriptions can be of many different types, for example, http.
|
Percentage
|
Type of RD/ Total number of RDs
|
Percentage of this type of document compared to the total number of resource descriptions.
|
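The figures in Table 6-7 can be approximated with a short Python sketch. It assumes that the site is the host part of each RD's URL and that the type is the URL scheme (the table gives http as an example type); both are assumptions made for illustration, as are the `analyze` function and the sample URLs.

```python
from collections import Counter
from urllib.parse import urlsplit


def analyze(urls):
    """Return the total RD count, per-site counts, and per-type percentages."""
    total = len(urls)
    sites = Counter(urlsplit(u).netloc for u in urls)
    types = Counter(urlsplit(u).scheme for u in urls)
    # Percentage = number of RDs of this type / total number of RDs
    percentages = {t: 100.0 * n / total for t, n in types.items()} if total else {}
    return total, sorted(sites.items()), percentages


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/about.html",
    "http://docs.example.com/guide.html",
    "ftp://ftp.example.com/readme.txt",
]
total, sites, percentages = analyze(urls)
print(total)
print(sites)
print(percentages)
```

The sites are sorted by name here for a stable result; the order used on the Analysis page is not specified in this section.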
Schedule
This page is where you set up the schedule for running the import agents. Table 6-8 lists the Database Import Schedule attributes. The table contains three columns: the first column identifies the attribute, the second column provides the default value for the attribute, and the third column describes the attribute.
Table 6-8 Database Import Schedule Attributes
Attribute
|
Default Value
|
Description
|
Start Import Time in hours and minutes
|
00:00
|
Time that the import agent starts to import.
|
Days
|
none selected
|
Sun-Sat
Check at least one day.
|
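How a start time and a set of selected days combine into a run time can be sketched in Python. This is only an illustration of the scheduling idea, assuming three-letter day names and a 24-hour HH:MM start time as shown in Table 6-8; the `next_run` function is hypothetical and does not reflect how the server schedules import agents internally.

```python
from datetime import datetime, timedelta

# Day names in the order used by datetime.weekday() (Monday is 0).
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def next_run(now, start_time="00:00", days=("Sun",)):
    """Return the next datetime after `now` that falls on a selected day."""
    if not days:
        raise ValueError("Check at least one day.")
    hour, minute = (int(part) for part in start_time.split(":"))
    # Offsets 0..7 cover today through the same weekday next week.
    for offset in range(8):
        candidate = (now + timedelta(days=offset)).replace(
            hour=hour, minute=minute, second=0, microsecond=0
        )
        if candidate > now and DAY_NAMES[candidate.weekday()] in days:
            return candidate
    raise ValueError("No valid day names in: %r" % (days,))


# Wednesday 5 June 2024, 09:30; agents scheduled for Mon and Fri at 02:15.
run = next_run(datetime(2024, 6, 5, 9, 30), "02:15", ("Mon", "Fri"))
print(run)
```

If the start time on a selected day has already passed, the sketch moves on to the next selected day, which may be the same weekday one week later.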