Before users can run searches, they need a database of searchable data to search against. You provide this by creating a database, called a collection, that indexes and stores information about the documents, such as their content and file properties.
Once the documents are indexed, their contents and file properties, such as their titles, creation dates, and authors, are available for searching.
You can add documents to a collection or delete them from it, and you can optimize, update, and manage your collections as needed.
Note
Search cannot work if the web publishing collection (web_htm) does not yet exist or has been deleted. If search does not work, restart the server with the web publishing function turned on (the default), and try searching again.
About Collections
When your server administrator indexes all or some of a server's documents, information about the documents is stored in a collection. Collections contain such information as the format of the documents, the language they are in, their searchable attributes, the number of documents in the collection, the collection's status, and a brief description of the collection. For more details, see Displaying Collection Contents.
When you create a collection, you indicate the type of files that it contains: HTML, ASCII, news, email, PDF, or multiple formats. This determines what happens during indexing: which attributes are indexed and what, if any, file conversion has to be done. Files in multi-format collections are converted to HTML. You can index all the files in a directory or only those with a specific extension, for example, all the HTML, PDF, or *.doc documents.
A collection has records with information about each document that has been indexed. If the document is deleted from the collection, only the collection's entry for that document is removed. The original document is not deleted.
When you have multiple server instances, the collection you create is only associated with the server instance on which the collection was created. Therefore, users can only search collections for that server instance.
About Collection Attributes
Server documents can be in a variety of formats, such as HTML, Microsoft Excel, Adobe PDF, and WordPerfect, provided that a conversion filter is available for the particular file format. With the filters, the server converts the documents into HTML as it indexes them so that you can use your web browser to view the documents that are found for your search. Enterprise Server does not include a standard set of filters for converting different file types to HTML for search indexing and web publisher conversion. If you want to use these capabilities, you can purchase filters from Verity. For more information, see http://www.verity.com/netscapefilters. Once you get the filters, you must configure your system to use them. See Installing Filters for more information.
Complex PDF files, such as those that are password protected or that contain graphical navigation elements, cannot be correctly converted when they are indexed as part of a multi-format collection. The file data converts correctly when the files are part of a PDF-only collection. Graphic elements are not converted.
Certain file formats have a default set of attributes that are indexed for files of that type, as shown in Table 16.2.
By default, HTML collections have Title and SourceType attributes, but they can be indexed to permit searching and sorting by up to 30 file attributes tagged with the HTML <META> tag. You can change the maximum settings for file attributes in webpub.conf, as discussed in Adjusting the Maximum Number of Attributes.
For example, a document could have these lines of HTML code:
If this document was indexed with its META tags extracted, you could search it for specific values in the writer or product fields. For example, you could enter this query: Writer <contains> Bonzini or Product <contains> Comm.
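As a rough illustration of what extracting META tags makes searchable, the following Python sketch pulls name/content pairs out of a page and applies a simple substring test. The sample page, the "Communicator" product value, and the helper names extract_meta and contains are all hypothetical; the server's own indexer and its <contains> operator are not reproduced here and may behave differently, for instance with respect to case.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the tag values are illustrative only.
SAMPLE_HTML = """
<html><head>
<title>Sample page</title>
<meta name="Writer" content="Bonzini">
<meta name="Product" content="Communicator">
</head><body>Body text.</body></html>
"""


class MetaExtractor(HTMLParser):
    """Collect <META NAME=... CONTENT=...> pairs as text attributes."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None:
            # Assumption: field names are matched case-insensitively.
            self.attributes[name.lower()] = content


def extract_meta(html):
    """Return a dict of META field name -> text value."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.attributes


def contains(attributes, field, value):
    """Rough stand-in for a 'Field <contains> value' query."""
    return value.lower() in attributes.get(field.lower(), "").lower()


attrs = extract_meta(SAMPLE_HTML)
print(contains(attrs, "Writer", "Bonzini"))  # True
print(contains(attrs, "Product", "Comm"))    # True
```

Every extracted value is kept as a plain string, which matches the text-only behavior of META-tagged attributes.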
Note. Any attribute values in META-tagged fields are text strings only, which means
that dates and numbers are sorted as text, not as dates or numbers. Also, illegal
HTML characters in a META-tagged attribute are replaced with a hyphen. You
can use the Add Custom Property window (choose Web Publishing and click
the Add Custom Property link) to redefine the text-formatted dates and
numbers so that you can perform searches based on actual dates and numbers
for data in the Web Publishing collection.
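The effect of sorting dates and numbers as text can be seen in a short Python sketch. The values below are made up, and the MM/DD/YY layout of the date strings is an assumption chosen for the illustration; the sketch only contrasts text ordering with typed ordering and does not model the server's sorting code.

```python
from datetime import datetime

# Hypothetical META-tagged values, held as plain text strings.
page_counts = ["9", "10", "100", "25"]
dates = ["12/01/98", "02/15/99", "07/04/97"]  # assumed to be MM/DD/YY

# Text ordering compares character by character.
text_sorted_counts = sorted(page_counts)
text_sorted_dates = sorted(dates)

# Typed ordering converts each string before comparing.
numeric_sorted_counts = sorted(page_counts, key=int)
date_sorted_dates = sorted(
    dates, key=lambda s: datetime.strptime(s, "%m/%d/%y")
)

print(text_sorted_counts)     # ['10', '100', '25', '9']
print(numeric_sorted_counts)  # ['9', '10', '25', '100']
print(text_sorted_dates)      # ['02/15/99', '07/04/97', '12/01/98']
print(date_sorted_dates)      # ['07/04/97', '12/01/98', '02/15/99']
```

Redefining such fields as real dates or numbers is what lets a search or sort follow the second ordering rather than the first.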
Installing Filters
If you are not migrating from Enterprise Server 3.6, and you want to enable document conversion, you must first purchase and install the document filters:
Download the filters from Verity at:
Uncompress them and install them into the server_root/plugins/search directory.
For each instance of the server for which you want to enable document conversion, open the server_root/https-server-id/config/webpub.conf file for editing.
Within the [NS-loader] section of this file, add the following line:
Go to the Server Manager for the server instance.
Click Apply to apply the changes you made to the webpub.conf file for that server instance. The server automatically restarts.
If you are migrating from Enterprise Server 3.6, you may already have the filters you need. For more information, see the migration information in the Enterprise Server Installation and Migration Guide.
Creating a New Collection
You can create a collection that indexes the content of all or some of the files in a directory. You can define collections that contain only one kind of file or you can create a collection of documents in various formats that are automatically converted to HTML during indexing. When you define a multiple format collection (with the auto-convert option), the indexer first converts the documents into HTML and then indexes the contents of the HTML documents. The converted HTML documents are put into the html_doc directory in the server's search collections folder.
You can have at most 12 collections on your server; for any server that uses web publishing, the limit is 10 user-defined collections. If you want to use a 13th collection, you must first remove one of your existing collections (choose Search and click the Maintain Collection link). Do not remove the web publishing collection if one exists for your server.
Your collections can hold entries for a maximum of 16 million documents. A document that is indexed in multiple collections counts as multiple documents. It is best to create new collections of over 10,000 documents at low-traffic times; otherwise, the indexing operation may affect your system's performance.
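The limits above can be turned into a quick pre-flight check. The Python sketch below is only one reading of those limits: it assumes that the 10-collection figure applies to user-defined collections when web publishing is in use and the 12-collection figure otherwise, and that the document limit is the sum of entries across collections. The function name and message strings are hypothetical.

```python
MAX_COLLECTIONS = 12
MAX_USER_COLLECTIONS_WITH_WEB_PUBLISHING = 10
MAX_DOCUMENT_ENTRIES = 16_000_000
LOW_TRAFFIC_THRESHOLD = 10_000


def check_new_collection(existing_sizes, new_size, uses_web_publishing):
    """Return a list of notes about creating one more collection.

    existing_sizes: entry counts of the existing user-defined collections.
    new_size: number of documents the new collection would index.
    """
    limit = (
        MAX_USER_COLLECTIONS_WITH_WEB_PUBLISHING
        if uses_web_publishing
        else MAX_COLLECTIONS
    )
    notes = []
    if len(existing_sizes) + 1 > limit:
        notes.append("too many collections: remove one first")
    # A document indexed in several collections is counted once per
    # collection, so summing the per-collection counts is enough.
    if sum(existing_sizes) + new_size > MAX_DOCUMENT_ENTRIES:
        notes.append("too many document entries")
    if new_size > LOW_TRAFFIC_THRESHOLD:
        notes.append("index at a low-traffic time")
    return notes


print(check_new_collection([5_000] * 10, 20_000, True))
# ['too many collections: remove one first', 'index at a low-traffic time']
print(check_new_collection([8_000_000, 7_999_000], 2_000, False))
# ['too many document entries']
```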
Note
You need to have at least 3MB of available disk space on your system to create
a collection. For information on how you can restrict the size of the index files,
see Restricting Your Index File Size.
To create a new collection:
From the Enterprise Server, choose Search.
Click the New Collection link.
You can select any of the items in the drop-down list as a starting point for finding the directory you want to index.
If you want to index a different subdirectory, click the View button to see a list of resources.
You can index any directory that is listed or you can view the subdirectories in a listed directory and index one of those instead. Once you click the index link for a directory, you return to the Create Collection window and the directory name appears in the Directory to Index field.
You can index all HTML files in the chosen directory by leaving the default *.html pattern in the "Documents matching" field or you can define your own wildcard expression to restrict indexing to documents that match that pattern.
Note:
To also index the subdirectories within the specified directory, click "Include Subdirectories."
In the Collection Name field, type a name for your collection.
Note.
In the optional Collection Label field, type a user-defined name for your collection.
In the optional Description field, type a description for your collection up to a maximum of 1024 characters.
Select the type of files the collection is to contain: ASCII, HTML, news, email, PDF, or multiple document formats.
Select whether or not to extract META-tagged attributes from HTML files during indexing.
Select the collection's language from the drop-down list.
Click OK to create a new collection.
Note
Once you begin indexing a collection, you cannot stop the process until either
the indexing is complete or you reboot the system. Shutting down your server
does not kill the process.
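To preview which files a "Documents matching" pattern would pick up before starting an indexing run that cannot be stopped, something like the following Python sketch may help. It uses Python's fnmatch wildcard rules, which are an assumption here and may not match the server's own wildcard syntax exactly; the function name select_documents is hypothetical.

```python
import fnmatch
import os


def select_documents(directory, pattern="*.html", include_subdirectories=False):
    """List files under `directory` whose names match `pattern`.

    Mirrors the choices in the procedure above: a directory to index,
    a wildcard pattern, and an optional descent into subdirectories.
    """
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
        if not include_subdirectories:
            # Emptying `dirs` in place stops os.walk from descending.
            dirs.clear()
    return sorted(matches)
```

For example, select_documents("/docs", "*.pdf", include_subdirectories=True) would list every PDF file below a hypothetical /docs directory.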
Configuring a Collection
After you have created a collection, you can modify some of its initial settings. You can revise the collection's description, change its label, define a different URL for its documents, and define how highlighting is indicated in displayed documents, which pattern files to use, and how dates are formatted. These settings reside in the collection information file, dblist.ini, which is updated to reflect your changes when you reconfigure a collection. For more information about the configuration files, see Configuring Manually.
Note
This window allows you to modify some of the settings for the web publishing default collection, web_htm, because you are not changing actual collection data. Avoid making unnecessary changes to this collection's settings.
To configure a collection:
From the Enterprise Server, choose Search.
Click the Configure Collection link.
In the optional Description field, you can type a description for your collection up to a maximum of 1024 characters.
In the optional Collection Label field, you can type a user-defined name for your collection.
In the URL for Documents field, you can type in the new URL mapping for the collection's documents if that has changed.
In the Highlight Begin and Highlight End fields, you can type in the HTML tagging you want the server to use when highlighting a search query word or phrase in a displayed document.
You can define different default pattern files for displaying the search results: how the search result's header, footer, and list entry line are formatted, respectively.
In the Result Pattern File field, you can enter the name of the pattern file you want to use when displaying a single highlighted document from the list of search results.
In the Date Format field, you can specify how you want input dates to be interpreted when using this collection: MM/DD/YY, DD/MM/YY, or YY/MM/DD.
Click OK to change the collection configuration.
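The Date Format setting matters because the same input can denote different dates under different formats. The Python sketch below illustrates this with the three formats listed in the procedure. The mapping to strptime codes and the function name are assumptions made for the illustration, and the century chosen for two-digit years follows Python's convention, not necessarily the server's.

```python
from datetime import date, datetime

# Assumed mapping from the collection setting to Python format codes.
DATE_FORMATS = {
    "MM/DD/YY": "%m/%d/%y",
    "DD/MM/YY": "%d/%m/%y",
    "YY/MM/DD": "%y/%m/%d",
}


def parse_input_date(text, collection_format):
    """Interpret an input date according to the collection's setting."""
    return datetime.strptime(text, DATE_FORMATS[collection_format]).date()


# The same text yields two different dates.
print(parse_input_date("03/04/99", "MM/DD/YY"))  # 1999-03-04
print(parse_input_date("03/04/99", "DD/MM/YY"))  # 1999-04-03
print(parse_input_date("99/03/04", "YY/MM/DD"))  # 1999-03-04
```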
Updating a Collection
After you have created a collection, you may want to add or remove files. If you are adding documents, the files' contents are indexed (and converted if necessary) when their entries are added to the collection. If you are removing documents, the entries for the files are removed from the collection along with their metadata. This function does not affect the original documents, only their entries in the collection.
Note. If you selected the Extract Metatags option when you created this collection,
then the META-tagged HTML attributes are indexed whenever you add new
documents to this collection.
To update a collection:
From the Enterprise Server, choose Search.
Click the Update Collection link.
Select the collection you want to update from the drop-down list.
In the Documents Matching field, you can type in a single filename or you can use wildcards to specify the type of files you want added to or removed from the collection.
Select whether to index and add all matching documents from the subdirectories of the document directory that was originally defined for the collection.
Click AddDocs to add the indicated files and subdirectories.
Click RemoveDocs to remove the indicated files.
Maintaining a Collection
Periodically, you may want to maintain your collections. With normal usage, these tasks may not be necessary, but if you do a great deal of indexing and updating of collections, you may want to use some of these functions occasionally. You can perform the following collection management tasks:
Note
Do not use your local file manager to remove collections, especially not the web publishing collections. If you do, any search you run before restarting your server fails, even if it does not use the web publishing collection. Once you restart your server, a new web publishing collection is created automatically, so your search can execute.
To perform any of the collection management tasks, use The Maintain Collection Page in the Enterprise Server user interface.
Scheduling Regular Maintenance
You can schedule collection maintenance at regular intervals. You can set up separate maintenance schedules for optimizing and reindexing. With normal usage, these tasks may not be necessary, but if you do a great deal of indexing and updating of collections, you may want to use some of these functions occasionally. For example, some very active web sites may require frequent reindexing if new documents are added on a daily basis.
A common combination of tasks is to set up a pair of regularly scheduled reindex and update operations to clean out deleted entries and to add entries for new documents matching your collection criteria.
You can optimize a collection to improve performance if you frequently add, delete, or update documents or directories in your collections. An analogy is defragmenting your hard drive. Optimizing is not done automatically, so you must manually optimize after you reindex or update a collection. One situation when you might want to optimize a collection is just before publishing it to another site or before putting it onto a read-only CD-ROM.
You can reindex a collection, which locates each file that has an entry in the collection and reindexes its attributes and contents, extracting the META-tagged attributes if that option was selected when the files were originally indexed into the collection. This does not add entries for new documents but cleans up the collection by removing entries for files that have been deleted.
You can update a collection by entering new indexing criteria for the collection, say *.html, which adds any new documents that match the criteria.
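The difference between reindexing and updating can be sketched with plain sets of filenames. This Python model is a simplification under stated assumptions: a collection is reduced to the set of files it has entries for, the re-extraction of contents and attributes during a reindex is not modeled, and the function names are hypothetical.

```python
import fnmatch


def reindex(collection, files_on_disk):
    """Keep only entries whose files still exist; add nothing new."""
    return {path for path in collection if path in files_on_disk}


def update(collection, files_on_disk, pattern):
    """Add entries for files matching `pattern`; keep existing entries."""
    new_entries = {
        path for path in files_on_disk if fnmatch.fnmatchcase(path, pattern)
    }
    return collection | new_entries


collection = {"a.html", "b.html"}            # b.html was deleted from disk
files_on_disk = {"a.html", "c.html", "notes.txt"}

after_reindex = reindex(collection, files_on_disk)
after_both = update(after_reindex, files_on_disk, "*.html")

print(sorted(after_reindex))  # ['a.html']
print(sorted(after_both))     # ['a.html', 'c.html']
```

Running the two operations as a pair, as in the last two lines, is what clears stale entries and picks up new documents.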
To optimize, reindex, or update your collection:
From the Enterprise Server, choose Search.
Click the Schedule Collection Maintenance link.
Choose a collection from the drop-down list.
Choose an action from the drop-down list: Reindex, Optimize, or Update.
If you choose to update your collection, two extra fields are displayed for entering the document matching criteria and for including documents found in subdirectories that match your criteria.
In the Schedule Time field, type in the time of day when you want the scheduled maintenance to take place.
In the section labeled "Schedule Day(s) of the Week," check one or more of the day checkboxes.
Click OK to schedule the maintenance.
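How a schedule time combines with the selected days of the week can be illustrated by computing the next run. The Python sketch below is not how ns-cron is implemented; it is a minimal model, with a hypothetical function name and Python's weekday numbering (Monday is 0, Sunday is 6) as assumptions.

```python
from datetime import datetime, time, timedelta


def next_run(now, run_time, weekdays):
    """Return the next datetime after `now` that falls on one of
    `weekdays` at `run_time`."""
    # Offsets 0..7 cover today through the same weekday next week.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), run_time)
        if candidate > now and candidate.weekday() in weekdays:
            return candidate
    raise ValueError("no day of the week selected")


now = datetime(1999, 1, 4, 12, 0)  # a Monday, at noon

# 02:00 on Mondays and Thursdays: today's slot has already passed.
print(next_run(now, time(2, 0), {0, 3}))  # 1999-01-07 02:00:00

# 02:00 on Mondays only: the next slot is a week away.
print(next_run(now, time(2, 0), {0}))     # 1999-01-11 02:00:00
```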
For Unix users, to make your newly scheduled maintenance take effect, you must restart the ns-cron process.
To restart the ns-cron process:
Click the Enterprise Server link.
Choose Global Settings.
Click the Cron Control link.
If NS-CRON is already on, click Restart to restart it. If NS-CRON is not on, click Start to start it up.
Unscheduling Collection Maintenance
If you have scheduled regular reindexing or optimizing of a collection, you can remove the scheduled maintenance when you no longer want the collection to be maintained at regular intervals.
To unschedule collection maintenance:
From the Enterprise Server, choose Search.
Click the Remove Scheduled Collection Maintenance link.
Choose a collection from the drop-down list for Choose Collection.
Choose an action from the drop-down list: Reindex or Optimize.
In the lower part of the frame, you can see the time and days of the week when the maintenance is currently scheduled to take place.
Click OK to remove the scheduled maintenance.
For Unix users, to make the removal of the scheduled maintenance take effect, you must restart the ns-cron process.
To restart the ns-cron process:
Click the Enterprise Server link.
Choose Global Settings.
Click the Cron Control link.
If NS-CRON is already on, click Restart to restart it. If NS-CRON is not on, click Start to start it up.