Administering Search Indexes

This chapter provides an overview of search indexes and discusses how to define search indexes.

Understanding Search Indexes in Enterprise Portal

A search index is a collection of files that is used during a search to quickly find documents of interest. You build a search index to enable searching on a given set of documents. The set of files that make up the index is a collection. This collection contains a list of words in the indexed documents, an internal documents table containing document field information, and logical pointers to the actual document files. Most content in Enterprise Portal can be searched after creating indexes.

Search Limitations

Managed Content that has been imported into another feature is not searchable in that feature. You can find it by using the Content Management search, but searches in the following features do not return imported Managed Content: Action Items, Calendar Events, and Discussions. Content created directly in these features is searchable, as are attachments added in the feature. In addition, when a website is added directly to the Calendar feature, the feature indexes the website's metadata but not the website itself.

Defining Search Indexes

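This section discusses how to:

  • Administer search index definitions.

  • Edit a record-based search index definition.

  • Edit keys.

  • Edit a file system search index definition.

  • Edit an HTTP spider search index definition.

  • Define what to include in file system and HTTP search indexes.

  • Define search index security.

  • Define search index filters.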

Note. Use only the Enterprise Portal search administration functionality to manage search indexes within Enterprise Portal (Portal Administration, Search). Do not use PeopleTools search administration (PeopleTools, Portal, Build Search Index).

Pages Used to Define Search Indexes

| Page Name | Definition Name | Navigation | Usage |
|---|---|---|---|
| Administer Indexes | EO_PE_SIDX_SUMMARY | Portal Administration, Search, Administer Indexes | Administer all PeopleSoft Enterprise Portal search indexes. |
| Add Index | EO_PE_ADD_INDEX | Click the Add Index button on the Administer Indexes page. | Add a record-based, file system spider, or HTTP spider index. |
| Record Indexes | EO_PE_RECD | Click the Edit Properties link for a record-based index on the Administer Indexes page. | Create and build record-based search indexes. |
| Edit Key | VEGGIE_SEC | Click the Edit Key link on the Record Indexes page. | Change the results that are returned by the Key returned in search results functionality. Oracle recommends that you retain the <pairs/> value in the Key returned in search results field. |
| Subrecords | EO_PE_RGW_SUBRECRD | Portal Administration, Search, Administer Indexes. Click Edit Properties. Select the Subrecords tab. | Define the subrecords that you want to include in the search index. This page is available only for record-based indexes. |
| Filesystem Index | EO_PE_FSYS | Click the Edit Properties link for a file system spider index on the Administer Indexes page. | Create and build file system spider search indexes. |
| HTTP Index | EO_PE_HTTP | Click the Edit Properties link for an HTTP spider index on the Administer Indexes page. | Create and build HTTP spider search indexes. |
| What To Index | EO_PE_WHATTOINDEX | Portal Administration, Search, Administer Indexes, What To Index | Define the MIME types and file names you want to include in the search index. This page is available only for file system spider and HTTP spider indexes. |
| Security | EO_PE_SIDXPERM | Portal Administration, Search, Administer Indexes. Click Edit Properties. Select the Security tab. | Define security access for the search index. |
| Filters | EO_PE_SIDX_PKG | Portal Administration, Search, Administer Indexes. Click Edit Properties. Select the Filters tab. | Define application classes to use as filters for the search index. |

Administering Search Index Definitions

Access the Administer Indexes page (Portal Administration, Search, Administer Indexes).

 

Index

Displays the name of the search index. To select an index, select the check box to the left of the index name. Delivered indexes are unavailable for selection because they should not be altered.

Gateway Type

Displays the type of gateway the search index uses to access its content.

Portal Index: Based on the portal registry.

Record-based Index: Based on records.

HTTP Spider Index: Based on a URL.

Filesystem Spider Index: Based on a file system location.

Edit Properties

Click for a record-based index to access the Record Indexes page, where you can edit index properties.

Click for an HTTP spider index to access the HTTP Index page, where you can edit index properties.

Click for a file system index to access the Filesystem Index page, where you can edit index properties.

Add Index

Click to access the Add Index page, where you can add a new index.

Delete Selected Indexes

Click to delete any indexes you have selected. Deleting an index definition also removes the actual collections stored in the file system, if any have been built.

Schedule Indexes

Click to access the Build Search Indexes page, where you can configure and launch the Build Search Indexes process (EO_PE_IBLDR).

See Also

Building Search Indexes

Editing a Record-Based Search Index Definition

Access the Record Indexes page (click the Edit Properties link for a record-based index on the Administer Indexes page).

Build Index

Click to run the Build Search Indexes process (EO_PE_IBLDRB) for the selected search index.

System Index

When selected, indicates that this is a delivered index and is not available for editing.

Index Location

Displays the current location of the index.

By default, the files for an index are located in <PS_CFG_HOME>/data/search/<INDEXNAME>/<db name>/<language cd>. However, you can change this location by specifying the search index location property in the application server and Process Scheduler configuration files.

See Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration, “Building and Maintaining Search Indexes,” Specifying the Index Location

Menu Name

Select the menu name that is associated with the records you want to include in the index.

Market

Select the market that is associated with the records you want to include in the index.

Component

Select the component that is associated with the records you want to include in the index.

Key returned in search results

Displays information that you have entered on the Edit Key page.

This data is used to synthesize the VdkVgwKey, which supports an XML-like syntax enabling you to modify the tag that is returned by Verity. Oracle recommends that you retain the <pairs/> value, which means that the format of the Verity entry key will be FIELDNAME=VALUE.

Edit Key

Click to access the Edit Key page, where you can change the results that are returned by the Key returned in search results functionality. Oracle recommends that you retain the default value delivered.

Parent Data Record

Record

Enter records or views that contain data. Only one record is allowed in a record search index definition. To create a record search index definition that includes multiple records, create a view of multiple records and select the view here.

WHERE clause to append

Enter a SQL WHERE clause that you want to use to fine-tune the search result data. For example, if you are indexing a table of all counties in all states in the United States, but you want only counties in California in this particular index, you could add a SQL WHERE clause of STATE = 'CA'.
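As an illustration only, the following Python sketch uses an in-memory SQLite table to show how an appended WHERE clause narrows the rows that feed an index. The table name COUNTY_TBL, its columns, and the sample rows are hypothetical and are not PeopleSoft objects. The SQL that the index builder actually generates is not shown in this documentation, so treat the query shape as an assumption.

```python
import sqlite3


def select_rows(conn, record, where_clause=""):
    """Select rows from a record, optionally narrowed by an appended WHERE clause."""
    sql = f"SELECT COUNTY, STATE FROM {record}"
    if where_clause.strip():
        # The clause is appended as entered, mirroring the page field.
        sql += f" WHERE {where_clause}"
    return conn.execute(sql + " ORDER BY COUNTY").fetchall()


# Hypothetical sample data standing in for the parent data record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COUNTY_TBL (COUNTY TEXT, STATE TEXT)")
conn.executemany(
    "INSERT INTO COUNTY_TBL VALUES (?, ?)",
    [("Alameda", "CA"), ("Marin", "CA"), ("King", "WA")],
)

all_rows = select_rows(conn, "COUNTY_TBL")
ca_rows = select_rows(conn, "COUNTY_TBL", "STATE = 'CA'")
```

Without the clause, all three rows are candidates for indexing. With STATE = 'CA', only the two California rows remain.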

Fields

How to zone the index

Field zone. Select to create one zone for each PeopleSoft field on the record. Applications can specify that they want to access that particular zone in their searches.

One zone. Select to put all of the data into one zone. With this option, the index builds more quickly, but the application can't restrict searches to the portions of the index that come from a particular field.
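The difference between the two zoning options can be pictured with a toy inverted index. This Python sketch is a conceptual model only and does not reflect how Verity stores collections. The field names DESCR and NOTES, the zone name ALL, and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, field_zones=True):
    """Map (zone, word) to the set of row keys that contain the word.

    With field_zones=False, every word is placed in a single zone named 'ALL'.
    """
    index = defaultdict(set)
    for key, fields in rows.items():
        for field, text in fields.items():
            zone = field if field_zones else "ALL"
            for word in text.lower().split():
                index[(zone, word)].add(key)
    return index


def search(index, word, zone=None):
    """Return the keys of rows that contain word, optionally within one zone."""
    word = word.lower()
    hits = set()
    for (entry_zone, entry_word), keys in index.items():
        if entry_word == word and (zone is None or entry_zone == zone):
            hits |= keys
    return hits


rows = {
    "P1": {"DESCR": "blue widget", "NOTES": "ships with red cover"},
    "P2": {"DESCR": "red widget", "NOTES": "no cover"},
}
zoned = build_index(rows, field_zones=True)
single = build_index(rows, field_zones=False)
```

In this model, a search for "red" restricted to the DESCR zone of the zoned index finds only P2. The single-zone index can find both rows that contain "red", but it cannot restrict the search to one field.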

Click here for help with the Field Columns

Displays a page of help text.

Record and Field Name

After you select a value in the Record field, the record name and record fields appear in this grid.

Verity Field

Select to indicate that you want the field to be included in style.ufl and indexed as a Verity field. Verity fields are returned with search results and can be compared numerically.

Generally, PeopleSoft fields that contain metadata about what is being indexed (such as ProductID) should be indexed as Verity fields.

Word Index

Select to indicate that you want the field to be included in the word index. Anything that is not included in the word index cannot be searched for as plain text, although it may still be returned in a Verity field if you have selected the Verity Field option.

In general, PeopleSoft fields that contain a lot of descriptive text, such as description fields, should be included in the word index.

Has attachment

Select to indicate that the field contains binary large object (BLOB) data that will be detached and indexed along with the record. You should not select this option unless the attachment is stored as a BLOB.

Select this option if the selected field contains the URL to an attachment. In this way, this option enables you to index attachments that are referenced by URL and include their stored data in the Verity collection. Refer to the PeopleCode Developer's Guide for a description of file attachments.

The indexer downloads the attachment and indexes it as part of the Verity search collection document. This option is not available for selection for numeric fields, as numeric fields cannot contain URLs. It is available for selection only if the selected field contains character data.

You must use this option with a record that was designed for use with this feature. In the record, each row has a text field that contains a URI or an empty string.

The text must be in one of the following formats:

  • A valid File Transfer Protocol (FTP) URI, including the login and password string, of the form ftp://user:pass@host/path/to/filename.doc.

  • A valid record URI of the form record://RECORDNAME/path/to/file.doc.

  • A string of the form <urlid name="A_URLID"/>/path/to/file.doc.

The third form references an entry in the URL table defined on the URL Maintenance page. If the URL ID that is named in the name attribute is valid, the entire URI is rewritten with the part in brackets replaced by the actual URI.

For example, if A_URLID is equal to ftp://anonymous:user@resumes.peoplesoft.com, the entire string in the previous example becomes ftp://anonymous:user@resumes.peoplesoft.com/path/to/file.doc and is treated like any other FTP URI.

Rows of data with empty strings in the URI field are ignored with no error.

If the string is in one of these three valid URI formats and a document can be retrieved at the URI, the document is indexed with the same key as the rest of the row of data and is searchable.
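The rules above for the URI field can be sketched as follows. This Python example is a hedged illustration, not the indexer's implementation: the function name, the dictionary that stands in for the URL Maintenance table, and the choice to raise an error for unrecognized text are all assumptions.

```python
import re

# Matches the third form: <urlid name="A_URLID"/> followed by the rest of the path.
URLID_PATTERN = re.compile(r'^<urlid name="([^"]+)"/>(.*)$')


def resolve_attachment_uri(text, url_table):
    """Return the URI to retrieve, or None when the row should be ignored.

    url_table is a hypothetical stand-in for the URL Maintenance entries.
    """
    if text == "":
        return None  # Rows with empty strings are ignored with no error.
    match = URLID_PATTERN.match(text)
    if match:
        url_id, rest = match.groups()
        if url_id not in url_table:
            raise ValueError(f"unknown URL ID: {url_id}")
        # Replace the bracketed part with the actual URI from the table.
        return url_table[url_id] + rest
    if text.startswith("ftp://") or text.startswith("record://"):
        return text
    # Assumption: text in none of the three formats is reported as an error.
    raise ValueError(f"unsupported URI format: {text}")


url_table = {"A_URLID": "ftp://anonymous:user@resumes.peoplesoft.com"}
resolved = resolve_attachment_uri(
    '<urlid name="A_URLID"/>/path/to/file.doc', url_table
)
```

Here resolved holds ftp://anonymous:user@resumes.peoplesoft.com/path/to/file.doc, which matches the rewritten URI described in the example above.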

Append to Verity Command Line

This control is intended for PeopleSoft internal use only, but can be used by users with adequate Verity knowledge.

See Also

Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration, “Using PeopleTools Utilities,” URL Maintenance

Editing Keys

Access the Edit Key page (click the Edit Key link on the Record Indexes page).

Key returned in search results

Enter information to change the results that are returned by the Key returned in search results functionality. You can enter the following values to derive results:

<pairs/>. Inserts a string of NAME=VALUE;. One such pair is returned for each key of the record.

<row/>. Inserts the record keys in a SQL-like syntax.

<field fieldname='MYFIELD'/>. Inserts the value of MYFIELD, if it exists in the record.

<sql stmt='SQL STATEMENT'/>. Inserts the value that is returned by the SQL statement. The system accepts only the first row that is returned. PeopleSoft does not support SQL statements returning more than one column.
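To make the <pairs/> and <field/> values more concrete, here is a minimal Python sketch of the strings they describe. The function names, the sample key fields (SETID and PRODUCT_ID), and the choice to return an empty string for a missing field are assumptions made for illustration. The exact key text that the index builder produces may differ.

```python
def render_pairs(keys):
    """Build the <pairs/> form: one NAME=VALUE; segment per record key, in order."""
    return "".join(f"{name}={value};" for name, value in keys)


def render_field(row, fieldname):
    """Build the <field fieldname='...'/> form: the value of the named field.

    Assumption: a field that does not exist in the record contributes nothing.
    """
    return str(row.get(fieldname, ""))


# Hypothetical record keys and row values.
keys = [("SETID", "SHARE"), ("PRODUCT_ID", "10001")]
row = dict(keys)

pairs_key = render_pairs(keys)
field_key = render_field(row, "PRODUCT_ID")
```

With these sample values, pairs_key is SETID=SHARE;PRODUCT_ID=10001; and field_key is 10001.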

Test VdkVgwKey (save first)

Click to test the search results returned by the values you entered in the Key returned in search results field.

Before clicking this button, make sure that a record is selected in the Record field on the Record Indexes page.

Editing a File System Search Index Definition

Access the Filesystem Index page (click the Edit Properties link for a file system spider index on the Administer Indexes page).

 

Build Index

Click to run the Build Search Indexes process (EO_PE_IBLDRB) for the selected search index.

System Index

When selected, indicates that this is a delivered index and is not available for editing.

Index Location

Displays the current location of the index.

By default, the files for an index are located in <PS_CFG_HOME>/data/search/<INDEXNAME>/<db name>/<language cd>. However, you can change this location by specifying the search index location property in the application server and Process Scheduler configuration files.

See Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration, “Building and Maintaining Search Indexes,” Specifying the Index Location

Start Location

Specify the network file system path that contains the documents to index. Ensure that the local application server has the proper access to the file systems that you specify.

For Microsoft Windows, this means the drive mappings must be set up from the application server. For UNIX, this means the correct network file system (NFS) mappings must be set on the application server.

Remap to URL

Enter the HTTP alias that you want to assign to the file system crawl results.

Append to Verity Command Line

This control is intended for PeopleSoft internal use only, but can be used by users with adequate Verity knowledge.

Editing an HTTP Spider Search Index Definition

Access the HTTP Index page (click the Edit Properties link for an HTTP spider index on the Administer Indexes page).

Build Index

Click to run the Build Search Indexes process (EO_PE_IBLDRB) for the selected search index.

System Index

When selected, indicates that this is a delivered index and is not available for editing.

Index Location

Displays the current location of the index.

By default, the files for an index are located in <PS_CFG_HOME>/data/search/<INDEXNAME>/<db name>/<language cd>. However, you can change this location by specifying the search index location property in the application server and Process Scheduler configuration files.

See Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration, “Building and Maintaining Search Indexes,” Specifying the Index Location

Start Location

Enter the URL to content you want to include in the index. You can include one URL per search index definition. URLs should contain only the alphanumeric characters as specified in RFC 1738. Any special character must be encoded. For example, encode a space character as %20, and encode a < as %3c. Additional examples are available.

See RFC 1738
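If you need to prepare a start location that contains special characters, a percent-encoding routine can do the work. The following Python sketch uses the standard library function urllib.parse.quote. The sample path is hypothetical. Note that Python emits uppercase hexadecimal digits (%3C), which is equivalent to the lowercase form (%3c) because percent-encoded hexadecimal digits are case-insensitive.

```python
from urllib.parse import quote


def encode_path(path):
    """Percent-encode characters that are unsafe in a URL path, keeping '/' as is."""
    return quote(path, safe="/")


# Hypothetical path containing spaces and angle brackets.
encoded = encode_path("/docs/annual report <draft>.html")
```

For this sample, encoded is /docs/annual%20report%20%3Cdraft%3E.html.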

Stay in Domain

Select to limit indexing to a single domain. For example, suppose that you are indexing http://www.peoplesoft.com. If you have selected this option and a link points to a site outside the PeopleSoft domain, the indexing ignores the link.

Stay in Host

Select to further limit indexing to within a single server. If you select this option, the index contains references to content only on the current web server or host. Links to content on other web servers within the domain are ignored. For example, if you are indexing http://www.peoplesoft.com and you select this option, the index will include documents on http://www.peoplesoft.com, but not on http://www1.peoplesoft.com.
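The effect of the Stay in Domain and Stay in Host options on a discovered link can be sketched as a simple check. This Python example is an assumption-laden illustration, not the spider's logic: the function names are hypothetical, and the domain is approximated naively as the last two labels of the host name, which does not handle suffixes such as .co.uk.

```python
from urllib.parse import urlparse


def naive_domain(host):
    """Approximate the domain as the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])


def should_follow(start_url, link_url, stay_in_domain=False, stay_in_host=False):
    """Decide whether a link found during indexing should be followed."""
    start_host = urlparse(start_url).hostname or ""
    link_host = urlparse(link_url).hostname or ""
    if stay_in_host and link_host != start_host:
        return False  # Different web server, even if it is in the same domain.
    if stay_in_domain and naive_domain(link_host) != naive_domain(start_host):
        return False  # Link leaves the domain of the start location.
    return True


start = "http://www.peoplesoft.com"
same_domain_other_host = should_follow(
    start, "http://www1.peoplesoft.com/a", stay_in_domain=True
)
blocked_by_host = should_follow(
    start, "http://www1.peoplesoft.com/a", stay_in_host=True
)
```

With Stay in Domain only, the link to www1.peoplesoft.com is followed. With Stay in Host, it is ignored.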

Link Depth

Set the level of detail to which you want to index a certain site. If you enter 1, the indexing starts at the homepage, follows each link on that page, indexes all of the data on the target pages, and then stops. If you enter 2, the indexing follows the links on the target pages and indexes one more level into the website.

As you increase the number, the number of links that the indexing follows increases geometrically. Do not set this value too high, as it can impact performance negatively. You should not need to set this value higher than 10.
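Link depth can be pictured as a breadth-first traversal that stops after a fixed number of hops from the start page. The following Python sketch runs on a small in-memory link graph rather than a real website. The function name and the sample site are hypothetical, and the spider's actual traversal order is not documented here.

```python
from collections import deque


def crawl(links, start, link_depth):
    """Return the pages reached from start by following at most link_depth links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == link_depth:
            continue  # Index this page, but do not follow its links.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site: each key links to the pages in its list.
site = {"home": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["e"]}

depth_one = crawl(site, "home", 1)
depth_two = crawl(site, "home", 2)
```

At depth 1 the traversal covers home, a, and b. At depth 2 it also reaches c and d. If every page carried 10 new links, depth 3 could reach up to 10 + 100 + 1,000 pages beyond the homepage, which is why high values are costly.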

Proxy Host and Proxy Port

Enter a host and port for the indexing to use. Enter the same settings that you would use in your web browser if you need a proxy to access the internet.

Append to Verity Command Line

This control is intended for PeopleSoft internal use only, but can be used by users with adequate Verity knowledge.

Defining What to Include in File System and HTTP Search Indexes

Access the What To Index page (select the What To Index tab).

Mime Types (Multipurpose Internet Mail Extension types)

Index all Mime-types

Select to index all MIME types on a website.

Index only these Mime-types

Select to index only certain MIME types. Specify the MIME types to include in the MIME/Types Allowed list box. Use a space to separate multiple MIME types.

Exclude these Mime-types

Select to exclude a set of MIME types. Specify the MIME types to exclude in the MIME/Types Allowed list box. Use a space to separate multiple MIME types.
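The three MIME type options amount to an allow-all rule, an allow list, or a deny list applied to a space-separated list. This Python sketch shows that logic under stated assumptions: the function name and the mode labels "all", "only", and "exclude" are hypothetical, and case-insensitive comparison is an assumption rather than documented behavior.

```python
def mime_allowed(mime_type, mode, mime_list=""):
    """Decide whether a document's MIME type should be indexed.

    mode is 'all', 'only', or 'exclude'; mime_list is space-separated.
    """
    listed = {entry.lower() for entry in mime_list.split()}
    mime_type = mime_type.lower()
    if mode == "all":
        return True
    if mode == "only":
        return mime_type in listed
    if mode == "exclude":
        return mime_type not in listed
    raise ValueError(f"unknown mode: {mode}")


allowed_types = "text/html application/pdf"
html_ok = mime_allowed("text/html", "only", allowed_types)
image_ok = mime_allowed("image/gif", "only", allowed_types)
```

With the list text/html application/pdf and the "only" mode, an HTML page is indexed and a GIF image is not.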

File Names

Index all filenames

Select to index all file names.

Index only these filenames

Select to index only certain file types. Specify the file types to include in the Pathname Globs List list box. Use a space to separate multiple file types.

Exclude these filenames

Select to exclude certain file types, such as temporary files. Specify the file types to exclude in the Pathname Globs List list box. Use a space to separate multiple file types.

Pathname Globs List

Specify the file types you have chosen to include or exclude. You can use the wildcard character "*" to denote a string of any length and "?" to denote a single character. For example, the string *.doc 19??.excel means select all files that end with the .doc suffix and all files whose names start with 19, followed by exactly two characters and the .excel suffix.
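You can preview how a glob list behaves with any shell-style pattern matcher. The following Python sketch uses the standard library function fnmatch.fnmatchcase as an approximation. Whether the indexer matches full path names or only file names, and whether it is case-sensitive, is not stated here, so both are assumptions. The function name and sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(filename, globs):
    """Return True if filename matches any pattern in a space-separated glob list."""
    return any(fnmatchcase(filename, pattern) for pattern in globs.split())


globs = "*.doc 19??.excel"
doc_match = matches_any("report.doc", globs)
year_match = matches_any("1999.excel", globs)
too_long = matches_any("19999.excel", globs)
```

In this approximation, report.doc and 1999.excel match the list, while 19999.excel does not, because each "?" stands for exactly one character.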

Defining Search Index Security

Access the Security page (select the Security tab).

Access Type

Public. Select to indicate that you want the search index to be searchable by all users.

Roles. Select to indicate that you want the search index to be searchable only by the roles you define in the Role Name field. The search index will be included in a user's PeopleSoft Enterprise Portal search only if the user is a member of at least one role specified in Role Name.
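The access rule reduces to a membership test: a public index is always included, and a role-secured index is included when the user holds at least one of its roles. This Python sketch illustrates the rule only. The function name and the sample role names are hypothetical, and the portal's own security check is not shown in this documentation.

```python
def can_search(access_type, index_roles, user_roles):
    """Return True if the index should be included in the user's search."""
    if access_type == "Public":
        return True
    # Roles access: the user needs at least one role in common with the index.
    return bool(set(index_roles) & set(user_roles))


index_roles = ["HR Admin", "Benefits Analyst"]
member = can_search("Roles", index_roles, ["Employee", "HR Admin"])
non_member = can_search("Roles", index_roles, ["Employee"])
```

Here member is True because the user holds the HR Admin role, and non_member is False.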

Defining Search Index Filters

Access the Filters page (select the Filters tab).

App Class Type (application class type)

Select the application class type you want to use as a filter for the search index. Available values include:

Index Builder Callout. The application class that the index builder uses to extend processing when it creates the index. At build time, the index builder attempts to call the specified application class to perform any custom processing during creation of the search collection.

Search Query Filter. The application class used to process and/or filter search results returned from this search collection. The search query filters can be used to post-process raw Verity search results, as well as apply security and prevent certain search results from being returned to certain users.

Package Name

Enter the package name you want to use.

Package Path

Enter the path to the package you want to use.

Application Class Name

Enter the application class name you want to use.

CallOut Type

Lists the SQL object used to select the URLs to be indexed. Enables indexing of the actual website, not just the metadata that lists the website's URL.

  • Content. Selects unique content IDs.

  • Folder. Selects the content ID/folder ID.

  • Portal. Selects the content ID/folder ID/portal name.

Selection SQL

Corresponds to Selection SQL. The types are predefined to uniquely select the content rows that are to be indexed.