9 Configuring the Search Features in Oracle Portal

This chapter provides information on setting up the search capabilities in Oracle Portal. This includes how to set up Oracle Text and maintain Oracle Text indexes.

This chapter contains the following sections:

9.1 Search Options in Oracle Portal

Oracle Portal offers powerful search capabilities that you can customize according to your needs. A robust set of built-in search portlets enables you to perform searches on the portlet repository, portal pages and external sites.

Furthermore, you can perform searches against more than 100 document types including HTML, XML, PDF, word processing formats, spreadsheets formats, presentation formats, and other common business formats.

This section introduces the search options that are available in Oracle Portal and gives some guidance on how you can select which option is best for you:

9.1.1 Oracle Portal Search

Oracle Portal includes a set of built-in features tuned for searching content stored and managed within the Oracle Portal Repository. These features are incorporated within four search portlets and they can be configured in a variety of ways:

  • Basic Search — this portlet allows simple keyword searches.

  • Advanced Search — this portlet enables you to enter more detailed search criteria, including operators on multiple attributes values.

  • Custom Search — this portlet is fully customizable and enables you to design a search portlet to suit your needs, including pre-defined searches that display results in place. As this portlet is a superset of the Basic and Advanced search portlets, it can be configured to look and behave like these portlets if required.

  • Saved Searches — this portlet enables you to repeat saved searches.

These portlets search all text-type metadata associated with content in the Oracle Portal Repository. For example, display name, keyword, description, and similar attributes.

In addition to metadata, the portlets can search the portal content and this is possible as Oracle Text is enabled in Oracle Portal by default.


Note:

This does not apply to the content of portlets. Portlet content cannot be indexed and consequently cannot be searched.

This means that Oracle Portal search portlets also search in:

  • Documents/files and URL items — file and URL items in binary format can be indexed providing the file format is filterable by Oracle Text.

  • Web pages that URLs (in URL attributes) point to — the content must be plain text or HTML.


Note:

If more than one search term is specified along with an AND search operator (like Contain All of the Terms, Partially Match All of the Terms, Sound Like All of the Terms, and so on), then the terms must all appear within the same search index to result in a match. For example, if you enter 'weights aerobics' and choose the Contains All operator, then search results are returned only when both these terms are found in item metadata, URL content, or document content. If the term weights is found in URL content and the term aerobics is found in document content, then this does not result in a match.

To find out how to configure Oracle Text for use in Oracle Portal, see Section 9.2.2, "Configuring Oracle Text Options in Oracle Portal". To learn more about Oracle Text, how to maintain Oracle Text indexes, and for troubleshooting information, see Section 9.3, "Oracle Text".

To find out how you can configure Oracle Portal search portlets, see Section 9.2.1, "Configuring Oracle Portal Search Portlets". To learn more about Oracle Portal search portlets and how to add search functionality to Oracle Portal pages, refer to the Oracle Fusion Middleware User's Guide for Oracle Portal.

Disabling Oracle Text

Out-of-the-box, Oracle Text is enabled. Although Oracle does not recommend that you disable Oracle Text, it is possible to do so, if your portal does not require or would not benefit from full text searches for Oracle Portal Repository content. For more information, see Section 9.3.1.1, "Searching With Oracle Text Disabled".

Search Results and Content Security

Oracle Portal search result pages can display items, pages, categories, or perspectives that meet your search criteria. Refer to Section 9.1.3, "Default Search Functionality" for more information. Search results do not include:

  • Content you are not authorized to view

  • Content that has expired, or is not yet published

  • Page content that is derived from a template

  • Portlet instances or Portal Smart Links

  • Multiple versions of an item (when versioning in enabled, only the current version of items are returned in search results)

Page designers can choose whether to display links to associated objects with each search result. For example, users may see links to the page group, page, category and perspective associated with an item. However, users who click such links are denied access to the object, if they do not have the required access privileges.

9.1.2 Oracle Secure Enterprise Search

Oracle Secure Enterprise Search (SES) provides an enterprise search capability over a variety of content repositories and data sources, including the Oracle Portal Repository. Oracle Secure Enterprise Search includes a secure search portlet that can be embedded in Oracle Portal pages.

From this portlet, a user can enter a search term and launch a search that returns a single result set that includes content from all configured data sources. When Oracle Portal is configured as one of the data sources, the search returns public content and any private content that the user has the appropriate privileges to see.

More on OTN

The Oracle Secure Enterprise Search Administrator's Guide provides detailed configuration instructions for Oracle Secure Enterprise Search and is available on the Oracle Technology Network (OTN) at http://www.oracle.com/technology/products/oses/index.html.

See Section 9.4, "Oracle Secure Enterprise Search" for more of an overview of Oracle Secure Enterprise Search.

9.1.3 Default Search Functionality

After a standard Oracle Portal installation you can start using the search features in Oracle Portal right away. Without any additional configuration, you can place one of the built-in Oracle Portal search portlets on a page and use it to search portal content.

During installation, Oracle Text indexes are created and synchronized and Oracle Text searching is enabled in your portal. However, it is important to note that new or modified content (items, pages, categories, perspectives) is not returned in search results until the Oracle Text indexes are next synchronized. See Section 9.3.5.1, "Synchronizing Oracle Text Indexes".

By default, Oracle Text indexes are scheduled to synchronize hourly by a job that calls wwv_context.sync. If you find that the default synchronization interval is not suitable for your portal you can modify it at any time. For details, see.Section 9.3.5.5, "Deciding How Often to Synchronize Oracle Text Indexes".

If you are using Oracle Database 11g, you can specify that Oracle Text indexes synchronize automatically whenever portal objects are added, modified, or deleted. This feature is useful for portal applications where newly added or altered content must be searchable immediately. To find out more, see Section 9.3.5.2, "Synchronizing an Oracle Text Index On Commit".


Note:

If you do not want to make use of the additional features provided by Oracle Text, then you can disable this feature. See Section 9.2.2.1, "Enabling and Disabling Oracle Text in Oracle Portal".

Table 9-1 shows some other default search settings. See Section 9.2, "Configuring Oracle Portal Search Options" for information about how to change the values listed here.

Table 9-1 Default Search Settings

Search Setting Option Default

Results Page - Basic Search Portlets and Basic Search Box Items

Basic Search Results Page

Results Page - Advanced, Custom and Saved Search Portlets

Search Results Page

Advanced Search Link

Advanced Search Page

Internet Search Engine Link

None

Hits per Page

20

Oracle Text


Enabled

Oracle Text - Themes And Gists

Disabled

Oracle Text - Highlight Text Color

Default

Oracle Text - Highlight Text Style

Plain

Oracle Text - Base URL

http://<host>:<port>/portal/pls/<dad>


The following images show default search portlets and pages:

Figure 9-1 Oracle Portal Basic Search Portlet

Description of Figure 9-1 follows
Description of "Figure 9-1 Oracle Portal Basic Search Portlet"

Figure 9-2 Oracle Portal Basic Search Results Page

Description of Figure 9-2 follows
Description of "Figure 9-2 Oracle Portal Basic Search Results Page "

Figure 9-3 Oracle Portal Advanced Search Portlet

Description of Figure 9-3 follows
Description of "Figure 9-3 Oracle Portal Advanced Search Portlet"

Figure 9-4 Oracle Portal Custom Search Portlet

Description of Figure 9-4 follows
Description of "Figure 9-4 Oracle Portal Custom Search Portlet "

Figure 9-5 Oracle Portal Search Results Page

Description of Figure 9-5 follows
Description of "Figure 9-5 Oracle Portal Search Results Page"

Figure 9-6 Oracle Portal Saved Searches Portlet

Description of Figure 9-6 follows
Description of "Figure 9-6 Oracle Portal Saved Searches Portlet "

9.1.4 Deciding Which Search Options to Use

Choosing how to configure searching within Oracle Portal begins with a careful examination of your goals for the search experience and understanding of your portal content. Some key questions include:

  • Searching 'breadth' - do you wish to limit the results returned from your portal search to content managed within the Oracle Portal Repository, or do you want to return results from other repositories?

  • Searching 'depth' - is full text indexing of document content a key requirement, or is a metadata only index sufficient?

  • Content security policies and portal user profiles - is your search experience targeted at primarily public, unauthenticated users searching public content or is it more targeted at individual users who have various levels of access privileges to the content?

  • Advanced searching features - is the ability to order results by relevancy, view document themes and gists, and other features of Oracle Text an important capability to offer your users?

  • Administration - how much time are you willing to invest in administering and maintaining indexes, data sources, and so on?

Use Table 9-2 to help match your search requirements to the most appropriate search configuration:

Table 9-2 Oracle Portal Search Options


Oracle Portal (Oracle Text disabled) Oracle Portal (Oracle Text enabled) Oracle Secure Enterprise Search

Searching 'Breadth'

Oracle Portal Repository only

Oracle Portal Repository only

Oracle Portal Repository and other repositories

Searching 'Depth'

Oracle Portal metadata only

Full text index

Full text index

Content security and user profiles

Returns secure and public content in search results

Returns secure and public content in search results

Returns secure and public content in search results

Advanced searching features

No

Yes

Yes

Administration

Minimal

Maintain full text indexes

Maintain full text indexes and configure data sources


9.1.5 Differences Between Oracle Secure Enterprise Search and Oracle Portal Search

This section highlights the main differences between Oracle Secure Enterprise Search and Oracle Portal Search.

  • Oracle Ultra Search only crawls public content

    Oracle Portal is exposed to Oracle Ultra Search as a file system, and to see content in a folder, the folder must be public. If it is not public, none of the content from the folder or the sub-folder hierarchy is crawled. If you create a piece of content and make it public, then it is only indexed if all the containing folders are also public.

  • Oracle Secure Enterprise Search returns a single list of pages and items:

    To Oracle Secure Enterprise Search, both Oracle Portal pages and items are resources with metadata and content, or a visual representation that can be crawled, indexed, and returned in search results. This means that, Oracle Secure Enterprise Search can return a search result list that contains both pages and items. Oracle Portal Search searches for distinct types of data (pages, items, categories and perspectives) and only one type of data can be searched at a time. Whilst Oracle Secure Enterprise Search does not treat categories and perspectives as separate searchable entities, it can (like Oracle Portal Search), search for items and pages that have a particular perspective or category.

  • Oracle Secure Enterprise Search searches content of displayed pages in addition to metadata:

    Oracle Portal Search searches page and item metadata. The Oracle Secure Enterprise Search crawler sees the rendered content plus the metadata. This means that Oracle Secure Enterprise Search can return results when Oracle Portal search does not return any.

  • Oracle Portal Search excludes some item types:

    Oracle Portal Search can only return items of the following base item types:

    • <None> that is, no base item type

    • Base File

    • Base URL

    • Base Text

    • Base PL/SQL

    • Base Page Link

    • Base Image

    • Base Image Map

    Oracle Secure Enterprise Search indexes the visualization of any item type that appears on a page, along with the item's metadata, irrespective of the base item type, as it is the page rendition that is indexed. This means that all the content on the page, static and dynamic, is indexed by Oracle Secure Enterprise Search including banners and template items, login/logout links, and so on.

  • Oracle Text and scoring systems:

    More on OTN

    Both Oracle Secure Enterprise Search and Oracle Portal Search use Oracle Text to index their content, however their implementations are different. Furthermore, Oracle Secure Enterprise Search uses a slightly different scoring system to Oracle Portal Search, and it may be customized. See the Oracle Secure Enterprise Search Administrator's Guide available from OTN at http://www.oracle.com/technology/products/oses/index.html.

    Both scoring systems give weighting when a search term is found in the title, so title hits will score more highly than hits in the document content. In Oracle Portal searches, the score ranks even higher when there are multiple terms in the title, and weighting is also given when multiple terms are found close together or to search results that contain the most hits.

  • Oracle Secure Enterprise Search crawls external content:

    Oracle Secure Enterprise Search can crawl content outside of Oracle Portal, that is, external Web sources. Oracle Portal searches are restricted to internal content.

9.2 Configuring Oracle Portal Search Options

The Oracle Portal search feature is installed with defaults so you can start using the search features right away. Refer to Section 9.1.3, "Default Search Functionality" for a description of these initial defaults.

This section describes how you, the portal administrator, can configure aspects of the search feature that affect all search portlets:

9.2.1 Configuring Oracle Portal Search Portlets

This section describes how to configure aspects of the search feature that affect all Oracle Portal search portlets:

9.2.1.1 Choosing Search Result Pages

You can determine which page is used to display search results from:

  • Basic Search portlets and Basic Search Box items

  • Advanced, Custom, and Saved Searches portlets

If you choose a different search result page, then it is applied to both new and existing search portlets.

More on OTN

You can override this setting for a particular Custom Search portlet, if required. A Custom Search portlet only uses the result page specified here, if the Where should the search results be displayed? option (on Edit Defaults: Results Display tab) is set to the Default Search Results Page. For more information on how to set options for the Custom Search portlet, refer to the Oracle Fusion Middleware User's Guide for Oracle Portal, available from OTN at http://www.oracle.com/technology/products/ias/portal/documentation.html.

To specify a search result page for Oracle Portal search portlets:

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. In the Search Results Pages section, for Basic Search Portlets and Basic Search Box Items, choose a suitable search results page.

    You can choose any portal page that contains a search portlet. If you select a page without a search portlet, then no results are displayed. The default is the Basic Search Results Page.

  4. For Advanced, Custom and Saved Search Portlets, choose a suitable search results page.

    You can choose any portal page that contains a search portlet. If you select a page without a search portlet, then no results are displayed. The default is the Search Results Page.

  5. Select OK.


Note:

If page caching is enabled, then the change may not be seen in existing search portlets immediately. The cache is cleared automatically every 24 hours for all search portlets. Alternatively, clear the cache manually from the Web Cache Administration page in Oracle Fusion Middleware Control. See Section 8.1, "Using Oracle Enterprise Manager 11g Fusion Middleware Control".

If a page you select is subsequently deleted, then the associated Page field is empty. Choose another page and then click OK. If you click Cancel, then you will see Page Not Found errors after search operations.

9.2.1.2 Limiting the Number of Search Results on a Page

You can limit the number of results that search portlets can display. The limit is applied to Basic, Advanced, and Custom Search portlets.

You cannot change the limit for individual Basic or Advanced Search portlets. However, you can override this setting for a Custom Search portlet, if required.

If the number of results returned by a search exceeds this limit, the search results pages include Next and Previous icons so that users can view all of the results as shown in Figure 9-7. On a Custom Search portlet, these icons may be hidden, if required.

More on OTN

For more information on how to set options for the Custom Search portlet, refer to the Oracle Fusion Middleware User's Guide for Oracle Portal, available on OTN at http://www.oracle.com/technology/products/ias/portal/documentation.html.

Figure 9-7 Hits per Page Setting on Search Portlets

Description of Figure 9-7 follows
Description of "Figure 9-7 Hits per Page Setting on Search Portlets"

For example, if you limit Hits Per Page to 10, the first 10 results are displayed on the first search results page, the next 10 on the second page, and so on.


Note:

If you change the limit, the new value does not effect existing search portlets, only new ones.

To specify the number of search results per page:

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. In the Search Properties section, for Hits Per Page, enter the number of search results to display on a page.

  4. Click OK.

9.2.1.3 Choosing an Advanced Search Link (Basic/Custom Search Portlets)

Typically, advanced searches allow a user to specify additional search criteria. For example, see Figure 9-8.

Figure 9-8 Advanced Search Link on Basic/Custom Search Portlets

Description of Figure 9-8 follows
Description of "Figure 9-8 Advanced Search Link on Basic/Custom Search Portlets"

The advanced search link can be to an external site, another portal page, or a package call within Oracle Portal.

More on OTN

An advanced search link is displayed on Basic Search portlets. Optionally, this link can be displayed on Custom Search portlets. For more information on how to set options for the Custom Search portlet, refer to the Oracle Fusion Middleware User's Guide for Oracle Portal, available on OTN at http://www.oracle.com/technology/products/ias/portal/documentation.html.

You can determine the destination of the Advanced Search Link, for all Basic/Custom Search portlet instances. When you specify a new Advanced Search Link, it is applied to both new and existing search portlets that display an Advanced Search link.

To enter advanced search link details:

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. In the Advanced Search Link section, do one of the following:

    • Specify a destination Page Name for the Advanced Search link.

      The default is the Advanced Search Page, which contains the built-in Oracle Portal Advanced Search portlet. However, you can select any portal page displaying advanced search options, the page does not have to contain one of the Oracle Portal search portlets. For example, you can use a JSP page containing advanced search options if one existed in your portal.

      If the page you select is subsequently deleted, this field is empty. Choose another page and then OK. If you click Cancel, the advanced search links will all still point to the deleted page.

    • Specify a URL for the Advanced Search link.

      Enter the URL you want to use. If you have created a customized search engine that you want to use for advanced searches throughout the portal, you can specify its link here.

      You can specify an absolute URL, or a relative URL. For example, http://www.myfavoritesearchengine.com creates a link directly to this Internet search site.

      If you enter a relative URL (that is, a portal package), the value specified here is appended to the Oracle Portal schema URL and this results in a call to the portal package. Note how the value is appended, depending on whether the value specified begins with '/':

      /value results in this URL: http://<webserver>:<port>//<PORTALSCHEMA>.my_search_package.my_search_method>

      value results in this URL: http://<webserver>:<port>/portal/pls/<dad>//<PORTALSCHEMA>.my_search_package.my_search_method

  4. Select OK.


Note:

If page caching is enabled, the change may not be seen in existing search portlets immediately. The cache is cleared automatically every 24 hours for all search portlets. Alternatively, clear the cache manually from the Web Cache Administration page in Oracle Fusion Middleware Control. See Section 8.1, "Using Oracle Enterprise Manager 11g Fusion Middleware Control".

9.2.1.4 Choosing an Internet Search Engine (Advanced/Custom Search Portlets)

An Internet search engine link is displayed on Advanced Search portlets. So, if users do not find the information they need when they search Oracle Portal, they can extend their search using an Internet Search Engine. See Figure 9-9.

Figure 9-9 Internet Search Engine Link on Advanced/Custom Search Portlets

Description of Figure 9-9 follows
Description of "Figure 9-9 Internet Search Engine Link on Advanced/Custom Search Portlets"

More on OTN

Optionally, this link can be displayed on Custom Search portlets. For more information on how to set options for the Custom Search portlet, refer to the Oracle Fusion Middleware User's Guide for Oracle Portal, available on OTN at http://www.oracle.com/technology/products/ias/portal/documentation.html.

When you specify the URL of an Internet search engine and the link text that users click to access the specified Internet search engine, it applies to all new and existing Advanced/Custom Search portlet instances that display an Internet search link.

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. In the Internet Search Engine section, for URL, enter the URL of an Internet search engine. For example, http://www.myinternetsearch.com.

    The URL must be fully formed. It must include http:// and any associated parameters.

  4. For Link Text, enter the text that users click to access the specified Internet search engine. For example: MySearch

    If you enter MySearch, this text is displayed as a link in Advanced Search portlets and optionally in Custom Search portlets. See Figure 9-9.

  5. Select OK.

If the Internet Search Engine properties (URL and Link Text) are not specified, none of the Advanced or Custom Search portlets display an Internet search engine link.


Note:

If page caching is enabled, the change may not be seen in existing search portlets immediately. The cache is cleared automatically every 24 hours for all search portlets. Alternatively, clear the cache manually from the Web Cache Administration page in Fusion Middleware Control. See Section 8.1, "Using Oracle Enterprise Manager 11g Fusion Middleware Control".

9.2.2 Configuring Oracle Text Options in Oracle Portal

This section describes how to configure Oracle Text features in Oracle Portal:


Note:

If page caching is enabled, changes to Oracle Text search settings may not be seen in existing search portlets immediately. The cache is cleared automatically every 24 hours for all search portlets. Alternatively, clear the cache manually from the Web Cache Administration page in Fusion Middleware Control. See Section 8.1, "Using Oracle Enterprise Manager 11g Fusion Middleware Control".

See Section 9.3, "Oracle Text" for more information about Oracle Text, how to maintain Oracle Text indexes, and troubleshooting information.

9.2.2.1 Enabling and Disabling Oracle Text in Oracle Portal

Oracle Text extends the searching capabilities of Oracle Portal. See Section 9.1.1, "Oracle Portal Search". Out-of-the-box, Oracle Text is always enabled. Although Oracle does not recommend that you disable Oracle Text, it is possible to do so, if your portal does not require or would not benefit from full text indexing content within the Oracle Portal Repository. See also Section 9.2.2.1, "Enabling and Disabling Oracle Text in Oracle Portal".

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. Select Enable Oracle Text Searching to make use of Oracle Text when searching Oracle Portal.

    Deselect this option at any time to disable the use of Oracle Text.


    Note:

    If you see the message Oracle Text is not installed, Oracle Text is not installed in the database and is not available in Oracle Portal. Arrange with your database administrator to have Oracle Text installed. Once installed, you must run the following command in SQL* Plus to create the Oracle Text role:

    inctxgrn.sql

    This file is located in the directory ORACLE_HOME/portal/admin/plsql/wws.

    Log on using the user name and password for the PORTAL schema. You must also create Oracle Text indexes. See Section 9.3.4, "Creating and Dropping Oracle Text Indexes" for more information.


  4. Click OK.

9.2.2.2 Setting Oracle Text Search Result Options

When Oracle Text is enabled, you can display additional information alongside items (documents/files) when they are returned as search results. For each item returned you can view:

  • Major themes in a chart. A theme shows the nouns and verbs that occur most frequently.

  • A short summary about the content (gist). Gists are derived from how frequently those nouns and verbs appear.

  • An HTML version.

  • An HTML version of the file with search terms highlighted in a specific color and font.

Themes and gists are optional and HTML highlighting can be customized as follows:

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. Select Enable Themes And Gists to create a theme and gist for each item returned by the search.


    Note:

    Themes and gists are not available for all languages.

  4. For Highlight Text Color, select the color to highlight search terms found in the HTML version of items returned by the search.

  5. For Highlight Text Style, select the style to apply to search terms found in the HTML version of the items returned by the search.

  6. Click OK.

9.2.2.3 Setting a Base URL for Oracle Text

Oracle Text needs a base URL to resolve relative URLs into fully qualified absolute URLs. See Section 9.3.6.1, "Relative URLs" for more information.

To specify the Base URL for Oracle Text:

  1. In the Services portlet, click Global Settings.

    By default, the Services portlet is on the Portal subtab of the Administer tab on the Portal Builder page.

  2. Click the Search tab.

  3. Enter the Oracle Text Base URL in the format: http://<host>:<port>/portal/pls/<dad>

    For example: http://myportal.com:4000/portal/pls/design

    If no value is specified, no relative URLs are indexed and therefore, any URL content that relative URLs points to, cannot be searched.

  4. Click OK.

9.2.2.4 Configuring Proxy Settings for Oracle Text

Oracle Text uses Oracle Portal proxy server settings to access URL content. This is necessary when Oracle Portal lies behind a firewall and URL items point to content beyond this firewall. See Section 9.3.6.4, "URL Index Proxy Settings" for more information.

Refer to Section 6.5, "Configuring Oracle Portal to Use a Proxy Server" for information about configuring the global proxy settings for Oracle Portal.

9.2.3 Configuring Oracle Secure Enterprise Search Options in Oracle Portal

For information about how to configure Oracle Secure Enterprise Search for use in Oracle Portal, see Oracle Secure Enterprise Search Administrator's Guide, available on OTN at http://www.oracle.com/technology/products/oses/index.html.

9.3 Oracle Text

Oracle Text adds powerful text search and intelligent text management to the Oracle Database. Oracle Portal uses the Oracle Text functionality to extend its search capabilities.

Out-of-the-box, Oracle Text is always enabled. However, use of Oracle Text with Oracle Portal is an optional feature that can be disabled by a portal administrator. For more information, see Section 9.3.1, "Understanding Oracle Portal Searches with Oracle Text Enabled/Disabled".

The use of Oracle Text with Oracle Portal is described in the following sections:

More on OTN

You will find additional information in the Oracle Text documentation, available on OTN at http://www.oracle.com/technology/.

9.3.1 Understanding Oracle Portal Searches with Oracle Text Enabled/Disabled

Out-of-the-box, Oracle Text is always enabled. Although Oracle does not recommend that you disable Oracle Text, it is possible to do so, if your portal does not require or would not benefit from full text indexing Oracle Portal Repository content.

See Section 9.2.2.1, "Enabling and Disabling Oracle Text in Oracle Portal" for information about disabling Oracle Text.

9.3.1.1 Searching With Oracle Text Disabled

If Oracle Text is disabled and you perform a basic search (enter a search term only), the following metadata is searched:

  • Item attributes (Display Name, Description, Keywords, Author)

  • Page attributes (Display Name, Description, Keywords)

  • Category and perspective attributes (Display Name, Description)

A basic search does not search custom attributes.

If more than one search term is specified along with the search operator Contains All of the Terms, then the terms must all appear within the same attribute to result in a match. For example, if you enter weights aerobics, search results are returned only when both these terms are found in a single attribute, such as Description. If the term weights is found in Description and the term aerobics is found in Display Name, then this does not result in a match.

It is also worth noting that fewer search operators are available when Oracle Text is disabled. Only three search operators are available for the main search term: Contain All of the Terms, Contain Any of the Terms, and Contain these Terms Exactly. There are fewer operators for attribute searches too.

Searches that specify criteria against selected attributes (advanced searches), matches against the selected attributes. No file- or URL- based attributes (including the seeded attributes URL and File Name) appear on advanced search forms as these are not searchable when Oracle Text is disabled. Similarly, page designers editing Custom Search portlets are prevented from selecting file- and URL- based attributes as search criteria. If any search portlet specified a file- or URL- based attribute before Oracle Text was disabled, the attribute appears greyed out and italicized if Oracle Text is subsequently disabled.

9.3.1.2 Searching With Oracle Text Enabled

If Oracle Text is enabled when you perform a basic search, all text-type attributes, including custom text attributes are searched. Furthermore, the content of file and URL items are searched.

Documents/file and URL items in binary format can be searched providing that the file format is filterable by Oracle Text. In addition, Web pages that URLs (in URL attributes) point to can also be searched, providing that the content is plain text or HTML. For more information, see Section 9.3.3.1, "Oracle Text Index Overview".

9.3.2 Oracle Text Prerequisites

Oracle Text is a standard component of the Oracle Database 11g. If you want to use the Oracle Text functionality in Oracle Portal, it is essential that the Oracle Text component is correctly installed and functioning properly.

Ensure that:

  • Oracle Text is installed in the Oracle Portal Repository database. The Oracle Text component is required to be in the Oracle Portal Repository database before the Oracle Portal Repository can be installed. This is because some Oracle Portal packages make reference to the ctx_ddl packages in the CTXSYS schema in which the Oracle Text component resides.

  • Oracle Text upgrade steps are complete. In particular, during database upgrades, it is essential that any manual steps that pertain to Oracle Text are completed correctly.

  • Library path for the Oracle Text AUTO_FILTER is set correctly. For AUTO_FILTER to function correctly, the ctxhx executable (called during indexing) needs to be able to load the appropriate shared libraries. See also Section 9.3.3.1, "Oracle Text Index Overview".

    • For UNIX platforms, ensure that the library path used by ld includes ORACLE_HOME/ctx/lib for both the TNS listener and the environment where the database is started. The library path environment variable for the different UNIX platforms are as follows:

      Solaris, Tru64 UNIX, Linux -> $LD_LIBRARY_PATH

      HP/UX -> $SHLIB_PATH and $LD_LIBRARY_PATH

      IBM AIX -> $LIBPATH

      More on OTN

      For detailed information, see the Oracle Text Reference, available on OTN at http://www.oracle.com/technology/documentation/.

      Whenever you change the library path you must restart both the database and the listener for Oracle Text indexing operations to work. If one or both environment variables are not set, documents are not indexed as expected and the table ctx_user_index_errors may be full of DRG-11207, status 137 errors. See also Section 9.3.12.1, "Common Document Indexing Errors".

    • On Windows platforms, the Oracle Text DLLs are located in ORACLE_HOME\bin. Ensure that this path is included in the PATH environment variable, that is, in the environment from where the Oracle server is started.

9.3.3 Oracle Text Indexes

If you want to use the Oracle Text functionality in Oracle Portal, several Oracle Text indexes are required in the Oracle Portal schema. Details of these indexes are described in the following sections:

9.3.3.1 Oracle Text Index Overview

All required Oracle Text indexes are built automatically during Oracle Portal installation by procedures in the package wwv_context.

After portal installation, you can use the procedures in this package to manage the indexes, and this includes removing and re-creating the indexes. For more information, see Section 9.3.4.3, "Dropping All Oracle Text Indexes Using ctxdrind.sql" and Section 9.3.4.1, "Creating All Oracle Text Indexes Using ctxcrind.sql".


Note:

Oracle Text can be disabled, even when Oracle Text indexes are present. See Section 9.2.2.1, "Enabling and Disabling Oracle Text in Oracle Portal".

Table 9-3 describes the Oracle Text indexes that are required.

Table 9-3 Oracle Text Indexes In the Oracle Portal Schema

Index Table.column Purpose Datastore type Filter Type Optional?

WWSBR_CORNER_CTX_INDX

wwpob_page$.ctxtxt

Index page metadata

user datastore

-

No

WWSBR_DOC_CTX_INDX

wwdoc_document$. blob_content

Index document content

direct datastore

AUTO_FILTER

Yes

WWSBR_PERSP_CTX_INDX

wwv_perspectives. ctxtxt

Index perspective metadata

user datastore

-

No

WWSBR_THING_CTX_INDX

wwv_things.ctxtxt

Index item metadata

user datastore

-

No

WWSBR_TOPIC_CTX_INDX

wwv_topics.ctxtxt

Index category metadata

user datastore

-

No

WWSBR_URL_CTX_INDX

wwsbr_url$.absolute_url

Index URL content

URL datastore

AUTO_FILTER

Yes


Most of the Oracle Text indexes use a user datastore. The exceptions are the indexes WWSBR_DOC_CTX_INDX (Document index) and WWSBR_URL_CTX_INDX (URL index):

  • Document index: Uses a direct datastore. It indexes the document content held directly in the BLOB type blob_content column of the wwdoc_document$ table.

  • URL index: Fetches the content to be indexed for each row in the wwsbr_url$ table from the location pointed to by the absolute_url column.

It is possible to disable Document and URL indexing. This can improve the speed and efficiency of portal searches as searches are limited to item, page, category, and perspective metadata only. See Section 9.3.7, "Disabling Document and URL Indexing".

The Document and URL indexes both use a filter (AUTO_FILTER), to convert documents into a plain text format that is suitable for indexing:

  • Binary documents are converted into plain text (providing the binary format is supported by the AUTO_FILTER).

  • Plain text, HTML, XHTML, SGML, and XML documents bypass the filter and are indexed directly.

  • Documents that do not need to be indexed, such as graphics, are ignored by the filter.

See also Section 9.3.3.7, "Maximizing AUTO_FILTER Performance".


Note:

If Oracle Portal is installed into a database that does not have a functional AUTO_FILTER, document and URL searching is automatically disabled as this functionality does not work without the AUTO_FILTER. See also Section 9.3.7, "Disabling Document and URL Indexing".

9.3.3.2 Oracle Text Index Preferences

Oracle Text uses preferences to configure the Oracle Text indexes used by Oracle Portal and these preferences are created and owned by the Oracle Portal schema. The preferences are created using the ctx_ddl package, which resides in the CTXSYS schema, and the data representing the preferences is actually stored in relational tables in the CTXSYS schema.

Oracle Text index preferences must exist before the indexes are created. Subsequent changes to these preferences do not take effect until the Oracle Text indexes are dropped and re-created.

The Oracle Text index preferences that are used during Oracle Portal installation to create Oracle Text indexes can be re-created using the package wwv_context. Some Oracle Text index preferences can also be configured by you, the portal administrator, after installation. For example, global Oracle Portal proxy settings are used by Oracle Text to populate the proxy preferences used in Oracle Text indexes.

Oracle Text indexes also use a number of Lexer preferences to control the linguistic aspects of the indexing. Lexer preferences are created by the script sbrimtlx.sql and you can run this script at any time to re-create the Lexer preferences. The script is located in the directory ORACLE_HOME/portal/admin/plsql/wws.

More on OTN

You will find additional information in the Oracle Text documentation on OTN, http://www.oracle.com/technology/.

9.3.3.3 Datastore Procedures

In an Oracle9i Database Server, for each of the Oracle Text indexes that use user datastores, a datastore procedure is created in the CTXSYS schema where Oracle Text is installed. The procedures are called for each row that is to be indexed for the given index. These procedures in turn call procedures in the Oracle Portal schema.

The datastore procedures are named as follows:

  • WWSBR_THING_CTX_<user_id>

  • WWSBR_CORNER_CTX_<user_id>

  • WWSBR_PERSP_CTX_<user_id>

  • WWSBR_TOPIC_CTX_<user_id>

Where <user_id> is the user_id (as found in the ALL_USERS view) of the Oracle Portal Repository schema. This suffix is required so that the procedure names do not clash, in the case where multiple Oracle Portal repositories exist in the same database.

If for any reason these procedures do not exist, Oracle Text does not work. This might happen, for example, if the CTXSYS schema is dropped and reinstalled. In this situation, the procedures can be reinstalled by running the script inctxgrn.sql as the Oracle Portal schema owner:

SQL> @inctxgrn.sql

This script also grants the CTXAPP role to the Oracle Portal schema. See Section 9.3.3.4, "Granting CTXAPP Role to the Oracle Portal Schema" for details. The script is located in the directory ORACLE_HOME/portal/admin/plsql/wws.

In Oracle Database 11g, the datastore procedures are not created in the CTXSYS schema. Instead, the procedures are owned by the index owning schema, that is, the Oracle Portal schema. In this case, all the procedures are in the package wwsbr_ctx_procs:

  • wwsbr_ctx_procs.thing_ctx

  • wwsbr_ctx_procs.corner_ctx

  • wwsbr_ctx_procs.perspective_ctx

  • wwsbr_ctx_procs.topic_ctx

As the procedures are in the Oracle Portal schema, <user_id> suffixes are not required.

9.3.3.4 Granting CTXAPP Role to the Oracle Portal Schema

To use Oracle Text functionality, the role CTXAPP must be granted to the Oracle Portal schema. This happens automatically during Oracle Portal installation and normally no further action is required.

If for any reason this grant is revoked, Oracle Text does not work. For example, this may occur if the CTXAPP role is dropped when the CTXSYS schema is reinstalled.

To restore the necessary grants, run the script inctxgrn.sql as the Oracle Portal schema owner:

SQL> @inctxgrn.sql

The script is located in the directory ORACLE_HOME/portal/admin/plsql/wws. This script also creates the Oracle Portal user datastore procedures, which are required in the CTXSYS schema. See Section 9.3.3.3, "Datastore Procedures".

9.3.3.5 Multilingual Functionality (Multilexer)

Oracle Portal uses the Oracle Text Multilexer to enable language-specific searching in Oracle Portal. The Multilexer:

  • Controls the way that the linguistic aspects of searching are carried out.

  • Allows content, items, pages, categories, perspectives, and their translations, to be treated in a way that is appropriate to their language.

Lexer preferences are used to configure the Multilexer used for all the Oracle Text indexes. The lexer preferences are created by the script file sbrimtlx.sql,the script is located in the directory ORACLE_HOME\portal\admin\plsql\wws.You can modify these preferences if required, but if you do, you must drop and re-create the Oracle Text indexes for the changes to take a effect.

More on OTN

For more information on the Multilexer, refer to Oracle Text documentation on OTN, http://www.oracle.com/technology/.

9.3.3.6 STEM Searching

By default, STEM searching is used when Oracle Text is enabled in Oracle Portal. STEM searching enables you to search for words that have the same root as the specified term. For example, a stem of $sing expands into a query on the words sang, sung, sing.

However, STEM searching is used only when logged in to Oracle Portal in one of the following languages, where STEM searching is supported in Oracle Text:

AMERICAN ENGLISH
CANADIAN FRENCH
DUTCH
UK ENGLISH
FRENCH
GERMAN DIN
GERMAN
ITALIAN
LATIN AMERICAN SPANISH
MEXICAN SPANISH
SPANISH

In all other languages, the STEM operator is not used.

9.3.3.7 Maximizing AUTO_FILTER Performance

AUTO_FILTER is a universal filter that converts most document formats, such as PDF documents, into a plain text format that is suitable for indexing. In Oracle Portal, only the Document and URL index make use of the AUTO_FILTER.

During the indexing process, AUTO_FILTER converts documents and URL content according to the following AUTO_FILTER_FORMAT settings:

  • BINARY - these documents are converted into plain text (providing the binary format is supported by the AUTO_FILTER).

  • TEXT - these documents bypass AUTO_FILTER and get indexed directly; for example, plain text, HTML, XHTML, SGML, and XML documents.

  • IGNORE - these documents, such as images, are not filtered or indexed.

Filtering content unnecessarily can impact the speed and efficiency of portal searches, so it is important that you optimize the filtering process. The best way to optimize the use of AUTO_FILTER, is to ensure that all document and URL content uploaded to your portal is classified with the correct MIME type and character set:

  • MIME type - In Oracle Portal, it is the MIME type of a document that determines the AUTO_FILTER_FORMAT (the setting that AUTO_FILTER uses to determine whether a document gets filtered). For example, a document uploaded with the MIME type application/PDF gets mapped to BINARY and is filtered, whereas a document with the MIME type text/HTML gets mapped to TEXT and is indexed directly. Other documents, like images with the MIME type image/GIF, are mapped to IGNORE.

  • Character set - AUTO_FILTER can convert documents from a non-database character set to the character set used by the database. This enables you to index and search for documents in other character sets. If AUTO_FILTER cannot determine the character set or it is not one of the supported character sets, the document gets indexed without any character set conversion.

To determine the AUTO_FILTER_FORMAT mapping, the browser uses the following mapping sequence when the content is uploaded to your portal:

Files

  • If a MIME type is specified (either by the browser allocating it, or by the user specifying it with the Mimetype attribute), this MIME type is used to map to an AUTO_FILTER_FORMAT (using the wwdav$mime table).

  • If the MIME type does not exist in the table, then the file extension is used to try and find the AUTO_FILTER_FORMAT.

  • If the file extension does not match any defined extensions, then the default AUTO_FILTER_FORMAT for the Document index is used.

URLs

  • If a MIME type is specified by the user with the Mimetype attribute, this MIME type is used to map to an AUTO_FILTER_FORMAT (using the wwdav$mime table).

  • If the MIME type doesn't exist in the table then, if the URL has a file extension, the file extension is used to try and find the AUTO_FILTER_FORMAT. For example, if a URL is pointing to a PDF file, then the AUTO_FILTER_FORMAT would be set to BINARY.

  • If the extension doesn't match either then the default AUTO_FILTER_FORMAT for the URL index is used.

To ensure that all portal content gets classified and filtered properly, Oracle Portal provides two special attributes for file- and URL- based item types: MIME Type and Character Set. By extending a built-in Base File and Base URL item type to include these attributes, users can enter the correct information when they upload portal content.


Note:

Although the MIME Type and Character Set attributes enable you to specify the correct MIME type and character set for file- and URL- based items, it is not possible to specify these for File and URL attributes:
  • File attributes - browser always determines the MIME type.

  • URL attributes - MIME type is always text/html, so AUTO_FILTER always processes URL attributes as plain text.


If speed and efficiency of portal searches are important in your portal or your portal stores or references non-database character set documents, ask page group administrators to add MIME Type and Character Set attributes to all file- and URL- based item types available in their page groups. See also, Adding Attributes to an Item Type in the Oracle Fusion Middleware User's Guide for Oracle Portal.


Note:

When you search for portal content by MIME type or character set, you will only find content based on an item type that includes the corresponding attribute (MIME Type or Character Set).

More on OTN

You will find additional information in the Oracle Text documentation on OTN, http://www.oracle.com/technology/.

9.3.4 Creating and Dropping Oracle Text Indexes

All the required Oracle Text indexes are created automatically during Oracle Portal Repository installation. However, if the indexes are subsequently dropped, it may be necessary to re-create them.

Creating and dropping indexes is a very time-consuming and resource-intensive operation, so plan this task during non-business hours. Also, dropping and re-creating Oracle Text indexes may affect the search functionality in your portal as all the indexes must be present and valid for the search feature to operate normally. When an index is dropped, Oracle Text functionality, such as extra search operators, special search result attributes, and so on, are temporarily unavailable even though Oracle Text searching is enabled. This is another good reason for planning this task during non-business hours.


Note:

Dropping or creating Oracle Text indexes does not invalidate Oracle Web Cache. Therefore, search results published automatically and existing search forms, are still displayed until they expire from the cache, or someone edits the search portlet (using the Edit Defaults page).

The following sections describe how to create and drop Oracle Text indexes:

9.3.4.1 Creating All Oracle Text Indexes Using ctxcrind.sql

You can re-create all the Oracle Text indexes using scripts and packages provided with Oracle Portal. The primary script for creating the Oracle Text indexes is ctxcrind.sql and it is located in the directory ORACLE_HOME/portal/admin/plsql/wws.

When you run the script ctxcrind.sql as the Oracle Portal Repository schema owner:

  • All the required Oracle Text indexes and preferences are created. See also, Section 9.3.3, "Oracle Text Indexes".

  • If there are existing Oracle Text indexes, all existing preferences and valid indexes are dropped and re-created. Indexes are judged to be valid if:

    • The row in view user_indexes for the relevant index has index_status, domidx_status, and domidx_opstatus all set as 'VALID'.

    • The index has an entry in ctx_user_indexes with the idx_status set to 'INDEXED'.

  • Any indexes that are not present are also created.

This process can take several hours.

To create Oracle Text indexes using the script ctxcrind.sql:

  1. Navigate to the directory ORACLE_HOME/portal/admin/plsql/wws.

  2. In SQL*Plus, log on using the user name and password for the PORTAL schema.

  3. In SQL*Plus, enter this command:

    ctxcrind.sql
    

If the operation is successful, all the Oracle Text indexes and preferences are created in the Oracle Portal Repository schema. If it fails, check that your system has met all the requirements. See Section 9.3.2, "Oracle Text Prerequisites".


Note:

The time it takes to create the Oracle Text indexes, depends on how many items and page groups exist in your portal.

The script ctxcrind.sql makes a call to the procedure:

wwv_context.createindex( p_message => l_message );

Where p_message is an out parameter that passes a completion message. The call wwv_context.createindex() is in turn equivalent to:

wwv_context.drop_prefs;  /* Drop all Oracle Text preferences for the indexes, except Lexer preferences */
wwv_context.drop_invalid_indexes; /* Drop all invalid indexes */
wwv_context.create_prefs; /* Create all Oracle Text preferences,except Lexer preferences */
wwv_context.create_missing_indexes(l_indexes);  /* Create missing indexes and record them in l_indexes */
wwv_context.touch_index(l_indexes); /* Mark all rows for created indexes as requiring synchronization */
wwv_context.sync;     /* Synchronize indexes */
wwv_context.optimize; /* Optimize indexes */

9.3.4.2 Creating a Single Oracle Text Index

If you want to create a specific index, use the procedure wwv_context.create_index(p_index). See also Appendix D, "create_index".

Use p_index to specify which index you want to create. One of:

wwv_context.PAGE_TEXT_INDEX
wwv_context.DOC_TEXT_INDEX
wwv_context.PERSPECTIVE_TEXT_INDEX
wwv_context.ITEM_TEXT_INDEX
wwv_context.CATEGORY_TEXT_INDEX
wwv_context.URL_TEXT_INDEX

This procedure creates an empty index. Search results cannot be returned from an empty index, so you'll need to populate the index too. See Section 9.3.5.6, "Synchronizing All the Index Content" for information about marking an index for update and synchronizing an index.

9.3.4.3 Dropping All Oracle Text Indexes Using ctxdrind.sql

You can drop all of the Oracle Text indexes and preferences (except for the Lexer preferences), using the script ctxdrind.sql. This script is located in the directory ORACLE_HOME/portal/admin/plsql/wws.

To drop all the Oracle Text indexes using the script ctxdrind.sql:

  1. Navigate to the directory ORACLE_HOME/portal/admin/plsql/wws.

  2. In SQL*Plus, log on using the user name and password for the PORTAL schema.

  3. In SQL*Plus, enter this command:

    ctxdrind.sql
    

    This script makes a call to:

    wwv_context.dropindex(p_message  =>l_message);
    

    Where p_message is an out parameter that passes a completion message.


    Note:

    When the Oracle Text indexes are dropped, any views and packages that reference tables on which the indexes were created will become invalid.

    These views and packages are automatically validated when they are next accessed. Alternatively, it is possible to validate the views and packages manually.


9.3.4.4 Dropping a Single Oracle Text Index

You may want to drop a specific Oracle Text index. For example, you may want to drop the URL index so that it can be re-created with a different proxy setting, without having to drop and re-create all the other indexes.

To do this, drop the index directly using the command:

SQL> drop index <index_name> force;

For example, to drop the URL index, enter:

SQL> drop index WWSBR_URL_CTX_INDX force;

Alternatively, you can drop an index using wwv_context.drop_index:

SQL>exec wwv_context.drop_index('<index_name>');

See also, Appendix D, "drop_index".

9.3.5 Maintaining Oracle Text Indexes

It is important that you maintain Oracle Text indexes properly as this ensures that portal search results are returned accurately and efficiently. If you are maintaining Oracle Text indexes you'll need to consider index synchronization and optimization:

  • Synchronization — Updates an Oracle Text index based on a queue.

  • Optimization — Compacts fragmented rows and removes old data in an Oracle Text index. As an index is synchronized, it grows in such a way as to consume more disk space than necessary and this reduces the efficiency of queries.

Oracle Text gives you full control over how often each index is synchronized and optimized. For more information about synchronization, see:

For more information about optimization, see:

9.3.5.1 Synchronizing Oracle Text Indexes

When new content is added to your portal (items, pages, categories, perspectives) it must be indexed before it can be searched. Furthermore, when any row in a table on which the indexes are created are modified, that row is marked as needing synchronization. These are referred to as pending rows and they are not returned in search results until the index is synchronized.

You can see which rows are marked pending, using the view ctx_user_pending. You can also use the script textstat.sql to see the number of rows that need to be synchronized for each index. See also, Section 9.3.8, "Viewing the Status of Oracle Text Indexes".

During installation, Oracle Text indexes are created and synchronized and by default, all indexes are scheduled to synchronize hourly by a job that calls wwv_context.sync. If hourly synchronization is not acceptable for your portal you may modify the default synchronization schedule. For example, you can choose to synchronize every five seconds, if it is important to reflect text changes quickly in the index. Alternatively, you can choose to synchronize once a day, for more efficient use of computing resources and a more optimal index. See Section 9.3.5.4, "Scheduling Index Synchronization".

You can specify that Oracle Text indexes synchronize immediately after portal content is added, updated, or deleted, and this can be configured on an index-by-index basis. This feature is not available on databases earlier than Oracle Database 10g as earlier versions do not support the sync property. Oracle recommend that the page, item, category, and perspective indexes are configured to synchronize on commit as this configuration does not impact search performance. You can also configure the document and URL indexes to synchronize on commit but as this configuration can impact the speed and efficiency of portal searches, you will need to evaluate its use on a portal-by-portal basis. Table 9-4 summarizes the recommended synchronization schedule for Oracle Text indexes on Oracle Database 11g:

Table 9-4 Recommended Synchronization Schedule for Oracle Text Indexes on Oracle Database 11g

Oracle Text Index Index Synchronization on Oracle Database 11g

Page, Item, Category, Perspective

Synchronize immediately after a commit—whenever an associated portal object is added, modified or deleted. See Section 9.3.5.2, "Synchronizing an Oracle Text Index On Commit".

Document, URL

Synchronization scheduled hourly (or some other regular interval) by a job that calls wwv_context.sync. See Section 9.3.5.4, "Scheduling Index Synchronization".


The following sections describe your synchronization options:

9.3.5.2 Synchronizing an Oracle Text Index On Commit

Use the procedure wwv_context.commit_sync() to specify whether an Oracle Text index synchronizes immediately after data is committed to your portal. For more information, see also, Appendix D, "commit_sync".

The commit does not return until the synchronization is complete. Since the synchronization is performed as a separate transaction, there may be a period, usually small, when the data is committed but index changes are not. The operation uses the memory specified with the memory parameter. See also, Appendix D, "set_sync_memory".


Note:

Use textstat.sql to determine the current status of this setting. For more information, Section 9.3.8, "Viewing the Status of Oracle Text Indexes"

To specify that an Oracle Text index synchronizes on commit:

Execute wwv_context.commit_sync as the Oracle Portal schema owner from SQL*Plus, using the command:

exec wwv_context.commit_sync('<Index_name>', true);

To specify that an Oracle Text index synchronizes manually:

Execute wwv_context.commit_sync as the Oracle Portal schema owner from SQL*Plus, using the command:

exec wwv_context.commit_sync('<Index_name>', false);

To verify the status of on commit synchronization for an Oracle Text index:

Execute wwv_context.get_commit_sync as the Oracle Portal schema owner from SQL*Plus, using the command:

set serveroutput on
begin
        dbms_output.put_line(
           case wwv_context.get_commit_sync('<index_name>')
               when true then
                 'Index synchronizes automatically when data commits'
               when false then
                  'Index needs to be synchronized manually'
            end
       );
end;

9.3.5.3 Synchronizing All Oracle Text Indexes Manually

Use the procedure wwv_context.sync() to synchronize the Oracle Text indexes manually. This procedure indexes all pending rows. See also, Appendix D, "sync".

With manual synchronization you can also specify memory size and parallel synchronization. See also, Appendix D, "set_parallel_degree" and Appendix D, "set_sync_memory".


Note:

wwv_context.sync ignores any index that synchronizes immediately after data is committed (wwv_context.commit_sync is set to true).

To synchronize Oracle Text indexes manually:

Execute this procedure as the Oracle Portal schema owner from SQL*Plus, using the command:

exec wwv_context.sync();

Use the following syntax to specify the degree of parallelism used during synchronization:

exec wwv_context.set_parallel_degree('<index_name>', <parallel_degree>);

For example:

exec wwv_context.set_parallel_degree('WWSBR_CORNER_CTX_INDX', 2);

Use the following syntax to specify the amount of memory used during synchronization:

exec wwv_context.set_sync_memory('<index_name>', <sync_memory);

For example:

exec wwv_context.set_sync_memory('WWSBR_CORNER_CTX_INDX', 12582912);

This procedure operates across all virtual private portal subscribers.

9.3.5.4 Scheduling Index Synchronization

In most installations, it is desirable to schedule index synchronization to run automatically at regular intervals so that newly added or updated content gets indexed periodically. You can schedule a job using the script textjsub.sql. This uses dbms_job to call wwv_context.sync at regular intervals.

The script takes three parameters and it can also be used to alter or remove a synchronization job:

start_time        - a valid date or 'START' or 'STOP'
start_time_fmt    - start time format mask.
                    Ignored if start_time is 'START' or 'STOP'
interval_minutes  - minutes between each run. Ignored if 'STOP' 

If you set start_time to START, the second argument is ignored and the next job is scheduled to run immediately. Subsequent jobs are run after the interval specified.

If you set start_time to STOP, the job is removed and other arguments are ignored.

To schedule Oracle Text index synchronization:

Run the script textjsub.sql. For example, to schedule index synchronization every 60 minutes, enter:

SQL> @textjsub.sql START NOW 60

9.3.5.5 Deciding How Often to Synchronize Oracle Text Indexes

The appropriate interval between index synchronization jobs depends on:

  • How often new content is added to your portal site.

  • Whether it matters that newly added or altered content is not searchable immediately.

  • How long is it reasonable to have to wait before added or updated content is searchable.

Depending on your requirements, the synchronization interval could be anything from a few minutes to several days.

It is more efficient to synchronize a larger number of rows on a single occasion than to repeatedly synchronize a smaller number of rows, as the index becomes less fragmented. If an index is less fragmented, then it needs to be optimized less frequently. See Section 9.3.5.7, "Optimizing Oracle Text Indexes" for more information.

However, indexing a larger number of rows at once places a heavier load on the server. Synchronizing more frequently increases the total amount of work but spreads the load on the server. The job only synchronizes the rows that are pending, however, there is always some overhead, however small, in starting up the synchronization job.

9.3.5.6 Synchronizing All the Index Content

You can synchronize all the content for a particular Oracle Text index by marking every row in that index as requiring synchronization.

For example, when an index is initially created it is empty, so you would need to update the entire index content. This involves performing an update for the column that the index is created on. For every row in the indexed table use the procedure wvv_context.touch_index(p_index) to update the column.

After running this procedure, there is an entry in the table ctx_user_index_pending for every row in the table upon which the index was created.

Note also that this procedure works across all virtual private portal subscribers.

To synchronize all the content of an index:

Use the procedure wvv_context.touch_index(p_index). Where p_index enables you to specify one of these index names:

wwv_context.PAGE_TEXT_INDEX
wwv_context.DOC_TEXT_INDEX
wwv_context.PERSPECTIVE_TEXT_INDEX
wwv_context.ITEM_TEXT_INDEX
wwv_context.CATEGPRY_TEXT_INDEX
wwv_context.URL_TEXT_INDEX

To synchronize all the content of multiple indexes:

Use the procedure wvv_context.touch_index(p_indexes). Where p_indexes enables you to specify a varray of index names to be synchronized (wwsbr_array).

9.3.5.7 Optimizing Oracle Text Indexes

Synchronizing Oracle Text indexes causes them to become fragmented. Each Oracle Text index is an inverted index where search terms are listed in a form that is efficient to look up. Each search term references the location of the term.

When new terms are added during synchronization, duplicate terms are not removed, so the index may contain the same term several times. This inflates the size of the index and causes the performance of search queries to deteriorate.

The solution is to optimize the Oracle Text indexes. This process compacts the indexes and (optionally) removes old data.

To optimize all of the Oracle Text indexes:

To optimize all of the Oracle Text indexes, use the procedure wwv_context.optimize(). See also, Appendix D, "optimize".

This procedure takes the following parameters:

wwv_context.optimize
(
  p_optlevel in varchar2 default CTX_DDL.OPTLEVEL_FULL, -- FULL, FAST, TOKEN
  p_maxtime in number default null,  -- Maximum time for full optimization, in minutes
  p_token in varchar2 default null -- Token to optimize (when TOKEN)
);

Internally, this procedure calls the Oracle Text procedure ctx_ddl.optimize_index for each Oracle Text index and passes these parameters. It performs full index optimization as opposed to fast or token optimization.

More on OTN

You will find additional information in the Oracle Text documentation on OTN, http://www.oracle.com/technology/.


Note:

If no Oracle Text indexes exist, the procedure wwv_context.optimize has no effect.

wwv_context.optimize only optimizes an Oracle Text index if it is sufficiently fragmented to require optimization. The measure of the fragmentation used is the average number of times a token that appears more than once, is found in the index. If this average is greater than 10, the index is judged to require optimization. The fragmentation query used is as follows:

SELECT AVG(COUNT(*)) FROM DR$<index_name>$I
GROUP BY TOKEN_TEXT HAVING COUNT(*) > 1

Where <index_name> is the name of the index to be measured.

9.3.5.8 Scheduling Index Optimization

In most installations it is desirable to schedule the index optimization process to run automatically at regular intervals. You can schedule a job using the script optjsub.sql. This uses dbms_job to call wwv_context.optimize at regular intervals.

This script optjsub.sql takes three parameters and it can also be used to alter or remove an optimization job:

start_time       - A valid date or 'START' or 'STOP'
start_time_fmt   - Start time format mask.
                   Ignored if start_time is 'START' or 'STOP'
interval_minutes - Minutes between each run. Ignored if 'STOP'

If you set start_time to 'START', the second argument is ignored and the next job is scheduled to run immediately. Subsequent jobs are run after the interval specified.

If you set start_time to 'STOP', the job is removed and other arguments are ignored.

During Oracle Portal installation, a job is set up to optimize all of the Oracle Text indexes, every 24 hours.

To schedule Oracle Text index optimization:

Run the script optjsub.sql. For example, to schedule index optimization every 60 minutes, enter:

SQL> @optjsub.sql START NOW 60

This script is located in the directory ORACLE_HOME/portal/admin/plsql/wws. If no Oracle Text indexes are present when you run this optimization job, the procedure has no effect.

9.3.5.9 Choosing the Optimization Interval

It is difficult to predict how often Oracle Text indexes need to be optimized as the frequency depends on the amount of content that is being loaded, the type of content being loaded, the synchronization schedule, and many other factors.

However, if you measure the index fragmentation at regular intervals, you can determine how rapidly it is becoming fragmented. Using this information, you can set an appropriate optimization interval.

The procedure wwv_context.optimize only optimizes the index if it is judged to be fragmented. So, other than the minimal overhead of calling the job, it is quite safe to run this job more often than perhaps is required.

During Oracle Portal installation, a job is set up to optimize all of the Oracle Text indexes, every 24 hours.

9.3.6 Indexing and Searching URL Content

When Oracle Text is enabled in Oracle Portal, the content of URL attributes attached to items or pages are indexed by default. Note that indexing does not include portlet content on pages. Once the URL content is indexed, it is searchable. When you enter search criteria for URL attributes, it is this URL content that is searched.


Note:

If you do not want portal users to search URL content you can disable the URL index. See Section 9.3.7, "Disabling Document and URL Indexing" for more information.

9.3.6.1 Relative URLs

In Oracle Portal you can enter a relative URL for a URL attribute. When these URLs display as links on a portal page they are relative to the base HREF that is set in the HTML <head> section for a portal page. The format of the base HREF is:

<protocol>://<server>:<port>/portal/pls/<dad>/

For example, in the HTML <head> section you might see:

<base href="http://myserver.abc.com/portal/pls/portal/">

In this example:

  • The relative URL /help/index.html is resolved by the browser to:

    http://myserver.abc.com/help/index.html

  • The relative URL!PORTAL.mypackage.proc (with no leading /) is resolved by the browser to:

    http://myserver.abc.com/portal/pls/portal/!PORTAL.mypackage.proc

The base HREF on a page is dependent on the URL used to request the page. As it is possible to use more than one URL to access the page, the base HREF reflects the URL used to access the page.

Oracle Text Base URL Setting

When indexing URL content, Oracle Text needs to know how to resolve relative URLs into fully qualified absolute URLs. As Oracle Text does not have the context of an initial request from which to determine the correct base HREF, you must specify the base HREF that is used. You set this option, by specifying the Oracle Text Base URL property on the Global Settings: Search page. See Section 9.2.2.3, "Setting a Base URL for Oracle Text" for details.

During Oracle Portal installation, this option is set automatically.

The format of the Oracle Text Base URL is:

<protocol>://<server>:<port>/portal/pls/<dad>/

For example: http://myserver.abc.com/portal/pls/portal/


Note:

Do not specify an Oracle Text Base URL beginning with https, as HTTPS URLs are not indexed by Oracle Text. If you do this, no relative URLs are indexed.

If you change the Oracle Text Base URL, it does not take effect immediately. When a URL is edited, it is marked as requiring synchronization and Oracle Text will use the new preference the next time the index is synchronized. If you want to force all URLs to immediately use a new Oracle Text Base URL value, you can mark the entire content of the URL Index as requiring synchronization, using the procedure:

SQL> wwv_context.touch_index(wwv_context.URL_TEXT_INDEX);

This procedure acts across all subscribers. In a single virtual private portal subscriber, this is equivalent to:

SQL> update wwsbr_url$ set absolute_url = null;
...
SQL> commit;

9.3.6.2 Unsupported URLs

Oracle Text cannot index URLs that use these protocols:

  • https

  • javascript

If a URL item specifies one of these protocols it is not indexed. You will not see a corresponding error in the Oracle Text error logs.

9.3.6.3 Supported URLs

Oracle Text can index URLs that use these protocols:

  • http

  • file - File URLs must be accessible from the database server.

  • ftp - FTP URLs must point to locations that do not require authentication as Oracle Text is not able to authenticate — even as an anonymous user.

9.3.6.4 URL Index Proxy Settings

When indexing URL content, Oracle Text can use proxy servers to access URLs. This may be necessary when Oracle Portal lies behind a firewall and URLs point to content beyond this firewall. As indexing takes place from the Oracle Portal Repository server, it is the proxy settings required on this computer that are important.

The URL index uses the same proxy settings that are used globally for Oracle Portal. These are set on the Proxy Settings page, available from the Services portlet. See Section 9.2.2.4, "Configuring Proxy Settings for Oracle Text" for details.

The proxy settings are used when Oracle Text indexes are created. So, if you change the proxy settings the indexes must be re-created. If you need to drop all your indexes and re-create them, use the scripts ctxdrind.sql (drop indexes) and ctxcrind.sql (create indexes). See Section 9.3.4, "Creating and Dropping Oracle Text Indexes" for more information:

SQL> @ctxdrind.sql
...
SQL> @ctxcrind.sql
...

These scripts drop and re-create all of the indexes and this can take a long time if your indexes are large. Alternatively, you can drop and re-create the Oracle Text preferences and URL index only:

begin
   -- Drop and re-create the Oracle Text preferences
   -- to pick up the new proxy settings.
   wwv_context.drop_prefs();
   wwv_context.create_prefs();
end;
/
-- Check that the proxy settings used by the index are correct
select prv_attribute attribute, prv_value value 
  from ctx_user_preference_values 
  where prv_attribute in ('TIMEOUT','HTTP_PROXY','NO_PROXY')
/

begin 
   -- Drop and re-create the URL index
   wwv_context.drop_index(wwv_context.URL_TEXT_INDEX);
   wwv_context.create_index(wwv_context.URL_TEXT_INDEX);

   -- Mark all of the rows for the index as pending
   wwv_context.touch_index(wwv_context.URL_TEXT_INDEX);

   -- Syncronize and optimize
   wwv_context.sync();
   wwv_context.optimize();
end;
/

9.3.7 Disabling Document and URL Indexing

By default, the content of files uploaded to the Oracle Portal Repository and the content referenced in URL items or custom URL attributes is indexed. This allows users to search and find terms in document and URL content and for most cases, this is desirable.

When portal users do not need to search within file and URL content you may wish to disable the Document and URL indexes. In this case, searching is limited to item, page, category, and perspective metadata, including Title, Author, Keywords, Description, Update Date and all custom Text, Boolean and Date attributes. Metadata-only searching is more efficient and therefore faster than searches that include file and URL content.


Note:

If the Oracle Portal Repository is installed into a database that does not have a functional AUTO_FILTER, document and URL searching is automatically disabled as this functionality does not work without the AUTO_FILTER.

Use the following procedures to specify whether the document and URL indexes are required:

Both procedures belong to the package wwv_context. For more detail, see Appendix D, "Using the wwv_context APIs".

If you disable Document and URL indexes, the script ctxcrind.sql (which normally creates missing Oracle Text indexes) removes existing Document/URL indexes as they are no longer required. If you do not remove Document/URL indexes, the indexes are still updated when the synchronization and optimization jobs are run. Therefore, it is more efficient to remove the unused indexes by running ctxcrind.sql. See Section 9.3.4, "Creating and Dropping Oracle Text Indexes".

Whenever you make changes to these Document and URL index settings, the appearance and behavior of search portlets in Oracle Portal are affected. If portlets are being cached, such changes might not appear immediately. Therefore, you should clear the portal cache manually, after making any index changes. See Section 6.7.4.4, "Clearing the Cache for a Particular Portal Object".

For example, when you disable the Document index, search portlets display fewer search operators for file- based attributes, that is, only Match All Within File Name and Match Any Within File Name. Similarly, if the URL index is disabled, the only operators available for URL-based attributes are Match All Within URL and Match Any Within URL. Other search operators (such as Content Contains All) are not displayed as file and URL content is not searchable when these indexes are disabled.

When you disable the Document index, Themes, Gists and View as HTML features are no longer available, so you must disable Themes and Gists on the Global Settings: Search page. See Section 9.2.2.2, "Setting Oracle Text Search Result Options" for details.

To enable or disable Document and URL indexes:

Use the following procedures:

-- To enable the document index
execute wwv_context.set_use_doc_index(true);
-- To disable the document index
execute wwv_context.set_use_doc_index(false);
-- To enable the URL index
execute wwv_context.set_use_url_index(true);
-- To disable the URL index
execute wwv_context.set_use_url_index(false);

9.3.8 Viewing the Status of Oracle Text Indexes

You can determine the status of Oracle Text indexes from several tables and views accessible from the portal schema.

To view a status report for Oracle Text indexes, run the script textstat.sql as the portal schema owner:

SQL> @textstat.sql

This script is located in the directory ORACLE_HOME/portal/admin/plsql/wws. Here is an example of the information that is generated by this script:

SQL> @textstat
Portal Text Indexes:

INDEX_NAME            STATUS   DOMIDX_STATUS DOMIDX_OPSTATUS IDX_STATUS
--------------------- -------- ------------- --------------- ------------
WWSBR_CORNER_CTX_INDX VALID    VALID         VALID           INDEXED
WWSBR_DOC_CTX_INDX    VALID    VALID         VALID           INDEXED
WWSBR_PERSP_CTX_INDX  VALID    VALID         VALID           INDEXED
WWSBR_THING_CTX_INDX  VALID    VALID         VALID           INDEXED
WWSBR_TOPIC_CTX_INDX  VALID    VALID         VALID           INDEXED
WWSBR_URL_CTX_INDX    VALID    VALID         VALID           INDEXED

Document and URL index preferences:
Document Index: true  - index will be used if valid
URL Index:      true  - index will be used if valid

Indexes with rows waiting to be indexed:

Index                     Rows to Index
-----------------------   -------------
WWSBR_CORNER_CTX_INDX        2677

PL/SQL procedure successfully completed.

Scheduled Text Jobs:

LAST_DATE LAST_SEC  NEXT_DATE NEXT_SEC  B  FAILURES INTERVAL            WHAT
--------- --------- --------- ---------- - ------ ------------------------- 
25-AUG-05 04:57:32  26-AUG-05 04:57:32  N   0     SYSDATE + 24/24 wwsbr_stats.gather_stale;
25-AUG-05 04:57:32  26-AUG-05 04:57:32  N   0     SYSDATE + 1440/(24*60) wwv_context.optimize(CTX_DDL.OPTLEVEL_FULL,1440,null);
25-AUG-05 06:59:30  25-AUG-05 07:59:30  N   0     SYSDATE + 60/(24*60)  wwv_context.sync;

Running Text Jobs:
no rows selected

Indexes sync on commit setting:
Item Index:        true    - Index will sync automatically when data commits
Page Index:        true    - Index will sync automatically when data commits
Document Index:    false   - Index needs to be syncronized manually
Category Index:    true    - Index will sync automatically when data commits
Perspective Index: true    - Index will sync automatically when data commits
URL Index:         false   - Index needs to be syncronized manually

SQL>

From this script you can view the following information:

  • Portal Text Index Status - Shows whether all of the Oracle Text indexes exist and their current status. All working, valid indexes display VALID for the first three status columns and INDEXED for the final column as shown in this example. See also, Section 9.3.3.1, "Oracle Text Index Overview".

  • Document and URL Index Status - Indicates whether the Document and URL indexes are enabled (true) or disabled (false). See also, Section 9.3.7, "Disabling Document and URL Indexing".

  • Number of Pending Rows Per Index - Lists any indexes that are waiting to be indexed. An entry is listed for every index that has rows waiting to be indexed, or are pending. The number of pending rows is also shown. See also, Section 9.3.5.1, "Synchronizing Oracle Text Indexes".

  • Scheduled Oracle Text Job Details - Lists any jobs that are scheduled for Oracle Text index maintenance. The report shows the last date and time that the job was run and the next date when the job is due to be run. The column labeled B shows whether the job is broken or not; if the job is marked Y it is broken and does not run. The Interval column indicates the next time that a job will run and finally, the What column indicates the procedure that will be run for each job. See also, Section 9.3.5.4, "Scheduling Index Synchronization".

  • Active Oracle Text Job Details - Details any jobs that were running when the textstat.sql report ran.

  • Indexes Sync On Commit Setting - This section shows which indexes are configured to synchronize immediately after data commits to your portal and which ones need to be synchronized manually using wwv_context.sync (manually or using a job). See also, Section 9.3.5.2, "Synchronizing an Oracle Text Index On Commit".

    On earlier database versions, the following information is displayed:

    Indexes sync on commit setting:
    ...Not available for this database version, available from 10g onwards.
    

9.3.9 Monitoring Oracle Text Indexing Operations

Oracle Text logs information to a file when indexes are created and populated. This enables you to monitor the progress of indexing operations, keep track of indexes, and troubleshoot any problems that may arise.

9.3.9.1 Using start_log to Monitor Index Operations

You can use the ctx_output.start_log (filename) command to log output from the indexing process. In the subsequent example, the log file is named textindex.log:

ctx_output.start_log('textindex.log');
ctx_output.add_event(ctx_output.event_index_print_rowid);
...
-- Create or syncronize the indexes
...
ctx_output.end_log;

You can determine the location of the log file using the LOG_DIRECTORY parameter in ctx_adm.set_parameter, for example, /tmp. Once the directory is set, all subsequent Oracle Text logs output to this directory:

ctxsys.ctx_adm.set_parameter('LOG_DIRECTORY', '/tmp');

9.3.9.2 Using logcrind.sql to Monitor Index Creation

You can use the script logcrind.sql (instead of ctxcrind.sql) to create the Oracle Text indexes with logging enabled. The script takes one parameter which is the name of the log file, for example:

SQL> @logcrind.sql textindex.log

This script sets the LOG_DIRECTORY to be the same as the database udump directory, as specified by the user_dump_dest initialization parameter.

The add_event call (used in the preceding example) is also used in the script logcrind.sql and this outputs the rowid of every row indexed to the log. This logging allows indexing operations to be tracked and also indicates whether the indexing of each row is successful or not.

Here is a sample from an Oracle Text indexing log:

13:53:27 05/06/03 begin logging
13:53:27 05/06/03 event
13:53:42 05/06/03 log
13:53:42 05/06/03 event
13:53:48 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_CORNER_CTX_INDX$X"
13:53:48 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_CORNER_CTX_INDX$X" created
13:53:49 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_DOC_CTX_INDX$X"
13:53:49 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_DOC_CTX_INDX$X" created
13:53:49 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_PERSP_CTX_INDX$X"
13:53:49 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_PERSP_CTX_INDX$X" created
13:53:50 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_THING_CTX_INDX$X"
13:53:50 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_THING_CTX_INDX$X" created
13:53:51 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_TOPIC_CTX_INDX$X"
13:53:51 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_TOPIC_CTX_INDX$X" created
13:53:51 05/06/03 Creating Oracle index "MYPORTAL"."DR$WWSBR_URL_CTX_INDX$X"
13:53:51 05/06/03 Oracle index "MYPORTAL"."DR$WWSBR_URL_CTX_INDX$X" created
13:54:16 05/06/03 sync index: MYPORTAL.WWSBR_CORNER_CTX_INDX
13:54:17 05/06/03 Begin document indexing
13:54:17 05/06/03 INDEXING ROWID AAAUUcAAJAAAlhMAAA
13:54:17 05/06/03 INDEXING ROWID AAAUUcAAJAAAlhMAAI
..
13:54:18 05/06/03 INDEXING ROWID AAAUUcAAJAAAlhQAAk
13:54:18 05/06/03 Errors reading documents: 0
13:54:18 05/06/03 Index data for 159 documents to be written to database
13:54:18 05/06/03    memory use: 225971
13:54:18 05/06/03 Begin sorting the inverted list.
13:54:18 05/06/03 End sorting the inverted list.
13:54:18 05/06/03 Writing index data to database.
13:54:18 05/06/03    index data written to database.
13:54:18 05/06/03 End of document indexing. 159 documents indexed.

9.3.10 Viewing Indexing Errors

Any errors that occur when an index is created or synchronized are logged in the view CTX_USER_INDEX_ERRORS. You can see details for these errors, using the command:

SQL> desc ctx_user_index_errors;
 Name                   Null?    Type
 ---------------------- -------- ---------------
 ERR_INDEX_NAME         NOT NULL VARCHAR2(30)
 ERR_TIMESTAMP                   DATE
 ERR_TEXTKEY                     VARCHAR2(18)
 ERR_TEXT                        VARCHAR2(4000)

SQL>

This view gives the index name, the rowid (ERR_TEXTKEY column) corresponding to the row in the indexed table, and an error message that indicates the cause of the failure. Furthermore, the error log file indicates the rowid for the row in the table that is being indexed and a success or failure message.

Typically, you do not see errors for the item (WWSBR_THING_CTX_INDX), page (WWSBR_CORNER_CTX_INDX), category (WWSBR_TOPIC_CTX_INDX) or the perspective (WWSBR_PERSP_CTX_INDX) indexes as these index content that is produced by Oracle Portal and this content is easy to index. It is more common to see indexing errors for document and URL content.

For the Document index, the content may have to be filtered to turn a binary document into plain text for indexing. There are a number of reasons this may fail. For example, the document format may not be supported by AUTO_FILTER, the Oracle Text filter. See also, Section 9.3.3.7, "Maximizing AUTO_FILTER Performance".

For the URL index, the URL content has to be fetched and this could fail for a number of reasons. For example, the URL may indicate a location that is not accessible as the Oracle Portal server is behind a firewall and the proxy settings are not set correctly. Or, maybe the URL is incorrect, or perhaps the site that is being access is down.

In addition, URL content is filtered and this may produce errors. For example, all URL attributes are presumed to be plain text, so you'll see an error for any URL attribute that is not plain text.

9.3.11 Translating Indexing Errors to Objects in Oracle Portal

The indexing errors shown in the view CTX_USER_INDEX_ERRORS or the Oracle Text indexing logs, show the rowid of the row in the table being indexed when the error occurred. You can use this information to determine which row is causing an indexing problem and you can also determine exactly which portal item or page this row corresponds to.

The following sections describe some queries that you may find useful if you need to determine the cause of an indexing issues:

9.3.11.1 Item Indexing Errors

The rowid gives the row in the items table that is causing problems. You can use a direct query to find out more information about that row. For example:

select i.name, i.title,          -- item title
       p.name page_name,         -- page name
       p.title page_title,       -- page display name
       pg.name page_group,       -- page group name
       sl.title page_group_title -- page group display name (default language)
  from wwv_things i,
       wwpob_page$ p,
       wwpob_item$ pi,
       wwsbr_sites$ pg,
       wwsbr_site_languages$ sl
 where i.masterthingid = pi.master_thing_id
   and i.siteid = pi.site_id
   and pi.page_id = p.id
   and sl.siteid = pg.id
   and sl.language = pg.defaultlanguage
   and pi.page_site_id = p.siteid
   and pg.id = i.siteid
   and i.rowid = 'AAAOwMAAJAAAWISAAF

9.3.11.2 Page Indexing Errors

The rowid gives the row in the pages table. You can use a direct query to find out more information about the page being indexed. For example:

select p.name page_name,
       p.title page_title,
       pg.name page_group,
       sl.title page_group_title
  from wwpob_page$ p,
       wwsbr_sites$ pg,
       wwsbr_site_languages$ sl
 where sl.siteid = pg.id
   and sl.language = pg.defaultlanguage
   and pg.id = p.siteid
   and p.rowid = 'AAAOv/AAJAAAaSSAAB'

9.3.11.3 Category Index Errors

You can use a direct query against the category table to determine faulty categories. You can also use a join to show the page group. This query shows the category name and display name, and the page group name and display name.

select c.title, c.name, pg.name, sl.title
  from wwv_topics c,
       wwsbr_sites$ pg,
       wwsbr_site_languages$ sl
 where sl.siteid = pg.id
   and sl.language = pg.defaultlanguage
   and pg.id = c.siteid
   and rowid='AAAOv/AAJAAAaSSAAB'

9.3.11.4 Perspective Indexing Errors

These are similar to categories. If you use a direct query against the perspective table it shows the faulty perspectives. You can also use a join to show the page group.

select p.title, p.name, pg.name, sl.title
  from wwv_perspectives p,
       wwsbr_sites$ pg,
       wwsbr_site_languages$ sl
 where sl.siteid = pg.id
   and sl.language = pg.defaultlanguage
   and pg.id = p.siteid
   and p.rowid = 'AAAOv/AAJAAAaSSAAB'

9.3.11.5 Document Index Errors

You are more likely to see errors with the Document index. In this case the index is on the table where the documents are actually stored. Therefore, you have to join back to the item table to determine the associated item.

The following query gives the document file name and item's Name and Display Name that a document query is associated with:

select d.filename, i.name, i.title  from wwv_things i,
       wwdoc_document$ d,
       wwv_docinfo di
 where
    d.name = di.name(+)
    and di.thingid = i.id(+)
    and di.masterthingid = i.masterthingid(+)
    and di.siteid = i.siteid(+)
    and d.rowid = 'AAAOYyAAJAAAWAaAAF'

Note that not all documents are necessarily associated with items, in which case you need to modify the query to join in a similar way to the page table.

9.3.11.6 URL Index Errors

Like the Document index, you have to join back to the item table to determine the associated item.

The following query shows the URL, and item Name and Display Name.

select u.url, u.absolute_url, i.name, i.title
  from wwv_things i,
       wwsbr_url$ u
 where u.object_id = i.id
   and u.object_siteid = i.siteid
   and u.object_type = 'ITEM'
   and u.rowid = 'AAAOYyAAKAAAWAaAAB' 

Note that a URL may not be attached to an item, it may be attached to a page, in which case you need to modify the query to join in a similar way to the page table.

9.3.12 Common Indexing Errors

Some common indexing errors are described in the following sections:

9.3.12.1 Common Document Indexing Errors

Typically, document indexing errors are in the format:

DRG-11207: user filter command exited with status n

The actual exit status indicates the cause of the problem. For a description of common exit status values and their meanings, log on to Oracle Metalink, at http://metalink.oracle.com and read the article Troubleshooting DRG-11207 errors. This article has DocId 210319.1.

9.3.12.2 Common URL Indexing Errors

Here are some common URL indexing errors. The list is not exhaustive but it highlights some of the more common errors you may see:

DRG-11604 URL store: access to %(1)s is denied

Access to the document is denied to the indexing user agent. The crawler is not capable of authenticating or managing cookies returned by the site. Check that the URL can be accessed. If it is protected, it may not be possible to index the content.

DRG-11609 URL store: unable to open local file specified by %(1)s
DRG-11610 URL store: unable to read local file specified by %(1)s

These occur for file:// URLs where the file indicated cannot be opened or read. Remember that the file needs to be accessible from the computer on which the Oracle Portal Repository database is running. Check that the file exists and that it is accessible from the database computer as the database user.

DRG-11611 URL store: unknown protocol specified in %1)s

The protocol specified in the URL is not one that the Oracle Text user agent recognizes. This can happen if no protocol is specified. A common cause of this problem is that a relative URL is specified but the Oracle Text Base URL option is not set to fully qualify the URL. Also, Oracle Text can only index http, file and ftp URLs. Look at the URL that has failed and make sure that it is in a supported fully qualified format, including a valid protocol. See also, Section 9.3.6, "Indexing and Searching URL Content"

DRG-11612 URL store: unknown host specified in %(1)s 

The URL specified a host in the URL that cannot be resolved from the Oracle Portal repository database server. It may be that a firewall lies between the Oracle Portal repository server and the location specified by the URL. In this case it might be necessary to use a proxy server to access the URL. Check that the URL is correct and that the host is accessible from the Oracle Portal database server. Also check that the Oracle Portal proxy settings are correct and that the index is using the proxy settings. See also, Section 9.2.2.4, "Configuring Proxy Settings for Oracle Text".

DRG-11613 URL store: connection refused to host specified by %(1)s

This means that the host specified in the URL was resolved but the HTTP request was refused. Check that the URL is correct and that it is accessible.

DRG-11614 URL store: communication with host specified in %(1)s timed out

The request timed out. Check that the URL is correct and accessible.

DRG-11616 URL store: too many redirections trying to access %(1)s

When accessed, a URL can cause a redirect to another URL. This in turn can cause a redirect, and so on. If a large number of redirects occur, this error is displayed. This can occur if a redirection loop is found.

DRG-11622 URL store: unknown HTTP error getting %(1)s

An HTTP error that is not explicitly handled by Oracle Text has occurred. The HTTP error is reported in the error message.

9.3.13 Handling Indexing Hangs or Crashes

If for any reason a document or URL cannot be indexed, an error is logged. This situation should not prevent the indexing operation completing normally. However, any content that fails to be indexed is not searchable.

Sometimes an indexing operation can fail catastrophically in which case, the index operation is terminated before the indexes are properly populated. In most cases, such problems should be reported to Oracle Support Services. However, in some instances you may be able to work around the problem temporarily by excluding the content that is causing a failure. See Section 9.3.13.2, "Preventing Indexes From Hanging and Crashing" for more information.

Rarely, an indexing operation causes a disastrous failure, that is, the server process performing the indexing is terminated. When this happens, this message is displayed in the client running the indexing operation:

ORA-03113 End of file on communication channel

Note:

If you are unsure whether an indexing operation completed successfully, repeat the operation from SQL Plus where end of file errors are clearly reported.

If the server process is terminated, the event should be recorded in the database logs. Use the database alert log to determine the location of any trace files that are written. The trace files may indicate errors such as ORA-0600 or ORA-7445. For example, this trace file shows errors that occurred whilst creating Oracle Text indexes using the script logcrind.sql:

ksedmp: internal or fatal error
ORA-7445: exception encountered: core dump [drsfdatam()+308] [SIGSEGV] 
[Address not mapped to object] [0x0] [
] []
Current SQL statement for this session:
declare
l_dump_dest varchar2(512);
p_logfile varchar2(100) := 'sync_2012.log';
begin
dbms_output.enable(10000);
select value into l_dump_dest from v$parameter
where name = 'user_dump_dest';
ctxsys.ctx_adm.set_parameter('LOG_DIRECTORY',l_dump_dest);
ctx_output.start_log(p_logfile);
ctx_output.add_event(ctx_output.event_index_print_rowid);
dbms_output.put_line('Log file is: '||ctx_output.logfilename);
wwv_context.sync();
ctx_output.end_log;
end;
----- PL/SQL Call Stack -----
object line object
handle number name
8198f83c 244 package body CTXSYS.DRIDISP
8198f83c 377 package body CTXSYS.DRIDISP
8198f83c 334 package body CTXSYS.DRIDISP
8178acc8 403 package body CTXSYS.DRIDML
827124b0 2033 package body CTXSYS.DRIDDL
827124b0 2090 package body CTXSYS.DRIDDL
817ea0f0 1324 package body CTXSYS.CTX_DDL
8185a488 828 package body TOOLS.WWV_CONTEXT
82d83ed8 18 anonymous block
----- Call Stack Trace -----

9.3.13.1 Identifying Whether an Index Operation is Hanging

The easiest way to determine if an indexing operation is hanging is to run the indexing operation with Oracle Text logging enabled. See Section 9.3.9, "Monitoring Oracle Text Indexing Operations".

With logging enabled, the rowid of each row is recorded when it is indexed and you can see when an indexing operation hangs on the same row for a prolonged period. It may be normal for some rows to take a few minutes to process but if an operation takes much longer than expected, this may indicate a problem.

In general, the view CTX_USER_INDEX_ERRORS is not very useful if you are trying to find out why an indexing process is hanging or crashing. This is because information is only visible in this view after it is committed and a commit does not occur whilst an indexing operation is hanging. In fact, a commit may not occur at all if the operation crashes.

Operations such as URL indexing and document filtering can take quite a long time to process. Both of these operations are subject to timeout mechanisms to avoid lengthening this process even further:

  • URL indexing timeout -The default timeout for fetching URL content is 30 seconds. If URL content is not retrieved within 30 seconds, the attempt is abandoned, a failure error is reported in the view CTX_USER_INDEX_ERRORS and the indexing process continues to the next row. In most cases, 30 seconds is sufficient time to fetch URL content. However, once the content is retrieved it must be indexed, so the total time can be slightly more than the URL timeout value.

  • Document filtering timeout - The timeout for document filtering operations is not a hard timeout limit. The timeout setting, which by default is 120 seconds, is the time that is waited while no output is produced by the AUTO_FILTER. If the timeout is exceeded the current filtering operation is terminated, the content for the current document is not indexed, and the indexing process proceeds to the next document. If the AUTO_FILTER output file is still growing after 120 seconds, the filtering operation is allowed to continue.

These timeout mechanisms help to avoid problems with URL and document indexing, two areas where issues are likely to arise. However, you may still encounter situations where an indexing operation hangs indefinitely.

9.3.13.2 Preventing Indexes From Hanging and Crashing

If certain content is causing indexing operations to fail, you can exclude the content from the indexing process. First, you must identify the row that is causing the problem. This section describes how to do this and the additional steps required to exclude such content.

Step 1 Identify the rowid Causing Indexing Problems

You can do this using the Oracle Text logging facility, with print rowid event enabled. If you look at the generated log file you can determine the rowid (of the row being processed) when failure occurred. In most cases it is this rowid that is causing indexing problems.

However, in some cases the actual rowid being processed may not be written to the log file when the failure occurs. In this case you must determine the next rowid:

  • If the entire table is being synchronized, for example, when an index is first created, the rowid is the next rowid from the table. To determine the rowid, select from the table without an order by clause.

  • When only a few pending rows are being updated, look at the view ctx_user_pending to determine the next rowid.

When you have identified which row is causing your indexing problems, you should verify that it is the correct row. You do this by reproducing the failure while synchronizing that row only.

If the Oracle Text indexes do not exist, create the indexes (but do not populate them) using these command:

SQL> exec wwv_context.drop_prefs;
PL/SQL procedure successfully completed.
SQL> exec wwv_context.create_prefs;
PL/SQL procedure successfully completed.
SQL> declare
  2      l_indexes wwsbr_array;
  3  begin
  4      wwv_context.create_missing_indexes(l_indexes);
  5  end;
  6  /
PL/SQL procedure successfully completed.
SQL>

This creates all of the indexes, with no rows pending.

Step 2 Mark the Problem rowid As Pending

The next step is to mark the row suspected of causing indexing problems as pending. The column you need to update depends on which index you are updating. The names of these columns are indicated in the subsequent examples. You must replace the rowid given in these examples, with the rowid you wish to verify:

URL index (WWSBR_URL_CTX_INDX) The absolute_url column is populated by a trigger, so set it here to null:

update wwsbr_url$ set absolute_url=null where rowid = 'AAAOwQAAJAAAU0+AAL';

Document index (WWSBR_DOC_CTX_INDX) Update the blob_content column, but preserve the original blob_content value:

update wwdoc_document$ set blob_content = blob_content where rowid = 'AAAOYyAAJAAAWAaAAF'

Item index (WWSBR_THING_CTX_INDX) This index uses a user datastore created on the ctxtxt column. The value of this column is irrelevant and in Oracle Portal is always 1.

update wwv_things set ctxtxt = '1' where rowid = 'AAAOwMAAJAAAU0eAAB'

Page index (WWSBR_FOLDER_CTX_INDX) Similar to the item index.

update wwpob_page$ set ctxtxt = 1 where rowid = 'AAAOwMAAJAAAWITAAA'

Category index (WWSBR_TOPIC_CTX_INDX) Similar to the item index.

update wwv_topics set ctxtxt = 1 where rowid = 'AAAOwMAAJAAAWITAAA'

Perspective index (WWSBR_PERSP_CTX_INDX) Similar to the item index.

update wwv_perspectives set ctxtxt = 1 where rowid = 'AAAOwMAAJAAAWITAAA'

If you have a site with several subscribers installed then you may need to switch subscriber before you can see the row that you are interested in. To change subscribers, use the following procedure to set the session context for a lightweight user:

     wwctx_api.set_context
     (
            p_user_name     IN varchar2,
            p_password      IN varchar2 default null,
            p_company       IN varchar2 default null
      );
More on OTN

wwctx_api is a public PL/SQL API package. For more information, refer to the Oracle Portal PL/SQL API Reference available on OTN at http://www.oracle.com/technology/products/ias/portal.

After the column update, the suspect row is placed in the pending queue.

Step 3 Synchronize the Index

Now you can synchronize the index and see if the same problem occurs, using the command:

SQL> exec wwv_context.sync();

This command synchronizes the suspect row only as it is the only row in the pending queue. The row can be updated again to repeat the test. See also, Appendix D, "sync".

Step 4 Exclude the Content Causing Problems

You can prevent the indexing operation from hanging or crashing in the future, by modifying, or even removing the row causing indexing problems. For example, if it is a document, you can edit the associated item in Oracle Portal and remove the document.


Note:

Contact Oracle Support Services if your system hangs or crashes during indexing operations. If you can provide specific detail relating to the content causing the problem, it will help them to reproduce the problem more readily.

9.4 Oracle Secure Enterprise Search

This section introduces Oracle Secure Enterprise Search and the sample Oracle Secure Enterprise Search Secure portlet. Specific topics in this section include:

9.4.1 Oracle Secure Enterprise Search Overview

Oracle Secure Enterprise Search is built on Oracle Database and Oracle Text technology and provides uniform search-and-locate capabilities over multiple repositories: Oracle Databases, other ODBC compliant databases, IMAP mail servers, HTML documents served up by a Web server, files on disk, and more.

Oracle Secure Enterprise Search uses a crawler to collect documents. You can schedule the crawler to suit the Web sites that you want to search. The documents stay in their own repositories, and the crawled information is used to build an index that stays within your firewall in a designated Oracle Database. Oracle Secure Enterprise Search also provides APIs for building content management solutions.

In addition, Oracle Secure Enterprise Search offers the following:

  • A complete text query language for text search inside the database

  • Full integration with the Oracle Database server and the SQL query language

  • Advanced features like concept searching and theme analysis

  • Attribute mapping to facilitate attribute search across disparate repositories

  • Indexing of all popular file formats (150+)

  • Full globalization, including support for Chinese, Japanese and Korean (CJK), and Unicode

More on OTN

You will find additional information on OTN at:

http://www.oracle.com/technology/products/oses/index.html

Oracle Secure Enterprise Search is integrated with Oracle Portal so that you can add powerful multi repository search facilities to portal pages. It also has the capability to crawl Oracle Portal's own repository and search public and private content.

9.4.1.1 About the Oracle Secure Enterprise Search Sample Query Applications

Oracle Secure Enterprise Search includes fully functional sample query applications to query and display search results. The query applications are written as J2EE-compliant Web applications. The sample query applications also include the Oracle Secure Enterprise Search Secure portlet.

The Oracle Secure Enterprise Search Secure portlet demonstrates how to write a search portlet for use in Oracle Portal. When the user issues a search query a list of results matching the user's search criteria are returned.

For information about how to use the Oracle Secure Enterprise Search Secure portlet in Oracle Portal, see Appendix A of Oracle Secure Enterprise Search Administrator's Guide.

9.4.1.2 About Oracle Secure Enterprise Search Configuration

The Oracle Secure Enterprise Search Administrator's Guide provides detailed configuration instructions for Oracle Secure Enterprise Search.

9.4.2 Oracle Secure Enterprise Search Secure Portlet

Oracle Secure Enterprise Search provides a search portlet that can be embedded in Oracle Portal pages.

For more information, see Appendix A of Oracle Secure Enterprise Search Administrator's Guide.