Oracle9iAS Portal Configuration Guide
Release 3.0.9

Part Number A90096-01

Library

Solution Area

Contents

Index

Go to previous page Go to next page

6
Setting up the Search Feature in Oracle Portal Content Areas

This chapter provides information on setting up the built-in interMedia Text search capabilities in Oracle Portal content areas.

This chapter contains the following sections:

6.1 New Search Features

This release of Oracle Portal includes the following new features for search:

To access these new features, click Search Settings in the Services portlet. By default, the Services portlet is located on the Oracle Portal home page's Administer tab.

6.2 Prerequisites

You must be logged on as an Oracle Portal administrator to configure interMedia Text, and create, alter, update, and drop interMedia Text indexes.

Before using interMedia Text in Oracle Portal, perform the following tasks:

6.2.1 interMedia Text on UNIX

For the interMedia Text feature to work, set the following UNIX environment variables before starting the Oracle8i database and Net8 Listener.

Korn Shell (ksh)

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/ctx/lib
export LD_LIBRARY_PATH
C-Shell (csh)

setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${ORACLE_HOME}/ctx/lib

6.3 Searching in Oracle Portal

The main points to know when searching in Oracle Portal content areas include:

6.3.1 Basic Search

This type of search looks for the specified words in the item attributes such as the display name, description, and keywords of items, as well as the display name and description of folders, categories, and perspectives are searched. If interMedia Text is enabled, basic search also looks in the content of documents and URLs. A search results page displays all items with this criteria.

Typically, a Search field appears on the Navigation bar enabling end uses to perform a basic search as shown below:

Figure 6-1 Basic search screen in Oracle Portal


Text description of basicsea.gif follows.
Text description of the illustration basicsea.gif


See also:

The Oracle Portal Online Help topics: Performing a basic search, Setting up the search feature, and Editing navigation bars


6.3.2 Advanced Search

With advanced search (which is always enabled), you can:

Figure 6-2 Advanced search screen in Oracle Portal


Text description of advsear.gif follows.
Text description of the illustration advsear.gif


See also:

The Oracle Portal Online Help topic: Performing an Advanced Search


6.3.3 interMedia Text Search

As discussed earlier, Oracle Portal has built-in support for interMedia Text indexing. It is worth repeating that when search is enabled in Oracle Portal, search is enabled for all content areas created in your Oracle Portal installation. It cannot be enabled on one content area and disabled on another content area. Search results, however, are restricted to the content area on which the search is performed. The search is performed on the actual content in documents such as PDF, PowerPoint, and Word as well as the contents on URL pages, text, and HTML.

If interMedia Text is not enabled, end users can always perform a basic or advanced search in the content area.

6.3.4 Viewing interMedia Text Search Results

If themes and gists are enabled from the Search Settings page (see Figure 6-4), then you can access the themes and gists for documents returned by a search from the search results. You can:

6.4 interMedia Text Performance

interMedia Text performance may be affected by the following query, indexing, and update considerations:

6.4.1 Query Considerations

How does the size of my data affect queries?

The speed at which the text index can deliver ROWIDs is not affected by the actual size of the data, but by the size of the Token Table which holds the list of words, and information about the rows in which they appear. Text query speed will be related to the number of rows that must be fetched from this Token Table, and the length of each row.

Thus, it should be nearly as fast to find a rare word in a large document set as it is to find a common word (or many uncommon words) in a smaller document set.

How does the source type of my data affect queries?

The format of the documents (for example, plain ASCII text, HTML or Microsoft Word) should make no difference to query speed. The documents are filtered to plain text at indexing time, not query time.

The "cleanliness" of the data makes a difference. Spell-checked and sub-edited text for publication tends to have a much smaller total vocabulary (and therefore size of token table) than informal text such as e-mails, which contain many spelling errors and abbreviations to bloat the token table.

6.4.2 Indexing Considerations

How long should indexing take?

Indexing text is a resource-intensive process. Obviously, the speed of indexing depends on the power of the hardware involved, but you should expect somewhere between 50MB per hour on workstation-class NT machine (approximately 400MHz CPU, 128MB memory) to more than 1GB per hour on a large multi-CPU, multi-gigabyte server machine. The latter figure assumes you are using parallel indexing on a partitioned table (a new option for 8.1.6).

For most real-life systems, the time to index a complete table of documents will be measured in hours, and in some cases days.

How do I track the progress of the indexing process?

You can use the ctx_output.start_log <filename> command to log output from the indexing process. The <filename> will normally be written to $ORACLE_HOME/ctx, but you can change the directory using the log_directory parameter in ctx_adm.set_parameter.

Otherwise, for a course-grained answer, you can count the number of rows in the DR$xxx$K table. There will be one row in here for each row that has been indexed. However, these rows are only committed when the indexing process runs out of indexing memory and does a "flush" to the database. It is even possible that this will never happen until indexing is complete.

How much disk space overhead will the indexing require?

The overhead (the amount of space needed for the DR$ index tables) varies between approximately 25% of the original text volume, and 100%. Generally, the larger the total amount of text, the smaller the overhead, but many small records will use more overhead than fewer large records. Also, "clean" data (such as published text) will require less overhead than "dirty" data such as e-mails or discussion notes, since the "dirty" data is likely to include many unique words from misspellings, abbreviations, and so on.

Theme indexes are generally much smaller than text indexes. Creating a theme index only will generally require very little storage, but creating a text index only will not save you much space over a combined index, though it is likely to be significantly faster.

How does the format of my data affect indexing?

Looking at indexing overhead, you can expect much lower overheads for formatted documents (for Microsoft Word files) since such documents tend to be very large compared to the actual text held in them.

So 1GB of Word documents might only require 50MB of index space, whereas 1GB of plain text might require 500MB, since there is ten times as much "plain text" in the latter set.

Indexing time is harder to determine. Although the reduction in the amount of text to be indexed will have an obvious effect, we must balance this out against the cost of filtering the documents. In general, these will roughly cancel out, so the time to index 1GB of formatted docs will be about the same as to index 1GB of plain text, although it may be a little longer.

6.4.3 Update Considerations

How often should I index new or updated records?

How often do you need to? The less often you run re-indexing (via the command ALTER INDEX indexname REBUILD ONLINE PARAMETERS('SYNC')) then the less fragmented your indexes will be, and the less you will need to optimize them. However, this means that your data will become progressively more out of date, which may be unacceptable for your users.

Many systems can handle overnight indexing. This means data that is less than a day old is not searchable. Other systems use hourly, ten minute, or five minute updates.

Note:

The Context Server (ctxsrv) has been deprecated and should no longer be used. Use drbgdml.sql instead.

How can I tell when my indexes are getting fragmented?

The best way is to time some queries, run index optimization, then time the same queries. You must restart the database to clear the SGA each time. If the queries speed up significantly, then optimization is worthwhile.

A more scientific method involves counting the number of rows for each term in the DR$xxx$I table:

SELECT AVG(COUNT(*)) FROM DR$index_name$I
   GROUP BY TOKEN_TEXT HAVING COUNT(*) > 1;
Note:

Ignore all words with only a single row in the index table.

A value greater than 10 from this query may indicate the need to optimize the index, but experimentation should yield the best value in any particular circumstances. Very large tables will inevitably have a lot of rows where the TOKEN_INFO data overflows the 4K internal limit, so you would expect the average to be greater on large systems.


For more information:

interMedia Text Performance FAQ at: http://technet.oracle.com/products/text/ 


6.5 Setting up interMedia Text Searching

There are four main steps for setting up interMedia Text in Oracle Portal:

6.5.1 Step 1: Set up the Global Page Settings

The first step requires you to configure the global page settings in the following way:

  1. In the Services portlet, click Global Settings. By default, the Services portlet is located on the Oracle Portal home page's Administer tab.

Figure 6-3 Global page settings - Proxy Server


Text description of interm1.gif follows.
Text description of the illustration interm1.gif

  1. In the Proxy Server section:

    1. Enter the host name of your proxy server for the HTTP Server.

      Note: Do not prefix http:// to the proxy server name.

    2. Enter the domains that you do not want redirected to the proxy server.

    3. Enter the maximum amount of time (in seconds) that a connection should be attempted before timing out.

  2. Click OK.

6.5.2 Step 2: Enable interMedia Text Searching

Before creating the interMedia Text indexes, configure the interMedia Text settings in Oracle Portal in the following way:

  1. In the Services portlet, click Search Settings. By default, the Services portlet is located on the Oracle Portal home page's Administer tab.

    Figure 6-4 Services portlet - interMedia Text properties


    Text description of interm2.gif follows.
    Text description of the illustration interm2.gif


    Note:

    If you see the message, "interMedia Text is not installed", interMedia Text was not installed with the database and is not available for your content areas. Arrange with your database administrator to have interMedia Text installed. After it is installed, you need to run the following command in SQL*Plus:

    inctxgrn.sql

    This file is located in the <ORACLE_HOME>/portal30/src/wws directory.

    Log on using the user name and password for the schema that owns the Oracle Portal content area. For example, if the schema name is "SCOTT", log on with the user name "SCOTT" and the appropriate password. 


  2. Select Enable interMedia Text Searching to make interMedia Text searching available in your content areas.

  3. Select Enable Themes And Gists to create a theme and gist for each item returned by the search.

  4. In the Highlight Text Color list, choose the color to highlight the search words in the HTML renditions of the items returned by the search.

  5. In the Highlight Text Style list, choose the style to apply to the search words in the HTML renditions of the items returned by the search.

  6. Click OK.

    Now that you've enabled interMedia Text, you'll need to create a new interMedia Text index.

6.5.3 Step 3: Create an interMedia Text Index

To create an interMedia Text index:

  1. Make sure that you have configured the interMedia Text settings in Oracle Portal as discussed in the previous section, Section 6.5.2, "Step 2: Enable interMedia Text Searching".

  2. In the Specify interMedia Text Search Properties section, as shown on the previous page, click Create Index.

The following interMedia Text indexes are created:

Table 6-2 interMedia Text indexes created

Index Name  Description 

WWSBR_CORNER_CTX_INDX 

Index based on folders. 

WWSBR_DOC_CTX_INDX 

Index based on the content of uploaded documents. 

WWSBR_PERSP_CTX_INDX 

Index based on perspectives. 

WWSBR_THING_CTX_INDX 

Index based on items. 

WWSBR_TOPIC_CTX_INDX 

Index based on categories.  

WWSBR_URL_CTX_INDX 

Index based on the objects' Uniform Resource Locator (URL). 


Note:

The time required for creating indexes varies depending on the number of items you have in your content area. 


6.5.3.1 Creating indexes in SQL*Plus

You can also create indexes in SQL*Plus as follows:

  1. On the database where Oracle Portal is installed, log on to SQL*Plus with the appropriate username and password for the portal schema. For example:

          sqlplus portal30/portal30
    
    
  2. Enter the following command:

    set serverout on
    declare
       l_message varchar2(2000);
    begin
       wwv_context.createindex(p_language => 'us', p_message => l_message);
       dbms_output.put_line(l_message);
    end;
    /
    

    When the index is created, the message, "interMedia Text indexes created successfully" appears.


    See also:

    If you encounter problems creating an interMedia Text index, see "Problem: Unable to create interMedia Text indexes."in Chapter 8, "Troubleshooting"


6.5.4 Step 4: Maintain an interMedia Text Index

interMedia Text lets you create a text index (an inverted index) on documents stored in the database. Updating an inverted index requires heavy processing, so changes to a text column are queued and processed in batch. The process of updating the inverted index based on the queue is referred to as "synchronizing" the index.

The second aspect of maintaining your interMedia Text index is optimizing. As your index is synchronized, it grows in such a way as to consume more disk space than necessary and reduces the efficiency of queries.

Optimizing your index works differently depending on the mode you select. Optimizing in FAST MODE works on the entire index and compacts fragmented rows, but does not remove old data. FULL MODE permits optimization of the whole index or a portion of the index and both compacts fragmented rows and removes old data. For more information, see the Oracle8i interMedia Text Reference for the ALTER INDEX command.

interMedia Text gives you full control over how often each text index is synchronized. You can choose to synchronize every five seconds, for example, if it is important for your application to reflect text changes quickly in the index. You can choose to synchronize once a day, for more efficient use of computing resources and a more optimal index.

After creating your interMedia Text index, you need to consider a strategy for maintaining the index. For example, if you have many inserts, updates, or deletes throughout the day, consider synchronizing the interMedia Text index on a daily basis.

6.5.4.1 Synchronize the interMedia Text Index

The following example assumes that you installed ctx_schedule.


Note:

ctx_schedule is provided as an example, and is not supported as a component of either interMedia Text or Oracle9iAS Portal. For more information, see note 132689.1 on Oracle MetaLink at: http://metalink.oracle.com. 


  1. Log on to SQL*Plus with the appropriate user name and password.

  2. Enter the following commands:

    exec ctx_schedule.startup ( 'ex1_index', 'SYNC', 1 ) ; 
    exec ctx_schedule.startup ( 'ex1_index', 'OPTIMIZE FAST', 120 ) ; 
    
    

In this example, the index ex1_index is synchronized every minute, and optimized every two hours. This is true even if the database is shut down and restarted.

6.5.4.2 Stop Index Maintenance

The following example assumes that you installed ctx_schedule.


Note:

ctx_schedule is provided as an example, and is not supported as a component of either interMedia Text or Oracle9iAS Portal. For more information, see note 132689.1 on Oracle MetaLink at: http://metalink.oracle.com. 


  1. Log on to SQL*Plus with the appropriate user name and password.

  2. Enter the following commands:

    exec ctx_schedule.stop ( 'ex1_index' ) ; 
    exec ctx_schedule.stop ( 'ex1_index', 'OPTIMIZE FAST' ) ; 
    
    

    ctx_schedule.stop assumes that the operation to be stopped is "SYNC", unless you specify otherwise.


    Note:

    To have new text searched immediately (every five seconds), consider using the drbgdml.sql script located in:

    <ORACLE_HOME>/ctx/sample/script/drbgdml.sql 


6.6 Setting up interMedia Text Indexes

In troubleshooting, you may need to reinstall interMedia Text, or you may need to recreate the ctxsys schema. In both of these cases, you need to run the following script in SQL*Plus to reset the Oracle Portal interMedia Text environment:

inctxgrn.sql

This file is located in the <ORACLE_HOME>/portal30/src/wws directory.

Log on using the user name and password for the schema that owns the Oracle Portal content area. For example, if the schema name is "SCOTT", log on with the user name "SCOTT" and the appropriate password.

6.7 Dropping an interMedia Text Index

Dropping an index is a very time-consuming and resource-intensive operation so plan this task during non-business hours.

You would drop an interMedia Text index in the following situations:

You can drop interMedia Text indexes in the following ways:

  1. In the Services portlet, click Search Settings. By default, the Services portlet is located on the Oracle Portal home page's Administer tab.

  2. In the interMedia Text Properties section, click Drop Index.

    The interMedia Text index is dropped from the server.


    Note:

    • You can also drop an interMedia Text Index, by running the following script in Oracle SQL*Plus and logging in as the Oracle Portal schema owner:

      ctxdrind.sql 


6.8 Setting up Your Environment for interMedia Text

For Oracle8i Release 8.1.6.2 on UNIX, configure the listener.ora and tnsnames.ora files in your Oracle8i Home directory to support these external procedures. On Windows NT/2000, you require an Oracle 8.1.7 database if you want to use interMedia Text with Oracle Portal.


Note:

If you are running Oracle8i Release 8.1.7, ignore this section as the database does not use external procedures to perform document filtering.  


The listener.ora and tnsnames.ora files on your computer should look similar to the following examples:

6.8.1 listener.ora

The following lines define a complete listener definition in your listener.ora file.

LISTENER =
   (DESCRIPTION_LIST =
   (DESCRIPTION =
   (ADDRESS_LIST =
   (ADDRESS = (PROTOCOL = IPC) (KEY = EXTPROC0))
                    )
   (ADDRESS_LIST =
   (ADDRESS = (PROTOCOL = TCP) (HOST = oracle) (PORT=1521))

6.8.1.1 System ID

If you want to add a system identifier (SID) name of PLSExtProc and a program name of EXTPROC in the server's listener.ora file, insert the following in the SID_LIST_LISTENER definition:

SID_LIST_LISTENER =
  (SID_LIST =
  (SID_DESC = ...

  (SID_DESC =
  (SID_NAME=PLSExtProc)(ORACLE_HOME=/oracle/db/dev118)
  (PROGRAM=extproc)


Notes:

  • EXTPROCO and PRESENTATION=RO ends with the letter "o" not the number zero.

  • SID=PLSExtProc is non-negotiable.

 

6.8.2 tnsnames.ora

Add the following lines to the end of your tnsnames.ora file, to add a net service name description entry for EXTPROC0 in the server's tnsnames.ora file, using SID rather than SERVICE_NAME in the CONNECT_DATA section. For example:

          extproc_connection_data =
             (DESCRIPTION=
                (ADDRESS_LIST =
                   (ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0))
                )
                (CONNECT_DATA=
                   (SID=PLSExtProc)
                   (PRESENTATION = RO)
                )
             )


Notes:

  • Do not substitute your SID anywhere.

  • Do not change the case of any text.

  • EXTPROCO ends with the letter "o" not the number zero.

  • Start the Net8 Assistant to help you configure the tnsnames.ora.

  • See also Net8 Administrator's Guide for more information about creating these configuration files.

 

6.9 Multilingual Functionality (Multilexer)


Note:

You may need to increase your tablespace to at least 20 MB to support multilexer. 


Oracle8i (8.1.6 and above) provides multilexer which allows you to use language-specific features on documents of different languages stored in the same table. Multilexer is a feature of the index and is configured during index creation. Multilexer requires an extra column in your table, which identifies the language of each document.

At query time, the multilexer chooses a language-specific lexer to lex the query tokens. This is based on the NLS_LANG setting for the query session. Thus, a query session in the FRENCH language uses the lexer for FRENCH.

During installation of Oracle Portal, the sbrimtlx.sql script creates the language-specific lexer preferences and gathers them under a single multilexer preference.

6.10 interMedia Text-related Procedures Created in Oracle Portal

The Oracle Portal installation creates the following procedures in the ctxsys schema. These procedures are created to support the user datastores that are used in Oracle Portal content areas for interMedia Text indexing.

where <n> is the user_id of the Oracle Portal schema which may be different for each database. This value is the user_id column value from all_users.


Go to previous page Go to next page
Oracle
Copyright © 2001 Oracle Corporation.

All Rights Reserved.

Library

Solution Area

Contents

Index