Oracle Ultra Search User's Guide
Release 9.0.3

Part Number B10043-01

6
Ultra Search Developer's Guide and API Reference

This chapter explains the Ultra Search APIs and related information. This chapter contains the following topics:

Overview of Ultra Search APIs

Ultra Search provides the following APIs:

Ultra Search also includes highly functional query applications to query and display search results. The query applications are based on Java Server Pages (JSP) and work with any JSP 1.1-compliant engine.

Ultra Search Query API

Ultra Search provides a Java API for querying indexed data. The API methods retrieve and display query results. Because it is written in Java, it is compatible with a large spectrum of Web application servers that support any Java-based technology, such as Java server pages (JSP version 1.1 and higher). The API uses JDBC connection pooling for scalability.

The Java API does not impose any HTML rendering elements. The application can completely customize the HTML interface. For example:

You embed Ultra Search query functionality in your Web application with the supplied Ultra Search Java query API. The API supports two kinds of methods:

  1. Methods that retrieve query result data only.
  2. Methods that retrieve HTML code containing query result data.

The methods that retrieve HTML code support features such as allowing you to embed query input boxes and result lists in your Web application. The data-only methods do not return any HTML and can be used when you require full control over the HTML code to be rendered.

Some features of the Ultra Search Java query API:

Ultra Search Crawler Agent API

You can implement a crawler agent to crawl and index a proprietary document repository, such as Lotus Notes or Documentum. In Ultra Search, the proprietary repository is called a user-defined data source. The module that enables the crawler to access the data source is called a crawler agent.

The agent collects document URLs and associated metadata from the user-defined data source and returns the information to the Ultra Search crawler, which enqueues it for later crawling. The crawler agent must be implemented in Java using the Ultra Search crawler agent API.

Ultra Search provides a sample implementation of user-defined crawler agents using the Ultra Search agent API. Upon invocation, this sample agent connects to a specified Oracle database and retrieves the contents of a table for the crawler to collect and index.

The sample agent is fully functional and can be customized to adapt to other database-based data sources. It performs the following task:

Crawler Agent Overview

A crawler agent does the following:

From the crawler's perspective, the agent retrieves the list of URLs from the target data source and saves it in the crawler queue before processing it.


Note:

If the crawler is interrupted for any reason, the agent invocation process is repeated with the original last crawl time stamp. If the crawler has already finished enqueueing URLs fetched from the agent and is halfway through crawling, then the crawler only starts the agent, but does not try to fetch URLs from the agent. Instead, it finishes crawling the URLs already enqueued.


There are two kinds of crawler agents:

Standard Agent

The standard agent returns the list of URLs currently existing in the data source. It does not know whether any of the URLs had been crawled before, and it relies on the crawler to find any updates to the target data source. The standard agent's interaction with the crawler is the following:

Smart Agent

The smart agent uses a modified-since time stamp (provided by the crawler) to return the list of URLs that have been updated, inserted, and deleted. The crawler only crawls URLs returned by the agent and does not recrawl existing ones. For URLs that were deleted, the crawler removes them from the URL table. If the smart agent can only return updated or inserted URLs but not deleted URLs, then deleted URLs are not detected by the crawler. In this case, you must change the schedule crawler recrawl policy to periodically run the schedule in force recrawl mode. Force recrawl mode signals to the agent to return every URL in the data source.

The agent API isDeltaCrawlingCapable() tells the crawler whether the agent it invokes is a standard agent or a smart agent. The agent API startCrawling(boolean forceRecrawl, Date lastCrawlTime) lets the crawler tell the agent the last crawl time and whether the crawler is running in force recrawl mode.
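The difference between the two agent kinds can be sketched as follows. This is a minimal, self-contained illustration, not the real CrawlerAgent interface: the class SmartAgentSketch, its in-memory map of documents, and the helper methods addDocument() and pendingUrls() are hypothetical. Only the method names isDeltaCrawlingCapable() and startCrawling(boolean forceRecrawl, Date lastCrawlTime) come from the text above.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a smart agent. It does NOT implement the real
// CrawlerAgent interface; it only mirrors the two calls named in the text.
class SmartAgentSketch {
    // URL -> last-modified time in a simulated data source (assumption).
    private final Map<String, Date> source = new LinkedHashMap<>();
    private final List<String> pending = new ArrayList<>();

    void addDocument(String url, Date lastModified) {
        source.put(url, lastModified);
    }

    // A smart agent reports that it can return only the changed URLs.
    boolean isDeltaCrawlingCapable() {
        return true;
    }

    // Select the URLs to hand to the crawler for this run:
    // everything in force recrawl mode, otherwise only URLs
    // modified after the last crawl time.
    void startCrawling(boolean forceRecrawl, Date lastCrawlTime) {
        pending.clear();
        for (Map.Entry<String, Date> entry : source.entrySet()) {
            boolean changed = lastCrawlTime == null
                    || entry.getValue().after(lastCrawlTime);
            if (forceRecrawl || changed) {
                pending.add(entry.getKey());
            }
        }
    }

    List<String> pendingUrls() {
        return new ArrayList<>(pending);
    }
}
```

A standard agent would, under the same assumptions, return false from isDeltaCrawlingCapable() and always return every URL, leaving change detection to the crawler.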

Document Attributes and Properties

Document attributes, or metadata, describe document properties. Some attributes can be irrelevant to your application. The crawler agent creator must decide which document attributes should be extracted and saved. The agent can also be created so that the list of collected attributes is configurable. Ultra Search automatically registers attributes returned by the agent. The agent can decide which attributes to return for a document.

Crawler Agent Functionality

Data Source Type Registration

A data source type is an abstraction of a data source. You can define new data source types with the following attributes:

Ultra Search does not enforce the occurrence of parameters. You cannot specify a particular parameter to have 0 or more, at least 1, or only 1 occurrence.

Data Source Registration

After a data source type is defined, any instance of that data source type can be defined:

Data Source Attribute Registration

You can add new attributes to Ultra Search by providing the attribute name and the attribute data type. The data type can be string, number, or date. Attributes with the same name but different data types can be added. Attributes returned by an agent are automatically registered if they have not been defined.
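Because the same name can be registered with different data types, an attribute is effectively identified by the pair of name and type. The following sketch illustrates that idea only; AttributeRegistrySketch and its methods are hypothetical and are not part of the Ultra Search API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical registry illustrating that (name, type) identifies an attribute.
class AttributeRegistrySketch {
    enum Type { STRING, NUMBER, DATE }

    private final Set<String> registered = new LinkedHashSet<>();

    // Returns true when the (name, type) pair is new,
    // false when the same pair was already defined.
    boolean register(String name, Type type) {
        return registered.add(name + "/" + type);
    }

    int size() {
        return registered.size();
    }
}
```

Registering "Price" as a number and then as a string yields two distinct attributes; registering "Price" as a number a second time changes nothing.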

User-Implemented Crawler Agent

The crawler agent has the following requirements:

Interaction Between the Crawler and the Crawler Agent

The crawler crawls data sources defined by the user through the invocation of the user-supplied crawler agent. The crawler can do the following:

Crawler Agent APIs and Classes

The crawler agent API is a collection of methods used to implement a crawler agent. A sample implementation of a crawler agent SampleAgent.java is provided under $ORACLE_HOME/ultrasearch/sample/.

UrlData: The crawler agent uses this interface to populate document properties and attribute values. Ultra Search provides a basic implementation of this interface that the agent can use directly or extend if necessary. The class is DocAttributes with a constructor that has no argument. The agent might decide to create a pool of UrlData objects and cycle through them during crawling. In the most simple implementation, the agent creates one DocAttributes object, repeatedly resets and populates the data, and returns this object.

LovInfo: The crawler agent uses this interface to submit attribute LOV definitions.

DataSourceParams: The crawler agent uses this interface to read and write data source parameters.

AgentException: The crawler agent uses this exception class when an error occurs.

CrawlerAgent: This interface lets the crawler communicate with the user-defined data source. The crawler agent must implement this interface.

Sample Agent Files

The sample agent files are located in the $ORACLE_HOME/ultrasearch/sample directory. You can directly view the sample agent source code using your preferred text editor.

The directory contains a sample_agent_readme.htm file and a SampleAgent.java file. Together they document and implement the sample crawler agent using the agent APIs.

Setting up the Sample Crawler Agent

Compiling and Building the Agent Jar File

The Java source code for the sample agent must be first compiled into class files and put into a jar file in the $ORACLE_HOME/ultrasearch/lib/agent/ directory, where $ORACLE_HOME is the Oracle home directory where the Ultra Search server component, not the middle tier component, is installed.

The classes needed for compilation are the JDK classes (classes.zip), the Oracle JDBC thin driver (classes12.zip), and ultrasearch.jar. For example:

   javac -J-ms16m -J-mx96m -O -classpath \
   /jdk1.2.2_05/lib/classes.zip:/lib/classes12.zip:$ORACLE_HOME/ultrasearch/lib/ultrasearch.jar \
   SampleAgent.java

To build the sampleAgent.jar file:

  /jdk1.2.2_05/bin/jar cv0f /oracle/ultrasearch/lib/agent/sampleAgent.jar \
  SampleAgent.class 'SampleAgent$DocNode.class'

Creating a Data Source Type

A data source type that uses the sample agent must be created first.

Defining Data Source Parameters

Define parameters for a data source type:

Defining a Data Source of this Type

A data source is defined, which initializes the data source parameters. For example, the value specified accesses a table whose schema is the following:

    TABLE NEWS (
    ARTICLE_NO    NUMBER,
    NEWS_URL      VARCHAR2(740),
    TITLE         VARCHAR2(200),
    AUTHOR        VARCHAR2(100),
    PUB_DATE      DATE default SYSDATE,
    PUBLISHER     VARCHAR2(100),
    PRICE         NUMBER,
    LANG          VARCHAR2(10),
    IGNORE        NUMBER DEFAULT 0,
    PRIMARY KEY (NEWS_URL)
    );

Ultra Search Java Email API

Ultra Search provides a Java API for accessing archived emails. The API is used by the Ultra Search query application to display emails addressed to mailing lists that have been indexed by the Ultra Search system. The API can also be used to build your own custom query application.

The application user-interface logic is controlled entirely in the JSP, so the look and feel can be completely customized to your needs.

Email documents contain valuable information, but they are not structured to find specific relevant information easily. Ultra Search lets you retrieve and index emails on a server that supports the IMAP4 protocol.

An email source is a data source that derives its content from emails sent to a specific email address. When the Ultra Search crawler searches an email source, the crawler collects all emails that have the specific email address in any of the "To:" or "Cc:" email header fields.
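The membership rule described above can be sketched as a simple check over the recipient headers. This is an illustration only: EmailSourceSketch is a hypothetical class, the headers are assumed to be already parsed into plain address strings, and the case-insensitive comparison is an assumption rather than documented crawler behavior.

```java
import java.util.List;

// Hypothetical check: does a message belong to an email source?
class EmailSourceSketch {
    // True when the source address appears in any "To:" or "Cc:" value.
    static boolean belongsToSource(String sourceAddress,
                                   List<String> to, List<String> cc) {
        return containsAddress(to, sourceAddress)
                || containsAddress(cc, sourceAddress);
    }

    private static boolean containsAddress(List<String> recipients,
                                           String address) {
        for (String recipient : recipients) {
            // Assumption: addresses are compared ignoring case and padding.
            if (recipient.trim().equalsIgnoreCase(address)) {
                return true;
            }
        }
        return false;
    }
}
```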


Note:

Ultra Search stores copies of all retrieved emails in the local file system of the Ultra Search server installation.


A possible application of an email source is where an email source represents all emails sent to a mailing list. In such a scenario, multiple email sources are defined where each email source represents an email list.

Ultra Search email crawling and rendering is built on top of the JavaMail API using Sun Microsystems' reference implementation of JavaMail. This enables Ultra Search to provide a Java API for accessing indexed emails. The API is known as the Ultra Search Java Email API. This API lets you retrieve information such as email header information, email body content, and attachments of an email.

Use this API to embed Ultra Search email browsing functionality into Java server page (JSP) or servlet-based Web applications. Ultra Search ships a fully functional JSP Web application that directly uses this API to render indexed emails. Because the source code is viewable, you can use it as an example for building your own customized email browser.

JavaMail Implementation

Ultra Search requires a JavaMail 1.1 compliant implementation. The reference implementation by Sun Microsystems is JavaMail version 1.2. This reference implementation is shipped with Ultra Search.

Java Email API

The Ultra Search Java Email API is encapsulated in the oracle.ultrasearch.query package.

Sample Mailing List Browser Application Files

The sample mailing list browser application files are located in the $ORACLE_HOME/ultrasearch/sample/query directory. You can directly view the sample mailing list browser application source code using your preferred text editor.

The following tables describe all sample mailing list browser application files:

README File and Stylesheets:

File           Description
README.html    Readme
mail.css       Style sheet for sample email Web application

Sample Java Server Page Mailing List Browser Applications Files:

File             Description
mail.jsp         Mailing list browser applications that selectively include HTML code returned by other JSP files, depending on what the end user wants to view
mailindex.jsp    JSP page that displays all email sources (mailing lists) of an Ultra Search instance
mailmsgs.jsp     JSP page that displays all emails for an email source (mailing list)
mailreader.jsp   JSP page that displays an email
mailutil.jsp     JSP page that defines various functions that are used by mailreader.jsp

Graphics Files for All Applications:

File                            Description
images/ultra_mediumbanner.gif   Ultra Search banner
images/wsd.gif                  Background image used in sample query application

Setting up the Sample Mailing List Browser Application

For detailed instructions on setting up the sample JSP mailing list browser application, see "Installing the Ultra Search Middle Tier on Web Server Hosts".

Ultra Search URL Rewriter API

A URL rewriter is a user-supplied Java module that implements the Ultra Search UrlRewriter Java interface. When activated, it is used by the crawler to filter and rewrite extracted URL links before they are inserted into the URL queue.

Web crawling generally consists of the following steps:

  1. Get the next URL from the URL queue. (Web crawling stops when the queue is empty.)
  2. Fetch the contents of the URL.
  3. Extract URL links from the contents.
  4. Insert the links into the URL queue.

The generated new "URL link" is subject to all existing host, path, and mimetype inclusion and exclusion rules.
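The four crawling steps above can be sketched as a queue-driven loop. This is a generic illustration, not the Ultra Search crawler: CrawlLoopSketch is hypothetical, fetching is simulated by a map from each URL to the links on that page, and the "seen" set that prevents re-enqueueing is an added assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical crawl loop over an in-memory "site".
class CrawlLoopSketch {
    // 'site' maps a URL to the links found in its contents
    // (a stand-in for fetching the page and extracting links).
    static List<String> crawl(String seed, Map<String, List<String>> site) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> crawled = new ArrayList<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty()) {            // crawling stops when the queue is empty
            String url = queue.poll();        // step 1: get the next URL
            List<String> links = site.getOrDefault(
                    url, Collections.<String>emptyList()); // steps 2-3: fetch, extract
            crawled.add(url);
            for (String link : links) {       // step 4: insert links into the queue
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return crawled;
    }
}
```

A URL rewriter would sit between link extraction and insertion into the queue, deciding for each link whether it is kept, changed, or dropped.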

There are two possible operations that can be done on the extracted URL link:

URL Link Filtering

Users control what type of URL links are allowed to be inserted into the queue with the following mechanisms supported by the Ultra Search crawler:

With these mechanisms, only URL links that meet the filtering criteria are processed. However, there are other criteria that users might want to use to filter URL links. For example:

The possible criteria are endless, which is why filtering is delegated to a user-implemented module that the crawler can use when evaluating an extracted URL link.

URL Link Rewriting

For some applications, due to security reasons, the URL crawled is different from the one seen by the end user. For example, crawling is done on an internal Web site behind a firewall without security checking, but when queried by an end user, a corresponding mirror URL outside the firewall must be used.

A display URL is a URL string used for search hit display. This is the URL used when users click the search hit link. An access URL is a URL string used by the crawler for crawling and indexing. An access URL is optional. If it does not exist, then the crawler uses the display URL for crawling and indexing. If it does exist, then it is used by the crawler instead of the display URL for crawling.

For regular Web crawling, there are only display URLs available. But in some situations, the crawler needs an access URL for crawling the internal site while keeping a display URL for the external use. For every internal URL, there is an external mirrored one.

For example:

http://www.acme-qa.us.com:9393/index.html
http://www.acme.com/index.html

When the URL link 'http://www.acme-qa.us.com:9393/index.html' is extracted and before it is inserted into the queue, the crawler generates a new display URL and a new access URL for it:

Access URL - http://www.acme-qa.us.com:9393/index.html
Display URL - http://www.acme.com/index.html

The extracted URL link is rewritten, and the crawler crawls the internal Web site without exposing it to the end user.
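The internal-to-external mapping in this example can be sketched with java.net.URI. This is a minimal illustration under stated assumptions: MirrorRewriteSketch is a hypothetical class and does not implement the UrlRewriter interface, whose method signatures are not given in this chapter. The host names passed in are the ones from the example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical holder for the access URL / display URL pair.
class MirrorRewriteSketch {
    final String accessUrl;   // null when no separate access URL exists
    final String displayUrl;

    private MirrorRewriteSketch(String accessUrl, String displayUrl) {
        this.accessUrl = accessUrl;
        this.displayUrl = displayUrl;
    }

    // Links on the internal host keep their original form as the access URL
    // and get a display URL on the external mirror host (default port).
    // Other links pass through with a display URL only.
    static MirrorRewriteSketch rewrite(String extractedLink,
                                       String internalHost,
                                       String externalHost) {
        URI link = URI.create(extractedLink);
        if (!internalHost.equalsIgnoreCase(link.getHost())) {
            return new MirrorRewriteSketch(null, extractedLink);
        }
        try {
            URI display = new URI(link.getScheme(), null, externalHost, -1,
                    link.getPath(), link.getQuery(), link.getFragment());
            return new MirrorRewriteSketch(extractedLink, display.toString());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                    "Cannot build display URL for " + extractedLink, e);
        }
    }

    // The access URL is optional; without it the display URL is crawled.
    String urlToCrawl() {
        return accessUrl != null ? accessUrl : displayUrl;
    }
}
```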

Another example is when the links that the crawler picks up are generated dynamically and can be different (depending on referencing page or other factor) even though they all point to the same page. For example:

http://compete3.acme.com/rt/rt.wwv_media.show?p_type=text&p_id=4424&p_currcornerid=281&p_textid=4423&p_language=us
http://compete3.acme.com/rt/rt.wwv_media.show?p_type=text&p_id=4424&p_currcornerid=498&p_textid=4423&p_language=us

Because the crawler detects different URLs with the same contents only when there is a sufficient number of duplicates, the URL queue could grow to a huge number of URLs, causing excessive URL link generation. In this situation, the extracted links should be normalized so that URLs pointing to the same page have the same URL. The algorithm for rewriting these URLs is application dependent and cannot be handled by the crawler in a generic way.
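One possible normalization for the two links shown above is to drop the query parameter that varies with the referencing page. The sketch below is illustrative only: LinkNormalizerSketch is a hypothetical class, and treating p_currcornerid as the parameter to drop is an assumption based on it being the only difference between the two example URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical normalizer that removes volatile query parameters.
class LinkNormalizerSketch {
    static String normalize(String url, Set<String> volatileParams) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url;                       // no query string to normalize
        }
        String base = url.substring(0, q);
        List<String> kept = new ArrayList<>();
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (!volatileParams.contains(name)) {
                kept.add(pair);               // keep parameters that identify the page
            }
        }
        return kept.isEmpty() ? base : base + "?" + String.join("&", kept);
    }
}
```

With p_currcornerid in the volatile set, both example links normalize to the same string, so only one entry reaches the URL queue.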

When a URL link goes through a rewriter, there are the following possible outcomes:

Creating and Using a URL Rewriter

Follow these steps to create and use a URL rewriter:

  1. Create a new Java file implementing the UrlRewriter interface open(), close(), and rewrite() methods. A sample rewriter, SampleRewriter.java, is available for reference under $ORACLE_HOME/ultrasearch/sample/.
  2. Compile the rewriter Java file into a class file. For example:
    /jdk1.3.1/bin/javac -O -classpath $ORACLE_HOME/ultrasearch/lib/ultrasearch.jar SampleRewriter.java
  3. Package the rewriter class file into a jar file under the $ORACLE_HOME/ultrasearch/lib/agent/ directory. For example:
    /jdk1.3.1/bin/jar cv0f $ORACLE_HOME/ultrasearch/lib/agent/sample.jar SampleRewriter.class
  4. Specify the rewriter class name and jar file name (for example, SampleRewriter and sample.jar) in the administration tool in step 2 of Creating Web Sources or in the crawler parameters page of an existing Web data source.
  5. Enable the UrlRewriter option from Web Sources page in the administration tool.
  6. Crawl the target Web data source by launching the corresponding schedule. The crawler log file confirms the use of the URL rewriter with the message Loading URL rewriter "SampleRewriter"...


    Note:

    URL rewriting is available for Web data sources only.


Ultra Search Sample Query Applications

Ultra Search provides several sample query applications and a sample crawler agent. Use the sample query applications as examples for creating your own query application. The query applications are written as Java server page (JSP) applications. Your query application will use the Ultra Search query API. You can also use the sample crawler agent to create your own crawler agent.


Note:

Pointers to the sample query applications and the sample crawler agent Java source code, as well as their corresponding readmes, are in the Ultra Search welcome page: http://<hostname.domainname>:<HTTPport>/ultrasearch/index.html


The sample query applications are shipped as a deployed J2EE Web application (sample.ear). This component depends on a J2EE container to host the Web pages, a JDBC driver, and Java Mail API for displaying email results. After the sample.ear file is deployed by the Oracle Containers for J2EE (OC4J), you see a set of JSP files that demonstrate the query API usage.

The sample query applications include a sample search portlet. The sample Ultra Search portlet demonstrates how to write a search portlet for use in Oracle 9iAS Portal.

When the user issues a query in any of the query applications, a hit list containing query results is returned. The user can select a document to view from the hit list. A hit list can include HTML documents, files, database table content, archived emails, or Oracle 9iAS items. The Ultra Search sample query applications also incorporate an email browser for reading and browsing emails.

The Ultra Search administration tool and the Ultra Search sample query applications are part of the Ultra Search middle tier components module. However, the Ultra Search administration tool is independent from the Ultra Search sample query applications. Therefore, they can be hosted on different machines to enhance security or scalability.

If you do not want to use the sample query applications, you can build your own query application by directly invoking the Ultra Search Java Query API. Because the API is coded in Java, you can invoke the API methods from any Java-based application, such as from a Java servlet or a Java server page (as in the case of the provided sample query applications). For rendering emails that have been crawled and indexed, you can also directly invoke the Ultra Search Java email API methods.

Java Server Page (JSP) Sample Query Applications

The JSP sample query applications are located in the $ORACLE_HOME/ultrasearch/sample directory.

Java Server Page Concepts

As mentioned earlier, you can use JSP code and the supplied Java APIs to create your Web application. Typically, your Web application runs in an application server, such as Oracle iAS. The application server typically runs on a separate machine from the Oracle server for performance and scalability reasons. The Oracle server holds the Ultra Search indexes.

JSP applications are compiled into Java servlets at runtime. The compiled servlets run in one or more Java Virtual Machine processes. The JSP application communicates with the Oracle server through the Oracle JDBC driver.

As in any Java application, you must include the following files in your servlet engine classpath to use the Java query and email APIs:

  1. $ORACLE_HOME/ultrasearch/lib/ultrasearch_query.jar
  2. $ORACLE_HOME/lib/mail.jar
  3. $ORACLE_HOME/lib/activation.jar

Figure 6-1 shows how your Web query application calls the Ultra Search Java query API.

Figure 6-1 Calling Java Server Pages

Ultra Search Query Tag Library

On top of the Java query API, Ultra Search provides a JSP tag library as an alternative for developing search applications. Based on the Sun Microsystems Java Server Pages specification version 1.1, the Ultra Search tag library better separates the dynamic/Java development effort from the static/HTML development effort, and enables Web developers who are unfamiliar with Java to incorporate search functionality into their applications.

The Ultra Search tag library provides a subset of the features in the Java Query API. Advanced features, such as custom query expansion and URL submission, are not available as tags. The main features of the tag library are the following:

  1. Ability to retrieve search attributes, groups, languages, and LOVs for rendering the advanced query form.
  2. Ability to iterate through the resulting hit set, and retrieve document attributes and properties for rendering the result page.

The tag library is summarized in the following table:

Tag: instance
Description: This tag establishes a connection to an Ultra Search instance.
Attributes: instanceId, username, password, URL, dataSourceName, tablePagePath, emailPagePath, filePagePath

Tag: showAttributes
Description: For an advanced query, use this tag to show the list of attributes available.
Attributes: instance, locale

Tag: showGroups
Description: For an advanced query, use this tag to show the list of groups.
Attributes: instance, locale

Tag: showLanguages
Description: For an advanced query, use this tag to show the list of languages defined in the instance.
Attributes: instance

Tag: showLOV
Description: Show all values defined for a search attribute.
Attributes: instance, locale, attributeName, attributeType

Tag: getResult
Description: Perform the search.
Attributes: resultId, instance, query, queryLocale, documentLanguage, from, to, boostTerm, withCount

Tag: fetchAttribute
Description: This is a nested tag within getResult to specify which attributes of each document should be fetched along with the query results. There can be any number of nested fetchAttribute tags.
Attributes: attributeName, attributeType

Tag: showHitCount
Description: If withCount="true" in the getResult tag, then the result includes a total number of hits, and you can use showHitCount to display this number.
Attributes: result

Tag: showResults
Description: Renders the results of the search.
Attributes: result, instance

Tag: showAttributeValue
Description: Renders a document attribute.
Attributes: attributeName, attributeType

Details of these tags are described in the following subsections. Note the following requirements for using Ultra Search tags:

The Ultra Search tag library definition (TLD) file can be found in $ORACLE_HOME/ultrasearch/sample/query/WEB-INF/ultrasearch-taglib.tld after sample.ear has been deployed. It is also packaged with ultrasearch_query.jar under the name META-INF/taglib.tld.

Query Tag Descriptions

The following section describes each Ultra Search tag, its attributes, and action. Examples are shown without any static HTML, which can be inserted to format the output.

<instance> Tag: Connecting to the Ultra Search Instance

This tag establishes a connection to an Ultra Search instance. Some basic parameters must be established for this tag to work, such as JDBC connection string, schema username/password, Ultra Search instance name, and so on.

Attribute Name Description

instanceId="name"

This names the instance defined by this tag. This name is then used by other Ultra Search tags to specify the instance being searched.

username

The database username used to create the database connection.

password

The database password used to create the database connection.

url

The URL used to create a JDBC connection. This attribute is optional if dataSourceName is specified.

dataSourceName

The JNDI name that identifies a JDBC data source. Users should set either the URL or data source name properties. This is optional if URL is specified.

instanceName

The name of the Ultra Search instance that is owned by the schema user. If the schema user owns only one Ultra Search instance, this is optional.

tablePagePath

The URL path of the Web application that renders the contents of a database table.

emailPagePath

The URL path of the Web application that renders the contents of an email.

filePagePath

The URL path of the Web application that renders the contents of a file.

This tag defines a scripting variable of the name set by the instanceId property. All the other tag properties correspond to a property in the oracle.ultrasearch.query.QueryInstance class. Either the URL or the dataSourceName attribute should be set. They are exclusive of each other.

The following example uses the URL property to connect to the database.

<US:instance 
 instanceId="mybookstore"
 url="jdbc:oracle:thin:@dbhost:1521:inst1"
 username="scott"
 password="tiger"
 tablePagePath="../display.jsp"
 emailPagePath="../mail.jsp"
 filePagePath="../display.jsp"
/>

<iterAttributes> Tag: Show All Search Attributes

When a user wants to perform an advanced query, the application needs to show the list of attributes that are available, the list of groups, and the list of languages defined in the instance. This can be done using some iteration tags that define script variables for page rendering.

Each attribute in Ultra Search has a name, a type, and a display name that is translated depending on the locale that is set for the QueryInstance tag. The attribute type should be used to determine which operators can be used on this attribute and how to parse the user's input.

Attribute Name Description

instance="name"

This is a mandatory attribute to refer to the object defined by the instance tag.

locale="locale"

This determines the display name fetched using this tag.

This tag is an iteration tag. It loops through all the search attributes in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable named "attribute", which is an oracle.ultrasearch.query.Attribute object. It also defines a string variable named "displayname", which is the localized name of the attribute.

The following example shows all the attributes in "mybookstore" instance, using their English display names.

<US:iterAttributes instance="mybookstore" locale="<%=Locale.ENGLISH%>" >
<%= attribute %>
<%= displayname %>
</US:iterAttributes>

<iterGroups> Tag: Show All Search Groups

Similar to the iterAttributes tag, the iterGroups tag iterates through all the groups defined in an instance.

Attribute Name Description

instance="name"

This is a mandatory attribute to refer to the object defined by the instance tag.

locale="locale"

This determines the display name fetched using this tag.

This tag loops through all the search groups in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable named "group", which is an oracle.ultrasearch.query.Group object. It also defines a string variable named "displayname", which is the localized name of the group.

The following example shows all the groups in "mybookstore" instance, using their English display names.

<US:iterGroups instance="mybookstore" locale="<%=Locale.ENGLISH%>" >
<%= group %>
<%= displayname %>
</US:iterGroups >

<iterLanguages> Tag: Show All Search Languages

Similar to the iterAttributes tag, the iterLanguages tag iterates through all the languages defined in an instance. Because each language is defined by a java.util.Locale object, their display names are not handled by Ultra Search. Therefore, this tag does not define the displayname scripting variable.

Attribute Name Description

instance="name"

This is a mandatory attribute to refer to the object defined by the instance tag.

This tag is an iteration tag. It loops through all the search languages in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable named "language", which is a java.util.Locale object. The display name for the language is provided by Java as a property of the object itself (through the getDisplayName() method).

The following example shows all the languages in "mybookstore" instance, using their English display names.

<US:iterLanguages instance="mybookstore">
<%= language %>
<%= language.getDisplayName (Locale.ENGLISH) %>
</US:iterLanguages >

<iterLOV> Tag: Show All Values Defined for a Search Attribute

Attribute Name Description

instance="name"

This is a mandatory attribute to refer to the object defined by the instance tag.

locale="locale"

This determines the display name fetched using this tag.

attributeName="attname"

The name of the attribute whose LOV is being fetched.

attributeType="string | number | date"

The type of the attribute whose LOV is being fetched. This is needed because the attribute name does not uniquely identify an attribute in the instance.

This tag is an iteration tag. It loops through all the values in a search attribute's LOV. In each loop, it defines a scripting variable named "value", which is either a java.lang.String, java.util.Date, or java.math.BigDecimal object, depending on the attribute type. It also defines a string variable named "displayname", which is the localized display name of the value.

The following example shows all the values for a string attribute named "Dept" in "mybookstore" instance, using their English display names.

<US:iterLOV instance="mybookstore" locale="<%=Locale.ENGLISH%>" attributeName="Dept" attributeType="string" >
<%= value %>
<%= displayname %>
</US:iterLOV >
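Because an attribute value is a java.lang.String, java.math.BigDecimal, or java.util.Date depending on the attribute type, a query form typically has to convert the user's raw input accordingly. The following is a minimal sketch of such a conversion. AttributeInputSketch is a hypothetical helper, not part of the Ultra Search API, and the yyyy-MM-dd date pattern is an assumption chosen for the example.

```java
import java.math.BigDecimal;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper that converts form input by attribute type.
class AttributeInputSketch {
    static Object parse(String attributeType, String rawInput) {
        switch (attributeType.toLowerCase(Locale.ROOT)) {
            case "string":
                return rawInput;
            case "number":
                // Throws NumberFormatException for non-numeric input.
                return new BigDecimal(rawInput.trim());
            case "date": {
                // Assumed input pattern; a real form may use another one.
                SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
                format.setLenient(false);
                try {
                    return format.parse(rawInput.trim());
                } catch (ParseException e) {
                    throw new IllegalArgumentException(
                            "Not a date: " + rawInput, e);
                }
            }
            default:
                throw new IllegalArgumentException(
                        "Unknown attribute type: " + attributeType);
        }
    }
}
```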

Formulating the Query

Ultra Search supports a set of classes for building queries. Currently these classes do not have any tag equivalents.

<getResult> Tag: Perform Search

This tag performs the search and returns the result by defining a scripting variable of the type oracle.ultrasearch.query.Result.

Attribute Name Description

resultId="name"

This names the result generated by this tag. This name is then used by other tags to render the result on the page.

instance="name"

This is a mandatory attribute to refer to the object defined by the instance tag.

query="<%= expression %>"

This specifies a Query object to search with.

queryLocale="locale"

This specifies the locale of the Query object.

documentLanguage="locale"

This specifies the language of the documents to search for. This is optional. If it is not specified, then all languages are included in the search.

from="number"

This specifies the index of the first hit.

to="number"

This specifies the index of the last hit.

boostTerm="string"

This specifies the search term that will be used for relevance boosting. This is optional.

withCount="true | false"

This specifies whether the result will have an estimate of the total hit count. This is optional. If unspecified, the behavior is the same as withCount=false.

The <getResult> tag corresponds to the getResult() method on the oracle.ultrasearch.query.Instance class. The attributes of the tag map directly to the parameters of the method, with one exception: the getResult() method can also specify the attributes to fetch, whereas the <getResult> tag requires nested <fetchAttribute> tags to accomplish metadata selection.

The following example retrieves the first 20 hits for a query in English, restricted to French documents.

<US:getResult 
 resultId="searchresult"
 instance="mybookstore"
 query=""
 queryLocale=""
 documentLanguage=""
 from="1" to="20">
</US:getResult>
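
The from and to attributes in the preceding example (from="1" to="20") read as 1-based, inclusive hit indexes. The following is a minimal sketch, under that assumption, of how a page number and page size could be converted into those two values. The class HitRange and its method forPage() are hypothetical helpers for illustration only; they are not part of the Ultra Search API.

```java
// Hypothetical helper, NOT part of the Ultra Search API.
// Assumes hit indexes are 1-based and inclusive, as the
// from="1" to="20" example suggests.
final class HitRange {
    final int from;
    final int to;

    private HitRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // Converts a 1-based page number into the hit range for that page.
    static HitRange forPage(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int from = (page - 1) * pageSize + 1;
        int to = page * pageSize;
        return new HitRange(from, to);
    }

    public static void main(String[] args) {
        HitRange first = HitRange.forPage(1, 20);
        System.out.println("from=" + first.from + " to=" + first.to);
    }
}
```

The two resulting values would then be supplied as the from and to attributes of the <US:getResult> tag.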

<fetchAttribute> Tag: Meta-data Selection

This tag is used as a nested tag inside <getResult>. It specifies which attributes of each document should be fetched along with the query result. Each <getResult> can have any number of nested <fetchAttribute> tags.

Attribute Name Description

attributeName="attname"

The name of the document attribute to fetch.

attributeType="string | number | date"

The type of the document attribute to fetch. This is needed because the attribute name alone does not uniquely identify an attribute in the instance.

Each occurrence of the <fetchAttribute> tag adds to the list of attributes passed to the getResult() method invoked by the <getResult> tag.

The following example shows the same search as the <getResult> example, but also fetches the title and publication-date attributes of each book.

<US:getResult 
 resultId="searchresult"
 instance="mybookstore"
 query=""
 queryLocale=""
 documentLanguage=""
 from="1" to="20">
<US:fetchAttribute 
 attributeName="title"
 attributeType="string" />
<US:fetchAttribute 
 attributeName="publication-date"
 attributeType="date" />
</US:getResult>

<showHitCount> Tag: Show Estimated Hit Count

After the search is performed, the result must be rendered. If withCount=true is specified in the <US:getResult> tag, then the result contains an estimated count of total hits, and the <showHitCount> tag can be used to display it.

Attribute Name Description

result="name"

This refers to the resultId specified in the <US:getResult> tag.

This tag simply outputs the hit count to the page.

The following example shows the hit count of a search result.

<US:showHitCount result="searchresult" />

<iterResult> Tag: Render the Results

This tag is an iteration tag. It loops through all the documents in a search result.

Attribute Name Description

result="name"

This refers to the resultId specified in the <US:getResult> tag.

instance="name"

This refers to the instanceId specified in the <US:instance> tag.

The tag loops through all the documents in a search result and defines a scripting variable "doc" that is an oracle.ultrasearch.query.Document object. In addition, it can have nested <showAttributeValue> tags, which help to render the document's attributes. It is an error if the specified result was not obtained from a search on the specified instance. In other words, the result must come from the instance.

The following example shows the URL of all documents in a search result.

<US:iterResult
result="searchresult" 
instance="mybookstore">
</US:iterResult>

<showAttributeValue> Tag: Render a Document Attribute

This tag shows an attribute of a document within the <US:iterResult> tag.

Attribute Name Description

attributeName="attname"

The name of the document attribute.

attributeType="string | number | date"

The type of the document attribute. This is needed because the attribute name alone does not uniquely identify an attribute in the instance.

default="default string"

A value to output when the document has no value for this attribute. For example, when a document has no title, the string "No Title" can be displayed as the default value.

This tag looks up the document attribute value and renders it on the page. If the attribute was not fetched as part of the search result, then nothing is output to the page.

The following example shows the title and publication dates of all documents in a search result.

<US:iterResult
result="searchresult" 
instance="mybookstore">
<US:showAttributeValue attributeName="title" attributeType="string" default="No Title" />
<US:showAttributeValue attributeName="publication-date" attributeType="date" />
</US:iterResult>

Customizing the Query Syntax Expansion

Ultra Search uses the Oracle Text engine to index and search documents. When an end user specifies a certain query string, Ultra Search takes that string and transforms it into an Oracle Text query expression. This process is called query syntax expansion.

You can customize Ultra Search to use your own implementation of the query syntax expansion. In previous releases, the default query syntax expansion implementation was contained in the WK_QUERYEXP PL/SQL package.

The Contains query lets you specify a query syntax similar to most internet search engines. The syntax boosts scores for documents that match the user's query in the 'title' StringAttribute. The syntax for Contains is the same when used on the document content and on StringAttributes.

Customize this syntax by subclassing the Contains query and overriding the expand() method with your own implementation. In fact, you can implement the Query interface and ignore the provided Contains query, because the query API accepts any object that implements the Query interface.

This document describes how to customize the query syntax expansion implementation to suit your organization's preferences.

Default Query Syntax Expansion Implementation

The default query syntax expansion implementation directly affects the following:

End User Query Syntax

The end user query syntax defined by the default query syntax expansion implementation is similar to the standard text query syntax employed by most search engines on the Web.

The following table summarizes the rules for the Ultra Search end user query syntax:


Note:

In this section, all end user query strings are enclosed in square brackets. For example, the end user query string Oracle Applications is written as [Oracle Applications].


Rule Description

Single word search

Entering one word finds documents that contain that word.

For example, searching for [Oracle] finds all documents that contain the word "Oracle" anywhere in that document.

Multiple word search

Entering more than one word finds documents that each contain any of those words in any order.

For example, searching for [Oracle Applications] finds documents that contain "Oracle" or "Applications" or "Oracle Applications."

Compulsory inclusion [+]

Attaching a [+] in front of a word requires that the word be found in all matching documents.

For example, searching for [Oracle + Applications] only finds documents that contain the word "Applications." Note: In a multiple word search, you can attach a [+] in front of every token including the very first token.

Compulsory exclusion [-]

Attaching a [-] in front of a word requires that the word not be found in any matching document.

For example, searching for [Oracle - Applications] only finds documents that do not contain the word "Applications". Note: In a multiple word search, you can attach a [-] in front of every token except the very first token.

Phrase Matching ["..."]

Putting quotes around a set of words only finds documents that contain that precise phrase.

For example, searching for ["Oracle Applications"] finds only documents that contain the string "Oracle Applications."

Wildcard Matching [*]

Attaching a [*] to the right-hand side of a word returns documents containing words that begin with that word (left-side partial matches).

For example, searching for the string [Ora*] finds documents that contain any word beginning with "Ora," such as "Oracle" and "Orator." You can also insert an asterisk in the middle of a word. For example, searching for the string [A*e] retrieves documents that contain words such as "Apple", "Ate", "Ape", and so on. Wildcard matching requires more computational processing power and is generally slower than other types of queries.
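
The rules in the preceding table can be illustrated with a small tokenizer. The following is a minimal sketch only, not the parser that Ultra Search uses: the class EndUserQueryTokens, its Kind enum, and its parse() method are hypothetical names. It assumes that a [+] or [-] may either be attached to the following word or separated from it by spaces (as in [Oracle + Applications]), and it does not enforce the rule that the first token cannot carry a [-]. A wildcard [*] is simply left inside the token text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT the Ultra Search parser.
final class EndUserQueryTokens {
    enum Kind { PLAIN, PLUS, MINUS, PHRASE }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + ":" + text;
        }
    }

    // Splits an end user query string into plain, plus, minus,
    // and phrase tokens.
    static List<Token> parse(String query) {
        List<Token> tokens = new ArrayList<>();
        Kind pending = Kind.PLAIN; // operator waiting for its word
        int n = query.length();
        int i = 0;
        while (i < n) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+' || c == '-') {
                pending = (c == '+') ? Kind.PLUS : Kind.MINUS;
                i++;
            } else if (c == '"') {
                int end = query.indexOf('"', i + 1);
                if (end < 0) {
                    end = n; // unterminated quote: take the rest
                }
                String phrase = query.substring(i + 1, end).trim();
                if (!phrase.isEmpty()) {
                    tokens.add(new Token(Kind.PHRASE, phrase));
                }
                pending = Kind.PLAIN;
                i = Math.min(end + 1, n);
            } else {
                int end = i;
                while (end < n
                        && !Character.isWhitespace(query.charAt(end))
                        && query.charAt(end) != '"') {
                    end++;
                }
                tokens.add(new Token(pending, query.substring(i, end)));
                pending = Kind.PLAIN;
                i = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("Oracle + Applications"));
        System.out.println(parse("\"Oracle Applications\" -Forms Ora*"));
    }
}
```

Because only a leading [+] or [-] is treated as an operator, a hyphen inside a word stays part of that word in this sketch.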

Scoring Classes

There are three ways documents are matched against an end user query string. These three ways are known as scoring "classes." Documents are scored and ranked higher if they satisfy the requirements for a higher class. Within each class, documents are also ranked differently depending on how well they match the conditions of that scoring class.

Class 1 is the most heavily weighted class. The score is derived from the number of occurrences of a precise phrase in a document. A document that has more occurrences of the precise phrase has a higher score than a document that has fewer occurrences.

Class 2 is the next most heavily weighted class. In this class, the closer the tokens appear in a document, the higher the score becomes. For example, an end user query string [Oracle Applications Financials] can result in three documents found. None of the three documents contains the precise phrase "Oracle Applications Financials." However, document X contains all three tokens "Oracle", "Applications", and "Financials" in the same sentence, separated by other words. Document Y contains the individual tokens in the same paragraph but in different sentences. Document Z contains the same three tokens, but each token resides in a different paragraph. In this scenario, document X has the highest score, because the tokens are closest together. Likewise, Y has a higher score than Z.

Class 3 is the least weighted class. A document that has more tokens gets a higher score. For example, an end user query string [Oracle Applications Financials] can result in three documents found. Document X might contain all three tokens. Document Y might contain the tokens "Oracle" and "Applications" only. Document Z might contain only the token "Oracle." In this scenario, document X has a higher score than Y. Likewise, Y has a higher score than Z.
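
The Class 3 ordering in the preceding example can be illustrated by counting how many distinct query tokens a document contains. The following is a minimal sketch of that idea only; it is not the scoring algorithm used by Oracle Text. The class TokenCoverage and its method matchedTokens() are hypothetical, and the case-insensitive, whole-word comparison is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the Class 3 ordering, NOT the
// Oracle Text scoring algorithm.
final class TokenCoverage {
    // Returns how many distinct query tokens occur as whole words
    // in the document text (case-insensitive; an assumption).
    static int matchedTokens(List<String> queryTokens, String documentText) {
        Set<String> words = new HashSet<>(
            Arrays.asList(documentText.toLowerCase(Locale.ROOT).split("\\W+")));
        Set<String> seen = new HashSet<>();
        int matched = 0;
        for (String token : queryTokens) {
            String t = token.toLowerCase(Locale.ROOT);
            // seen.add() is false for a repeated query token,
            // so each token is counted at most once.
            if (seen.add(t) && words.contains(t)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("Oracle", "Applications", "Financials");
        System.out.println(matchedTokens(query, "Oracle Applications and Financials"));
        System.out.println(matchedTokens(query, "Oracle Applications overview"));
        System.out.println(matchedTokens(query, "About Oracle"));
    }
}
```

A document with a larger count would rank above one with a smaller count, which mirrors the X, Y, Z ordering described above.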

Expansion Rules

As mentioned previously, the end user query is expanded to an Oracle Text query. The rules for the expanded query string are given in BNF (Backus-Naur Form) notation. These are the rules that Ultra Search uses in its default query syntax expansion implementation.

The rules that define an expanded query:

<expanded query> ::= (<expression> within <title section>)*2, <expression>

<expression> ::= <generic query expression> | <simple query expression>

<generic query expression> ::= (([ <plus expression>*100 & ]) (<main expression>)) [ <minus expression> ]

<simple query expression> ::= (<phrase expression>)*2, (<main expression>)

<main expression> ::= (<near expression>)*2, (<accum expression>)

The terms used in the preceding rules have the following meanings:

A <plus expression> is an AND expression of all plus tokens.

A <minus expression> is a NOT expression of all minus tokens.

A <phrase expression> is a PHRASE formed by all tokens in the <main expression>.

A <near expression> is a NEAR expression of all tokens but minus tokens.

An <accum expression> is an ACCUMULATE expression of all tokens but minus tokens.

A <simple query expression> is used only when the end user query has multiple tokens and does not have any operator or double quote. Otherwise, a <generic query expression> is used.

If there is no token that is neither a plus token nor a minus token, then the <plus expression> and the <accum expression> are eliminated.

Examples of Applying the Rules

The following table illustrates how the default query syntax expansion implementation converts end user query strings to Oracle Text compatible query strings.

End User Query String Expanded Query String Understandable by Oracle Text

[Oracle]

((({Oracle}) within TITLE__31)*2,({Oracle}))

[Oracle + Applications]

((((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))))

[Oracle - Applications]

(((({Oracle})~{Applications}) within TITLE__31)*2,(({Oracle})~{Applications}))

["Oracle Applications"]

((({Oracle Applications}) within TITLE__31)*2,({Oracle Applications}))

[Ora*]

((((Ora%)) within TITLE__31)*2,((Ora%)))

[Oracle Applications]

(((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))
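
The <expanded query>, <main expression>, and <simple query expression> rules can be sketched in a few lines of Java. This is an illustration only, not the Ultra Search implementation: the class SimpleExpansionSketch and its methods are hypothetical, the title section name TITLE__31 is copied from the preceding table, and the sketch covers only queries without operators, quotes, or wildcards. Under those assumptions it reproduces the expanded strings shown in the table for [Oracle] and [Oracle Applications].

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT the Ultra Search implementation.
final class SimpleExpansionSketch {
    // Title section name as it appears in the examples table.
    static final String TITLE_SECTION = "TITLE__31";

    // <expanded query> ::=
    //   (<expression> within <title section>)*2, <expression>
    static String expand(String expression) {
        return "((" + expression + " within " + TITLE_SECTION + ")*2,"
                + expression + ")";
    }

    // <main expression> ::= (<near expression>)*2, (<accum expression>)
    // In Oracle Text, ';' is the NEAR operator and ',' is ACCUMULATE.
    static String mainExpression(List<String> tokens) {
        return "((" + join(tokens, ";") + ")*2,(" + join(tokens, ",") + "))";
    }

    // <simple query expression> ::=
    //   (<phrase expression>)*2, (<main expression>)
    static String simpleExpression(List<String> tokens) {
        String phrase = "{" + String.join(" ", tokens) + "}";
        return "((" + phrase + ")*2," + mainExpression(tokens) + ")";
    }

    // Wraps each token in braces and joins the tokens with the operator.
    private static String join(List<String> tokens, String operator) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(operator);
            }
            sb.append('{').append(token).append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single word: the expression is the braced token in parentheses.
        System.out.println(expand("({Oracle})"));
        // Multiple words, no operators: simple query expression.
        System.out.println(
            expand(simpleExpression(Arrays.asList("Oracle", "Applications"))));
    }
}
```

Queries that contain [+], [-], quotes, or wildcards follow the <generic query expression> rule instead and are not handled by this sketch.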

Customizing the Rules

Customize this expansion to suit your organization's purposes by defining and implementing your own query syntax expansion. To do so, you need to understand the requirements of Oracle Text queries. The details of Oracle Text queries are beyond the scope of this document.

See Also:

To customize Ultra Search to use your own implementation of the query syntax expansion, use the Contains query. This query finds documents that contain some text within their content or their string attributes. The Contains query does not apply to date or number attributes. If no attribute is specified, then Contains operates on the document content instead of an attribute. A match found in the title attribute of a document has a higher score than a match in the document content.

Constructors

Methods