Oracle® Secure Enterprise Search Administrator's Guide
11g Release 1 (11.1.2.0.0)

Part Number E14130-04

5 Customizing the Search Results

This chapter explains the various ways to customize the search results.

Adding Suggested Content in Search Results

Suggested content lets you display real-time data content along with the result list in the default query application. Oracle SES retrieves data from content providers and applies a style sheet to the data to generate an HTML fragment. The HTML fragment is displayed in the result list and is available through the Web Services API. For example, when an end user searches for contact information on a coworker, Oracle SES can fetch the content from the suggested content provider and return the contact information (e-mail address, phone number, and so on) for that person with the result list. Suggested content results appear in tabbed panes above the query results. When the query returns no results, suggested content is not displayed.

Configure suggested content on the Search - Suggested Content page in the Oracle SES Administration GUI. Enter the maximum number of suggested content results (up to 20) to be included with the Oracle SES result list. The results are rendered on a first-come, first-served basis.

Suggested Content Providers

Regular expressions (as supported in the Java regular expression API java.util.regex) are used to define query patterns for suggested content providers. The regular expression-based pattern matching is case-sensitive. For example, a provider with the pattern dir\s(\S+) is triggered on the query dir james but not on the query Dir James. To trigger on the query Dir James, the pattern could be defined either as [Dd][Ii][Rr]\s+(\S+) or as (?i)dir\s+(\S+). A provider with a blank query pattern is triggered on all queries.
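The following Java sketch shows how these patterns behave with java.util.regex. It assumes that a provider fires when the whole query matches the pattern (Matcher.matches()); this guide does not say which matching mode Oracle SES uses, so the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: deciding whether a suggested content provider fires.
// Assumption: the whole query must match the pattern (Matcher.matches()).
class ProviderTrigger {

    // A blank pattern fires on every query; otherwise matching is case-sensitive
    // unless the pattern itself enables case-insensitivity, for example with (?i).
    static boolean fires(String pattern, String query) {
        if (pattern == null || pattern.isBlank()) {
            return true;
        }
        return Pattern.compile(pattern).matcher(query).matches();
    }

    // Returns match group 1 (what $ora:q1 would receive), or "" if there is none.
    static String firstGroup(String pattern, String query) {
        Matcher m = Pattern.compile(pattern).matcher(query);
        if (m.matches() && m.groupCount() >= 1 && m.group(1) != null) {
            return m.group(1);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(fires("dir\\s(\\S+)", "dir james"));            // true
        System.out.println(fires("dir\\s(\\S+)", "Dir James"));            // false
        System.out.println(fires("[Dd][Ii][Rr]\\s+(\\S+)", "Dir James"));  // true
        System.out.println(fires("(?i)dir\\s+(\\S+)", "Dir James"));       // true
        System.out.println(firstGroup("(?i)dir\\s+(\\S+)", "Dir James"));  // James
    }
}
```

Note that the backslashes in the patterns are doubled only because they appear inside Java string literals; in the Administration GUI you enter the pattern as shown in the text.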

The URL you enter for the suggested content provider can contain the following variables: $ora:q, $ora:lang, $ora:q1 ... $ora:qn and $ora:username.

  • $ora:q is the end user full query.

  • $ora:lang is the two-letter code for the browser language.

  • $ora:qn is the nth regular expression match group from the end user query. n starts from 1. If no nth group is matched, then the empty string replaces the variable.

  • $ora:username is the end user name.
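A minimal sketch of how such a URL template could be expanded, written in Java for illustration only. The class is hypothetical, and two behaviors are assumptions rather than documented Oracle SES behavior: substituted values are URL-encoded as UTF-8, and the trigger pattern is matched against the whole query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of expanding $ora:* variables in a provider URL template.
// Assumptions: values are URL-encoded as UTF-8, and the trigger pattern is
// matched against the whole query.
class ProviderUrl {

    // One pass over the template, so a query containing "$ora:" is not re-expanded.
    private static final Pattern VARIABLE = Pattern.compile("\\$ora:(q\\d*|lang|username)");

    static String expand(String template, String triggerPattern, String query,
                         String lang, String username) {
        Matcher groups = Pattern.compile(triggerPattern).matcher(query);
        boolean matched = groups.matches();
        Matcher v = VARIABLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (v.find()) {
            String name = v.group(1);
            String value;
            if (name.equals("q")) {
                value = query;                                  // full end user query
            } else if (name.equals("lang")) {
                value = lang;                                   // browser language code
            } else if (name.equals("username")) {
                value = username;                               // end user name
            } else {
                int n = Integer.parseInt(name.substring(1));    // $ora:qn, n starts from 1
                boolean hasGroup = matched && n >= 1 && n <= groups.groupCount()
                        && groups.group(n) != null;
                value = hasGroup ? groups.group(n) : "";        // unmatched group -> ""
            }
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
            v.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        v.appendTail(out);
        return out.toString();
    }
}
```

For example, with the template http://host.company.com/apps/directory.jsp?apiMaj=10&apiMin=1&oneboxName=app&query=$ora:q1, the pattern dir\s(\S+), and the query dir james, expand returns the same URL ending in query=james.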

Enter an XSLT style sheet to define rules (for example, the size and style) for transforming XML content from a provider into an HTML fragment. This HTML fragment is displayed in the result list or returned over the Web Services API. If you do not enter an XSLT style sheet, then Oracle SES assumes that the suggested content provider returns HTML. If you do not enter an XSLT style sheet and the provider returns XML, then the result list displays the plain XML.

Note:

As an administrator, you are responsible for verifying that the suggested content providers return valid and safe content. Corrupted or incomplete content returned by a suggested content provider can affect the formatting of the default query application results page.

Security Options

There are three security options for how Oracle SES passes the end user's authentication information to the suggested content provider:

  • None: No security policy is used. (Default)

  • Cookie: The end user must first be authenticated by the suggested content provider, which sets a cookie to maintain the user's session. Oracle SES must know which cookie the provider uses for authentication; you supply it when you register the suggested content provider. When the user enters a query, Oracle SES takes the cookies from the user's request header and passes them to the provider. The provider must set the cookie scope to the domain that the provider site and the Oracle SES site have in common.

    For example, suppose the provider site is http://provider.example.com and the Oracle SES site is http://ses.example.com. After the end user logs in to the provider site, the site could set the value of the security cookie loginCookie with domain scope .example.com. When the end user searches in Oracle SES, Oracle SES gets the loginCookie value from the end user's browser and forwards it to the provider site to get the suggested content, without the user logging in to the provider site again. However, if the provider site is accessed as http://provider or if the Oracle SES site is accessed as http://SES, then no domain cookie is available for sharing between the two sites and this security mechanism does not work.

    You can decide what happens when suggested content is available but the user is not logged in to the suggested content provider or the cookie for the provider is not available. For Unauthenticated User Action, if you select Ignore content, then content from that provider is not displayed in the result list. If you select Display login message, then Oracle SES returns a message that there is content available from this provider but the user is not logged in. The message also provides a link to log in to that provider. Enter the link for the suggested content provider login in the Login URL field.

  • Service-to-Service: A one-way trusted relationship is established between Oracle SES and the suggested content provider. Any user logged in to Oracle SES does not need to be authenticated by the provider again. The provider only authenticates the Oracle SES application and trusts the Oracle SES application to act as the end user.

    The end user identity is sent from Oracle SES to the provider site in the HTTP header ORA_S2S_PROXY_USER. The trusted entity could be a proxy user configured in the identity management system used by the provider, or it could be a name-value pair.
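The Cookie option depends on ordinary browser cookie scoping. The following simplified Java check, loosely modeled on the domain-matching rule of RFC 6265, illustrates why provider.example.com and ses.example.com can share a cookie scoped to .example.com while the short host names provider and SES cannot. It is an illustration only; real browsers apply further rules (such as public-suffix checks) that are omitted here.

```java
import java.util.Locale;

// Simplified illustration of cookie domain matching (after RFC 6265).
// Omits public-suffix checks and other browser rules.
class CookieScope {

    // True if a cookie with the given Domain attribute would be sent to host.
    static boolean domainMatches(String host, String cookieDomain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = cookieDomain.toLowerCase(Locale.ROOT);
        if (d.startsWith(".")) {
            d = d.substring(1);   // a leading dot in the Domain attribute is ignored
        }
        return h.equals(d) || h.endsWith("." + d);
    }

    // The Cookie option needs both sites inside the cookie's domain scope.
    static boolean shared(String providerHost, String sesHost, String cookieDomain) {
        return domainMatches(providerHost, cookieDomain)
                && domainMatches(sesHost, cookieDomain);
    }
}
```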

If the secured content provider needs to authenticate the end user and sets a domain-level security cookie to maintain login information after the end user logs in, then use the cookie method for form authentication. The Oracle SES end user must log in manually to the provider site, and the security cookie is stored in the browser. Oracle SES then searches the provider for the end user without an additional login.

However, if the domain security cookie is not allowed for the provider, then the provider must support service-to-service security. The provider must allow an Oracle SES application account to search after passing HTTP basic or digest authentication. Also, if the provider has different secured content for different Oracle SES end users, then it must respect the end user security (in the HTTP header ORA_S2S_PROXY_USER) for the Oracle SES search request.

To register a provider that requires either HTTP basic or HTTP digest authentication, specify the authentication user name in the Entity Name field and specify the authentication password in the Entity Password field.
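As a sketch of what a provider might receive under Service-to-Service security, the following Java code builds (but does not send) a request that carries HTTP basic authentication from the Entity Name and Entity Password values and the end user identity in the ORA_S2S_PROXY_USER header. It uses the standard java.net.http API (Java 11 or later) and shows only the basic variant; it is not Oracle SES code, and the provider URL and credentials are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch of a service-to-service request (built, not sent).
class S2SRequest {

    static HttpRequest build(String providerUrl, String entityName,
                             String entityPassword, String endUser) {
        String credentials = entityName + ":" + entityPassword;
        String basic = "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(providerUrl))
                .header("Authorization", basic)          // authenticates the SES application
                .header("ORA_S2S_PROXY_USER", endUser)   // identifies the end user
                .GET()
                .build();
    }
}
```

A provider that returns different secured content for different users would read ORA_S2S_PROXY_USER only after the basic (or digest) credentials have been verified.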

Example Configuring Google OneBox for Suggested Content

Existing OneBox providers can be configured as Oracle SES suggested content providers. For example, for a Google OneBox provider, the provider URL might be http://host.company.com/apps/directory.jsp and the trigger might be dir\s(\S+). When the user query is "dir james", the provider receives the request with a query string similar to the following: apiMaj=10&apiMin=1&oneboxName=app&query=james.

For the Oracle SES suggested content provider, set the URL template to http://host.company.com/apps/directory.jsp?apiMaj=10&apiMin=1&oneboxName=app&query=$ora:q1. The provider pattern is the same: dir\s(\S+). The XSLT used for Google OneBox can be reused with a minor change. Look for the line:

<xsl:template name="apps">

and change that line in your template to

<xsl:template match="/OneBoxResults">
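The change is needed because an XSLT template declared with name= runs only when another template invokes it with xsl:call-template, whereas a template declared with match= is applied automatically to the matching node. The following Java sketch applies such a template with the standard javax.xml.transform API. The OneBoxResults content it renders (MODULE_RESULT and Field elements) is a hypothetical, simplified response, not the full Google OneBox format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch: rendering a hypothetical, simplified OneBox XML response
// with a template that uses match="/OneBoxResults" instead of name="apps".
class OneBoxRender {

    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        // match= makes the processor apply this template to the document element;
        // a name= template would run only if called with xsl:call-template.
        + "<xsl:template match=\"/OneBoxResults\">"
        + "<ul><xsl:for-each select=\"MODULE_RESULT/Field\">"
        + "<li><xsl:value-of select=\"@name\"/>: <xsl:value-of select=\".\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```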

Customizing the Appearance of Search Results

You can customize the default look and feel of the search result list for the default query application.

To customize the appearance of search results: 

  1. On the Global Settings tab, select Configure Search Results List under Out-of-Box Query Application.

    The Configure Search Results List page is displayed.

  2. Select Use Advanced Configuration, then make the desired customizations.

  3. Select attributes to appear in the XML description of result documents.

    The available attributes are local attributes, federated attributes, and internal attributes. Each attribute name appears only one time. That is, the name of a federated attribute with the same name as another attribute or with an explicit mapping to a local search attribute appears only once. Table 5-1 describes Oracle SES internal attributes.

  4. Provide an Extensible Stylesheet Language Transformations (XSLT) style sheet to operate on the selected attributes. The default XSLT is installed at

    ORACLE_HOME/search/webapp/defaults/advanced_config.xsl

    This XSLT transforms XML content into an HTML fragment to be displayed in the result list. If the XSLT is blank, then the untransformed result XML is displayed in the result list. You can use this output for debugging purposes while configuring the XSLT. Invalid XSLTs cannot be saved.

    The output of this transformation should be HTML. Enter the following in the XSLT:

    <xsl:output method="html" />
    
  5. Provide a cascading style sheet (CSS) to style the HTML generated in the XSLT. The default CSS is installed at

    ORACLE_HOME/search/webapp/defaults/advanced_config.css

    This CSS is used along with the included CSS files for the default query application. When there are conflicts between the advanced configuration CSS and the default CSS files, the advanced configuration definitions are used. Default XSLT and CSS style sheets are provided for Advanced Configuration.

Table 5-1 Oracle SES Internal Attributes

Name               Type      Description
eqcacheurl         String    The URL of the cached version of the document; that is, the value of the "Cached" link in the default result list.
eqcontentlength    Number    The size of the document in bytes.
eqdatasourcename   String    The (untranslated) name of the source where the document originated. This name is local to the instance that the document came from, not the instance that is receiving the document. If a document comes from a federated source, then the value of this attribute is the name of the source on the federated instance, not the name of the federated source on the local instance.
eqdatasourcetype   String    The (untranslated) type of source where the document originated. This type is local to the instance from which the document came; for example, Federated is not a valid value for this attribute.
eqdocid            Number    Document ID.
eqfedchain         String    The chain of instance names representing the path of a federated document, delimited by semicolons (;).
eqfedid            String    The federated source ID chain, delimited by underscores (_).
eqgroupbrowseurl   String    The URL to browse the infosource group; that is, the value of the "Source Group" link in the default result list.
eqlinksurl         String    The URL of the page containing a list of links to the document; that is, the value of the "Links" link in the default result list.
eqpathbrowseurl    String    The URL to browse the infosource path; that is, the value of the "Path" link in the default result list.
eqredirecturl      String    The redirect URL to the original document; that is, the value of the title link in the default result list.
eqsimilardoc       Boolean   true if the document is a similar document; otherwise, false.
eqsnippet          String    The excerpt or keyword in context of the document.
equserquery        String    The query string.


XML Result Schema

To apply the XSLT, each document is converted into an XML description at query-time with the following schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="result">
      <xsd:complexType>
         <xsd:all>
            <!-- Internal attributes -->
            <xsd:element name="eqdatasourcename" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqdatasourcetype" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqsnippet" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqredirecturl" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqcacheurl" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqlinksurl" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqsimilardoc" type="xsd:boolean" maxOccurs="1" />
            <xsd:element name="eqcontentlength" type="xsd:integer" maxOccurs="1" />
            <xsd:element name="equserquery" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqgroupbrowseurl" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqpathbrowseurl" type="xsd:string" maxOccurs="1" />
            <xsd:element name="eqdocid" type="xsd:integer" maxOccurs="1" />
            <xsd:element name="eqfedid" type="xsd:string" maxOccurs="1" />
            <!-- Built-in search attributes -->
            <xsd:element name="author" type="xsd:string" maxOccurs="1" />
            <xsd:element name="description" type="xsd:string" maxOccurs="1" />
            <xsd:element name="headline1" type="xsd:string" maxOccurs="1" />
            <xsd:element name="headline2" type="xsd:string" maxOccurs="1" />
            <xsd:element name="headline3" type="xsd:string" maxOccurs="1" />
            <xsd:element name="host" type="xsd:string" maxOccurs="1" />
            <xsd:element name="infosource" type="xsd:string" maxOccurs="1" />
            <xsd:element name="infosourcepath" type="xsd:string" maxOccurs="1" />
            <xsd:element name="keywords" type="xsd:string" maxOccurs="1" />
            <xsd:element name="language" type="xsd:string" maxOccurs="1" />
            <xsd:element name="lastmodifieddate" type="xsd:date" maxOccurs="1" />
            <xsd:element name="mimetype" type="xsd:string" maxOccurs="1" />
            <xsd:element name="referencetext" type="xsd:string" maxOccurs="1" />
            <xsd:element name="subject" type="xsd:string" maxOccurs="1" />
            <xsd:element name="title" type="xsd:string" maxOccurs="1" />
            <xsd:element name="url" type="xsd:string" maxOccurs="1" />
            <xsd:element name="urldepth" type="xsd:integer" maxOccurs="1" />
            <!-- Custom search attributes -->
                     .
                     .
                     .
         </xsd:all>
      </xsd:complexType>
   </xsd:element>
</xsd:schema>

XML has the following rules for element names:

  • Alphanumeric characters (including non-English characters) and ideograms are allowed

  • Limited punctuation is allowed: underscore, hyphen, and period

  • Names can begin only with a letter, an ideogram, or an underscore

Custom attribute names must conform to these rules for advanced result rendering. To enforce these rules, Oracle SES replaces every character that the rules do not permit with the empty string; that is, it removes the character. In addition, Oracle SES search attributes are case-insensitive, so all attribute names are converted to lowercase when used in XML format.
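A minimal Java sketch of this conversion follows. This guide does not say how Oracle SES handles a name whose first remaining character is a digit, hyphen, or period; the sketch assumes such leading characters are also dropped. Character.isLetterOrDigit is used so that non-English letters and ideograms are accepted.

```java
import java.util.Locale;

// Illustrative sketch: turning a custom search attribute name into an XML element name.
// Assumption: characters that cannot start a name are dropped from the front.
class AttrNames {

    // Allowed anywhere: letters (including non-English letters and ideograms),
    // digits, underscore, hyphen, and period.
    static boolean allowed(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    // Allowed as the first character: letters, ideograms, and underscore.
    static boolean allowedFirst(char c) {
        return Character.isLetter(c) || c == '_';
    }

    static String toElementName(String attributeName) {
        // Attribute names are case-insensitive, so they are lowercased first.
        String lower = attributeName.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (char c : lower.toCharArray()) {
            if (allowed(c)) {
                sb.append(c);            // disallowed characters are replaced by ""
            }
        }
        int start = 0;
        while (start < sb.length() && !allowedFirst(sb.charAt(start))) {
            start++;                     // assumption: strip invalid leading characters
        }
        return sb.substring(start);
    }
}
```

For example, toElementName("Product Name") returns productname and toElementName("Price ($)") returns price.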

For example, suppose the raw XML result data is as follows.

<result>
   <eqdatasourcetype>WEB</eqdatasourcetype>
   <title>Oracle Secure Enterprise Search</title>
   <url>
      http://www.oracle.com/technology/products/oses/index.html
   </url>
   <author>Anonymous</author>
   <description>
      Oracle Secure Enterprise Search 10g, a standalone product from Oracle, enables a secure, high quality, easy-to-use search across all enterprise information assets.
   </description>
</result>

The following XSLT extracts and formats the title, URL, and author for documents coming from Web sources:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" />
 <xsl:template match="result[eqdatasourcetype='WEB']">
   <span class="title">
     <xsl:text> &quot;</xsl:text><xsl:value-of select="title" />
     <xsl:text>&quot;</xsl:text>
   </span>
   <span class="author">
     <xsl:text> By </xsl:text><xsl:value-of select="author" />
   </span>
   <br/>
   <span class="url">
     <!-- url already includes the scheme; normalize-space strips the surrounding white space -->
     <a href="{normalize-space(url)}"><xsl:value-of select="normalize-space(url)" /></a>
   </span>
 </xsl:template>
</xsl:stylesheet>

A CSS style sheet for this output may be:

.title { font-weight: bold; }
.url { font-style: italic; }

These style sheets produce a final result of:

"Oracle Secure Enterprise Search" By Anonymous

http://www.oracle.com/technology/products/oses/index.html
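Before pasting a style sheet into the Administration GUI, you can try it offline. The following Java sketch applies a copy of the example style sheet to a shortened copy of the example result XML with the standard javax.xml.transform API; the class name and this offline workflow are illustrative, not part of Oracle SES. The embedded style sheet uses normalize-space(url) as the link target, because the url element already contains the scheme plus surrounding white space.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative offline check of a result-list style sheet against sample result XML.
class ResultPreview {

    static final String RESULT_XML =
          "<result>"
        + "<eqdatasourcetype>WEB</eqdatasourcetype>"
        + "<title>Oracle Secure Enterprise Search</title>"
        + "<url>\n      http://www.oracle.com/technology/products/oses/index.html\n   </url>"
        + "<author>Anonymous</author>"
        + "</result>";

    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"result[eqdatasourcetype='WEB']\">"
        + "<span class=\"title\"><xsl:text> &quot;</xsl:text><xsl:value-of select=\"title\"/>"
        + "<xsl:text>&quot;</xsl:text></span>"
        + "<span class=\"author\"><xsl:text> By </xsl:text><xsl:value-of select=\"author\"/></span>"
        + "<br/>"
        + "<span class=\"url\"><a href=\"{normalize-space(url)}\">"
        + "<xsl:value-of select=\"normalize-space(url)\"/></a></span>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the style sheet to one result document and returns the HTML fragment.
    static String render(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```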

Configuring Clustering in Search Results

Real-time clustering dynamically organizes search results into groups to provide end users with different views on the top results. Documents clustered within one group, called a cluster node, share common topics or property values. A cluster node with a large document set can be categorized into child cluster nodes, and a hierarchy is built. Users can navigate directly to a specific cluster node. Effective real-time clustering balances clustering quality and clustering time.

Note:

The 10.1.8.2 query application is certified with Internet Explorer versions 6 and 7 and Firefox versions 1.5 and 2.x. Existing 10.1.8.1 functionality is certified on all Oracle SES-supported browsers through the classic user interface: http://host:port/search/query/search-classic.jsp

Search attributes (String, Number or Date) are used to generate a cluster tree. The attributes can be local search attributes, federated attributes that are not explicitly mapped, and Oracle SES internal attributes.

Oracle SES supports two types of cluster trees: topic and metadata. Each tree can be enabled or disabled individually. Parameters that apply to all cluster trees for the default query application can be configured on the Global Settings - Clustering Configuration page.

For customized Oracle SES applications, configure clustering with the Query Web Services API.

Topic Clustering

Topic clustering uses the most significant phrases (and optionally sentences) from documents to create relevant cluster nodes and hierarchies. The significant phrases are extracted both at query-time and by the Secure Enterprise Search Document Summarizer, which is a document service included by default for search result clustering.

Configure crawl-time extraction of top phrases with document services parameters on the Global Settings - Document Services page. Create a topic clustering tree on the Global Settings - Clustering Configuration - Create Topic Clustering Tree page.

Topic clustering can be configured with one or more search attributes of String type and with the following Oracle SES internal attributes:

  • eqsnippet: The excerpt of the document with keywords in context.

  • eqtopphrases: The most frequent phrases within one document among the phrases with the same number of words.

  • eqtopsentences: The significant sentences within one document based on the significant phrases.

By default, the attributes keywords, title, eqsnippet and eqtopphrases are configured for topic clustering. Keywords, eqtopphrases, and eqtopsentences contain pre-extracted words and phrases: no additional phrase extraction is performed on these attributes.

Parameters that control query-time word and phrase extraction for the default query application can be configured on the Global Settings - Clustering Configuration page. These include the following:

Single Word Extraction 

  • Minimum occurrence: The minimum frequency for the word to be extracted.

  • Maximum number of words to extract: The maximum number of words to be extracted.

Phrase Extraction 

  • Minimum occurrence: Minimum frequency for a phrase to be extracted.

  • Maximum number of phrases to extract: Maximum number of phrases to be extracted.

  • Maximum phrase length: Maximum number of words for each phrase to be extracted.

Topic clustering uses a phrase stopword list and a blacklist to prevent words or phrases from becoming topic cluster result nodes.

The phrase stopword list is also used by the Document Summarizer document service. The stopword file is a language-specific file containing words that should not be considered during phrase extraction. The blacklist file is a language-specific file containing words and phrases that should not appear as cluster node names.

For example, if all indexed documents include the phrase "Oracle Corporation" and it does not make sense to have a cluster node for "oracle corporation", then this phrase could be added to the blacklist.
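To make the parameters concrete, here is a toy Java sketch of phrase extraction with a phrase stopword list and a blacklist. It is not the Oracle SES algorithm: it counts word sequences of two up to Maximum phrase length words across the result texts, assumes that a phrase may not begin or end with a stopword, keeps phrases that reach Minimum occurrence, removes blacklisted phrases, and returns at most Maximum number of phrases, most frequent first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy illustration of phrase extraction; not the Oracle SES implementation.
class PhraseExtractor {

    static List<String> extract(List<String> texts, Set<String> stopwords, Set<String> blacklist,
                                int minOccurrence, int maxPhrases, int maxPhraseLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            List<String> words = new ArrayList<>();
            for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            for (int len = 2; len <= maxPhraseLength; len++) {
                for (int i = 0; i + len <= words.size(); i++) {
                    // Assumption: a phrase may not start or end with a stopword.
                    if (stopwords.contains(words.get(i))
                            || stopwords.contains(words.get(i + len - 1))) {
                        continue;
                    }
                    String phrase = String.join(" ", words.subList(i, i + len));
                    counts.merge(phrase, 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>();
        counts.entrySet().stream()
              .filter(e -> e.getValue() >= minOccurrence)          // Minimum occurrence
              .filter(e -> !blacklist.contains(e.getKey()))        // never a cluster node name
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                      .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
              .limit(maxPhrases)                                   // Maximum number of phrases
              .forEach(e -> result.add(e.getKey()));
        return result;
    }
}
```

With the three texts used in the test below, "oracle corporation" reaches the minimum occurrence of 2 but is removed by the blacklist, leaving only "search engine".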

Note:

A separate stopword list contains index stop words. This is an Oracle SES internal file for words that should not be indexed. This list is not related to phrase extraction.

Both the stopword and blacklist files are in plain text format, with each line containing one word or phrase. The phrase stopwords file name should be "phrasestopwords" followed by a period and the two-letter language code (for example, phrasestopwords.en for English). Similarly, the blacklist file name should be "blacklist" followed by a period and the two-letter language code.

By default, these files are located in the directory

ORACLE_HOME/search/lib/plugins/doc/extractor/phrasestopwords

Sample phrase stopword files for other languages are in

ORACLE_HOME/search/lib/plugins/doc/extractor/samples/phrasestopwords

If there are documents for these languages, then copy these files to

ORACLE_HOME/search/lib/plugins/doc/extractor/phrasestopwords

The order of the words or phrases in a file does not affect phrase extraction. For example, phrasestopwords.en may contain the following:

a
an
me
:
z

The blacklist.en file may contain the following:

site maps
oracle corporation
:
term of use
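A short Java sketch of reading these files follows. The file naming (prefix, period, two-letter language code) and the one-entry-per-line layout come from this section; UTF-8 encoding, trimming, skipping blank lines, and lowercasing entries are assumptions made for the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative reader for phrase stopword and blacklist files.
class WordLists {

    // Reads <dir>/<prefix>.<lang>, for example phrasestopwords.en or blacklist.en.
    static Set<String> load(Path dir, String prefix, String lang) throws IOException {
        Path file = dir.resolve(prefix + "." + lang);
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // One word or phrase per line; the order of the lines does not matter.
    static Set<String> parse(List<String> lines) {
        Set<String> entries = new LinkedHashSet<>();
        for (String line : lines) {
            String entry = line.trim().toLowerCase(Locale.ROOT);
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }
}
```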

Notes:

  • The stopword and blacklist files are applicable to both the default query application and the Web services API. The other parameters are applicable to the default query application only.

  • During backup and recovery operations, if you recover an instance in a new location, then the stopword directory must be updated to reflect the new location, because it is an absolute path.

Topic clustering currently works best in English. Both the document summarizer in the crawler and the clustering module in the query application use a stemmer to reduce words to their stems and to merge words and phrases that have the same stems. The open source stemmer library Snowball is used for this purpose. The version included with Oracle SES supports the following languages:

  • Dutch

  • English

  • Finnish

  • French

  • German

  • Norwegian

  • Portuguese

  • Russian

  • Spanish

  • Swedish

The Egothor stemmer is included for Polish language support. The stemmer configuration is shared between the default query application and the Web Services API.

Note:

Topic clustering is not supported for Chinese and Japanese.

Metadata Clustering

Metadata clustering is performed on a single attribute of String, Date, or Number type. If there are multiple values for the same attribute in one document, then only the first value is used for clustering. By default, the entire value is passed in as is for clustering.

However, for String attributes only, a delimiter can be specified for tokenizing the attribute value. If no tokenization delimiter is entered (or if only white space is entered), then the delimiter defaults to white space. When tokenized, the single attribute value is divided into multiple segments and each segment can correspond to a hierarchy based on another delimiter called the hierarchy delimiter. White space is the default hierarchy delimiter; however, if both tokenization and hierarchy are selected, then the delimiters must be different. Parsing is done first by tokenization, and then by interpreting the hierarchy from the resulting tokens.

Create a metadata clustering tree on the Global Settings - Clustering Configuration - Create Metadata Clustering Tree page.

As an example where both tokenization and hierarchy are meaningful, a category attribute might consist of a comma-delimited list of fields, each representing a slash-separated hierarchical categorization (as in "java/j2ee/jdbc, oracle/search/connector").
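The two-step parsing (tokenization first, then hierarchy) can be sketched in Java as follows. Treating delimiters as literal strings, using one or more white-space characters when a delimiter is blank, and trimming each level are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch: parse a String attribute value into hierarchy paths.
class MetadataPaths {

    static List<List<String>> parse(String value, String tokenDelimiter, String hierarchyDelimiter) {
        String tok = splitter(tokenDelimiter);
        String hier = splitter(hierarchyDelimiter);
        if (tok.equals(hier)) {
            throw new IllegalArgumentException("tokenization and hierarchy delimiters must differ");
        }
        List<List<String>> paths = new ArrayList<>();
        for (String token : value.split(tok)) {              // step 1: tokenization
            List<String> path = new ArrayList<>();
            for (String level : token.trim().split(hier)) {   // step 2: hierarchy
                if (!level.isBlank()) {
                    path.add(level.trim());
                }
            }
            if (!path.isEmpty()) {
                paths.add(path);
            }
        }
        return paths;
    }

    // A blank delimiter defaults to white space; otherwise match it literally.
    private static String splitter(String delimiter) {
        return (delimiter == null || delimiter.isBlank()) ? "\\s+" : Pattern.quote(delimiter);
    }
}
```

With the category value above, a tokenization delimiter of a comma, and a hierarchy delimiter of a slash, parse returns the two paths java, j2ee, jdbc and oracle, search, connector.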

The tokenization and hierarchy configuration is not applicable to Date or Number attributes. Metadata trees of Date type attributes use a fixed display format with year on the first level, month on the second, and day on the third. The year is sorted in descending order, and the month and day are sorted in ascending order.
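A small Java sketch of this fixed layout, using TreeMap ordering to get descending years and ascending months and days. The leaf values are per-day document counts; the class is illustrative only.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch of the fixed Date tree: year -> month -> day -> document count.
class DateTree {

    static TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, Integer>>> build(List<LocalDate> dates) {
        // Years sort in descending order; months and days use natural (ascending) order.
        TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, Integer>>> years =
                new TreeMap<>(Comparator.reverseOrder());
        for (LocalDate d : dates) {
            years.computeIfAbsent(d.getYear(), y -> new TreeMap<>())
                 .computeIfAbsent(d.getMonthValue(), m -> new TreeMap<>())
                 .merge(d.getDayOfMonth(), 1, Integer::sum);
        }
        return years;
    }
}
```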

Metadata trees for Number type attributes are range-based, with a fixed number of ranges (at most five) and a fixed tree depth of three, counted from the root node. For a range to be shown, it must satisfy the Minimum Documents Per Node parameter, which is set on the Query-time Clustering Configuration page. Empty ranges are not shown.
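A Java sketch of one level of such a tree. This guide fixes the number of ranges but does not say how range boundaries are chosen, so equal-width ranges between the smallest and largest value are an assumption here.

```java
// Illustrative sketch of one level of a Number clustering tree.
// Assumption: five equal-width ranges between the smallest and largest value.
class NumberRanges {

    static final int RANGES = 5;

    // Document count for each of the five ranges.
    static int[] counts(double[] values) {
        int[] counts = new int[RANGES];
        if (values.length == 0) {
            return counts;
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / RANGES;
        for (double v : values) {
            int i = (width == 0) ? 0 : (int) ((v - min) / width);
            counts[Math.min(i, RANGES - 1)]++;   // the maximum value belongs to the last range
        }
        return counts;
    }

    // Empty ranges and ranges below Minimum Documents Per Node are not shown.
    static boolean shown(int count, int minDocsPerNode) {
        return count > 0 && count >= minDocsPerNode;
    }
}
```

For the values 1, 2, 3, 50, 99, and 100, the five ranges hold 3, 0, 1, 0, and 2 documents; with Minimum Documents Per Node set to 2, only the first and last ranges are shown.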

Using Clustering

Cluster nodes filter the top results but do not change the order of the documents. When users select a cluster node, the result view is limited to the documents in that cluster node. All operations, such as sorting or paging through results, are limited to the cluster node.

The real-time clustering sidebar is hidden by default. Users can display the sidebar by clicking an arrow icon on the left-hand side of the search results page. Within the sidebar, result clusters are shown. The cluster nodes are sorted by the number of documents in each node.

Users can expand or collapse the nodes within a cluster tree without affecting the rest of the interface. If users click a cluster node, then the search results are filtered. If a cluster tree contains no child nodes, it is disabled.

Configuring Clustering in the Web Services API

Methods in the Query Web Services API provide clustering for customized Oracle SES applications. The main interface is the method doOracleOrganizedSearch, which accepts query information, grouping and sorting options, and clustering requests, and returns the results that the request asks for. A second method, doOracleFetchSearch, is used when the set of documents is known.

The input for doOracleOrganizedSearch includes the following information:

  • Query

  • TopN (the result set size used for grouping, sorting, and clustering)

  • Duplicate controls (removed, marked)

  • Data group list

  • Query and document language

  • Grouping and sorting options

  • Cluster tree configuration info (tree depth, children for each node, threshold, tree format type: JSON, XML; topic extraction configuration, metadata clustering configuration.)

  • Other query parameters (including Number startIndex, Boolean returnCount, String filterConnect, Filter[] filters)

The output is an object that contains the search result, grouping information, and the cluster tree string list. The search result list is in the order specified by the grouping and sorting option. If this is not specified, then it is sorted by the relevance score. The returned cluster tree string represents the clustering tree information: tree structure, node names, and document IDs.

Java Classes for Clustering

There are three classes to support the grouping and sorting options: GroupAttribute, SortAttribute, and GroupResult.

There are two classes to support the clustering request: ClusterConfig, which controls the clustering request, and ClusterTree, which contains the tree output.

The class OracleResultContainer is defined to wrap the search hit result, grouping result, and clustering result.

doOracleFetchSearch is used for fetching a selected list of documents identified by their document ID, federated source ID, or both.

If GroupAttribute is specified, then it is automatically added to the front of the sort attributes. For example, if the query is grouped by host name and sorted by title, then the search hits are sorted by (hostname, title).

The sorting, grouping, or clustering option can be applied to this result. Sorting is based on the top-N results, while grouping and clustering are based on the result window determined by (startIndex, docsRequested).
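The GroupAttribute rule can be illustrated with a chained Java Comparator: the group attribute is compared first, and the sort attribute only breaks ties. Hit is a hypothetical stand-in for a search hit, not one of the Oracle SES classes named above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: grouping by host and sorting by title orders hits by (host, title).
class GroupSort {

    // Hypothetical search hit with only the two attributes this example needs.
    static final class Hit {
        final String host;
        final String title;

        Hit(String host, String title) {
            this.host = host;
            this.title = title;
        }

        @Override
        public String toString() {
            return host + " | " + title;
        }
    }

    static List<Hit> order(List<Hit> hits) {
        Comparator<Hit> byGroup = Comparator.comparing(h -> h.host);   // group attribute first
        Comparator<Hit> bySort = Comparator.comparing(h -> h.title);   // then the sort attribute
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byGroup.thenComparing(bySort));
        return sorted;
    }
}
```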

Cluster Result XML Schema

The main XML element, node, contains the following attributes:

  • id: ID for the node. The value represents the full path with the parent node paths.

  • name: The name of the node, which is the topic of the node.

  • level: The level of the cluster node, starting from 1 for the top node.

  • size: Number of documents under (directly and indirectly) this cluster node.

  • leaf: This is "1" if the cluster node only contains documents and no child cluster nodes. Otherwise, this is "0".

  • keywords: All keywords and phrases within the cluster node.

The node element contains the document IDs as its text content if the node is a leaf node. Each document ID has the format docID.SES_InstanceID. If the document is from the local instance, then the .SES_InstanceID part is omitted.

<cluster>
   <nodeset>
      <node id="1" name="all" level="1" size="100" leaf="0" keywords="all"/>
      <node id="1.4" name="java" level="2" size="99" leaf="0" keywords="java"/>
      <node id="1.4.1" name="data warehousing" level="3" size="38" leaf="0"
         keywords="technologies bi,data warehousing,linux .net office 
            php security service"/>
      <node id="1.4.1.1" name="tutorials blogs" level="4" size="12" leaf="1"
         keywords="tutorials blogs">
         2773,8031,109,8033,806,26940,817,8024,8030,2862,8032,8028
      </node>
      <node id="1.4.1.2" name="stored procedure" level="4" size="4" leaf="1"
         keywords="stored procedure">
         4239,4243,2784,4335
      </node>
      <node id="1.4.1.3" name="miscellaneous" level="4"  size="22" leaf="1">
         4017,2836,8029,2767,1502,113814,11731,1138,392,2819,2763,1421,
         221,705,7739,2838,2749,2351,2802,1158,15751,15747
      </node>
   </nodeset>
</cluster>
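A Java sketch for consuming this XML with the standard DOM parser: it collects the document IDs of each leaf node, derives a node's parent from its dotted id, and splits a document ID into its docID and optional SES_InstanceID parts. The class and method names are illustrative, and splitting at the first period assumes that docID itself contains no period.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative reader for the cluster result XML.
class ClusterXml {

    // Maps each leaf node id (for example "1.4.1.2") to its document ID strings.
    static Map<String, List<String>> leafDocs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("node");
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                if (!"1".equals(node.getAttribute("leaf"))) {
                    continue;                                   // only leaf nodes list documents
                }
                List<String> ids = new ArrayList<>();
                for (String id : node.getTextContent().split(",")) {
                    if (!id.isBlank()) {
                        ids.add(id.trim());                     // IDs may wrap across lines
                    }
                }
                result.put(node.getAttribute("id"), ids);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse cluster XML", e);
        }
    }

    // The parent of "1.4.1.2" is "1.4.1"; the top node "1" has no parent (null).
    static String parentId(String nodeId) {
        int dot = nodeId.lastIndexOf('.');
        return dot < 0 ? null : nodeId.substring(0, dot);
    }

    // "2773" -> {"2773", null} (local instance); "2773.inst2" -> {"2773", "inst2"}.
    static String[] splitDocId(String docId) {
        int dot = docId.indexOf('.');
        return dot < 0 ? new String[] {docId, null}
                       : new String[] {docId.substring(0, dot), docId.substring(dot + 1)};
    }
}
```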

Cluster Result JSON Format

To integrate with AJAX applications, the cluster results can be returned in JSON format. The JSON format directly reflects the tree structure of the cluster results. Each node has either a children array, which lists the nodes that are direct children of that node, or, if the node is a leaf node, a docs array listing the documents in that node. Nodes in a children array can have children of their own, and so on.

Here is sample JSON output.

{"nodeset":
 
  {"id":"1",
  "name":"all",
  "level":1,
  "size":100,
  "leaf":false,
  "keywords":"all",
  "children":
     [{"id":"1.4",
     "name":"java",
     "level":2,
     "size":99,
     "leaf":false,
     "keywords":"java",
     "children":
         [{"id":"1.4.1",
         "name":"data warehousing",
         "level":3,
         "size":38,
         "leaf":false,
         "keywords":"technologies bi,data warehousing,linux .net office php security service",
         "children":
            [{"id":"1.4.1.1",
            "name":"tutorials blogs",
            "level":4,
            "size":12,
            "leaf":true,
            "keywords":"tutorials blogs", "docs":["2773","8031","10","803","806","26940","817","8024","8030","2862","803","8028"] },
            {"id":"1.4.1.2",
            "name":"stored procedure",
            "level":4,
            "size":4,
            "leaf":true,
            "keywords":"stored procedure",
            "docs":["4239","4243","2784","4335"]}]
         }]
     },
     {"id":"1.5",
     "name":"miscellaneous",
     "level":2,
     "size":1,
     "leaf":true,
     "docs":["265915"]
     }]
   }
}
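A Java sketch that serializes an in-memory cluster tree into the shape shown above: inner nodes get a children array and leaf nodes a docs array. ClusterNode is hypothetical. Its size is passed in rather than computed, because this guide does not state that a node's size always equals the sum of its children's sizes, and string escaping is limited to quotes and backslashes.

```java
import java.util.List;

// Illustrative serializer for the cluster JSON format.
class ClusterJson {

    // Hypothetical in-memory cluster node (not an Oracle SES class).
    static final class ClusterNode {
        final String id;
        final String name;
        final int level;
        final int size;            // documents directly and indirectly under the node
        final String keywords;     // may be null
        final List<ClusterNode> children;
        final List<String> docs;

        ClusterNode(String id, String name, int level, int size, String keywords,
                    List<ClusterNode> children, List<String> docs) {
            this.id = id;
            this.name = name;
            this.level = level;
            this.size = size;
            this.keywords = keywords;
            this.children = children;
            this.docs = docs;
        }
    }

    static String wrap(ClusterNode root) {
        return "{\"nodeset\":" + toJson(root) + "}";
    }

    static String toJson(ClusterNode n) {
        boolean leaf = n.children.isEmpty();
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"id\":").append(quote(n.id))
          .append(",\"name\":").append(quote(n.name))
          .append(",\"level\":").append(n.level)
          .append(",\"size\":").append(n.size)
          .append(",\"leaf\":").append(leaf);
        if (n.keywords != null) {
            sb.append(",\"keywords\":").append(quote(n.keywords));
        }
        if (leaf) {
            sb.append(",\"docs\":[");         // leaf node: its document IDs
            for (int i = 0; i < n.docs.size(); i++) {
                sb.append(i > 0 ? "," : "").append(quote(n.docs.get(i)));
            }
        } else {
            sb.append(",\"children\":[");     // inner node: nested child nodes
            for (int i = 0; i < n.children.size(); i++) {
                sb.append(i > 0 ? "," : "").append(toJson(n.children.get(i)));
            }
        }
        return sb.append("]}").toString();
    }

    // Minimal escaping: backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```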

Configuring Top-N Documents and Group/Sort Attributes

Modify the search.properties file to configure the number of documents to retrieve for top-N processing and clustering and also to control the attributes available for grouping and sorting. These settings affect the default query application. The search.properties file is located in the ORACLE_HOME/search/webapp/config directory.

When modifying search.properties, use only ASCII characters or Unicode escape characters. Non-ASCII characters are ignored.

For example, the Unicode escape characters for the Hiragana for "aiueo" are: \u3042\u3044\u3046\u3048\u304A. In search.properties, you would enter these characters for the group_tab_order property to specify the order of groups aiueo, ghi, def, and abc:

ses.qapp.group_tab_order=\u3042\u3044\u3046\u3048\u304A,ghi,def,abc
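If a conversion tool is not at hand, the escapes can also be produced with a few lines of Java. This sketch (the class and method names are hypothetical) escapes every non-ASCII UTF-16 code unit as \uXXXX with uppercase hex digits, matching the form shown above. It then reads the result back with java.util.Properties, which decodes these escapes when it loads a properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class PropertiesEscaper {

    // Replaces each character above 0x7F with a backslash-u escape of four uppercase hex digits.
    // Characters outside the Basic Multilingual Plane are stored as two UTF-16 code units
    // (a surrogate pair); each unit is escaped separately, which Properties decodes correctly.
    static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Round-trip check: loads "key=escapedValue" the same way a properties file is loaded.
    static String loadBack(String key, String escapedValue) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(key + "=" + escapedValue));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // Hiragana "aiueo" followed by the other group names from the example.
        String groups = "\u3042\u3044\u3046\u3048\u304A,ghi,def,abc";
        String escaped = escapeNonAscii(groups);
        System.out.println("ses.qapp.group_tab_order=" + escaped);
        System.out.println(loadBack("ses.qapp.group_tab_order", escaped).equals(groups));
    }
}
```

The sketch leaves ASCII characters untouched, including those that have a special meaning in properties files, such as the backslash.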

Unicode conversion tools are available on the World Wide Web.

Top-N Documents

The default top-N documents setting specifies the number of documents retrieved by default as part of the AJAX call for result clustering, grouping, and sorting:

ses.qapp.default_topn_docs=100

When paging through a very large result set, for example 500 documents, the user may view a page of results beyond the default top-N value. Suppose top-N is set to the default of 100, and the user wants to view results 150-160. To provide result clustering, sorting, and grouping, the browser must request 160 results. If the user views results 490-500, then the browser requests 500 results through the AJAX call, which can reduce performance.

The maximum top-N documents setting represents a threshold above which the query application displays only a single page of results.

This mode does not provide any sorting, grouping, or result clustering. However, it lets a user view the entire result set without costly repeated retrievals of top-N results.

Suppose max_topn_docs is set to 200:

ses.qapp.max_topn_docs=200

If an end user is viewing results 30-40, then the browser retrieves the default of 100 results. If the user views results 170-180, then the browser requests 180 documents. If the user views results above 200, then the query application displays only the current page of results.
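The retrieval rules described above can be summarized in a small function. This Java sketch is a hypothetical model, not code from the query application, and the class and method names are invented. It assumes that a page ending exactly at the max_topn_docs value is still retrieved in top-N mode, because the text states only that pages above the threshold switch to single-page mode.

```java
import java.util.OptionalInt;

final class TopNPolicy {

    // pageEnd is the number of the last result on the requested page (160 for results 150-160).
    // Returns how many results the browser would request through the AJAX call, or an empty
    // OptionalInt when the page lies beyond maxTopN and only the current page is displayed
    // (no clustering, grouping, or sorting).
    static OptionalInt docsToRequest(int pageEnd, int defaultTopN, int maxTopN) {
        if (pageEnd > maxTopN) {
            return OptionalInt.empty();                    // single-page mode
        }
        return OptionalInt.of(Math.max(defaultTopN, pageEnd));
    }

    public static void main(String[] args) {
        int defaultTopN = 100;                             // ses.qapp.default_topn_docs
        int maxTopN = 200;                                 // ses.qapp.max_topn_docs
        for (int pageEnd : new int[] {40, 180, 210}) {
            OptionalInt n = docsToRequest(pageEnd, defaultTopN, maxTopN);
            System.out.println("results ending at " + pageEnd + ": "
                    + (n.isPresent() ? "request " + n.getAsInt() + " docs" : "single page only"));
        }
    }
}
```

With a default of 100 and a maximum of 200, pages ending at 40, 180, and 210 lead to requests for 100 documents, 180 documents, and single-page mode, respectively.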

Group By and Sort By Lists

The set of attributes available in the Group By and Sort By drop-down lists on the query page can also be configured in the search.properties file. The attributes available for grouping are configured by setting the ses.qapp.groupable_attrs property value, and the attributes available for sorting are configured by setting the ses.qapp.sortable_attrs property value.

The property value for either grouping or sorting is an ordered, comma-delimited list of alternating pairs: each search attribute name is followed by its display name.

Table 5-2 lists the default grouping attributes:

Table 5-2 Grouping Attributes

Description          Attribute Name     Display
No grouping          ses_none           (none)
Source group         ses_sourceGroup    Source
Last modified date   lastModifiedDate   Date
Author               author             Author
File format          mimetype           File Format

The property value for this default set for grouping is the following:

ses.qapp.groupable_attrs=ses_none,-,ses_sourceGroup,-,lastModifiedDate,-,author,-,mimetype,-

Table 5-3 lists the default sorting attributes:

Table 5-3 Sorting Attributes

Description          Attribute Name     Display
Relevance            ses_score          Relevance
Last modified date   lastModifiedDate   Date
Author               author             Author
File format          mimetype           File Format
Document title       title              Title
URL                  infosource path    Path
Language             language           Language


The property value for this default set for sorting is the following:

ses.qapp.sortable_attrs=ses_score,-,lastModifiedDate,-,author,-,mimetype,-,title,-,infosource path,-,language,-

To use the translated name of a search attribute for display instead of providing a fixed display name, insert a dash (-) for the display name. For example, if the search attribute "Test1" has translated names configured on the Global Settings page in the Oracle SES Administration GUI, then the following uses the translated names for display:

ses.qapp.sortable_attrs=ses_score,-,Test1,-,lastModifiedDate,-, ...
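To make the alternating name/display structure concrete, the following Java sketch parses such a property value into an ordered map. It is a hypothetical helper that only illustrates the format and is not the query application's own parser. A display value of "-" is kept as-is and means "use the translated name".

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeListParser {

    // Marker meaning "use the translated name configured in the Administration GUI".
    static final String USE_TRANSLATED_NAME = "-";

    // Parses an ordered "name,display,name,display,..." value into an insertion-ordered
    // map from search attribute name to display name.
    static Map<String, String> parse(String propertyValue) {
        String[] parts = propertyValue.split(",");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/display pairs: " + propertyValue);
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i += 2) {
            // trim() removes padding around each entry but keeps inner spaces ("infosource path").
            attrs.put(parts[i].trim(), parts[i + 1].trim());
        }
        return attrs;
    }

    public static void main(String[] args) {
        String sortable = "ses_score,-,lastModifiedDate,-,author,-,"
                + "mimetype,-,title,-,infosource path,-,language,-";
        parse(sortable).forEach((name, display) -> System.out.println(name + " -> "
                + (USE_TRANSLATED_NAME.equals(display) ? "(translated name)" : display)));
    }
}
```

The insertion-ordered map preserves the order of the list, which is the order in which the attributes appear in the drop-down list.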

Customizing the Relevancy of Search Results

You can customize the default Oracle SES ranking to create a more relevant search result list for your enterprise. Ranking is based on default and custom attributes. Default attributes include title, keywords, description, and others. The weight assigned to each attribute indicates its importance for document relevancy. For example, Oracle SES gives more weight to titles than to keywords.

To customize the relevancy of search results, you can use the Query Web Services API or ranking.xml to tune the weights of default attributes, or you can add custom attributes and set weights for those attributes. This topic discusses customizing relevancy in the Query Web Services API.

Customizing Relevancy in the Query Web Services API

The following is the signature of the method for advanced search:

public OracleSearchResult doOracleAdvancedSearch (
        String query,
        Integer startIndex,
        Integer docsRequested,
        Boolean dupRemoved,
        Boolean dupMarked,
        DataGroup groups[],
        String queryLang,
        String docLang,
        Boolean returnCount,
        String filterConnector,
        Filter filters[],
        Integer[] fetchAttributes,
        String searchControls)  throws Exception

The searchControls parameter accepts an XML string that can include the filter and ranking elements:

<searchControls>
     <filter>  ... </filter>
     <ranking> ... </ranking>
</searchControls>

Filter Element

Filters for attribute search are passed in the filter element. All AND and OR conditions on the attributes are expressed in the XML as nested operator elements. For example:

<filter>
   <operator type="and">
      <operator type="or">
         <attributefilter name="xxx" type="string" operation="equals" value="ttt"/>
         <attributefilter name="yyy" type="number" operation="greaterthan" value="22"/>
         ...
      </operator>
      ...
      <attributefilter name="aaa" type="number" operation="equals" value="22"/>
      ...
   </operator>
</filter>

If the searchControls parameter is null, then the filters and filterConnector parameters are used to build the advanced search; otherwise, they are ignored.
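Writing the nested XML by hand is error-prone. One option is to assemble it with small helper methods, as in the following Java sketch. The helper names are hypothetical and are not part of the Oracle SES Web Services API. The sketch reproduces the condition (xxx = ttt OR yyy > 22) AND aaa = 22 from the example above without indentation, escapes attribute values, and wraps the result in a searchControls element. The resulting string is what you would pass as the searchControls argument of doOracleAdvancedSearch.

```java
final class SearchControlsBuilder {

    // Escapes the characters that cannot appear verbatim inside a double-quoted XML attribute.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // One attributefilter leaf condition.
    static String attributeFilter(String name, String type, String operation, String value) {
        return "<attributefilter name=\"" + escapeAttr(name) + "\" type=\"" + escapeAttr(type)
                + "\" operation=\"" + escapeAttr(operation) + "\" value=\"" + escapeAttr(value) + "\"/>";
    }

    // An operator element ("and" or "or") that wraps nested conditions or operators.
    static String operator(String type, String... children) {
        return "<operator type=\"" + escapeAttr(type) + "\">" + String.join("", children) + "</operator>";
    }

    // Wraps an optional filter condition and an optional ranking element in searchControls.
    static String searchControls(String condition, String rankingXml) {
        StringBuilder sb = new StringBuilder("<searchControls>");
        if (condition != null) {
            sb.append("<filter>").append(condition).append("</filter>");
        }
        if (rankingXml != null) {
            sb.append(rankingXml);
        }
        return sb.append("</searchControls>").toString();
    }

    public static void main(String[] args) {
        // (xxx = "ttt" OR yyy > 22) AND aaa = 22, mirroring the filter example above.
        String condition = operator("and",
                operator("or",
                        attributeFilter("xxx", "string", "equals", "ttt"),
                        attributeFilter("yyy", "number", "greaterthan", "22")),
                attributeFilter("aaa", "number", "equals", "22"));
        System.out.println(searchControls(condition, null));
    }
}
```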

Ranking Element

The ranking XML string is expressed as the ranking element in searchControls. The following is an example of a ranking element:

<ranking>
   <global-settings>
      <enable-all-default-factor>TRUE</enable-all-default-factor>
   </global-settings>
   <default-factor>
      <!-- default ranking factor -->
      ...
   </default-factor>
   <default-factor>
      <!-- default ranking factor -->
      ...
   </default-factor>
   <custom-factor>
      <!-- custom ranking factor -->
      ...
   </custom-factor>
   <custom-factor>
      <!-- custom ranking factor -->
      ...
   </custom-factor>
</ranking>

The following rules apply to the construction of ranking XML string:

  • The whole ranking XML can be null, in which case default ranking is used.

  • The ranking XML contains the elements default-factor and custom-factor. Both can be null or absent at the same time.

  • When default-factor is null or absent and custom-factor is not null, default ranking is used, adjusted by the custom-factor elements.

  • When custom-factor is null or absent, it does not have any impact on the ranking.

  • The ranking scheme applies only to doOracleAdvancedSearch calls that pass a non-empty query parameter.

Global-Settings Element

The global-settings element contains parameter settings that apply across ranking factors. It has the following sub-element:

  • enable-all-default-factor

    The enable-all-default-factor element accepts two values: true or false. (When this element is absent, true is taken as the default value.)

    When enable-all-default-factor is true, all default attributes are included in ranking queries, unless some default attributes are explicitly excluded in default-factor elements.

    When enable-all-default-factor is false, all default attributes are excluded in ranking queries, unless some default attributes are explicitly included in default-factor elements.
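The two modes can be expressed as a small selection function. In this Java sketch (hypothetical names, not Oracle SES code), an attribute counts as explicitly excluded when its default-factor weight is none and as explicitly included for any other weight, including an empty one. That reading of "excluded" and "included" is an assumption based on the weight values described under Default-Factor Element.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class DefaultFactorSelection {

    // allDefaults:     every default ranking attribute (for example, the names in Table 5-4).
    // explicitFactors: default-factor elements from the ranking XML, name -> weight.
    // Returns the default attributes that take part in the ranking query, in allDefaults order.
    static Set<String> includedDefaults(boolean enableAllDefaultFactor,
                                        List<String> allDefaults,
                                        Map<String, String> explicitFactors) {
        // Default factor names are case-insensitive, so compare them in lowercase.
        Map<String, String> weightByName = new HashMap<>();
        explicitFactors.forEach((name, weight) -> weightByName.put(
                name.toLowerCase(Locale.ROOT), weight.trim().toLowerCase(Locale.ROOT)));

        Set<String> included = new LinkedHashSet<>();
        for (String name : allDefaults) {
            String weight = weightByName.get(name.toLowerCase(Locale.ROOT)); // null if not listed
            boolean include = enableAllDefaultFactor
                    ? !"none".equals(weight)                       // included unless excluded
                    : weight != null && !"none".equals(weight);    // excluded unless included
            if (include) {
                included.add(name);
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<String> defaults = List.of("Title", "Description", "Reftext", "Keywords");
        System.out.println(includedDefaults(true, defaults, Map.of("description", "none")));
        System.out.println(includedDefaults(false, defaults, Map.of("title", "VERY HIGH")));
    }
}
```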

Default-Factor Element

The default-factor element assigns a weight to an attribute.

<default-factor>
   <name>title</name>
   <weight>VERY HIGH</weight>
</default-factor>

Default factor attribute names are case-insensitive.

When a default-factor does not appear in the ranking XML string, Oracle SES takes the default weight for this ranking factor, unless default factors are disabled by enable-all-default-factor.

Oracle SES supports the following values for the weight element: empty (Oracle SES uses the default weight), none (the attribute is not used in the ranking query), very high, high, medium, low, and very low.

Table 5-4 lists the default-factor names and weights:

Table 5-4 Oracle SES Default Attributes and Weights

Attribute         Weight
Title             High
Description       Medium
Reftext           High
Keywords          Medium
Subject           Low
Author            Medium
H1headline        Low
H2headline        Very low
Url               Low
Urldepth          High
Language Match    High
Linkscore         High
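The following Java sketch assembles a ranking element with a global-settings flag and a list of default-factor elements. It rejects weights other than the values listed above. The class name is hypothetical. The sketch writes weights in uppercase and the flag as TRUE or FALSE to match the examples in this section; that casing is an assumption based on the examples, not a documented requirement.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class RankingXmlBuilder {

    // Supported weight values; the empty string means "use the default weight".
    private static final Set<String> WEIGHTS =
            Set.of("", "NONE", "VERY HIGH", "HIGH", "MEDIUM", "LOW", "VERY LOW");

    // defaultFactors maps a default attribute name to its weight, in the desired element order.
    static String ranking(boolean enableAllDefaultFactor, Map<String, String> defaultFactors) {
        StringBuilder sb = new StringBuilder("<ranking><global-settings><enable-all-default-factor>")
                .append(enableAllDefaultFactor ? "TRUE" : "FALSE")
                .append("</enable-all-default-factor></global-settings>");
        for (Map.Entry<String, String> e : defaultFactors.entrySet()) {
            String weight = e.getValue().trim().toUpperCase(Locale.ROOT);
            if (!WEIGHTS.contains(weight)) {
                throw new IllegalArgumentException("unsupported weight: " + e.getValue());
            }
            sb.append("<default-factor><name>").append(e.getKey()).append("</name><weight>")
              .append(weight).append("</weight></default-factor>");
        }
        return sb.append("</ranking>").toString();
    }

    public static void main(String[] args) {
        // Rank on title and keywords only: title above its default of High, keywords at its default.
        Map<String, String> factors = new LinkedHashMap<>();
        factors.put("title", "very high");
        factors.put("keywords", "");
        System.out.println(ranking(false, factors));
    }
}
```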


Custom-Factor Element

The custom-factor element lets you add more attributes for ranking. Any indexed search attribute can be a custom ranking attribute.

Note:

Adding custom attributes for relevancy ranking can degrade search performance.

The custom-factor element has four child elements: attribute-name, attribute-type, factor-type, and either weight or match, depending on the factor-type.

<custom-factor>
            <attribute-name>author manager</attribute-name>
            <attribute-type>STRING</attribute-type>
            <factor-type>QUERY_FACTOR</factor-type>
            <weight>LOW</weight> 
</custom-factor>

or

<custom-factor>
            <attribute-name>document quality</attribute-name>
            <attribute-type>STRING</attribute-type>
            <factor-type>STATIC_FACTOR</factor-type>
            <match>
               <value>good</value>
               <weight>HIGH</weight>
            </match>
            <match>
               <value>fair</value>
               <weight>MEDIUM</weight>
            </match>
            <match>
               <value>bad</value>
               <weight>VERY LOW</weight>
            </match>
</custom-factor>

  • The attribute-name value is matched literally against attribute names in Oracle SES. Any indexed search attribute name can be used as the attribute-name value. The value of the attribute-name element is case-insensitive.

  • The attribute-type element defines the type of the attribute. Only the String attribute type is supported. Together, attribute-name and attribute-type identify a valid Oracle SES attribute.

  • For factor-type, Oracle SES supports two types of ranking for custom ranking attributes.

    • QUERY_FACTOR: The attribute value is matched against the query terms. A positive match boosts the document by the specified weight. QUERY_FACTOR is a query-based ranking factor, like the default title and reftext factors. The weight element should appear for this custom ranking factor. For example, with the query "Roger Federer", a document whose custom attribute publisher has the value "Roger Federer" could be relevant.

    • STATIC_FACTOR: The attribute value is matched against fixed values specified in the custom ranking factor. (The match element should appear for this custom ranking factor.) STATIC_FACTOR is not a query-based ranking factor. The fixed values specify qualities of the documents, such as the link score and the sources of documents. For example, assume that documents have been classified by quality: well-written documents are classified as good, and poorly written documents are classified as bad. A good document should be ranked higher than a bad document, even though both match the query. You can specify in the API that a document of good quality is boosted in relevancy by a specified weight.

  • The match element specifies the match values and corresponding match weights when the factor-type is STATIC_FACTOR. The following XML string is an example of a match element:

    <match>
    <value>bad</value>
    <weight>VERY LOW</weight> 
    </match>
    
  • The value element is matched against the corresponding attribute value of this ranking factor. Only alphanumeric characters are allowed in the attribute value. The match is case-insensitive.

  • The weight element has the same syntax as the weight element of the default-factor element.
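The following Java sketch builds both kinds of custom-factor element shown above and enforces the alphanumeric rule for match values. The class and method names are hypothetical. Treating "alphanumeric" as ASCII letters and digits only is an assumption, because the text does not say whether non-ASCII letters are accepted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class CustomFactorBuilder {

    // Assumption: "alphanumeric" means ASCII letters and digits only.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[A-Za-z0-9]+");

    // Escapes characters that cannot appear verbatim in XML text content.
    private static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Opening part shared by both factor types; only the STRING attribute type is supported.
    private static StringBuilder header(String attributeName, String factorType) {
        return new StringBuilder("<custom-factor><attribute-name>")
                .append(escapeText(attributeName))
                .append("</attribute-name><attribute-type>STRING</attribute-type><factor-type>")
                .append(factorType)
                .append("</factor-type>");
    }

    // QUERY_FACTOR: the attribute value is matched against the query terms, with one weight.
    static String queryFactor(String attributeName, String weight) {
        return header(attributeName, "QUERY_FACTOR")
                .append("<weight>").append(weight).append("</weight></custom-factor>")
                .toString();
    }

    // STATIC_FACTOR: one match element per fixed attribute value, in map iteration order.
    static String staticFactor(String attributeName, Map<String, String> weightByValue) {
        StringBuilder sb = header(attributeName, "STATIC_FACTOR");
        for (Map.Entry<String, String> m : weightByValue.entrySet()) {
            if (!ALPHANUMERIC.matcher(m.getKey()).matches()) {
                throw new IllegalArgumentException("match value must be alphanumeric: " + m.getKey());
            }
            sb.append("<match><value>").append(m.getKey()).append("</value><weight>")
              .append(m.getValue()).append("</weight></match>");
        }
        return sb.append("</custom-factor>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> quality = new LinkedHashMap<>();
        quality.put("good", "HIGH");
        quality.put("fair", "MEDIUM");
        quality.put("bad", "VERY LOW");
        System.out.println(queryFactor("author manager", "LOW"));
        System.out.println(staticFactor("document quality", quality));
    }
}
```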

Applying Ranking Factors

The XML ranking text can be applied in two places:

  • As a part of the searchControls element, the ranking factors can be used as an advanced control for each query execution through the Web services method. This is called per-query ranking control.

  • As a separate file in the ORACLE_HOME/search/webapp/config directory, the ranking.xml configuration file is read and applied each time OC4J is started. The ranking factors specified in the configuration file are applied to all queries. This is called instance-wide ranking control.

In federated search, instance-wide ranking control applies only to one instance. You must configure ranking customization separately for each instance.

If a conflict arises, the per-query ranking control specified in the Web services method overrides the settings specified in instance-wide ranking control. This includes the following cases:

  • If per-query and instance-wide ranking control specify the same factor, then Oracle SES uses the factor set by per-query ranking control.

  • If instance-wide ranking control sets a ranking factor that per-query ranking control does not mention, then Oracle SES uses the factor set by instance-wide ranking control.

  • If per-query ranking control sets a ranking factor that instance-wide ranking control does not mention, then Oracle SES uses the factor set by per-query ranking control.

  • If instance-wide ranking control sets enable-all-default-factor as false and per-query ranking control sets enable-all-default-factor as true, then Oracle SES uses the default attributes set explicitly by instance-wide ranking control plus the attributes set by per-query ranking control, with the latter overriding the former.
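For explicitly set factors, these override rules amount to a map merge in which per-query entries win. This Java sketch is a hypothetical model of those rules, not Oracle SES code. It lowercases factor names on the assumption that names compare case-insensitively, as stated for default-factor and attribute-name values. It does not model how enable-all-default-factor affects factors that neither control sets.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class RankingMerge {

    // Returns the effective explicitly set factors (factor name -> weight).
    static Map<String, String> merge(Map<String, String> instanceWide, Map<String, String> perQuery) {
        Map<String, String> effective = new LinkedHashMap<>();
        // Start from the instance-wide settings read from ranking.xml ...
        instanceWide.forEach((name, weight) -> effective.put(name.toLowerCase(Locale.ROOT), weight));
        // ... then apply per-query settings: same factor, per-query wins; new factor, added.
        perQuery.forEach((name, weight) -> effective.put(name.toLowerCase(Locale.ROOT), weight));
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> instanceWide = new LinkedHashMap<>();
        instanceWide.put("title", "HIGH");
        instanceWide.put("author", "LOW");
        Map<String, String> perQuery = new LinkedHashMap<>();
        perQuery.put("Title", "VERY HIGH");
        perQuery.put("keywords", "MEDIUM");
        System.out.println(merge(instanceWide, perQuery));
    }
}
```

For the values in main, title takes the per-query weight VERY HIGH, author keeps the instance-wide weight LOW, and keywords is added with MEDIUM.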