5 Customizing the Search Results

This chapter explains the ways you can customize the search results: adding suggested content, customizing the appearance of the result list, and configuring clustering and facets.

Adding Suggested Content in Search Results

Suggested content lets you display real-time data content along with the result list in the default query application. Oracle SES retrieves data from content providers and applies a style sheet to the data to generate an HTML fragment. The HTML fragment is displayed in the result list and is available through the Web Services API. For example, when an end user searches for contact information on a coworker, Oracle SES can fetch the content from the suggested content provider and return the contact information (e-mail address, phone number, and so on) for that person with the result list. Suggested content results appear in tabbed panes above the query results. When the query returns no results, suggested content is not displayed.

Configure suggested content on the Search - Suggested Content page in the Oracle SES Administration GUI. Enter the maximum number of suggested content results (up to 20) to be included with the Oracle SES result list. The results are rendered on a first-come, first-served basis.

Suggested Content Providers

Regular expressions (as supported in the Java regular expression API java.util.regex) are used to define query patterns for suggested content providers. The regular expression-based pattern matching is case-sensitive. For example, a provider with the pattern dir\s(\S+) is triggered on the query dir james but not on the query Dir James. To trigger on the query Dir James, the pattern could be defined either as [Dd][Ii][Rr]\s+(\S+) or as (?i)dir\s+(\S+). A provider with a blank query pattern is triggered on all queries.
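The trigger behavior described above can be sketched as follows. This is a minimal illustration that uses Python's re module as a stand-in for java.util.regex (the patterns shown behave the same in both). The function name provider_triggers is hypothetical, and the sketch assumes that the pattern may match anywhere in the query, which the text does not state.

```python
import re


def provider_triggers(pattern: str, query: str) -> bool:
    """Return True if a provider with this query pattern fires for the query."""
    # A provider with a blank query pattern is triggered on all queries.
    if not pattern.strip():
        return True
    # Assumption: the pattern may match anywhere in the query (search,
    # not full match). Matching is case-sensitive unless the pattern
    # itself says otherwise, for example with the (?i) flag.
    return re.search(pattern, query) is not None


print(provider_triggers(r"dir\s(\S+)", "dir james"))
print(provider_triggers(r"dir\s(\S+)", "Dir James"))
print(provider_triggers(r"(?i)dir\s+(\S+)", "Dir James"))
```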

The URL you enter for the suggested content provider can contain the following variables: $ora:q, $ora:lang, $ora:q1 ... $ora:qn and $ora:username.

  • $ora:q is the end user's full query.

  • $ora:lang is the two-letter code for the browser language.

  • $ora:qn is the nth regular expression match group from the end user query. n starts from 1. If no nth group is matched, then the empty string replaces the variable.

  • $ora:username is the end user name.
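The variable substitution described above can be sketched as follows. The function expand_url and its parameters are hypothetical names chosen for illustration, and URL-encoding the substituted values is an assumption; the text does not say how Oracle SES encodes them.

```python
import re
from urllib.parse import quote

# Matches $ora:q1 ... $ora:qn before the plain $ora:q.
_VARIABLE = re.compile(r"\$ora:(q\d+|q|lang|username)")


def expand_url(template, pattern, query, lang="en", username=""):
    """Replace the $ora:* variables in a provider URL template."""
    match = re.search(pattern, query) if pattern else None

    def value_for(variable):
        name = variable.group(1)
        if name == "q":
            value = query
        elif name == "lang":
            value = lang
        elif name == "username":
            value = username
        else:
            # $ora:qn is the nth match group; n starts from 1.
            index = int(name[1:])
            if match is not None and 1 <= index <= match.re.groups:
                value = match.group(index) or ""
            else:
                # No nth group: the empty string replaces the variable.
                value = ""
        # Assumption: values are URL-encoded before substitution.
        return quote(value, safe="")

    return _VARIABLE.sub(value_for, template)


print(expand_url("http://host.company.com/apps/directory.jsp?query=$ora:q1",
                 r"dir\s(\S+)", "dir james"))
```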

Enter an XSLT style sheet to define rules (for example, the size and style) for transforming XML content from a provider into an HTML fragment. This HTML fragment is displayed in the result list or returned over the Web Services API. If you do not enter an XSLT style sheet, then Oracle SES assumes that the suggested content provider returns HTML. If you do not enter an XSLT style sheet and the provider returns XML, then the result list displays the plain XML.

Note:

As an administrator, you are responsible for verifying that the suggested content providers return valid and safe content. Corrupted or incomplete content returned by a suggested content provider can affect the formatting of the default query application results page.

Security Options

There are three security options for how Oracle SES passes the end user's authentication information to the suggested content provider:

  • None: No security policy is used. (Default)

  • Cookie: The end user must first be authenticated by the suggested content provider, which sets a cookie to maintain the user's session. Oracle SES must know the cookie that the provider uses for authentication; it is made available when the suggested content provider is registered. When the user enters a query, Oracle SES reads the cookies from the user's request header and passes them to the provider. The provider must set the cookie scope to the domain that the provider site and the Oracle SES site have in common.

    For example, suppose the provider site is http://provider.example.com and the Oracle SES site is http://ses.example.com. After the end user logs in to the provider site, the site could set the value of the security cookie loginCookie with domain scope .example.com. When the end user searches in Oracle SES, Oracle SES gets the loginCookie value from the end user's browser and forwards it to the provider site to get the suggested content, without the user logging in to the provider site again. However, if the provider site is accessed as http://provider or if the Oracle SES site is accessed as http://SES, then no domain cookie is available for sharing between the two sites and this security mechanism does not work.

    You can decide what happens when suggested content is available but the user is not logged in to the suggested content provider or the cookie for the provider is not available. For Unauthenticated User Action, if you select Ignore content, then content from that provider is not displayed in the result list. If you select Display login message, then Oracle SES returns a message that there is content available from this provider but the user is not logged in. The message also provides a link to log in to that provider. Enter the link for the suggested content provider login in the Login URL field.

  • Service-to-Service: A one-way trusted relationship is established between Oracle SES and the suggested content provider. Any user logged in to Oracle SES does not need to be authenticated by the provider again. The provider only authenticates the Oracle SES application and trusts the Oracle SES application to act as the end user.

    The end user identity is sent from Oracle SES to the provider site in the HTTP header ORA_S2S_PROXY_USER. The trusted entity could be a proxy user configured in the identity management system used by the provider, or it could be a name-value pair.
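For the Cookie option above, the domain-scope requirement can be checked with a small sketch like the following. The function cookie_shared is a hypothetical helper, not part of Oracle SES; it only illustrates why both sites must be accessed by host names inside the cookie's domain.

```python
def cookie_shared(cookie_domain, provider_host, ses_host):
    """Return True if a cookie scoped to cookie_domain is sent to both hosts."""
    suffix = cookie_domain.lower().lstrip(".")

    def covered(host):
        host = host.lower()
        # The host must be the domain itself or a subdomain of it.
        return host == suffix or host.endswith("." + suffix)

    return covered(provider_host) and covered(ses_host)


# Both sites are accessed with fully qualified names in .example.com.
print(cookie_shared(".example.com", "provider.example.com", "ses.example.com"))
# The provider is accessed by its short name, so no cookie is shared.
print(cookie_shared(".example.com", "provider", "ses.example.com"))
```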

If the secured content provider must authenticate the end user and it sets a domain-level security cookie to maintain login information after the end user logs in, then use the cookie method for form authentication. The Oracle SES end user must log in manually to the provider site, and the security cookie is stored in the browser. Oracle SES then searches the provider on behalf of the end user without an additional login.

However, if the domain security cookie is not allowed for the provider, then the provider must support service-to-service security. The provider must allow an Oracle SES application account to search after passing HTTP basic or digest authentication. Also, if the provider has different secured content for different Oracle SES end users, then it must respect the end user security (in the HTTP header ORA_S2S_PROXY_USER) for the Oracle SES search request.

To register a provider that requires either HTTP basic or HTTP digest authentication, specify the authentication user name in the Entity Name field and specify the authentication password in the Entity Password field.
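From the provider's point of view, a service-to-service request with HTTP basic authentication could carry headers like those built in the following sketch. The function s2s_headers and the sample credentials are hypothetical; the Authorization header follows the standard HTTP basic scheme, and how Oracle SES assembles the request internally is not described here.

```python
import base64


def s2s_headers(entity_name, entity_password, end_user):
    """Build the headers a provider might receive for a service-to-service call."""
    # HTTP basic authentication: base64 of "name:password".
    credentials = f"{entity_name}:{entity_password}".encode("utf-8")
    return {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        # The end user identity is passed in this header.
        "ORA_S2S_PROXY_USER": end_user,
    }


# Hypothetical Entity Name, Entity Password, and end user.
headers = s2s_headers("sesapp", "secret", "jsmith")
print(headers["ORA_S2S_PROXY_USER"])
```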

Example: Configuring Google OneBox for Suggested Content

Existing OneBox providers can be configured as Oracle SES suggested content providers. For example, for a Google OneBox provider, the provider URL might be http://host.company.com/apps/directory.jsp and the trigger might be dir\s(\S+). When the user query is "dir james", the provider receives the request with a query string similar to the following: apiMaj=10&apiMin=1&oneboxName=app&query=james.

With a suggested content provider, set the URL template as http://host.company.com/apps/directory.jsp?apiMaj=10&apiMin=1&oneboxName=app&query=$ora:q1. The provider pattern is the same: dir\s(\S+). The XSLT used for Google OneBox can be re-used with a minor change. Look for the line:

<xsl:template name="apps">

and change that line in your template to

<xsl:template match="/OneBoxResults">

Customizing the Appearance of Search Results

You can customize the default look and feel of the search result list for the default query application.

To customize the appearance of search results: 

  1. On the Global Settings tab, select Configure Search Results List under Out-of-Box Query Application.

    The Configure Search Results List page is displayed.

  2. Select Use Advanced Configuration, then make the desired customizations.

  3. Select attributes to appear in the XML description of result documents.

    The available attributes are local attributes, federated attributes, and internal attributes. Each attribute name appears only one time. That is, the name of a federated attribute with the same name as another attribute or with an explicit mapping to a local search attribute appears only once. Table 5-1 describes Oracle SES internal attributes.

  4. Provide extensible style sheet language transformations (XSLT) to operate on the selected attributes. The default XSLT is installed at

    ses_home/search/webapp/defaults/advanced_config.xsl

    This XSLT transforms XML content into an HTML fragment to be displayed in the result list. If the XSLT is blank, then the untransformed result XML is displayed in the result list. You can use this output for debugging purposes while configuring the XSLT. Invalid XSLTs cannot be saved.

    The output of this transformation should be HTML. Enter the following in the XSLT:

    <xsl:output method="html" />
    
  5. Provide a cascading style sheet (CSS) to style the HTML generated in the XSLT. The default CSS is installed at

    ses_home/search/webapp/defaults/advanced_config.css

    This CSS is used along with the included CSS files for the default query application. When there are conflicts between the advanced configuration CSS and the default CSS files, the advanced configuration definitions are used. Default XSLT and CSS style sheets are provided for Advanced Configuration.

Table 5-1 Oracle SES Internal Attributes

Name | Type | Description
eqcacheurl | String | The URL of the cached version of the document; that is, the value of the "Cached" link in the default result list.
eqcontentlength | Number | The size of the document in bytes.
eqdatasourcename | String | The (untranslated) name of the source where the document originated. This name is local to the instance that the document came from and not the instance that is receiving the document. If a document comes from a federated source, then the value of this attribute is the name of the source on the federated instance, and not the name of the federated source on the local instance.
eqdatasourcetype | String | The (untranslated) type of source where the document originated. This type is local to the instance from which the document came. For example, Federated is not a valid value for this attribute.
eqdocid | Number | Document ID.
eqfedchain | String | The chain of instance names representing the path of a federated document. The instance names are delimited by a semicolon (;).
eqfedid | String | The federated source ID chain delimited by an underscore (_).
eqgroupbrowseurl | String | The URL to browse the infosource group; that is, the value of the "Source Group" link in the default result list.
eqlinksurl | String | The URL of the page containing a list of links to the document; that is, the value of the "Links" link in the default result list.
eqpathbrowseurl | String | The URL to browse the infosource path; that is, the value of the "Path" link in the default result list.
eqredirecturl | String | The redirect URL to the original document; that is, the value of the title link in the default result list.
eqsimilardoc | Boolean | A value of true indicates a similar document; otherwise, false.
eqsnippet | String | The excerpt or keyword in context of the document.
equserquery | String | The query string.


XML Result Schema

To apply the XSLT, each document is converted into an XML description at query-time with the following schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="result">
   <xsd:complexType>
   <xsd:all>
      <!-- Internal attributes -->
      <xsd:element name="eqdatasourcename" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqdatasourcetype" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqsnippet" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqredirecturl" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqcacheurl" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqlinksurl" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqsimilardoc" type="xsd:boolean" maxOccurs="1" />
      <xsd:element name="eqcontentlength" type="xsd:integer" maxOccurs="1" />
      <xsd:element name="equserquery" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqgroupbrowseurl" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqpathbrowseurl" type="xsd:string" maxOccurs="1" />
      <xsd:element name="eqdocid" type="xsd:integer" maxOccurs="1" />
      <xsd:element name="eqfedid" type="xsd:string" maxOccurs="1" />
      <!-- Built-in search attributes -->
      <xsd:element name="author" type="xsd:string" maxOccurs="1" />
      <xsd:element name="description" type="xsd:string" maxOccurs="1" />
      <xsd:element name="headline1" type="xsd:string" maxOccurs="1" />
      <xsd:element name="headline2" type="xsd:string" maxOccurs="1" />
      <xsd:element name="headline3" type="xsd:string" maxOccurs="1" />
      <xsd:element name="host" type="xsd:string" maxOccurs="1" />
      <xsd:element name="infosource" type="xsd:string" maxOccurs="1" />
      <xsd:element name="infosourcepath" type="xsd:string" maxOccurs="1" />
      <xsd:element name="keywords" type="xsd:string" maxOccurs="1" />
      <xsd:element name="language" type="xsd:string" maxOccurs="1" />
      <xsd:element name="lastmodifieddate" type="xsd:date" maxOccurs="1" />
      <xsd:element name="mimetype" type="xsd:string" maxOccurs="1" />
      <xsd:element name="referencetext" type="xsd:string" maxOccurs="1" />
      <xsd:element name="subject" type="xsd:string" maxOccurs="1" />
      <xsd:element name="title" type="xsd:string" maxOccurs="1" />
      <xsd:element name="url" type="xsd:string" maxOccurs="1" />
      <xsd:element name="urldepth" type="xsd:integer" maxOccurs="1" />
      <!-- Custom search attributes -->
               .
               .
               .
   </xsd:all>
   </xsd:complexType>
   </xsd:element>
</xsd:schema>

XML has the following rules for element names:

  • Letters (including non-English characters), numbers, and ideograms are allowed

  • Limited punctuation is allowed: underscore, hyphen, and period

  • Names can only begin with letters, ideograms, and underscores

Custom attribute names must conform to these rules for advanced result rendering. To enforce these rules, the empty string replaces all characters that are not permitted by these rules. In addition, Oracle SES search attributes are case-insensitive, and therefore all attributes are converted to lowercase when used in XML format.
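The conversion of a custom attribute name to its XML element name can be approximated with the following sketch. The function to_xml_attribute_name is a hypothetical helper. Dropping leading digits, hyphens, and periods is an assumption made so that the result begins with a permitted character; the text does not say how Oracle SES handles that case.

```python
import re

# Permitted anywhere in a name: letters (including non-English letters and
# ideograms), digits, underscore, hyphen, and period.
_DISALLOWED = re.compile(r"[^\w.\-]")
# Assumption: leading digits, hyphens, and periods are dropped, because a
# name can begin only with a letter, ideogram, or underscore.
_BAD_START = re.compile(r"^[\d.\-]+")


def to_xml_attribute_name(name: str) -> str:
    """Approximate the element name used for a custom search attribute."""
    cleaned = _DISALLOWED.sub("", name)   # empty string replaces bad characters
    cleaned = _BAD_START.sub("", cleaned)
    return cleaned.lower()                # attributes are converted to lowercase


print(to_xml_attribute_name("Product Name"))
print(to_xml_attribute_name("Doc#Type"))
```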

For example, suppose the raw XML result data is as follows.

<result>
   <eqdatasourcetype>WEB</eqdatasourcetype>
   <title>Oracle Secure Enterprise Search</title>
   <url>
      http://www.oracle.com/technetwork/search/oses/overview/index.html
   </url>
   <author>Anonymous</author>
   <description>
      Oracle Secure Enterprise Search 10g, a standalone product from Oracle, enables a secure, high quality, easy-to-use search across all enterprise information assets.
   </description>
</result>

The following XSLT extracts and formats the title, URL, and author for documents coming from Web sources:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" />
 <xsl:template match="result[eqdatasourcetype='WEB']">
   <span class="title">
     <xsl:text> &quot;</xsl:text><xsl:value-of select="title" />
     <xsl:text>&quot;</xsl:text>
   </span>
   <span class="author">
     <xsl:text> By </xsl:text><xsl:value-of select="author" />
   </span>
   <br/>
   <span class="url">
     <!-- The url value already includes the scheme; trim surrounding white space. -->
     <a href="{normalize-space(url)}"><xsl:value-of select="normalize-space(url)" /></a>
   </span>
 </xsl:template>
</xsl:stylesheet>

A CSS style sheet for this output may be:

.title { font-weight: bold; }
.url { font-style: italic; }

These style sheets produce a final result of:

"Oracle Secure Enterprise Search" By Anonymous

http://www.oracle.com/technetwork/search/oses/overview/index.html

Configuring and Using Clustering in Search Results

Real-time clustering dynamically organizes search results into groups to provide end users with different views on the top results. Clustered documents within one group, called a cluster node, share the same common topics or property values. A cluster node with a large document set can be categorized into child cluster nodes, and a hierarchy is built. Users can navigate directly to a specific cluster node. Effective real-time clustering balances clustering quality and clustering time.

Search attributes (String, Number or Date) are used to generate a cluster tree. The attributes can be local search attributes, federated attributes that are not explicitly mapped, and Oracle SES internal attributes.

Oracle SES supports two types of cluster trees: topic and metadata. Each tree can be enabled or disabled individually. Parameters that apply to all cluster trees for the default query application can be configured on the Global Settings - Clustering Configuration page. These include the following:

  • Enable clustering: Select this option to enable clustering.

  • Maximum cluster tree depth: The maximum level of the cluster node hierarchy.

  • Maximum number of children per node: The maximum number of cluster nodes on each level. This does not apply to the miscellaneous node.

  • Minimum number of documents per node: The minimum number of the documents within one node. This does not apply to the miscellaneous node.

    Within each level of a cluster tree, documents that are not categorized into a node are placed in a special node named miscellaneous. The Minimum number of documents per node and Maximum number of children per node parameters do not apply to the miscellaneous node.
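The interaction of these parameters with the miscellaneous node can be illustrated with the following sketch for a single tree level. The function build_level and its selection rule (keep the largest qualifying nodes) are assumptions made for illustration only; the actual clustering algorithm used by Oracle SES is not described here.

```python
def build_level(groups, max_children=3, min_docs=2):
    """Pick the cluster nodes for one level; leftovers go to miscellaneous.

    groups maps a candidate node name to the list of document IDs in it.
    """
    # Candidates must satisfy the minimum number of documents per node.
    eligible = sorted(
        (item for item in groups.items() if len(item[1]) >= min_docs),
        key=lambda item: (-len(item[1]), item[0]),
    )
    # Assumption: when there are too many candidates, the largest are kept.
    kept = dict(eligible[:max_children])
    # Neither limit applies to the miscellaneous node.
    leftovers = [doc for name, docs in groups.items() if name not in kept for doc in docs]
    if leftovers:
        kept["miscellaneous"] = leftovers
    return kept


level = build_level(
    {"java": [1, 2, 3], "linux": [4, 5], "php": [6], "xml": [7, 8]},
    max_children=2,
    min_docs=2,
)
print(level)
```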

For customized Oracle SES applications, configure clustering with the Query Web Services API.

Topic Clustering

Topic clustering uses the most significant phrases (and optionally sentences) from documents to create relevant cluster nodes and hierarchies. The significant phrases are extracted both at query time and, at crawl time, by the Secure Enterprise Search Document Summarizer, which is a document service included by default for search result clustering.

Configure crawl-time extraction of top phrases with document services parameters on the Global Settings - Document Services page. Create a topic clustering tree on the Global Settings - Clustering Configuration - Create Topic Clustering Tree page.

Topic clustering can be configured with one or more search attributes of String type and with the following Oracle SES internal attributes:

  • eqsnippet: The excerpt of the document with keywords in context.

  • eqtopphrases: The most frequent phrases within one document among the phrases with the same number of words.

  • eqtopsentences: The significant sentences within one document based on the significant phrases.

By default, the attributes keywords, title, eqsnippet and eqtopphrases are configured for topic clustering. Keywords, eqtopphrases, and eqtopsentences contain pre-extracted words and phrases: no additional phrase extraction is performed on these attributes.

Parameters that control query-time word and phrase extraction for the default query application can be configured on the Global Settings - Clustering Configuration page. These include the following:

Single Word Extraction 

  • Minimum occurrence: The minimum frequency for the word to be extracted.

  • Maximum number of words to extract: The maximum number of words to be extracted.

Phrase Extraction 

  • Minimum occurrence: Minimum frequency for a phrase to be extracted.

  • Maximum number of phrases to extract: Maximum number of phrases to be extracted.

  • Maximum phrase length: Maximum number of words for each phrase to be extracted.
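The effect of the phrase extraction parameters can be illustrated with a simple n-gram counter. This is only a sketch of how the three limits interact; the function extract_phrases is hypothetical and does not reproduce the extraction algorithm that Oracle SES uses.

```python
from collections import Counter


def extract_phrases(texts, min_occurrence=2, max_phrases=10, max_phrase_length=3):
    """Count multi-word phrases and apply the three extraction limits."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # Maximum phrase length: the largest number of words in a phrase.
        for length in range(2, max_phrase_length + 1):
            for start in range(len(words) - length + 1):
                counts[" ".join(words[start:start + length])] += 1
    # Minimum occurrence: drop phrases that are not frequent enough.
    frequent = [(p, c) for p, c in counts.items() if c >= min_occurrence]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    # Maximum number of phrases to extract.
    return [phrase for phrase, _ in frequent[:max_phrases]]


print(extract_phrases(
    ["oracle secure search", "secure search engine", "secure search"],
    min_occurrence=2,
    max_phrase_length=2,
))
```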

Topic clustering uses a phrase stop words list and a blacklist to prevent words or phrases from becoming topic cluster result nodes.

The phrase stop words list is also used by the Document Summarizer document service. The stop words file is a language-specific file containing words that should not be considered during phrase extraction. The blacklist file is a language-specific file containing words and phrases that should not appear as cluster node names.

For example, if all indexed documents include the phrase "Oracle Corporation" and it does not make sense to have a cluster node for "oracle corporation", then this phrase could be added to the blacklist.

Note:

A separate stop words list contains index stop words. This is an Oracle SES internal file for words that should not be indexed. This list is not related to phrase extraction.

Both the stop words and the blacklist files are in plain text format, with each line of text containing one word or phrase. The phrase stop words file name should be "phrasestopwords" followed by a period and the two-letter language code (for example, phrasestopwords.en for English). Similarly, the blacklist file name should be "blacklist" followed by a period and the two-letter language code.

By default, these files are located in the directory:

ses_home/search/lib/plugins/doc/extractor/phrasestopwords

Sample phrase stop words files for other languages (non-English) are in the directory:

ses_home/search/lib/plugins/doc/extractor/samples/phrasestopwords

If there are documents in any of the non-English languages, then copy the corresponding language files from the samples directory to the default stop words directory:

ses_home/search/lib/plugins/doc/extractor/phrasestopwords

The order of words and phrases in the stop words file does not affect the phrase extraction. The following is a sample phrasestopwords.en file:

a
an
me
:
z

The following is a sample blacklist.en file:

site maps
oracle corporation
:
term of use

Note:

The stop words and the blacklist are used by both the default query application and the Web services API.
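The file naming convention and file format described above can be sketched as follows. The helper names are hypothetical, and the case-insensitive comparison is an assumption suggested by the "Oracle Corporation" example; the text does not state how Oracle SES compares entries.

```python
def word_list_filename(kind, language):
    """Build a file name such as phrasestopwords.en or blacklist.en."""
    return f"{kind}.{language}"


def parse_word_list(text):
    """Read one word or phrase per line, ignoring blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def filter_node_names(candidates, blacklist):
    """Drop candidate cluster node names that appear in the blacklist."""
    # Assumption: comparison is case-insensitive.
    return [name for name in candidates if name.strip().lower() not in blacklist]


blacklist = parse_word_list("site maps\noracle corporation\nterm of use\n")
print(word_list_filename("blacklist", "en"))
print(filter_node_names(["Oracle Corporation", "data warehousing"], blacklist))
```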

Topic clustering currently works best in English. Both the document summarizer in the crawler and the clustering module in the query application use a stemmer to reduce words to their stems and to merge words and phrases that share the same stems. The open source stemmer library Snowball is used for this purpose. The version included with Oracle SES supports the following languages:

  • Dutch

  • English

  • Finnish

  • French

  • German

  • Norwegian

  • Portuguese

  • Russian

  • Spanish

  • Swedish

The Egothor stemmer is included for Polish language support. The stemmer configuration is shared between the default query application and the Web Services API.

Note:

Topic clustering is not supported for Chinese and Japanese.

Metadata Clustering

Metadata clustering is performed on a single attribute of String, Date, or Number type. If there are multiple values for the same attribute in one document, then only the first value is used for clustering. By default, the entire value is passed in as is for clustering.

However, for String attributes only, a delimiter can be specified for tokenizing the attribute value. If no tokenization delimiter is entered (or if only white space is entered), then the delimiter defaults to white space. When tokenized, the single attribute value is divided into multiple segments and each segment can correspond to a hierarchy based on another delimiter called the hierarchy delimiter. White space is the default hierarchy delimiter; however, if both tokenization and hierarchy are selected, then the delimiters must be different. Parsing is done first by tokenization, and then by interpreting the hierarchy from the resulting tokens.

Create a metadata clustering tree on the Global Settings - Clustering Configuration - Create Metadata Clustering Tree page.

As an example where both tokenization and hierarchy are meaningful, a category attribute might consist of a comma-delimited list of fields, each representing a slash-separated hierarchical categorization (as in "java/j2ee/jdbc, oracle/search/connector").
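The two parsing steps for this example can be sketched as follows. The function parse_category is a hypothetical helper that tokenizes first and then interprets the hierarchy within each token, as described above.

```python
def parse_category(value, token_delimiter=",", hierarchy_delimiter="/"):
    """Split an attribute value into tokens, then each token into levels."""
    if token_delimiter == hierarchy_delimiter:
        raise ValueError("tokenization and hierarchy delimiters must differ")
    paths = []
    # Step 1: tokenization.
    for token in value.split(token_delimiter):
        # Step 2: hierarchy within each token.
        levels = [level.strip() for level in token.split(hierarchy_delimiter)]
        levels = [level for level in levels if level]
        if levels:
            paths.append(levels)
    return paths


print(parse_category("java/j2ee/jdbc, oracle/search/connector"))
```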

The tokenization and hierarchy configuration is not applicable to Date or Number attributes. Metadata trees of Date type attributes use a fixed display format with year on the first level, month on the second, and day on the third. The year is sorted in descending order, and the month and day are sorted in ascending order.
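The fixed year, month, and day layout and its sort order can be illustrated with the following sketch. The function date_tree and the nested list output format are hypothetical; only the level order and the sort directions come from the description above.

```python
from datetime import date


def date_tree(dates):
    """Group dates by year, month, and day, counting documents per day."""
    tree = {}
    for d in dates:
        days = tree.setdefault(d.year, {}).setdefault(d.month, {})
        days[d.day] = days.get(d.day, 0) + 1
    # Years are sorted in descending order; months and days in ascending order.
    return [
        (year, [(month, sorted(days.items())) for month, days in sorted(months.items())])
        for year, months in sorted(tree.items(), reverse=True)
    ]


print(date_tree([date(2009, 3, 5), date(2010, 1, 20), date(2010, 1, 2), date(2009, 3, 5)]))
```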

Metadata trees for Number type attributes are range-based, with at most five ranges and a fixed tree depth of three. The tree depth is counted starting at the root node. For a range to be shown, it must satisfy the Minimum Documents Per Node parameter, which is set on the Query-time Clustering Configuration page. Empty ranges are not shown.
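The range rules for one level of a Number tree can be illustrated with the following sketch. Equal-width ranges are an assumption made purely for illustration; how Oracle SES chooses range boundaries is not described here. The function number_ranges is hypothetical.

```python
def number_ranges(values, num_ranges=5, min_docs=1):
    """Return (low, high, count) for each range that would be shown."""
    if not values:
        return []
    low, high = min(values), max(values)
    # Assumption: the value span is divided into equal-width ranges.
    width = (high - low) / num_ranges or 1
    buckets = [0] * num_ranges
    for value in values:
        index = min(int((value - low) / width), num_ranges - 1)
        buckets[index] += 1
    shown = []
    for index, count in enumerate(buckets):
        # Empty ranges and ranges below the minimum are not shown.
        if count >= max(min_docs, 1):
            shown.append((low + index * width, low + (index + 1) * width, count))
    return shown


print(number_ranges([0, 1, 2, 50, 100], min_docs=1))
```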

Using Clustering

Cluster nodes filter the top results but do not change the order of the documents. When users select a cluster node, the result view is limited to the documents in that cluster node. All operations, such as sorting or paging through results, are limited to the cluster node.

The real-time clustering sidebar is hidden by default. Users can display the sidebar by clicking an arrow icon on the left-hand side of the search results page. Within the sidebar, result clusters are shown. The cluster nodes are sorted by the number of documents in each node.

Users can expand or collapse the nodes within a cluster tree without affecting the rest of the interface. If users click a cluster node, then the search results are filtered. If a cluster tree contains no child nodes, it is disabled.

Configuring Clustering in the Web Services API

Methods in the Query Web Service API provide clustering for customized Oracle SES applications. The main interface is the method doOracleOrganizedSearch, which accepts query information, grouping and sorting options, and clustering requests. Based on the request variation, it returns the requested result. A second method doOracleFetchSearch is used when the set of documents is known.

The input for doOracleOrganizedSearch includes the following information:

  • Query

  • TopN (the result set size used for grouping, sorting, and clustering)

  • Duplicate controls (removed, marked)

  • Data group list

  • Query and document language

  • Grouping and sorting options

  • Cluster tree configuration information: tree depth, children for each node, threshold, tree format type (JSON or XML), topic extraction configuration, and metadata clustering configuration

  • Other query parameters (including Number startIndex, Boolean returnCount, String filterConnect, Filter[] filters)

The output is an object that contains the search result, grouping information, and the cluster tree string list. The search result list is in the order specified by the grouping and sorting option. If this is not specified, then it is sorted by the relevance score. The returned cluster tree string represents the clustering tree information: tree structure, node names, and document IDs.

Java Classes for Clustering

There are three classes to support the grouping and sorting options: GroupAttribute, SortAttribute, and GroupResult.

There are two classes to support the clustering request: ClusterConfig, which controls the clustering request, and ClusterTree, which contains the tree output.

The class OracleResultContainer is defined to wrap the search hit result, grouping result, and clustering result.

doOracleFetchSearch is used for fetching a selected list of documents identified by their document ID, federated source ID, or both.

If GroupAttribute is specified, then it is automatically added to the top of the sorting attribute. For example, if the query is grouped by host name and sorted by title, then the search hit is sorted by (hostname, title).
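The resulting sort order can be illustrated as follows. This sketch models hits as plain dictionaries and uses a hypothetical function sort_hits; it does not use the actual GroupAttribute or SortAttribute classes.

```python
def sort_hits(hits, sort_attributes, group_attribute=None):
    """Sort hits, placing the group attribute ahead of the sort attributes."""
    keys = list(sort_attributes)
    if group_attribute is not None:
        # The group attribute is added to the top of the sorting attributes.
        keys.insert(0, group_attribute)
    return sorted(hits, key=lambda hit: tuple(hit[key] for key in keys))


hits = [
    {"host": "b.example.com", "title": "Alpha"},
    {"host": "a.example.com", "title": "Zeta"},
    {"host": "a.example.com", "title": "Beta"},
]
# Grouped by host and sorted by title: the order is (host, title).
print([hit["title"] for hit in sort_hits(hits, ["title"], "host")])
```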

The sorting, grouping, or clustering option can be applied to this result. Sorting is based on the top N results, while grouping and clustering are based on the result window determined by (startIndex, docsRequested).

Cluster Result XML Schema

The main XML element, node, contains the following attributes:

  • id: ID for the node. The value represents the full path with the parent node paths.

  • name: The name of the node. This is actually the topic for the node.

  • level: The cluster node level, starting from 1 for the top node.

  • size: Number of documents under (directly and indirectly) this cluster node.

  • leaf: This is "1" if the cluster node only contains documents and no child cluster nodes. Otherwise, this is "0".

  • keywords: All keywords and phrases within the cluster node.

The node element contains the document IDs in the XML text element if the node is a leaf node (leaf="1"). The document ID in the XML file has the format docID.SES_InstanceID. If the document is from the local instance, then the SES_InstanceID is omitted.

<cluster>
   <nodeset>
      <node id="1" name="all" level="1" size="100" leaf="0" keywords="all"/>
      <node id="1.4" name="java" level="2" size="99" leaf="0" keywords="java"/>
      <node id="1.4.1" name="data warehousing" level="3" size="38" leaf="0"
         keywords="technologies bi,data warehousing,linux .net office 
            php security service"/>
      <node id="1.4.1.1" name="tutorials blogs" level="4" size="12" leaf="1"
         keywords="tutorials blogs">
         2773,8031,109,8033,806,26940,817,8024,8030,2862,8032,8028
      </node>
      <node id="1.4.1.2" name="stored procedure" level="4" size="4" leaf="1"
         keywords="stored procedure">
         4239,4243,2784,4335
      </node>
      <node id="1.4.1.3" name="miscellaneous" level="4"  size="22" leaf="1">
         4017,2836,8029,2767,1502,113814,11731,1138,392,2819,2763,1421,
         221,705,7739,2838,2749,2351,2802,1158,15751,15747
      </node>
   </nodeset>
</cluster>
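A client application could read the leaf nodes of such a cluster tree as in the following sketch. The sample is a shortened, hypothetical tree in the same format; the entry 2836.7 is an invented example of a document from another instance, and the function leaf_documents is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical cluster tree in the format shown above.
SAMPLE = """<cluster>
  <nodeset>
    <node id="1" name="all" level="1" size="6" leaf="0" keywords="all"/>
    <node id="1.1" name="stored procedure" level="2" size="4" leaf="1"
       keywords="stored procedure">
       4239,4243,2784,4335
    </node>
    <node id="1.2" name="miscellaneous" level="2" size="2" leaf="1">
       4017,2836.7
    </node>
  </nodeset>
</cluster>"""


def leaf_documents(xml_text):
    """Map each leaf node name to a list of (docID, instanceID) pairs."""
    result = {}
    for node in ET.fromstring(xml_text).iter("node"):
        if node.get("leaf") != "1":
            continue
        documents = []
        for entry in (node.text or "").split(","):
            entry = entry.strip()
            if not entry:
                continue
            # docID.SES_InstanceID; the instance ID is omitted for local documents.
            doc_id, _, instance_id = entry.partition(".")
            documents.append((doc_id, instance_id or None))
        result[node.get("name")] = documents
    return result


print(leaf_documents(SAMPLE))
```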

Cluster Result JSON Format

To integrate with AJAX applications, the cluster results can be returned in JSON format. The JSON format directly reflects the tree structure of the cluster results. Each node has either a children array, which lists the nodes that are the direct children of that node, or, if the node is a leaf node, a docs array that lists the documents in that node. Nodes in the children array may have children of their own, and so on.

Here is sample JSON output.

{"nodeset":
 
  {"id":"1",
  "name":"all",
  "level":1,
  "size":100,
  "leaf":false,
  "keywords":"all",
  "children":
     [{"id":"1.4",
     "name":"java",
     "level":2,
     "size":99,
     "leaf":false,
     "keywords":"java",
     "children":
         [{"id":"1.4.1",
         "name":"data warehousing",
         "level":3,
         "size":38,
         "leaf":false,
         "keywords":"technologies bi,data warehousing,linux .net office php security service",
         "children":
            [{"id":"1.4.1.1",
            "name":"tutorials blogs",
            "level":4,
            "size":12,
            "leaf":true,
            "keywords":"tutorials blogs", "docs":["2773","8031","10","803","806","26940","817","8024","8030","2862","803","8028"] },
            {"id":"1.4.1.2",
            "name":"stored procedure",
            "level":4,
            "size":4,
            "leaf":true,
            "keywords":"stored procedure",
            "docs":["4239","4243","2784","4335"]}]
         }]
     },
     {"id":"1.5",
     "name":"miscellaneous",
     "level":2,
     "size":1,
     "leaf":true,
     "docs":["265915"]
     }]
   }
}
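A client could walk this structure recursively to collect the documents in each leaf node, as in the following sketch. The sample is a shortened, hypothetical tree in the same format, and the function docs_by_leaf is a hypothetical helper.

```python
import json

# Shortened, hypothetical cluster tree in the format shown above.
SAMPLE = """{"nodeset":
  {"id":"1", "name":"all", "level":1, "size":5, "leaf":false, "keywords":"all",
   "children":
     [{"id":"1.4", "name":"java", "level":2, "size":4, "leaf":true,
       "keywords":"java", "docs":["4239","4243","2784","4335"]},
      {"id":"1.5", "name":"miscellaneous", "level":2, "size":1, "leaf":true,
       "docs":["265915"]}]
  }
}"""


def docs_by_leaf(node):
    """Return a mapping of leaf node ID to its list of document IDs."""
    if node.get("leaf"):
        return {node["id"]: node.get("docs", [])}
    found = {}
    for child in node.get("children", []):
        found.update(docs_by_leaf(child))
    return found


print(docs_by_leaf(json.loads(SAMPLE)["nodeset"]))
```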

Configuring and Using Facets in Search Results

The faceted search provides the mechanism for filtering data in the search result based on the categorization and the sub-categorization of the data. The various categories of data can be considered as facets, and for each category, the sub-categories are called facet nodes. A sub-category again can have sub-sub-categories, that is, a facet node can have child nodes, and so on. A complete node hierarchy of a facet is called facet tree.

For example, all the books in a book store can be categorized based on the subject of the books, such as history, art, literature, science, and so on. The science books can be further categorized into physics, chemistry, biology, and so on.

In faceted search, books as a whole can be considered a facet. The broad subject categories, such as art, literature, and science, are the root nodes of the books facet. The sub-categories of science, such as physics, chemistry, and biology, are the child nodes of the science node. Thus, the user can refine the search result for the books facet based on the science node, and refine it further based on one of its child nodes.
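
The book store example can be sketched in a few lines of Python. This is only an illustration of refining by a facet node path, not the way Oracle SES stores or evaluates facets; the data, the path field, and the refine function are hypothetical.

```python
# Each book records its position in the (hypothetical) books facet tree.
books = [
    {"title": "Optics", "path": ("science", "physics")},
    {"title": "Organic Reactions", "path": ("science", "chemistry")},
    {"title": "Renaissance Painting", "path": ("art",)},
]

def refine(items, selected_path):
    """Keep the items that lie at or below the selected facet node."""
    selected = tuple(selected_path)
    return [item for item in items if item["path"][:len(selected)] == selected]

science = refine(books, ["science"])
physics = refine(books, ["science", "physics"])
print([b["title"] for b in science])
print([b["title"] for b in physics])
```

Selecting the science node keeps both science books, and selecting its physics child node narrows the result further.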

In Oracle SES, facets can be created based on the search attributes of String, Number, and Date data types.

Using Facets in Query Application

In the Oracle SES query application, the search result page shows the following facet navigation components that can be used for filtering the search result:

Facet Panel

A facet panel is a container that displays all the facets configured in Oracle SES. A facet is a tree structure containing multiple facet nodes. For example, Books, Computer, and Location are facets. The facet named Location can have facet nodes such as Los Angeles, New York, and New Jersey. The facet named Computer can have facet nodes such as Desktop, Laptop, and Netbook. You can select facet nodes from the facet panel to use them as search criteria for filtering a search result.

Facet Navigation Bar

A facet navigation bar displays the complete facet node hierarchy currently selected by the user from the facet panel. It also shows the query term currently entered in the search text box. The search result is filtered based on the selected facet node hierarchy: each selected facet node is used as a search criterion for filtering the search result.

Using Facets in Query API

The Oracle SES query API provides the following methods for faceted search:

  • doOracleFacetSearch: This method returns faceted search results for a query. See "doOracleFacetSearch Message" for more information.

  • getFacetNodes: This method returns the facet child nodes for the specified facets having non-zero document count. See "getFacetNodes Message" for more information.

Configuring Facets using Administration GUI

The Oracle SES Administration GUI provides the following pages for configuring facets:

  • Global Settings - Facets

    This page lists all the facets defined in Oracle SES. Use this page to create, edit, and delete facets and facet nodes.

    Note:

    Oracle SES supports facets of String, Number, and Date types.

  • Global Settings - Translate Facet Name

    Use this page to translate facet names and facet node names in different languages.

    The target language for the translated facet names and facet node names must be specified as a two-letter language code that adheres to the ISO 639-1 standard, except for Chinese and Brazilian Portuguese (use zh_CN for Simplified Chinese, zh_TW for Traditional Chinese, and pt_BR for Brazilian Portuguese). The default is en (English).

  • Global Settings - Configure Facets

    Use this page to enable or disable facets, and to configure how facets are displayed in the query application: for example, the ordering of facets, the number of facets to display, the number of facet nodes to display for each facet, and the ordering of facet nodes for each facet.

  • Global Settings - Configure Source Groups

    Use this page to configure which facets to show in the query application page for a particular source group.

Configuring Facets using Administration API

The following Administration APIs can be used for configuring facets in Oracle SES:

  • facetTree: Use this object to create, update, delete, export, and translate facets and facet nodes.

  • queryUIFacets: Use this object to enable or disable facets, and to configure various display properties of facets, such as number of facets to display, sort order of facets, number of facet nodes to display for each facet, and sort order of facet nodes for each facet.

  • queryUISourceGroups: Use this object to configure which facets to show in the query application page for a particular source group.

See Also:

Oracle Secure Enterprise Search Administration API Guide for more information about the Administration APIs for configuring facets.

Faceted Search in Federation Environment

A federation provides a unified framework to search different repositories that are crawled, indexed, and maintained separately. In a federation environment, Oracle SES aggregates the facet trees in a search result from all the Oracle SES instances, to show a unified facet search result. This is achieved using the concept of federation broker and federation endpoint. The facet nodes returned from all the Oracle SES instances (federation endpoints) in a federation are merged at the federation broker. Thus, in a faceted search in a federation environment:

  • the facet node names of the merged facet nodes are the union of all the facet node names from all the federation endpoints.

  • the document count of a merged facet node is the sum of the document counts of all the facet nodes, from all the federation endpoints, that have the same facet node name.

Note:

For the faceted search to work in a federation environment:

  • facets defined at all the endpoints in a federation must also be defined at the broker.

  • a facet defined at the broker and at the endpoints in a federation must have the same facet tree paths, the same facet name and facet node names, and the same facet node values everywhere; that is, a facet must have the same configuration across all the Oracle SES instances in the federation. Facet node names are case-sensitive. For example, the facet node SMALLNUM(60-1000) defined at a broker and the facet node smallNum(60-1000) defined at an endpoint are treated as different facet nodes.

  • the infosource search functionality must be disabled, that is, infosource search and faceted search cannot be used together in a federation environment.
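
The merge rules described above (union of facet node names, counts summed for identical names, names compared case-sensitively) can be illustrated with the following Python sketch. It is not the broker's actual implementation; the function name and the per-endpoint dictionaries are hypothetical.

```python
from collections import Counter

def merge_facet_counts(endpoint_results):
    """Merge per-endpoint {facet node name: document count} maps."""
    merged = Counter()
    for counts in endpoint_results:
        # Counter.update adds counts for keys that match exactly,
        # so names differing only in case stay separate nodes.
        merged.update(counts)
    return dict(merged)

# Hypothetical facet node counts returned by two federation endpoints.
endpoint_a = {"Los Angeles": 12, "New York": 7}
endpoint_b = {"New York": 5, "new york": 2, "New Jersey": 3}

merged = merge_facet_counts([endpoint_a, endpoint_b])
print(merged)
```

In this sketch, New York and new york remain two distinct merged nodes, which is why the facet configuration must match exactly across the instances.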

Faceted Search Result Count versus Normal Search Result Count

The Oracle SES default query application shows the search result count in the upper-right corner of the search result page. When faceted navigation is enabled, each facet tree node also shows a document count. In some scenarios, these two counts differ for the same query. The following are the main scenarios in which the document count of the normal search result differs from the document count of the faceted search result.

Note:

The search result count in the Oracle SES query application by default is calculated based on the Approximate count setting of the Hit Count Method parameter present on the Global Settings - Query Configuration page of the Administration GUI. When the faceted search is used, the search result count is calculated based on the Exact count setting of the Hit Count Method parameter.

Scenario 1

The faceted search result count supports multiple values for a given facet. It is possible that a document can be counted more than once for a faceted search result count, and in that case, the faceted search result count will be different from the normal search result count. For example, if author is a facet and there are two authors for a given document, then for that document the author facet search result count will be two, with one count for each author, whereas the normal search result count for that document will be one.
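
A small Python sketch makes the arithmetic of this scenario concrete. The documents and author names are hypothetical, and the counting is a simplification of what the query application does.

```python
from collections import Counter

# Hypothetical result set: document d1 has two authors.
docs = [
    {"id": "d1", "author": ["Ann", "Bob"]},
    {"id": "d2", "author": ["Ann"]},
]

# Normal search result count: one per document.
normal_count = len(docs)

# Author facet counts: one per (document, author) pair.
facet_counts = Counter(author for doc in docs for author in doc["author"])
facet_total = sum(facet_counts.values())

print(normal_count, dict(facet_counts), facet_total)
```

Here the normal count is 2, while the author facet node counts add up to 3, because d1 is counted once under each of its authors.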

Scenario 2

The Oracle SES query application handles similar and duplicate documents based on the value of the Similar Document Handling parameter present on the Global Settings - Query UI Configuration page of the Administration GUI. When this parameter value is set to Detect, then on the normal search result page, duplicate documents are not shown, and similar documents are grouped together and are displayed only when the Similar Documents link is clicked. When this parameter value is set to Remove, then on the normal search result page, duplicate documents are not shown, and a single document is shown in place of multiple similar documents.

Thus, when the value of the Similar Document Handling parameter is either Detect or Remove, the normal search result count excludes similar and duplicate documents, whereas the faceted search result count always includes them. This leads to a discrepancy between the normal search result count and the faceted search result count. To avoid this discrepancy, set the Similar Document Handling parameter to Disabled; in that case, however, similar as well as duplicate documents are displayed in the normal search result.

Configuring and Using Tags in Search Results

By assigning tags to a document, you can classify the document into multiple categories. This makes it possible to search for relevant documents based on tags or categories. For example, a company's quarterly earnings report can be tagged as "financial report" or "company report". When you search using either of these two tags as the query term, the company's quarterly earnings report appears as one of the top-N results.

Configuring Tags using Administration GUI

You can specify the following tagging configuration parameters on the Search - Tagging page of the Administration GUI:

Tagging Mode

Select Enabled to enable the tagging functionality. You can enable tagging for all users (including anonymous users), for a group of users, or only for logged-in users.

Note:

  • All types of users (Oracle SES admin user, Oracle SES non-admin user, and anonymous user) can add tags to documents.

  • An admin user can remove all the tags.

  • A non-admin user can remove only those tags that are added by that user.

  • An anonymous user (a user who is not logged in) cannot delete tags.

Maximum Tags per Document

Maximum number of tags that can be assigned to a document (not specific to a user).

Maximum Tags per Session

Maximum number of tags that can be added in a session.

Cut-off (days)

Number of days for which a tag remains available in the query application even if it is not being used. When the number of days specified for Cut-off (days) elapses, the tags that have been unused for that number of days are removed from the query application.
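
The effect of the cut-off can be pictured with the following Python sketch. It is only an illustration: the function name, the last-used dates, and the exact boundary handling (a tag is treated as expired when its last use is more than the cut-off number of days ago) are assumptions, not documented Oracle SES behavior.

```python
from datetime import date, timedelta

def expired_tags(last_used, cutoff_days, today):
    """Return the tags whose last use is more than cutoff_days before today."""
    oldest_allowed = today - timedelta(days=cutoff_days)
    return sorted(tag for tag, used_on in last_used.items() if used_on < oldest_allowed)

# Hypothetical tags with the date on which each was last used.
last_used = {
    "financial report": date(2012, 1, 10),
    "company report": date(2012, 3, 1),
}

removed = expired_tags(last_used, cutoff_days=32, today=date(2012, 3, 15))
print(removed)
```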

Configuring Tags using Administration API

You can use the tagging and tag objects of the Administration API to configure and upload tags in Oracle SES.

See Also:

Oracle Secure Enterprise Search Administration API Guide for more information about the tagging and tag objects.

Configuring Tags using searchadmin Tool

You can use the searchadmin command-line tool to perform the following operations related to tags:

Table 5-2 Tag operations using searchadmin tool

Operation Syntax Description

Bulk Upload Tags

searchadmin createAll tag --INPUT_FILE=<XML file containing tags to upload>

Uploads tags in bulk. Tags to upload are specified in an XML file.

Returns CREATE_SUCCEEDED for each tag that is successfully created.

Returns DUPLICATE_IGNORED for each tag that already exists.

Sample XML file for bulk upload:

<?xml version="1.0"?>
<search:config productVersion="11.2.2.2.0" xmlns:search="http://xmlns.oracle.com/search">
  <search:tags>
    <search:tag>

      <search:name>
        oses
      </search:name>

      <search:docUrl>
        http://www.oracle.com/xyz.html
      </search:docUrl>

      <search:owner>
         abc@oracle.com
      </search:owner>

    </search:tag>
  </search:tags>
</search:config>
 

The following is the description for various tag elements:

  • tag: Contains details for a tag.

  • name: Name of a tag.

  • docUrl: Document to tag (in URL form).

  • owner: Owner of the document.
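
If the tags to upload come from another system, the input file could be generated rather than written by hand. The following Python sketch builds a document in the same shape as the sample above using the standard xml.etree.ElementTree module. The function name build_tag_upload is hypothetical, and the sketch assumes that the tool accepts element text without the surrounding whitespace shown in the sample.

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.oracle.com/search"
ET.register_namespace("search", NS)  # serialize with the search: prefix

def build_tag_upload(tags, product_version="11.2.2.2.0"):
    """Build the bulk upload XML from (name, doc_url, owner) tuples."""
    config = ET.Element(f"{{{NS}}}config", {"productVersion": product_version})
    tags_el = ET.SubElement(config, f"{{{NS}}}tags")
    for name, doc_url, owner in tags:
        tag = ET.SubElement(tags_el, f"{{{NS}}}tag")
        ET.SubElement(tag, f"{{{NS}}}name").text = name
        ET.SubElement(tag, f"{{{NS}}}docUrl").text = doc_url
        ET.SubElement(tag, f"{{{NS}}}owner").text = owner
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(config, encoding="unicode")

xml_text = build_tag_upload(
    [("oses", "http://www.oracle.com/xyz.html", "abc@oracle.com")]
)
print(xml_text)
```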

Delete All Tags

searchadmin deleteAll tag [-s]

Deletes all tags.

Use the -s option to ignore the invalid state errors during the delete operation.

Returns DELETE_SUCCEEDED for each tag that is successfully deleted.

Delete Specific Tags

searchadmin deleteList tag -k <Text file containing tags to delete> [-s] [-f]

Deletes specific tags.

Use the -f option to ignore the error if a tag to be deleted is not found.

Returns NOT_FOUND_IGNORED, if a tag to be deleted is not found.

Sample text file containing tags to delete:

--NAME=oses23 --DOC_URL=http://stanc17:7777/testdata/seedurls/duplicate/near.html --OWNER=public

--NAME=oses24 --DOC_URL=http://stanc17:7777/testdata/seedurls/duplicate/identical.html --OWNER=public
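
For scripts that need to inspect or validate such a file before passing it to the tool, each line can be split into its fields. The following Python sketch is a hypothetical helper; it assumes that values contain no spaces, as in the sample lines above.

```python
def parse_tag_line(line):
    """Split a '--KEY=value --KEY=value ...' line into a dictionary."""
    fields = {}
    for token in line.split():
        # Split at the first '=' only, so '=' inside a URL is preserved.
        key, _, value = token.partition("=")
        fields[key.lstrip("-")] = value
    return fields

line = ("--NAME=oses23 "
        "--DOC_URL=http://stanc17:7777/testdata/seedurls/duplicate/near.html "
        "--OWNER=public")
parsed = parse_tag_line(line)
print(parsed)
```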

Export All Tags

searchadmin exportAll tag -o <Text file to store exported tags>

Exports all the tags.

Export Specific Tags

searchadmin exportList tag -k <Text file containing tags to export> -o <Text file to store exported tags>

Exports the tags specified in an input file (using -k parameter). The input file can contain special operators to export tags based on regular expressions.

Sample text file containing tags to export (this file is similar to the delete tags file):

--NAME=oses23 --DOC_URL=http://stanc17:7777/testdata/seedurls/duplicate/near.html --OWNER=public

--NAME=oses24 --DOC_URL=http://stanc17:7777/testdata/seedurls/duplicate/identical.html --OWNER=public

Activate Tagging

searchadmin activate tagging

Enables the tagging functionality.

Default tagging mode is set to allow all the logged-in users to tag results.

If this command is executed when tagging is already activated, INVALID_STATE_IGNORED is returned.

Deactivate Tagging

searchadmin deactivate tagging

Disables the tagging functionality.

Update Tagging Configuration

searchadmin update tagging -i <XML file containing the tagging configuration>

Updates the tagging configuration specified in an XML file.

Returns UPDATE_SUCCEEDED, if the update operation is successful.

If you try to deactivate tagging using update, INVALID_STATE_IGNORED is returned.

Sample XML file containing tagging configuration:

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.2.2.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:tagging>
      <search:maxTagPerDoc>100</search:maxTagPerDoc>
      <search:maxTagPerSession>1000</search:maxTagPerSession>
      <search:tagCleanupInterval>32</search:tagCleanupInterval>
      <search:authorizationMode><search:allUsers/></search:authorizationMode>
   </search:tagging>
</search:config>
 

The following is the description for each tagging configuration:

  • maxTagPerDoc: Maximum number of tags that can be assigned to a document (not specific to a user).

  • maxTagPerSession: Maximum number of tags that can be added in a session.

  • tagCleanupInterval: Number of days for which a tag remains available in the query application even if it is not being used. When the number of days specified in tagCleanupInterval elapses, the tags that have been unused for the specified number of days are removed from Oracle SES.

  • authorizationMode: This parameter controls who is allowed to use the tagging feature. The following are the valid values for the authorizationMode parameter:

    disabled - Tagging is disabled for all the users.

    loggedInUsers - Tagging is enabled only for the users who are logged-in.

    authorizedPrincipals - Tagging is enabled only for the specific users having tagging privilege specified using the authorizedPrincipal object.

    allUsers - Tagging is enabled for all the users (anonymous tagging).


Assigning Tags to Documents in Search Results Page

If the tagging feature is enabled, then a tag icon is displayed beside all the documents in the search result page, along with the tags that are already assigned to the documents. You can click the tag icon to assign or remove tags for a document.

Configuring Default Sort Conditions

The default sort conditions for search results can be configured in the Default Sort Conditions section on the Global Settings - Query Configuration page in the Administration GUI. On this page, you specify one or more sort conditions, each consisting of a sortable attribute and its sort order (ascending or descending). The search results are sorted by the sortable attributes in the order in which they are configured on this page.

The default sort conditions for search results for each source group can also be configured in the Administration GUI, in the Absolute Sorting tab on the Configure Source Groups page for a specific source group. On this page, you can either use the same default sort conditions that are configured in the Default Sort Conditions section on the Global Settings - Query Configuration page, or specify different default sort conditions.

Note:

Source group specific default sort conditions specified on the Configure Source Groups page take precedence over the generic default sort conditions specified on the Query Configuration page.
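
The following Python sketch illustrates the two ideas above: source group conditions overriding the generic ones, and multiple sort conditions applied in their configured order. It is a conceptual illustration, not Oracle SES code; the function names, attribute names, and sample results are hypothetical.

```python
def effective_sort_conditions(global_conditions, source_group_conditions=None):
    """Source group conditions, when present, take precedence."""
    return source_group_conditions if source_group_conditions else global_conditions

def sort_results(results, conditions):
    """Sort by (attribute, order) pairs; the first pair is the primary key."""
    ordered = list(results)
    # Python's sort is stable, so apply the least significant key first.
    for attribute, order in reversed(conditions):
        ordered.sort(key=lambda r: r[attribute], reverse=(order == "descending"))
    return ordered

results = [
    {"title": "B", "author": "Ann", "date": "2011-03-01"},
    {"title": "A", "author": "Bob", "date": "2011-05-01"},
    {"title": "C", "author": "Ann", "date": "2011-04-01"},
]
global_conditions = [("date", "descending")]
group_conditions = [("author", "ascending"), ("date", "descending")]

conditions = effective_sort_conditions(global_conditions, group_conditions)
sorted_titles = [r["title"] for r in sort_results(results, conditions)]
print(sorted_titles)
```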

Configuring Top-N Documents and Group/Sort Attributes

The default number of documents to retrieve for top-N results, the maximum number of documents to retrieve for top-N results, the groupable attributes, and the sortable attributes can be configured either in the Administration GUI, on the Global Settings - Query UI Configuration page, or using the Administration API object queryUIConfig.

Top-N Documents

The default top-N documents setting Default Number of Results on the Query UI Configuration page represents the number of documents retrieved by default as part of the AJAX call for result clustering, grouping, and sorting.

To page through a very large result set, say 500 documents, the user may view a page of results beyond the default top-N value. Suppose top-N is set to the default 100, and the user wants to view the results numbered 150-160. To provide result clustering and sorting/grouping, the browser must request 160 results. If the user views page 490-500, then the browser would be requesting 500 results through the AJAX call. This may result in reduced performance.

The maximum top-N documents setting Maximum Number of Results on the Query UI Configuration page represents a threshold above which the query application displays only a single page of results.

This mode does not provide any sorting, grouping, or result clustering. However, it lets a user view the entire result set without the costly subsequent retrievals of top-N results.

Suppose maximum top-N is 200. If a user is viewing results 30-40, then the browser would retrieve the default of 100 results. If the user views results 170-180, then the browser would request 180 documents. If the user views results above 200, then the query application would display only the current page of results.
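
The retrieval behavior in this example can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the page size of 10 is an assumption taken from the result ranges used in the example.

```python
def results_to_request(last_result_viewed, default_top_n=100,
                       max_top_n=200, page_size=10):
    """Return (number of results requested, single_page_mode)."""
    if last_result_viewed > max_top_n:
        # Above the maximum: only the current page is shown, with no
        # sorting, grouping, or clustering.
        return page_size, True
    # Otherwise fetch at least the default top-N, or enough results
    # to cover the page being viewed.
    return max(default_top_n, last_result_viewed), False

print(results_to_request(40))    # viewing results 30-40
print(results_to_request(180))   # viewing results 170-180
print(results_to_request(210))   # viewing results 200-210
```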

See Also:

Oracle Secure Enterprise Search Administration API Guide for more information about the Administration API object queryUIConfig for specifying default top-N documents and maximum top-N documents for search results.

Group By and Sort By Lists

The Oracle SES query application displays a set of attributes by default in the Group by and Sort by lists.

Default grouping attributes:

  • (none)

  • Author

  • File Format

  • Source

  • Date

Default sorting attributes:

  • Author

  • File Format

  • Title

  • Relevance

  • Path

  • Language

  • Date

You can also use the Global Settings - Configure Source Groups page of the Administration GUI to configure a source group specific set of attributes to display in the Group by and Sort by lists. Refer to "Configuring and Using Source Groups in Search Results" for more information.

Configuring Sort Criteria using Sortable Attributes (Absolute Sort)

You can define any of the Oracle SES attributes as sortable attributes. You can use these sortable attributes to define global sort criteria as well as source group specific sort criteria. The sortable attributes can also be used in the Sort by list of the query application for performing a global sort as well as a source group specific sort.

Configuring Sortable Attributes for Search Criteria in Administration GUI

You can define the search attributes on the Global Settings - Search Attributes page. On this page, you can specify whether a search attribute should be used as a sortable attribute.

You can use the sortable attributes for defining the default sort criteria for search results in the Default Sort Conditions section of the Global Settings - Query Configuration page. Apart from the sortable attributes, you can also specify Relevance (System) and Absolute Date (System) as part of the sort criteria.

You can specify the sortable attributes to be available in the Sort by list of the query application for Absolute sort, by configuring them in the Absolute Sorting section of the Global Settings - Query UI Configuration page. If at least one sortable attribute is specified, then by default, Relevance is also added to the Sort by list for Absolute sort.

You can also use the sortable attributes for defining source group specific sort criteria:

  • Default sort criteria: This can be done by updating the Default Sort Conditions section in the Absolute Sorting tab of the Edit Source Group page.

  • Sort by list sort criteria: This can be done by updating the Sortable Attributes section in the Absolute Sorting tab of the Edit Source Group page.

Configuring Sortable Attributes for Search Criteria in Administration API

The following Administration API objects let you specify sortable attributes as sort criteria for a query:

  • relevanceRanking object: You can specify default sort criteria for the search results using the sortCondition element of the relevanceRanking object.

  • queryUIConfig object: You can specify the sortable attributes to show in the Sort by list box of the query application by using the absoluteSorting element of the queryUIConfig object.

  • queryUISourceGroups object: You can specify the source group specific default sort criteria as well as source group specific Sort by list sort criteria for the search results in the query application by using the defaultSortCondition element and the sortabeAttrs element of the queryUISourceGroups object.

See Also:

Oracle Secure Enterprise Search Administration API Guide for more information about the Administration APIs for using sortable attributes as search criteria.

Configuring and Using Source Groups in Search Results

You can use the Global Settings - Query UI Configuration page of the Administration GUI to specify the source groups to display in the query application.

You can also use the Global Settings - Configure Source Groups page of the Administration GUI to configure the display properties of facets, cluster trees, top-N sortable attributes, top-N groupable attributes, and Absolute sorting criteria (default sort conditions and sortable attributes) for a source group that is enabled to be displayed in the query application.

The following are the various configuration settings on the Global Settings - Configure Source Groups page.

Facets

You can specify particular facets or all the facets to display in the query application for a source group.

Cluster Trees

You can specify particular cluster trees or all the cluster trees to display in the query application for a source group.

Top-N Sortable Attributes

You can specify particular sortable attributes or all the default sortable attributes to display in the Sort by list in the query application for a source group.

Top-N Groupable Attributes

You can specify particular groupable attributes or all the default groupable attributes to display in the Group by list in the query application for a source group.

Absolute Sorting

You can specify the default sort conditions to apply on search results for a source group. When multiple sort conditions are specified, the search results are ordered according to the order in which the sortable attributes are specified in the configuration page.

You can either specify the same default sort conditions that are configured in the Query Configuration page, or specify new sort conditions by adding multiple sortable attributes from the list of sortable attributes. The sortable attributes list also contains Relevance (System) to allow sorting based on relevance, and Absolute Date (System) to allow sorting based on the last modified date of the documents. You can also specify the order of sorting for each sortable attribute, that is, either ascending or descending.

You can also specify the sortable attributes to be displayed in the Sort by list on the query page for a source group for Absolute sorting. You can either specify the same sortable attributes that are configured in the Query UI Configuration page, or specify new sort conditions by selecting multiple sortable attributes from the list of available sortable attributes. When new sort conditions are defined, the first attribute from the sort conditions for absolute sort and Relevance are displayed in the Sort by list on the query page. If no sortable attributes are specified, the Sort by list for absolute sort is not displayed on the query page.

Note:

The Sort by list that is displayed for Absolute sorting is different from the Sort by list that is displayed for the Top-N sorting on the query page. Both these lists have the same name, that is, Sort by.

The Sort by list that is displayed near the query box is for Absolute sorting, that is, it allows you to sort all the entries in a search result.

The Sort by list that is displayed just above the search results (and beside Group by list) is for Top-N sorting, that is, it allows you to sort only the top N entries in a search result.
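
The difference between the two kinds of sorting can be illustrated with the following Python sketch. It is a conceptual illustration only: the function names and data are hypothetical, and leaving the entries beyond the top N in their original order is an assumption made for the sketch.

```python
def absolute_sort(results, key=None):
    """Sort every entry in the search result."""
    return sorted(results, key=key)

def top_n_sort(results, n, key=None):
    """Sort only the first n entries; the rest keep their original order."""
    return sorted(results[:n], key=key) + results[n:]

# Hypothetical result titles in their original (relevance) order.
results = ["delta", "bravo", "alpha", "charlie"]

print(absolute_sort(results))
print(top_n_sort(results, 2))
```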

Customizing the Relevancy of Search Results

You can customize the default Oracle SES ranking to create a more relevant search result list for your enterprise. Ranking is determined by default and custom attributes. Default attributes include title, keywords, description, and others. Different weights indicate the importance of each attribute for document relevancy. For example, Oracle SES gives more weight to titles than to keywords.

To customize the relevancy of search results, you can use the Administration API or the Query Web Services API.

Using Relevancy Boosting for URLs and Query Terms

Relevancy boosting lets the administrator influence the order of documents in the result list for a particular search. You might want to override the default results for the following reasons:

  • For a highly popular search, direct users to the best results

  • For a search that returns no results, direct users to some results

  • For a search that has no click-throughs, direct users to better results

In a search, each result is assigned a score that indicates how relevant the result is to the search; that is, how good a result it is. Sometimes you know the documents that are highly relevant to some search. For example, your company Web site could have a home page for XML (http://example.com/XML-is-great.htm), which you want to appear high in the results of any search for XML. You would boost the score of the XML home page to 100 for an XML search.
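
The effect of such a boost on the result order can be pictured with the following Python sketch. It is not how Oracle SES computes scores; the function name, the scores, the second URL, and the exact-match lookup on the query term are hypothetical.

```python
def apply_boosts(query, results, boosts):
    """Replace the score of boosted (query, URL) pairs, then re-rank.

    results: list of (url, score); boosts: {(query, url): boosted score}.
    """
    rescored = [(url, boosts.get((query, url), score)) for url, score in results]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

# Hypothetical default ranking for the query "XML".
results = [
    ("http://example.com/other-page.htm", 80),
    ("http://example.com/XML-is-great.htm", 40),
]
boosts = {("XML", "http://example.com/XML-is-great.htm"): 100}

ranked = apply_boosts("XML", results, boosts)
print(ranked)
```

The boost applies only to the configured query term, so the same URL keeps its default score for other searches.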

To boost the relevancy of search results for URLs and query terms, configure the boosting using either of the following:

  • the Relevancy page in the Search tab of Administration GUI

    Note:

    The Locate by Search operation on the Relevancy page works only with unsecured sources.

  • the boostedUrl object of Administration API

See Also:

The boostedUrl object in the Oracle Secure Enterprise Search Administration API Guide