Customizing the Relevancy of Search Results

You can customize the default Oracle SES ranking to create a more relevant search result list for your enterprise. Ranking is determined by default and custom attributes. Default attributes include title, keywords, description, and others. Different weights indicate the importance of each attribute for document relevancy. For example, Oracle SES gives more weight to titles than to keywords.

To customize the relevancy of search results, you can use the Query Web Services API or ranking.xml to tune the weights of default attributes, or you can add custom attributes and set weights for those attributes. This topic discusses customizing relevancy in the Query Web Services API.

Customizing Relevancy in the Query Web Services API

The following is the signature of the method for advanced search:

public OracleSearchResult doOracleAdvancedSearch (
        String query,
        Integer startIndex,
        Integer docsRequested,
        Boolean dupRemoved,
        Boolean dupMarked,
        DataGroup groups[],
        String queryLang,
        String docLang,
        Boolean returnCount,
        String filterConnector,
        Filter filters[],
        Integer[] fetchAttributes,
        String searchControls)  throws Exception

The searchControls parameter accepts an XML string, which includes the filter and ranking elements.

<searchControls>
     <filter>  ... </filter>
     <ranking> ... </ranking>
</searchControls>
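
As a minimal sketch of how a client might assemble this string before calling doOracleAdvancedSearch, the following Java helper wraps optional filter and ranking fragments in a searchControls element. The class and method names are hypothetical and not part of the Oracle SES API, and the sketch assumes that either child element may be omitted.

```java
// Hypothetical helper for illustration; not part of the Oracle SES API.
final class SearchControlsSketch {

    /**
     * Wraps optional filter and ranking fragments in a searchControls element.
     * Returns null when both fragments are absent, so the caller can pass
     * null for the searchControls parameter.
     * Assumption: either child element may be left out.
     */
    static String build(String filterXml, String rankingXml) {
        if (filterXml == null && rankingXml == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder("<searchControls>");
        if (filterXml != null) {
            sb.append(filterXml);
        }
        if (rankingXml != null) {
            sb.append(rankingXml);
        }
        return sb.append("</searchControls>").toString();
    }

    public static void main(String[] args) {
        String ranking = "<ranking><default-factor><name>title</name>"
                + "<weight>VERY HIGH</weight></default-factor></ranking>";
        // Prints the ranking fragment wrapped in a searchControls element.
        System.out.println(build(null, ranking));
    }
}
```

The resulting string would be passed as the last argument of doOracleAdvancedSearch.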

Filter Element

Filters for attribute search are passed in the filter element. All the various AND and OR conditions on the attributes are specified in the XML. For example:

<filter>
   <operator type="and">
      <operator type="or">
         <attributefilter name="xxx" type="string" operation="equals" value="ttt"/>
         <attributefilter name="yyy" type="number"
            operation="greaterthan" value="22"/>
         ...
      </operator>
      ...
      <attributefilter name="aaa" type="number" operation="equals" value="22"/>
      ...
   </operator>
</filter>

If the searchControls parameter is null, then the filters and filterConnector parameters are used to build the advanced search; otherwise, they are ignored.
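
The nesting of operator and attributefilter elements can be generated programmatically. The following Java sketch is one possible way to do so; the class and method names are hypothetical, only the element and attribute names shown in the example above are used, and the escaping of attribute values is an added precaution rather than a documented requirement.

```java
// Hypothetical helper for illustration; not part of the Oracle SES API.
final class FilterXmlSketch {

    /** Escapes characters that are not allowed inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders one attributefilter element. */
    static String attributeFilter(String name, String type, String operation, String value) {
        return "<attributefilter name=\"" + escape(name)
                + "\" type=\"" + escape(type)
                + "\" operation=\"" + escape(operation)
                + "\" value=\"" + escape(value) + "\"/>";
    }

    /** Groups child conditions under an operator element of type "and" or "or". */
    static String operator(String type, String... children) {
        return "<operator type=\"" + escape(type) + "\">"
                + String.join("", children) + "</operator>";
    }

    /** Wraps the root condition in a filter element. */
    static String filter(String root) {
        return "<filter>" + root + "</filter>";
    }

    public static void main(String[] args) {
        // (xxx = "ttt" OR yyy > 22) AND aaa = 22
        String xml = filter(
                operator("and",
                        operator("or",
                                attributeFilter("xxx", "string", "equals", "ttt"),
                                attributeFilter("yyy", "number", "greaterthan", "22")),
                        attributeFilter("aaa", "number", "equals", "22")));
        System.out.println(xml);
    }
}
```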

Ranking Element

The ranking XML string is expressed as the ranking element in searchControls. The following is an example of a ranking element:

<ranking>
   <global-settings>
      <enable-all-default-factor>TRUE</enable-all-default-factor>
   </global-settings>
   <default-factor>
      <!-- default ranking factor -->
      ...
   </default-factor>
   <default-factor>
      <!-- default ranking factor -->
      ...
   </default-factor>
   <custom-factor>
      <!-- custom ranking factor -->
      ...
   </custom-factor>
   <custom-factor>
      <!-- custom ranking factor -->
      ...
   </custom-factor>
</ranking>

The following rules apply to the construction of the ranking XML string:

  • The whole ranking XML can be null, in which case default ranking is used.

  • The ranking XML contains the elements default-factor and custom-factor. Both can be null or absent at the same time.

  • When default-factor is null or absent and custom-factor is not null, default ranking is used together with the effect of custom-factor.

  • When custom-factor is null or absent, it does not have any impact on the ranking.

  • The ranking scheme applies only to doOracleAdvancedSearch calls that pass a non-empty query parameter.

Global-Settings Element

The global-settings element contains parameter settings that apply across ranking factors. It has the following sub-element:

  • enable-all-default-factor

    The enable-all-default-factor element accepts two values: true or false. (When this element is absent, true is taken as the default value.)

    When enable-all-default-factor is true, all default attributes are included in ranking queries, unless some default attributes are explicitly excluded in default-factor elements.

    When enable-all-default-factor is false, all default attributes are excluded in ranking queries, unless some default attributes are explicitly included in default-factor elements.

Default-Factor Element

The default-factor element assigns a weight to an attribute.

<default-factor>
   <name>title</name>
   <weight>VERY HIGH</weight>
</default-factor>

Default factor attribute names are case-insensitive.

When a default-factor does not appear in the ranking XML string, Oracle SES takes the default weight for this ranking factor, unless default factors are disabled by enable-all-default-factor.

Oracle SES supports the following values for the weight element: empty (Oracle SES uses the default weight), none (this attribute is not used in the ranking query), very high, high, medium, low, and very low.

Table 5-4 lists the default-factor names and weights:

Table 5-4 Oracle SES Default Attributes and Weights

Attribute        Weight
---------------  --------
Title            High
Description      Medium
Reftext          High
Keywords         Medium
Subject          Low
Author           Medium
H1headline       Low
H2headline       Very low
Url              Low
Urldepth         High
Language Match   High
Linkscore        High
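
To make the interaction between enable-all-default-factor, explicit default-factor entries, and the default weights concrete, the following Java sketch models the rules described above. It is an illustration only, not the Oracle SES implementation: the class and method names are hypothetical, and it assumes that an attribute is "explicitly excluded" by giving it the weight none and that names outside Table 5-4 are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model for illustration; not the Oracle SES implementation.
final class DefaultFactorSketch {

    /** Default weights copied from Table 5-4, keyed by lowercase name. */
    static Map<String, String> tableDefaults() {
        String[][] rows = {
            {"title", "HIGH"}, {"description", "MEDIUM"}, {"reftext", "HIGH"},
            {"keywords", "MEDIUM"}, {"subject", "LOW"}, {"author", "MEDIUM"},
            {"h1headline", "LOW"}, {"h2headline", "VERY LOW"}, {"url", "LOW"},
            {"urldepth", "HIGH"}, {"language match", "HIGH"}, {"linkscore", "HIGH"}
        };
        Map<String, String> defaults = new LinkedHashMap<>();
        for (String[] row : rows) {
            defaults.put(row[0], row[1]);
        }
        return defaults;
    }

    /**
     * Returns the weights in effect for the default attributes.
     * enableAll = true starts from every table default; false starts empty.
     * Explicit entries then override: NONE removes the attribute, an empty
     * weight falls back to the table default, anything else replaces it.
     */
    static Map<String, String> effectiveWeights(boolean enableAll, Map<String, String> explicit) {
        Map<String, String> defaults = tableDefaults();
        Map<String, String> result = new LinkedHashMap<>();
        if (enableAll) {
            result.putAll(defaults);
        }
        for (Map.Entry<String, String> e : explicit.entrySet()) {
            // Default factor names are case-insensitive.
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (!defaults.containsKey(name)) {
                continue; // assumption: unknown names are ignored
            }
            String weight = e.getValue() == null
                    ? "" : e.getValue().trim().toUpperCase(Locale.ROOT);
            if (weight.equals("NONE")) {
                result.remove(name);
            } else if (weight.isEmpty()) {
                result.put(name, defaults.get(name));
            } else {
                result.put(name, weight);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new LinkedHashMap<>();
        explicit.put("Title", "very high");
        explicit.put("url", "none");
        System.out.println(effectiveWeights(true, explicit));
        System.out.println(effectiveWeights(false, explicit));
    }
}
```

With enable-all-default-factor set to false, only the attributes listed explicitly (and not weighted none) remain in the model's result.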


Custom-Factor Element

The custom-factor element lets you add more attributes for ranking. Any indexed search attribute can be a custom ranking attribute.

Note:

Adding custom attributes for relevancy ranking can downgrade search performance.

The custom-factor element has four elements: attribute-name, attribute-type, factor-type, and weight (or match depending on the factor-type).

<custom-factor>
   <attribute-name>author manager</attribute-name>
   <attribute-type>STRING</attribute-type>
   <factor-type>QUERY_FACTOR</factor-type>
   <weight>LOW</weight>
</custom-factor>

or

<custom-factor>
   <attribute-name>document quality</attribute-name>
   <attribute-type>STRING</attribute-type>
   <factor-type>STATIC_FACTOR</factor-type>
   <match>
      <value>good</value>
      <weight>HIGH</weight>
   </match>
   <match>
      <value>fair</value>
      <weight>MEDIUM</weight>
   </match>
   <match>
      <value>bad</value>
      <weight>VERY LOW</weight>
   </match>
</custom-factor>

  • The attribute-name value is matched literally against attribute names in Oracle SES. Any indexed search attribute name can be an attribute-name value. The value of the attribute-name element is case-insensitive.

  • The attribute-type element defines the type of the attribute. Only the String attribute type is supported. In combination, attribute-name and attribute-type define a valid Oracle SES attribute.

  • For factor-type, Oracle SES supports two types of ranking for custom ranking attributes.

    • QUERY_FACTOR: The attribute value is matched against query terms. A positive match boosts the document based on specified weight. QUERY_FACTOR is a query-based ranking factor; for example, title and reftext. The weight element should appear for this custom ranking factor. For example, with the query "Roger Federer", if a document has a custom attribute publisher with the value "Roger Federer", then it could be relevant.

    • STATIC_FACTOR: The attribute value is matched against fixed values specified in the custom ranking factor. (The match element should appear for this custom ranking factor.) STATIC_FACTOR is not a query-based ranking factor. The fixed values specify qualities of the documents, such as the link score and the sources of documents. For example, assume that documents have been classified based on quality. Well-written documents are classified as good, and poorly written documents are classified as bad. A good document should be ranked higher than a bad document, even though both match a query. You can specify in the API that a document of good quality should be boosted in relevancy by a specified weight.

  • The match element specifies the match values and corresponding match weights when the factor-type is STATIC_FACTOR. The following XML string is an example of a match element:

    <match>
    <value>bad</value>
    <weight>VERY LOW</weight> 
    </match>
    
  • The value element is used to match the corresponding attribute value of this ranking factor. Only alphanumeric characters are allowed in the attribute value. The match is case-insensitive.

  • The weight element has the same syntax as the weight element of the default-factor element.
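
The two custom-factor shapes described above can be generated with a small helper. The following Java sketch is one possible approach; the class and method names are hypothetical, and the alphanumeric check assumes ASCII letters and digits, which is a narrower reading than the text strictly requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Oracle SES API.
final class CustomFactorSketch {

    /** Renders a QUERY_FACTOR custom-factor, which carries a single weight. */
    static String queryFactor(String attributeName, String weight) {
        return "<custom-factor>"
                + "<attribute-name>" + attributeName + "</attribute-name>"
                + "<attribute-type>STRING</attribute-type>"
                + "<factor-type>QUERY_FACTOR</factor-type>"
                + "<weight>" + weight + "</weight>"
                + "</custom-factor>";
    }

    /**
     * Renders a STATIC_FACTOR custom-factor with one match element per entry.
     * Keys are match values, map values are weights. Names and weights are
     * inserted verbatim, so callers must not pass XML special characters.
     */
    static String staticFactor(String attributeName, Map<String, String> weightByValue) {
        StringBuilder sb = new StringBuilder("<custom-factor>")
                .append("<attribute-name>").append(attributeName).append("</attribute-name>")
                .append("<attribute-type>STRING</attribute-type>")
                .append("<factor-type>STATIC_FACTOR</factor-type>");
        for (Map.Entry<String, String> e : weightByValue.entrySet()) {
            // Assumption: "alphanumeric" means ASCII letters and digits.
            if (!e.getKey().matches("[A-Za-z0-9]+")) {
                throw new IllegalArgumentException(
                        "match value must be alphanumeric: " + e.getKey());
            }
            sb.append("<match><value>").append(e.getKey()).append("</value>")
              .append("<weight>").append(e.getValue()).append("</weight></match>");
        }
        return sb.append("</custom-factor>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> quality = new LinkedHashMap<>();
        quality.put("good", "HIGH");
        quality.put("fair", "MEDIUM");
        quality.put("bad", "VERY LOW");
        System.out.println(staticFactor("document quality", quality));
        System.out.println(queryFactor("author manager", "LOW"));
    }
}
```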

Applying Ranking Factors

The XML ranking text can be applied in two places:

  • As a part of the searchControls element, the ranking factors can be used as an advanced control for each query execution through the Web services method. This is called per-query ranking control.

  • As a separate file in the ORACLE_HOME/search/webapp/config directory, the ranking.xml configuration file is read and applied each time OC4J is started. The ranking factors specified in the configuration file are applied to all queries. This is called instance-wide ranking control.

In federated search, instance-wide ranking control applies to one instance only. You must configure each instance for ranking customization separately.

If a conflict arises, the per-query ranking control specified in Web services method overrides the settings specified in instance-wide ranking control. That can include the following cases:

  • If per-query and instance-wide ranking control specify the same factor, then Oracle SES takes the factor set by per-query ranking control.

  • If instance-wide ranking control sets a ranking factor that per-query ranking control does not mention, then Oracle SES takes the factor set by instance-wide ranking control.

  • If per-query ranking control sets a ranking factor that instance-wide ranking control does not mention, then Oracle SES takes the factor set by per-query ranking control.

  • If instance-wide ranking control sets enable-all-default-factor as false and per-query ranking control sets enable-all-default-factor as true, then Oracle SES takes the default attributes set explicitly by instance-wide ranking control plus the attributes set by per-query ranking controls, with the latter overriding the former.
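
The precedence rules above amount to overlaying the per-query factors on the instance-wide factors. The following Java sketch models that overlay as a map merge keyed by factor name. It is an illustration of the described behavior, not Oracle SES code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model for illustration; not the Oracle SES implementation.
final class RankingMergeSketch {

    /**
     * Combines factor weights from both levels. Instance-wide entries are
     * applied first, then per-query entries, so a per-query weight replaces
     * an instance-wide weight for the same factor. Names are lowercased
     * because factor names are case-insensitive.
     */
    static Map<String, String> merge(Map<String, String> instanceWide,
                                     Map<String, String> perQuery) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : instanceWide.entrySet()) {
            merged.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        for (Map.Entry<String, String> e : perQuery.entrySet()) {
            merged.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> instanceWide = new LinkedHashMap<>();
        instanceWide.put("title", "HIGH");
        instanceWide.put("author", "LOW");

        Map<String, String> perQuery = new LinkedHashMap<>();
        perQuery.put("Title", "VERY HIGH"); // same factor: per-query wins
        perQuery.put("url", "MEDIUM");      // only set per query: kept

        System.out.println(merge(instanceWide, perQuery));
    }
}
```

In this model, a factor set only at the instance-wide level (author in the example) survives the merge unchanged.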