Using Faceted Navigation

14.1 About Faceted Navigation

This feature implements group counts, also known as facets, which are frequently used in e-commerce or catalog applications. In various applications, it is preferable not only to display the list of hits returned by a query, but also to categorize the results.

For example, an e-commerce application wants to display all products matching a query for the term management along with faceting information. The facets include ‘type of product’ (books or DVDs), ‘author’, and ‘date’. For each facet, the application displays the unique values (books or DVDs) and their counts. You can quickly assess that most of the product offerings of interest fall under the ‘books’ category. You can further refine the search by selecting the ‘books’ value under ‘type of product’.

A group count is defined as the number of documents that have a certain value. If a value is repeated within the same document, the document contributes a count of 1 to the total group count for the value. Group counts or facets are supported for SDATA sections that use optimized_for search SDATA. To request a computation of facets for a query, use the Result Set Interface.

14.2 Defining Sections As Facets

SDATA refers to structured data. Group counts or facets are supported for SDATA sections that you create with the optimized_for attribute set to either ‘search’ or ‘sort and search’. In the MULTI_COLUMN_DATASTORE preference, when data appears between tags or columns that are specified as optimized_for search SDATA, the data is automatically indexed as the facet data. Any data that does not match its declared type is handled according to the same framework that currently handles indexing errors for a specific row.

Examples

In the following statements, some tagged data is inserted into a VARCHAR2 column of a table. You can later define SDATA sections to collect the data based on the tags used here.

Binary float or binary double with tag price:
insert into mytab values (1, 'red marble' <price>1.23</price>');
Time stamp with tag T:
insert into mytab values (1,'blue marbles <T>2012-12-05T05:20:00</T>');

In the following statements, a section group is created and various SDATA section groups are added. The section definition includes the section group to which it belongs, the name of the section, the tag to be looked for, and the data type.

exec ctx_ddl.create_section_group('sg','BASIC_SECTION_GROUP')
exec ctx_ddl.add_SDATA_section('sg','sec01','name', 'varchar2')
exec ctx_ddl.add_SDATA_section('sg','sec02','count', 'number')
exec ctx_ddl.add_SDATA_section('sg','sec03','date', 'date')
exec ctx_ddl.add_SDATA_section('sg','sec04','timestamp', 'timestamp')
exec ctx_ddl.add_SDATA_section('sg','sec05','new price', 'binary_double')
exec ctx_ddl.add_SDATA_section('sg','sec06','old price','binary_float')
exec ctx_ddl.add_SDATA_section('sg','sec07','timestamp','timestamp with time zone')

The name given to the facet is ‘sec01’ and the ‘name’ tag is the actual tag name that occurs inside the document that is to be indexed. The ‘date’, ‘timestamp’, and ‘timestamp with time zone’ data types require the input data to be in the standard ISO format.

See Also:

Oracle Database Globalization Support Guide for more information about the standard ISO formats

Example 14-1 Using Faceted Navigation

The following statements create a table named products:

drop table products;

create table products(name varchar2(60), vendor varchar2(60), rating number, price number, mydate date);

The following statement inserts values into products:

insert all
 into products values ('cherry red shoes', 'first vendor', 5, 129, sysdate)
 into products values ('bright red shoes', 'first vendor', 4, 109, sysdate)
 into products values ('more red shoes', 'second vendor', 5, 129, sysdate)
 into products values ('shoes', 'third vendor', 5, 109, sysdate)
select * from dual;

The following statements create a MULTI_COLUMN_DATASTORE preference named ds to bring various other columns into the index (name) to be used as facets:

/*A MULTI_COLUMN_DATASTORE automatically adds tags by default so that the text to be indexed looks like 
'<name>cherry red shoes</name><vendor>first vendor</vendor><rating> .... '*/

exec ctx_ddl.drop_preference  ('ds')
exec ctx_ddl.create_preference('ds', 'MULTI_COLUMN_DATASTORE')
exec ctx_ddl.set_attribute    ('ds', 'COLUMNS', 'name, vendor, rating, price, mydate')

Note:

Oracle does not allow table columns with binary_float, binary_double, timestamp, and timestamp with timezone data types. It is therefore difficult to use such data types with MULTI_COLUMN_DATASTORE. You can still create facets if the document contains tagged data for these data types. Alternatively, you can convert 'timestamp' columns to 'date' and you can store binary_float and binary_double as 'number'.

The following statements create a section group named sg and enable the optimized_for search attribute for each column to be treated as a facet:

/* A Section Group is created to specify the data type of each column (varchar2 is the default) and 
how each column that is brought into the index should be used.*/


exec ctx_ddl.drop_section_group   ('sg')
exec ctx_ddl.create_section_group ('sg', 'BASIC_SECTION_GROUP')

exec ctx_ddl.add_sdata_section    ('sg', 'vendor', 'vendor', 'VARCHAR2')
exec ctx_ddl.add_sdata_section    ('sg', 'rating', 'rating', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'price', 'price', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'mydate', 'mydate', 'DATE')

exec ctx_ddl.set_section_attribute('sg', 'vendor', 'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'rating', 'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'price',  'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'mydate', 'optimized_for', 'SEARCH')

The following statement creates an index on name and specifies the preferences by using the PARAMETERS clause:

CREATE INDEX product_index ON products (name)
INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ds section group sg');

The following statements query for a product name, ‘red shoes’ and the facets for computation can be specified. The count attribute shows the total number of items that match the query for the product. The Result Set Interface specifies various requirements, such as the top vendors that have the largest number of matching items, the lowest available prices, and the latest arrivals:

set long 500000
set pagesize 0

variable displayrs clob;

declare
  rs clob;
begin
   ctx_query.result_set('product_index', 'red shoes', '<ctx_result_set_descriptor>
	 	<count/>
	 	<group sdata="vendor" topn="5" sortby="count" order="desc">
	 	<count exact="true"/>
	 	</group>
	 	<group sdata="price" topn="3" sortby="value" order="asc">
	 	<count exact="true"/>
	 	</group>
	 	<group sdata="mydate" topn="3" sortby="value" order="desc">
	 	<count exact="true"/>
	 	</group>
	 	</ctx_result_set_descriptor>',
	 	rs);

/* Pretty-print the result set (rs) for display purposes. 
It is not required if you are going to manipulate it in XML.*/

   select xmlserialize(Document XMLType(rs) as clob indent size=2) into :displayrs from dual;
   dbms_lob.freetemporary(rs);
end;
/
select :displayrs from dual;

The following is output:

<ctx_result_set>                                                                
  <count>3</count>                                                              

  <groups sdata="VENDOR">                                                       
    <group value="first vendor">                                                  
      <count>2</count>                                                          
    </group>                                                                    
    <group value="second vendor">                                                    
      <count>1</count>                                                          
    </group>                                                                    
  </groups>                                                                     

  <groups sdata="PRICE">                                                        
    <group value="109">                                                         
      <count>1</count>                                                          
    </group>                                                                    
    <group value="129">                                                         
      <count>2</count>                                                          
    </group>                                                                    
  </groups>                                                                     

  <groups sdata="MYDATE">                                                       
    <group value="2017-12-06 05:44:54">                                         
      <count>3</count>                                                          
    </group>                                                                    
  </groups>                                                                     
</ctx_result_set>

14.3 Querying Facets by Using the Result Set Interface

Starting with Oracle Database Release 18c, the group-counting operation for a specified list of facets is provided. You can obtain the group counts for each single value by using the bucketby attribute with its value set to single. The topn, sortby, and order attributes are also supported. Starting with Oracle Database Release 21c, you can obtain the group counts for a range of numeric and variable character facet values by using the range element, which is a child element of the group element.

bucketby Attribute

Valid attributes are single and custom.

The 'single' mode produces a list of all unique values for the facet and a document count for each value.
The 'custom' mode produces document counts for a range of numeric values.

count Element (Single Count)

In the following example, a few rows are inserted into the mytab table. Some rows have two values for the facet <B>, and some rows have a single value.

begin
 	insert into mytab values (1, '<B>1.234</B><B>5</B>');
 	insert into mytab values (2, '<B>1.432</B>');
 	insert into mytab values (3, '<B>2.432</B><B>6</B>');
 	insert into mytab values (4, '<B>2.432</B>');
end;

Single counts show each unique value and the number of documents that have this value:

<ctx_result_set>
	<groups sdata="SEC01">
		<group value="2.432"><count>2</count></group>
		<group value="1.234"><count>1</count></group>
		<group value="5"><count>1</count></group>
		<group value="6"><count>1</count></group>
		<group value="1.432"><count>1</count></group>
	</groups>
</ctx_result_set>

If document 1 is deleted, you see the following result:

<ctx_result_set>
	<groups sdata="SEC01">
		<group value="2.432"><count>2</count></group>
		<group value="6"><count>1</count></group>
		<group value="1.432"><count>1</count></group>
	</groups>
</ctx_result_set>

range Element

The range element supports start, greaterthan, end, and lessthan attributes. The start and greaterthan attributes specify the beginning value for the range. The end and lessthan attributes specify the ending value for the range.

Ranges can overlap each other. For example, <range start="1" end="2"/> and <range start="2" end="3"/>. Ranges can also be open ended. For example, you can specify only the start value or the end value. If you do not specify the attributes of the range element, all results are returned.

Example 14-2 Obtaining Group Counts for a Range of Facets

Create a table named products and populate it:

drop table products;

create table products(name varchar2(60), vendor varchar2(60), rating number, price number);

insert all
 into products values ('cherry red shoes', 'first vendor', 5, 129)
 into products values ('bright red shoes', 'first vendor', 4, 109)
 into products values ('more red shoes', 'second vendor', 5, 129)
 into products values ('shoes', 'third vendor', 5, 109)
 into products values ('dark red shoes', 'fourth vendor', 3, 98)
 into products values ('light red shoes', 'fifth vendor', 2, 49)
select * from dual;

Create a MULTI_COLUMN_DATASTORE preference named ds to bring various other columns into the index (name) to be used as facets:

exec ctx_ddl.drop_preference  ('ds')
exec ctx_ddl.create_preference('ds', 'MULTI_COLUMN_DATASTORE')
exec ctx_ddl.set_attribute    ('ds', 'COLUMNS', 'name, vendor, rating, price')

Create a section group named sg and enable the optimized_for search attribute for each column to be treated as a facet:

exec ctx_ddl.drop_section_group   ('sg')
exec ctx_ddl.create_section_group ('sg', 'BASIC_SECTION_GROUP')

exec ctx_ddl.add_sdata_section    ('sg', 'rating', 'rating', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'price', 'price', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'vendor', 'vendor', 'VARCHAR2')

exec ctx_ddl.set_section_attribute('sg', 'rating', 'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'price',  'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'vendor',  'optimized_for', 'SEARCH')

Create an index on name and specify the preferences by using the parameters clause:

create index mytab_idx on products (name)
indextype is ctxsys.context
parameters ('datastore ds section group sg');

Query for a product name, ‘red shoes’ by setting the bucketby attribute to custom and provide the values for the range element:

set long 500000
set pagesize 0

variable displayrs clob;

declare
  rs clob;
begin
     ctx_query.result_set('mytab_idx', 'red shoes', '<ctx_result_set_descriptor>
	 	<group sdata="rating" bucketby="custom">
          <range start="1" lessthan="10"/>
          <range start="10" lessthan="20"/>
          <range start="20"/>
        </group>
        <group sdata="price" bucketby="custom">
          <range end="1"/>
          <range greaterthan="1" end="10"/>
          <range greaterthan="10" end="100"/>
          <range greaterthan="100"/>
        </group>
        <group sdata="vendor" bucketby="custom">
          <range greaterthan="a"/>
          <range start="s"/>
          <range end="f"/>
        </group>
	</ctx_result_set_descriptor>',
	 	rs);

   select xmlserialize(Document XMLType(rs) as clob indent size=2) into :displayrs from dual;
   dbms_lob.freetemporary(rs);
end;
/
select :displayrs from dual;

The following is output:

<ctx_result_set>
  <groups sdata="RATING">
    <group value="range" start="1" lessthan="10">
      <count>5</count>
    </group>
    <group value="range" start="10" lessthan="20">
      <count>0</count>
    </group>
    <group value="range" start="20" end="5">
      <count>0</count>
    </group>
  </groups>
  <groups sdata="PRICE">
    <group value="range" start="49" end="1">
      <count>0</count>
    </group>
    <group value="range" greaterthan="1" end="10">
      <count>0</count>
    </group>
    <group value="range" greaterthan="10" end="100">
      <count>2</count>
    </group>
    <group value="range" greaterthan="100" end="129">
      <count>3</count>
    </group>
  </groups>
  <groups sdata="VENDOR">
    <group value="range" greaterthan="a" end="second vendor">
      <count>5</count>
    </group>
    <group value="range" start="s" end="second vendor">
      <count>1</count>
    </group>
    <group value="range" start="fifth vendor" end="f">
      <count>0</count>
    </group>
  </groups>
</ctx_result_set>

topn Attribute

Valid attribute values are non-negative numbers greater than zero.
This attribute specifies that only top n facet values and their counts are returned.
Group count determines the top n values to return unless the sortby attribute is set to value. In that case, the values are sorted according to the data type and the top n results of the sort are returned. The order attribute is respected for the sort.
By default, the results are sorted by the group count in descending order.
If a tie occurs in the count, the ordering of the facet values within this tie is not guaranteed.

sortby and order Attributes

sortby supports count and value attributes.

count sorts by group counts (numbers). This is the default.
value sorts by value depending on the data type.

order supports ASC (ascending order) and DESC (descending order), which is the default.

If there is no selection, the default is count DESC.

This example shows the grouping of a number facet if bucketby is set to single, where mytab_idx is the name of the index, text is the query, and group SDATA requests the facets:

begin
	 ctx_query.result_set('mytab_idx', 'text',
                        '<ctx_result_set_descriptor>
	 <group sdata="sec01" topn = "4" sortby = "value" order="asc" bucketby="single">
	 <count/>    
	 </group>
	 </ctx_result_set_descriptor>'
	 :rs);
end;

The following is a sample output showing that the values are listed in alphabetical order because the sortby attribute is set to value instead of count. The values are also displayed in ascending order (ABC to XYZ) because the order attribute is set to asc. Only four values are displayed because the topn attribute is set to 4.

<ctx_result_set>
	 <group SDATA="SEC01"> 
	 	 <group value="ABC"><count>2</count>
	 </group>
	 	 <group value="DEF"><count>1</count>
	 </group>
	 	 <group value="GHI"><count>10</count>
	 </group>
	 	 <group value="XYZ"><count>1</count>
	 </group>
</ctx_result_set>

14.4 Refining Queries by Using Facets As Filters

The facet implementation now supports CONTAINS queries with the standard set of database comparison operators available for SDATA. The following example is based on the ‘name’ varchar2 section. When you use it with numbers, do not use quotation marks around the numeric term to be searched.

contains (text, 'SDATA(sec01 = "run")', 1) > 0
contains (text, 'SDATA(sec01 > "run")', 1) > 0
contains (text, 'SDATA(sec01 >= "run")', 1) > 0
contains (text, 'SDATA(sec01 < "run")', 1) > 0
contains (text, 'SDATA(sec01 <= "run")', 1) > 0
contains (text, 'SDATA(sec01 <> "run")', 1) > 0
contains (text, 'SDATA(sec01 != "run")', 1) > 0
contains (text, 'SDATA(sec01 between "run1" and "run2")', 1) > 0
contains (text, 'SDATA(sec01 not between "run1" and "run2")', 1) > 0
contains (text, 'SDATA(sec01 is null)', 1) > 0
contains (text, 'SDATA(sec01 is not null)', 1) > 0
contains (text, 'SDATA(sec01 like "%run")', 1) > 0
contains (text, 'SDATA(sec01 like "run%")', 1) > 0
contains (text, 'SDATA(sec01 like "%run%")', 1) > 0
contains (text, 'SDATA(sec01 not like "%run")', 1) > 0
contains (text, 'SDATA(sec01 not like "run%")', 1) > 0
contains (text, 'SDATA(sec01 not like "%run%")', 1) > 0

contains (text, 'SDATA(sec02 = 9)', 1) > 0
contains (text, 'SDATA(sec02 < 10)', 1) > 0
contains (text, 'SDATA(sec02 between 2 and  20)', 1) > 0

The comparison operators behave according to the current optimized_for search SDATA behavior for the various data types.

14.5 Multivalued Facets

If multiple values are in an optimized for search SDATA section within the same document, then each value is indexed if the value is enclosed in its own tag corresponding to the SDATA section. Values that are not enclosed within separate section tags, but that appear together within the same section tag, are treated as a single value.

For example, in a document, <car>First Car, Second Car</car> is treated as a single string of value ‘First Car, Second Car’. However, <car>First Car</car><car>Second Car</car> is treated as two separate values for the document.