15.2 Defining Sections As Facets

SDATA refers to structured data. Group counts or facets are supported for SDATA sections that you create with the optimized_for attribute set to either ‘search’ or ‘sort and search’. In the MULTI_COLUMN_DATASTORE preference, when data appears between tags or columns that are specified as optimized_for search SDATA, the data is automatically indexed as the facet data. Any data that does not match its declared type is handled according to the same framework that currently handles indexing errors for a specific row.

Examples

In the following statements, some tagged data is inserted into a VARCHAR2 column of a table. You can later define SDATA sections to collect the data based on the tags used here.

  • Binary float or binary double with tag price:

    insert into mytab values (1, 'red marble' <price>1.23</price>');
  • Time stamp with tag T:

    insert into mytab values (1,'blue marbles <T>2012-12-05T05:20:00</T>');

In the following statements, a section group is created and various SDATA section groups are added. The section definition includes the section group to which it belongs, the name of the section, the tag to be looked for, and the data type.

exec ctx_ddl.create_section_group('sg','BASIC_SECTION_GROUP')
exec ctx_ddl.add_SDATA_section('sg','sec01','name', 'varchar2')
exec ctx_ddl.add_SDATA_section('sg','sec02','count', 'number')
exec ctx_ddl.add_SDATA_section('sg','sec03','date', 'date')
exec ctx_ddl.add_SDATA_section('sg','sec04','timestamp', 'timestamp')
exec ctx_ddl.add_SDATA_section('sg','sec05','new price', 'binary_double')
exec ctx_ddl.add_SDATA_section('sg','sec06','old price','binary_float')
exec ctx_ddl.add_SDATA_section('sg','sec07','timestamp','timestamp with time zone')

The name given to the facet is ‘sec01’ and the ‘name’ tag is the actual tag name that occurs inside the document that is to be indexed. The ‘date’, ‘timestamp’, and ‘timestamp with time zone’ data types require the input data to be in the standard ISO format.

See Also:

Oracle Database Globalization Support Guide for more information about the standard ISO formats

Example 15-1 Using Faceted Navigation

The following statements create a table named products:

drop table products;

create table products(name varchar2(60), vendor varchar2(60), rating number, price number, mydate date);

The following statement inserts values into products:

insert all
 into products values ('cherry red shoes', 'first vendor', 5, 129, sysdate)
 into products values ('bright red shoes', 'first vendor', 4, 109, sysdate)
 into products values ('more red shoes', 'second vendor', 5, 129, sysdate)
 into products values ('shoes', 'third vendor', 5, 109, sysdate)
select * from dual;

The following statements create a MULTI_COLUMN_DATASTORE preference named ds to bring various other columns into the index (name) to be used as facets:

/*A MULTI_COLUMN_DATASTORE automatically adds tags by default so that the text to be indexed looks like 
'<name>cherry red shoes</name><vendor>first vendor</vendor><rating> .... '*/

exec ctx_ddl.drop_preference  ('ds')
exec ctx_ddl.create_preference('ds', 'MULTI_COLUMN_DATASTORE')
exec ctx_ddl.set_attribute    ('ds', 'COLUMNS', 'name, vendor, rating, price, mydate')

Note:

Oracle does not allow table columns with binary_float, binary_double, timestamp, and timestamp with timezone data types. It is therefore difficult to use such data types with MULTI_COLUMN_DATASTORE. You can still create facets if the document contains tagged data for these data types. Alternatively, you can convert 'timestamp' columns to 'date' and you can store binary_float and binary_double as 'number'.

The following statements create a section group named sg and enable the optimized_for search attribute for each column to be treated as a facet:

/* A Section Group is created to specify the data type of each column (varchar2 is the default) and 
how each column that is brought into the index should be used.*/


exec ctx_ddl.drop_section_group   ('sg')
exec ctx_ddl.create_section_group ('sg', 'BASIC_SECTION_GROUP')

exec ctx_ddl.add_sdata_section    ('sg', 'vendor', 'vendor', 'VARCHAR2')
exec ctx_ddl.add_sdata_section    ('sg', 'rating', 'rating', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'price', 'price', 'NUMBER')
exec ctx_ddl.add_sdata_section    ('sg', 'mydate', 'mydate', 'DATE')

exec ctx_ddl.set_section_attribute('sg', 'vendor', 'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'rating', 'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'price',  'optimized_for', 'SEARCH')
exec ctx_ddl.set_section_attribute('sg', 'mydate', 'optimized_for', 'SEARCH')

The following statement creates an index on name and specifies the preferences by using the PARAMETERS clause:

CREATE INDEX product_index ON products (name)
INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ds section group sg');

The following statements query for a product name, ‘red shoes’ and the facets for computation can be specified. The count attribute shows the total number of items that match the query for the product. The Result Set Interface specifies various requirements, such as the top vendors that have the largest number of matching items, the lowest available prices, and the latest arrivals:

set long 500000
set pagesize 0

variable displayrs clob;

declare
  rs clob;
begin
   ctx_query.result_set('product_index', 'red shoes', '<ctx_result_set_descriptor>
	 	<count/>
	 	<group sdata="vendor" topn="5" sortby="count" order="desc">
	 	<count exact="true"/>
	 	</group>
	 	<group sdata="price" topn="3" sortby="value" order="asc">
	 	<count exact="true"/>
	 	</group>
	 	<group sdata="mydate" topn="3" sortby="value" order="desc">
	 	<count exact="true"/>
	 	</group>
	 	</ctx_result_set_descriptor>',
	 	rs);

/* Pretty-print the result set (rs) for display purposes. 
It is not required if you are going to manipulate it in XML.*/

   select xmlserialize(Document XMLType(rs) as clob indent size=2) into :displayrs from dual;
   dbms_lob.freetemporary(rs);
end;
/
select :displayrs from dual;

The following is output:

<ctx_result_set>                                                                
  <count>3</count>                                                              

  <groups sdata="VENDOR">                                                       
    <group value="first vendor">                                                  
      <count>2</count>                                                          
    </group>                                                                    
    <group value="second vendor">                                                    
      <count>1</count>                                                          
    </group>                                                                    
  </groups>                                                                     

  <groups sdata="PRICE">                                                        
    <group value="109">                                                         
      <count>1</count>                                                          
    </group>                                                                    
    <group value="129">                                                         
      <count>2</count>                                                          
    </group>                                                                    
  </groups>                                                                     

  <groups sdata="MYDATE">                                                       
    <group value="2017-12-06 05:44:54">                                         
      <count>3</count>                                                          
    </group>                                                                    
  </groups>                                                                     
</ctx_result_set>