8
Searching XML Data with Oracle Text

This chapter describes the following aspects of Oracle Text (interMedia Text/Context):

How to create a section group and index your XML document(s)
How to build an XML query application with Oracle Text, to search and retrieve data from your XML document(s)

This chapter contains the following sections:

Introducing Oracle Text

Oracle Text is strictly a server-based implementation.

http://otn.oracle.com/products/text

Oracle Text can be used to search XML documents. It extends Oracle9i by indexing any text or document stored in Oracle. It can also search documents in the operating system (flat files) and URLs.

Oracle Text enables the following:

Content-based queries, such as, finding text and documents which contain particular words, using familiar, standard SQL.
File-based text applications to use Oracle9i to manage text and documents in an integrated fashion with traditional relational information.
Concept searching of English language documents.
Theme analysis of English language documents using the theme/gist package.
Highlighting hit words. With Oracle Text, you can render a document in different ways. For example, you can present documents with query terms highlighted, either the "words" of a word query or the "themes" of an ABOUT query in English. Use the CTX_DOC.MARKUP or HIGHLIGHT procedures for this.
With Oracle Text PL/SQL packages for document presentation and thesaurus maintenance.

Oracle Text is packaged with the other interMedia products, namely, image, audio, video, and geographic location services for web content management applications.

Users can query XML data stored in the database directly, without using Oracle Text. However, Oracle Text is useful for boosting query performance.

See Also::

Accessing Oracle Text

interMedia, including Oracle Text, is a standard feature that comes with every Oracle9i Standard, Enterprise, and Personal edition license. It needs to be selected during installation. No special installation instructions are required.

Oracle Text is essentially a set of schema objects owned by CTXSYS. These objects are linked to the Oracle kernel. The schema objects are present when you perform an Oracle9i installation.

Further Oracle Text Examples

You can find more examples for Oracle Text and for creating section group indexes at the following site: http://otn.oracle.com/products/text

Assumptions Made in this Chapter's Examples

XML text is aVARCHAR2 or CLOB type in an Oracle9i database table with character semantics. Oracle Text can also deal with documents in a file system or in URLs, but we are not considering these document types in this chapter.

To simplify the examples included in this chapter we consider a subset of the Oracle Text options. In this chapter's examples, w made the following assumptions:

All XML data here is represented using US-ASCII, a 7 bit character set.
Issues about whether a character such as "*" is treated as white space or as part of a word are not included.
Storage characteristics of the Oracle schema object that implement the TEXT index are not considered.
We focus here on the SECTION GROUP parameter in the CREATE INDEX or ALTER INDEX statement. The other parameter types available for CREATE INDEX and ALTER INDEX, are DATASTORE, FILTER, LEXER, STOPLIST, and WORDLIST.

See Also:
Oracle Text Reference , for more information on these parameter types.

Here is an example of using SECTION GROUP in CREATE INDEX:
```
CREATE INDEX my_index 
  ON my_table ( my_column ) 
  INDEXTYPE IS ctxsys.context 
  PARAMETERS ( 'SECTION GROUP my_section_group' ) ; 
```
Specifically, we focus on using AUTO_SECTION_GROUP and XML_SECTION_GROUP, and PATH_SECTION_GROUP.
Tagged or marked up data. In this chapter, we focus on how to handle XML data. Oracle Text handles many other kinds of data besides XML data.

See Also:
Oracle Text Application Developer's Guide

Oracle Text Users and Roles

With Oracle Text you can use the following users/roles:

user CTXSYS to administer users
role CTXAPP to create and delete Oracle Text preferences and use Oracle Text PL/SQL packages

User CTXSYS

This user is created at install time. Administer Oracle Text users as this user. It has the following privileges:

Modify system-defined preferences
Drop and modify other user preferences
Call procedures in the CTX_ADM PL/SQL package to start servers and set system-parameters
Start a ctxsrv server
Query all system-defined views
Perform all the tasks of a user with the CTXAPP role

Role CTXAPP

Any user can create an Oracle Text index and issue a Text query. For additional tasks, use the CTXAPP role. This is a system-defined role that allows you to perform the following tasks:

Create and delete Oracle Text preferences
Use Oracle Text PL/SQL packages, such as the CTX_DDL package

Querying with the CONTAINS Operator

Oracle Text's main purpose is to provide an implementation for the CONTAINS operator. The CONTAINS operator is used in the WHERE clause of a SELECT statement to specify the query expression for a Text query.

CONTAINS Syntax

Here is the CONTAINS syntax:

...WHERE CONTAINS([schema.]column,text_query VARCHAR2,[label NUMBER])

where:

Syntax	Description
[schema.]column	Specifies the text column to be searched on. This column must have a Text index associated with it.
text_query	Specifies the query expression that defines your search in column.
label	Optionally specifies the label that identifies the score generated by the CONTAINS operator.

Table 8-1 CONTAINS Operator: Syntax Description

Syntax Description

[schema.]column

Specifies the text column to be searched on. This column must have a Text index associated with it.

text_query

Specifies the query expression that defines your search in column.

label

Optionally specifies the label that identifies the score generated by the CONTAINS operator.

For each row selected, CONTAINS returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row. You can obtain this score with the SCORE operator.

Note:

You must use the SCORE operator with a label to obtain this number.

Using a Simple SELECT Statement

The following example illustrates how the CONTAINS operator is used in a SELECT statement:

SELECT id FROM my_table
   WHERE
   CONTAINS (my_column, 'receipts') > 0

The 'receipts' parameter of the CONTAINS function is called the "Text Query Expression".

Note:

The SQL statement with the CONTAINS function requires a text index in order to run.

Using the Score Operator with a Label to Obtain the Relevance

The following example searches for all documents in the text column that contain the word Oracle. The score for each row is selected with the SCORE operator using a label of 1:

SELECT SCORE(1), title from newsindex 
           WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;

The CONTAINS operator must always be followed by the > 0 syntax. This specifies that the score value calculated by the CONTAINS operator must be greater than zero for the row selected.

When the SCORE operator is called, such as in a SELECT clause, the operator must reference the label value as shown in the example.

Using the WITHIN Operator to Narrow Query Down to Document Sections

When documents have internal structure such as in HTML and XML, you can define document sections using embedded tags before you index. This enables you to query within the sections using the WITHIN operator.

Note:

This is only true for XML_SECTION_GROUP, but not true for AUTO_ or PATH_SECTION_GROUP.

You define sections as part of a section group. Use the WITHIN operator to narrow queries down into document sections. Document sections can be any of the following:

Zone sections
Field sections
Attribute sections
Special sections (sentence or paragraph)

WITHIN Syntax for Section Querying

Here is the WITHIN syntax for querying sections:

expression WITHIN section

This searches for expression within a section. If you are using XML_SECTION_GROUP the following restrictions apply to the pre-defined zone, field, or attribute section:

If section is a zone, expression can contain one or more WITHIN operators (nested WITHIN) whose section is a zone or special section.
If section is a field or attribute section, expression cannot contain another WITHIN operator.

You can combine and nest WITHIN clauses. For finer grained searches of XML sections, you can use WITHIN clauses inside CONTAINS select statements.

WITHIN Operator Limitations

The WITHIN operator has the following limitations:

You cannot embed the WITHIN clause in a phrase. For example, you cannot write: term1 WITHIN section term2
You cannot combine WITHIN with expansion operators, such as $ ! and *.
Since WITHIN is a reserved word, you must escape the word with braces to search on it.
See Also:

Oracle Text Reference

You can query within attribute sections when you index with either XML_SECTION_GROUP, AUTO_SECTION_GROUP, or PATH_SECTION_GROUP your section group type.

Consider the following XML document:

<book title="Tale of Two Cities">It was the best of times.</book>

XML_SECTION_GROUP

If you use XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.ADD_ATTR_SECTION.

To define section, title@book, as the attribute section booktitle, you can use either of the following methods:

CTX_DLL.ADD_ATTR_SECTION procedure. The syntax for this is:
```
CTX_DDL.ADD_ATTR_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);
```
To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:
```
EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
```
When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:
```
'Cities within booktitle' 
```

Dynamically, after indexing, using the ALTER INDEX statement. The syntax for ALTER INDEX is:

ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];

where

paramstring = 'replace [datastore datastore_pref] 
                       [filter filter_pref] 
                       [lexer lexer_pref] 
                       [wordlist wordlist_pref] 
                       [storage storage_pref] 
                       [stoplist stoplist] 
                       [section group section_group]
                          [memory memsize]
|    ... add attr section section_name tag tag@attr
|    add stop section tag'

Dynamically the clause add attr section section_name tag tag@attr adds an attribute section section_name to the existing index. You must specify the XML tag and attribute in the form tag@attr. You can only add attribute sections to XML section groups.

The added section, section_name, applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag. The index is not rebuilt by this statement.

AUTO_ or PATH_SECTION_GROUP

When you use the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.

To search on Tale within the attribute section booktitle, include the following WITHIN clause in your SELECT statement:

If you are using XML_SECTION_GROUP:

... WHERE CONTAINS ('Tale WITHIN booktitle')>0;

If you are using AUTO_ or PATH_SECTION_GROUP
```
... WHERE CONTAINS ('Tale WITHIN title@book')>0;
```
See Also:
"Distinguishing Tags Across DocTypes" .

Constraints for Querying Attribute Sections

The following constraints apply to querying within attribute sections:

Regular queries on attribute text will not work unless qualified in a WITHIN clause. Using the following XML document:
```
<book title="Tale of Two Cities">It was the best of times.</book>
```
querying on Tale will not work unless qualified with 'WITHIN title@book'.
You cannot use attribute sections in a nested WITHIN query.
Phrases ignore attribute text. For example, if the original document looked like:
```
....Now is the time for all good <word type="noun"> men </word> to come to 
the aid......
```
The search would result in a regular query's, "good men", and ignore the intervening attribute text.

Using INPATH or HASPATH Operators for Query Searching With XPath-like Expressions

Use the INPATH and HASPATH operators only when your index has been created with PATH_SECTION_GROUP.

Use of PATH_SE CTION_GROUP enables path searching. Path searching extends the syntax of the WITHIN operator so that the section name operand (right-hand-side) is a path instead of a section name.

Table 8-2 lists the different ways you can use the INPATH operator for path searching.

Table 8-2 Path Searching XML Documents Using the INPATH Operator

Path Search Feature	Syntax	Description
Simple Tag Searching	virginia INPATH (STATE) virginia INPATH (//STATE)	Finds all documents where the word "virginia" appears between <STATE> and </STATE>. The STATE element can appear at any level of the document structure.
Case-sensitivity	virginia INPATH (STATE) virginia INPATH (State)	Tags and attribute names in path searching are case-sensitive. virginia INPATH STATE -- finds <STATE>virginia</STATE> but NOT <State>virginia</State>. To find the latter you must do virginia INPATH State.
Top-Level Tag Searching	virginia INPATH (Legal) virginia INPATH (/Legal)	Finds all documents where "virginia" appears in a Legal element which is the top-level tag.'Legal' MUST be the top-level tag of the document.'virginia' may appear anywhere in this tag - regardless of other intervening tags. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState> </Address> ... </Legal>
Any Level Tag Searching	virginia INPATH (//Address)	'Virginia' can appear anywhere within an 'Address' tag, which may appear within any other tags. for example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState> VIRGINIA </AddressState>... </Legal>
Direct Parentage Path Searching	virginia INPATH (//CourtInformation/Location)	Finds all documents where "virginia" appears in a Location element which is a direct child of a CourtInformation element. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState> VIRGINIA </AddressState> </Address>... </CourtInformation>
Single-Level Wildcard Searching	virginia INPATH(A//B) 'virginia INPATH (//CaseCaption//Location)'	Finds all documents where "virginia" appears in a B element which is a grandchild of an A element. For instance, <A><D><B>virginia</B></D></A>. The intermediate element does not need to be an indexed XML tag. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState>... </Legal>
Multi-level Wildcard Searching	'virginia INPATH (Legal//Filing//*/CourtInformation)'	'Legal' must be a top-level tag, and there must be exactly one tag-level between 'Legal' and 'Filing', and two between 'Filing' and 'CourtInformation'. 'Virginia' may then appear anywhere within 'CourtInformation'. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState> </Address> </Location> <CourtName> IN THE CIRCUIT COURT OF LOUDOUN COUNTY </CourtName> </CourtInformation>....
Descendant Searching	virginia INPATH(A//B)	Finds all documents where "virginia" appears in a B element which is some descendant (any level) of an A element.
Attribute Searching	virginia INPATH(A/@B)	Finds all documents where "virginia" appears in the B attribute of an A element.
Descendant/Attribute Existence Testing	virginia INPATH (A[B])	Finds all documents where "virginia" appears in an A element which has a B element as a direct child. virginia INPATH A[.//B] -- Finds all documents where "virginia" appears in an A element which has a B element as a descendant (any level). virginia INPATH A[@B] -- Finds all documents where "virginia" appears in an A element which has a B attribute
Attribute Value Testing Within Equality	virginia INPATH A[@B = "foo"] That means that: virginia INPATH (A[@B = "pot of gold"]), would, with the default lexer and stoplist, match any of the following: <A B="POT OF GOLD">virginia</A> By default, lexing is case-independent, so "pot" matches "POT", <A B="POT BLACK GOLD">virginia</A> By default, "of" is a stopword, and, in a query, would match any word in that position, <A B=" Pot OF Gold ">virginia</A>	Finds all documents where "virginia" appears in an A element which has a B attribute whose value is "foo". Only equality is supported as a test. Range operators and functions are not supported. The left-hand-side of the equality MUST be an attribute or tag. Literals here are not allowed. The right-hand-side must be a literal. Tags and attributes here are not allowed. Within equality (See "Using the HASPATH Operator for Path Searching" ) is used to evaluate the test. Whitespace is mainly ignored in text indexing. Again, lexing is case-independent: <A B="pot_of_gold">virginia</A> Underscore is a non-alphabetic character, and is not a join character by default. As a result, it is treated more or less as whitespace and breaks up that string into three words.
Numeric Equality	virginia INPATH (A[@B = 5])	Numeric literals are allowed. But they are treated as text. The within equality is used to evaluate.This means that the query does NOT match. Thst is, <A B="5.0">virginia</A> does not match A[@B=5] where "5.0", a decimal is not considered the same as 5, an integer.
Conjunctive Testing	virginia INPATH (A[B AND C]) virginia INPATH (A[B AND @C = "foo"])...	Predicates can be conjunctively combined.
Combining Path and Node Tests	virginia INPATH (A[@B = "foo"]/C/D) virginia INPATH(A//B[@C]/D[E])...	Node tests can be applied to any node in the path.

Using the HASPATH Operator for Path Searching

Note:

The HASPATH operator functions in a similar fashion to the Existsnode() operator in XMLType. See Also Chapter 5, "Database Support for XML".

Only use the HASPATH operator when your index has been created with the PATH_SECTION_GROUP. The syntax for the HASPATH operator is:

WHERE CONTAINS(column, 'HASPATH(path)'...): Here HASPATH searches an XML document set and returns a score of 100 for all documents where path exists. Parent and child paths are separated with the / character, for example, A/B/C. For example, the query:
```
...WHERE CONTAINS (col,'HASPATH(A/B/C)')>0;
```
finds and returns a score of 100 for the document:
```
<A><B><C>Virginia</C></B></A>
```
without having to reference Virginia at all.
WHERE CONTAINS(column, 'HASPATH(A="value")'...): Here the HASPATH clause searches an XML document set and returns a score of 100 for all documents that have element A with content value and only that value. HASPATH is used to test equality. This is the "Section Equality Testing" feature of the HASPATH operator. The query:
```
...WHERE CONTAINS virginia INPATH A
```
finds <A>virginia</A>, but it also finds <A>virginia state</A>. To limit the query to the term virginia and nothing else, you can use a section equality test with the HASPATH operator. For example:
```
... WHERE CONTAINS (col,'HASPATH(A="virginia")'
```
finds and returns a score of 100 only for the first document, and not the second.

Using Oracle Text to Search XML Documents

To use Oracle Text to search and retrieve data from XML documents you must do the following overall tasks:

Create a section group
Create an Oracle Text index based on the section group you created
Build your query application using the CONTAINS operator

Before you create a section group and Oracle text index you must first determine the role you will need and grant the appropriate privilege. See "Oracle Text Users and Roles" , and grant the appropriate privilege.

After creating and preparing your data, you are ready to perform the next step. See Step 1. Create a Section Preference.

Using the section preference created, you then create an Oracle Text index. See Step 2. Create an Index Using the Section Preference Created in Step 1.

Now you can finish building your query application. See "Using Oracle Text to Search XML Documents".

First determine the role you need. See Oracle Text Reference and "Oracle Text Users and Roles" , and grant the appropriate privilege as follows:

CONNECT system/manager
GRANT ctxapp to scott;
CONNECT scott/tiger

Step 1. Create a Section Preference

Here we describe the basics of how to create section preferences using PATH_SECTION_GROUP, XML_SECTION_GROUP, and AUTO_SECTION_GROUP.

Table 8-3 describes the section groups you can use when indexing XML documents.

Table 8-3 Comparing Oracle Text Section Groups

Section Group	Description
XML_SECTION_GROUP	Use this group type for indexing XML documents and for defining sections in XML documents.
AUTO_SECTION_GROUP	Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups: You cannot add zone, field or special sections to an automatic section group. Automatic sectioning does not index XML document types (root elements.) However, you can define stop-sections with document type. The length of the indexed tags including prefix and namespace cannot exceed 64 characters. Tags longer than this are not indexed.
PATH_SECTION_GROUP	Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. With this section group you can do path searching with the INPATH and HASPATH operators. Queries are case-sensitive for tag and attribute names. How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP? Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added -- ZONE, FIELD, and SPECIAL sections cannot be added, When indexing XML document collections, you do not need to explicitly define sections as Oracle automatically does this for you. How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP? Path Searching is allowed at query time (see "Case Study: Searching an Online FAQ List Using Oracle Text" and "Using the HASPATH Operator for Path Searching" ) with the new INPATH and HASPATH operators, Tag and attribute names are case-sensitive in queries.

Section Group

Description

XML_SECTION_GROUP

Use this group type for indexing XML documents and for defining sections in XML documents.

AUTO_SECTION_GROUP

Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:

You cannot add zone, field or special sections to an automatic section group.
Automatic sectioning does not index XML document types (root elements.) However, you can define stop-sections with document type.
The length of the indexed tags including prefix and namespace cannot exceed 64 characters. Tags longer than this are not indexed.

PATH_SECTION_GROUP

Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. With this section group you can do path searching with the INPATH and HASPATH operators. Queries are case-sensitive for tag and attribute names.

How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP?

Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added -- ZONE, FIELD, and SPECIAL sections cannot be added, When indexing XML document collections, you do not need to explicitly define sections as Oracle automatically does this for you.

How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP?

Path Searching is allowed at query time (see "Case Study: Searching an Online FAQ List Using Oracle Text" and "Using the HASPATH Operator for Path Searching" ) with the new INPATH and HASPATH operators, Tag and attribute names are case-sensitive in queries.

Note:

If you are using the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index an XML document collection, you need not explicitly define sections since the system does this for you during indexing.

Deciding Which Section Group to Use

How do you determine which section groups is best for your application? This depends on your application. Table 8-4 lists some general guidelines to help you decide which of the XML_, AUTO_, or PATH_ section groups to use when indexing your XML documents, and why.

Table 8-4 Guidelines for Choosing XML_, AUTO_, or PATH_ Section Groups

Application Criteria	XML_section_...	AUTO_section_...	PATH_section_...
You are using XPATH search features			X
You know the layout and structure of your XML documents, and you can predefine the sections on which users are most likely to search.	X
You do not know which tags users are most likely to search.		X
Query performance, in general	Fastest	Little slower than XML_section_...	Little slower than AUTO_section_...
Indexing performance, in general	Fastest	Little slower than XML_section_...	Little slower than AUTO_section_...
Index size	Smallest	Little larger than XML_section_...	Little larger than AUTO_section_...
Other features	Mappings can be defined so that tags in one or different DTDs can be mapped to one section. Good for DTD evolution and data aggregation.	Simplest. No need to define mapping, add_stop_section can be used to ignore some sections.	Designed for more sophisticated XPATH- like queries

Creating a Section Preference with XML_SECTION_GROUP

The following command creates a section group called, xmlgroup, with the XML_SECTION_GROUP group type.

EXEC ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

You can add sections to this group using CTX_DDL.ADD_SECTION. Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:

<BOOK TITLE="Tale of Two Cities">  It was the best of times. </BOOK>

To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:

EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');

When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:

'Cities within booktitle'

Creating a Section Preference with AUTO_SECTION_GROUP

You can set up your indexing operation to automatically create sections from XML documents using the section group AUTO_SECTION_GROUP. Here, Oracle creates zone sections for XML tags. Attribute sections are created for those tags that have attributes, and these attribute sections are named in the form "tag@attribute."

The following command creates a section group called autogroup with the AUTO_SECTION_GROUP group type. This section group automatically creates sections from tags in XML documents.

EXEC ctx_ddl.create_section_group('autogroup', 'AUTO_SECTION_GROUP');

Note:

You can add attribute sections only to XML section groups. When you use AUTO_SECTION_GROUP, attribute sections are created automatically. Attribute sections created automatically are named in the form tag@attribute.

Creating a Section Preference with PATH_SECTION_GROUP

To enable path section searching, index your XML document with PATH_SECTION_GROUP. For example:

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');

Step 2. Create an Index Using the Section Preference Created in Step 1

Create an index depending on which section group you used to create a preference:

Creating an Index Using XML_SECTION_GROUP

To index your XML document when you have used XML_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context 
     parameters('section group xmlgroup');

Creating an Index Using AUTO_SECTION_GROUP

The following statement creates the index, myindex, on a column containing XML files using the AUTO_SECTION_GROUP:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group autogroup');

Creating an Index Using PATH_SECTION_GROUP

To index your XML document when you have used PATH_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group xmlpathgroup');

See Also:

Oracle Text Reference for detailed notes on CTX_DDL.

Oracle Text Example 1: Creating an Index Using XML_SECTION_GROUP

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

/* ADDING A FIELD SECTION */
EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
  (  group_name   =>'my_section_group',
     section_name =>'author',/* do this for EVERY tag used after "WITHIN" */
     tag          =>'author'
  );

EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
      (  group_name   =>'my_section_group',
         section_name =>'document',/*do this for EVERY tag after "WITHIN" */
         tag          =>'document'
      );

  ...
/
/* ADDING AN ATTRIBUTE SECTION */
EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');

/* The more sections you add to your index, the longer your search will take.*/
/* Useful for defining attributes in XML documents as sections. This allows*/
/* you to search XML attribute text using the WITHIN operator.*/ 
/*  The section name:
/* ** Is used for WITHIN queries on the attribute text. 
   ** Cannot contain the colon (:) or dot (.) characters. 
   ** Must be unique within group_name. 
   ** Is case-insensitive. 
   ** Can be no more than 64 bytes. 
   **  The tag specifies the name of the attribute in tag@attr format. This is
       case-sensitive. */
/* Names used as arguments of the keyword WITHIN can be different from the
   actual XML tag names. Many tags can be mapped to the same name at query
   time.*/ 

/* ADDING A ZONE SECTION */
/* If You have an XML document that contains the <book> tag declared for */
/* different document types. You can create a distinct book section for each */ 
/* document type. If mydocname is declared in your DTD as an XML document */
/* type (root element) as follows: */

<!DOCTYPE mydocname ... [...

/* Within mydocname, element <book> is declared. For this tag, you can create */ 
/* a section named, mybooksec, that's sensitive to the tag's document type as */ 
/* follows: */

EXEC ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec', 'mydocname(book)');

/* Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need 
to search on. */

CREATE INDEX my_index ON my_table ( my_column )
  INDEXTYPE IS ctxsys.context
  PARAMETERS ( 'SECTION GROUP my_section_group' );

SELECT my_column FROM my_table
  WHERE CONTAINS(my_column, 'smith WITHIN author') > 0;

Oracle Text Example 2: Creating an Index Using AUTO_SECTION_GROUP

ctx_ddl_create_section_group('auto', 'AUTO_SECTION_GROUP');

CREATE INDEX myindex ON docs(xmlfile_column) 
    INDEXTYPE IS ctxsys.context 
    PARAMETERS ('filter ctxsys.null_filter SECTION GROUP auto');

SELECT xmlfile_column FROM docs
    WHERE CONTAINS (xmlfile_column, 'virginia WITHIN title')>0;

Oracle Text Example 3: Creating an Index Using PATH_SECTION_GROUP

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');



CREATE INDEX myindex ON xmldocs(xmlfile_column) 
    INDEXTYPE IS ctxsys.context 
    PARAMETERS ('section group xmlpathgroup');



SELECT xmlfile_column FROM xmldocs
... WHERE CONTAINS (column, 'Tale WITHIN title@book')>0;

Building XML Query Applications with Oracle Text

Building XML query applications with Oracle Text includes the following topics:

Querying XML Documents

Distinguishing Tags Across DocTypes

In previous releases, the XML section group was unable to distinguish between tags in different DTD's. For instance, perhaps you have a DTD for storing contact information:

  <!DOCTYPE contact>
  <contact>
    <address>506 Blue Pool Road</address>
    <email>dudeman@radical.com</email>
  </contact>

Appropriate sections might look like:

  ctx_ddl.add_field_section('mysg','email',  'email');
  ctx_ddl.add_field_section('mysg','address','address');

This is fine until you have a different kind of document in the same table:

<!DOCTYPE mail>
 <mail>
   <address>dudeman@radical.com</address>
</mail>

Now your address section, originally intended for street addresses, starts picking up email addresses, because of tag collision.

Specifying Doctype Limiters to Distinguish Between Tags

Oracle8i release 8.1.5 and higher allow you to specify doctype limiters to distinguish between these tags across doctypes. Simply specify the doctype in parentheses before the tag as follows:

  ctx_ddl.add_field_section('mysg','email','email');
  ctx_ddl.add_field_section('mysg','address','(contact)address');
  ctx_ddl.add_field_section('mysg','email','(mail)address');

Now when the XML section group sees an address tag, it will index it as the address section when the document type is contact, or as the email section when the document type is mail.

Doctype-Limited and Unlimited Tags in a Section Group

If you have both doctype-limited and unlimited tags in a section group:

  ctx_ddl.add_field_section('mysg','sec1','(type1)tag1');
  ctx_ddl.add_field_section('mysg','sec2','tag1');

Then the limited tag applies when in the doctype, and the unlimited tag applies in all other doctypes.

Querying is unaffected by this -- the query is done on the section name, not the tag, so querying for an email address would be done like:

  radical WITHIN email

which, since we have mapped two different kinds of tags to the same section name, finds documents independent of which tags are used to express the email address.

Querying Within Attribute Sections

You can query within attribute sections when you index with either of the following as your section group type:

XML_SECTION_GROUP
AUTOMATIC_SECTION_GROUP
PATH_SECTION_GROUP

Consider the following XML document:

<book title="Tale of Two Cities">It was the best of times.</book>

You can define the section title@book to be the attribute section title. You can do so with the CTX_DLL.ADD_ATTR_SECTION procedure or dynamically after indexing with ALTER INDEX.

Note:

When you use the AUTO_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.
If you use the XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.ADD_ATTR_SECTION.

To search on Tale within the attribute section title, you issue the following query:

'Tale WITHIN BOOK@TITLE'

You cannot use attribute sections in a nested WITHIN query.

Phrases ignore attribute text. For example, if the original document looked like:

Now is the time for all good <word type="noun"> men </word> to come to the aid.

The WITHIN operator requires you to know the name of the section you search. A list of defined sections can be obtained using the CTX_SECTIONS or CTX_USER_SECTIONS views.

See Also:

XML_SECTION_GROUP Attribute Sections

In Oracle8i Release 1(8.1.5) and higher, XML_SECTION_GROUP offers the ability to index and search within attribute values. Consider a document with the following lines:

<comment author="jeeves">
    I really like Oracle Text
  </comment>

Now XML_SECTION_GROUP offers an attribute section. This allows the inclusion of attribute values to index. For example:

  ctx_ddl.add_attr_section('mysg','author','comment@author');

The syntax is similar to other add_section calls. The first argument is the name of the section group, the second is the name of the section, and the third is the tag, in the form <tag_name>@<attribute_name>. This tells Oracle Text to index the contents of the author attribute of the comment tag as the section "author".

Query syntax is just like for any other section:

WHERE CONTAINS ( ... ,'jeeves WITHIN author...',...)...

and finds the document.

Attribute Value Sensitive Section Search

Attribute sections allow you to search the contents of attributes. They do not allow you to use attribute values to specify sections to search. For instance, given the document:

 <comment author="jeeves">
    I really like Oracle Text
 </comment>

You can find this document by asking:

jeeves within comment@author

which is equivalent to "find me all documents which have a comment element whose author attribute's value includes the word jeeves".

However, there you cannot currently request the following:

interMedia within comment where (@author = "jeeves")

in other words, "find me all documents where interMedia appears in a comment element whose author is jeeves". This feature -- attribute value sensitive section searching -- is planned for future versions of the product.

Dynamic Add Section

Because the section group is defined before creating the index, Oracle8i Release 1 (8.1.5) is limited in its ability to cope with changing structured document sets; if your documents start coming with new tags, or you start getting new doctypes, you have to re-create the index to start making use of those tags.

In Oracle8i Release 2 (8.1.6) and higher allows you to add new sections to an existing index without rebuilding the index, using alter index and the new add section parameters string syntax:

add zone  section <section_name> tag <tag>
add field section <section_name> tag <tag> [ visible | invisible ]

For instance, to add a new zone section named tsec using the tag title:

alter index <indexname> rebuild
parameters ('add zone section tsec tag title')

To add a new field section named asec using the tag author:

alter index <indexname> rebuild
parameters ('add field section asec tag author')

This field section would be invisible by default, just like when using add_field_section. To add it as visible field section:

alter index <indexname> rebuild
parameters ('add field section asec tag author visible')

Dynamic add section only modifies the index meta-data, and does not rebuild the index in any way. This means that these sections take effect for any document indexed after the operation, and do not affect any existing documents -- if the index already has documents with these sections, they must be manually marked for re-indexing (usually with an update of the indexed column to itself).

This operation does not support addition of special sections. Those would require all documents to be re-indexed, anyway. This operation cannot be done using rebuild online, but it should be a fairly quick operation.

Constraints for Querying Attribute Sections

The following constraints apply to querying within attribute sections:

Regular queries on attribute text do not hit the document unless qualified in a within clause. Assume you have an XML document as follows:
```
 <book title="Tale of Two Cities">It was the best of times.</book>
```
A query on Tale by itself does not produce a hit on the document unless qualified with WITHIN title@book. This behavior is like field sections when you set the visible flag set to false.
You cannot use attribute sections in a nested WITHIN query.
Phrases ignore attribute text. For example, if the original document looked like:
```
Now is the time for all good <word type="noun"> men </word> to come to the 
aid.
```
Then this document would hit on the regular query good men, ignoring the intervening attribute text.

WITHIN queries can distinguish repeated attribute sections. This behavior is like zone sections but unlike field sections. For example, for the following document:

<book title="Tale of Two Cities">It was the best of times.</book>
<book title="Of Human Bondage">The sky broke dull and gray.</book>

Assume the book is a zone section and book@author is an attribute section. Consider the query:

'(Tale and Bondage) WITHIN book@author'

This query does not hit the document, because tale and bondage are in different occurrences of the attribute section book@author.

Procedure for Building a Query Application with Oracle Text

To build the query application with Oracle Text carry out the indexing steps first..

The next step is to build your query application. To do so follow these steps:

Create a preference using the procedure, CTX_DDL.create_preference. See "Step 1. Create a Preference"
Set preference's attributes using CTX_DDL.Add_Attr_Section and so on. See "Step 2. Set the Preference's Attributes".
Create your query syntax

You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTOMATIC_SECTION_GROUP as your section group type.

Note:

Not everything in your document may be searchable. You must first state what is searchable using the.......add_....._section
The more sections you add to your index the longer the search will take!

Nested tag searching is supported in Oracle Text.

Using Table CTX_OBJECTS and CTX_OBJECT_ATTRIBUTES View

The CTX_OBJECT_ATTRIBUTES view displays attributes that can be assigned to preferences of each object. It can be queried by all users.

Check out the structure of CTX_OBJECTS and CTX_OBJECT_ATTRIBUTE view, with the following DESCRIBE commands. Because we are only interested in querying XML documents in this chapter, we focus on XML_SECTION_GROUP and AUTO_SECTION_GROUP.

Describe ctx_objects
  SELECT obj_class, obj_name FROM ctx_objects
  ORDRR BY obj_class, obj_name;

The result is:

...
SECTION_GROUP                  AUTO_SECTION_GROUP    <<==
SECTION_GROUP                  BASIC_SECTION_GROUP
SECTION_GROUP                  HTML_SECTION_GROUP
SECTION_GROUP                  NEWS_SECTION_GROUP
SECTION_GROUP                  NULL_SECTION_GROUP
SECTION_GROUP                  XML_SECTION_GROUP     <<==

...

Describe ctx_object_attributes
SELECT oat_attribute FROM ctx_object_attributes
  WHERE oat_object = 'XML_SECTION_GROUP';

The result is:

OAT_ATTRIBUTE
-------------
ATTR
FIELD
SPECIAL
ZONE

SELECT oat_attribute FROM ctx_object_attributes
  WHERE oat_object = 'AUTO_SECTION_GROUP';

The result is:

OAT_ATTRIBUTE
-------------
STOP

Step 1. Create a Preference

The first thing you must do is create a preference. To do this, use the CTX_DDL.Create_Preference procedure. For example:

CTX_DDL.Create_Preference

CTX_DDL.Create_Preference (
 preference_name  => 'books' /* or whatever you want to call it */
 object_name      => 'XML_SECTION GROUP' /* either XML_SECTION_GROUP or AUTO_
SECTION_GROUP */);

To drop this preference use the following syntax:

CTX_DDL.Drop_Preference (
 preference_name => 'books');

Step 2. Set the Preference's Attributes

To set the preference's attributes for XML_SECTION_GROUP, use the following procedures:

Add_Zone_Section
Add_Attr_Section
Add_Field_Section
Add_Special_Section

To set the preference's attributes for AUTO_SECTION_GROUP, use the following procedures:

Add_Zone_Section
Add_Attr_Section
Add_Field_Section

There are corresponding CTX_DDL.drop sections and CTX_DDL.remove section syntax.

2.1 Using CTX_DDL.add_zone_section

The syntax for CTX_DDL.add_zone_section follows:

CTX_DDL.Add_Zone_Section (
 group_name      => 'my_section_group' /* whatever you called it above */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

add_zone_section Guidelines

add_zone_section guidelines are listed here:

Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need to search on.

2.2 Using CTX_DDL.Add_Attr_Section

The syntax for CTX_DDL.add_attr_section follows:

CTX_DDL.Add_Attr_Section ( /* call this as many times as you need to describe
                             the attribute sections */
 group_name      => 'my_section_group' /* whatever you did call it above */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

add_attr_section Guidelines

add_attr_section guidelines are listed here:

Consider meta_data attribute author:

<meta_data author = "John Smith" title="How to get to Mars">

The more sections you add to your index, the longer your search will take.

add_attr_section adds an attribute section to an XML section group. This procedure is useful for defining attributes in XML documents as sections. This allows you to search XML attribute text with the WITHIN operator.

The section_name:

Is the name used for WITHIN queries on the attribute text.
Cannot contain the colon (:) or dot (.) characters.
Must be unique within group_name.
Is case-insensitive.
Can be no more than 64 bytes.

The tag specifies the name of the attribute in tag@attr format. This is case-sensitive.

Note:

In the add_attr_section procedure, you can have many tags all represented by the same section name at query time. Explained in another way, the names used as the arguments of the keyword WITHIN can be different from the actual XML tag names. That is many tags can be mapped to the same name at query time. This feature enhances query usability.

2.3 Using CTX_DDL.add_field_section

The syntax for CTX_DDL.add_field_section follows:

CTX_DDL.Add_Field_Section (
 group_name     => 'my_section_group' /* whatever you called it above */
 section_name   => 'qq' /* what you want to call this section */
 tag            => 'my_tag' /* what represents it in XML */ );
 visible        => TRUE or FALSE );

add_field_section Guidelines

add_field_section guidelines are listed here:

add_field_section and add_zone_section attributes differ in performance.
In a document, tags can be repeated two or more times, however some tags, such as "title", occur only once. A DTD (or XML Schema) define how many times the tags occur.

Visible attribute: This is available in add_field_section but not available in the add_zone_section. If VISIBLE is set to TRUE then a query such as "... CONTAINS virginia... becomes irrelevant if cat occurs in title or paragraph.

Consider again the query, "... CONTAINS virginia...". You may not get a hit if you use VISIBLE=TRUE. If VISIBLE=FALSE, the index will be smaller. You may lose some functionality but your performance will be improved, compared to if you set VISIBLE =TRUE.

If you set up your index using the add_zone_section....

If you set up your index using the add_field_section....

Note:

Constructing an index is harder if you have to cater for the fact that there could be more than one occurrence of any one tag.

If the tag in your XML document occurs only once, use the single add_field_section procedure. For example, ".... CONTAINS virginia and state WITHIN title......
If the tag in your XML document occurs more than once, use the add_zone_section procedure. For example, ".... CONTAINS virginia and state WITHIN paragraph....". This has many possibilities.

How Attr_Section Differs From Field_Section

Attribute section differs from field section in the following ways:

Attribute text is considered invisible, though, so the following:
```
WHERE CONTAINS (..., '... jeeves',...)...
```
does NOT find the document, somewhat like field sections. Unlike field sections, however, attribute section within searches can distinguish between occurrences. Consider the document:
```
  <comment author="jeeves">
    I really like Oracle Text
  </comment>
  <comment author="bertram">
    Me too
  </comment>
```
the query:
```
WHERE CONTAINS (...,'(cryil and bertram) WITHIN author', ...)...
```
will NOT find the document, because "jeeves" and "bertram" do not occur within the SAME attribute text.
Attribute section names cannot overlap with zone or field section names, although you can map more than one tag@attr to a single section name. Attribute sections do not support default values. Given the document:
```
  <!DOCTYPE foo [
    <!ELEMENT foo (bar)>
    <!ELEMENT bar (#PCDATA)>
```
```
      <!ATTLIST bar
        rev CDATA "8i">

  ]>
  <foo>
    <bar>whatever</bar>
  </foo>
```
and attribute section:
```
  ctx_ddl.add_attr_section('mysg','barrev','bar@rev');
```
the query:

8i within barrev does not hit the document, although in XML semantics, the "bar" element has a default value for its "rev" attribute.

2.5 Using CtX_DDL.Add_Stop_Section

CtX_DDL.Add_Stop_Section (
 group_name     => 'my_section_group' /* whatever you called it above */
 section_name   => 'qq' /* what you want to call this section */ );

Step 3. Create Your Query Syntax

See the section, "Querying with the CONTAINS Operator" for information about how to use the CONTAINS operator in query statements.

Querying Within Attribute Sections

You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTO_SECTION_GROUP as your section group type.

Assume you have an XML document as follows:

<book title="Tale of Two Cities">It was the best of times.</book>

You can define the section title@book as the attribute section title. You can do so with the CTX_DLL.Add_Attr_Section procedure or dynamically after indexing with ALTER INDEX.

Note:

When you use the AUTO_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.

If you use the XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.Add_Attr_Section.

To search on Tale within the attribute section title, issue the following query:

WHERE CONTAINS (...,'Tale WITHIN title', ...)

Using XML_SECTION_GROUP and add_attr_section to Aid Querying

Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:

<BOOK TITLE="Tale of Two Cities"> 
It was the best of times. </BOOK> 
<Author="Charles Dickens">
Born in England in the town, Stratford_Upon_Avon </Author>

Recall the CTX_DDL.Add_Attr_Section syntax is:

CTX_DDL.Add_Attr_Section ( group_name, section_name, tag );

To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:

ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
ctx_ddl.add_attr_section('myxmlgroup', 'authors', 'author');
end;

When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE CONTAINS (...,'Cities WITHIN booktitle', ....)...

When you define the AUTHOR attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE 'England WITHIN authors'

Oracle Text Example 4: Querying a... Document

This example does the following:

Creates and populates table res_xml
Creates an index, section_group, and preferences
Paramaterizes the preferences

Runs a test query against res_xml

drop table res_xml;

CREATE TABLE res_xml (
  pk            NUMBER PRIMARY KEY ,
  text          CLOB
  ) ;

insert into res_xml values(111,
  'ENTITY chap8 "Chapter 8, <q>Keeping it Tidy: the XML Rule Book </q>"> this is 
the document section');
commit;

---
--- script to create index on res_xml
---

--- cleanup, in case we have run this before
DROP INDEX res_index ;
EXEC CTX_DDL.DROP_SECTION_GROUP ( 'res_sections' ) ;

--- create a section group
BEGIN
  CTX_DDL.CREATE_SECTION_GROUP ( 'res_sections', 'XML_SECTION_GROUP' ) ;
  CTX_DDL.ADD_FIELD_SECTION ( 'res_sections', 'chap8', '<q>') ;
END ;
/

begin
  ctx_ddl.create_preference
    (
      preference_name => 'my_basic_lexer',
      object_name     => 'basic_lexer'
    );
  ctx_ddl.set_attribute
    (
      preference_name => 'my_basic_lexer',
      attribute_name  => 'index_text',
      attribute_value => 'true'
    );
  ctx_ddl.set_attribute
    (
      preference_name => 'my_basic_lexer',
      attribute_name  => 'index_themes',
      attribute_value => 'false');
end;
/

CREATE INDEX res_index
  ON res_xml(text)
  INDEXTYPE IS ctxsys.context 
  PARAMETERS ( 'lexer my_basic_lexer SECTION GROUP res_sections' ) ;

Test the above index with a test query, such as:

SELECT pk FROM res_xml WHERE CONTAINS( text, 'keeping WITHIN chap8' )>0 ;

Oracle Text Example 5: Creating an Index and Performing a Text Query

Creating Table explain_ex to Use in this Example

drop table explain_ex; 
 
create table explain_ex 
  ( 
    id        number primary key,
    text      varchar(2000)
  ); 
 
insert into explain_ex ( id, text )
  values ( 1, 'thinks thinking thought go going goes gone went' || chr(10) ||
              'oracle orackle oricle dog cat bird'              || chr(10) ||
              'President Clinton' ); 
insert into explain_ex ( id, text )
  values ( 2, 'Last summer I went to New England'               || chr(10) ||
              'I hiked a lot.'                                  || chr(10) ||
              'I camped a bit.' ); 
commit;

Text Query Using "ABOUT" in the Text Query Expression

Set Define Off
select text 
  from explain_ex
  WHERE CONTAINS ( text,
  '( $( think & go ) , ?oracle ) & ( dog , ( cat & bird ) ) & about(mammal
                                                    during Bill Clinton)' ) > 0;

select text 
  from explain_ex
  WHERE CONTAINS ( text, 'about ( camping and hiking in new england )'  ) > 0;

Creating Sections in XML Documents that are Document Type Sensitive

Consider an XML document set that contains the <book> tag declared for different document types. You need to create a distinct book section for each document type. Assume that mydocname is declared as an XML document type (root element) as follows:

<!DOCTYPE mydocname ... [...

Within mydocname, the element <book> is declared. For this tag, you can create a section named mybooksec that is sensitive to the tag's document type as follows:

begin
ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec', 'mydocname(book)');
end;

Note:

Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group. The start tag you specify must be unique within a section group.
Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.

Repeated Sections

Zone sections can repeat. Each occurrence is treated as a separate section. For example, if <H1> denotes a heading section, they can repeat in the same documents as follows:

<H1> The Brown Fox </H1> 
<H1> The Gray Wolf </H1>

Assuming that these zone sections are named Heading.

The query:

WHERE CONTAINS (..., 'Brown WITHIN Heading', ...)...

returns this document.

But the query:

WHERE CONTAINS (...,' (Brown and Gray) WITHIN Heading',...)...

does not.

Overlapping Sections

Zone sections can overlap each other. For example, if <B> and <I> denote two different zone sections, they can overlap in document as follows:

plain <B> bold <I> bold and italic </B> only italic </I>  plain

Nested Sections

Zone sections can nest, including themselves as follows:

<TD> 
  <TABLE>
  <TD>nested cell</TD>
  </TABLE>
</TD>

Using the WITHIN operator, you can write queries to search for text in sections within sections.

Nested Section Query Example

For example, assume the BOOK1, BOOK2, and AUTHOR zone sections occur as follows in documents doc1 and doc2:

doc1:

<book1><author>Scott Tiger</author> This is a cool book to read.</book1>

doc2:

<book2> <author>Scott Tiger</author> This is a great book to read.</book2>

Consider the nested query:

'Scott WITHIN author WITHIN book1'

This query returns only doc1.

Presenting the Results of Your Query

A Text query application allows you to view the documents returned by a query. You typically select a document from the hitlist and then your application presents the document in some form.

With Oracle Text, you can render a document in different ways. For example, with the query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English. This rendering uses the CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP procedures.

You can also obtain theme information from documents with the CTX_DOC.THEMES PL/SQL package. Besides these there are several other CTX_DOC procedures for presenting your query results.

See Also:

Oracle Text Reference for more information on the CTX_DOC package.

Case Study: Searching an Online FAQ List Using Oracle Text

Note:

You can download this sample application from http://otn.oracle.com/products/text.

Consider the scenario where your company has several FAQs for each product. Each FAQ is an XML document similar to the following:

<?xml version="1.0"?> 
    <FAQ OWNER="Billy Text"> 
      <TITLE>Oracle Text FAQ</TITLE> 
        <DESCRIPTION>Everything you always wanted to know ...</DESCRIPTION> 
        <QUESTION>What is Oracle Text?</QUESTION> 
        <ANSWER>Oracle Text uses standard SQL to index, ...</ANSWER> 
    </FAQ>

In this sample FAQ case study, we are only using 3 FAQs, with titles: Text, Performance, and XML. The sample FAQ program searches for information within the FAQs' XML tags using the WITHIN operator. The pull-down menu for the FAQ user interface is generated at run-time. Figure 8-1 shows the online FAQ Search user interface. The pull down menu shows the XML elements selected for use in searching the FAQ data:

Title
Question
FAQ
Description
Answer

"FAQ Owner" is actually an element attribute. Attributes are also searchable. These tags or elements are used here to assist users in fine tuning their keyword search for desired FAQs.

Figure 8-1 Online FAQ Search User Interface: Search Options

Text description of text1mail6P.jpeg follows.

Text description of the illustration text1mail6P.jpeg

Figure 8-2 shows to enter a search for keyword "XML", in all TITLE elements of the FAQs. The result of the search here is one FAQ with title, "XML..."

Figure 8-2 Creating an Online FAQ Search User Interface with Oracle Text: Searching for "XML" Within TITLE

Text description of textnsma.jpg follows.

Text description of the illustration textnsma.jpg

Figure 8-3 shows how you can enter an attribute search for "Billy" within FAQ OWNER by using TITLE, where TITLE is the attribute of the element FAQ OWNER. The syntax is "...WHERE Billy WITHIN faq@owner".

Figure 8-3 Creating an Online FAQ Search User Interface with Oracle Text: Attribute Searching for "Billy" Within "FAQ OWNER"

Text description of text3mailKD.jpeg follows.

Text description of the illustration text3mailKD.jpeg

To create your online FAQ search program for your company, follow these steps:

1 Create and Populate Your FAQ Table. Create an Auto Section Group and Oracle Text Index

2 Compile showxml.psp

3 Compile faqsearch.psp

1 Create and Populate Your FAQ Table. Create an Auto Section Group and Oracle Text Index

Run faqsearch_install.sql. This script does the following:

Creates the table faq.
Populates table faq with three records of data.
Creates the section group and the Text index.

faqsearch_install.sql

insert into faq(tk,xml_desc)
 values(1,'<?xml version="1.0"?>
<FAQ OWNER="Billy Text">
<TITLE>Oracle Text FAQ</TITLE><DESCRIPTION>Everything you always wanted to know 
about Text but were afraid to ask</DESCRIPTION>
<QUESTION>What is Oracle Text?</QUESTION>
    <ANSWER>Oracle Text uses standard SQL to index, search, and analyze text and 
documents stored in the database, files or websites</ANSWER>
<QUESTION>What is ABOUT?</QUESTION>
    <ANSWER>ABOUT queries increase the number of relevant documents returned by 
a query.</ANSWER>');

insert into faq(tk,xml_desc)
 values(2,'<?xml version="1.0"?><FAQ OWNER="Jack Performance">
<TITLE>Text Performance Guide</TITLE>
<DESCRIPTION>Oracle Text and interMedia Text performance guide</DESCRIPTION>
<QUESTION>What do we mean by query performance anyway?</QUESTION>
    <ANSWER>There are generally two measures of query performance - response 
time (the time to get an answer to an individual query), and throughput (the 
number of queries that can be run in any time period, eg queries per 
second).</ANSWER></FAQ>');

insert into faq(tk,xml_desc)
 values(3,'<?xml version="1.0"?><FAQ OWNER="John XML">
<TITLE>XML FAQ</TITLE><DESCRIPTION>Oracle XML FAQ</DESCRIPTION>
<QUESTION>What is XML?</QUESTION>
    <ANSWER>XML stands for eXtensible Markup Language. XML s quickly becoming 
the standard way to identify and describe data on the web because it has proved 
broadly implementable and easy to deploy</ANSWER>
<QUESTION>What is the Oracle Kit?</QUESTION>
    <ANSWER>The Oracle XML Developer Kit (XDK) contains the basic building 
blocks for reading, manipulating, transforming and viewing XML documents. To 
provide a broad variety of deployment options, the Oracle XDK is available for 
Java, C, C++ and PL/SQL.</ANSWER></FAQ>');

commit;

begin
 ctx_ddl.create_section_group('faq_auto_section_group','auto_section_group');
end;
/

create index faq_idx on faq(xml_desc) indextype is ctxsys.context 
 parameters('section group faq_auto_section_group')
/

2 Compile showxml.psp

To develop and deploy PL/SQL Server Pages (PSP), you need the Oracle server at version 8.1.6 or later, together with a PL/SQL web gateway. Currently, the web gateways are Oracle Internet Application Server (iAS), the WebDB PL/SQL Gateway, and the OAS PL/SQL Cartridge. Before you start with PSP, you should have access to both the database server and the web server for one of these gateways.

To compile showxml.psp,use the command:

loadpsp -replace -user username/passwd showxml.psp

See Also:

Oracle9i Application Developer's Guide - Fundamentals for information about using PSP programs.

showxml.psp


<%@ plsql procedure="showxml" %>
<%@ plsql parameter="id" default="null" %>
<%! v_text      varchar2(32767); %>
<%! v_text_xml  varchar2(32767); %>

<% 
  select xml_desc into v_text from faq where tk=id;
%>
  
<html>
<title>Show xml </title>
<body>
<h3>XML Content</h3>
<hr>

  <pre> 
  <% v_text_xml := replace(v_text,'<','&lt;'); %>

  <%= v_text_xml %>
  </pre>

</body>
</html>

3 Compile faqsearch.psp

Open a URL in your browser to access faqsearch.psp, as follows:

http://myserver_and_directory/faqsearch

faqsearch.psp


<%@ plsql parameter="query" default="null" %>
<%@ plsql parameter="tagvalue" default="null"  %>

<%! v_results numeric := 0; %>

<html>
<head>
   <title>FAQ Search </title>
</head>
<body>

<%
  If query is null Then
%>

  <center>
  <h3>FAQ Search </h3>
  <form method=post action="faqsearch">
  <p>
  Search for <input type=Text size=15 maxlength=25 name="query">
  within     
  <select name="tagvalue">

  <%
-- generates the pull-down menu with the following select
      for c in (select token_text from DR$FAQ_IDX$I
                where token_type=2)
        loop
  %>
          <option value="<%= c.token_text %>"><%= c.token_text %>

  <%  end loop; %>
     
     <option value="faq@Owner">FAQ Owner
  </select>

  <input type=submit value="Search">
  </form>


<% Else %>
   <%!
     color varchar2(6) := 'ffffff';
   %>
   <center>
   <h3>FAQ Search</h3>
   <form method=post action="faqsearch">
   <p>
   Search for 
   <input type=text size=15 maxlength=25 name="query" value=<%= query %>>
   within
   <select name="tagvalue">

   <%
-- generates the pull-down menu with the following select
      for c in (select token_text from DR$FAQ_IDX$I
                where token_type=2)
      loop
   %>
          <option value="<%= c.token_text %>"><%= c.token_text %>

   <% end loop; %>

      <option value="faq@Owner">FAQ Owner
   </select>

   <input type=submit value="Search">
   </form>

   <p>

   <%! v_query        varchar2(400);  %>
   <%! v_desc_substr  varchar2(1000); %>
   <%! v_desc_item    varchar2(1000); %>
   <%! v_desc_start   number;         %>
   <%! v_desc_final   number;         %>
   <%! v_title_substr varchar2(1000); %>
   <%! v_title_start  number;         %>
   <%! v_title_final  number;         %>


   <% 
     
     v_query := query || ' WITHIN '|| tagvalue;

-- Text query using WITHIN for XML documents
     for doc in (select tk, xml_desc 
                    from faq where contains(xml_desc,v_query) >0 
                ) 
        loop
          v_results := v_results + 1;
          if v_results = 1 then 
   %>

             <table border="1" cellpadding="4" cellspacing="0">
               <tr bgcolor="#6699CC">
                 <th>Title</th>
                 <th>Description</th>
               </tr>
   <%     end if; %>

          <tr bgcolor="#<%= color %>">

   <% 
           v_title_start  := instr(doc.xml_desc,'<TITLE>');
           v_title_final  := instr(doc.xml_desc,'</TITLE>');
           v_title_substr := substr(doc.xml_desc,v_title_
start+length('<TITLE>'),v_title_final - length('</TITLE>') - v_title_start+1);

           v_desc_start   := instr(doc.xml_desc,'<DESCRIPTION>');
           v_desc_final   := instr(doc.xml_desc,'</DESCRIPTION>');
           v_desc_substr  := substr(doc.xml_desc,v_desc_
start+length('<DESCRIPTION>'),v_desc_final - length('</DESCRIPTION>') - v_desc_
start+1); 
   %>

           <td>
             <a href="showxml?id=<%= doc.tk %>"><%= v_title_substr %></a>
           </td>

   <%      v_desc_item := replace(v_desc_substr,'<','&lt;'); %>

           <td>
             <%= v_desc_item %> 
           </td>
          </tr>

   <%
          if (color = 'ffffff') then /* alternate row color */
             color := 'eeeeee';
           else
             color := 'ffffff';
          end if;
        end loop;
   %>

        </table>
     </center>

  <% 
     if v_results = 0 then %>
       <center><h3>Found 0 records for your query</h3></center>
  <% end if; %>

<% End if;%>
<p>
<hr>
<p>
</body>
</html>

Frequently Asked Questions (FAQs): Oracle Text

This FAQ section is divided into the following categories:

Searching Attribute Values

Can I Build Indexes on Attribute Values?

Question

Currently Oracle Text (intermedia Text) has the option to create indexes based on the content of a section group. But most XML Elements are of the type of Element. So, the only option for searching would be attribute values. Can I build indexes on attribute values?

Answer

Releases from 8.1.6 and higher allow attribute indexing.

See the following site:

http://otn.oracle.com/products/intermedia/htdocs/text_training_816/Samples/imt_816_techover.html#SCN

General Oracle Text Questions

Can XML Documents Be Queried Like Table Data?

Question

I know that an intact XML document can be stored in a CLOB in ORACLE's XML solution.

Can XML documents stored in a CLOB/BLOB be queried like table schema? For example:
```
[XML document stored in BLOB]...<name id="1111"><first>lee</first>            
<sencond>jumee</second></name>...
```
Can value(lee, jumee) be queried by elements, attributes and the structure of XML document?
If some element or attribute is inserted/updated/deleted, must the whole document be updated? Or can insert/update/delete function as in table schema?
About locking, if we manage an XML document stored in a CLOB/BLOB, can nobody access the same XML document?

Answer

Using Oracle Text (intermedia Text), you can find this document with a query such as:
```
lee within first or this:jumee within second or this:1111 within name@id
```
you can combine these like this:
```
lee within first and jumee within secondor this:(lee within first) within 
name.
```
For more information, please read the "interMedia Text Technical Overview" for 8.1.5 and 8.1.6 available on OTN.
Oracle Text (intermedia Text) indexes CLOB/BLOB, and this has no knowledge about XML specifically, so you cannot really change individual elements. You have to edit the document as a whole.
Just like any other CLOB, if someone is writing to the CLOB, they have it locked and nobody else can write to the CLOB. Other users can READ it, but not write. This is basic LOB behavior.

Another alternative is to decompose the XML document and store the information in relational fields. Then you could modify individual elements, have element-level simultaneous access, and so on. In this case, using something called the USER_DATASTORE, you can use PL/SQL to reconstitute the document to XML for text indexing. Then, you get text search as if it were XML, but data management as if it were relational data. Again, see interMedia Text Technical Overview for more information.

http://otn.oracle.com/products/text.

Can we Search Based on Structural Conditions?

Question

Is it possible for Oracle Text (intermedia Text) to index XML such as:

2/7/1968

and then process a query such as:

Who has brown hair, that is, select name from person where hair.color = "BROWN"

Answer

Searches based on structural conditions are not yet available through Oracle Text (intermedia Text). Attribute searches are supported fromOracle8i Release 2 (8.1.6). For reference you should not put data in attributes as that will not be compliant with XML Schema when it becomes a recommendation.

How Can I Searching XML Documents and Return a Zone?

Question

I need to store a large XML file in Oracle8i, search it, and return a specific tagged area.Using Oracle Text (intermedia Text) some of this is possible:

I can store an XML file in a CLOB field
I can index it with ctxsys.context
I can create <Zones> and <Fields> to represent the Tags in my XML fileEx. ctx_ddl.add_zone_section(xmlgroup,"dublincore", dc);
I can search for text within a Zone or fieldEx. Select title from mytable where CONTAINS(textField,"some words WITHIN doubleness")

How do I return a zone or a field based on a text search?

Answer

Oracle Text (intermedia Text) will only return the "hits". You will need to subsequently parse the CLOB to extract a section.

Loading XML Documents into the Database and Searching with Oracle Text

Question

How do I insert XML documents into a database? I need to insert the XML document "as is" in column of datatype CLOB into a table.

Answer

Oracle's XML SQL Utility for Java offers a command-line utility that can be used for loading XML data. More information can be found on the XML SQL Utility at: http://otn.oracle.com/tech/xml and here in Chapter 7, "XML SQL Utility (XSU)".

You can insert the XML documents as you would any text file. There is nothing special about an XML-formatted file from a CLOB perspective.

Question 2

Oracle Text (intermedia Text) can be used to index and search XML stored in CLOBs? How can we get started?

Answer 2

Versions of Oracle Text (intermedia Text) before Oracle8i Release 2 (8.1.6) only allowed tag-based searching. The current version allows for XML structure and attribute based searching. There is documentation on how to have the index built and the SQL usage in the Oracle Text documentation.

See Also:

Oracle Text Reference.

How Do I Search XML using the WITHIN Operator?

Question

I have this xml:

<person>
  <name>efrat</name> 
 <childrens> 
     <child> 
         <id>1</id> 
         <name>keren</name> 
     </child>
  </childrens>
</person>

How do I find the person who has a child name keren but not the person's name keren? Assuming I defined every tag with the add_zone_section that can be nested and can include themselves.

Answer

Use selectSingleNode or selectNodes with XPATH string as a parameter.eg. selectSingleNode("//child/name[.='keren'])Also, I recommend making id as an attribute instead of a tag.

Oracle Text (intermedia Text) and XML

Question

Where can I get good samples of searching XML with Oracle Text.

Answer

See the following manuals:

Oracle Text (intermedia Text) and XML: Add_field_section

Question

Can Oracle Text (intermedia Text) recognize the tags in my XML document or do I have to use the add_field_section command for each tag in the XML document? My XML documents have hundreds of tags. Is there an easy way to do this?

Answer

Which version of the database are you using? I believe you need to do it for 8.1.5 but not in Oracle8i Release 2(8.1.6). You can use AUTO_SECTION_GROUP in 8.1.6

XSQL Servlet ships with a complete (albeit simple from the interMedia standpoint) example of a SQL script that creates a complex XML Datagram out of Object Types, and then creates an Oracle Text (intermedia Text) index on the XML Document Fragment stored in the "Insurance Claim" type.

If you download the XSQL Servlet, and look at the file ./xsql/demo/insclaim.sql you'll be able to see the interMedia stuff at the bottom of the file. One of the key new features in interMedia in Oracle8i Release 2(8.1.6) is the AUTO Sectioner for XML.

Can I Do Range Searching with Oracle Text?

Question

I have an XML document that I have stored in CLOB. I have also created the indexes on the tags using section_group, and so on. One of the tags is <SALARY> </SALARY> I want to write an SQL statement so as to select all the records that have salary lets say > 5000. How do I do this? I cannot use WITHIN operator. I want to interpret the value present in this tag as a number. This could be floating point number also since this is salary.

Answer

You cannot do this in Oracle Text. Range search is not really a text operation. The best solution is to use the other Oracle XML parsing utilities to extract the salary into a NUMBER field -- then you can use Oracle Text (intermedia Text) for text searching, and normal SQL operators for the more structured fields, and achieve the same results.

Can Oracle Text Do Section Extraction?

Question

We are storing all our documents in XML format in a CLOB. Are there utilities available in Oracle perhaps interMedia to retrieve the contents a field at a time, that is given a field name, get the text between tags, as opposed to retrieving the whole document and traversing the structure?

Answer

interMedia does not do section extraction. See the XML SQL Utility for this in Chapter 7, "XML SQL Utility (XSU)".

Can I Create a Text Index on Three Columns?

Question

I have created a view based on 7-8 tables and it has columns like, custordnumber, product_dscr, qty, prdid,shipdate, ship_status, and so on. I need to create an Oracle Text index on the three columns:

custordnumber
product_dsc
ship_status

Is there a way to create a Text index on these columns?

Answer

The short answer is yes. You have two options:

Use the USER_DATASTORE object to create a concatenated field on the fly during indexing;
Concatenate your fields and store them in an extra CLOB field in one of your tables. Then create the index on the CLOB field. If you're using Oracle8i Release 2(8.1.6) or higher, then you also have the option of placing XML tags around each field prior to concatenation. This gives you the capability of searching WITHIN each field.

How Fast is Oracle9i at Indexing Text and Can I Just Enable Boolean Searches?

Question

We are using mySQL to do partial indexing of 9 million Web pages a day. We are running on a 4-processor Sparc 420 and unable to do full text indexing. Can Oracle8i or Oracle9i do this?

We are not interested in transactional integrity, applying any special filters to the text pages, or in doing any other searching other than straight boolean word searches (no scoring, no stemming, no fuzzy searches, no proximity searches, and so on).

I have are the following questions:

Will Oracle8i or oracle9i be any faster at indexing text than mySQL?
If so, is there a way to disable all the features of text indexing except for boolean word searches?

Answer

Yes. Oracle Text (interMedia Text) can create a full-text index on 9 million web pages - and pretty quickly. In a benchmark on a large Sun box, we indexed 100Gig of web pages (about 15 million) in 7 hours. We can also do partial indexing via regular DML or (in 9i) via partitioning.

You can do "indexing light" to some extent - you can disable theme indexing, you do not need to filter documents if they are already ASCII/HTML/XML, and most common expansions - fuzzy, stemming, proximity - are done at query time.

How Can We Index XML Documents in Different Languages?

Question

We know that Oracle 8i Release 2 (8.1.6) allows multiple language records to be stored in the same table (and column) and interMedia handles the index appropriately based on the language setting for each row (using the multi-lexer feature).

Currently we use one CLOB column in the table and it is indexed using Oracle Text. The column content is in XML (tagged) format and we use fields, sections and zone functions for indexing and search. This works perfectly as we only have a single language data in the database (and we have different database for different languages and sites) and we are currently using Oralce8i Release 1 (8.1.5) so we have NLS_LANG appropriately set for indexing and searches work correctly for individual languages.

However, we now have to store multi-lingual data in the same table (and column). Individual data elements may also be in different languages. For example, this is a record for a book that has a French title but Spanish authors. At present we have all this information in the CLOB column separated by fields/sections. My questions are:

I presume there is no way to specify language for individual sections within an index in Oracle8i Release 2 (8.1.6) Is this correct?
We could separate out all the fields that could potentially be in different language into different columns in the same table and then have a corresponding language column for each of those columns and use the multi-lexer functionality to build separate indexes. Is this assumption correct or recommended?
If we do as described above, then we need to have multiple CONTAINS clauses when searching across columns, which can adversely affect performance.
How best we can approach this issue?

Answer

Correct.

2) - 3) You have correctly identified the potential problem.

Searching XML Documents in CLOBs

How Do I Search CLOBs Using Oracle Text?

Question

How would I define interMedia parameters so that I would be able to search my CLOB column for records that contained "aorta" and "damage". For example using the following XML (DTD implied):

WellKnownFileName.gif   echocardiogram   aorta

This is an image of the vessel damage. It would be nice to see a simple (or complicated) example of an XML interMedia implementation.

I assume there is no need to setup the ZONE or FIELDS.....Is this the case?

Answer

If you save an XML Document fragment in a CLOB, and enable an Oracle Text (intermedia Text) XML index on it, then you can do an SQL query which uses the CONTAINS() operator as the following query does:

Assume you have a document like an insurance claim:

77804
1999-01-01 00:00:00.0                     8895                           1044               
Paul               Astoria
123 Cherry Lane                  SF                  CA                  94132                                                                        
1999-01-05 00:00:00.0               7600               JCOX                                          
It was becase of Faulty Brakes

If you store the content as a document fragment in a CLOB, then you can do a query like the following (assuming everything else you store in relational tables):

REM Select the SUM of the amounts of
REM  all settlement payments approved by "JCOX" 
REM  for claims whose  relates to Brakes.
select sum(n.amount) as TotalApprovedAmount  
  from insurance_claim_view v, TABLE(v.settlements) n 
  where n.approver = 'JCOX'  
  and contains(damageReport,'Brakes within Cause') >

How Can I Search Different XML Documents Stored in CLOBs With Different DTDs?

Question

If I store XML in CLOBs and use the DOM or SAX to reparse the XML later as needed.How can I search this document repository? Oracle Text (intermedia Text) seems ideal. Do you have an example of setting this up using intermedia in Oracle8i, demonstrating how to define the XML_SECTION_GROUP and where to use a ZONE as opposed to a FIELD, and so on? For example:

How would I define Intermedia parameters so that I would be able to search my CLOB column for records that had the "aorta" and "damage" in the using the following XML (DTD implied) WellKnownFileName.gif echo cardiogram aorta This is an image of the vessel damage.

Question

Oracle8i Release 2 (8.1.6) and higher allow searching within attribute text. That's something like: state within book@author. Oracle now offers attribute value sensitive search, more like the following:

state within book[@author = "Eric"]:

begin  ctx_ddl.create_section_group('mygrp','basic_section_group');
  ctx_ddl.add_field_section('mygrp','keyword','keyword');
  ctx_ddl.add_field_section('mygrp','caption','caption');
end;
create index myidx on mytab(mytxtcolumn)indextype is ctxsys.contextparameters   
('section group mygrp');
select * from mytab where contains(mytxtcolumn, 'aorta within keyword')>0;
options:

Use XML section group instead of basic section group if your tags have attributes or you need case-sensitive tag detection
Use zone sections instead of field sections if your sections overlap, or if you need to distinguish between instances. For instance, keywords. If keywords is a field section, then (aorta and echo cardiogram) within keywords finds the document. If it is a zone section, then it does not, because they are not in the SAME instance of keywords.

It is not so clear. It looks to me like this example is trying to find instances of elements containing "damage" that have a sibling element containing "aorta" within the same record. It's not clear what exactly he means by "record".

If each record equates to the in his example, and there can be multiple records in a single XML LOB, than I don't see how you could do this search with interMedia.

If there is only one per CLOB/row, than perhaps you could find it by ANDing two context element queries. But that would still be a sloppy sort of xml search relying on some expected limitations of the situation more so than the structural composition actually called for.

Storing an XML Document in CLOB: Using Oracle Text (intermedia Text)

Question

I need to store XML files (that are present on the file system as of now) into the database. I want to store the whole document. What I mean is that I do not want to break the document as per the tags and then store the info in separate tables/fields. Rather I want to have a universal table, that I can use to store different XML documents. I think internally it will be stored in a CLOB type of field in my case. My XML files will always contain ASCII data.

Can this be done using interMedia. Should we be using Oracle Text (intermedia Text) or interMedia Annotator for this? I downloaded Annotator from OTN, but I could not store XML document in the database.

I am trying to store XML document into CLOB column. Basically I have one table with the following definition:

CREATE TABLE xml_store_testing
(
    xml_doc_id  NUMBER,
    xml_doc     CLOB  )

I want to store my XML document in xml_doc field.

I have written another PL/SQL procedure shown below, to read the contents of the XML Document. The XML document is available on the file system. XML document contains just ASCII data - no binary data.

CREATE OR REPLACE PROCEDURE FileExec
(
  p_Directory      IN VARCHAR2, 
  p_FileName       IN VARCHAR2)  
  AS   v_CLOBLocator  CLOB;
       v_FileLocator       BFILE;
BEGIN
    SELECT  xml_doc
    INTO      v_CLOBLocator
    FROM    xml_store_testing
    WHERE  xml_doc_id = 1
    FOR        UPDATE;
    v_FileLocator := BFILENAME(p_Directory, p_FileName);
    DBMS_LOB.FILEOPEN(v_FileLocator, DBMS_LOB.FILE_READONLY);
    dbms_output.put_line(to_char(DBMS_LOB.GETLENGTH(v_FileLocator)));
      DBMS_LOB.LOADFROMFILE(v_CLOBLocator, v_FileLocator,
      DBMS_LOB.GETLENGTH(v_FileLocator));
      DBMS_LOB.FILECLOSE(v_FileLocator);
END FileExec;

Answer

Put the XML documents into your CLOB column, then add an Oracle Text (intermedia Text) index on it using the XML section-group. See the documentation and overview material at http://otn.oracle.com/products/intermedia.

Question 2

When I execute this procedure, it executes successfully. But when I select from the table I see unknown characters in the table in CLOB field. Could this be because of the reason of the character set difference between operating system (where XML file resides) and database (where CLOB data resides).

Answer 2

Yes. If the character sets are different then you probably have to pass the data through UTL_RAW.CONVERT to do a character set conversion before writing to the CLOB.

Can We Only Insert Structured When The Table is Created?

Question

We need to insert data in the Database from an XML file. Currently we only can insert structured data with the table already created. Is this true?

We are working in a law project where we need to store laws that have structured data and unstructured data, and then search the data using Oracle Text (interMedia Text). Can we insert unstructured data too? Or do we need to develop a custom application to do it? Then if we have the data stored with some structured parts and some unstructured parts, can we use Oracle Text to search it? If we stored the unstructured part in a CLOB, and the CLOB has tags, how can we search only data in an specific tag?

Answer

Consider using iFS which allows you to break up a document storing it across tables and in a LOB. Oracle Text can perform data searches with tags and is knowledgeable about the hierachical XML structure. From Oracle8i Release 2 (8.1.6), Oracle Text (intermedia Text) has this capability along with name/value pair attribute searches.

Question 2

Hence, this document breaking is not possible in these cases if I don't create a custom development? Although interMedia does not understand hierachical XML structure, can I do something like this?

<report>
   <day>yesterday</day> there was a disaster <cause>hurricane</cause>
</report>

Indexing with Oracle Text I would like to search LOBs where cause was hurricane. Is this possible?

Answer 2

You can perform that level of searching with the current release of Oracle Text (intermedia Text). Currently to break a document up you have to use the XML Parser with XSLT to create a stylesheet that transforms the XML into DDL. iFS gives you a higher level interface.

Another technique is to use a JDBC program to insert the text of the document or document fragment into a CLOB or LONG column, then do the searching using the CONTAINS() operator after setting up the indexes.

8 Searching XML Data with Oracle Text

Introducing Oracle Text

Accessing Oracle Text

Further Oracle Text Examples

Assumptions Made in this Chapter's Examples

Oracle Text Users and Roles

User CTXSYS

Role CTXAPP

Querying with the CONTAINS Operator

CONTAINS Syntax

Table 8-1 CONTAINS Operator: Syntax Description

Using a Simple SELECT Statement

Using the Score Operator with a Label to Obtain the Relevance

Using the WITHIN Operator to Narrow Query Down to Document Sections

WITHIN Syntax for Section Querying

WITHIN Operator Limitations

XML_SECTION_GROUP

AUTO_ or PATH_SECTION_GROUP

Constraints for Querying Attribute Sections

Using INPATH or HASPATH Operators for Query Searching With XPath-like Expressions

Table 8-2 Path Searching XML Documents Using the INPATH Operator

Using the HASPATH Operator for Path Searching

Using Oracle Text to Search XML Documents

Step 1. Create a Section Preference

Table 8-3 Comparing Oracle Text Section Groups

Deciding Which Section Group to Use

Table 8-4 Guidelines for Choosing XML_, AUTO_, or PATH_ Section Groups

Creating a Section Preference with XML_SECTION_GROUP

Creating a Section Preference with AUTO_SECTION_GROUP

Creating a Section Preference with PATH_SECTION_GROUP

Step 2. Create an Index Using the Section Preference Created in Step 1

Creating an Index Using XML_SECTION_GROUP

Creating an Index Using AUTO_SECTION_GROUP

Creating an Index Using PATH_SECTION_GROUP

Oracle Text Example 1: Creating an Index Using XML_SECTION_GROUP

Oracle Text Example 2: Creating an Index Using AUTO_SECTION_GROUP

Oracle Text Example 3: Creating an Index Using PATH_SECTION_GROUP

Building XML Query Applications with Oracle Text

Querying XML Documents

Distinguishing Tags Across DocTypes

Specifying Doctype Limiters to Distinguish Between Tags

Doctype-Limited and Unlimited Tags in a Section Group

Querying Within Attribute Sections

XML_SECTION_GROUP Attribute Sections

Attribute Value Sensitive Section Search

Dynamic Add Section

Constraints for Querying Attribute Sections

Procedure for Building a Query Application with Oracle Text

Using Table CTX_OBJECTS and CTX_OBJECT_ATTRIBUTES View

Step 1. Create a Preference

CTX_DDL.Create_Preference

Step 2. Set the Preference's Attributes

2.1 Using CTX_DDL.add_zone_section

add_zone_section Guidelines

2.2 Using CTX_DDL.Add_Attr_Section

add_attr_section Guidelines

2.3 Using CTX_DDL.add_field_section

add_field_section Guidelines

How Attr_Section Differs From Field_Section

2.5 Using CtX_DDL.Add_Stop_Section

Step 3. Create Your Query Syntax

Querying Within Attribute Sections

Using XML_SECTION_GROUP and add_attr_section to Aid Querying

Oracle Text Example 4: Querying a... Document

Oracle Text Example 5: Creating an Index and Performing a Text Query

Creating Table explain_ex to Use in this Example

Text Query Using "ABOUT" in the Text Query Expression

Creating Sections in XML Documents that are Document Type Sensitive

Repeated Sections

Overlapping Sections

Nested Sections

Nested Section Query Example

Presenting the Results of Your Query

Case Study: Searching an Online FAQ List Using Oracle Text

Figure 8-1 Online FAQ Search User Interface: Search Options

Figure 8-2 Creating an Online FAQ Search User Interface with Oracle Text: Searching for "XML" Within TITLE

Figure 8-3 Creating an Online FAQ Search User Interface with Oracle Text: Attribute Searching for "Billy" Within "FAQ OWNER"

1 Create and Populate Your FAQ Table. Create an Auto Section Group and Oracle Text Index

faqsearch_install.sql

2 Compile showxml.psp

8
Searching XML Data with Oracle Text