Oracle Text Application Developer's Guide Release 9.0.1 Part Number A90122-01
Document Section Searching, 4 of 4
Like HTML documents, XML documents have tagged text that you can use to define blocks of text for section searching. You can search the contents of a section with the WITHIN or INPATH operators.
For XML searching, you can do the following: create sections automatically, search attribute text, create document-type-sensitive sections, and perform path section searching.
You can set up your indexing operation to automatically create sections from XML documents using the section group AUTO_SECTION_GROUP. The system creates zone sections for XML tags. Attribute sections are created for the tags that have attributes, and these sections are named in the form tag@attribute.
For example, the following command creates the index myindex on a column containing the XML files using the AUTO_SECTION_GROUP:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group ctxsys.auto_section_group');
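You can search XML attribute text in one of two ways: create attribute sections and query them with the WITHIN operator, or index with PATH_SECTION_GROUP and use the INPATH operator.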
Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:
<BOOK TITLE="Tale of Two Cities"> It was the best of times. </BOOK>
To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:
begin ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title'); end;
To index:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group myxmlgroup');
You can query the XML attribute section booktitle as follows:
'Cities within booktitle'
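In a complete statement, the query string is passed to the CONTAINS operator. The following is a minimal sketch only; the id column is an assumption made for illustration and is not defined in this guide.

```sql
-- Assumes xmldocs has a key column named id (hypothetical).
-- The label 1 ties SCORE(1) to this CONTAINS clause.
SELECT id, SCORE(1) AS relevance
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'Cities WITHIN booktitle', 1) > 0
 ORDER BY SCORE(1) DESC;
```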
You can search attribute text with the INPATH operator. To do so, you must index your XML document set with the PATH_SECTION_GROUP.
You have an XML document set that contains the <book> tag declared for different document types. You want to create a distinct book section for each document type.
Assume that mydocname1 is declared as an XML document type (root element) as follows:
<!DOCTYPE mydocname1 ... [...
Within mydocname1, the element <book> is declared. For this tag, you can create a section named mybooksec1 that is sensitive to the tag's document type as follows:
begin ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec1', 'mydocname1(book)'); end;
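Assume that mydocname2 is declared as another XML document type (root element) as follows:
<!DOCTYPE mydocname2 ... [...
Within mydocname2, the element <book> is declared. For this tag, you can create a section named mybooksec2 that is sensitive to the tag's document type as follows (if myxmlgroup already exists from the previous step, omit the create_section_group call, because a section group name can be created only once):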
begin ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec2', 'mydocname2(book)'); end;
To query within the section mybooksec1, use WITHIN as follows:
'oracle within mybooksec1'
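Because both sections belong to the same section group, one way to define them is in a single block that creates the group once. The following is a sketch under that assumption; the id column in the query is hypothetical.

```sql
BEGIN
  -- Create the section group once, then add one zone section per document type.
  ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
  ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec1', 'mydocname1(book)');
  ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec2', 'mydocname2(book)');
END;
/

-- After indexing with 'section group myxmlgroup', search only the
-- <book> elements that occur in documents of type mydocname1.
SELECT id FROM xmldocs WHERE CONTAINS(xmlfile, 'oracle WITHIN mybooksec1') > 0;
```

A query on mybooksec2 would be written the same way and would match only <book> elements in documents of type mydocname2.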
XML documents can have parent-child tag structures such as the following:
<A> <B> <C> dog </C> </B> </A>
In this example, tag C is a child of tag B which is a child of tag A.
With Oracle Text, you can do path searching with PATH_SECTION_GROUP. This section group lets you specify direct parentage in queries, for example to find all documents that contain the term dog in element C, where C is a child of element B, and so on.
With PATH_SECTION_GROUP, you can also perform attribute value searching and attribute equality testing.
The new operators associated with this feature are INPATH and HASPATH.
To enable path section searching, index your XML document set with PATH_SECTION_GROUP.
Create the section group:
begin ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP'); end;
Create the index:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group xmlpathgroup');
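To experiment with the path queries in this section, a small document set is enough. The sketch below assumes that xmldocs was created with a hypothetical id key column and a CLOB column xmlfile; neither definition appears in this guide. Rows added after the index is built are not searchable until the index is synchronized, so the sketch calls CTX_DDL.SYNC_INDEX, which is assumed to be available in your release.

```sql
-- Hypothetical sample rows; assumes xmldocs(id NUMBER PRIMARY KEY, xmlfile CLOB).
INSERT INTO xmldocs (id, xmlfile) VALUES (1, '<A>dog</A>');
INSERT INTO xmldocs (id, xmlfile) VALUES (2, '<A><B><C>dog</C></B></A>');
INSERT INTO xmldocs (id, xmlfile) VALUES (3, '<C><B>My dog is friendly.</B></C>');
COMMIT;

-- Process pending index updates so that the new rows become searchable.
BEGIN
  ctx_ddl.sync_index('myindex');
END;
/
```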
After you create the index, you can use the INPATH and HASPATH operators.
To find all documents that contain the term dog in the top-level tag <A>:
dog INPATH (/A)
or
dog INPATH(A)
To find all documents that contain the term dog in the <A> tag at any level:
dog INPATH(//A)
This query finds the following documents:
<A>dog</A>
and
<A><B><C>dog</C></B></A>
To find all documents that contain the term dog in a B element that is a direct child of a top-level A element:
dog INPATH(A/B)
This query finds the following XML document:
<A><B>My dog is friendly.</B></A>
but does not find:
<C><B>My dog is friendly.</B></C>
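The path expressions above are query strings; in SQL they are passed as the second argument of CONTAINS. A minimal sketch, assuming a hypothetical id column on xmldocs:

```sql
-- Assumes xmldocs has a key column named id (hypothetical).
-- dog inside a B element that is a direct child of a top-level A element:
SELECT id FROM xmldocs WHERE CONTAINS(xmlfile, 'dog INPATH(A/B)') > 0;

-- dog inside an A element at any level of the document:
SELECT id FROM xmldocs WHERE CONTAINS(xmlfile, 'dog INPATH(//A)') > 0;
```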
You can test the value of tags. For example, the query:
dog INPATH(A[B="dog"])
Finds the following document:
<A><B>dog</B></A>
But does not find:
<A><B>My dog is friendly.</B></A>
You can search the content of attributes. For example, the query:
dog INPATH(//A/@B)
Finds the document
<C><A B="snoop dog"> </A> </C>
You can test the value of attributes. For example, the query
California INPATH (//A[@B = "home address"])
Finds the document:
<A B="home address">San Francisco, California, USA</A>
But does not find:
<A B="work address">San Francisco, California, USA</A>
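When an attribute value test is embedded in a SQL statement, the double quotation marks of the path expression sit inside the single-quoted query string and need no escaping. A sketch, again assuming a hypothetical id column:

```sql
-- Assumes xmldocs has a key column named id (hypothetical).
-- Matches California only inside A elements whose B attribute
-- equals "home address".
SELECT id
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'California INPATH(//A[@B = "home address"])') > 0;
```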
You can test if a path exists with the HASPATH operator. For example, the query:
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
without the query having to reference dog at all.
You can use the HASPATH operator to do section equality tests. For example, the following query:
dog INPATH(A)
finds
<A>dog</A>
but it also finds
<A>dog park</A>
To limit the query to the term dog and nothing else, you can use a section equality test with the HASPATH operator. For example,
HASPATH(A="dog")
finds and returns a score of 100 only for the first document, and not the second.
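To see the score that a HASPATH query returns, you can select SCORE alongside the matching rows. The following is a sketch; the id column is an assumption and is not defined in this guide.

```sql
-- Assumes xmldocs has a key column named id (hypothetical).
-- The label 1 ties SCORE(1) to this CONTAINS clause.
SELECT id, SCORE(1) AS relevance
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'HASPATH(A="dog")', 1) > 0;
```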
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.