Oracle8i interMedia Text Reference
Release 2 (8.1.6)

Part Number A77063-01

Library

Product

Contents

Index

Go to previous page Go to beginning of chapter Go to next page

CTX_DOC Package , 5 of 8


MARKUP

The CTX_DOC.MARKUP procedure takes a query specification and a document textkey and returns a version of the document in which the query terms are marked-up. These marked-up terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query.

You can set the marked-up output to be either plaintext or HTML.

You can use one of the pre-defined tagsets for marking highlighted terms, including a tag sequence that enables HTML navigation.

You usually call CTX_DOC.MARKUP after a query, from which you identify the document to be processed.

You can store the marked-up document either in memory or in a result table.

Syntax 1: In-Memory Result Storage

CTX_DOC.MARKUP( 

index_name     IN VARCHAR2, 
textkey        IN VARCHAR2, 
text_query     IN VARCHAR2, 
restab         IN OUT CLOB, 
query_id       IN NUMBER    DEFAULT 0,  
plaintext      IN BOOLEAN   DEFAULT FALSE, 
tagset         IN VARCHAR2  DEFAULT 'TEXT_DEFAULT', 
starttag       IN VARCHAR2  DEFAULT NULL, 
endtag         IN VARCHAR2  DEFAULT NULL, 
prevtag        IN VARCHAR2  DEFAULT NULL, 
nexttag        IN VARCHAR2  DEFAULT NULL);

Syntax 2: Result Table Storage

CTX_DOC.MARKUP( 

index_name     IN VARCHAR2, 
textkey        IN VARCHAR2, 
text_query     IN VARCHAR2, 
restab         IN VARCHAR2, 
query_id       IN NUMBER    DEFAULT 0,  
plaintext      IN BOOLEAN   DEFAULT FALSE, 
tagset         IN VARCHAR2  DEFAULT 'TEXT_DEFAULT', 
starttag       IN VARCHAR2  DEFAULT NULL, 
endtag         IN VARCHAR2  DEFAULT NULL, 
prevtag        IN VARCHAR2  DEFAULT NULL, 
nexttag        IN VARCHAR2  DEFAULT NULL);
index_name

Specify the name of the index associated with the text column containing the document identified by textkey.

textkey

Specify the unique identifier (usually the primary key) for the document.

The textkey parameter can be one of the following:

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

text_query

Specify the original query expression used to retrieve the document.

restab

You can specify that this procedure store the marked-up text to either a table or to an in-memory CLOB.

To store results to a table specify the name of the table.

See Also:

For more information about the structure of the markup result table, see "Markup Table" in Appendix B

To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it.

If restab is not NULL, the CLOB is truncated before the operation.

query_id

Specify the identifier used to identify the row inserted into restab.

plaintext

Specify TRUE to generate plaintext marked-up document. Specify FALSE to generate a marked-up HTML version of document if you are using the INSO filter or indexing HTML documents.

tagset

Specify one of the following pre-defined tagsets. The second and third columns show how the four different tags are defined for each tagset:

Tagset  Tag  Tag Value 

TEXT_DEFAULT 

starttag 

<<< 

 

endtag 

>>> 

 

prevtag 

 

 

nexttag 

 

HTML_DEFAULT 

starttag 

<B> 

 

endtag 

</B> 

 

prevtag 

 

 

nexttag 

 

HTML_NAVIGATE 

starttag 

<A NAME=ctx%CURNUM><B> 

 

endtag 

</B></A> 

 

prevtag 

<A HREF=#ctx%PREVNUM>&lt;</A> 

 

nexttag 

<A HREF=#ctx%NEXTNUM>&gt;</A> 

starttag

Specify the character(s) inserted by MARKUP to indicate the start of a highlighted term.

The sequence of starttag, endtag, prevtag and nexttag with respect to the highlighted word is as follows:

... prevtag starttag word endtag nexttag...
endtag

Specify the character(s) inserted by MARKUP to indicate the end of a highlighted term.

prevtag

Specify the markup sequence that defines the tag that navigates the user to the previous highlight.

In the markup sequences prevtag and nexttag, you can specify the following offset variables which are set dynamically:

Offset Variable  Value 

%CURNUM 

the current offset number 

%PREVNUM 

the previous offset number 

%NEXTNUM 

the next offset number 

See the description of the HTML_NAVIGATE tagset for an example.

nexttag

Specify the markup sequence that defines the tag that navigates the user to the next highlight tag.

Within the markup sequence, you can use the same offset variables you use for prevtag. See the explanation for prevtag and the HTML_NAVIGATE tagset for an example.

Examples

In-Memory Markup

The following code generates a marked-up document and stores it in memory. The code passes a NULL CLOB locator to MARKUP and then de-allocates the returned CLOB locator after using it.

declare
mklob clob;
amt number := 40;
line varchar2(80);

begin
 ctx_doc.markup('myindex','1','dog & cat', mklob);
 -- mklob is NULL when passed-in, so ctx-doc.markup will allocate a temporary
 -- CLOB for us and place the results there.
 dbms_lob.read(mklob, amt, 1, line);
 dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
 -- have to de-allocate the temp lob
 dbms_lob.freetemporary(mklob);
 end;

Markup Table

Create the highlight markup table to store the marked-up document as follows:

create table markuptab (query_id  number,   
                        document  clob); 

Word Highlighting in HTML

To create HTML highlight markup for the words dog or cat for document 23, issue the following statement:

begin
  ctx_doc.markup(index_name => 'my_index',
                      textkey => '23',
                      text_query => 'dog|cat',
                      restab => 'markuptab',
                      query_id => '1'
                      tagset => 'HTML_DEFAULT');
end;

Theme Highlighting in HTML

To create HTML highlight markup for the theme of politics for document 23, issue the following statement:

begin
  ctx_doc.markup(index_name => 'my_index',
                      textkey => '23',
                      text_query => 'about(politics)',
                      restab => 'markuptab',
                      query_id => '1'
                      tagset => 'HTML_DEFAULT');
end;

Notes

Before CTX_DOC.MARKUP is called, the result table specified in restab must exist.

When textkey is a composite textkey, you must encode the composite textkey string using the CTX_DOC.PKENCODE procedure.

If text_query includes wildcards, stemming, fuzzy matching which result in stopwords being returned, MARKUP does not highlight the stopwords.

If text_query contains the threshold operator, the operator is ignored. The MARKUP procedure always returns highlight information for the entire result set.

When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.


Go to previous page Go to beginning of chapter Go to next page
Oracle
Copyright © 1996-2000, Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index