Oracle Text Application Developer's Guide
Part Number A96517-01
This chapter describes document presentation.
In Oracle Text query applications, you can present selected documents with query terms highlighted for text queries or with themes highlighted for ABOUT queries.
You can generate three types of output associated with highlighting: a marked-up version of the document, a plain text version of the document (filtered output), and highlight offset information for the document.
The three types of output are generated by three different procedures in the CTX_DOC (document services) PL/SQL package. In addition, you can obtain plain text and HTML versions for each type of output.
For text highlighting, you supply the query, and Oracle highlights the words in the document that satisfy the query. You can obtain plain-text or HTML highlighting.
For ABOUT queries, the CTX_DOC procedures highlight and mark up words or phrases that best represent the ABOUT query.
There are three highlighting procedures in CTX_DOC: CTX_DOC.HIGHLIGHT, CTX_DOC.MARKUP, and CTX_DOC.FILTER.
Highlight offset information is useful when you write your own custom routines for displaying documents.
To obtain highlight offset information, use the CTX_DOC.HIGHLIGHT procedure. This procedure takes a query and a document, and returns highlight offset information for either plain text or HTML format.
With offset information, you can display a highlighted version of the document as desired. For example, you can display the document with different font types or colors rather than using the standard plain text markup obtained from CTX_DOC.MARKUP.
See also: Oracle Text Reference for more information about using CTX_DOC.HIGHLIGHT.
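As an illustration of retrieving highlight offsets, the following is a minimal sketch rather than a definitive listing. It assumes an index named newsindex, a document with textkey 34, and the query term dog (all placeholder values), and it uses the in-memory CTX_DOC.HIGHLIGHT_TAB result type, whose records carry an offset and a length.

```sql
declare
  -- in-memory table of (offset, length) records, one per highlight
  hilites ctx_doc.highlight_tab;
begin
  -- 'newsindex', '34', and 'dog' are assumed placeholder values
  ctx_doc.highlight('newsindex', '34', 'dog', hilites, plaintext => TRUE);

  -- print where each highlight starts and how long it is
  for i in 1..hilites.count loop
    dbms_output.put_line('offset ' || hilites(i).offset ||
                         ', length ' || hilites(i).length);
  end loop;
end;
```

A custom display routine could use these offsets to wrap each matching span in its own font or color.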
The CTX_DOC.MARKUP procedure takes a document reference and a query, and returns a marked-up version of the document. The output can be either marked-up plain text or marked-up HTML.
You can customize the markup sequence for HTML navigation.
See also: Oracle Text Reference for more information about CTX_DOC.MARKUP.
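The following sketch shows one way a marked-up HTML version might be requested with custom tags. The index name newsindex, the textkey 34, the query term dog, and the choice of bold tags are assumptions made for illustration. The result is returned in a temporary CLOB, which the code frees after use.

```sql
declare
  mklob clob;  -- NULL on entry, so a temporary CLOB is allocated for the result
begin
  -- 'newsindex', '34', and 'dog' are assumed placeholder values;
  -- the start and end tags wrap each highlighted term
  ctx_doc.markup('newsindex', '34', 'dog', mklob,
                 plaintext => FALSE,
                 starttag  => '<b>',
                 endtag    => '</b>');

  -- show the beginning of the marked-up document
  dbms_output.put_line(dbms_lob.substr(mklob, 40, 1));

  -- de-allocate the temporary CLOB
  dbms_lob.freetemporary(mklob);
end;
```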
When documents are stored in their native formats such as Microsoft Word, you can use the filter procedure CTX_DOC.FILTER to obtain either a plain text or HTML version of the document.
See also: Oracle Text Reference for more information about CTX_DOC.FILTER.
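A minimal sketch of obtaining the plain text version of a stored document follows. The index name newsindex and the textkey 34 are assumed placeholder values. Setting plaintext to FALSE would request the HTML version instead.

```sql
declare
  fklob clob;  -- NULL on entry, so a temporary CLOB is allocated for the result
begin
  -- 'newsindex' and '34' are assumed placeholder values
  ctx_doc.filter('newsindex', '34', fklob, plaintext => TRUE);

  -- show the beginning of the filtered text
  dbms_output.put_line(dbms_lob.substr(fklob, 40, 1));

  -- de-allocate the temporary CLOB
  dbms_lob.freetemporary(fklob);
end;
```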
The following table describes lists of themes, gists, and theme summaries.
Output Type: Description
List of Themes: You can generate a list of themes where each theme is a single word or phrase, or where each theme is a hierarchical list of parent themes.
Gist: Text in a document that best represents what the document is about as a whole.
Theme Summary: Text in a document that best represents a given theme in the document.
To obtain this output, you use procedures in the CTX_DOC supplied package. With this package, you can do the following:
Identify documents by ROWID in addition to primary key
A list of themes is a list of the main concepts in a document. Use the CTX_DOC.THEMES procedure to generate lists of themes.
See also: Oracle Text Reference to learn more about the command syntax for CTX_DOC.THEMES.
The following example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.
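declare
  -- in-memory table that receives the themes
  the_themes ctx_doc.theme_tab;
begin
  -- generate the top 10 themes for the document with textkey '1'
  ctx_doc.themes('myindex', '1', the_themes, num_themes => 10);
  -- display each theme and its weight
  for i in 1..the_themes.count loop
    dbms_output.put_line(the_themes(i).theme || ':' || the_themes(i).weight);
  end loop;
end;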
To create a theme table:
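A theme result table can be sketched as follows. The table name CTX_THEMES is an assumed name chosen for illustration. The three columns follow the theme table layout that CTX_DOC.THEMES expects when it writes results to a table.

```sql
-- CTX_THEMES is an assumed table name; the columns hold the query
-- identifier, the theme text, and the theme weight
create table ctx_themes (
  query_id number,
  theme    varchar2(2000),
  weight   number
);
```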
To obtain a list of themes where each element in the list is a single theme, issue:
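A call of this kind might look like the following sketch. It assumes an index named newsindex, a document with textkey 34, an existing theme result table named CTX_THEMES (columns query_id, theme, weight), and a query identifier of 1, all of which are placeholder values.

```sql
begin
  -- full_themes => FALSE stores one single theme per row;
  -- index name, textkey, table name, and query_id are assumed placeholders
  ctx_doc.themes('newsindex', '34', 'CTX_THEMES', 1, full_themes => FALSE);
end;
```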
To obtain a list of themes where each element in the list is a hierarchical list of parent themes, issue:
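A call of this kind might look like the following sketch. It assumes an index named newsindex, a document with textkey 34, an existing theme result table named CTX_THEMES (columns query_id, theme, weight), and a query identifier of 1, all of which are placeholder values.

```sql
begin
  -- full_themes => TRUE stores each theme with its hierarchy of parent themes;
  -- index name, textkey, table name, and query_id are assumed placeholders
  ctx_doc.themes('newsindex', '34', 'CTX_THEMES', 1, full_themes => TRUE);
end;
```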
A gist is the text of a document that best represents what the document is about as a whole. A theme summary is the text of a document that best represents a single theme in the document.
Use the procedure CTX_DOC.GIST to generate gists and theme summaries. You can specify the size of the gist or theme summary when you call the procedure.
See also: Oracle Text Reference to learn about the command syntax for CTX_DOC.GIST.
The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.
declare
  gklob clob;
  amt   number := 40;
  line  varchar2(80);
begin
  -- gklob is NULL when passed in, so ctx_doc.gist allocates a temporary
  -- CLOB for us and places the results there.
  ctx_doc.gist('newsindex', '34', gklob, glevel => 'P', pov => 'GENERIC',
               numParagraphs => 10);
  dbms_lob.read(gklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:' || line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(gklob);
end;
To create a gist table:
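A gist result table can be sketched as follows. The table name CTX_GIST matches the name used by the examples in this section. The three columns follow the gist table layout that CTX_DOC.GIST expects when it writes results to a table.

```sql
-- the columns hold the query identifier, the point of view
-- (GENERIC or a theme), and the gist text itself
create table ctx_gist (
  query_id number,
  pov      varchar2(80),
  gist     clob
);
```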
The following example returns a default-sized, paragraph-level gist for document 34:
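A call of this kind might look like the following sketch, which assumes the index newsindex and an existing gist result table named CTX_GIST (columns query_id, pov, gist). Because no size parameter is passed, the default gist size applies.

```sql
begin
  -- no numParagraphs or maxPercent is given, so the default size is used;
  -- index name, table name, and query_id 1 are assumed placeholders
  ctx_doc.gist('newsindex', '34', 'CTX_GIST', 1, 'PARAGRAPH',
               pov => 'GENERIC');
end;
```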
The following example generates a non-default size gist of ten paragraphs:
begin
  ctx_doc.gist('newsindex', 34, 'CTX_GIST', 1, 'PARAGRAPH',
               pov => 'GENERIC', numParagraphs => 10);
end;
The following example generates a gist whose number of paragraphs is ten percent of the total paragraphs in the document:
begin
  ctx_doc.gist('newsindex', 34, 'CTX_GIST', 1, 'PARAGRAPH',
               pov => 'GENERIC', maxPercent => 10);
end;
The following example returns a theme summary on the theme of insects for the document with textkey 34. The default gist size is returned.
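A call of this kind might look like the following sketch, which assumes the index newsindex and an existing gist result table named CTX_GIST (columns query_id, pov, gist). The pov value is assumed to match a theme name exactly as CTX_DOC.THEMES returns it for this document, so the spelling and case of 'insects' here are illustrative.

```sql
begin
  -- pov names the theme to summarize; 'insects' is an assumed value that
  -- should match a theme returned by ctx_doc.themes for this document
  ctx_doc.gist('newsindex', '34', 'CTX_GIST', 1, 'PARAGRAPH',
               pov => 'insects');
end;
```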