
Oracle® Text Application Developer's Guide
10g Release 1 (10.1)

Part Number B10729-01

5 Document Presentation

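This chapter describes document presentation. The following topics are covered:

  • Highlighting Query Terms

  • Obtaining Lists of Themes, Gists, and Theme Summaries

  • Document Presentation and Highlighting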

5.1 Highlighting Query Terms

In Oracle Text query applications, you can present selected documents with query terms highlighted for text queries or with themes highlighted for ABOUT queries.

You can generate three types of output associated with highlighting: a marked-up version of the document, a plain text version of the document (filtered output), and highlight offset information for the document.

The three types of output are generated by three different procedures in the CTX_DOC (document services) PL/SQL package. In addition, you can obtain plain text and HTML versions for each type of output.

5.1.1 Text highlighting

For text highlighting, you supply the query, and Oracle Text highlights the words in the document that satisfy the query. You can obtain plain-text or HTML highlighting.

5.1.2 Theme Highlighting

For ABOUT queries, the CTX_DOC procedures highlight and mark up words or phrases that best represent the ABOUT query.
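As a minimal sketch of theme highlighting, the block below passes an ABOUT query to CTX_DOC.MARKUP and asks for plain-text output with custom markers around the highlighted words or phrases. The index name idx_search_table, the textkey '1', the theme 'pets', and the << >> markers are illustrative assumptions; the highlighted text you get depends on the document and on how the index was created.

```sql
set serveroutput on

declare
  -- NULL locator: MARKUP allocates a temporary CLOB for the result.
  v_markup clob;
begin
  ctx_doc.markup(index_name => 'idx_search_table',   -- assumed index name
                 textkey    => '1',                  -- illustrative document key
                 text_query => 'about(pets)',        -- ABOUT query: theme highlighting
                 restab     => v_markup,
                 plaintext  => TRUE,                 -- plain-text markup, not HTML
                 starttag   => '<<',                 -- illustrative markers
                 endtag     => '>>');

  -- Print only the first 200 characters of the marked-up text.
  dbms_output.put_line(dbms_lob.substr(v_markup, 200, 1));

  -- Release the temporary CLOB once it is no longer needed.
  if dbms_lob.istemporary(v_markup) = 1 then
    dbms_lob.freetemporary(v_markup);
  end if;
end;
/
```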

5.1.3 CTX_DOC Highlighting Procedures

The CTX_DOC package provides the following procedures for highlighting and filtering documents:

  • CTX_DOC.HIGHLIGHT

  • CTX_DOC.MARKUP

  • CTX_DOC.FILTER

  • CTX_DOC.POLICY_FILTER

5.1.3.1 Highlight Procedure

Highlight offset information is useful when you write your own custom routines for displaying documents.

To obtain highlight offset information, use the CTX_DOC.HIGHLIGHT procedure. This procedure takes a query and a document, and returns highlight offset information for either plain text or HTML formats.

With offset information, you can display a highlighted version of the document as desired. For example, you can display the document with different font types or colors rather than using the standard plain text markup obtained from CTX_DOC.MARKUP.
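The following is a minimal sketch of the in-memory form of CTX_DOC.HIGHLIGHT. It assumes a CONTEXT index named idx_search_table (the index name used in the CTX_DOC.MARKUP example in this chapter), a document with the illustrative textkey '1', and the illustrative query 'pet'. Each element of the returned CTX_DOC.HIGHLIGHT_TAB holds the character offset and length of one highlighted region; a custom display routine would use these values to wrap each region in its own formatting.

```sql
set serveroutput on

declare
  -- One (offset, length) record per highlighted region.
  v_hits ctx_doc.highlight_tab;
begin
  ctx_doc.highlight(index_name => 'idx_search_table',  -- assumed index name
                    textkey    => '1',                 -- illustrative document key
                    text_query => 'pet',               -- illustrative query
                    restab     => v_hits,
                    plaintext  => TRUE);  -- offsets refer to the plain text version

  -- List every highlighted region found in the document.
  for i in 1 .. v_hits.count loop
    dbms_output.put_line('offset ' || v_hits(i).offset ||
                         ', length ' || v_hits(i).length);
  end loop;
end;
/
```

Passing plaintext => FALSE instead returns offsets into the HTML version of the document, which suits a routine that renders HTML.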


See Also:

Oracle Text Reference for more information about using CTX_DOC.HIGHLIGHT.

5.1.3.2 Markup Procedure

The CTX_DOC.MARKUP procedure takes a document reference and a query, and returns a marked-up version of the document. The output can be either marked-up plaintext or marked-up HTML.

You can also customize the markup tags, including the tags that CTX_DOC.MARKUP inserts to let users navigate between highlighted terms in HTML output.
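The tagset parameter selects a predefined set of markup tags. With 'HTML_NAVIGATE', CTX_DOC.MARKUP wraps each highlighted term in a named anchor and adds links to the previous and next highlighted terms. The sketch below assumes the idx_search_table index, an illustrative textkey '1' and query 'pet', and prints only the start of the result.

```sql
set serveroutput on

declare
  -- NULL locator: MARKUP allocates a temporary CLOB for the result.
  v_markup clob;
begin
  ctx_doc.markup(index_name => 'idx_search_table',   -- assumed index name
                 textkey    => '1',                  -- illustrative document key
                 text_query => 'pet',                -- illustrative query
                 restab     => v_markup,
                 tagset     => 'HTML_NAVIGATE');     -- HTML markup with prev/next links

  -- Print only the first 200 characters of the marked-up HTML.
  dbms_output.put_line(dbms_lob.substr(v_markup, 200, 1));

  -- Release the temporary CLOB once it is no longer needed.
  if dbms_lob.istemporary(v_markup) = 1 then
    dbms_lob.freetemporary(v_markup);
  end if;
end;
/
```

To supply your own navigation links instead of a predefined tagset, pass the prevtag and nexttag parameters together with starttag and endtag.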

5.1.3.2.1 CTX_DOC.MARKUP Example

The following example is taken from the Web application described in Appendix A, "CONTEXT Query Application". The procedure showDoc takes a document ID and a query, calls CTX_DOC.MARKUP to create a highlighted HTML version of the document in a temporary CLOB, and then reads the CLOB in chunks and writes it to the browser with the PL/SQL Web Toolkit (htp) procedures.

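create or replace procedure showDoc (p_id in varchar2, p_query in varchar2) is

 v_clob_selected   CLOB;              -- NULL on entry, so MARKUP allocates a temporary CLOB
 v_read_amount     integer;
 v_read_offset     integer;
 v_buffer          varchar2(32767);

 begin
   htp.p('<html><head><title>HTML version with highlighted terms</title></head>');
   htp.p('<body bgcolor="#ffffff">');
   htp.p('<b>HTML version with highlighted terms</b>');

   begin
     -- Wrap every query hit in document p_id with starttag/endtag.
     ctx_doc.markup (index_name => 'idx_search_table',
                     textkey    => p_id,
                     text_query => p_query,
                     restab     => v_clob_selected,
                     starttag   => '<i><font color=red>',
                     endtag     => '</font></i>');

     -- Copy the CLOB to the browser in chunks of up to 32767 characters.
     -- htp.prn adds no newline, so no whitespace appears at chunk boundaries.
     v_read_amount := 32767;
     v_read_offset := 1;
     begin
      loop
        dbms_lob.read(v_clob_selected,v_read_amount,v_read_offset,v_buffer);
        htp.prn(v_buffer);
        -- On return, v_read_amount holds the number of characters actually read.
        v_read_offset := v_read_offset + v_read_amount;
        v_read_amount := 32767;
      end loop;
     exception
      when no_data_found then
         null;  -- DBMS_LOB.READ raises NO_DATA_FOUND once the offset passes the end
     end;

     -- Release the temporary CLOB allocated by CTX_DOC.MARKUP.
     if dbms_lob.istemporary(v_clob_selected) = 1 then
       dbms_lob.freetemporary(v_clob_selected);
     end if;

   exception
     when others then
       null; --showHTMLdoc(p_id);
   end;

   htp.p('</body></html>');
end showDoc;
/
show errors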

See Also:

Oracle Text Reference for more information about CTX_DOC.MARKUP.

5.1.3.3 Filter Procedure

When documents are stored in their native formats such as Microsoft Word, you can use the filter procedure CTX_DOC.FILTER to obtain either a plain text or HTML version of the document.
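A minimal sketch of the in-memory form of CTX_DOC.FILTER follows, assuming the idx_search_table index and an illustrative textkey '1'. Setting plaintext to TRUE requests the plain text version; leaving it at its default of FALSE requests the HTML version.

```sql
set serveroutput on

declare
  -- NULL locator: FILTER allocates a temporary CLOB for the result.
  v_text clob;
begin
  ctx_doc.filter(index_name => 'idx_search_table',  -- assumed index name
                 textkey    => '1',                 -- illustrative document key
                 restab     => v_text,
                 plaintext  => TRUE);               -- plain text instead of HTML

  dbms_output.put_line('Filtered length: ' || dbms_lob.getlength(v_text));

  -- Release the temporary CLOB once it is no longer needed.
  if dbms_lob.istemporary(v_text) = 1 then
    dbms_lob.freetemporary(v_text);
  end if;
end;
/
```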


See Also:

Oracle Text Reference for more information about CTX_DOC.FILTER.

5.1.3.4 CTX_DOC.POLICY_FILTER Procedure

This procedure takes a binary document as a BLOB and uses the Inso filter to output text to a CLOB. It is useful with the MATCHES operator, which can take CLOB data as input. The procedure can also be called from a user datastore procedure as a binary-to-text filter.
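Because CTX_DOC.POLICY_FILTER works from a policy rather than an index, the sketch below first creates a policy with CTX_DDL.CREATE_POLICY that uses the Inso filter preference, then filters one BLOB. The policy name my_inso_policy and the table my_docs(id, doc BLOB) are hypothetical, and the sketch assumes that, as with the other in-memory CTX_DOC procedures, a NULL result locator causes a temporary CLOB to be allocated. Verify the exact signatures in Oracle Text Reference for your release.

```sql
-- One-time setup: a policy that filters with the Inso filter.
-- my_inso_policy is a hypothetical name.
begin
  ctx_ddl.create_policy(policy_name => 'my_inso_policy',
                        filter      => 'CTXSYS.INSO_FILTER');
end;
/

set serveroutput on

declare
  v_doc  blob;
  v_text clob;  -- NULL locator for the filtered output
begin
  -- my_docs(id, doc) is a hypothetical table of binary documents.
  select doc into v_doc from my_docs where id = 1;

  ctx_doc.policy_filter(policy_name => 'my_inso_policy',
                        document    => v_doc,
                        restab      => v_text,
                        plaintext   => TRUE);

  dbms_output.put_line(dbms_lob.substr(v_text, 200, 1));

  if dbms_lob.istemporary(v_text) = 1 then
    dbms_lob.freetemporary(v_text);
  end if;
end;
/
```

The resulting CLOB can then be used as input to a MATCHES query, as described above.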

5.2 Obtaining Lists of Themes, Gists, and Theme Summaries

The following table describes lists of themes, gists, and theme summaries.

Table 5-1 Lists of Themes, Gists, and Theme Summaries

Output Type      Description
List of Themes   A list of the main concepts of a document. You can generate a
                 list of themes where each theme is a single word or phrase, or
                 where each theme is a hierarchical list of parent themes.
Gist             Text in a document that best represents what the document is
                 about as a whole.
Theme Summary    Text in a document that best represents a given theme in the
                 document.

To obtain this output, you use procedures in the CTX_DOC supplied package. With this package, you can generate lists of themes, gists, and theme summaries, as described in the following sections.

5.2.1 Lists of Themes

A list of themes is a list of the main concepts in a document. Use the CTX_DOC.THEMES procedure to generate lists of themes.


See Also:

Oracle Text Reference to learn more about the command syntax for CTX_DOC.THEMES.

5.2.1.1 In-Memory Themes

The following example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.

set serveroutput on

declare
 the_themes ctx_doc.theme_tab;

begin
 -- Fetch at most 10 themes for the document whose textkey is '1'.
 ctx_doc.themes('myindex','1',the_themes, num_themes=>10);
 for i in 1..the_themes.count loop
  dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
 end loop;
end;

5.2.1.2 Result Table Themes

To create a theme table:

create table ctx_themes (query_id number, 
                         theme varchar2(2000), 
                         weight number);

5.2.1.2.1 Single Themes

To obtain a list of themes where each element in the list is a single theme, issue:

begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE);
end;

5.2.1.2.2 Full Themes

To obtain a list of themes where each element in the list is a hierarchical list of parent themes, issue:

begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE);
end;
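Both calls above write their rows to the CTX_THEMES table under query_id 1, so you can read the results back with an ordinary query. Because both use the same query_id, deleting the rows between runs (or giving each call its own query_id) keeps single-theme and full-theme results apart. A sketch, assuming the ctx_themes table created above:

```sql
-- Themes stored under query_id 1, highest weight first.
select theme, weight
  from ctx_themes
 where query_id = 1
 order by weight desc;

-- Clear the rows before re-running a call with the same query_id.
delete from ctx_themes where query_id = 1;
commit;
```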

5.2.2 Gist and Theme Summary

A gist is the text of a document that best represents what the document is about as a whole. A theme summary is the text of a document that best represents a single theme in the document.

Use the procedure CTX_DOC.GIST to generate gists and theme summaries. You can specify the size of the gist or theme summary when you call the procedure.


See Also:

Oracle Text Reference to learn about the command syntax for CTX_DOC.GIST.

5.2.2.1 In-Memory Gist

The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.

declare
  gklob clob;
  amt number := 40;
  line varchar2(80);

begin
 -- Pass the CLOB variable itself (not a quoted table name) to use the
 -- in-memory form; this form takes no query_id argument.
 ctx_doc.gist('newsindex','34',gklob,glevel => 'P',pov => 'GENERIC',numParagraphs => 10);
  -- gklob is NULL when passed in, so ctx_doc.gist allocates a temporary
  -- CLOB for us and places the results there.

  dbms_lob.read(gklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(gklob);
 end;

5.2.2.2 Result Table Gists

To create a gist table:

create table ctx_gist (query_id  number,
                       pov       varchar2(80), 
                       gist      CLOB);

The following example returns a default-size, paragraph-level gist for document 34:

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC');
end;

The following example generates a non-default size gist of ten paragraphs:

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC', numParagraphs => 10);
end;

The following example generates a gist whose number of paragraphs is ten percent of the total paragraphs in the document:

begin 
ctx_doc.gist('newsindex','34','CTX_GIST',1, 'PARAGRAPH', pov =>'GENERIC', maxPercent => 10);
end;

5.2.2.3 Theme Summary

The following example returns a theme summary on the theme of insects for the document with textkey 34. The summary is generated at the default gist size.

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1, 'PARAGRAPH', pov => 'insects');
end;
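All of the result-table examples above store their output in CTX_GIST under query_id 1, one row per call, with the pov column distinguishing the generic gist ('GENERIC') from the theme summary ('insects'). A sketch for reading them back, assuming the ctx_gist table created above:

```sql
-- One row per stored gist or theme summary; print its size and opening text.
select pov,
       dbms_lob.getlength(gist)      as gist_length,
       dbms_lob.substr(gist, 200, 1) as gist_start
  from ctx_gist
 where query_id = 1;
```

If you run several examples in turn, multiple 'GENERIC' rows accumulate under the same query_id; use a distinct query_id per call to tell them apart.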

5.3 Document Presentation and Highlighting

Typically, a query application enables the user to view the documents returned by a query. The user selects a document from the hit list and then the application presents the document in some form.

With Oracle Text, you can display a document in different ways. For example, you can present documents with query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English.

You can also obtain gist (document summary) and theme information from documents with the CTX_DOC PL/SQL package.

Table 5-2 describes the different output you can obtain and which procedure to use to obtain each type.

Table 5-2 CTX_DOC Output

Output                                                 Procedure
Plain text version, no highlights                      CTX_DOC.FILTER
HTML version of document, no highlights                CTX_DOC.FILTER
Highlighted document, plain text version               CTX_DOC.MARKUP
Highlighted document, HTML version                     CTX_DOC.MARKUP
Highlight offset information for plain text version    CTX_DOC.HIGHLIGHT
Highlight offset information for HTML version          CTX_DOC.HIGHLIGHT
Theme summaries and gist of document                   CTX_DOC.GIST
List of themes in document                             CTX_DOC.THEMES


See Also:

Oracle Text Reference

Figure 5-1 shows an original document to which we can apply highlighting, gisting, and theme extraction in the following sections.

Figure 5-1 Sample Document for Highlighting, Gisting, and Theme Extraction


5.3.1 Highlighting Example

Figure 5-2 is a screen shot of a query application presenting the document shown in Figure 5-1 with the query term pet highlighted. This output was created using the text query application produced by a wizard described in Appendix A, "CONTEXT Query Application".

Figure 5-2 Query Application Presenting Highlighted Document


5.3.2 Document List of Themes Example

Figure 5-3 is a screen shot of a query application presenting a list of themes for the document shown in Figure 5-1. This output was created using the text query application produced by a wizard described in Appendix A, "CONTEXT Query Application".

Figure 5-3 Query Application Displaying Document Themes


5.3.3 Gist Example

Figure 5-4 is a screen shot of a query application presenting a gist of the document shown in Figure 5-1. This output was created using the text query application produced by a wizard described in Appendix A, "CONTEXT Query Application".

Figure 5-4 Query Application Presenting Document Gist
