8.3 Obtaining Lists of Themes, Gists, and Theme Summaries

The following table describes lists of themes, gists, and theme summaries.

Table 8-1 Lists of Themes, Gists, and Theme Summaries

Output Type Description

List of Themes

A list of the main concepts of a document.

Each theme is a single word, a single phrase, or a hierarchical list of parent themes.

Gist

Text in a document that best represents what the document is about as a whole.

Theme Summary

Text in a document that best represents a given theme in the document.

To obtain lists of themes, gists, and theme summaries, use procedures in the CTX_DOC package to:

  • Identify documents by ROWID in addition to primary key

  • Store results in-memory for improved performance

8.3.1 Lists of Themes

A list of themes is a list of the main concepts in a document. Use the CTX_DOC.THEMES procedure to generate lists of themes.

See Also:

Oracle Text Reference to learn more about the command syntax for CTX_DOC.THEMES

The following in-memory theme example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.

declare
 the_themes ctx_doc.theme_tab;

begin
 ctx_doc.themes('myindex','1',the_themes, numthemes=>10);
 for i in 1..the_themes.count loop
  dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
  end loop;
end;

The following example create a result table theme:

create table ctx_themes (query_id number, 
                         theme varchar2(2000), 
                         weight number);

In this example, you obtain a list of themes where each element in the list is a single theme:

begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE);
end;

In this example, you obtain a list of themes where each element in the list is a hierarchical list of parent themes:

begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE);
end;

8.3.2 Gist and Theme Summary

A gist is the text in a document that best represents what the document is about as a whole. A theme summary is the text in a document that best represents a single theme in the document.

Use the CTX_DOC.GIST procedure to generate gists and theme summaries. You can specify the size of the gist or theme summary when you call the procedure.

See Also:

Oracle Text Reference to learn about the command syntax for CTX_DOC.GIST

In-Memory Gist Example

The following example generates a nondefault size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.

declare
  gklob clob;
  amt number := 40;
  line varchar2(80);

begin
 ctx_doc.gist('newsindex','34','gklob',1,glevel => 'P',pov => 'GENERIC',       numParagraphs => 10);
  -- gklob is NULL when passed-in, so ctx-doc.gist will allocate a temporary
  -- CLOB for us and place the results there.
  
  dbms_lob.read(gklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(gklob);
 end;

Result Table Gists Example

To create a gist table, enter the following:

create table ctx_gist (query_id  number,
                       pov       varchar2(80), 
                       gist      CLOB);

The following example returns a default-sized paragraph gist for document 34:

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC');
end;

The following example generates a nondefault size gist of 10 paragraphs:

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC',        numParagraphs => 10);
end;

The following example generates a gist whose number of paragraphs is 10 percent of the total paragraphs in the document:

begin 
ctx_doc.gist('newsindex','34','CTX_GIST',1, 'PARAGRAPH', pov =>'GENERIC', maxPercent => 10);
end;

Theme Summary Example

The following example returns a theme summary on the theme of insects for document with textkey 34. The default gist size is returned.

begin
ctx_doc.gist('newsindex','34','CTX_GIST',1, 'PARAGRAPH', pov => 'insects');
end;