Oracle8 ConText Cartridge Application Developer's Guide
Release 2.3

A58164-01

8
Using ConText Linguistics

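This chapter explains how to use ConText linguistics to generate linguistic output for English text. It also provides tips and suggestions for building linguistically enhanced text applications.

The topics covered in this chapter are specifying linguistic settings, generating linguistic output, monitoring the services queue, specifying completion and error procedures, logging parse information, and combining theme/text queries with linguistic output.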

Specifying Linguistic Settings

When a ConText server with the Linguistic personality is started, ConText automatically loads a default setting configuration (GENERIC) from the database. The default setting configuration is active during your database session unless you explicitly specify a label for a different setting configuration with the CTX_LING.SET_SETTINGS_LABEL procedure.

You can specify one of the two predefined setting configurations (GENERIC or SA) provided with ConText or a custom setting configuration that you create using the administration tool.

To specify a setting configuration for a session, use the CTX_LING.SET_SETTINGS_LABEL procedure with a setting label. For example, to process all-uppercase or all-lowercase text for your current session, specify the SA setting configuration:

	execute ctx_ling.set_settings_label('SA')

When you specify a setting configuration label, ConText checks the label against the setting configuration that is currently active. If the specified setting configuration is not already active, ConText loads the new settings from the database before any documents are processed by ConText servers with the Linguistic personality.

The specified setting configuration is active for your session until SET_SETTINGS_LABEL is called with a new setting configuration label.

You can use the CTX_LING.GET_SETTINGS_LABEL function to return the label for the active setting configuration for the current session.
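As a minimal sketch of how the two calls fit together, the following anonymous block switches the session to the SA setting configuration and then reads the active label back. It assumes that GET_SETTINGS_LABEL takes no arguments and returns a character string; the variable name and its length are illustrative only.

```sql
set serveroutput on

declare
  -- hypothetical variable; the length is an assumption, not a documented limit
  l_label varchar2(80);
begin
  -- make SA the active setting configuration for this session
  ctx_ling.set_settings_label('SA');

  -- assumption: the function is called without arguments and returns the label
  l_label := ctx_ling.get_settings_label;
  dbms_output.put_line('Active setting configuration: ' || l_label);
end;
/
```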

Generating Linguistic Output

Before theme and Gist information can be used in an application, you must create the output tables and then generate the themes and Gists, as described in the following sections.

Creating Output Tables

To create a theme table called CTX_THEMES, issue the following SQL statement:

    create table ctx_themes (
        cid        number,
        pk         varchar2(64),
        theme      varchar2(2000),
        weight     number);

To create a Gist table called CTX_GIST, issue the following SQL statement:

    create table ctx_gist (
        cid        number,
        pk         varchar2(64),
        pov        varchar2(80),
        gist       long); 




Note:

Because the combination of the CID (column ID) and PK (primary key) columns in the output tables uniquely identifies each document in a text column, you can use the output tables to store theme and Gist information for multiple text columns. You can also choose to create multiple output tables to store the theme and Gist information separately for each text column.

 

See Also:

For more information about the structure of linguistic output tables, see "Linguistic Output Table Structures" in Appendix A.

 

Creating Composite Textkey Output Tables

To create a theme table whose textkey has two columns, issue the following SQL statement:

    create table ctx_themes (
        cid        number,
        pk1        varchar2(64),
        pk2        varchar2(64),
        theme      varchar2(2000),
        weight     number);

To create a Gist table whose textkey has two columns, issue the following SQL statement:

    create table ctx_gist (
        cid        number,
        pk1        varchar2(64),
        pk2        varchar2(64),
        pov        varchar2(80),
        gist       long);



See Also:

For more information about the structure of linguistic output tables, see "Linguistic Output Table Structures" in Appendix A.

 

Generating Themes and Gists

To generate linguistic output for the documents in a text column, you first call CTX_LING.REQUEST_GIST or CTX_LING.REQUEST_THEMES for each document in the column, then call CTX_LING.SUBMIT to enter these requests in the services queue as a single transaction for that particular document.


Note:

A policy must be defined for a column before you can generate linguistic output for the documents in the column.

 

The following example shows how you could use the procedures and functions in the CTX_LING package to generate linguistic output:

declare
   handle number;
begin
   -- request themes and a Gist for document 7039
   ctx_ling.request_themes(
      'CTXSYS.DOC_POLICY',
      '7039',
      'CTXSYS.CTX_THEMES');
   ctx_ling.request_gist(
      'CTXSYS.DOC_POLICY',
      '7039',
      'CTXSYS.CTX_GIST');
   -- submit both requests as one batch; the returned handle identifies it
   handle := ctx_ling.submit;
end;
/

The first two calls request theme and Gist output for document 7039 in the text column for the DOC_POLICY policy. These procedures store the themes and Gists in the linguistic output tables (ctx_themes and ctx_gist), which were created in the previous step.

The final API call submits the requests as one batch request to the services queue and returns a handle which can be used to monitor the status of the request. Because the two requests are submitted as one batch request, ConText parses the document only once while still generating both theme and Gist output.
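Once the request has completed successfully, the output can be read back with ordinary queries. The following sketch assumes the CTX_THEMES and CTX_GIST tables created earlier and assumes that the tables hold output for a single text column, so the cid column is not filtered. Ordering by weight is a convenience only and assumes that larger weights indicate stronger themes.

```sql
-- SQL*Plus setting so that more of the LONG gist column is displayed
set long 2000

-- themes for document 7039 (ordering assumes larger weight = stronger theme)
select theme, weight
  from ctx_themes
 where pk = '7039'
 order by weight desc;

-- Gist rows for the same document
select pov, gist
  from ctx_gist
 where pk = '7039';
```

If the output tables are shared by several text columns, add a condition on cid so that only rows for the intended column are returned.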

Generating Theme Hierarchical Information

By default, ConText generates single theme names when you request a list of themes with CTX_LING.REQUEST_THEMES. To generate the hierarchical theme information with theme names, you must set the full themes flag to TRUE with CTX_LING.SET_FULL_THEMES.

For example, the following SQL statements generate and output single theme information for a document identified by pk:

SQL> variable handle number
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
SQL> exec :handle := ctx_ling.submit(200)
SQL> select theme from ctx_themes;
 
THEME 
-------------------------------------------------------------------------------
NASDAQ - National Association of Securities Dealers Automated Quotation System 
stocks 
indexes 
weakness 
composites 
prices 
franchises 
shares 
cellularity 
declining issues 
measures 
analysts 
OTC 
purchases 
Wall Street 
lows 
 
16 rows selected. 

However, when you set the full themes flag to TRUE, ConText generates theme hierarchical information:

SQL> variable handle number
SQL> exec ctx_ling.set_full_themes(TRUE)
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
SQL> exec :handle := ctx_ling.submit(200)
SQL> select theme from ctx_themes;
  
THEME 
-------------------------------------------------------------------------------
:stock market:NASDAQ - National Association of Securities Dealers Automated 
Quotation System: 
:stock market:stocks: 
:catalogs, itemization:indexes: 
:weakness, fatigue:weakness: 
:combination, mixture:composites: 
:retail trade industry:prices: 
:business fundamentals:franchises: 
:possession, ownership:shares: 
:cellularity: 
:stock market:declining issues: 
:analysis, evaluation:measures: 
:analysis, evaluation:analysts: 
:OTC: 
:general commerce:purchases: 
:general investment:Wall Street: 
:bottoms, undersides:lows: 

Generating hierarchical theme information in this way helps you match themes with the theme summaries generated with CTX_LING.REQUEST_GIST.
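If an application needs only the final theme name from a hierarchical entry, it can be derived in the query itself. The following is a sketch that relies on the colon-delimited layout visible in the sample output above and assumes that theme names never contain a colon themselves; leaf_theme is a hypothetical column alias.

```sql
-- strip the trailing colon, then take everything after the last remaining colon
select theme,
       substr(rtrim(theme, ':'),
              instr(rtrim(theme, ':'), ':', -1) + 1) leaf_theme
  from ctx_themes;
```

For an entry such as :stock market:stocks: the expression yields stocks. A single theme name with no colons is returned unchanged, because INSTR finds no delimiter and SUBSTR starts at the first character.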


Monitoring the Services Queue

When you submit a request to the services queue with CTX_LING.SUBMIT, a handle is returned. With this handle, you can use the procedures and functions in the CTX_SVC package to monitor the status of requests, remove pending requests, and clear requests with errors.

Monitoring the Status of Requests

To monitor the status of requests in the Services Queue, use the CTX_SVC.REQUEST_STATUS function. This function returns one of the following statuses:

Status    Meaning
PENDING   The request has not yet been picked up by a ConText server.
RUNNING   The request is being processed by a ConText server.
ERROR     The request errored.
SUCCESS   The request completed successfully.

For example, the following PL/SQL procedure submits a request to generate themes and a Gist for a document with an ID of 49. It then checks the status of the request.

CREATE OR REPLACE PROCEDURE GENERATE_THEMES AS

   v_Handle number;
   v_Status varchar2(10);
   v_Time   date;
   v_Errors varchar2(60);

BEGIN
  DBMS_OUTPUT.PUT_LINE('Begin generate_themes procedure');

  -- queue theme and Gist requests for document 49, then submit them as one batch
  ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes');
  ctx_ling.request_gist('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_gist');
  v_Handle := ctx_ling.submit;

  DBMS_OUTPUT.PUT_LINE(v_Handle);

  -- check the status of the batch identified by v_Handle
  v_Status := ctx_svc.request_status(v_Handle, v_Time, v_Errors);
  DBMS_OUTPUT.PUT_LINE(v_Status);
  DBMS_OUTPUT.PUT_LINE(v_Time);
  DBMS_OUTPUT.PUT_LINE(substr(v_Errors, 1, 20));

EXCEPTION
  WHEN OTHERS THEN
    -- report the error text instead of hiding it
    DBMS_OUTPUT.PUT_LINE('Exception handling: ' || substr(SQLERRM, 1, 200));

END GENERATE_THEMES;
/
 

This procedure assigns the return value of REQUEST_STATUS to v_Status for the linguistic request identified by v_Handle. The value for v_Handle is returned by the call to CTX_LING.SUBMIT, which placed the requests for the themes and Gists in the Services Queue.
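A single status check made immediately after submission will usually report PENDING or RUNNING. The following is a sketch of one way to wait for a final status, assuming the REQUEST_STATUS call shape used above and that the session has EXECUTE privilege on the standard DBMS_LOCK package. The retry count, sleep interval, and variable sizes are illustrative assumptions.

```sql
set serveroutput on

declare
  l_handle   number;
  l_status   varchar2(10);
  l_time     date;
  l_errors   varchar2(2000);   -- assumed large enough for the error text
  l_attempts number := 0;
begin
  ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes');
  l_handle := ctx_ling.submit;

  -- poll until the request finishes or 30 attempts have been made
  loop
    l_status := ctx_svc.request_status(l_handle, l_time, l_errors);
    exit when l_status in ('SUCCESS', 'ERROR') or l_attempts >= 30;
    dbms_lock.sleep(2);        -- wait two seconds between checks
    l_attempts := l_attempts + 1;
  end loop;

  dbms_output.put_line('Final status: ' || l_status);
  if l_status = 'ERROR' then
    dbms_output.put_line(substr(l_errors, 1, 200));
  end if;
end;
/
```

Bounding the loop keeps the session from waiting indefinitely if no ConText server with the Linguistic personality is running.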

Removing Pending Requests

To remove requests with a status of PENDING from the Services Queue, use the CTX_SVC.CANCEL procedure.

For example:

execute ctx_svc.cancel(3321)

In this example, a pending request with handle 3321 is removed from the Services Queue.

If a request has a status of RUNNING, ERROR, or SUCCESS, it cannot be removed from the Services Queue.
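Because only pending requests can be cancelled, an application may want to check the status first. The following sketch combines REQUEST_STATUS and CANCEL using the call shapes shown in this chapter; the variable names are illustrative.

```sql
set serveroutput on

declare
  l_handle number := 3321;     -- handle returned earlier by CTX_LING.SUBMIT
  l_status varchar2(10);
  l_time   date;
  l_errors varchar2(2000);     -- assumed size
begin
  l_status := ctx_svc.request_status(l_handle, l_time, l_errors);

  if l_status = 'PENDING' then
    ctx_svc.cancel(l_handle);
    dbms_output.put_line('Request ' || l_handle || ' cancelled');
  else
    dbms_output.put_line('Request ' || l_handle ||
                         ' not cancelled; status is ' || l_status);
  end if;
end;
/
```

A ConText server could pick up the request between the status check and the call to CANCEL, so the check reduces, but does not eliminate, the chance of attempting to cancel a request that is no longer pending.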

Clearing Requests with Errors

To remove requests with a status of ERROR from the Services Queue, use the CTX_SVC.CLEAR_ERROR procedure.

For example:

execute ctx_svc.clear_error(3321)

In this example, a request with handle 3321 is removed from the Services Queue.

If a value of 0 (zero) is specified for the handle, all requests with a status of ERROR are removed from the queue. If a request has a status of PENDING, RUNNING, or SUCCESS, it cannot be removed from the queue using CLEAR_ERROR.
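Following the same form as the example above, a call that removes every errored request from the queue would look like this. It uses only the procedure and the zero handle value described in this section.

```sql
execute ctx_svc.clear_error(0)
```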

Specifying Completion and Error Procedures

To specify a procedure to be called when a linguistic request completes or errors, use the SET_COMPLETION_CALLBACK and SET_ERROR_CALLBACK procedures in CTX_LING. ConText invokes the procedure defined by SET_COMPLETION_CALLBACK after it processes a linguistic request; ConText invokes the procedure defined by SET_ERROR_CALLBACK when it encounters an error.

The following is an example of how to define and use a completion callback procedure. This example is taken from genling.sql in the ctxling demonstration provided with the ConText distribution package.

For every linguistic request processed, ling_comp_callback keeps track of the number of articles processed by decrementing num_docs, previously defined as the number of articles in the table. The procedure also keeps track of any errors by incrementing num_errors.

create or replace procedure LING_COMP_CALLBACK (
  p_handle in number,
  p_status in varchar2,
  p_errors in varchar2
) IS
  l_total number;
  l_pk    varchar2(64);
BEGIN
  -- decrement the count in the tracking table
  update ling_tracking set num_docs = num_docs - 1;

  -- if the request errored, mark the errors in the pending table
  IF (p_status = 'ERROR') then
    update ling_tracking set num_errors = num_errors + 1;
  end IF;
  commit;
END;
/
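The ling_tracking table that the callback updates is not shown in this chapter. A minimal sketch of a table consistent with the callback is given below. The column names come from the callback itself, while the column types and the initialization from the articles table are assumptions.

```sql
-- single-row tracking table used by LING_COMP_CALLBACK (types are assumed)
create table ling_tracking (
    num_docs    number,
    num_errors  number);

-- start with the number of articles still to process and no errors
insert into ling_tracking (num_docs, num_errors)
    select count(*), 0 from articles;

commit;
```

With this layout, processing is finished when num_docs reaches zero, and num_errors reports how many of the requests failed.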

The following code is an anonymous PL/SQL block that sets the linguistic completion callback procedure to ling_comp_callback and then generates linguistic output for every document in the articles table:

declare
  cursor c1 is select article_id
                    from articles;
  l_handle number;

begin

  -- set the completion callback procedure to keep the pending table
  -- in sync with the number of documents processed (completed requests)
  -- and the number of errored requests.

  ctx_ling.set_completion_callback('LING_COMP_CALLBACK');

  -- loop through all articles in the article table, requesting themes
  -- and gists
  for crec in c1 loop
    ctx_ling.request_themes('DEMO_POLICY', crec.article_id, 'ARTICLE_THEMES');
    ctx_ling.request_gist('DEMO_POLICY', crec.article_id, 'ARTICLE_GISTS');
    l_handle := ctx_ling.submit;
  end loop;
end;
/

Logging Parse Information

At start-up of a ConText server, the logging of linguistic parse information is disabled by default.

To enable logging of the parse information generated by ConText linguistics during a session, use the CTX_LING.SET_LOG_PARSE procedure.

For example:

	execute ctx_ling.set_log_parse('TRUE')

Once you enable parse logging for a session, it remains active until you explicitly disable it during the session. You can use the CTX_LING.GET_LOG_PARSE function to determine whether parse logging is enabled or disabled for the session.


Attention:

Parse logging is a useful feature if you are having difficulty generating linguistic output and you want to monitor how ConText is parsing your documents; however, parse logging may affect performance considerably. As such, you should only enable parse logging if you encounter problems with generating linguistic output.

 

Combining Theme/Text Queries with Linguistic Output

Theme queries allow you to search a set of documents for a given theme. The result is a hitlist containing the IDs of the documents that match the theme.

Generating a list of themes is a good way of extending theme queries. For a document in a theme query hitlist, the user can learn more about the document by reading a list of themes or a Gist.

For example, suppose a theme query on music returns a hitlist containing 20 documents. If these documents are lengthy, the user might not want to read every document to find out what each is about. Rather than returning the document text, you can return a list of themes or a Gist for each document, which the user can skim.

Implementation

Generally, you can generate linguistic output for a document set at two different times: at indexing time or at query time.

Generating Linguistic Output at Indexing Time

You can generate linguistic output (creating the list of themes in this case) at indexing time, that is, before the queries are issued against the document set. When you do so, the linguistic output is returned to the user immediately, since the output was already created.

However, while the retrieval time for the linguistic output is good, the drawback to this method is that you have to maintain a permanent theme output table, using your own triggers to keep it updated. A permanent theme table for an entire document set also takes up system disk space.

Generating Linguistic Output at Query Time

You could also generate a list of themes after executing a query. The advantage of generating themes as needed is that the output table lasts only for the user session; you need not maintain a permanent theme table for all your documents.

However, generating a list of themes on the fly takes time, depending on the number of documents, the length of the documents, and how your linguistic servers are configured. A user might not want to wait a few minutes while a large number of documents are processed.

The example below shows how to generate linguistic output after a theme query.

Example

The following PL/SQL code illustrates how to generate a list of themes for every document in a hitlist table. (You can use the same method to loop through any text table, once the text column has a policy attached to it.)

create or replace procedure get_theme IS
   handle number;

   -- textkeys of the documents returned by the theme query
   cursor ctx_cur is
      select textkey from ctx_temp;

BEGIN

   -- run the theme query; the hitlist is written to ctx_temp
   ctx_query.contains('DOWTHEME', 'birds', 'ctx_temp');

   -- request themes for each document in the hitlist
   for ctx_cur_rec in ctx_cur loop
      ctx_ling.request_themes('DOWPOLICY', ctx_cur_rec.textkey,
         'ctx_themes');
      handle := ctx_ling.submit;
   end loop;

END;
/

This routine first declares a cursor that selects the rows from the ctx_temp result table, to be populated with a theme query on birds.

The cursor FOR loop opens the cursor, executing the select statement that fetches all textkeys in the ctx_temp table. The loop index ctx_cur_rec is implicitly declared as a record of type ctx_cur%ROWTYPE.

Every iteration of the loop calls the CTX_LING.REQUEST_THEMES procedure with the document textkey derived from ctx_cur_rec. Each request is submitted to the services queue with CTX_LING.SUBMIT, which returns a handle.

The theme output is written to the ctx_themes table.
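After the requests have completed, the hitlist and the theme output can be combined so that each document in the hitlist is shown with its themes. The following sketch assumes the ctx_temp and ctx_themes tables used above and assumes that ctx_themes holds output for this text column only, so cid is not filtered. Ordering by weight assumes that larger weights indicate stronger themes.

```sql
-- one row per theme for every document in the hitlist
select h.textkey, t.theme, t.weight
  from ctx_temp h, ctx_themes t
 where t.pk = h.textkey
 order by h.textkey, t.weight desc;
```

Documents whose requests are still pending or have errored do not yet have rows in ctx_themes and therefore do not appear in the result.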





Copyright © 1997 Oracle Corporation.

All Rights Reserved.
