4.5 Creating a CONTEXT Index
The CONTEXT index type is well suited for indexing large, coherent documents in formats such as Microsoft Word, HTML, or plain text.
With a CONTEXT index, you can also customize your index in a variety of ways. The documents must be loaded in a text table.
4.5.1 CONTEXT Index and DML
A CONTEXT index is not transactional. When you delete a record, the index is changed immediately. That is, your session no longer finds the record from the moment you make the change, and other users cannot find the record after you commit. For inserts and updates, the new information is not visible to text searches until an index synchronization has occurred. Therefore, when you perform inserts or updates on the base table, you must explicitly synchronize the index with CTX_DDL.SYNC_INDEX.
See Also:
4.5.2 Default CONTEXT Index Example
The following statement creates a default CONTEXT index called myindex on the text column in the docs table:
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;
When you use the CREATE INDEX statement without explicitly specifying parameters, the system completes the following actions by default for all languages:
-
Assumes that the text to be indexed is stored directly in a text column. The text column can be of type
CLOB,BLOB,BFILE,VARCHAR2,orCHAR. -
Detects the column type and uses filtering for the binary column types of
BLOBandBFILE.Most document formats are supported for filtering. If your column is plain text, the system does not use filtering. -
Assumes that the language of the text to index is the language specified in your database setup.
-
Uses the default stoplist for the language specified in your database setup. Stoplists identify the words that the system ignores during indexing.
-
Enables fuzzy and stemming queries for your language, if this feature is available for your language.
You can always change the default indexing behavior by customizing your preferences and specifying those preferences in the parameter string of CREATE INDEX.
See Also:
Oracle Text Reference to learn more about configuring your environment to use the AUTO_FILTER filter
4.5.3 Incrementally Creating a CONTEXT Index
The ALTER INDEX and CREATE INDEX statements support incrementally creating a CONTEXT index.
You can incrementally create Oracle Text indexes, which means that the index structure is immediately created but the data is not populated during the index creation or rebuild process. You populate the index later at a suitable time. This procedure is useful for creating indexes in large installations that cannot afford to have the indexing process running continuously. It provides finer control over the creation of indexes, allowing you to avoid building indexes in a single operation.
Incremental index creation involves the following steps:
-
Create an empty index:
If you specify the
NOPOPULATEkeyword at the time of index creation or rebuild, it only creates metadata for the index tables but does not populate them.-
Global index:
For a global index, use
CREATEINDEXto support theNOPOPULATEkeyword in theREPLACEparameter of theREBUILDclause. -
Local index partition:
For a local index partition, modify the
ALTERINDEX...REBUILDpartition...parameters('REPLACE...') parameter string to support theNOPOPULATEkeyword.For a partition on a local index,
CREATEINDEX...LOCAL... (partition...parameters('NOPOPULATE')) is supported. The partition-levelPOPULATEorNOPOPULATEkeywords override anyPOPULATEorNOPOPULATEspecified at the index level.
-
-
Place all ROWIDs into the pending queue:
Use the
CTX_DDL.POPULATE_PENDINGprocedure to populate the pending queues with every ROWID in the base table or table partition. -
Populate the index:
Use the
CTX_DDL.SYNC_INDEXprocedure to populate the index with the queued data.The
SYNC_INDEXprocedure includes themaxtimeargument that indicates a suggested time limit in minutes for the operation. The indexing process runs in an estimate of the givenmaxtimeinstead of running to completion. You might need to run multipleSYNC_INDEXcalls until the index is fully synced.You can choose to run both the
POPULATE_PENDINGandSYNC_INDEXcalls separately so that the population of the pending queue and the population of the index happen at different times, thereby optimizing system performance.
Example 4-1 Incrementally Build an Empty Global Index
-- Create an empty index
CREATE INDEX ctx_ind ON ctx_tab(doc) INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('NOPOPULATE');declare
n_pending number;
function get_pending return number is
n_pending number;
begin
n_pending := 0;
begin
execute immediate 'SELECT COUNT(*) FROM DR$CTX_IND$C' into n_pending;
exception when others then
if (sqlcode != -942) then
raise;
end if;
end;
if (n_pending = 0) then
execute immediate 'SELECT COUNT(*) FROM CTX_USER_PENDING WHERE PND_INDEX_NAME = :1'
into n_pending using 'CTX_IND';
end if;
return n_pending;
end get_pending;
begin
-- Fill in the pending queue
CTX_DDL.POPULATE_PENDING('CTX_IND');
n_pending := get_pending;
while (n_pending > 0) loop
-- Populate the index through sync_index
CTX_DDL.SYNC_INDEX('CTX_IND', maxtime => 1);
n_pending := get_pending;
end loop;
end;
/Related Topics
4.5.4 Custom CONTEXT Index Example: Indexing HTML Documents
To index an HTML document set located by URLs, specify the system-defined preference for the NULL_FILTER in the CREATE INDEX statement.
You can also specify your htmgroup section group that uses HTML_SECTION_GROUP and NETWORK_PREF datastore that uses NETWORK_DATASTORE:
begin
ctx_ddl.create_preference('NETWORK_PREF','NETWORK_DATASTORE');
ctx_ddl.set_attribute('NETWORK_PREF','HTTP_PROXY','www-proxy.us.example.com');
ctx_ddl.set_attribute('NETWORK_PREF','NO_PROXY','us.example.com');
ctx_ddl.set_attribute('NETWORK_PREF','TIMEOUT','300');
end;
begin
ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;
You can then index your documents:
CREATE INDEX myindex on docs(htmlfile) indextype is ctxsys.context parameters( 'datastore NETWORK_PREF filter ctxsys.null_filter section group htmgroup' );
Note:
Starting with Oracle Database 19c, the Oracle Text type
URL_DATASTORE is deprecated. Use
NETWORK_DATASTORE instead.
Related Topics
4.5.5 CONTEXT Index Example: Query Processing with FILTER BY and ORDER BY
To enable more efficient query processing and better response time for mixed queries, use FILTER BY and ORDER BY clauses as shown in the following example:
CREATE INDEX myindex on docs(text) INDEXTYPE is CTXSYS.CONTEXT FILTER BY category, publisher, pub_date ORDER BY pub_date desc;
Because you specified the FILTER BY category, publisher, pub_date clause at query time, Oracle Text also considers pushing a relational predicate on any of these columns into the Oracle Text index row source.
Also, when the query has matching ORDER BY criteria, by specifying ORDER BY pub_date desc, Oracle Text determines whether to push SORT into the Oracle Text index row source for better response time.