3.6 Improved Document Services Performance with a Forward Index
When it searches for a word in a document, Oracle Text uses an inverted index and then displays the results by calculating the snippet from that document. To calculate the snippet, Oracle Text reindexes each document returned as part of the search result, so the search operation slows down considerably when a document is very large.
The forward index overcomes the performance problem of very large documents. It uses a $O mapping table that refers to the token offsets in the $I inverted index table. Each token offset is translated into the character offset in the original document, and the text surrounding the character offset is then used to generate the text snippet.
Because the forward index does not use in-memory indexing of the documents while calculating the snippet, it provides considerably better performance than the inverted index alone when searching for a word in very large documents.
The forward index improves the performance of the following procedures in the Oracle Text CTX_DOC package:
- CTX_DOC.SNIPPET
- CTX_DOC.HIGHLIGHT
- CTX_DOC.MARKUP
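As a minimal sketch of how one of these procedures is called, the following block requests a snippet for a single row. It assumes a table docs with a CONTEXT index idx (the names used in the Save Copy example in this chapter) and a row whose id is 1; the query term hello is illustrative only. The call itself does not mention the forward index, which is a storage setting on the index.

```sql
set serveroutput on

declare
  v_rowid   varchar2(18);
  v_snippet varchar2(4000);
begin
  -- Identify documents by rowid for this session.
  ctx_doc.set_key_type('ROWID');

  -- Assumed table and key value; adjust to your schema.
  select rowidtochar(rowid) into v_rowid from docs where id = 1;

  -- Arguments: index name, document key, text query.
  v_snippet := ctx_doc.snippet('idx', v_rowid, 'hello');
  dbms_output.put_line(v_snippet);
end;
/
```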
See Also:
Oracle Text Reference for information about the forward_index parameter clause of the BASIC_STORAGE indexing type
3.6.1 Enabling Forward Index
The following example enables the forward index feature by setting the forward_index attribute value of the BASIC_STORAGE storage type to TRUE:
exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','forward_index','TRUE');
3.6.2 Forward Index with Snippets
In some cases, when you use the forward_index option, generated snippets may differ slightly from the snippets that are generated when you do not use the option. The differences are generally minimal and do not affect snippet quality; typically they consist of a few extra white spaces or newlines.
3.6.3 Forward Index with Save Copy
To use the forward index effectively, you should store copies of the documents in the $D table, either in plain-text format or filtered format, depending upon the CTX_DOC package procedure that you use. For example, store the document in plain-text when you use the SNIPPET procedure and store it in the filtered format when you use the MARKUP or HIGHLIGHT procedure.
You should use the Save Copy feature of Oracle Text to store the copies of the documents in the $D table. Implement the feature by using the save_copy attribute or the save_copy column parameter.
- save_copy basic storage attribute: The following example sets the save_copy attribute value of the BASIC_STORAGE storage type to PLAINTEXT. This example enables Oracle Text to save a copy of the text document in the $D table while it searches for a word in that document.

  exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
  exec ctx_ddl.set_attribute('mystore','save_copy','PLAINTEXT');

- save_copy column index parameter: The following example uses the save_copy column index parameter to save a copy of a text document into the $D table. The create index statement creates the $D table and copies document 1 ("hello world") into the $D table.

  create table docs(
    id number,
    txt varchar2(64),
    save varchar2(10)
  );
  insert into docs values(1, 'hello world', 'PLAINTEXT');
  create index idx on docs(txt) indextype is ctxsys.context
    parameters('save_copy column save');

For the save_copy attribute or column parameter, you can specify one of the following values:
- PLAINTEXT saves the copy of the document in a plain-text format in the $D index table. The plain-text format is defined as the output format of the sectioner. Specify this value when you use the SNIPPET procedure.
- FILTERED saves a copy of a document in a filtered format in the $D index table. The filtered format is defined as the output format of the filter. Specify this value when you use the MARKUP or HIGHLIGHT procedure.
- NONE does not save the copy of the document in the $D index table. Specify this value when you do not use the SNIPPET, MARKUP, or HIGHLIGHT procedure and when the indexed column is either VARCHAR2 or CLOB.
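The two storage attributes can be set on the same preference, which is then attached to the index through the storage clause of the parameters string. The following is a minimal sketch under that assumption; the names fwd_store, fwd_docs, and fwd_idx are hypothetical, and PLAINTEXT is chosen because the index is meant to serve the SNIPPET procedure.

```sql
-- Hypothetical preference name; it must not already exist in this schema.
exec ctx_ddl.create_preference('fwd_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('fwd_store', 'forward_index', 'TRUE');
exec ctx_ddl.set_attribute('fwd_store', 'save_copy', 'PLAINTEXT');

-- Hypothetical table holding short plain-text documents.
create table fwd_docs(
  id  number primary key,
  txt varchar2(64)
);
insert into fwd_docs values(1, 'hello world');
commit;

-- Attach the storage preference when the index is created.
create index fwd_idx on fwd_docs(txt) indextype is ctxsys.context
  parameters('storage fwd_store');
```

For an index that serves MARKUP or HIGHLIGHT instead, the same sketch would set save_copy to FILTERED.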
3.6.4 Forward Index Without Save Copy
In the following scenarios, you can take advantage of the performance enhancement of forward index without saving copies of all documents in the $D table (that is, without using the Save Copy feature):
- The document set contains HTML and plain text: Store all documents in the base table by using the DIRECT_DATASTORE or the MULTI_COLUMN_DATASTORE datastore type.
- The document set contains HTML, plain text, and binary: Store all documents in the base table by using the DIRECT_DATASTORE datastore type. Store only the binary documents in the $D table in the filtered format.
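One way to approach the second scenario is to combine the save_copy column parameter with a per-row value, so that only binary rows are copied into the $D table. The sketch below is an assumption about how the pieces fit together, not a prescribed setup: the names mixed_store, mixed_docs, and mixed_idx are hypothetical, the system-defined preferences CTXSYS.DEFAULT_DATASTORE (a direct datastore) and CTXSYS.AUTO_FILTER are used for the datastore and filter, and a format column tells the filter which rows are binary.

```sql
-- Hypothetical storage preference with the forward index enabled.
exec ctx_ddl.create_preference('mixed_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mixed_store', 'forward_index', 'TRUE');

create table mixed_docs(
  id   number primary key,
  fmt  varchar2(10),  -- 'TEXT' or 'BINARY', read through the format column parameter
  save varchar2(10),  -- 'FILTERED' for binary rows, 'NONE' for HTML and plain text
  doc  blob
);

-- An HTML row: kept only in the base table, so no copy goes to $D.
insert into mixed_docs values(
  1, 'TEXT', 'NONE', to_blob(utl_raw.cast_to_raw('<p>hello world</p>')));
-- Binary rows would be loaded by the application with fmt = 'BINARY'
-- and save = 'FILTERED', so that only they are copied into $D.
commit;

create index mixed_idx on mixed_docs(doc) indextype is ctxsys.context
  parameters('datastore ctxsys.default_datastore filter ctxsys.auto_filter format column fmt save_copy column save storage mixed_store');
```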