Use JSON Search on Documents in Object Storage

You can build a JSON search index on documents residing in files stored in Object Storage, which allows you to search fields in your JSON documents using JSON_TEXTCONTAINS, for example, searching with wildcards.

Use JSON Search on Documents in Object Storage

You can create a JSON search index on documents in files stored in Object Storage. A JSON search index is specifically designed for JSON documents and enables you to do regular as well as full text search on your documents or fragments of your documents using JSON_TEXTCONTAINS.

See JSON_TEXTCONTAINS Condition for more information.

You use DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX to create a JSON search index on files in Object Storage. You can configure a refresh rate that indicates the frequency in minutes at which the index is refreshed for any new uploads or deletes. See Create a JSON Search Index on JSON Documents in Files for more information.

The following objects are created when you create a JSON search index:

  • A JSON search index index_name on the Object Storage JSON files.

  • A local table INDEX_NAME$TXTIDX. This table is the mapping table between your documents in the files in Object Storage and the JSON search index that is created in the database. See JSON Search Index Reference Table for more information.

  • A view with the standard name INDEX_NAME. You can utilize the view to perform a search using JSON_TEXTCONTAINS. This view is created on top of the INDEX_NAME$TXTIDX table. See JSON Search Index Reference View for more information.

  • An error log table with the standard suffix index_name$txtidx_err. See Error Log Table for more information.

The JSON search index creation operation is logged in the ALL_SCHEDULER_JOB_RUN_DETAILS view. See Monitor Text Index Creation for more information.

Autonomous Database supports JSON search index creation for a variety of text files containing JSON documents. For example, you can index JSON files in both compressed and uncompressed format, or choose a different JSON document delimiter. See DBMS_CLOUD Package for more information about the file handling options.

See Indexing with Oracle Text for more information.

Create a JSON Search Index on JSON Documents in Files

Use DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX to create a JSON search index on the JSON files stored in Object Storage.

See Indexing with Oracle Text for more information on Oracle Text stop words.

  1. Create a credential object to access the source location.

    See CREATE_CREDENTIAL Procedure for more information.

  2. Run the DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX procedure to create a JSON search index on the object storage files.
    
    BEGIN 
     DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX (
            credential_name => 'OBJ_STORE_CRED',
            location_uri    => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/vcnlogs/',
            index_name      => 'VNCLOGS',
            format          =>  JSON_OBJECT ('json_index' value 'true', 'json_index_doc_len' value 12000, 'file_compression' value 'gzip')
    );
    END;
    /
    This example creates the following:
    • A JSON search index VNCLOGS on the Object Storage files located at the URI specified in the location_uri parameter.

    • A local table VNCLOGS$TXTIDX. See JSON Search Index Reference Table for more information.

    • A view with the name VNCLOGS. This view is created on top of the VNCLOGS$TXTIDX table. You can utilize the view to perform a search using JSON_TEXTCONTAINS. See JSON Search Index Reference View for more information.

    • An error log table with the name VNC$TXTIDX_ERR. The table contains a list of invalid JSON documents in your files. See Error Log Table for more information.

    After creating the JSON search index, you can query the VNCLOGS view. For example:

    SELECT JSON_QUERY (data, '$' returning CLOB pretty) AS RECORD 
     FROM vnclogs WHERE JSON_TEXTCONTAINS(file_line_json, '$.tuples', '{XXX.XXX.0.1.10.10.10.10.null.XXX_345}');
    

    This query returns the log details for the specified IP addresses in the JSON_TEXTCONTAINS condition. Note that the JSON Search index criteria is specified on the file_line_json column, the indexed column in your JSON search index table in the database, while the JSON documents are streamed directly from Object Storage, represented through the data column. The data column contains the content of the Object Storage file.

    See CREATE_EXTERNAL_TEXT_INDEX Procedure for more information.

Drop a JSON Search Index

Use the DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX procedure to drop a JSON search index on the JSON files stored in Object Storage.

For example:

BEGIN 
 DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX (
  index_name => 'VCNLOGS',
);
END;
/

This example drops the VCNLOGS JSON search index.

See DROP_EXTERNAL_TEXT_INDEX Procedure for more information.

Monitor JSON Search Index Creation

When you run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX the JSON search index creation operation is logged in the ALL_SCHEDULER_JOB_RUN_DETAILS view.

You can query the ALL_SCHEDULER_JOB_RUN_DETAILS view to obtain the status and any error reported by the index creation job.

The name of the DBMS_SCHEDULER job is derived from the INDEX_NAME parameter specified when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX.

To query the ALL_SCHEDULER_JOB_RUN_DETAILS view, you must be logged in as the ADMIN user or have READ privilege on the ALL_SCHEDULER_JOB_RUN_DETAILS view.

For example, the following SELECT statement with a WHERE clause on job_name shows the run details for the job:

SELECT status, additional_info 
   FROM all_scheduler_job_run_details WHERE LOWER(job_name) = LOWER('index_name$JOB');

You can also query for the existence of an index creation scheduler job.

For example:
SELECT status 
    FROM all_scheduler_jobs where LOWER(job_name) = LOWER('index_name$JOB');

See CREATE_EXTERNAL_TEXT_INDEX Procedure for more information.

JSON Search Index Reference Table

A local table is created within your database with a standard suffix INDEX_NAME$TXTIDX. This table is created internally when you run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX.

You can query the INDEX_NAME$TXTIDX table to search for a string using the JSON_TEXTCONTAINS keyword. For example, when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX procedure the INDEX_NAME value as VNCLOGS, this creates the VNCLOGS$TXTIDX JSON search index reference table.

The JSON search index reference table has the following columns:
  • object_name: is the file name on the Object Storage that contains the searched text string.

  • object_path: is the Object Storage bucket or folder URI that contains the object storage file.

  • length: is the length of the JSON document.

  • offset: is the byte offset of the beginning of the file.

  • mtime: is the last modified timestamp of the object storage file. This is the time when the file was last accessed by DBMS_CLOUD.

Run the following query to retrieve the object name, object path, offset and length from the JSON search index reference table:
SELECT * FROM (SELECT object_name, object_path, length, offset 
 FROM vnclogs$txtidx 
 WHERE JSON_TEXTCONTAINS (file_line_json, '$.tuples', '{XXX.XXX.0.1.10.10.10.10.null.XXX_345}') 
 ORDER BY mtime DESC) 
 WHERE rownum < 3;
 

This query provides internal information about where in your files in Object Storage the relevant documents are located. The query does not provide the actual documents.

To retrieve the documents directly, use the following query:

SELECT * FROM (SELECT object_name, object_path, data
 FROM vnclogs
 WHERE JSON_TEXTCONTAINS (file_line_json, '$.tuples', '{XXX.XXX.0.1.10.10.10.10.null.XXX_345}')
 ORDER BY mtime DESC)
 WHERE rownum < 3;
 

JSON Search Index Reference View

A view with the name INDEX_NAME is created when you run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX to create a JSON index.

The view itself contains no data. The data is fetched in the view when you query the INDEX_NAME view using SQL condition JSON_TEXTCONTAINS.

Column Description
OBJECT_NAME

The file name on the Object Storage that contains the searched text string.

OBJECT_PATH

The Object Storage bucket or folder URI that contains the object storage file.

LENGTH

The length of the JSON document.

OFFSET

The byte offset of the beginning of the file.

DATA

The content of the Object Storage file.

FILE_LINE_JSON

The column on which the JSON_TEXTCONTAINS is performed.

MTIME

The last modified timestamp of the object storage file. This is the time when the file was last accessed by DBMS_CLOUD.

Error Log Table

An error log table index_name$txtidx_err is created when you create a JSON search index on the JSON files stored in Object Storage.

The table contains a list of the invalid JSON documents in your files. A JSON document is marked as invalid if the length of the source file that is indexed exceeds the default value or the specified value. The default value is 32767 bytes, and the maximum value you can specify using the json_index_doc_len format option is 200000 bytes. See Create a JSON Search Index on JSON Documents in Files for more information.

You can query the index_name$txtidx_err to get the list of invalid JSON files reported during the index creation.

Column Description
OBJECT_NAME The file name on the Object Storage that contains the searched text string.
OBJECT_PATH The Object Storage bucket or folder URI that contains the object storage file.
OFFSET Byte offset of beginning of the line.
LENGTH Length of the JSON document.
LINE_NUMBER Line number in the file at object store.