Query Text in Object Storage
The PL/SQL package DBMS_CLOUD
enables you to build a
text index on the object store files, which allows you to search the text and use wildcards
with your search.
About Using a Text Index to Query Text on Object Storage
DBMS_CLOUD
provides
fast and efficient ways to manage data in object store. The DBMS_CLOUD
APIs let you create,
copy, download, delete, and traverse files present in object store. When you define
external tables you can run SQL queries on data stored in your object store (or with
hybrid partitioned external tables, across data in your database and in object store).
When you use DBMS_CLOUD
to define
a text index, this allows you to search your data for text and use wildcards.
Autonomous Database support for word-based search works for commonly used data formats, for example CSV and JSON. You can configure a refresh rate that indicates the frequency in minutes at which the index is refreshed for any new uploads or deletes.
A local table with the standard suffix INDEX_NAME$TXTIDX
is
created when you create an index on the object storage, and you can utilize the table to
perform a search using the CONTAINS
keyword.
See Indexing with Oracle Text for more information.
Create a Text Index on Object Storage Files
Use DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
to
create a text index on files in object storage.
Drop an Index on the Cloud Storage Files
Use the DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
procedure
to drop a text index on object storage files.
Run the DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
procedure to drop a text index on files in object storage.
BEGIN
DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
(
index_name => 'EMP',
);
END;
/
This example drops the EMP
text index.
See DROP_EXTERNAL_TEXT_INDEX Procedure for more information.
Text Index Reference Table
A local table is created within your database with a standard suffix
INDEX_NAME$TXTIDX
. This table is created internally when you
run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
You can query the INDEX_NAME$TXTIDX
table to search for a
string using the CONTAINS
keyword. For example, when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
procedure the INDEX_NAME
value as EMP
, this
creates the EMP$TXTIDX
the text reference table.
-
object_name
: is the file name on the object storage that contains the searched text string. -
object_path
: is the object storage bucket or folder URI that contains the object storage file. -
mtime
: is the last modified timestamp of the object storage file. This is the time when the file was last accessed byDBMS_CLOUD
.
SELECT object_path, object_name FROM EMP$TXTIDX WHERE CONTAINS(OBJECT_NAME, 'king') > 0;
OBJECT_PATH OBJECT_NAME
------------------------------------------------------------------------------------------ ------------------------------------
https://objectstorage.us-phoenix-1.oraclecloud.com/n/example1/b/adbs_data_share/o/ts_data/ data_2_20221026T195313585601Z.json
This query returns the file names and location URI on the object storage which
contains the text string king
, in either upper or lowercase.
SELECT object_name, mtime FROM EMP$TXTIDX;
OBJECT_NAME MTIME
----------------------------- -------------------------------------
data_1_20220531T165402Z.json 31-MAY-22 04.54.02.979000 PM +00:00
data_1_20220531T165427Z.json 31-MAY-22 04.54.27.997000 PM +00:00
This query returns file name and last modified timestamp of the object files on which the index EMP
is created.
Monitor Text Index Creation
When you run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
the text index creation
operation is logged in the ALL_SCHEDULER_JOB_RUN_DETAILS
view.
You can query the ALL_SCHEDULER_JOB_RUN_DETAILS
view to obtain the
status and any error reported by the index creation job.
The name of the DBMS_SCHEDULER
job is derived from the
INDEX_NAME
parameter specified when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
To query the ALL_SCHEDULER_JOB_RUN_DETAILS
view, you must be logged
in as the ADMIN
user or have READ
privilege on the
ALL_SCHEDULER_JOB_RUN_DETAILS
view.
For example, the following SELECT
statement with a
WHERE
clause on job_name
shows the run details for
the job:
SELECT status, additional_info
FROM all_scheduler_job_run_details WHERE LOWER(job_name) = LOWER('index_name$JOB');
You can also query for the existence of an index creation scheduler job.
SELECT status
FROM all_scheduler_jobs where LOWER(job_name) = LOWER('index_name$JOB');
See CREATE_EXTERNAL_TEXT_INDEX Procedure for more information.