Data-Guide Formats and Ways of Creating a Data Guide

24.3 Data-Guide Formats and Ways of Creating a Data Guide

There are three formats for a data guide: flat, hierarchical, and schema. All are available in SQL and PL/SQL as CLOB JSON documents. Both hierarchical and schema data guides are JSON Schema documents (that is, JSON schemas), but only schema format is designed for validating JSON documents.

You can use a schema data guide to validate JSON documents.
You can use a hierarchical or schema data guide to create a view, or to add virtual columns, using particular document fields that you choose on the basis of data-guide information.
With a flat data guide you can more easily query data-guide information such as the types and the occurrence frequencies of fields used in your documents.

Flat data guides are represented in JSON as an array of objects, each of which represents the JSON data of a specific path in the document set. A Flat Data Guide For Purchase-Order Documents describes a flat data guide for the purchase-order data of Example 1-1.

Hierarchical and schema data guides are each represented in JSON as an object with nested JSON data, in the same format as that defined by JSON Schema. Although a hierarchical data guide is a JSON schema, do not use it for validating JSON data — use a schema data guide for that. A Hierarchical Data Guide For Purchase-Order Documents describes a hierarchical data guide for the purchase-order data of Example 1-1.

You can create a data guide of any format (flat, hierarchical, or schema) by scanning a set of JSON documents. You use SQL aggregate function json_dataguide to do this. This does not require a data guide-enabled JSON search index. The data guide accurately reflects the document set at the moment function json_dataguide is invoked.

You can create a flat or hierarchical data guide from the data-guide information stored in a JSON search index. You use PL/SQL function DBMS_JSON.get_index_dataguide to do this. (You cannot create a schema-format data guide from search-index data.)

A data guide can include statistical fields, such as how frequently each JSON field is used in the document set.

If you use SQL function json_dataguide then statistical fields are present only if you specify DBMS_JSON.gather_stats in the third argument. They are computed dynamically (up-to-date) at the time you invoke gather_stats.
If you use PL/SQL function DBMS_JSON.get_index_dataguide then statistical fields are present only if you have gathered them on the JSON search index. They are not updated automatically — gather them anew if you want to be sure they are up to date.

Table 24-1 SQL and PL/SQL Functions to Obtain a Data Guide

Uses Data Guide-Enabled Search Index?	Flat Data Guide	Hierarchical Data Guide	Schema Data Guide
Yes	PL/SQL function `get_index_dataguide` with format `DBMS_JSON.FORMAT_FLAT`	PL/SQL function `get_index_dataguide` with format `DBMS_JSON.FORMAT_HIERARCHICAL`	None
No	SQL function `json_dataguide`, with no format argument or with `DBMS_JSON.FORMAT_FLAT` as the format argument	SQL function `json_dataguide`, with `DBMS_JSON.FORMAT_HIERARCHICAL` as the format argument	SQL function `json_dataguide`, with `DBMS_JSON.FORMAT_SCHEMA` as the format argument

Advantages of obtaining a data guide based on a data guide-enabled JSON search index include:

Additive updates to the document set are automatically reflected in the persisted data-guide information whenever the index is synced.
Because this data-guide information is persisted, obtaining a data guide based on it (using PL/SQL function get_index_dataguide) is typically faster than obtaining a data guide by analyzing the document set (using SQL function json_dataguide).

Advantages of obtaining a data guide without using a data guide-enabled JSON search index include assurance that the data guide is accurate and the lack of index maintenance overhead. In addition, a data guide that is not derived from an index is appropriate in these particular use cases:

The JSON data is in an external table. You cannot create an index on it.
The JSON column could be indexed, but the index would not be very useful. This can be the case, for example, if the column contains different kinds of documents. In this case, it can sometimes be helpful to add a column to the table that identifies the kind of document stored in the JSON column. You can then use the data guide with SQL aggregate functions and GROUP BY. See Multiple Data Guides Per Document Set.