20.4 DISCOVER Function

This is a function to discover the column profile of a file. This function calls parse() and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse() and then get_file_profile().

Syntax

APEX_DATA_PARSER.DISCOVER (
    p_content             IN BLOB,
    p_file_name           IN VARCHAR2,
    p_decimal_char        IN VARCHAR2 DEFAULT NULL,
    p_xlsx_sheet_name     IN VARCHAR2 DEFAULT NULL,
    p_row_selector        IN VARCHAR2 DEFAULT NULL,
    p_csv_row_delimiter   IN VARCHAR2 DEFAULT LF,
    p_csv_col_delimiter   IN VARCHAR2 DEFAULT NULL,
    p_csv_enclosed        IN VARCHAR2 DEFAULT '"',
    p_file_charset        IN VARCHAR2 DEFAULT 'AL32UTF8',
    p_max_rows            IN NUMBER   DEFAULT 200 )
RETURN CLOB;

Parameter

Table 20-2 DISCOVER Parameters

Parameter Description
p_content The file content to be parsed as a BLOB.
p_file_name The name of the file used to derive the file type.
p_decimal_char Use this decimal character when trying to detect NUMBER data types. If not specified, the procedure will auto-detect the decimal character.
p_xlsx_sheet_name For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
p_row_selector

Whether to detect data types (NUMBER, DATE, TIMESTAMP) during parsing.

If set to Y, the function will compute the file profile and also add data type information to it.

If set to 'N', no data types will be detected and all columns will be VARCHAR2.

Default is Y.

p_decimal_char Use this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.
p_xlsx_sheet_name For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
p_row_selector
For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
  • For XML files: Use /*/* (first tag under the root tag) as the row selector.
  • For JSON files: Look for a JSON array and use the first array found.
p_csv_row_delimiter Override the default row delimiter for CSV parsing.
p_csv_row_delimiter Override the default row delimiter for CSV parsing.
p_csv_col_delimiter Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents.
p_csv_enclosed Override the default enclosure character for CSV parsing.
p_file_charset File encoding, if not UTF-8 (AL32UTF8).
p_max_rows Stop discovery after P_MAX_ROWS rows have been processed.

Returns

Returns a CLOB containing the file profile in JSON format.

Example

select apex_data_parser.discover(
           p_content => {BLOB containing XLSX file},
           p_file_name=>'large.xlsx' ) as profile_json
from dual;


PROFILE_JSON
-----------------------------------------------------------
{
    "file-encoding" : "AL32UTF8",
    "single-row" : false,
    "file-type" : 1,
    "parsed-rows" : 2189,
    "columns" : [
       {
          "name" : "C0",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2
       },
       {
          "selector" : "",
          "format-mask" : "",
          "data-type" : 1,
          "name" : "FIRST_NAME"
       },
       {
          "name" : "LAST_NAME",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 1
       },

       :

       {
          "name" : "DATE_",
          "format-mask" : "DD\"/\"MM\"/\"YYYY",
          "data-type" : 3,
          "selector" : ""
       },
       {
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2,
          "name" : "ID"
       }
    ],
    "row-selector" : "",
    "headings-in-first-row" : true,
    "xslx-worksheet" : "sheet1.xml",
    "csv-delimiter" : ""
 }