11.3 DISCOVER Function

This is a function to discover the column profile of a file. This function calls parse() and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse() and then get_file_profile().

Syntax

function discover(
    p_content             in blob,
    p_file_name           in varchar2,
    p_decimal_char        in varchar2                                     default null,
    p_xlsx_sheet_name     in varchar2                                     default null,
    p_row_selector        in varchar2                                     default null,
    p_csv_row_delimiter   in varchar2                                     default LF,
    p_csv_col_delimiter   in varchar2                                     default null,
    p_csv_enclosed        in varchar2                                     default '"',
    p_file_charset        in varchar2                                     default 'AL32UTF8',
    p_max_rows            in number                                       default 200 ) return clob;

Parameter

Table 11-1 DISCOVER Function Parameters

Parameter Description

p_content

The file content to be parsed as a BLOB

P_FILE_NAME

The name of the file used to derive the file type.

P_DECIMAL_CHAR

Use this decimal character when trying to detect NUMBER data types. If not specified, the procedure will auto-detect the decimal character.

P_XLSX_SHEET_NAME

For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.

P_ROW_SELECTOR

Whether to detect data types (NUMBER, DATE, TIMESTAMP) during parsing. If set to 'Y', the function will compute the file profile and also add data type information to it. If set to 'N', no data types will be detected and all columns will be VARCHAR2. Default is 'Y'.

P_DECIMAL_CHAR

Use this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.

P_XLSX_SHEET_NAME

For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.

P_ROW_SELECTOR

For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
  • For XML files: Use "/*/*" (first tag under the root tag) as the row selector.
  • For JSON files: Look for a JSON array and use the first array found.

P_CSV_ROW_DELIMITER

Override the default row delimiter for CSV parsing.

P_CSV_ROW_DELIMITER

Override the default row delimiter for CSV parsing.

P_CSV_COL_DELIMITER

Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents.

P_CSV_ENCLOSED

Override the default enclosure character for CSV parsing.

P_FILE_CHARSET

File encoding, if not UTF-8 (AL32UTF8).

P_MAX_ROWS

Stop discovery after P_MAX_ROWS rows have been processed.

Returns

Returns a CLOB containing the file profile in JSON format.

Example

select wwv_flow_data_parser.discover(
           p_content => {BLOB containing XLSX file},
           p_file_name=>'large.xlsx' ) as profile_json
from dual;


PROFILE_JSON
-----------------------------------------------------------
{
    "file-encoding" : "AL32UTF8",
    "single-row" : false,
    "file-type" : 1,
    "parsed-rows" : 2189,
    "columns" : [
       {
          "name" : "C0",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2
       },
       {
          "selector" : "",
          "format-mask" : "",
          "data-type" : 1,
          "name" : "FIRST_NAME"
       },
       {
          "name" : "LAST_NAME",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 1
       },

       :

       {
          "name" : "DATE_",
          "format-mask" : "DD\"/\"MM\"/\"YYYY",
          "data-type" : 3,
          "selector" : ""
       },
       {
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2,
          "name" : "ID"
       }
    ],
    "row-selector" : "",
    "headings-in-first-row" : true,
    "xslx-worksheet" : "sheet1.xml",
    "csv-delimiter" : ""
 }