DISCOVER Function

This is a function to discover the column profile of a file. This function calls parse() and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse() and then get_file_profile().

Syntax

function discover(
    p_content             in blob,
    p_file_name           in varchar2,
    p_decimal_char        in varchar2                                     default null,
    p_xlsx_sheet_name     in varchar2                                     default null,
    p_row_selector        in varchar2                                     default null,
    p_csv_row_delimiter   in varchar2                                     default LF,
    p_csv_col_delimiter   in varchar2                                     default null,
    p_csv_enclosed        in varchar2                                     default '"',
    p_file_charset        in varchar2                                     default 'AL32UTF8',
    p_max_rows            in number                                       default 200 ) return clob;

Parameter

Table 11-1 DISCOVER Function Parameters

Parameter	Description
`p_content`	The file content to be parsed as a BLOB
`P_FILE_NAME`	The name of the file used to derive the file type.
`P_DECIMAL_CHAR`	Use this decimal character when trying to detect `NUMBER` data types. If not specified, the procedure will auto-detect the decimal character.
`P_XLSX_SHEET_NAME`	For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
`P_ROW_SELECTOR`	Whether to detect data types (`NUMBER, DATE, TIMESTAMP`) during parsing. If set to `'Y'`, the function will compute the file profile and also add data type information to it. If set to `'N'`, no data types will be detected and all columns will be `VARCHAR2`. Default is `'Y'`.
`P_DECIMAL_CHAR`	Use this decimal character when trying to detect `NUMBER` data types. If not specified,the procedure will auto-detect the decimal character.
`P_XLSX_SHEET_NAME`	For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
`P_ROW_SELECTOR`	For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will: For XML files: Use "//" (first tag under the root tag) as the row selector. For JSON files: Look for a JSON array and use the first array found.
`P_CSV_ROW_DELIMITER`	Override the default row delimiter for CSV parsing.
`P_CSV_ROW_DELIMITER`	Override the default row delimiter for CSV parsing.
`P_CSV_COL_DELIMITER`	Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents.
`P_CSV_ENCLOSED`	Override the default enclosure character for CSV parsing.
`P_FILE_CHARSET`	File encoding, if not `UTF-8 (AL32UTF8)`.
`P_MAX_ROWS`	Stop discovery after `P_MAX_ROWS` rows have been processed.

Returns

Returns a CLOB containing the file profile in JSON format.

Example

select wwv_flow_data_parser.discover(
           p_content => {BLOB containing XLSX file},
           p_file_name=>'large.xlsx' ) as profile_json
from dual;


PROFILE_JSON
-----------------------------------------------------------
{
    "file-encoding" : "AL32UTF8",
    "single-row" : false,
    "file-type" : 1,
    "parsed-rows" : 2189,
    "columns" : [
       {
          "name" : "C0",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2
       },
       {
          "selector" : "",
          "format-mask" : "",
          "data-type" : 1,
          "name" : "FIRST_NAME"
       },
       {
          "name" : "LAST_NAME",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 1
       },

       :

       {
          "name" : "DATE_",
          "format-mask" : "DD\"/\"MM\"/\"YYYY",
          "data-type" : 3,
          "selector" : ""
       },
       {
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2,
          "name" : "ID"
       }
    ],
    "row-selector" : "",
    "headings-in-first-row" : true,
    "xslx-worksheet" : "sheet1.xml",
    "csv-delimiter" : ""
 }