12.9 PARSE Function
The PARSE function enables you to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Values are generally returned in VARCHAR2 format. A
returned table row can have a maximum of 300 columns. The maximum length for a
varchar2 table column is 4000 bytes; there is no line length maximum. 20 out of the
300 supported columns can be handled as a CLOB.
File parsing happens on-the-fly as this function is invoked. No data is written to a collection or to a temporary table.
If the P_FILE_PROFILE parameter is not passed, the
function will compute a file profile with column information during parse.
If P_DETECT_DATA_TYPES is passed as 'Y'
(default), the function also detect column data types during parse. The computed
file profile can be retrieved using GET_FILE_PROFILE after this
function is finished.
- Invoke
PARSE- Use this table function to parse the files and get rows and columns in order to display a data preview. - Invoke
GET_FILE_PROFILE- Retrieve file profile information in JSON format. - Process the data - Generate a SQL query based on the data profile to perform custom processing.
Note:
- JSON parsing is supported on 11.2 and 12.1.0.1 database
versions. In this case, the function uses
APEX_JSONandXMLTABLEfunctions. For performance reasons it's recommended to upgrade the database to at least 12.2 - JSON parsing is faster by magnitudes on these versions. - XLSX parsing is done by using
APEX_ZIPto extract individual XML files from the XLSX archive; the actual XLSX parsing is then done by using theXMLTABLE SQLfunction.
About CLOB Support
Starting with release 19.2, this package supports string values larger than 400
bytes. 20 out of the 300 supported columns can be handled as a
CLOB. The level of CLOB support depends upon the
file type being parsed.
- CSV and XLSX
CLOBvalues are supported up to 32KCLOBcolumns can be detected during discovery- When the data profile is discovered, values below 4000 bytes are
normally returned as
COLNNN.CLOBvalues are returned in theCLOBNNcolumn and the first 1000 characters are returned asCOLNNN. If a data profile is passed in and that hasCLOBcolumn defined, all values are returned in theCLOBNNcolumn only.
- XML
CLOBvalues with more than 32K are supportedCLOBcolumns can be detected during discovery- When the data profile is discovered, values below 4000 bytes are
normally returned as
COLNNN.CLOBvalues are returned in theCLOBNNcolumn and the first 1000 characters are returned asCOLNNN. If a data profile is passed in and that hasCLOBcolumn defined, all values are returned in theCLOBNNcolumn only.
- JSON
CLOBvalues with more than 32K are supportedCLOBcolumns are *not* detected during discovery;CLOBsupport is only active, if a file profile containingCLOBcolumn is passed in as thep_file_profileparameter- Since
JSON_TABLEdoes not support CLOBs on 12c databases, the parser uses XMLTYPE-based processing if a file profile with CLOB columns is passed in. Processing will be significantly slower.
Syntax
FUNCTION PARSE(
p_content IN BLOB,
p_file_name IN VARCHAR2 DEFAULT NULL,
p_file_type IN T_FILE_TYPE DEFAULT NULL,
p_file_profile IN CLOB DEFAULT NULL,
p_detect_data_types IN VARCHAR2 DEFAULT 'Y',
p_decimal_char IN VARCHAR2 DEFAULT NULL,
p_xlsx_sheet_name IN VARCHAR2 DEFAULT NULL,
p_row_selector IN VARCHAR2 DEFAULT NULL,
p_csv_row_delimiter IN VARCHAR2 DEFAULT LF,
p_csv_col_delimiter IN VARCHAR2 DEFAULT NULL,
p_csv_enclosed IN VARCHAR2 DEFAULT '"',
p_skip_rows IN PLS_INTEGER DEFAULT 0,
p_add_headers_row IN VARCHAR2 DEFAULT 'N',
p_file_charset IN VARCHAR2 DEFAULT 'AL32UTF8',
p_max_rows IN NUMBER DEFAULT NULL,
p_return_rows IN NUMBER DEFAULT NULL,
p_store_profile_to_collection IN VARCHAR2 DEFAULT NULL ) RETURN apex_t_parser_table pipelined;Parameter
Table 12-6 PARSE Function Parameters
| Parameter | Description |
|---|---|
|
|
The file content to be parsed as a BLOB |
|
|
The name of the file; only used to derive the file type. Either |
|
|
The type of the file to be parsed. Use this to explicitly pass the file type in. Either |
|
|
File profile to be used for parsing. The file profile might have been computed in a previous |
|
|
Whether to detect data types ( |
|
|
Use this decimal character when trying to detect |
|
|
For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found. |
|
|
For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
|
|
|
Override the default row delimiter for CSV parsing. |
|
|
Use a specific CSV column delimiter. If omitted, the function will detect the column delimiter based on the first row contents. |
|
|
Override the default enclosure character for CSV parsing. |
|
|
Skip the first N rows when parsing. |
|
|
For XML, JSON: Emit the column headers (tag, attr names) as the first row. |
|
|
File encoding, if not |
|
|
Stop parsing after |
|
|
Amount of rows to return. This is useful when the parser shall to parse more rows (for data type detection), than it is supposed to return. When the specified amount of rows have been emitted, the function will continue parsing (and refining the detected data types) until |
|
|
Store the File profile which has been computed during parse into a collection. The collection will be cleared, if it exists. Only be used for computed profiles. |
Returns
Returns rows of the APEX_T_PARSER_ROW type.
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Example
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing XLSX spreadsheet},
p_file_name => 'test.xlsx',
p_xlsx_sheet_name => 'sheet1.xml') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008
----------- -------- ------------ ------------ -------- --------------- -------- ------------ -------------
1 0 First Name Last Name Gender Country Age Date Id
2 1 Dulce Abril Female United States 32 15/10/2017 1562
3 2 Mara Hashimoto Female Great Britain 25 16/08/2016 1582
4 3 Philip Gent Male France 36 21/05/2015 2587
5 4 Kathleen Hanner Female United States 25 15/10/2017 3549
6 5 Nereida Magwood Female United States 58 16/08/2016 2468
7 6 Gaston Brumm Male United States 24 21/05/2015 2554
8 7 Etta Hurn Female Great Britain 56 15/10/2017 3598
9 8 Earlean Melgar Female United States 27 16/08/2016 2456
10 9 Vincenza Weiland Female United States 40 21/05/2015 6548
: : : : : : : : :
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing JSON file},
p_file_name => 'test.json') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005
----------- --------- ---------------------------------------------- --------------- --------------
1 Feature 1.5 41km E of Cape Yakataga, Alaska 1536513727239 1536514117117
2 Feature 0.21 11km ENE of Aguanga, CA 1536513299520 1536513521231
3 Feature 1.84 5km SSW of Pahala, Hawaii 1536513262940 1536513459610
4 Feature 2.55 9km W of Volcano, Hawaii 1536513100890 1536513446680
5 Feature 1.3 62km ESE of Cape Yakataga, Alaska 1536512917361 1536513322236
6 Feature 1.79 7km SW of Tiptonville, Tennessee 1536512379690 1536512668010
7 Feature 1.9 126km NNW of Arctic Village, Alaska 1536512346186 1536512846567
8 Feature 1.4 105km NW of Arctic Village, Alaska 1536512140162 1536512846334
About Large CSV Files
If the BLOB passed to APEX_DATA_PARSER.PARSE is less
than 50 MB, Oracle Application Express copies the BLOB to an
_internal, cached_ temporary LOB. Thus all CSV
parsing is done in memory. For larger BLOBs, Oracle Application Express does CSV
parsing on the original BLOB locator. If it is selected from a table, CSV parsing
can happen on disk but might be significantly slower. Note that a performance
degradation may occur when parsed CSV files grow beyond 50 MB.
However, developers can also use the
DBMS_LOB.CREATETEMPORARY (passing CACHE =>
TRUE ) and DBMS_LÖB.COPY procedures in order to
explicitly create a cached temporary LOB, even for a larger file. Instead of the
original BLOB, the cached temporary LOB can be passed to
APEX_DATA_PARSER.PARSE. This approach also enables in-memory
parsing for files larger than 50 MB.
Note:
"CREATETEMPORARY Procedures" and "COPY Procedures" in Oracle Database PL/SQL Packages and Types Reference
Parent topic: APEX_DATA_PARSER