12.9 PARSE Function
The PARSE function enables you to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Values are generally returned in VARCHAR2
format. A
returned table row can have a maximum of 300 columns. The maximum length for a
varchar2 table column is 4000 bytes; there is no line length maximum. 20 out of the
300 supported columns can be handled as a CLOB.
File parsing happens on-the-fly as this function is invoked. No data is written to a collection or to a temporary table.
If the P_FILE_PROFILE
parameter is not passed, the
function will compute a file profile with column information during parse.
If P_DETECT_DATA_TYPES
is passed as 'Y'
(default), the function also detect column data types during parse. The computed
file profile can be retrieved using GET_FILE_PROFILE
after this
function is finished.
- Invoke
PARSE
- Use this table function to parse the files and get rows and columns in order to display a data preview. - Invoke
GET_FILE_PROFILE
- Retrieve file profile information in JSON format. - Process the data - Generate a SQL query based on the data profile to perform custom processing.
Note:
- JSON parsing is supported on 11.2 and 12.1.0.1 database
versions. In this case, the function uses
APEX_JSON
andXMLTABLE
functions. For performance reasons it's recommended to upgrade the database to at least 12.2 - JSON parsing is faster by magnitudes on these versions. - XLSX parsing is done by using
APEX_ZIP
to extract individual XML files from the XLSX archive; the actual XLSX parsing is then done by using theXMLTABLE SQL
function.
About CLOB Support
Starting with release 19.2, this package supports string values larger than 400
bytes. 20 out of the 300 supported columns can be handled as a
CLOB
. The level of CLOB
support depends upon the
file type being parsed.
- CSV and XLSX
CLOB
values are supported up to 32KCLOB
columns can be detected during discovery- When the data profile is discovered, values below 4000 bytes are
normally returned as
COLNNN
.CLOB
values are returned in theCLOBNN
column and the first 1000 characters are returned asCOLNNN
. If a data profile is passed in and that hasCLOB
column defined, all values are returned in theCLOBNN
column only.
- XML
CLOB
values with more than 32K are supportedCLOB
columns can be detected during discovery- When the data profile is discovered, values below 4000 bytes are
normally returned as
COLNNN
.CLOB
values are returned in theCLOBNN
column and the first 1000 characters are returned asCOLNNN
. If a data profile is passed in and that hasCLOB
column defined, all values are returned in theCLOBNN
column only.
- JSON
CLOB
values with more than 32K are supportedCLOB
columns are *not* detected during discovery;CLOB
support is only active, if a file profile containingCLOB
column is passed in as thep_file_profile
parameter- Since
JSON_TABLE
does not support CLOBs on 12c databases, the parser uses XMLTYPE-based processing if a file profile with CLOB columns is passed in. Processing will be significantly slower.
Syntax
FUNCTION PARSE(
p_content IN BLOB,
p_file_name IN VARCHAR2 DEFAULT NULL,
p_file_type IN T_FILE_TYPE DEFAULT NULL,
p_file_profile IN CLOB DEFAULT NULL,
p_detect_data_types IN VARCHAR2 DEFAULT 'Y',
p_decimal_char IN VARCHAR2 DEFAULT NULL,
p_xlsx_sheet_name IN VARCHAR2 DEFAULT NULL,
p_row_selector IN VARCHAR2 DEFAULT NULL,
p_csv_row_delimiter IN VARCHAR2 DEFAULT LF,
p_csv_col_delimiter IN VARCHAR2 DEFAULT NULL,
p_csv_enclosed IN VARCHAR2 DEFAULT '"',
p_skip_rows IN PLS_INTEGER DEFAULT 0,
p_add_headers_row IN VARCHAR2 DEFAULT 'N',
p_file_charset IN VARCHAR2 DEFAULT 'AL32UTF8',
p_max_rows IN NUMBER DEFAULT NULL,
p_return_rows IN NUMBER DEFAULT NULL,
p_store_profile_to_collection IN VARCHAR2 DEFAULT NULL ) RETURN apex_t_parser_table pipelined;
Parameter
Table 12-6 PARSE Function Parameters
Parameter | Description |
---|---|
|
The file content to be parsed as a BLOB |
|
The name of the file; only used to derive the file type. Either |
|
The type of the file to be parsed. Use this to explicitly pass the file type in. Either |
|
File profile to be used for parsing. The file profile might have been computed in a previous |
|
Whether to detect data types ( |
|
Use this decimal character when trying to detect |
|
For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found. |
|
For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
|
|
Override the default row delimiter for CSV parsing. |
|
Use a specific CSV column delimiter. If omitted, the function will detect the column delimiter based on the first row contents. |
|
Override the default enclosure character for CSV parsing. |
|
Skip the first N rows when parsing. |
|
For XML, JSON: Emit the column headers (tag, attr names) as the first row. |
|
File encoding, if not |
|
Stop parsing after |
|
Amount of rows to return. This is useful when the parser shall to parse more rows (for data type detection), than it is supposed to return. When the specified amount of rows have been emitted, the function will continue parsing (and refining the detected data types) until |
|
Store the File profile which has been computed during parse into a collection. The collection will be cleared, if it exists. Only be used for computed profiles. |
Returns
Returns rows of the APEX_T_PARSER_ROW type
.
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Example
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing XLSX spreadsheet},
p_file_name => 'test.xlsx',
p_xlsx_sheet_name => 'sheet1.xml') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008
----------- -------- ------------ ------------ -------- --------------- -------- ------------ -------------
1 0 First Name Last Name Gender Country Age Date Id
2 1 Dulce Abril Female United States 32 15/10/2017 1562
3 2 Mara Hashimoto Female Great Britain 25 16/08/2016 1582
4 3 Philip Gent Male France 36 21/05/2015 2587
5 4 Kathleen Hanner Female United States 25 15/10/2017 3549
6 5 Nereida Magwood Female United States 58 16/08/2016 2468
7 6 Gaston Brumm Male United States 24 21/05/2015 2554
8 7 Etta Hurn Female Great Britain 56 15/10/2017 3598
9 8 Earlean Melgar Female United States 27 16/08/2016 2456
10 9 Vincenza Weiland Female United States 40 21/05/2015 6548
: : : : : : : : :
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing JSON file},
p_file_name => 'test.json') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005
----------- --------- ---------------------------------------------- --------------- --------------
1 Feature 1.5 41km E of Cape Yakataga, Alaska 1536513727239 1536514117117
2 Feature 0.21 11km ENE of Aguanga, CA 1536513299520 1536513521231
3 Feature 1.84 5km SSW of Pahala, Hawaii 1536513262940 1536513459610
4 Feature 2.55 9km W of Volcano, Hawaii 1536513100890 1536513446680
5 Feature 1.3 62km ESE of Cape Yakataga, Alaska 1536512917361 1536513322236
6 Feature 1.79 7km SW of Tiptonville, Tennessee 1536512379690 1536512668010
7 Feature 1.9 126km NNW of Arctic Village, Alaska 1536512346186 1536512846567
8 Feature 1.4 105km NW of Arctic Village, Alaska 1536512140162 1536512846334
About Large CSV Files
If the BLOB passed to APEX_DATA_PARSER.PARSE
is less
than 50 MB, Oracle Application Express copies the BLOB to an
_internal
, cached_ temporary
LOB. Thus all CSV
parsing is done in memory. For larger BLOBs, Oracle Application Express does CSV
parsing on the original BLOB locator. If it is selected from a table, CSV parsing
can happen on disk but might be significantly slower. Note that a performance
degradation may occur when parsed CSV files grow beyond 50 MB.
However, developers can also use the
DBMS_LOB.CREATETEMPORARY
(passing CACHE =>
TRUE
) and DBMS_LÖB.COPY
procedures in order to
explicitly create a cached temporary LOB, even for a larger file. Instead of the
original BLOB, the cached temporary LOB can be passed to
APEX_DATA_PARSER.PARSE
. This approach also enables in-memory
parsing for files larger than 50 MB.
Note:
"CREATETEMPORARY Procedures" and "COPY Procedures" in Oracle Database PL/SQL Packages and Types Reference
Parent topic: APEX_DATA_PARSER