Export Data as Parquet to Cloud Object Storage
Shows the steps to export table data from your Autonomous AI Database to Cloud Object Storage as Parquet data by specifying a query.
This export method supports all the Cloud Object Stores supported by Autonomous AI Database, and you can use an Oracle Cloud Infrastructure resource principal to access your Oracle Cloud Infrastructure Object Store, Amazon Resource Names (ARNs) to access AWS Simple Storage Service (S3), an Azure service principal to access Azure BLOB storage or Azure Data Lake Storage, or a Google service account to access Google Cloud Platform (GCP) resources.
-
Connect to your Autonomous AI Database instance.
See Connect to Autonomous AI Database for more information.
-
Store your Cloud Object Storage credential using
DBMS_CLOUD.CREATE_CREDENTIAL.For example:
BEGIN DBMS_CLOUD.CREATE_CREDENTIAL( credential_name => 'DEF_CRED_NAME', username => 'user1@example.com', password => 'password' ); END; /The values you provide for
usernameandpassworddepend on the Cloud Object Storage service you are using.Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.
-
Run
DBMS_CLOUD.EXPORT_DATAand specify theformatparametertypewith the valueparquetto export the results as parquet files on Cloud Object Storage.To generate the parquet output files there are two options for the
file_uri_listparameter:-
Set the
file_uri_listvalue to the URL for an existing bucket on your Cloud Object Storage. -
Set the
file_uri_listvalue to the URL for an existing bucket on your Cloud Object Storage and include a file name prefix to use when generating the file names for the exported parquet files.
If you do not include the file name prefix in the
file_uri_list,DBMS_CLOUD.EXPORT_DATAsupplies a file name prefix. See File Naming for Text Output (CSV, JSON, Parquet, or XML) for details.For example, the following shows
DBMS_CLOUD.EXPORT_DATAwith a file name prefix specified infile_uri_list:BEGIN DBMS_CLOUD.EXPORT_DATA( credential_name => 'DEF_CRED_NAME', file_uri_list => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/dept_export', query => 'SELECT * FROM DEPT', format => JSON_OBJECT('type' value 'parquet', 'compression' value 'snappy')); END; /In this example,
namespace-stringis the Oracle Cloud Infrastructure object storage namespace andbucketnameis the bucket name. See Understanding Object Storage Namespaces for more information.For detailed information about the parameters, See EXPORT_DATA Procedure.
For detailed information about the available
formatparameters you can use withDBMS_CLOUD.EXPORT_DATA, See DBMS_CLOUD Package Format Options for EXPORT_DATA. -
Notes for exporting with DBMS_CLOUD.EXPORT_DATA:
-
The
queryparameter that you supply can be an advanced query, if required, such as a query that includes joins or subqueries. -
Specify the
formatparameter with thecompressionoption to compress the output files. The defaultcompressionfortype parquetissnappy. -
When you no longer need the files that you export, use the procedure
DBMS_CLOUD.DELETE_OBJECTor use native Cloud Object Storage commands to delete the files. -
See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on Oracle Type to Parquet Type mapping.
The following types are not supported or have limitations on their support for exporting Parquet with
DBMS_CLOUD.EXPORT_DATA:Oracle Type Notes BFILENot supported BLOBNot supported DATESupported with the following limitation:
DATEformat supports only date, month and year. Hour, minute and seconds are not supported.See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on NLS format limitations for exporting
DATEto Parquet.INTERVAL DAY TO SECONDSupported and is treated as string internally INTERVAL YEAR TO MONTHSupported and is treated as string internally LONGNot supported LONG RAWNot supported NUMBERSupported with the following limitations:
- Can have maximum precision of 38 and scale equal to less than precision.
- If no precision and scale is provided for the column
NUMBERtype, by default precision of 38 and scale of 20 is used. - Negative scale is not supported for
NUMBERtypes.
Object TypesNot supported TIMESTAMPSupported with the following limitations:
- If there are multiple columns with different precision, highest precision will be taken.
TIMESTAMP WITH TIME ZONEOracle datatype will use the timestamp only.
See DBMS_CLOUD Package Oracle Data Type to Parquet Mapping for details on NLS format limitations for exporting
TIMESTAMPto Parquet.