Loading Data from Cloud Storage

You can load data from a cloud store to a table in your Autonomous Database.

You can load files in these file formats: AVRO, CSV, JSON, ORC, Delimited TXT, XLSX, PRQ, GZ, GNU ZIP and Tab-Separated Values. For information on supported file formats, see Format Specifications for JSON and AVRO Files. The Data Load tool supports loading tables from multiple worksheets XLSX files when the file is in a cloud store.

You can set filters on the data for a table to load only the specified data. For example, to limit the files to only those that are CSV files, enter *.CSV in the file extension filter.

Configure and run a data load job from the Load Cloud Object page. To open that page:

  1. Open the Database Actions and select Data Load.
  2. Select LOAD DATA and Cloud Store .

On the left side of the page is a navigator pane, where you choose a cloud store connection and the folders or files containing the data. On the right of the page is the data load cart, where you stage the files and folders for the data load job. You can set options for the data load job before running it. The Autonomous Database comes with predefined CPU/IO shares assigned to different consumer groups. You can set the consumer group to either low, medium or high while executing a data load job depending on your workload.

To load files from a cloud store into your database, do the following:

Manage Cloud Storage Links for Data Load Jobs

Before you can load data from a cloud store, you must establish a connection to the cloud store you want to use. You can select cloud store location from the cloud store locations field.

On the Load Data page when you select Cloud Store:

  1. Click the Create Cloud Store locations menu besides the cloud store locations text field. This opens an Add Cloud Store Location dialog box. See Managing Connections to add cloud store location.

See Managing Connections.

To return to the Load Cloud Object page, click Data Load in the breadcrumbs at the top of the page and then navigate back to the page.

Prepare the Data Load Job

As you'll see below in Enter Details for the Data Load Job, the first decision you'll make when configuring a data load job is how to load the source data into a new or existing table in the database. The choices are:

  • Create a table and insert data loaded from the source into the new table.

  • Insert data loaded from the source into an existing table.

  • Delete all data in an existing table and then insert new data from the source into the table.

  • Drop a table, create a new table, and then insert data loaded from the source into the new table.

  • Merge data from the source into a table by updating existing rows in the table and inserting new rows into the table.

You may have to adjust your source data or your target table so that the source data loads correctly into the external target table. The number, order, and data types of columns in the source must match those in the target. Consider:

  • If you're creating a new table or if the columns in your source exactly match the columns in an existing target, you don't have to do any special preparation.

  • If the columns in your source don't match the columns in an existing target, you must edit your source files or target table so they do match.

  • If you're loading multiple files, you must make sure that:

    • All the source files are of the same type, for example, CSV, JSON, etc.

    • The number, order, and data types of the columns in all the source files match (and that they match the target, if you're loading into an existing table).

  • If you want to partition by date:

    • The source file must contain data where the data type is date or timestamp.

    • You must load a folder containing two or more data sources.

    • The names of the files in the folder must indicate a date or dates, for example, MAR-1999.csv or 2017-04-21.xlsx.

Add Files or Folders for the Data Load Job

Add files from the cloud store to the data load cart, where you can edit the details of the data load job. To add the files:

  1. From the list at the top of the navigator pane on the left, select the bucket with your source data.

    The list shows links that were established on the Manage Cloud Storage page. If you haven't yet registered the cloud store you want to use, click the Create Cloud Store Location button at the top of the page and register a connection. See Manage Cloud Storage Links for Data Load Jobs, above.

  2. Drag one or more items from the file navigator on the left and drop them into the cart on the right.

    • You can add files, folders, or both. A card is added to the cart for each file or folder you drag into it. The card lists the name of the source file or folder and a proposed name for the target table.

    • If you add a folder that contains multiple files, all the files must be of the same type, that is, CSV, TXT, etc.

      When you add the folder to the cart, a prompt is displayed that asks if you want to load all the objects from the multiple source files into a single target table. Click OK to continue or Escape to cancel.

    • When you add multiple individual files or multiple folders to the cart, the data represented by each card will be loaded into a separate table, but all the items in the cart will be processed as part of the same data load job.

    • You can add files or folders from a different bucket, but if you do that, you're prompted to remove all files that are already in the cart before proceeding. To select files from a different bucket, select the bucket from the drop-down list in the navigator pane on the left and then add the file(s), as described above.

    • You can drop files or folders into the data load cart and then navigate away from the Data Load Object page. When you return to the page, those items remain on the page, but you may receive a message, "Remove All Data Load Items. Changing to another Cloud storage location requires all items to be removed from the data load job. Do you wish to continue?" Click Yes to remove the items from the cart. Click No to keep the items in the cart. Then you can continue to work.

You can remove items from the cart before running the data load job:

  • To remove an item from the cart, click the Actions icon and select Remove on the card for the item.

  • To remove all items from the cart, click Remove All in the data link cart menu bar at the top of the pane.

Enter Details for the Data Load Job

Enter the details about the data load job in the Load Data from Cloud Storage pane.

On the card in the data link cart, click the Actions icon and select the Settings to open the Load Data from Cloud Storage pane for that job. The pane contains:

Settings Tab - Table Section

Set details about the target table in the Table section.

  • Option: Select an item from the Option list to specify how the data should be loaded into a new or existing table. The processing options are:

    • Create Table: Creates a table and inserts the data into the new table. When you select this option, the Name field on the Settings tab is filled with a default name, based on the name of the source file or folder. You can change it if you want.

    • Insert into Table: Inserts data loaded from the source into an existing table. When you select this option, the Name field on the Settings tab presents a list of the tables in the current schema. Select the table into which you want to insert the data.

    • Replace Data: Deletes all data in the existing table and then inserts new data from the source into the table. When you select this option, the Name field on the Settings tab presents a list of the tables in the current schema. Select the table you want to use.

    • Drop Table and Create New Table: Drops the table (if it already exists), creates a new table, and then inserts the new data into the table. When you select this option, the Name field on the Settings tab presents a list of the tables in the current schema. Select the table you want to use.

    • Merge into Table: Updates existing rows and inserts new rows in the table. When you select this option, the Name field on the Settings tab presents a list of the tables in the current schema. Select the table you want to use.

  • Name: The name of the target table.
  • Partition Column:

    List Partitions and Date-based partitions are the different types of partitions available in data loading.

    List partitioning is required when you specifically want to map rows to partitions based on discrete values.

    To partition according to a specific column, click the Partition Column drop-down list and select the column you want to use for the partitioning.

    You will have N files per partition value, all partitioned by the partition column you select.

    Note:

    • For linked files (from external tables) there is also a requirement that for each file, the list partitioning column can contain only a single distinct value across all of the rows.
    • If a file is list partitioned, the partitioning key can only consist of a single column of the table.

    Date-based partitioning is available when you load a folder containing two or more data sources that contain date or timestamp data.

    To partition according to date, click the Partition Column drop-down list and select the DATE or TIMESTAMP column you want to use for the partitioning.

Settings Tab - Properties Section

Specify options to control how the source data is interpreted, previewed, and processed. These options vary, depending on the type of source data.

  • Encoding: Select a character encoding type from the list. This option is available when the loaded file is in plain text format (CSV, TSV, or TXT). The default encoding type is UTF-8.

  • Text enclosure: Select the character for enclosing text: " (double-quote character), ' (single-quote character) or None. This option is visible only when the selected file is in plain text format (CSV, TSV, or TXT).

  • Field delimiter: Select the delimiter character used to separate columns in the source. For example, if the source file uses semicolons to delimit the columns, select Semicolon from this list. The default is Comma. This option is visible only when the selected file is in plain text format (CSV, TSV, or TXT).

  • Column header row: Select the Column header row checkbox to use the column names form the source table in the target table.

    • By selecting this option you can indicate what row in the file contains column names. The rows in the Mapping section, below, are filled with those names (and with the existing data types, unless you change them).

    • If you deselect this option, the first row is processed as data. To specify column names manually, enter a name for each target column in the Mapping section. (You will also have to enter data types.)

  • Start processing data at row: Specifies the number of rows to skip when loading the source data into the target:

    • If you select the Column header row option under Source column name (see below) and if you enter a number greater than 0 in the Start processing data at row field, then that number of rows after the first row are not loaded into the target.

    • If you deselect the Column header row option under Source column name, and if you enter a number greater than 0 in the Start processing data at row field, then that number of rows including the first row are not loaded into the target.

  • Numeric column: Select the Convert invalid data to null checkbox to convert an invalid numeric column value into a null value.

  • Newlines included in data values: Select this option if there are newline characters or returns to the beginning of the current line without advancing downward in the data fields. Selecting this option will increase the time taken to process the load. If you do not select this option when loading the data, the rows with newlines in the fields will be rejected. You can view the rejected row in the Job Report panel.

Settings Template

The save settings feature saves the configuration set in the Cart settings in the form of a JSON file. When opening the Settings template, you have the following options:
  1. Load Settings Template: Loads a settings template from your local system.
  2. Save Settings Template: Saves the current existing settings template.
Description of settings-template.png follows
Description of the illustration settings-template.png
You can use the Load Settings Template if you want to use an existing customized template present in your local.
  1. From the Settings Template in the Settings tab of the Load Data page, select Load Settings Template.
  2. You will see a Load Settings Template wizard, click the Settings Template JSON to load a JSON file from your system.
  3. Clicking the Settings template JSON will open your local system. Click OK to load the JSON file.
  4. After you load the JSON file, you can view the updates applied automatically to the settings tab which matches the JSON settings template you load from your local.
You can use the Save Settings Template to save the existing current Settings template.
  1. From the Settings Template in the Settings tab of the Load Data page, select Save Settings Template.
  2. The Template file editor appears. Click the Template File name and name the new template.
  3. Click OK to finish saving the new name of the existing template.
  4. You can test the configuration of the new template.

Bulk Edit Settings

You can use the Bulk edit settings to update all the columns at once from the mapping table. Use it to apply changes to the selection currently displayed in the results pane. You can search for the values of the column you want to edit in the search field and click the magnifier icon. The mapping table will display the results of the search. Select the Bulk Edit setting to update the column. The Bulk Edit setting allows you to:
  • Update values of all the fields in a group.
  • Find and replace, Add Prefix and Add suffix to target column name.
  • Include the column(s) for loading data to the target table.
  • Exclude the column(s) for loading data to the target table.

Searching the Column

The Bulk Edit setting updates the columns returned by the search field. The search box besides the Bulk Edit setting icon filters the list of columns you wish to update in a bulk. As soon as you start typing in the search field, the tool returns the field values which contains the letters you type. You can remove the filter by deleting all the content from the search box and clicking the magnifier icon that appears next to the search box.

The Bulk Edit setting enables you to update the values of the following columns for all the searches returned by the search field:
  • Data Type
  • Target Column name
  • Include Columns for loading
  • Exclude columns for loading
Consider changing the Data Type of first and third row from VARCHAR to NUMBER in the mapping table.Description of bulk-edit1.png follows
Description of the illustration bulk-edit1.png

Settings Tab - Mapping Section

The settings in the Mapping section control how data from the source files are loaded into the rows of the target database table. For each row, the data from the column listed under Source column will be loaded into the column listed under Target column.

As mentioned above, the contents of the Mapping table change according to what processing option you chose in the Table section and which properties you set in the Properties section.

  • Include: This check box ensures that the row you select is loaded from the source column to the target column.
  • Source column: Lists the columns from the source file.

    If the Column header row option under Properties is selected, Source column shows the names of the columns in the source file. If the Column header row option is not selected, generic names like COLUMN_1, COLUMN_2, etc., are used. This field is always read only.

    You can view two source columns FILE$NAME and SYSTIMESTAMP. The FILE$NAME column enables you to locate the source file containing a particular data record. For example, you load a source file that contains a list of files. The file names in the file list refer to the department names across the organization. For instance, a finance.txt file contains data from the Finance department. In the mapping, you can use string data types to extract the department name from the output of the file name column. You can use the extracted department name to process data differently for each department.

    The SYSTIMESTAMP column allows us to view the current timestamp in the database.

    Note:

    • FILE$NAME and SYSTIMESTAMP source columns are not included by default. You must check the Include check box and run the load for the target table to display these two columns.
    • When you are creating a livefeed, the FILE$NAME and SYSTIMESTAMP source columns appear in the Mapping table by default.
  • Target column: Lists the columns in the target table. Accept, select, or enter a column in the target table. You need to make sure that the target column is not empty. The target column name cannot have duplicate name as another target column. The target column length must not be beyond 128 bytes. 128 byte limit is a database limit.

    The contents of this column differ, depending on what you selected for the table processing Option and whether you selected for the Column header row option.

    • If (1) the processing option is Create Table or Drop Table and Create New Table and (2) the Column header row option is selected, then the Target column uses the names of the columns in the source file. You can change the name of a target column by replacing the provided name with a new one.

    • If (1) the processing option is Create Table or Drop Table and Create New Table and (2) the Column header row option is not selected, then generic names like COLUMN_1, COLUMN_2, etc., are used. You can change the name of a target column by replacing the provided name with a new one.

    • If (1) the processing option is Insert into Table, Replace Data, or Merge Into Table and (2) the Column header row option is selected, then the Target column has a drop-down list of all the columns in the target table, with their data types. By default, the column with the name corresponding to the source column is selected, but you can select a different one from the list.

    • If (1) the processing option is Insert into Table, Replace Data, or Merge Into Table and (2) the Column header row option is not selected, then the Target column has a drop-down list of all the columns in the target table, with their data types. Select a column from the list to use as the target column.

    Note:

    If you're loading multiple files from a folder in a single data load job, only the first file will be shown in the Mapping section. However, as long as the column names and data types match, the data from all source files will be loaded.

  • Data Type: Lists the data type to use for data in that column. This column is displayed only for Create Table or Drop Table and Create New Table. The contents change depending on whether the Get from file header option is selected.

    • If the Get from file header option is selected, Data type shows the data types of the columns in the source file (for Create Table) or in the existing table (for Drop Table and Create New Table). If you want to change the data type for the target, click the name and select a different one from the list.

    • If the Column header row option is not selected, Data type shows all available data types. Select the data type to use for the target column from the list.

  • Length/Precision (Optional): For columns where the Data Type is NUMBER, enter the length/precision for the numbers in the column. Precision is the number of significant digits in a number. Precision can range from 1 to 38.

    For columns where Data Type is VARCHAR2, the Auto value in Length/Precisionfield enables the Auto Size feature.

    With the Auto-Size column Width feature, you can automatically size any column to fit the largest value in the column. Select Auto from the Length/Precision drop-down values or pick a value from the drop-down list.

  • Scale (Optional): For columns where the Data Type is NUMBER, enter the scale for the numbers in the column. Scale is the number of digits to the right (positive) or left (negative) of the decimal point. Scale can range from ranges from -84 to 127.

  • Format: If the data type in the Data type column is DATE or one of the TIMESTAMP types, select a format for that type from the from the Format drop-down list.

  • Merge Key: This option is used only for the processing option Merge into Table.

    For the Merge into Table option, you must specify at least one column to use as a key column. Merge keys are one or more columns that uniquely identify each row in the table. To specify a key column, select the Merge Key checkbox for the column. Merge keys must not contain any null values. For loading tables with primary keys, this option automatically enables the selection of primary key columns as the merge keys.

Preview Tab

The Preview tab displays the source data in tabular form. The display reflects the settings you chose in the Properties section.

If you dragged a folder containing multiple files into the data load cart and then clicked Settings Settings for that card, the File pane includes a Preview Object (File) drop-down list at the top of the pane that lists all the files in the folder. Select the source file you want to preview from that list.

Table Tab

The Table tab displays what the target table is expected to look like after the data has been loaded. If you chose the Create Table processing option, no table is shown.

Errors Tab

The Error tab lists any errors generated when attempting to run the data load job.

SQL Tab

The SQL tab displays the SQL commands that will be run to complete this data load job.

Note:

You can see the SQL code even before the table is created.

Close Button - Save and Close the Pane

After entering all the details for the data load job, click Close at the bottom of the page. This saves the details you entered and returns you to the Load Data from Cloud Storage pane. To close the page without saving your entries, press Escape.

Run the Data Load Job

Once you've added data sources to the data load cart and entered details about the data load job, you can run the job.

To run the job:

  1. If you haven't already done so, click the Close button in the Load Data from Cloud Storage pane to save your settings and close the pane. If any of the settings are invalid, an error message reports the problem. Fix the problem and click Close.
  2. Click Start Start in the data load cart menu bar. To stop the data load job, click Stop Stop.

    When the data load job completes, the Load Cloud Object page displays the results of the job. At the top of the page, a Status message shows the number of items for which the load has completed over the number of items in the job and the total time elapsed for the job.

View Details About the Data Load Job After It Is Run

To view details about the data load job after it is run, click the Actions icon and select Settings in the card for the item. The Load Data from Cloud Store Locations pane is displayed again, with the settings used for the job plus some additional details about job run.

Settings Tab

The Settings tab shows the details that were set in the Settings tab when preparing for job.

Preview Tab

The Preview tab shows the source file or files used for the data load job.

Table Tab

The Table tab shows the table created or modified from the data load job.

Job Report

Displays a report of the total rows loaded and failed for a specific table. The report displays of the total rows loaded and failed. You can view the name of the table, the time the table was loaded and the time taken to process the load.

SQL Tab

The SQL tab shows the SQL that was created and run to load the data. You can click the Copy to clipboard Copy to clipboard button at the top of the pane to copy it.

Data Definition Tab

The Data Definition tab shows the data definition of the table created from the data load job.

View the Table Resulting from the Data Load Job

To view the new tables or tables modified by the data load job, you can:

  1. Fix your data load job. After your data load job, you might see errors that you want to correct, or upon inspection, realize that you wanted to name a column differently. In such cases, click the Reload Cart option to reload cards from your recent cart and edit them as you did before your first attempt. Click the pencil icon to make any changes to the data load job (i.e., change a column name).

  2. On the Data Load home page, click the Data Load Jobs card.
  3. On the Data Load Jobs page, find the new or updated table.
  4. Click Settings on the right side of the card and review the table.