8 Prepare Data

The Data Preparation Task enables you to harvest data from a data source, perform a variety of transformations to organize and cleanse your data, and then write the resultant data to a new data source.

What is Data Preparation?

Data Preparation is the process of collecting, cleansing, organizing, and consolidating data from one or more data sources.

The Data Integration Platform Cloud task enables you to understand your data, refine it, reshape it, and then load it to a target data source. This cleansed data is then used to build integration projects. When performing a Data Preparation task, you harvest the data, define transformations to cleanse the data, and then load the resultant data to a target data source.

What’s Certified for Data Preparation

Review the supported data sources and limitations before choosing your source and target data source for the Data Preparation Task in.

Note:

  • All data sources must have x86_64, the 64-bit version of x86 operating systems, with the latest update.

  • You need a VPN to use an on-premises Oracle Database for Data Preparation.

Connection type Version OEL RHEL SLES Windows Source Target

Oracle Database Cloud Classic

12.2

6.x

no

no

no

yes

yes

Oracle Database Cloud Classic

12.1

6.x

no

no

no

yes

yes

Oracle Database Cloud Classic

11.2

6.x

no

no

no

yes

yes

Oracle Database

12.2

6.x

6.x, 7.x

11, 12

2012, 2016

yes

yes

Oracle Database

12.1

6.x

6.x, 7.x

11, 12

2012, 2016

yes

yes

Oracle Database

11.2.0.4

6.x

6.x, 7.x

11, 12

2012

yes

yes

Oracle Object Storage Classic

Latest

n/a

no

no

no

yes

yes

Flat Files

n/a

yes

yes

yes

yes

yes

yes

See Agent Certifications.

Before You Prepare Data

To create a Data Preparation Task, you must first set up default connections, download and configure your Agent, and create Connections to your source and target data sources.

Set up Default Connections

You must create and configure Default Connections to prepare data in. You must at least set your Oracle Staging Connection. This Connection is used to stage the data harvested from your source data sources before running the task.

For more information about configuring these Connections, see Set up Default Connections.

Download and configure Agents

Agents enable data exchanges between Data Integration Platform Cloud and your data sources.

For more information about downloading and configuring agents, see Set up an Agent.

Create Connections

Connections enable you to enter the connectivity details for your source or target data sources, so you can use them for different tasks when needed.

For more information about creating Connections, see Create a Connection.

Create a Data Preparation Task

Create a Data Preparation Task to harvest data from a data source, organize and cleanse the data, and then write the resultant data to a new data source.

To create a Data Preparation Task:

  1. On the Catalog page, select Data Preparation from the Create menu.

  2. In the General Information section, enter a name and description.

    Note:

    You’ll use the value you provide in the Name field to identify it in the Catalog. The Identifier is auto-generated based on what you enter into the Name field and is used when searching for this Task in menus and other collections. You can edit the Identifier later, but it can only include capital letters, numbers, and underscores (_).

  3. In the Source Configuration section, select a source Connection:

    • If your Connection is a database, select the appropriate Schema and Data Entity for this Task

    • If your Connection is a file, the default Directory selected when you created the Connection is displayed. Click Select to choose a subdirectory of the default directory.

      For File, click Select to choose the appropriate file for this Task. Click Options to specify how Data Integration Platform Cloud reads your data.

  4. In the Target Configuration section, select a target Connection:

    • If your Connection is a database, select the appropriate target Schema for this Task, and then enter a name for your new table in the Data Entity field.

    • If your Connection is a file, the default Directory selected when you created the Connection is selected. Click Select to choose a subdirectory of the default directory.

      For File, click Select to choose the appropriate file for this Task. Click Options to specify how Data Integration Platform Cloud writes your data.

  5. Click Save to save your Task.

  6. Click Transform to save the configuration and then use the Transformation Creation Tool to select the transforms to perform on your data.

It may take a few minutes for Data Integration Platform Cloud to harvest your data before it’s ready for you to transform. You can monitor the progress on the Monitor page.

Transform Data

After you create your Data Preparation Task, the next step is to create your transforms. It may take a few minutes for your data to become available.

Let's review the different areas of the Transformation Creation Tool:

  • Configuration: Displays the source and target Connection details. Click Edit to return to the Data Preparation Task page and modify your Connection details.

  • Transforms: Lists all transform activity for this Transformation session. Click Remove to undo specific transforms.

  • Interactive transforms area:

    • Metadata: Displays the columns, column type, and sample data values harvested from the source data entity. From the Transform Actions menu, you can perform transforms such as Delete, Rename, and Replace.

    • Data: Displays all the data values harvested from the source data entity.

    • Reorder Columns: Click this icon to reorder the columns displayed in the interactive transforms area.

    • Expand/Collapse Data Profile: Click this icon to collapse the Data Profile pane to give yourself more room to work in the Interactive transforms area. Click the icon once more to expand the Data Profile pane to view summary information about your data.

  • Data Profile: Displays a summary of your data, such as total number of rows and columns.

  • Column Profile: Displays a more detailed summary of your data.

After your data becomes available, you can perform the following transforms from the Transform Actions menu of a selected columns.

  • Delete: Removes the column and its data values from the source data entity.

  • Rename: Renames the column to a name of your choice.

  • Replace: Enables you to search and replace data values in the column with other specified values.

  • Replace with Regex: Enables you to search and replace data values using regex patterns.

  • Extract: Searches and extracts data values into a new column.

  • Extract with Regex: Searches and extracts data values into a new column using regex patterns.

  • Change Case: Modifies column values to either Lower, Upper, or Proper.

  • Null Check: Specifies the threshold for null values within your data.

  • Null Fillup: Replaces all null values with a specified string.

  • Force Type: Changes the column from one type to another.

  • Merge Columns: Combines two columns into a new column.

The Transformation Creation Tool uses the same regex pattern syntax used with Oracle Database.

See Using Regular Expressions in Database Applications.

After completing your Transformation session, click Save to save your changes or click Run to save and execute the Job.