Before You Begin
This 20 min tutorial guides you through the creation of a Data Preparation task, teaches you how to create transformations using the Transformation Creation tool, and how to execute the Data Preparation job.
Background
Collect, cleanse, and consolidate your data quickly and easily with Oracle Data Integration Platform Cloud's Data Preparation task. Before you begin, you'll need to create a connection identifying your data source, from which Data Integration Platform Cloud harvests your data, and then you'll set up your data preparation task.
For the purposes of this tutorial, a File type Connection is used. If you want to use our webclicks.txt, you'll need to upload it to the Host VM.
What Do You Need?
Create
and Set your Default Connections
You must now set your Default Connections in the Admin area. Be sure to create your Object Storage Classic Connection and Oracle Database Connection before attempting to perform these steps. You only have to perform this configuration step once, but you can always return to make changes.
- In the Navigation menu, click Admin.
Description of the illustration module-1-homepage - On the Admin page, click Edit, then select an Oracle
Object Storage Classic connection.
You need one if you're preparing data from a remote file accessible from the Data Integration Platform Cloud agent. - Now select a default stage connection from the Oracle menu.
You need this connection in order to do any data preparation tasks using the interactive Transformation Creation tool. - Click Save.
Create
a Data Preparation Task
Now you can create a data preparation task. You create tasks from the Home page, or from the Create menu in the Catalog.
- Click Home and then click the right arrow (>) in the carousel.
- Click Create from the Data Preparation tile.
Description of the illustration module-2-dataprep-tile - On the Create Data Preparation Task page, enter a name and
description for your task.
You’ll use the value you provide in the Name field to identify it in the catalog. The Identifier is auto-generated based on what you enter as the name and is used when searching for this task in menus and other collections. You can edit the Identifier later, but it can only include capital letters, numbers, and underscores (_). For this tutorial enter: - Name: PrepWebData
- Description: Prep External Website Data
- Specify your source configuration.
- Click the Add New Connection icon to add a source connection.
- For this tutorial enter:
- Name: LocalFileStore
- Description: Local File Store
- Agent: Leave blank
- Type: File
- Directory: /tmp
- Review connection settings.
- Click Test Connection and review for errors.
- Click Save.
- Click the Select button located next to File and pick webclicks.txt from the list.
- Click Select.
- Click Options and uncheck the Header > Enable check box.
- Click OK.
- Specify your target configuration.
- Click the Create icon to create a new target connection.
- For this tutorial enter:
- Name: Sync Target
- Description: Target Schema
- Agent: DIPC.localdomain:7005
- Type: Oracle (selecting Oracle expands the Connection Settings.)
- Hostname: DIPC
- Port: 1521
- Username: DIPC_TGT
- Password: welcome1
- Service Name: oclpdb1.localdomain
- Schema: DIPC_TGT
- CDB Connection: ORACLE_CDB
- Click Test Connection and review for errors.
- Click Save.
- For Data Entity, enter WEBCLICKS.
- Click Save to save your task.
- Click Transform.
It may take a few minutes for Data Integration Platform Cloud to parse and profile your data before it’s ready for you to transform. You can monitor the progress on the Jobs page until it’s ready. Once finished the Data Preparation page displays.Description of the illustration module-2-dataprep-processing
Prep
and Transform your Data
After creating your data preparation task, the next step is to create your transforms. The profiling process has captured advanced profiling information as the flat file was ingested.
- Click each column to review the profiling results in the right hand data profile drawer.
- Click the Data tab to view the data.
Description of the illustration module-3-dataprep-data - Click the menu bar on the column to transform data.
Let's perform a few simple transforms. First, we'll replace text using the Text & Patterns > Replace transform.
- Select Column 5, then from the Actions
menu, select Text &
Patterns > Replace
Description of the illustration module-3-dataprep-actionsmenu - Replace Shoes.html with shoes.html
- Click Apply.
Note: The transform is saved and displayed in the left-hand drawer. This transform can be deleted and the data, metadata, and profiling statistics are updated accordingly.
Next let's add the following column transforms:
- Column 1: Rename WROWID
- Column 2: Rename CUSTID
- Column 5: Rename PAGENAME
- Column 6: Rename PAGEREFERRER
- Column 7: Rename VISITDATE
Now let's merge COLUMN3 and COLUMN4 and create a new NAME column.
- Select COLUMN3 and COLUMN4, then from the Actions menu, select Merge Columns.
- Enter Name for the new column name.
- Click Apply.
- Delete columns 3 and 4.
Finally, let's fill null values in column PAGEREFERRER.
- Select the PAGEREFERRER column, then from the Actions menu, select Null Values > Null Fillup
- Enter the value Unknown.
- Click Apply.
- Review your transforms and data.
Description of the illustration module-3-dataprep-transforms - Click Save and Run.
Monitor
Your Job
- When you click Save &
Run, you start the Data Preparation job, and are
brought to the Monitor page where you select your job and
review its details.
Description of the illustration module-4-monitor-homepage - After the job completes, you can go to the Catalog to view
the target data entity you created by your transforms.
Description of the illustration module-4-catalog-attributes - Now you can try it on your own.