3 Managing Data Pipelines

Data pipelines prepare the filtered data that is used to create and run scenarios. They do this by selecting and joining data sources to create virtual tables of data, adding derived attributes, running derivations to determine the risk associated with an entity, and so on.

Data pipelines are categorized according to function into the following types:

·        Data pipelines which prepare the data and make it compatible for use in Oracle FCCM Cloud products.

·        Data pipelines which move the evented data to Case Management (CM) for further action.

Pre-configured Data Pipelines

The application comes with the following sets of ready-to-use data pipelines:

·        Data Pipelines which prepare data

§       Account Data Movement Pipeline

§       Case Data Movement Pipeline

§       City Data Load

§       Country Data Load

§       Customer Data Movement Pipeline

§       Transaction Data Movement Pipeline

§       Event Data Movement Pipeline

§       Evented Account Data Movement Pipeline

§       Evented Customer Data Movement Pipeline

§       Evented External Entity and Derive Address Data Movement Pipeline

§       Evented Transaction Data Movement Pipeline

§       External Entity and Derive Address Data Movement Pipeline

§       Goods Data Load

§       Load Additional Account Data

§       Load Account Anticipatory Profile Data

§       Load Account Data

§       Load Account Group Data

§       Load and Prepare Watchlists

§       Load Customer Add On Data

§       Load Customer Anticipatory Profile Data

§       Load Customer Data

§       Load Customer Mapping Data

§       Load Identifier

§       Load Intermediate Account and Transaction Data

§       Load Transaction Data and Derive External Entities and Risk

§       Derive Risk and Load Supplementary Information

§       Load Account Staging Data

§       Load Customer Staging Data

§       Load Transaction Staging Data

§       Load Watchlist Staging Data

§       Data Loading File Transfer

§       Data Loading File Scanner

§       Data Truncate Holiday Master

§       Data Loading File Transfer Holiday Data

§       Load Holiday Master Data

§       Load Correspondent Bank

§       Port Data Load

§       StopKeyword Watchlist

·        Data Pipelines which move data to Case Management

§       Load Account Business Data to Case Management

§       Load Calendar Data

§       Load Case Data

§       Load Customer Business Data to Case Management

§       Load Evented Account Data to Case Management

§       Load Evented Customer Data to Case Management

§       Load Evented External Entity and Derived Address Data to Case Management

§       Load Evented Transaction Data to Case Management

§       Load Event Data to Case Management

§       Load External Entity and Derived Address Data to Case Management

§       Load Scenario Data to Case Management

§       Load Transaction Business Data to Case Management

§       Load Trusted Pair Data

·        Data Pipelines which prepare data for Investigation Hub

§       Business Data Load for CS

§       Business Data Load for KYC

§       Data Loading File Transfer CS

§       Data Loading File Transfer KYC

·        Pipelines which are used to maintain data

§       Apply Redaction Policy

§       Create Batch Redaction Policy

§       Drop Batch Redact Policy

§       Delete Business Data

§       Delete Case Management Data

§       Delete Staging Data

Widgets in Data Pipelines

Depending on the pipeline type, specific widgets are available in the widgets pane of the pipeline. The following table describes the widgets available in Data pipelines. For more information about the widgets available in Scenario Pipelines, see Widgets in Scenario Pipeline.

 

Data Pipeline – Widgets and Descriptions

Dataset: Use this widget to add a dataset. Datasets correspond to the contents of a single database table, which can be a staging table, a business table, or a table that has been created by a data pipeline. A data pipeline must always begin with a Dataset widget.

Filter: Use this widget to filter the data in the pipeline to a subset of the available data records. On applying a filter, all data matching the filter conditions is obtained, which allows you to search for and analyze behaviors of interest.

Join: Use this widget to combine or group multiple tables using various join operators.

Persist: Use this widget to write data to database tables so that it can be used in other pipelines.

External Service: Use this widget to add an external service. External services perform actions on the data, such as loading or moving the data, or performing a virus scan.
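
These widgets are assembled visually in the Pipeline Designer. Purely as a mental model, and not as a product API, a typical data pipeline can be pictured as an ordered chain of widget configurations, as in the following hypothetical Python sketch (all table names, conditions, and values are invented):

# Hypothetical mental model only, not a product API: a data pipeline as an
# ordered chain of widget configurations. All names and values are invented.
pipeline = [
    ("Dataset", {"table": "STG_TRANSACTION"}),         # pipelines must begin with a dataset
    ("Filter",  {"condition": "TXN_AMOUNT > 10000"}),  # keep only matching records
    ("Join",    {"with": "STG_CUSTOMER", "on": "CUST_ID",
                 "operator": "Inner Join"}),           # combine tables
    ("Persist", {"target": "BUS_TRANSACTION",
                 "type": "Incremental"}),              # write results for reuse
]

Reading the chain from top to bottom mirrors the left-to-right flow of widgets on the canvas.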

 

Dataset Widget

The dataset widget enables you to select and filter data sources for use in the later stages of the pipeline. A data pipeline must always begin with a dataset. Datasets correspond to the contents of a single database table, which can be a staging table, a business table, or a table that has been created by a data pipeline.

Using the dataset widget, you can select any available staging table, name the dataset, perform DQ (data quality) checks on one, multiple, or all columns of the selected staging table, and filter the output by defining conditions on one, multiple, or all columns using one of three methods: Expression Builder, Tables, or Text. When conditions are defined on multiple columns, OR logic is applied to filter the output.
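
The filter itself is defined through the widget; purely to illustrate the OR behavior, the following pandas sketch (with invented column names and values) shows how conditions on multiple columns combine so that a record matching any one condition passes:

# Illustration only: conditions defined on multiple columns combine with OR,
# so a row passes if it satisfies at least one of them. Data is hypothetical.
import pandas as pd

staging = pd.DataFrame({
    "ACCT_STATUS":  ["ACTIVE", "CLOSED", "ACTIVE"],
    "JURISDICTION": ["APAC",   "EMEA",   "AMER"],
})

mask = (staging["ACCT_STATUS"] == "ACTIVE") | (staging["JURISDICTION"] == "EMEA")
print(staging[mask])   # all three rows pass: each satisfies at least one condition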

To create a dataset, follow these steps:

1.     Navigate to the Pipeline Designer page.

2.     Drag and drop the Dataset widget from the widgets pane in the upper-right corner of the designer pane.

3.     Hover over the Dataset widget and click Edit. Provide details as described in the following table:

Dataset Widget Fields and their Descriptions

Name: Enter the name for your dataset.

Tables: Select a table from the Tables drop-down list. This list consists of all the staging tables that are available. The columns of the selected table are displayed in the Attributes pane. The attributes include the Logical Name, Column Name, and Column Type.

Enable DQ check: Select this option to enable the data quality check for the table. You can select each column of the table, specify checks such as range, length, LOV, and null checks, and save the rule after naming it. Based on the rule, checks are performed on the columns of the selected staging table to filter out information you do not require.

To specify DQ rules, follow these steps:

1.     Click Add + next to the Enable DQ check option.

2.     Under Master DQ, select one or multiple Primary Key options. All columns of the selected staging table are listed for you to select.

3.     Under DQ Rules, select a column from the Available Columns list. This list contains all columns of the selected staging table.

4.     Enter a rule name for the selected column of the staging table and specify the following checks for this rule:

a.     Range Check DQ Rules: Specify the following range checks:

    Is Range Check Required: Select Yes or No. If you select No, jump to the length check rule. If you select Yes, provide a value in the Minimum Value field.

    Is Provided Minimum Value Inclusive: Select Yes or No.

    Maximum Value: Provide a value in the Maximum Value field.

    Is Provided Maximum Value Inclusive: Select Yes or No.

b.     Length Check DQ Rules: Specify Is Length Check Required: Select Yes or No. If you select No, jump to the LOV check rule. If you select Yes, provide values in the Minimum Length and Maximum Length fields.

c.     LOV Check DQ Rules: Specify Is LOV Check Required: Select Yes or No. If you select No, jump to the null check rule. If you select Yes, provide the LOV values in the LOV Values field.

d.     Null Check DQ Rules: Specify the following null check DQ rules:

    Is NULL Check Required: Select Yes or No. If you select No, jump to the Is NULL Value Allowed rule. If you select Yes, provide the null default value in the Null Default Values field.

    Is NULL Value Allowed: Select Yes or No. If you select No, provide the null default value in the Null Default Values field.

e.     Referential Check DQ Rules: Specify Is Referential Check Required: Select Yes or No. If you select Yes, select the name of the table and column that the DQ rule refers to when verifying the data.

NOTE:   

You must select either Yes or No for each of these checks.

5.     Click Save to save your DQ rule.

6.     Repeat these steps to define DQ rules for all the columns of the table based on your requirement.
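
The DQ checks themselves run inside the pipeline engine. Purely as an illustration of what the range, length, LOV, and null checks described above verify, here is a minimal Python sketch; the rule values (bounds, lengths, LOV members, and the null default) are invented examples:

def apply_dq_rule(value, rule):
    """Return the (possibly defaulted) value if it passes the rule, else None."""
    if value is None:                                    # null check
        if rule.get("null_check") and not rule.get("null_allowed", True):
            return rule.get("null_default")              # substitute the default
        return value
    if "min_value" in rule and not (rule["min_value"] <= value <= rule["max_value"]):
        return None                                      # fails the range check
    if "min_length" in rule and not (
            rule["min_length"] <= len(str(value)) <= rule["max_length"]):
        return None                                      # fails the length check
    if "lov" in rule and value not in rule["lov"]:       # list-of-values check
        return None
    return value

# Example: a hypothetical rule on a transaction amount column.
amount_rule = {"min_value": 0, "max_value": 10_000_000,
               "null_check": True, "null_allowed": False, "null_default": 0}
print(apply_dq_rule(None, amount_rule))    # -> 0 (null default substituted)
print(apply_dq_rule(-5, amount_rule))      # -> None (outside the allowed range)
print(apply_dq_rule(250.0, amount_rule))   # -> 250.0 (passes all checks)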

 

4.     Click Save to save the changes. The dataset is created and is visible on the canvas. It is also available for use in the Dataset pane.

5.     To reuse a dataset you have created, click the Dataset icon in the upper-left corner to view the Dataset pane. Click Expand to display the available datasets, including the ones you have created. Drag the dataset you want onto the canvas of the Pipeline Designer.

6.     You can perform certain tasks that are common to all widgets, such as edit, delete, and filter. For more information, see Common Tasks.

Creating Persist

The Persist widget enables you to write data to database tables so that it can be used in other pipelines. It maps columns (attributes) from the input datasets to the target table in which the data is stored.

To create a persist, follow these steps:

1.     Navigate to the Pipeline Designer page.

2.     Drag and drop the Persist widget from the widgets pane to the designer pane.

3.     Hover over the Persist widget and click Edit. A dialog box is displayed.

4.     Provide the details as described in the following table:

Fields to Create Persist and their Descriptions

Save As: Enter the name for the Persist widget.

Source Datasets: Displays the list of datasets that are linked to the Persist widget.

Target Table: Select the target table to which you want to map the columns in the source dataset tables.

Type: Select the type of mapping that you want to implement for the columns in the target table. The following options are available (see the sketch after this table for an illustration of the SCD Type 2 behavior):

·        Full Load: This option truncates the existing data in the target table and loads it with new data from the source datasets.

·        SCD: This option represents a slowly changing dimension. Use it to map data from the source datasets to the target table with both current and historical data stored in the target table. You can select the following options:

§       Surrogate: Values of this type are typically generated incremental keys, for example, sequence IDs.

§       Unique: Use this type for values which are unique across the dataset, for example, customer identifiers.

§       Type 2: Use this type for values which may change or be added to, for example, customer names. Values of this type compare both current and historical data so that the latest record is kept active and historical values are marked inactive.

§       Direct: Use this type for values which should consider only the current data for the record, for example, data origin.

·        Incremental: This option maps data from the source dataset to the target table incrementally, adding new entries to the existing data.

·        Merge: This option maps data from the source dataset to the target table such that both current and historical data are stored, and incremental data is also stored.

·        Generate CSV: This option configures the headers of the source dataset and maps the source columns to target column headers with user-preferred names. This provides insight into the source dataset, which can then be downloaded as a .csv file using the Get Object PAR API. For more information on the Get Object PAR API, see Rest API for FCCM Cloud Service.

Join: Available only if you have connected multiple datasets. For information on joining datasets, see Creating Join, beginning at Step 5.

Hints: Hints provide a mechanism to direct the optimizer to choose a particular query execution plan based on specific criteria. Select the type of SQL operation from the drop-down list and provide a hint in the Hints field.
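
The SCD processing itself is performed by the pipeline engine. As a rough illustration only, the following Python sketch (with invented customer records and an invented helper function) shows the Type 2 behavior described above: when a tracked value changes, the existing record is marked inactive and a new active record is appended.

# Rough illustration of SCD Type 2 behavior on hypothetical customer records.
def scd_type2_upsert(history, incoming, unique_key="cust_id"):
    for rec in incoming:
        current = [h for h in history
                   if h[unique_key] == rec[unique_key] and h["active"]]
        if current and current[0]["name"] != rec["name"]:   # tracked value changed
            current[0]["active"] = False                    # retire the old version
            history.append({**rec, "active": True})         # add the new active version
        elif not current:
            history.append({**rec, "active": True})         # brand-new key
    return history

history = [{"cust_id": 7, "name": "A. Smith", "active": True}]
history = scd_type2_upsert(history, [{"cust_id": 7, "name": "A. Smyth"}])
# history now holds the old record (inactive) and the new record (active).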

 

5.     Follow these steps in the Map pane:

a.     Select the source dataset from the drop-down list on the left-hand side.

b.     The columns in the table that are associated with the selected source dataset are listed on the left-hand side.

NOTE:   

The source dataset table is referred to as the Source Entity; the columns in the Source Entity are referred to as Source Columns.

c.     Select the target table on the right-hand side. The columns in the target table are listed on the right-hand side.

NOTE:   

The target table is referred to as the Target Entity; the columns in the Target Entity are referred to as Target Columns.

 

·        To auto-map, click the Link icon. Source and target columns are auto-mapped based on column names and data types (see the sketch after this list).

·        To map source and target columns manually, select a source column and a target column, and then click Expand.

NOTE:   

You must select columns of the same data type.

The source column is mapped to the target column. The mapping details are displayed in the table on the right-hand side.

·        To add a condition to the target column, click Add + and use the Expression Builder to create the condition.

The result is displayed in the target column on the right pane.

·        You can also import source and target columns from an Excel sheet. Click Choose File and select the Excel sheet.

·        You can also export the mapped source and target columns to Excel using Export.
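
The auto-mapping is performed by the designer itself. Purely as an illustration of the matching rule named above, this Python sketch (with invented column metadata) pairs source and target columns when both the normalized column name and the data type match:

# Illustration only: pair source and target columns when the (case-insensitive)
# column name and the data type both match. Column metadata is invented.
source_cols = {"CUST_ID": "VARCHAR2", "OPEN_DT": "DATE", "BAL_AMT": "NUMBER"}
target_cols = {"cust_id": "VARCHAR2", "open_dt": "DATE", "risk_score": "NUMBER"}

def automap(source, target):
    mapping = {}
    for s_name, s_type in source.items():
        for t_name, t_type in target.items():
            if s_name.lower() == t_name.lower() and s_type == t_type:
                mapping[s_name] = t_name                 # names and types match
    return mapping

print(automap(source_cols, target_cols))
# -> {'CUST_ID': 'cust_id', 'OPEN_DT': 'open_dt'}; BAL_AMT stays unmapped.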

6.     Click Save to save the changes. The persist is created.

You can perform certain tasks that are common to all widgets, such as edit, delete, and filter. For more information, see Common Tasks.

Creating Join

The Join widget enables you to combine or group multiple tables using various join operators.

To create a join, follow these steps:

1.     Navigate to the Pipeline Designer page.

2.     Drag and drop the Join widget from the widgets pane to the designer pane.

3.     Hover over the Join widget and click Edit. A dialog box is displayed.

4.     Enter the name in the Name field.

5.     Follow these steps in the Output pane:

a.     Select the tables that you want to join from the drop-down lists on the left-hand side and the right-hand side.

b.     Select the join operators to join the two tables. For more information, see Join Operators.

6.     Add a Join condition to the join; the widget cannot be saved without one.

7.     To add a condition, click Add + on the right (Add Group and then Add Condition) and specify rules for the condition. You can add multiple groups and multiple conditions under each group.

8.     Click Save to save the changes. The join widget is created.

9.     You can perform certain tasks that are common to all widgets, such as edit, delete, and filter. For more information, see Common Tasks.

Join Operators

The following types of join operators are available (a sketch illustrating their behavior follows the list):

·        Inner Join: The Inner Join selects all rows from both participating tables as long as there is a match between the columns.

·        Left Join: The Left Join returns all rows from the left table, with the matching rows in the right table.

·        Right Join: The Right Join returns all rows from the right table, with the matching rows in the left table.

·        Full Join: The Full Join combines the results of both the left and right outer joins and returns all rows from the tables on both sides.
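
The joins themselves run in the database. As a quick illustration of the four operators, the following pandas sketch uses two invented tables keyed by a hypothetical cust_id column:

# Illustration only: the four join operators on invented tables.
import pandas as pd

accounts  = pd.DataFrame({"cust_id": [1, 2, 3], "acct": ["A", "B", "C"]})
customers = pd.DataFrame({"cust_id": [2, 3, 4], "name": ["Bo", "Cy", "Di"]})

inner = accounts.merge(customers, on="cust_id", how="inner")  # cust_id 2, 3 only
left  = accounts.merge(customers, on="cust_id", how="left")   # 1, 2, 3 (name empty for 1)
right = accounts.merge(customers, on="cust_id", how="right")  # 2, 3, 4 (acct empty for 4)
full  = accounts.merge(customers, on="cust_id", how="outer")  # 1, 2, 3, 4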

Creating Filters

The Filter widget defines criteria that filter the data in the pipeline to a subset of the available data records. On applying a filter, the data matching the filter conditions is obtained, which you can use to search for and analyze behaviors of interest.

To create a filter, follow these steps:

1.     Navigate to the Pipeline Designer page.

2.     Drag and drop the Filter widget from the widgets pane to the designer pane.

3.     Hover over the Filter widget and click Edit. The Filter pane is displayed.

4.     Enter the name for the filter in the Name field.

5.     Navigate to the Filter pane. The Output pane is displayed.

6.     Configure the filter. For more information, see Configuring Filter.

Creating External Service

External Service refers to an existing set of services that the customer can use to derive the risk of certain business entities, configure data movement for case management, create events, and so on. A business entity refers to an entity such as a customer, account, or transaction.

To create an external service, follow these steps:

1.     Navigate to the Pipeline Designer page.

2.     Drag and drop the External Service widget from the widgets pane to the designer pane.

3.     Hover over the External Service widget and click Edit. The External Service pane is displayed.

4.     Select the external service from the Name drop-down list.

5.     Based on the external service selected, the following details are auto-populated:

§        The description for the external service is auto-displayed in the Description field.

§        The corresponding details of the selected external service are displayed in a table. The details include parameter names and parameter values associated with the External Service.

6.     Click Save to save the changes. The external service is created.

7.     You can perform certain tasks that are common to all widgets, such as edit, delete, and filter. For more information, see Common Tasks.
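
The available services and their parameters are defined by the product and simply selected in the widget. Purely as a hypothetical illustration of what step 5 auto-populates, an external service amounts to a named service plus a set of parameter name-value pairs:

# Purely hypothetical illustration: an external service is identified by name
# and carries pre-configured parameter name/value pairs. All values invented.
from dataclasses import dataclass, field

@dataclass
class ExternalService:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

risk_service = ExternalService(
    name="DeriveCustomerRisk",                       # invented service name
    description="Derives risk for customer records", # invented description
    parameters={"batch_size": "10000", "mode": "FULL"},  # invented parameters
)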