Creating Datasets Using the Dataset Widget

The dataset widget enables you to select and filter data sources for use in the later stages of the pipeline.

A data pipeline must always begin with a dataset. A dataset corresponds to the contents of a single database table, which can be a staging table, a business table, or a table created by a data pipeline.

Using the dataset widget, you can select any available staging table, name the dataset, and perform DQ (data quality) checks on one, multiple, or all columns of the selected staging table. You can also filter the output by defining conditions for one, multiple, or all columns of the selected staging table using one of three methods: Expression Builder, Tables, or Text. When conditions are defined for multiple columns, they are combined using OR logic to filter the output.
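As an illustration of the OR logic described above, the following minimal Python sketch keeps a row when the condition on at least one selected column is true. It is not the product's implementation. The function name `filter_rows_or`, the column names, and the sample rows are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[object], bool]


def filter_rows_or(rows: List[Row], conditions: Dict[str, Condition]) -> List[Row]:
    """Keep a row if the condition on at least one column is true (OR logic)."""
    if not conditions:
        # No conditions defined: nothing is filtered out.
        return list(rows)
    return [
        row
        for row in rows
        if any(check(row.get(column)) for column, check in conditions.items())
    ]


# Hypothetical staging-table rows.
rows = [
    {"ACCOUNT_ID": 1, "BALANCE": 50, "COUNTRY": "US"},
    {"ACCOUNT_ID": 2, "BALANCE": 5000, "COUNTRY": "DE"},
    {"ACCOUNT_ID": 3, "BALANCE": 10, "COUNTRY": "FR"},
]

# One condition per selected column; the two conditions are OR-ed together.
conditions = {
    "BALANCE": lambda value: value is not None and value > 1000,
    "COUNTRY": lambda value: value == "US",
}

kept_ids = [row["ACCOUNT_ID"] for row in filter_rows_or(rows, conditions)]
print(kept_ids)  # [1, 2]
```

Row 1 matches only the COUNTRY condition and row 2 matches only the BALANCE condition, yet both are kept. Row 3 matches neither condition and is dropped.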

To create a dataset, follow these steps:

  1. Navigate to the Pipeline Designer page.
  2. Drag the Dataset widget from the widgets pane, located in the upper-right corner of the designer pane, and drop it onto the canvas.
  3. Hover over the Dataset widget and click Edit. Provide details as described in the following table:

    Table 4-3 Dataset Widget Details

    Field              Description
    Name               Enter the name for your dataset.
    Tables             Select a table from the Tables drop-down list. This list contains all available staging tables.
                       The columns of the selected table are displayed in the Attributes pane. The attributes include the Logical Name, Column Name, and Column Type.
    Enable DQ check    Select this option to enable the data quality check for the table. You can select each column of the table, specify checks such as range, length, LOV (list of values), and null checks, and save the rule after naming it. Based on the rule, checks are performed on the columns of the selected staging table to filter out information you do not require.

    To specify DQ rules, follow these steps:
    1. Click Add + next to the Enable DQ check option.
    2. Under Master DQ, select one or multiple Primary Key options. All columns of the selected staging table are listed for you to select.
    3. Under DQ Rules, select a column from the Available Columns list. This list contains all columns of the selected staging table.
    4. Enter a rule name for the selected column of the staging table and specify the following checks for this rule:
      • Range Check DQ Rules: Specify the following range checks:
        • Is Range Check Required: Select Yes or No. If you select No, skip to the length check rule. If you select Yes, provide a value in the Minimum Value field.
        • Is Provided Minimum Value Inclusive: Select Yes or No.
        • Maximum Value: Provide a value in the Maximum Value field.
        • Is Provided Maximum Value Inclusive: Select Yes or No.
      • Length Check DQ Rules: For Is Length Check Required, select Yes or No. If you select No, skip to the LOV check rule. If you select Yes, provide a value in each of the Minimum Length and Maximum Length fields.
      • LOV Check DQ Rules: For Is LOV Check Required, select Yes or No. If you select No, skip to the Null Check DQ rule. If you select Yes, provide the LOV values in the LOV Values field.
      • Null Check DQ Rules: Specify the following Null check DQ rules:
        • Is NULL Check Required: Select Yes or No. If you select No, skip to the Is NULL Value Allowed rule. If you select Yes, provide the null default value in the Null Default Values field.
        • Is NULL Value Allowed: Select Yes or No. If you select No, provide the null default value in the Null Default Values field.
      • Referential Check DQ Rules: For Is Referential Check Required, select Yes or No. If you select Yes, select the table and column that the DQ rule refers to when verifying the data.

        Note:

        You must select either Yes or No for each of these checks.
    5. Click Save to save your DQ rule.
    6. Repeat these steps to define DQ rules for other columns of the table, as required.
  4. Click Save to save the changes. The dataset is created and is visible on the canvas. It is also available for use in the Dataset pane.
  5. To reuse a dataset you have created, click the Dataset icon in the upper-left corner to view the Dataset pane. Click Expand to display the list of available datasets, including the ones you have created. Click the name of the dataset you want and drag it onto the canvas of the Pipeline Designer.
    You can perform certain tasks that are common to all widgets, such as edit, delete, and filter. For more information, see Common Tasks.
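To make the DQ rule options in step 3 more concrete, the following Python sketch models one rule and evaluates the range, length, LOV, and null checks against a single column value. It is a rough illustration under stated assumptions, not the product's implementation. The `DQRule` class, its field names, and the `violations` function are hypothetical, and the sketch assumes that leaving a field unset is equivalent to answering No to the corresponding "required" question. The referential check and the null default value are omitted because their exact behavior is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DQRule:
    """Hypothetical model of one DQ rule; fields loosely mirror the form labels."""
    name: str
    column: str
    min_value: Optional[float] = None   # range check is skipped when both bounds are None
    min_inclusive: bool = True
    max_value: Optional[float] = None
    max_inclusive: bool = True
    min_length: Optional[int] = None    # length check is skipped when both are None
    max_length: Optional[int] = None
    lov_values: Optional[List[str]] = None  # LOV check is skipped when None
    null_allowed: bool = True


def violations(rule: DQRule, value) -> List[str]:
    """Return the names of the checks that the value fails."""
    if value is None:
        # A null value can only fail the null check.
        return [] if rule.null_allowed else ["null"]

    failed = []

    # Range check: a value equal to a bound passes only if that bound is inclusive.
    below = rule.min_value is not None and (
        value < rule.min_value
        or (value == rule.min_value and not rule.min_inclusive)
    )
    above = rule.max_value is not None and (
        value > rule.max_value
        or (value == rule.max_value and not rule.max_inclusive)
    )
    if below or above:
        failed.append("range")

    # Length check on the value's string form.
    length = len(str(value))
    too_short = rule.min_length is not None and length < rule.min_length
    too_long = rule.max_length is not None and length > rule.max_length
    if too_short or too_long:
        failed.append("length")

    # LOV check: the value must be one of the listed values.
    if rule.lov_values is not None and value not in rule.lov_values:
        failed.append("lov")

    return failed


# Hypothetical rules for two columns of a staging table.
amount_rule = DQRule(name="AMOUNT_RANGE", column="AMOUNT",
                     min_value=0, min_inclusive=False,
                     max_value=100, max_inclusive=True,
                     null_allowed=False)
currency_rule = DQRule(name="CCY_CODE", column="CURRENCY",
                       min_length=3, max_length=3,
                       lov_values=["USD", "EUR"])

print(violations(amount_rule, 0))       # ['range']  (minimum is exclusive)
print(violations(amount_rule, 100))     # []         (maximum is inclusive)
print(violations(amount_rule, None))    # ['null']
print(violations(currency_rule, "US"))  # ['length', 'lov']
```

The inclusive flags matter only at the boundaries: with an exclusive minimum of 0, the value 0 fails the range check, while the inclusive maximum of 100 lets the value 100 pass.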