Define Pipeline Characteristics

Enter basic configuration details in this window.
To configure the basic details for the dataset:
  1. Enter the required details in the Define Pipeline Characteristics window as shown in the following table.

    Table 8-2 Details for Basic Details pane

    Field          Description

    Code           Enter the identification code of the dataset.
                   This field is limited to 30 alphanumeric characters.

    Dataset Name   Enter the name of the dataset.
                   This field is limited to 30 alphanumeric characters,
                   including spaces. You cannot leave this field blank.

    Description    Enter the purpose for which the dataset is created.
                   This field is limited to 150 alphanumeric characters,
                   including spaces.

  2. Select the data library (Pandas, Modin, or Spark), select the Python Runtime from the drop-down list, and then click Close.
    • Pandas: An open-source data manipulation library for Python. It provides data structures such as Series (1-dimensional) and DataFrame (2-dimensional) that allow for easy manipulation and analysis of data. It also provides tools for reading and writing data to various file formats, including CSV, Excel, and SQL databases.

      Pandas is the default selection.

    • Modin: An open-source library that speeds up DataFrame operations through distributed computing, which can lead to significant speed improvements, particularly for large datasets or computationally expensive operations.
    • Spark: The PySpark option, used to scale dataset processing.

      If the Spark library is selected, the Python Runtime drop-down list is not displayed.

  3. Click Next to go to the next step.
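
The field limits in Table 8-2 and the Python Runtime rule in step 2 can be summarized as a small validation sketch. This is only an illustration, not the product's implementation: the class name, the function name, and the error messages are hypothetical. It also assumes that spaces count toward the character limits, that only Dataset Name is mandatory (the only field the table marks as non-blank), and that a runtime value is meaningless when Spark is selected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the Basic Details pane; these names are
# illustrative and are not part of any product API.
LIBRARIES = ("Pandas", "Modin", "Spark")


@dataclass
class PipelineCharacteristics:
    code: str
    dataset_name: str
    description: str = ""
    library: str = "Pandas"  # Pandas is the default selection
    python_runtime: Optional[str] = None


def validate(form: PipelineCharacteristics) -> List[str]:
    """Return a list of rule violations; an empty list means the form is valid."""
    errors = []

    # Code: at most 30 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{0,30}", form.code):
        errors.append("Code must be at most 30 alphanumeric characters.")

    # Dataset Name: at most 30 characters (assumed to include spaces), not blank.
    if not form.dataset_name.strip() or not re.fullmatch(
        r"[A-Za-z0-9 ]{1,30}", form.dataset_name
    ):
        errors.append("Dataset Name must be 1 to 30 alphanumeric characters.")

    # Description: at most 150 characters (assumed to include spaces).
    if not re.fullmatch(r"[A-Za-z0-9 ]{0,150}", form.description):
        errors.append("Description must be at most 150 alphanumeric characters.")

    # Library and Python Runtime: the runtime drop-down is hidden for Spark.
    if form.library not in LIBRARIES:
        errors.append("Library must be Pandas, Modin, or Spark.")
    elif form.library == "Spark" and form.python_runtime is not None:
        errors.append("Python Runtime does not apply when Spark is selected.")
    elif form.library != "Spark" and form.python_runtime is None:
        errors.append("Select a Python Runtime for Pandas or Modin.")

    return errors
```

For example, `validate(PipelineCharacteristics(code="DS001", dataset_name="Sales 2024", python_runtime="default"))` returns an empty list, where `"default"` is a placeholder runtime name.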
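
To make the Pandas description in step 2 concrete, the following minimal sketch builds a Series and a DataFrame and round-trips the data through CSV. The column names and values are made up for illustration, and an in-memory buffer stands in for a file so that the example has no external dependencies beyond pandas itself.

```python
import io

import pandas as pd

# Series: a one-dimensional labeled array (illustrative values).
prices = pd.Series([9.5, 12.0, 7.25], name="price")

# DataFrame: a two-dimensional table built from columns.
df = pd.DataFrame({"item": ["a", "b", "c"], "price": prices})

# Write the table as CSV, then read it back from the same buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip)
```

Modin documents itself as a drop-in replacement for this API, so the same code is intended to run with `import modin.pandas as pd` in place of `import pandas as pd`, provided Modin and one of its execution engines are installed.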