Define Pipeline Characteristics
- Enter the required details in the Define Pipeline
Characteristics window as shown in the following table.
Table 8-2 Details for Basic Details pane

| Field | Description |
| --- | --- |
| Code | Enter the identification code of the dataset. This field is limited to 30 alphanumeric characters. |
| Dataset Name | Enter the name of the dataset. This field is limited to 30 characters, including alphanumeric characters and spaces. You cannot leave this field blank. |
| Description | Enter the purpose of creating the dataset. This field is limited to 150 characters, including alphanumeric characters and spaces. |
- Select the data library (Pandas, Modin, or Spark), select the Python Runtime from the drop-down list, and then click Close.
  - Pandas: An open-source data manipulation library for Python. It provides data structures such as Series (1-dimensional) and DataFrame (2-dimensional) that allow for easy manipulation and analysis of data. It also provides tools for reading and writing data in various file formats, including CSV, Excel, and SQL databases. Pandas is the default selection.
  - Modin: An open-source library that speeds up DataFrame operations by using distributed computing. This can lead to significant speed improvements, particularly for large datasets or computationally expensive operations.
  - Spark: The PySpark option for scaling the dataset. If the Spark library is selected, the Python Runtime drop-down list is not displayed.
- Click Next to go to the next step.
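
To make the Pandas description above more concrete, the following is a minimal sketch of the two data structures it mentions (Series and DataFrame) and of a CSV write and read. It is only an illustration of the pandas library, not of how the application builds datasets internally. The column names and values are made up, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# A Series is a 1-dimensional labeled array.
amounts = pd.Series([120.0, 75.5, 310.25], name="amount")

# A DataFrame is a 2-dimensional table of labeled columns.
# The column names and values are hypothetical, for illustration only.
df = pd.DataFrame(
    {
        "account_id": ["A1", "A2", "A3"],
        "amount": amounts,
    }
)

# Write the DataFrame as CSV text, then read it back.
# The StringIO buffer is a stand-in for a real file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# A simple analysis step on the restored data.
total = restored["amount"].sum()
print(restored)
print("Total amount:", total)
```

Modin is designed as a drop-in replacement for the pandas API, so a sketch like this would typically need only the import changed to `import modin.pandas as pd`, assuming Modin and one of its execution engines are installed.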