Create a Dataset Using a Data Flow

Use a data flow to curate data and create a dataset. For example, you might merge two datasets, cleanse the data, and output the results to a new dataset.

  1. On your home page, click Create and select Data Flow.
  2. In the Add Data dialog, select a dataset or subject area, then click Add.
    You can add more data sources at any time by clicking Add Step (+), then clicking Add Data.
  3. Optional: In the Add Data pane, include or exclude columns, or change the position of columns. For example, you might want to include Zipcode and position it as the first column in the dataset.
  4. Build your data flow:
    For each function that you want to perform, double-click the step name in the Data Flow Steps pane, then specify the properties in the Step editor pane.

    You can also edit your steps using Options in the column headers of the Data Preview pane. For example, you can rename, reformat, merge, or transform columns.

  5. Add a Save Data step to the end of your data flow, and change the default dataset name if required.
    The output dataset is saved to the same catalog location as the data flow.
    You can optionally adjust the Treat As and Default Aggregation column settings, and add column descriptions.
  6. Save your data flow.
    You can start processing your data immediately by clicking Run Data Flow, or later by locating the data flow in the Catalog, right-clicking it, and selecting Run. You can access the generated dataset in the Catalog in the same folder as the data flow.
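
If it helps to think of a data flow as a pipeline, the steps above can be approximated in plain Python. This is a conceptual sketch only, not the product's API: the sample data, the column names other than Zipcode, and every function name (read_rows, select_columns, cleanse, merge, save, run_flow) are hypothetical and exist purely to illustrate the order of operations (add data, select and position columns, cleanse, merge, save).

```python
import csv
import io

# Hypothetical sample inputs standing in for two source datasets.
CUSTOMERS_CSV = "Customer,Zipcode,Notes\nAcme , 94065 ,x\nBolt,,y\nCore,10001,z\n"
REGIONS_CSV = "Zipcode,Region\n94065,West\n10001,East\n"


def read_rows(text):
    """Add Data: load a source as a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def select_columns(rows, columns):
    """Include only the listed columns, in the listed order."""
    return [{name: row[name] for name in columns} for row in rows]


def cleanse(rows, required):
    """Trim whitespace and drop rows whose required column is empty."""
    cleaned = []
    for row in rows:
        trimmed = {name: value.strip() for name, value in row.items()}
        if trimmed[required]:
            cleaned.append(trimmed)
    return cleaned


def merge(left, right, key):
    """Keep left rows that have a match in right, combining their columns."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def save(rows, columns):
    """Save Data: write the output dataset as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_flow():
    """Run the whole flow from sources to output dataset."""
    customers = read_rows(CUSTOMERS_CSV)
    regions = read_rows(REGIONS_CSV)
    # Exclude Notes and position Zipcode as the first column.
    customers = select_columns(customers, ["Zipcode", "Customer"])
    customers = cleanse(customers, required="Zipcode")
    merged = merge(customers, regions, key="Zipcode")
    return save(merged, ["Zipcode", "Customer", "Region"])


if __name__ == "__main__":
    print(run_flow())
```

In this sketch, the row with no Zipcode is dropped during cleansing, so the output dataset contains only the two customers that match a region.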