Workflow in Studio

Studio is the visual face of Big Data Discovery. It enables exploratory data analysis and is structured around the Explore, Transform, and Discover areas.

  • After you log in, you see the Catalog. It lets you browse and discover the contents of the data lake. (A data lake is a storage repository that holds raw data in its native format until it is needed.)

    In Catalog you can find projects you created before, or projects shared by other users. You can also browse data sets discovered in a Hive database, or data sets that other users uploaded from a file or imported from an external JDBC data source:

    Figure: Catalog in Studio, where you can select existing projects or data sets.

  • Next, if other users have not yet created any Discover pages, the Explore page for the first data set opens. If Discover pages have already been created by other users, the first Discover page opens.

  • If you move to Explore, you can pick data sets that are of interest to you for further exploration, and use the results of data profiling and enrichments to grasp basic data characteristics. For example, you can look for outliers. Outliers are values that are distant from other values for the attribute. You can also explore scatter plots to see correlations between values for two attributes, or link data sets.

  • If other users have already created pages and you have access to them, you move to Discover instead of Explore, for the project that is shared with you.
  • If you started with Explore, you must first add the data set to a project before you can move to Transform or Discover. Alternatively, you can find and select an existing project:

    Figure: Add to project button in Studio.

    Note that once your data set is in a project, you can continue using Explore.

  • You can then move to Transform, where you transform the data and remove inconsistencies. For example, you can change data types, or create custom transformation scripts.

  • In Discover you arrive at insights about the data and save your projects for sharing with others.

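
The outlier check mentioned under Explore can also be illustrated outside Studio. The Python sketch below is only an illustration of what "values that are distant from other values for the attribute" can mean in practice. It does not describe how Studio itself detects outliers, which this page does not state. It assumes the common interquartile-range (IQR) rule, and the function name `find_outliers`, the threshold `k`, and the sample values are all hypothetical.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Assumption: the IQR rule stands in for whatever method Studio uses.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical values for a single numeric attribute of a data set.
attribute_values = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0, 42.0]
outliers = find_outliers(attribute_values)
print(outliers)
```

With these sample values, only 42.0 falls outside the computed bounds. A larger `k` makes the check more tolerant, and a smaller one flags more values.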