Verifying the service instance

To verify that all BDDCS components are running, log in to Studio and add a data set by uploading a CSV file. Then confirm that the data set appears on Studio's Catalog and Explore pages.

It is useful to distinguish between these two user interfaces in BDDCS:
  • In the BDDCS console, you can start and stop the instance and perform other life cycle operations, such as configuring and running backups. You also log in to Studio from here.
  • In Studio, you load, explore, transform, and visualize data. (To automate loading data, use the Data Processing CLI in BDDCS.)

You can create a new data set in Studio by uploading personal data from files. Studio supports Microsoft Excel files, delimited files such as CSV, TSV, and TXT, and compressed files such as ZIP, GZ, and GZIP. A compressed file may include only one delimited file. After upload, the data is available as a data set in Studio's Catalog.
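
The file-type rules above can be checked locally before you upload. The following Python sketch is illustrative only and is not part of BDDCS: the helper name check_upload is hypothetical, the Excel extensions (.xls, .xlsx) are an assumption because the text names only the product, and Studio may apply further checks of its own.

```python
import zipfile
from pathlib import Path

# Extensions derived from the formats listed above. The Excel extensions
# are an assumption; the text names only "Microsoft Excel files".
EXCEL = {".xls", ".xlsx"}
DELIMITED = {".csv", ".tsv", ".txt"}
COMPRESSED = {".zip", ".gz", ".gzip"}


def check_upload(path):
    """Return (ok, reason) for a candidate upload file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in EXCEL or suffix in DELIMITED:
        return True, "supported file type"
    if suffix not in COMPRESSED:
        return False, f"unsupported extension: {suffix or '(none)'}"
    if suffix == ".zip":
        # A compressed file may include only one delimited file.
        with zipfile.ZipFile(path) as archive:
            members = [n for n in archive.namelist() if not n.endswith("/")]
        if len(members) != 1:
            return False, f"archive holds {len(members)} files, expected 1"
        if Path(members[0]).suffix.lower() not in DELIMITED:
            return False, f"archive member is not a delimited file: {members[0]}"
    # A .gz or .gzip stream wraps a single file, so no member count is needed.
    return True, "supported compressed file"
```

For example, check_upload("sales.csv") reports a supported file type, while a ZIP archive that holds two CSV files is rejected with the member count as the reason.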

To verify your BDDCS instance:

  1. In the BDDCS console, confirm that the status of the instance is UP. This indicates that all BDDCS components are running.
  2. Find any publicly available CSV file of relatively small size. A file of several megabytes works well.
  3. Copy the file to local storage on any BDCS node, and then use the HDFS put command to copy it into HDFS.
  4. Log in to Studio. Click Open BDDCS Studio and provide the email and password of your Studio admin user. See Accessing Studio.
  5. In Studio, load a CSV file:
    1. Go to Studio's Catalog and click Add Data Set.
    2. Click Create a data set from a file, and then click Browse.
    3. Locate the file, click Open, and then click Next.
    4. On the Preview page, make changes as needed. For example, you can exclude an attribute from the data set, rename an attribute, specify the header row, optionally omit rows from the uploaded file (this limits the data), and specify delimiters, quote signs, language, and encoding settings.
    5. Click Next. After Create your data set opens, specify the details of the data set and click Create. Studio creates a new data set based on the uploaded file and maps the data set name to a unique Hive table name that BDD creates for it. The data set appears in the Catalog.
You have now added a data set to BDDCS Studio. A successful upload confirms that the Data Processing component of Big Data Discovery is running, and that the Dgraph and the HDFS Agent components are running as well.
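
If you repeat this check often, steps 2 and 3 of the procedure can be scripted. The Python sketch below is a minimal illustration under stated assumptions: it runs on a node where the hdfs client is on the PATH, the 10 MB ceiling is an arbitrary reading of "several MB", and the helper names and any paths you pass in are hypothetical. It relies only on the standard hdfs dfs -put command.

```python
import os
import subprocess

# Assumed ceiling for a "relatively small" file; adjust as needed.
MAX_BYTES = 10 * 1024 * 1024


def build_put_command(local_path, hdfs_dir):
    """Build the argument list for the standard `hdfs dfs -put` command."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def stage_csv(local_path, hdfs_dir, run=subprocess.run):
    """Check the file size, then copy the file from local storage into HDFS.

    `run` is injectable so the function can be exercised without a cluster.
    """
    size = os.path.getsize(local_path)
    if size > MAX_BYTES:
        raise ValueError(f"{local_path} is {size} bytes, over the assumed limit")
    cmd = build_put_command(local_path, hdfs_dir)
    # check=True raises CalledProcessError if the hdfs client reports failure.
    run(cmd, check=True)
    return cmd
```

A call such as stage_csv("/tmp/sample.csv", "/user/example/verify") would copy the file into the given HDFS directory; both paths are placeholders.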
Now that you have loaded your first data set into BDDCS using Studio, learn how to use Studio to explore and analyze your data, or how to load more data and update it.