Profiling and Enriching Data

This topic summarizes how Big Data Discovery helps to profile and enrich new data.

You can add new data as files in HDFS using data integration technologies. You can also use Studio to upload files, such as Excel and CSV files, or to pull data from a database using your credentials, and import that data into a personal sandbox in HDFS. In either case, when Big Data Discovery loads the data, it performs these activities for you:
  • Profiles the data, inferring data types, classifying content, and understanding value distributions.
  • Lists the most interesting data sets first and indicates what they contain.
  • Decorates the data with metadata, by adding profile information.
  • Enriches the data, by extracting terms, locations, sentiment, and topics, and storing them as new data in HDFS.
  • Takes a random sample of the data of the specified size. (You can increase the sample size or load the full data set.)
  • Indexes the data and prepares it for fast search and analysis in BDD.
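
To make the profiling and sampling steps in the list above more concrete, here is a minimal Python sketch of what inferring data types, summarizing value distributions, and taking a random sample can look like. It is not BDD's implementation and does not use any BDD API. The function names (infer_type, profile, sample_rows), the CSV content, and the sample size are all hypothetical and chosen only for illustration.

```python
import csv
import io
import random
from collections import Counter


def infer_type(values):
    """Guess a column type by trying progressively looser parsers."""
    if not values:
        return "unknown"
    for name, parser in (("integer", int), ("float", float)):
        try:
            for value in values:
                parser(value)
            return name
        except ValueError:
            continue
    return "string"


def profile(rows):
    """Return the inferred type, distinct count, and top value per column."""
    summary = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        # Ignore empty cells so they do not distort the type guess.
        values = [row[column] for row in rows if row[column] != ""]
        counts = Counter(values)
        summary[column] = {
            "type": infer_type(values),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return summary


def sample_rows(rows, size, seed=0):
    """Take a simple random sample of at most `size` rows."""
    if len(rows) <= size:
        return rows
    return random.Random(seed).sample(rows, size)


# Hypothetical input, for illustration only.
CSV_TEXT = """city,visits,rating
Boston,12,4.5
Austin,7,3.9
Boston,3,4.1
Denver,9,4.8
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
report = profile(sample_rows(rows, size=3))
for column, stats in report.items():
    print(column, stats)
```

Profiling a sample rather than the full data keeps the work bounded, but counts and top values then describe only the sampled rows. Increasing the sample size, or loading the full data, gives a more complete picture.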

As a result, the data you need resides in HDFS, is indexed by the Dgraph, is enriched by BDD, and is ready for your inspection and analysis.