This topic summarizes how Big Data Discovery helps you profile and
enrich new data.
You can add new data as files in HDFS using data integration technologies.
You can also use Studio to upload files, such as Excel and CSV files, or to
pull data from a database using your credentials, and then import that data
into a personal sandbox in HDFS.
In either case, when it loads data, Big Data Discovery performs these
activities for you:
- Profiles the data, inferring data types, classifying content, and
  understanding value distributions.
- Lists the most interesting data sets first and indicates what they contain.
- Decorates the data with metadata, by adding profile information.
- Enriches the data, by extracting terms, locations, sentiment, and topics,
  and storing them as new data in HDFS.
- Takes a random sample of the data of the specified size. (You can increase
  the sample size, or load the full data set.)
- Indexes the data and prepares it for fast search and analysis in BDD.
As a result, the data you need resides in HDFS, is indexed by the
Dgraph and enriched by BDD, and is ready for your inspection and analysis.
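
To make the profiling and sampling activities in the list above more concrete, the following Python sketch shows the general idea on a small in-memory CSV file. It is an illustration only and does not reflect how Big Data Discovery implements these steps: the function names `infer_type` and `profile`, the type labels, and the sample data are all hypothetical, and the type-inference rule is deliberately simplistic.

```python
import csv
import io
import random
from collections import Counter


def infer_type(values):
    """Guess a column type from its non-empty string values.

    Hypothetical, simplified rule: try int, then float, else string.
    """
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    return "string"


def profile(rows, sample_size, seed=0):
    """Take a random sample of the rows, then report an inferred type
    and the most common values for each column of the sample."""
    rng = random.Random(seed)
    if len(rows) <= sample_size:
        sample = list(rows)
    else:
        sample = rng.sample(rows, sample_size)

    columns = list(sample[0].keys()) if sample else []
    report = {}
    for column in columns:
        values = [r[column] for r in sample if r[column] != ""]
        report[column] = {
            "type": infer_type(values),
            "top_values": Counter(values).most_common(3),
        }
    return sample, report


# Made-up data standing in for an uploaded CSV file.
data = "city,population\nBoston,650000\nAustin,960000\nBoston,650000\n"
rows = list(csv.DictReader(io.StringIO(data)))
sample, report = profile(rows, sample_size=2)
```

In this sketch, `report` plays the role of the profile metadata attached to a data set, and `sample_size` corresponds to the configurable sample size mentioned above. Passing a value at least as large as the number of rows is the analogue of loading the full data set.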