About this guide

This guide describes the Data Processing component of Big Data Discovery (BDD). It explains how BDD behaves in Spark when it runs its processes, such as sampling, loading, updating, and transforming data. It also describes the Spark configuration, the Data Processing CLI for loading and updating data sets (both through cron jobs and on demand), and the behavior of Data Enrichment Modules such as GeoTagger and Sentiment Analysis. Lastly, it provides logging information for the Data Processing component, the Transform Service, and the Dgraph HDFS Agent.