This section provides a high-level introduction to the Data Processing component of Big Data Discovery.
BDD integration with Spark and Hadoop
Hadoop provides a number of components and tools that BDD requires to process and manage data. The Hadoop Distributed File System (HDFS) stores your source data, and Spark on YARN runs all Data Processing jobs. This topic discusses how BDD fits into the Spark and Hadoop environment.
Secure Hadoop options
This section describes how BDD workflows can be used in a secure Hadoop environment.
Preparing your data for ingest
Although not required, cleaning your source data before ingest is recommended: it helps Data Processing workflows run more smoothly and prevents ingest errors.
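As a minimal sketch of this kind of pre-ingest cleanup, the following Python example trims stray whitespace and drops records that are missing required fields, two common causes of ingest errors. The field names and sample data are hypothetical, not part of BDD:

```python
import csv
import io

def clean_rows(reader, required_fields):
    """Yield rows with trimmed values, skipping any row that is
    missing a required field (a common cause of ingest errors)."""
    for row in reader:
        cleaned = {k: (v.strip() if v is not None else "")
                   for k, v in row.items()}
        if all(cleaned.get(f) for f in required_fields):
            yield cleaned

# Hypothetical sample of messy source data.
raw = io.StringIO(
    "id,name,city\n"
    "1,  Alice ,Boston\n"
    "2,,Chicago\n"          # missing required 'name' -> dropped
    "3,Carol,  Denver \n"
)
rows = list(clean_rows(csv.DictReader(raw),
                       required_fields=["id", "name"]))
print(rows)
```

In practice you would apply this kind of normalization to the files before uploading them to HDFS, so that every Data Processing workflow sees consistent, complete records.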