Running Spark Jobs Using Apache Oozie

Run Spark jobs using Apache Oozie.

Apache Oozie is a workflow scheduler service that takes a properties file and a workflow definition and triggers a sequence of actions as part of a pipeline. It supports many action types, including Apache Hive, Spark, and shell actions.
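The following is a minimal sketch of a workflow that runs a single Spark action. The workflow name, application class, JAR path, and HDFS paths (spark-wordcount-wf, org.example.SparkWordCount, and so on) are placeholders chosen for illustration, not values from this documentation; substitute the details of your own application.

```xml
<!-- workflow.xml: one Spark action (placeholder names and paths) -->
<workflow-app name="spark-wordcount-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Spark WordCount</name>
            <class>org.example.SparkWordCount</class>
            <jar>${nameNode}/user/${wf:user()}/apps/spark/lib/spark-wordcount.jar</jar>
            <arg>${nameNode}/user/${wf:user()}/input</arg>
            <arg>${nameNode}/user/${wf:user()}/output</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The properties file resolves the parameters that the workflow references and tells Oozie where the workflow definition lives in HDFS. The host names and paths below are example values only:

```properties
# job.properties (example values; replace hosts and paths for your cluster)
nameNode=hdfs://nn-host.example.com:8020
jobTracker=rm-host.example.com:8032
master=yarn
mode=cluster
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/spark
```

With the workflow XML uploaded to the application path in HDFS, you submit the job by pointing the Oozie client at the properties file, for example:

```
oozie job -oozie http://oozie-host.example.com:11000/oozie -config job.properties -run
```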

You can run Spark jobs on both HA and non-HA clusters. The properties file for a job is essentially the same in both cases; it differs only in how the cluster is addressed. When Oozie runs a workflow, it reads the workflow XML from HDFS, and to support HDFS HA it addresses HDFS through the nameservice rather than a specific NameNode. That is the only distinction between running Spark jobs on HA and non-HA clusters.
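For example, the nameNode value in the properties file might look like this in the two cases. The host name and the nameservice ID mycluster are placeholders; on an HA cluster, use the dfs.nameservices value from your hdfs-site.xml:

```properties
# Non-HA cluster: address the NameNode directly
nameNode=hdfs://nn-host.example.com:8020

# HA cluster: address HDFS through the nameservice ID
nameNode=hdfs://mycluster
```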

For more information, see: