Spark Dynamic Allocation and Spark Structured Streaming
Dynamic allocation is a feature of Spark itself, implemented in the Spark source code. Data Flow supports that algorithm by monitoring signals from Spark so that it can correctly schedule, and bill for, the requested resources.
Spark provides two dynamic allocation implementations:
- Under spark.dynamicAllocation, which is the official, documented algorithm. Although designed for batch jobs, this algorithm is compatible with both batch and Spark Structured Streaming. It is not compatible with Spark Streaming (the legacy DStream API).
- Under spark.streaming.dynamicAllocation, which is unofficial. This dynamic allocation is designed for Spark Streaming. Disable spark.dynamicAllocation when using spark.streaming.dynamicAllocation.
- No implementation is designed specifically for Spark Structured Streaming. The Spark feature request is SPARK-24815.
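As a sketch of the second case above, a spark-defaults.conf (or spark-submit --conf) fragment for a legacy Spark Streaming job could disable the batch-oriented algorithm and enable the streaming-specific one. The two enabled flags are the keys named above; the executor bounds are illustrative values, not recommendations:

```
# Legacy Spark Streaming (DStream) job: use the streaming-specific
# implementation and disable the batch-oriented one.
spark.dynamicAllocation.enabled                 false
spark.streaming.dynamicAllocation.enabled       true
# Illustrative executor bounds; tune these for your workload.
spark.streaming.dynamicAllocation.minExecutors  2
spark.streaming.dynamicAllocation.maxExecutors  20
```

The same settings can be passed per job with repeated --conf flags on spark-submit.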
While spark.dynamicAllocation.enabled can be used with Spark Structured Streaming, it isn't designed for streaming job patterns and works poorly for certain applications. spark.dynamicAllocation monitors the task queue and makes scaling decisions based on queue depth, but it doesn't consider the characteristics of the stream, such as trigger cadence, task cadence, and average task execution time. As a result, you might see excessive, arrhythmic allocation and deallocation of executors. By adjusting the spark.dynamicAllocation timeout settings, you can tune it for streaming applications.
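A minimal sketch of that timeout tuning, assuming a Structured Streaming job with a short trigger interval. The timeout values here are illustrative starting points to adapt to the workload, not recommendations:

```
spark.dynamicAllocation.enabled                           true
# Needed on Spark 3.x when no external shuffle service is available.
spark.dynamicAllocation.shuffleTracking.enabled           true
# Raise the idle timeout well above the trigger interval so
# executors aren't released between micro-batches.
spark.dynamicAllocation.executorIdleTimeout               120s
# Lower the backlog timeouts so scale-up reacts within a trigger.
spark.dynamicAllocation.schedulerBacklogTimeout           1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  1s
```

The trade-off is between releasing executors quickly (lower cost, but churn between micro-batches) and holding them (steadier latency, but idle capacity).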
More information about the spark.dynamicAllocation implementation can be found in the source code comments.