Spark Dynamic Allocation and Spark Structured Streaming

The Spark dynamic allocation feature is part of Spark itself and its source code. Data Flow supports that algorithm by monitoring signals from Spark so that it can correctly bill for, and schedule, resources.

Spark has several implementations of dynamic allocation:
  • Under spark.dynamicAllocation, which is the official, documented algorithm. Although designed for batch jobs, it is compatible with both batch and Spark Structured Streaming workloads, but not with Spark Streaming (DStreams).
  • Under spark.streaming.dynamicAllocation, which is unofficial. This implementation is designed for Spark Streaming (DStreams). Disable spark.dynamicAllocation when using spark.streaming.dynamicAllocation.
  • No implementation is designed specifically for Spark Structured Streaming. The Spark feature request is: SPARK-24815.
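As a sketch of the two implementations above, the following spark-submit invocations show how each might be enabled via `--conf` flags. The executor counts and application file names are illustrative placeholders, not recommendations:

```shell
# Official algorithm (spark.dynamicAllocation): usable for batch and
# Structured Streaming. shuffleTracking (Spark 3.0+) avoids requiring an
# external shuffle service.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my_structured_streaming_app.py

# Unofficial Spark Streaming (DStreams) algorithm: note that the official
# algorithm must be explicitly disabled when this one is enabled.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.streaming.dynamicAllocation.enabled=true \
  --conf spark.streaming.dynamicAllocation.minExecutors=2 \
  --conf spark.streaming.dynamicAllocation.maxExecutors=20 \
  my_dstreams_app.py
```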

While spark.dynamicAllocation.enabled can be used with Spark Structured Streaming, it's not designed for streaming job patterns and works poorly for certain applications. It monitors the task queue and makes scaling decisions based on the backlog, but doesn't consider the nature of a streaming workload, for example, trigger cadence, task cadence, and average task execution time. The result can be excessive, erratic allocation and de-allocation of executors. By adjusting the spark.dynamicAllocation timeout values, you can tune it for streaming applications.
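One possible tuning along these lines is to lengthen the backlog and idle timeouts so that scaling decisions span several micro-batch triggers rather than reacting to each one. The values below are illustrative assumptions and should be tuned against the application's actual trigger interval:

```shell
# Hedged sketch: longer timeouts smooth out per-trigger allocation swings.
# Defaults are 1s for the backlog timeouts and 60s for executorIdleTimeout.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=30s \
  --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30s \
  --conf spark.dynamicAllocation.executorIdleTimeout=300s \
  my_structured_streaming_app.py
```

Raising executorIdleTimeout keeps executors alive across the quiet gaps between triggers, at the cost of holding resources longer when load genuinely drops.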

More information about the spark.dynamicAllocation implementation can be found in the Spark source code comments.