YARN setting changes

To ensure that each YARN worker node has access to sufficient resources during processing, you need to update the following YARN-specific Hadoop properties.

You can access these properties in your cluster manager (Cloudera Manager or Ambari). If you need help locating any of them, refer to the documentation for your Hadoop distribution.

Property Description
yarn.nodemanager.resource.memory-mb The amount of physical memory, in MB, that YARN can allocate to containers on each worker node. Set this to at least 16 GB (16384 MB), although you might need a higher value depending on how much data you plan to process.
yarn.scheduler.maximum-allocation-vcores The maximum number of virtual CPU cores that can be allocated to a single YARN container request.

If your cluster contains only one YARN worker node, this should be less than or equal to half of that node's cores. If your cluster contains multiple YARN worker nodes, this should be less than or equal to each node's total number of cores.

yarn.scheduler.maximum-allocation-mb The maximum amount of RAM, in MB, that can be allocated to a single YARN container request.

If your cluster contains only one YARN worker node, this should be less than or equal to half of that node's RAM. If your cluster contains multiple YARN worker nodes, this should be less than or equal to each node's total amount of RAM.

yarn.scheduler.capacity.maximum-applications The maximum number of applications that can be active at the same time (running or pending) under the Capacity Scheduler. Set this to a value between 2 and 8.

Note that setting this value higher than 8 could cause jobs submitted at the same time to hang indefinitely.
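
The guidelines in the table above can be sketched as a small Python check that you run against your planned values before entering them in your cluster manager. This is an illustrative sketch only, not part of Hadoop or any cluster manager: WorkerNode, allocation_ceilings, and check_settings are hypothetical names. It also assumes that, for a cluster with several worker nodes of different sizes, the smallest node sets the ceiling.

```python
from dataclasses import dataclass

GB = 1024  # megabytes per gigabyte; YARN memory properties are expressed in MB


@dataclass(frozen=True)
class WorkerNode:
    """Hypothetical description of one YARN worker node."""
    cores: int
    ram_mb: int


def allocation_ceilings(nodes):
    """Return (max vcores, max MB) ceilings for the maximum-allocation properties."""
    if not nodes:
        raise ValueError("at least one worker node is required")
    if len(nodes) == 1:
        # Single worker node: at most half of that node's cores and RAM.
        return nodes[0].cores // 2, nodes[0].ram_mb // 2
    # Multiple worker nodes: the value must not exceed any node's total,
    # so the smallest node sets the ceiling (assumption, see lead-in).
    return min(n.cores for n in nodes), min(n.ram_mb for n in nodes)


def check_settings(settings, nodes):
    """Return a list of problems; an empty list means every check passed."""
    max_vcores, max_mb = allocation_ceilings(nodes)
    problems = []
    if settings["yarn.nodemanager.resource.memory-mb"] < 16 * GB:
        problems.append("yarn.nodemanager.resource.memory-mb is below 16 GB")
    if settings["yarn.scheduler.maximum-allocation-vcores"] > max_vcores:
        problems.append(f"maximum-allocation-vcores exceeds {max_vcores}")
    if settings["yarn.scheduler.maximum-allocation-mb"] > max_mb:
        problems.append(f"maximum-allocation-mb exceeds {max_mb} MB")
    if not 2 <= settings["yarn.scheduler.capacity.maximum-applications"] <= 8:
        problems.append("maximum-applications is outside the range 2 to 8")
    return problems


# Example: a two-node cluster with nodes of different sizes.
nodes = [WorkerNode(cores=16, ram_mb=64 * GB), WorkerNode(cores=8, ram_mb=32 * GB)]
settings = {
    "yarn.nodemanager.resource.memory-mb": 32 * GB,
    "yarn.scheduler.maximum-allocation-vcores": 8,
    "yarn.scheduler.maximum-allocation-mb": 16 * GB,
    "yarn.scheduler.capacity.maximum-applications": 4,
}
print(check_settings(settings, nodes))  # prints []
```

Returning a list of problems rather than raising on the first failure lets you see every out-of-range value in one pass.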