YARN setting changes

To ensure that each node in your YARN cluster has access to sufficient resources during processing, you need to update the following YARN-specific Hadoop properties.

You can access these properties from your Hadoop cluster manager (Cloudera Manager, Ambari, or MCS). If you need help locating any of them, refer to your distribution's documentation.

Property
    Description

yarn.nodemanager.resource.memory-mb
    The total amount of memory that YARN can use on a given node. Set this to at least 16 GB (16384 MB), although you might need a higher value depending on the amount of data you plan to process.

yarn.scheduler.maximum-allocation-vcores
    The maximum number of virtual CPU cores allocated to each YARN container per request.
      • If your Hadoop cluster contains only one YARN worker node, this should be less than or equal to half of that node's cores.
      • If your Hadoop cluster contains multiple YARN worker nodes, this should be less than or equal to each node's total number of cores.

yarn.scheduler.maximum-allocation-mb
    The maximum amount of RAM allocated to each YARN container per request. Set this to at least 16 GB (16384 MB). Additionally:
      • If your Hadoop cluster contains only one YARN worker node, this should be less than or equal to half of that node's RAM.
      • If your Hadoop cluster contains multiple YARN worker nodes, this should be less than or equal to each node's total amount of RAM.

yarn.scheduler.capacity.maximum-applications
    The maximum number of concurrently running jobs allowed on each node. This can be between 2 and 8.

    Note that higher values could cause jobs submitted at the same time to hang indefinitely.
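
The limits in the table above can be checked before you apply them. The following is a minimal sketch, not part of any Hadoop tooling: the function names allocation_ceilings and check_settings, the example values, and the assumption that every worker node has the same core count and RAM are all hypothetical. On a cluster with mixed hardware, passing the smallest node's figures is the cautious choice.

```python
MIN_MEMORY_MB = 16 * 1024  # the 16 GB minimum from the table, expressed in MB


def allocation_ceilings(worker_nodes, cores_per_node, ram_mb_per_node):
    """Return the upper bounds the table gives for the two per-container limits.

    Hypothetical helper: assumes all worker nodes have identical hardware.
    """
    if worker_nodes < 1:
        raise ValueError("a cluster needs at least one YARN worker node")
    # One worker node: stay at or below half of that node's resources.
    # Several worker nodes: stay at or below one node's full resources.
    divisor = 2 if worker_nodes == 1 else 1
    return {
        "yarn.scheduler.maximum-allocation-vcores": cores_per_node // divisor,
        "yarn.scheduler.maximum-allocation-mb": ram_mb_per_node // divisor,
    }


def check_settings(settings, worker_nodes, cores_per_node, ram_mb_per_node):
    """Return a list of problems found; an empty list means none were found."""
    ceilings = allocation_ceilings(worker_nodes, cores_per_node, ram_mb_per_node)
    problems = []
    for name, ceiling in ceilings.items():
        if settings[name] > ceiling:
            problems.append(f"{name}={settings[name]} exceeds ceiling {ceiling}")
    for name in ("yarn.nodemanager.resource.memory-mb",
                 "yarn.scheduler.maximum-allocation-mb"):
        if settings[name] < MIN_MEMORY_MB:
            problems.append(f"{name}={settings[name]} is below {MIN_MEMORY_MB} MB")
    apps = settings["yarn.scheduler.capacity.maximum-applications"]
    if not 2 <= apps <= 8:
        problems.append(f"maximum-applications={apps} is outside 2..8")
    return problems


# Example values only: one worker node with 16 cores and 64 GB of RAM.
example = {
    "yarn.nodemanager.resource.memory-mb": 32768,
    "yarn.scheduler.maximum-allocation-vcores": 8,
    "yarn.scheduler.maximum-allocation-mb": 32768,
    "yarn.scheduler.capacity.maximum-applications": 4,
}
print(check_settings(example, worker_nodes=1, cores_per_node=16,
                     ram_mb_per_node=65536))  # prints []
```

Note that on a single-node cluster with less than 32 GB of RAM, the half-of-RAM ceiling and the 16 GB minimum for yarn.scheduler.maximum-allocation-mb cannot both be met, and this sketch reports whichever limit the chosen value violates.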