Accessing YARN logs

When a client (Studio or the DP CLI) launches a Data Processing workflow, a Spark job is created to run the actual Data Processing job.

This Spark job runs on whichever node in the Hadoop cluster YARN selects. To find the Data Processing logs, use Cloudera Manager.

To access YARN logs:

  1. From the Cloudera Manager home page, click YARN (MR2 Included).
  2. In the YARN menu, click the ResourceManager Web UI quick link.
  3. On the All Applications page, which lists the status of all submitted jobs, click a job's ID to display information about that job.

    Note that failed jobs list their exceptions in the Diagnostics field.

  4. To show log information, click on the appropriate log in the Logs field at the bottom of the Applications page.
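
The job status and Diagnostics text shown in the steps above can also be read without a browser. The following is a minimal sketch, assuming the ResourceManager REST API is reachable from your machine (8088 is the Hadoop default web port, and the host name in the usage comment is a placeholder). The function names are illustrative and are not part of Data Processing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_base, states=None):
    """Build the URL of the ResourceManager 'Cluster Applications' REST resource."""
    url = rm_base.rstrip("/") + "/ws/v1/cluster/apps"
    if states:
        # Optional filter, for example ["FAILED"] to list only failed jobs.
        url += "?" + urlencode({"states": ",".join(states)})
    return url


def summarize_apps(payload):
    """Return (id, name, state, finalStatus, diagnostics) for each application."""
    # The reply has the shape {"apps": {"app": [...]}}; "apps" may be null
    # when no application matches, so fall back to an empty dictionary.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (a.get("id"), a.get("name"), a.get("state"),
         a.get("finalStatus"), a.get("diagnostics", ""))
        for a in apps
    ]


def fetch_apps(rm_base, states=None):
    """Query the ResourceManager and summarize the applications it reports."""
    with urlopen(apps_url(rm_base, states), timeout=10) as resp:
        return summarize_apps(json.load(resp))


# Usage (placeholder host name, not taken from this guide):
#   for row in fetch_apps("http://resourcemanager.example.com:8088", ["FAILED"]):
#       print(row)
```

On clusters where YARN log aggregation is enabled, running `yarn logs -applicationId <application ID>` from a shell prints the aggregated container logs for a finished application, using the ID shown on the All Applications page.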

The Data Processing log also contains the locations of the Spark worker STDOUT and STDERR logs. These locations are listed in the "YARN executor launch context" section of the log. Search for the "SPARK_LOG_URL_STDOUT" and "SPARK_LOG_URL_STDERR" strings; each is followed by the URL of the corresponding worker log.
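
If you have saved the Data Processing log to a file, the search can be scripted. The following is a minimal sketch, assuming each entry appears on one line with the URL after the key (for example, "SPARK_LOG_URL_STDOUT -> http://..."). The exact layout can vary between Spark versions, so the pattern only requires an http or https URL somewhere after the key on the same line. The function name is hypothetical.

```python
import re

# Assumption: the key and its URL share one line; whatever separates them
# (such as " -> ") is skipped without being checked.
_LOG_URL = re.compile(
    r"(SPARK_LOG_URL_(?:STDOUT|STDERR))\b[^\n]*?(https?://\S+)"
)


def find_worker_log_urls(log_text):
    """Return (key, url) pairs in the order they appear in the log text."""
    return [(m.group(1), m.group(2)) for m in _LOG_URL.finditer(log_text)]


# Usage (the file name is a placeholder):
#   with open("dp_application.log", encoding="utf-8") as f:
#       for key, url in find_worker_log_urls(f.read()):
#           print(key, url)
```

A log that records several executors contains one pair of entries per executor, so the function returns all matches instead of stopping at the first.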

If a workflow invoked the Data Enrichment modules, the YARN logs also contain the results of the enrichments, such as which columns were created.