Accessing YARN logs

When a client (Studio or the DP CLI) launches a Data Processing workflow, a Spark job is created to run the actual Data Processing job.

The Spark job runs on a node in the Hadoop cluster that YARN selects. To find the Data Processing logs, use Cloudera Manager (for CDH jobs), Ambari (for HDP jobs), or the MapR Control System (for MapR jobs).

To access the YARN logs on CDH or HDP:
  1. Use the appropriate Web UI:
    • From the Cloudera Manager home page, click YARN (MR2 Included).
    • From the Ambari home page, click YARN.
  2. In the YARN menu, click the ResourceManager Web UI quick link.
  3. The All Applications page lists the status of all submitted jobs. Click an application's ID to view its job information. Note that failed jobs list their exceptions in the Diagnostics field.
  4. To show log information, click the appropriate log link in the Logs field at the bottom of the Applications page.

To view the logs of completed MapR applications (assuming you have enabled the YARN Log Aggregation option):
  1. Log on to the MapR Control System.
  2. In the Navigation Pane, click JobHistoryServer.
  3. Click the Job ID link for the job whose logs you want to view.
  4. In the Logs column of the Application Master section, click the logs link.
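If you prefer the command line to the web UIs, the standard YARN CLI can retrieve aggregated application logs on any of these distributions (provided log aggregation is enabled). A minimal sketch follows; the application ID is a hypothetical placeholder, so substitute the ID shown on the All Applications or JobHistoryServer page:

```shell
# Placeholder application ID -- copy the real one from the YARN UI.
APP_ID="application_1400000000000_0001"

# List submitted applications, the CLI equivalent of the All Applications page:
#   yarn application -list -appStates ALL
# Fetch the aggregated logs for one application:
#   yarn logs -applicationId "$APP_ID"

# Shown here as an echo so the snippet can be run without a cluster:
echo "yarn logs -applicationId $APP_ID"
```

On a cluster node with the Hadoop client configured, run the commented `yarn` commands directly instead of the `echo`.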

The Data Processing log also records the locations of the Spark worker STDOUT and STDERR logs. These locations are listed in the "YARN executor launch context" section of the log. Search for the "SPARK_LOG_URL_STDOUT" and "SPARK_LOG_URL_STDERR" strings; each is followed by the URL of the corresponding worker log.
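For example, the two worker-log URLs can be pulled out of a saved copy of the log with grep. The file name dp.log and the sample log lines below are illustrative, not taken from an actual run:

```shell
# Create a sample log fragment (illustrative content only):
cat > dp.log <<'EOF'
YARN executor launch context:
  SPARK_LOG_URL_STDOUT -> http://worker1:8042/node/containerlogs/container_01/user/stdout
  SPARK_LOG_URL_STDERR -> http://worker1:8042/node/containerlogs/container_01/user/stderr
EOF

# Print every line that carries a worker-log URL:
grep -E 'SPARK_LOG_URL_(STDOUT|STDERR)' dp.log
```

Opening either printed URL in a browser shows the corresponding worker log.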

Also note that if a workflow invoked the Data Enrichment modules, the YARN logs contain the results of the enrichments, such as which columns were created.