DP logging overview

This topic provides an overview of the Data Processing logging files.

Location of the log files

Each run of Data Processing produces one or more log files on each machine that is involved in the Data Processing job. The log files are in these locations:
  • On the client machine, the location of the log files is set by the log4j.appender.edpMain.Path property in the DP log4j.properties configuration file. The default location is the $BDD_HOME/logs/edp directory. These log files apply to workflows initiated by both Studio and the DP CLI. When the DP component starts, it also writes a start-up log here.
  • On the client machine, Studio workflows are also logged in the $BDD_DOMAIN/servers/<serverName>/logs/bdd-studio.log file.
  • On the Hadoop nodes, logs are generated by the Spark-on-YARN processes.
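
As a rough illustration of how the client-side location is resolved, the following Python sketch reads the log4j.appender.edpMain.Path property from a properties file and falls back to the default $BDD_HOME/logs/edp directory when the property is absent. The function names are hypothetical, and the sketch assumes simple key=value lines with a literal path as the value; it does not expand variables or handle other separators that the properties format allows.

```python
from pathlib import Path

# Property named in this topic; everything else here is illustrative.
PATH_KEY = "log4j.appender.edpMain.Path"


def read_property(properties_file, key):
    """Return the value of `key` from a key=value properties file, or None."""
    text = Path(properties_file).read_text(encoding="utf-8")
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def dp_log_dir(properties_file, bdd_home):
    """Return the configured DP log directory, or the documented default."""
    value = read_property(properties_file, PATH_KEY)
    if value is None:
        return Path(bdd_home) / "logs" / "edp"
    return Path(value)
```

For example, dp_log_dir("log4j.properties", "/opt/bdd") returns the configured path if the property is present, and /opt/bdd/logs/edp otherwise. Both arguments here are placeholder values.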

Local log files

The Data Processing log files (in the $BDD_HOME/logs/edp directory) are named edp_*.log. The naming pattern is set in the logging.properties configuration.

The default naming pattern for each log file is
edp_%timestamp_%unique.log
where:
  • %timestamp provides a timestamp in the format: yyyyMMddHHmmssSSS
  • %unique provides a string that makes the file name unique
For example:
edp_20150728100110505_0bb9c1a2-ce73-4909-9de0-a10ec83bfd8b.log
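
To make the default pattern concrete, the following Python sketch splits a log file name into its timestamp and unique string. The function name is hypothetical, and the sketch assumes the default edp_%timestamp_%unique.log pattern is in effect; names produced by a customized pattern will not match.

```python
import re
from datetime import datetime

# Default pattern: edp_<17-digit timestamp>_<unique string>.log
LOG_NAME = re.compile(r"^edp_(\d{17})_(.+)\.log$")


def parse_log_name(name):
    """Return (timestamp, unique_string) for a default-pattern name, else None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    # yyyyMMddHHmmssSSS: the trailing three digits are milliseconds,
    # which %f reads as a fraction of a second.
    timestamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S%f")
    return timestamp, match.group(2)
```

Applied to the example name above, the function yields a timestamp of 28 July 2015, 10:01:10.505, together with the unique string that follows it.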

The log4j.appender.edpMain.MaxSegmentSize property sets the maximum size of a log file, which is 100MB by default. Logs that reach the maximum size roll over to the next log file. The maximum amount of disk space used by the main log file and the logging rollover files is about 1GB by default.
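
With the default values, a 100MB segment size and a total of about 1GB correspond to roughly ten files. To check how much space the logs actually occupy, a small Python sketch such as the following can total the files in the log directory. The function name is hypothetical, and because this topic does not state how rollover files are named, the sketch assumes they share the edp_ prefix.

```python
from pathlib import Path


def edp_log_usage(log_dir):
    """Return (file_count, total_bytes) for files starting with edp_ in log_dir."""
    # Assumption: rollover files keep the edp_ prefix of the main log file.
    files = [f for f in Path(log_dir).glob("edp_*") if f.is_file()]
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes
```

For example, edp_log_usage("/opt/bdd/logs/edp") returns the number of matching files and their combined size in bytes; the path shown is a placeholder for your own $BDD_HOME/logs/edp directory.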

YARN logs

When a client (Studio or the DP CLI) launches a Data Processing workflow, a Spark job is created to run the actual Data Processing job. This job is run by an arbitrary node in the CDH or HDP cluster (YARN chooses the node). To find the Data Processing logs for the job, use Cloudera Manager (for CDH jobs) or Ambari (for HDP jobs).

To access the YARN logs:
  1. Use the appropriate Web UI:
    • From the Cloudera Manager home page, click YARN (MR2 Included).
    • From the Ambari home page, click YARN.
  2. In the YARN menu, click the ResourceManager Web UI quick link.
  3. The All Applications page lists the status of all submitted jobs. Click the ID field of a job to list its information. Note that failed jobs list exceptions in the Diagnostics field.
  4. To show log information, click on the appropriate log in the Logs field at the bottom of the Applications page.
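
The job status and Diagnostics information shown in the steps above can also be read programmatically. The following Python sketch is one possible approach, not part of Data Processing itself: it queries the ResourceManager REST API (the /ws/v1/cluster/apps resource) and keeps the applications whose final status is FAILED. The function names and the ResourceManager address are assumptions, and a secured cluster may require authentication that this sketch does not handle.

```python
import json
from urllib.request import urlopen


def fetch_apps(rm_url):
    """Fetch the application list from a ResourceManager base URL.

    rm_url is an assumption, for example "http://rm-host:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps") as response:
        return json.load(response)


def failed_apps(payload):
    """Return (id, name, diagnostics) for each application that failed."""
    # The API returns "apps": null when there are no applications.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (app["id"], app.get("name", ""), app.get("diagnostics", ""))
        for app in apps
        if app.get("finalStatus") == "FAILED"
    ]
```

Calling failed_apps(fetch_apps(rm_url)) with your own ResourceManager address lists each failed job together with the same diagnostics text that the Web UI displays.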

Note that if a workflow invoked the Data Enrichment modules, the YARN logs will contain the results of the enrichments, such as which columns were created. For more information, see About the Data Enrichment modules.