This topic provides an overview of the Data Processing log
files.
Location of the log files
Each run of Data Processing produces one or more log files on each
machine that is involved in the Data Processing job. The log files are in these
locations:
- On the client machine, the
location of the log files is set by the
log4j.appender.edpMain.Path property in the DP
log4j.properties configuration file. The default
location is the
$BDD_HOME/logs/edp directory. These log files
apply to workflows initiated by both Studio and the DP CLI. When the DP
component starts, it also writes a start-up log here.
- On the client machine,
Studio workflows are also logged in the
$BDD_DOMAIN/servers/<serverName>/logs/bdd-studio.log
file.
- On the Hadoop nodes, logs
are generated by the Spark-on-YARN processes.
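As a rough illustration of how the client-side log directory could be resolved, the following Python sketch reads the text of a log4j.properties file and returns the value of log4j.appender.edpMain.Path, falling back to the default logs/edp directory under the BDD home. The function names, the simple key=value parsing, and the way the BDD home is passed in are assumptions made for this example; they are not part of the product.

```python
from pathlib import Path

# Property name taken from the text above.
PATH_KEY = "log4j.appender.edpMain.Path"


def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    This is a simplified reader (assumption): it does not handle line
    continuations or escape sequences.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def dp_log_dir(properties_text, bdd_home):
    """Return the configured DP log directory, or <bdd_home>/logs/edp."""
    configured = read_properties(properties_text).get(PATH_KEY)
    if configured:
        # The value is returned as written; no variable expansion is attempted.
        return Path(configured)
    return Path(bdd_home) / "logs" / "edp"
```

For example, calling dp_log_dir with the contents of the DP log4j.properties file and the value of $BDD_HOME yields the directory to inspect on the client machine.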
Local log files
The Data Processing log files (in the
$BDD_HOME/logs/edp directory) are named
edpLog*.log. The naming pattern is set in the
logging.properties configuration.
The default naming pattern for each log file is
edp_%timestamp_%unique.log
where:
- %timestamp provides a
timestamp in the format: yyyyMMddHHmmssSSS
- %unique
provides a unique string
For example:
edp_20150728100110505_0bb9c1a2-ce73-4909-9de0-a10ec83bfd8b.log
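To make the naming pattern concrete, here is a minimal Python sketch that splits a file name of the form edp_%timestamp_%unique.log into its timestamp and unique string. The function name and the regular expression are illustrative assumptions; the sketch only relies on the 17-digit yyyyMMddHHmmssSSS format described above and accepts any non-empty unique string.

```python
import re
from datetime import datetime

# edp_<17-digit timestamp>_<unique string>.log
LOG_NAME = re.compile(r"^edp_(\d{17})_(.+)\.log$")


def parse_dp_log_name(name):
    """Return (timestamp, unique) for a matching name, else None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    stamp, unique = match.groups()
    # The first 14 digits are yyyyMMddHHmmss; the last 3 are milliseconds.
    when = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    when = when.replace(microsecond=int(stamp[14:]) * 1000)
    return when, unique
```

Sorting the parsed timestamps is one way to find the most recent log file for a run. Note that a name with 17 digits that do not form a valid date raises ValueError in this sketch.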
The
log4j.appender.edpMain.MaxSegmentSize property sets
the maximum size of a log file, which is 100MB by default. Logs that reach the
maximum size roll over to the next log file. The maximum amount of disk space
used by the main log file and the logging rollover files is about 1GB by
default.
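If you want to check how close a log directory is to the approximate 1GB ceiling mentioned above, a small Python sketch such as the following can total the file sizes. The helper names are hypothetical, the 1GB constant is only the approximate figure from the text, and the sketch assumes that every regular file in the directory counts toward the total, which may not match how the rollover files are actually accounted for.

```python
from pathlib import Path

# "About 1GB" per the text above; the exact limit is an assumption.
APPROX_CAP_BYTES = 1024 ** 3


def log_dir_usage(log_dir):
    """Return the total size in bytes of the regular files in log_dir."""
    return sum(p.stat().st_size for p in Path(log_dir).iterdir() if p.is_file())


def fraction_of_cap(log_dir, cap_bytes=APPROX_CAP_BYTES):
    """Return directory usage as a fraction of the approximate cap."""
    return log_dir_usage(log_dir) / cap_bytes
```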
YARN logs
When a client (Studio or the DP CLI) launches a Data Processing
workflow, a Spark job is created to run the actual Data Processing job. This
job is run by an arbitrary node in the CDH or HDP cluster (the node is chosen by
YARN). To find the Data Processing logs, you can use Cloudera Manager (for CDH
jobs) or Ambari (for HDP jobs).
To access the YARN logs:
- Use the appropriate Web UI:
  - From the Cloudera Manager home page, click
    YARN (MR2 Included).
  - From the Ambari home page, click
    YARN.
- In the YARN menu, click
the
ResourceManager Web UI quick link.
- The All Applications page
lists the status of all submitted jobs. Click on the
ID field to list job information. Note that
failed jobs will list exceptions in the
Diagnostics field.
- To show log information,
click on the appropriate log in the
Logs field at the bottom of the Applications
page.
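The same information can also be gathered without the Web UI. The Python sketch below is one possible approach, not a documented Data Processing procedure: it assumes you have already fetched the JSON returned by the YARN ResourceManager REST endpoint /ws/v1/cluster/apps, extracts the ID and diagnostics of each application whose final status is FAILED, and builds the yarn logs command line for a given application ID. The function names are hypothetical, and yarn logs typically returns output only when log aggregation is enabled on the cluster.

```python
import json


def failed_apps(payload):
    """Return (application ID, diagnostics) pairs for failed applications.

    payload is the JSON text returned by the ResourceManager
    /ws/v1/cluster/apps endpoint.
    """
    data = json.loads(payload)
    # "apps" can be null when the cluster has no applications to list.
    apps = (data.get("apps") or {}).get("app", [])
    return [
        (app["id"], app.get("diagnostics", ""))
        for app in apps
        if app.get("finalStatus") == "FAILED"
    ]


def yarn_logs_command(app_id):
    """Build the argument list for fetching an application's aggregated logs."""
    return ["yarn", "logs", "-applicationId", app_id]
```

The argument list from yarn_logs_command can be passed to subprocess.run on a node where the yarn client is installed.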
Note that if a workflow invoked the Data Enrichment modules, the YARN
logs will contain the results of the enrichments, such as which columns were
created. For more information, see
About the Data Enrichment modules.