This topic provides an overview of the Data Processing log files.
Location of the log files
Each run of Data Processing produces a new log file on each machine that is
involved in the job. The Data Processing log files are therefore located on
every node that has taken part in a Data Processing job, including:
- The client that started the job (either a node running the DP CLI or a node
running Studio)
- The Oozie (YARN) worker node
- The Spark worker nodes
The logging location on each node is defined by the
edpJarDir property in the
data_processing-CLI file. By default, this is the
/opt/bdd/edp/data directory.
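As a quick check on a node, you could read the configured directory out of the data_processing-CLI file and list the logs there. This is only a sketch: the variable-assignment format shown below and the stand-in file are assumptions, so adjust it for your install (on a real node you would point `conf` at the actual data_processing-CLI file).

```shell
# Sketch: read edpJarDir from a copy of data_processing-CLI, then
# list the Data Processing logs in that directory.
# The stand-in file and its format are assumptions about the real file.
conf=$(mktemp)
printf 'edpJarDir=/opt/bdd/edp/data\n' > "$conf"   # stand-in for data_processing-CLI

log_dir=$(grep -o 'edpJarDir=.*' "$conf" | cut -d= -f2)
echo "Log directory: $log_dir"
# On a real node you would then run: ls -lt "$log_dir"/edpLog*.log
```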
Log files
The Data Processing log files are named
edpLog*.log. The naming pattern is set in the
logging.properties configuration. The default pattern is
edpLog%u%g.log, where
%u is a unique number that resolves conflicts between
simultaneous Java processes and
%g is a generation number that distinguishes between
rotating logs. The generation numbers rotate, so the latest run of Data
Processing is always generation number 0. The configuration defaults allow up
to 10,000 log files, each with a maximum size of 1 MB. When a log reaches
1 MB, logging rolls over to the next log file.
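Because generation numbers rotate, the most recent log for a given process always ends in generation 0, so a simple lexical sort surfaces it first. A minimal sketch, using a temporary directory with fabricated file names in place of the real log directory:

```shell
# Sketch: with the default edpLog%u%g.log pattern, generation 0 is the
# newest log for a given Java process (%u). Simulate rotated logs for
# process %u=0 and pick the latest one.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch edpLog00.log edpLog01.log edpLog02.log   # generations 0..2

latest=$(ls edpLog0*.log | sort | head -n 1)   # lexical sort puts generation 0 first
echo "Latest log: $latest"
```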
A sample error log message is:
[2015/01/15 14:14:15] INFO: Starting Data Processing on Hive Table: default.claims
[2015/01/15 14:14:15] SEVERE: Error running EDP
java.lang.Exception: Example Error Log Message
at com.oracle.endeca.pdi.EdpMain.main(EdpMain.java:38)
...
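Error entries like the one above can be pulled out of a log with grep, including the exception line that follows each SEVERE message. A sketch, using an inline stand-in for a real log file:

```shell
# Sketch: extract SEVERE entries (plus the line that follows each one)
# from a Data Processing log. The log content below is a stand-in sample.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015/01/15 14:14:15] INFO: Starting Data Processing on Hive Table: default.claims
[2015/01/15 14:14:15] SEVERE: Error running EDP
java.lang.Exception: Example Error Log Message
EOF

grep -A 1 'SEVERE' "$log"
severe_count=$(grep -c 'SEVERE' "$log")
echo "SEVERE entries: $severe_count"
```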
Finding the Data Processing logs
When a client launches a Data Processing workflow, an Oozie job is
created to run the actual Data Processing job. The job runs on an arbitrary
node in the CDH cluster, chosen by YARN. To find the Data Processing logs, you
must first track down that specific cluster node using the Oozie Job ID. The
Oozie Job ID is printed to the console when the DP CLI runs, and it also
appears in the Studio logs.
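If you only have captured console output or a Studio log file, the Oozie Job ID can be extracted with a pattern match. Oozie workflow IDs typically look like `0000012-150115120101234-oozie-oozi-W`; the sample text and the exact regular expression below are assumptions, not guaranteed formats:

```shell
# Sketch: pull an Oozie workflow job ID out of console or Studio log text.
# The sample line and the ID pattern are assumptions about typical Oozie output.
sample='Submitted Oozie job: 0000012-150115120101234-oozie-oozi-W'
job_id=$(echo "$sample" | grep -oE '[0-9]{7}-[0-9]{15}-oozie-[a-z]+-W')
echo "Oozie Job ID: $job_id"
```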
To find the Data Processing logs:
- Go to the Oozie Web UI and find the corresponding job using the Oozie Job
ID.
- Click the job to bring up detailed Oozie information.
- Under the Actions pane, click the DataProcessingJavaTask action.
- In the Action Info tab of the Action pane, find the External ID. The
external ID matches a YARN Job ID.
- Go to the YARN HistoryServer Web UI and find the corresponding job using the
External ID. To do so:
  - Browse to Cloudera Manager and click the YARN service in the left pane.
  - In the Quick Links section in the top left, click HistoryServer Web UI.
- Click the job to bring up detailed MapReduce information. The Node property
indicates which machine ran the Data Processing job.
- Log in to that machine and go to the Data Processing directory on the
cluster. By default, this is the /opt/bdd/edp/data directory. All the logs for
Data Processing should reside in this directory.
- To find a specific log, you may need to grep (or use a similar tool) for the
corresponding workflow information.
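The final step above can be sketched as a grep across all logs in the Data Processing directory for something specific to the workflow, such as its Hive table name. The temporary directory and log contents below are stand-ins for /opt/bdd/edp/data on the cluster node:

```shell
# Sketch: search every Data Processing log in a directory for a workflow's
# Hive table name. Directory and contents are stand-ins for the real logs.
log_dir=$(mktemp -d)
echo '[2015/01/15 14:14:15] INFO: Starting Data Processing on Hive Table: default.claims' \
    > "$log_dir/edpLog00.log"
echo '[2015/01/16 09:00:00] INFO: Starting Data Processing on Hive Table: default.orders' \
    > "$log_dir/edpLog01.log"

matches=$(grep -l 'default\.claims' "$log_dir"/edpLog*.log)
echo "Matching log: $matches"
```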