5.4.4 Reading the Job Analyzer Report

Job Analyzer writes its report in two formats: HTML for you, and XML for Perfect Balance. You can open the HTML report in a browser, either directly in HDFS or after copying it to the local file system.

To open a Job Analyzer report in HDFS in a browser:

  1. Open the HDFS web interface on port 50070 of a NameNode node (node01 or node02), using a URL like the following:

    http://bda1node01.example.com:50070
    
  2. From the Utilities menu, choose Browse the File System.

  3. Navigate to the job_output_dir/_balancer directory and select jobanalyzer-report.html.

To open a Job Analyzer report in the local file system in a browser:

  1. Copy the report from HDFS to the local file system:

    $ hadoop fs -get job_output_dir/_balancer/jobanalyzer-report.html /home/jdoe
    
  2. Switch to the local directory:

    $ cd /home/jdoe
    
  3. Open the file in a browser:

    $ firefox jobanalyzer-report.html
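
If you fetch reports often, you can wrap the copy step in a small shell function. The following is a minimal sketch, not part of Perfect Balance: report_hdfs_path and fetch_report are hypothetical helper names, and the sketch assumes the hadoop command is on your PATH and that the report has the default name shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Perfect Balance or Job Analyzer).

# Build the HDFS path of the HTML report from the job output directory.
# A single trailing slash on the argument is dropped.
report_hdfs_path() {
  printf '%s/_balancer/jobanalyzer-report.html\n' "${1%/}"
}

# Copy the report into a local directory (default: current directory)
# and print the local path of the copy.
fetch_report() {
  job_output_dir=$1
  local_dir=${2:-.}
  src=$(report_hdfs_path "$job_output_dir")

  # hadoop fs -test -e returns 0 only if the path exists in HDFS.
  if ! hadoop fs -test -e "$src"; then
    echo "Report not found in HDFS: $src" >&2
    return 1
  fi

  hadoop fs -get "$src" "$local_dir" &&
    echo "$local_dir/jobanalyzer-report.html"
}
```

With these functions loaded, fetch_report job_output_dir /home/jdoe copies the report and prints its local path, which you can then open in a browser as in step 3.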
    

When inspecting the Job Analyzer report, look for indicators of skew such as:

  • The execution time of some reducers is much longer than that of others.

  • Some reducers process more records or bytes than others.

  • Some map output keys have more records than others.

  • Some map output records have more bytes than others.
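
To put a number on the first two indicators, you can compare the largest per-reducer value with the average. The following is a minimal sketch, not a feature of Job Analyzer: skew_ratio is a hypothetical helper, and it assumes you have copied the per-reducer values (elapsed seconds, records, or bytes) from the report yourself. A ratio near 1 suggests an evenly distributed load; how large a ratio is acceptable depends on the job.

```shell
#!/bin/sh
# Hypothetical helper (not part of Perfect Balance or Job Analyzer).
# Prints max/mean of its numeric arguments, rounded to two decimal places.
# Prints "n/a" if no values are given or if they sum to zero.
skew_ratio() {
  printf '%s\n' "$@" | awk '
    { sum += $1; if ($1 > max) max = $1 }
    END {
      if (sum == 0) { print "n/a"; exit }
      printf "%.2f\n", max / (sum / NR)
    }'
}
```

For example, skew_ratio 5 15 prints 1.50, meaning that the slower of two reducers with elapsed times of 5 and 15 seconds took 1.5 times the average.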

The following figure shows the beginning of the analyzer report for the inverted index (invindx) example. It displays the key load coefficient recommendations, because this job ran with the appropriate configuration settings. See "Collecting Additional Metrics."

The task IDs are links to tables that show the analysis of specific tasks, enabling you to drill down for more details from the first, summary table.

This example uses an extremely small data set, but notice the differences between tasks 7 and 8: The input records range from 3% to 29%, and their corresponding elapsed times range from 5 to 15 seconds. This variation indicates skew.

Figure 5-1 Job Analyzer Report for Unbalanced Inverted Index Job
