5 Executing Oozie Workflows

This chapter provides information about how to set up the Oozie Engine and explains how to execute Oozie Workflows using Oracle Data Integrator. It also tells you how to audit Hadoop logs.

This chapter includes the following sections:

5.1 Executing Oozie Workflows with Oracle Data Integrator

The following table summarizes the steps you need to perform to execute Oozie Workflows with Oracle Data Integrator.


Table 5-1 Executing Oozie Workflows

Step Description

Set up the Oozie runtime engine

Set up the Oozie runtime engine to configure the connection to the Hadoop data server where the Oozie engine is installed. This Oozie runtime engine is used to execute ODI Design Objects or Scenarios on the Oozie engine as Oozie workflows.

See Setting Up and Initializing the Oozie Runtime Engine.

Execute or deploy an Oozie workflow

Run the ODI Design Objects or Scenarios using the Oozie runtime engine created in the previous step to execute or deploy an Oozie workflow.

See Executing or Deploying an Oozie Workflow.

Audit Hadoop Logs

Audit the Hadoop Logs to monitor the execution of the Oozie workflows from within Oracle Data Integrator.

See Auditing Hadoop Logs.


5.2 Setting Up and Initializing the Oozie Runtime Engine

Before you set up the Oozie runtime engine, ensure that the Hadoop data server where the Oozie engine is deployed is available in the topology. The Oozie engine must be associated with this Hadoop data server.

To set up the Oozie runtime engine:

  1. In the Topology Navigator, right-click the Oozie Runtime Engine node in the Physical Architecture navigation tree and click New.
  2. In the Definition tab, specify values in the fields that define the Oozie runtime engine.

    See Oozie Runtime Engine Definition for the description of the fields.

  3. In the Properties tab, specify the properties for the Oozie Runtime Engine.

    See Oozie Runtime Engine Properties for the description of the properties.

  4. Click Test to test the connections and configurations of the actual Oozie server and the associated Hadoop data server.
  5. Click Initialize to initialize the Oozie runtime engine.

    Initializing the Oozie runtime engine deploys the log retrieval workflows and coordinator workflows to the HDFS file system and starts the log retrieval coordinator and workflow jobs on the actual Oozie server. For a given repository and Oozie engine, the log retrieval flow is named OdiRetrieveLog_<EngineName>_<ReposId>_F and the log retrieval coordinator is named OdiLogRetriever_<EngineName>_<ReposId>_C.

    It also deploys the ODI libraries and classes.

  6. Click Save.
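
The naming pattern used at initialization can be illustrated with a short Python sketch. The function name, engine name, and repository ID below are hypothetical and serve only to show how the two documented names are composed. This is not ODI code.

```python
from typing import Tuple


def log_retrieval_names(engine_name: str, repos_id: str) -> Tuple[str, str]:
    """Compose the documented names of the log retrieval flow and coordinator.

    engine_name and repos_id stand in for <EngineName> and <ReposId>.
    """
    flow = f"OdiRetrieveLog_{engine_name}_{repos_id}_F"
    coordinator = f"OdiLogRetriever_{engine_name}_{repos_id}_C"
    return flow, coordinator


# Hypothetical engine name and repository ID, for illustration only.
flow_name, coordinator_name = log_retrieval_names("OozieEngine1", "101")
print(flow_name)         # OdiRetrieveLog_OozieEngine1_101_F
print(coordinator_name)  # OdiLogRetriever_OozieEngine1_101_C
```

Knowing these names can help you locate the log retrieval jobs in the Oozie web console after initialization.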

5.2.1 Oozie Runtime Engine Definition

The following table describes the fields that you need to specify on the Definition tab when defining a new Oozie runtime engine. An Oozie runtime engine models an actual Oozie server in a Hadoop environment.


Table 5-2 Oozie Runtime Engine Definition

Field Values

Name

Name of the Oozie runtime engine that appears in Oracle Data Integrator.

Host

Name or IP address of the machine on which the Oozie runtime agent has been launched.

Port

Listening port used by the Oozie runtime engine. The default Oozie port is 11000.

Web application context

Name of the web application context. Type oozie as the value of this field, as required by the Oozie service process running in a Hadoop environment.

Protocol

Protocol used for the connection. Possible values are http or https. Default is http.

Hadoop Server

Name of the Hadoop server where the Oozie engine is installed. This Hadoop server is associated with the Oozie runtime engine.

Poll Frequency

Frequency at which the Hadoop audit logs are retrieved and stored in the ODI repository as session logs.

The poll frequency can be specified in seconds (s), minutes (m), hours (h), days (d), and years (y). For example, 5m or 4h.

Lifespan

Time period for which the Hadoop audit logs retrieval coordinator stays enabled to schedule audit logs retrieval workflows.

Lifespan can be specified in minutes (m), hours (h), days (d), and years (y). For example, 4h or 2d.

Schedule Frequency

Frequency at which the Hadoop audit logs retrieval workflow is scheduled as an Oozie Coordinator Job.

Schedule frequency can be specified in minutes (m), hours (h), days (d), and years (y). For example, 20m or 5h.
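
As a rough illustration of how the Definition fields fit together, the following Python sketch composes a server URL from the Protocol, Host, Port, and Web application context fields, and converts duration values such as 5m or 4h into seconds. The class name, function names, and host name are hypothetical. How ODI itself parses these values is not described here, so treat this as a sketch under assumptions. Years are deliberately left out because their length in days is not stated.

```python
import re
from dataclasses import dataclass

# Seconds per documented unit letter. Seconds (s) are documented only for the
# poll frequency. Years are omitted on purpose (assumption of this sketch).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(value: str) -> int:
    """Convert a value such as '5m' or '4h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


@dataclass
class OozieEngineDefinition:
    """Hypothetical container mirroring some fields of the Definition tab."""
    host: str
    port: int = 11000          # documented default Oozie port
    context: str = "oozie"     # documented web application context
    protocol: str = "http"     # documented default protocol

    def base_url(self) -> str:
        return f"{self.protocol}://{self.host}:{self.port}/{self.context}"


# Hypothetical host name, for illustration only.
engine = OozieEngineDefinition(host="hadoop01.example.com")
print(engine.base_url())           # http://hadoop01.example.com:11000/oozie
print(duration_to_seconds("5m"))   # 300
print(duration_to_seconds("4h"))   # 14400
```

Rejecting unknown unit letters, rather than guessing, keeps a mistyped value from silently becoming a different schedule.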


5.2.2 Oozie Runtime Engine Properties

The following table describes the properties that you can configure on the Properties tab when defining a new Oozie runtime engine.


Table 5-3 Oozie Runtime Engine Properties

Field Values

OOZIE_WF_GEN_MAX_DETAIL

Limits the maximum detail (session level or fine-grained task level) allowed when generating ODI Oozie workflows for an Oozie engine.

Set the value of this property to TASK to generate an Oozie action for every ODI task or to SESSION to generate an Oozie action for the entire session.
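
The effect of the two values can be pictured with a small Python sketch. It is a conceptual model only: the function name and the session and task names are hypothetical, and ODI's actual workflow generator is not shown here.

```python
from typing import List


def plan_oozie_actions(session_name: str, tasks: List[str], max_detail: str) -> List[str]:
    """Return the action names a workflow would contain at the given detail level.

    TASK    -> one action per ODI task
    SESSION -> a single action for the entire session
    """
    detail = max_detail.upper()
    if detail == "TASK":
        return [f"{session_name}:{task}" for task in tasks]
    if detail == "SESSION":
        return [session_name]
    raise ValueError(f"OOZIE_WF_GEN_MAX_DETAIL must be TASK or SESSION, got {max_detail!r}")


# Hypothetical session and task names, for illustration only.
tasks = ["create_target", "load_data", "cleanup"]
print(plan_oozie_actions("LoadSales", tasks, "TASK"))
print(plan_oozie_actions("LoadSales", tasks, "SESSION"))
```

With TASK, each ODI task appears as a separate action, which gives finer-grained status. With SESSION, the whole session runs as a single action.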


5.3 Creating a Logical Oozie Engine

To create a logical Oozie engine:

  1. In the Topology Navigator, right-click the Oozie Runtime Engine node in the Logical Architecture navigation tree.
  2. Select New Logical Agent.
  3. Fill in the Agent Name.
  4. For each Context in the left column, select an existing Physical Agent in the right column. This Physical Agent is automatically associated with the logical agent in this context.
  5. From the File menu, click Save.
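
The per-context association made in step 4 amounts to a lookup from context to physical engine. The following Python sketch models that lookup. The function name, context names, and engine names are hypothetical, and the sketch does not reflect how ODI stores topology data.

```python
from typing import Dict


def resolve_physical_engine(mappings: Dict[str, str], context: str) -> str:
    """Return the physical engine associated with the logical engine in a context."""
    try:
        return mappings[context]
    except KeyError:
        raise LookupError(f"no physical engine mapped for context {context!r}") from None


# Hypothetical contexts and physical engine names, for illustration only.
logical_oozie_engine = {
    "Development": "OozieEngine_Dev",
    "Production": "OozieEngine_Prod",
}
print(resolve_physical_engine(logical_oozie_engine, "Development"))  # OozieEngine_Dev
```

A context with no entry raises an error in this sketch, which mirrors the need to select a physical engine for every context in which you intend to run.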

5.4 Executing or Deploying an Oozie Workflow

You can run an ODI design object or scenario using the Oozie runtime engine to execute an Oozie Workflow on the Oozie engine. When running the ODI design object or scenario, you can choose to only deploy the Oozie workflow without executing it.

To deploy or execute an ODI Oozie workflow:

  1. From the Projects menu of the Designer navigator, right-click the mapping that you want to execute as an Oozie workflow and click Run.
  2. From the Run Using drop-down list, select the Oozie runtime engine.
  3. Select the Deploy Only check box to deploy the Oozie workflow without executing it.
  4. Click OK.

    The Information dialog appears.

  5. Verify that the session started, and then click OK in the Information dialog.

5.5 Auditing Hadoop Logs

When the ODI Oozie workflows are executed, log information is retrieved and captured according to the frequency properties on the Oozie runtime engine. This information relates to the state, progress, and performance of the Oozie job.

You can retrieve the log data of an active Oozie session by clicking Retrieve Log Data in the Operator menu. You can also view information about the Oozie session in the Oozie web console or the MapReduce web console by clicking the URL available in the Definition tab of the Session Editor.

The Details tab in the Session Editor, Session Step Editor, and Session Task Editor provides a summary of the Oozie and MapReduce jobs.
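
Outside of ODI, the Oozie server also exposes job status and logs through its Web Services API, which can be useful for cross-checking what the Session Editor shows. The following Python sketch only builds the request URLs for a job's information and log. It assumes the v1 job endpoint with the show=info and show=log parameters. The function name, base URL, and job ID are hypothetical. This is not the mechanism ODI uses to retrieve logs.

```python
from urllib.parse import quote, urlencode


def oozie_job_url(base_url: str, job_id: str, show: str = "info") -> str:
    """Build an Oozie Web Services URL for a job's info or log."""
    if show not in ("info", "log"):
        raise ValueError(f"show must be 'info' or 'log', got {show!r}")
    query = urlencode({"show": show})
    return f"{base_url.rstrip('/')}/v1/job/{quote(job_id, safe='')}?{query}"


# Hypothetical server URL and job ID, for illustration only.
base = "http://hadoop01.example.com:11000/oozie"
job = "0000001-130101000000000-oozie-oozi-W"
print(oozie_job_url(base, job))
print(oozie_job_url(base, job, show="log"))
```

You could pass the resulting URL to any HTTP client. Authentication requirements depend on how the Oozie server is configured.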

5.6 Userlib jars support for running ODI Oozie workflows

Userlib jar support lets you copy jar files into a userlib HDFS directory. ODI Oozie workflows reference this directory through the oozie.libpath property when they are generated and submitted.

This avoids replicating the jars in the lib HDFS directory of each workflow application. The userlib directory is located in HDFS in the following location:

<ODI HDFS Root>/odi_<version>/userlib
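
As an illustration, the following Python sketch composes the userlib path from the pattern above and builds an hdfs dfs -put command line for copying a jar into it. The function names, the root directory, the version string, and the jar path are hypothetical placeholders, so substitute the values from your own environment. The sketch only constructs the command and does not run it.

```python
import posixpath
from typing import List


def userlib_dir(odi_hdfs_root: str, odi_version: str) -> str:
    """Compose <ODI HDFS Root>/odi_<version>/userlib."""
    return posixpath.join(odi_hdfs_root, f"odi_{odi_version}", "userlib")


def upload_command(local_jar: str, odi_hdfs_root: str, odi_version: str) -> List[str]:
    """Build (but do not run) an 'hdfs dfs -put' command for a userlib jar."""
    return ["hdfs", "dfs", "-put", local_jar, userlib_dir(odi_hdfs_root, odi_version)]


# Hypothetical root, version, and jar path, for illustration only.
print(userlib_dir("/user/odi", "12.2.1"))
print(" ".join(upload_command("/tmp/my-udfs.jar", "/user/odi", "12.2.1")))
```

Because the workflows reference the directory through oozie.libpath, a jar copied there once is available without being duplicated into each workflow application's lib directory.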