Executing H2H on Spark

Following are the configurations required for executing H2H on Spark:
  1. Register a cluster from DMT Configurations > Register Cluster with the following details:
    • Name - Enter the name of the target information domain of the H2H mapping.
    • Description - Enter a description for the cluster.
    • Livy Service URL - Enter the Livy Service URL used to connect to Spark from OFSAA (a sample connectivity check is shown at the end of this section).
  2. To execute H2H on Spark, set the EXECUTION_ENGINE_MODE parameter to SPARK from ICC or RRF.
    • Execution through Operations module - Pass [EXECUTION_ENGINE_MODE]=SPARK while defining the H2H tasks from the Task Definition window.

      For more information, see the Component: LOAD DATA section.

    • Execution through RRF module - Pass the following as a parameter while defining the H2H jobs from the Component Selector window:

      "EXECUTION_ENGINE_MODE","SPARK"

  3. Spark Session Management - In a batch execution, a new Spark session is created when the first H2H-Spark task is encountered, and the same Spark session is reused for the rest of the H2H-Spark tasks in the same Run. For the Spark session to be closed at the end of the run, set the CLOSE_SPARK_SESSION parameter to YES in the last H2H-Spark task in the batch (a session-lifecycle sketch is shown at the end of this section).
    • Execution through Operations module - Pass [CLOSE_SPARK_SESSION]=YES while defining the last H2H-Spark task from the Task Definition window.

      For more information, see the Component: LOAD DATA section.

    • Execution through RRF module - Pass the following as a parameter while defining the last H2H-Spark job from the Component Selector window:

      "CLOSE_SPARK_SESSION","YES"

    Note:

    1. Ensure that the task with "CLOSE_SPARK_SESSION","YES" has a lower precedence than all the other H2H-Spark tasks, so that it executes last in the batch.
    2. By default, the created Spark session is closed when any of the H2H-Spark tasks fails.
    3. Execution of an H2H with a large number of mappings may fail because Spark restricts the length of the SQL statement (the code passed to spark.sql) to a maximum of 65535 characters (2^16 - 1).
    4. When you run an H2H Load with Hive and Apache Spark, it fails with the following error:

      Error executing statement : java.lang.RuntimeException: Cannot create staging directory 'hdfs://<HOST_NAME>/user/hive/warehouse/hivedatadom.db/dim_account/.hive-staging_hive_2020-07-06_22-44-57_448_3115454008595470139-1': Permission denied: user=<USER_NAME>, access=WRITE, inode="/user/hive/warehouse/hivedatadom.db/dim_account":hive:hive:drwxrwxr-x

      To resolve this, provide the required permissions to the logged-in user on the Hive database storage (the HDFS warehouse directory), enabling the user to access and write to it. A sample command is shown at the end of this section.
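The Livy Service URL from Step 1 typically takes the form http://<livy_host>:8998, where the host and port are deployment-specific. As a quick sanity check before registering the cluster, you can confirm that the URL is reachable through Livy's REST API; the following Python sketch is illustrative only, and the URL shown is an assumption.

  import requests

  # Hypothetical Livy Service URL; replace with the value registered in OFSAA.
  LIVY_URL = "http://livy-host:8998"

  # Livy lists active sessions at GET /sessions; an HTTP 200 response confirms
  # the Livy server is reachable from this machine.
  response = requests.get(f"{LIVY_URL}/sessions", timeout=10)
  response.raise_for_status()
  print("Livy reachable; active sessions:", response.json().get("total", 0))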
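Step 3 describes one Spark session being created for the first H2H-Spark task, reused by the subsequent tasks, and closed when CLOSE_SPARK_SESSION is YES. The sketch below illustrates that same lifecycle directly against Livy's REST API; it is not the OFSAA implementation, and the URL, session kind, and statements are assumptions for illustration.

  import time
  import requests

  LIVY_URL = "http://livy-host:8998"  # hypothetical Livy Service URL

  # Create one Spark session, analogous to the first H2H-Spark task in a batch.
  session_id = requests.post(f"{LIVY_URL}/sessions", json={"kind": "sql"}).json()["id"]

  # Wait until the session is idle and ready to accept statements.
  while requests.get(f"{LIVY_URL}/sessions/{session_id}").json()["state"] != "idle":
      time.sleep(5)

  # Reuse the same session for several statements, analogous to the remaining
  # H2H-Spark tasks in the same Run reusing one Spark session.
  for sql in ("SHOW DATABASES", "SHOW TABLES"):
      requests.post(f"{LIVY_URL}/sessions/{session_id}/statements", json={"code": sql})

  # Close the session at the end of the run, analogous to setting
  # CLOSE_SPARK_SESSION to YES on the last H2H-Spark task.
  requests.delete(f"{LIVY_URL}/sessions/{session_id}")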
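For the permission error in point 4 of the Note, one way to grant the required access is an HDFS ACL on the affected warehouse directory. The sketch below shells out to the standard hdfs CLI; the user name is a placeholder, the path is taken from the sample error above, and sites that manage access through Ranger or Sentry should use those tools instead.

  import subprocess

  user = "ofsaa_user"  # placeholder for the logged-in OFSAA user
  path = "/user/hive/warehouse/hivedatadom.db/dim_account"

  # Grant the user read/write/execute on the directory via an HDFS ACL
  # (requires dfs.namenode.acls.enabled=true on the cluster).
  subprocess.run(
      ["hdfs", "dfs", "-setfacl", "-m", f"user:{user}:rwx", path],
      check=True,
  )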