9.5 Loading Data from HDFS File to Spark

This section provides the steps to load data from an HDFS file into Spark.

  1. Create a Data Model for the complex file.
  2. Create a Hive table Data Store.
  3. In the Storage panel, set the Storage Format.
  4. Create a mapping with the HDFS file as the source and target.
  5. In the physical diagram of the mapping, use LKM HDFS to Spark or LKM Spark to HDFS.

    Note:

    For the AVRO format, you can specify the location of the schema file. For information on reverse engineering, refer to Reverse Engineering Hive Tables. An Avro file can be loaded into Spark in two ways: with an AVSC file or without one.
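
The two Avro loading options mentioned in the note can be sketched directly in PySpark. This is only an illustration of the underlying Spark read, not the code that the LKM generates. It assumes Spark 2.4 or later with the external spark-avro module on the classpath. The function names `avro_reader_options` and `load_avro` are hypothetical, and the AVSC content is assumed to have been read into a string already.

```python
import json
from typing import Dict, Optional


def avro_reader_options(avsc_text: Optional[str] = None) -> Dict[str, str]:
    """Build reader options for an Avro load (hypothetical helper).

    Without an AVSC schema, no options are set and Spark uses the
    schema embedded in the Avro file itself. With an AVSC schema, the
    JSON text is passed through the spark-avro "avroSchema" option.
    """
    if avsc_text is None:
        return {}
    # Fail early if the schema is not valid JSON (raises ValueError).
    json.loads(avsc_text)
    return {"avroSchema": avsc_text}


def load_avro(spark, hdfs_path: str, avsc_text: Optional[str] = None):
    """Load an Avro file from HDFS into a DataFrame (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    reader = spark.read.format("avro")
    for key, value in avro_reader_options(avsc_text).items():
        reader = reader.option(key, value)
    return reader.load(hdfs_path)
```

With a running session, a call without a schema would look like `load_avro(spark, hdfs_path)`, and a call with a schema would pass the AVSC text as the third argument. Here `hdfs_path` stands for the location of your own file.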