This appendix provides information about the Spark knowledge modules.
This appendix includes the following sections:
This KM loads data from a file into a Spark Python variable. It can be defined on the AP between the execution units, with File as the source technology and Spark Python as the target technology.
The following table describes the options for LKM File to Spark.
Option | Description
---|---
Storage Function | The storage function to be used to load/store data.
CACHE_DATA | Persist the data with the default storage level.
InputFormatClass | Classname of Hadoop InputFormat. For example, org.apache.hadoop.mapreduce.lib.input.TextInputFormat.
KeyClass | Fully qualified classname of key Writable class. For example, org.apache.hadoop.io.Text.
ValueClass | Fully qualified classname of value Writable class. For example, org.apache.hadoop.io.LongWritable.
KeyConverter | Fully qualified classname of key converter class.
ValueConverter | Fully qualified classname of value converter class.
Job Configuration | Hadoop configuration. For example, {'hbase.zookeeper.quorum': 'HOST', 'hbase.mapreduce.inputtable': 'TAB'}
This KM stores data from a Spark Python variable into a file. It can be defined on the AP between the execution units, with Spark Python as the source technology and File as the target technology.
The following table describes the options for LKM Spark to File.
Option | Description
---|---
Storage Function | The storage function to be used to load/store data.
InputFormatClass | Classname of Hadoop InputFormat. For example, org.apache.hadoop.mapreduce.lib.input.TextInputFormat.
KeyClass | Fully qualified classname of key Writable class. For example, org.apache.hadoop.io.Text.
ValueClass | Fully qualified classname of value Writable class. For example, org.apache.hadoop.io.LongWritable.
KeyConverter | Fully qualified classname of key converter class.
ValueConverter | Fully qualified classname of value converter class.
Job Configuration | Hadoop configuration. For example, {'hbase.zookeeper.quorum': 'HOST', 'hbase.mapreduce.inputtable': 'TAB'}
This KM loads data from a Hive table into a Spark Python variable. It can be defined on the AP between the execution units, with Hive as the source technology and Spark Python as the target technology.
This KM stores data from a Spark Python variable into a Hive table. It can be defined on the AP between the execution units, with Spark Python as the source technology and Hive as the target technology.
The following table describes the options for LKM Spark to Hive.
Summarize rows, for example, using SUM and GROUP BY.
The following table describes the options for XKM Spark Aggregate.
Eliminates duplicates in data.
The following table describes the options for XKM Spark Distinct.
Define expressions to be reused across a single mapping.
Produce a subset of data by a filter condition.
The following table describes the options for XKM Spark Filter.
Un-nest the complex data according to the given options.
The following table describes the options for XKM Spark Flatten.
Option | Description
---|---
Default Expression | Default expression for null nested table objects. For example, rating_table(obj_rating('-1', 'Unknown')). This is used to return a row with default values for each null nested table object.
CACHE_DATA | When set to TRUE, persist the results with Spark default storage level. Default is FALSE.
Joins two or more input sources based on the join condition.
The following table describes the options for XKM Spark Join.
Lookup data for a driving data source.
The following table describes the options for XKM Spark Lookup.
Takes data in separate rows, aggregates it, and converts it into columns.
The following table describes the options for XKM Spark Pivot.
Perform UNION, MINUS, or other set operations.
Sort data using an expression.
The following table describes the options for XKM Spark Sort.
Split data into multiple paths with multiple conditions.
The following table describes the options for XKM Spark Split.
Spark table function access.
The following table describes the options for XKM Spark Table Function.
Spark table function as target.
The following table describes the options for IKM Spark Table Function.