public class SpatialJavaRDD<T>
extends org.apache.spark.api.java.JavaRDD<T>
This class represents a spatially enabled RDD. A SpatialJavaRDD encapsulates an existing RDD and adds spatial transformations and functions.
Spatial information is extracted from the source RDD records using an implementation of SparkRecordInfoProvider
provided by the user. The SparkRecordInfoProvider is expected to return the geometry of spatial records.
The following example shows how to create a SpatialJavaRDD from a text RDD, using a SparkRecordInfoProvider implementation that extracts the spatial information from text records:
JavaRDD<String> rdd = sc.textFile("someFile.txt");
SparkRecordInfoProvider recordInfoProvider = new MySparkRecordInfoProvider();
SpatialJavaRDD<String> spatialRDD = SpatialJavaRDD.fromJavaRDD(rdd, recordInfoProvider, String.class);
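The MySparkRecordInfoProvider referenced above is user code. The sketch below is illustrative only: the getRecordInfo callback and the SparkRecordInfo.setGeometry call are assumptions about the provider contract, not confirmed signatures (JGeometry.createPoint is part of Oracle's documented JGeometry API):

```java
import oracle.spatial.geometry.JGeometry;

// Hypothetical sketch of a user-supplied provider for "lon,lat,..." text
// records. NOTE: the getRecordInfo(record, info) callback and the
// recordInfo.setGeometry call are assumptions about the
// SparkRecordInfoProvider contract; check the actual interface before use.
public class MySparkRecordInfoProvider implements SparkRecordInfoProvider<String> {

    public boolean getRecordInfo(String record, SparkRecordInfo recordInfo) {
        String[] tokens = record.split(",");
        if (tokens.length < 2) {
            return false; // no coordinates; exclude this record
        }
        double lon = Double.parseDouble(tokens[0]);
        double lat = Double.parseDouble(tokens[1]);
        // Build a 2D point with SRID 8307 (WGS84 longitude/latitude).
        recordInfo.setGeometry(JGeometry.createPoint(new double[] { lon, lat }, 2, 8307));
        return true;
    }
}
```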
The previous example creates a SpatialJavaRDD whose records are of the same type (String) as the source RDD's records. Every time a spatial operation is performed, the provided SparkRecordInfoProvider instance is used to extract the spatial information. For some data set formats, such as JSON, extracting the spatial information may require parsing each record, and if multiple spatial transformations are applied, the same record may be parsed several times.
A way to ensure records are parsed only once by the SparkRecordInfoProvider is to create a SpatialJavaRDD whose records are of type SparkRecordInfo. A SparkRecordInfo holds the spatial information, plus any desired additional data, from a source RDD record. The static method fromJavaRDD(JavaRDD, SparkRecordInfoProvider) can be used to create a SpatialJavaRDD of type SparkRecordInfo.
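For example, a parse-once spatial RDD can be created with the two-argument fromJavaRDD overload, reusing the MySparkRecordInfoProvider from the example above:

```java
JavaRDD<String> rdd = sc.textFile("someFile.txt");
SparkRecordInfoProvider<String> recordInfoProvider = new MySparkRecordInfoProvider();

// The resulting records are SparkRecordInfo instances, so each source record
// is parsed by the provider once instead of on every spatial operation.
SpatialJavaRDD<SparkRecordInfo> spatialRDD =
    SpatialJavaRDD.fromJavaRDD(rdd, recordInfoProvider);
```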
Spatial versions of existing transformations are provided. Each spatial transformation takes an additional parameter, a SpatialOperationConfig, which contains the information of the spatial operation to be performed to filter records.

Modifier and Type | Method and Description
---|---
long | count()
org.apache.spark.sql.DataFrame | createSpatialDataFrame(org.apache.spark.sql.SQLContext sqlContext, java.util.List<java.lang.String> fieldsList) Creates a data frame based on the SpatialJavaRDD.
SpatialTransformationContext<T> | createSpatialTransformationContext() Creates an instance of SpatialTransformationContext.
SpatialTransformationContext<T> | createSpatialTransformationContext(SpatialOperationConfig spatialOperationConf) Creates an instance of SpatialTransformationContext associated to the given SpatialOperationConfig.
SpatialJavaRDD<T> | filter(org.apache.spark.api.java.function.Function<T,java.lang.Boolean> f, SpatialOperationConfig spatialOpConf) Returns a new spatial RDD containing only the elements that satisfy both the filtering function and the spatial operation.
<R> org.apache.spark.api.java.JavaRDD<R> | flatMap(org.apache.spark.api.java.function.Function<T,java.lang.Iterable<R>> m, SpatialOperationConfig spatialOpConf) Returns a new RDD by first spatially filtering the elements using the spatial operation given by spatialOpConf, then applying the function to each remaining element.
static <T> SpatialJavaRDD<SparkRecordInfo> | fromJavaRDD(org.apache.spark.api.java.JavaRDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider) Creates a spatial RDD of type SparkRecordInfo from the given RDD.
static <T> SpatialJavaRDD<T> | fromJavaRDD(org.apache.spark.api.java.JavaRDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider, java.lang.Class<T> recordType) Creates a spatial RDD from the given Java RDD.
static <T> SpatialJavaRDD<T> | fromRDD(org.apache.spark.rdd.RDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider) Creates a spatial RDD from the given RDD.
double[] | getMBR() Gets the minimum bounding rectangle of the RDD.
SparkRecordInfoProvider<T> | getRecordInfoProvider() Gets the RDD's SparkRecordInfoProvider instance.
java.lang.Class<T> | getType() Gets the type of the records in the RDD.
java.util.List<scala.Tuple2<java.lang.Double,T>> | nearestNeighbors(oracle.spatial.geometry.JGeometry qryWindow, int k, double tol) Returns the k elements closest to the given query window.
Methods inherited from class org.apache.spark.api.java.JavaRDD:
aggregate, cache, cartesian, checkpoint, classTag, coalesce, coalesce, collect, collectAsync, collectPartitions, context, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, distinct, distinct, filter, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, fromRDD, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapPartitionsWithIndex$default$2, mapToDouble, mapToPair, max, min, name, partitions, persist, pipe, pipe, pipe, randomSplit, randomSplit, rdd, reduce, repartition, sample, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, splits, subtract, subtract, subtract, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, toRDD, toString, treeAggregate, treeAggregate, treeReduce, treeReduce, union, unpersist, unpersist, wrapRDD, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public static <T> SpatialJavaRDD<T> fromJavaRDD(org.apache.spark.api.java.JavaRDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider, java.lang.Class<T> recordType)
Creates a spatial RDD from the given Java RDD.
Parameters:
rdd - an existing JavaRDD
recordInfoProvider - an implementation of SparkRecordInfoProvider
recordType - the type of the source rdd records

public static <T> SpatialJavaRDD<T> fromRDD(org.apache.spark.rdd.RDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider)
Creates a spatial RDD from the given RDD.
Parameters:
rdd - an existing RDD
recordInfoProvider - an implementation of SparkRecordInfoProvider
public static <T> SpatialJavaRDD<SparkRecordInfo> fromJavaRDD(org.apache.spark.api.java.JavaRDD<T> rdd, SparkRecordInfoProvider<T> recordInfoProvider)
Creates a spatial RDD of type SparkRecordInfo from the given RDD.
Parameters:
rdd - an existing RDD
recordInfoProvider - an implementation of SparkRecordInfoProvider
public SparkRecordInfoProvider<T> getRecordInfoProvider()
Gets the RDD's SparkRecordInfoProvider instance.
Returns:
the SparkRecordInfoProvider instance

public java.lang.Class<T> getType()
Gets the type of the records in the RDD.
public SpatialTransformationContext<T> createSpatialTransformationContext(SpatialOperationConfig spatialOperationConf)
Creates an instance of SpatialTransformationContext associated to the given SpatialOperationConfig.
Parameters:
spatialOperationConf - a spatial operation used to filter records
Returns:
a SpatialTransformationContext

public SpatialTransformationContext<T> createSpatialTransformationContext()
Creates an instance of SpatialTransformationContext.
Returns:
a SpatialTransformationContext
public SpatialJavaRDD<T> filter(org.apache.spark.api.java.function.Function<T,java.lang.Boolean> f, SpatialOperationConfig spatialOpConf)
Returns a new spatial RDD containing only the elements that satisfy both the filtering function and the spatial operation.
Parameters:
f - a filtering function
spatialOpConf - a spatial operation used to filter records
Returns:
a SpatialJavaRDD
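For instance, given the spatialRDD created in the earlier example, a combined spatial and functional filter might look like the following sketch. The SpatialOperationConfig setters and the SpatialOperation constant shown are assumptions about that class's API, not confirmed by this page:

```java
import oracle.spatial.geometry.JGeometry;

// Assumed SpatialOperationConfig API (setOperation/setQueryWindow/setTolerance).
SpatialOperationConfig soc = new SpatialOperationConfig();
soc.setOperation(SpatialOperation.AnyInteract); // assumed operation constant
// Query window: an illustrative rectangle in SRID 8307 (longitude/latitude).
soc.setQueryWindow(JGeometry.createLinearPolygon(
    new double[] { -122.5, 37.6, -122.3, 37.6, -122.3, 37.9,
                   -122.5, 37.9, -122.5, 37.6 },
    2, 8307));
soc.setTolerance(0.05);

// Keep records that pass the non-spatial predicate AND the spatial operation.
SpatialJavaRDD<String> filtered =
    spatialRDD.filter((String rec) -> rec.length() > 10, soc);
```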
public <R> org.apache.spark.api.java.JavaRDD<R> flatMap(org.apache.spark.api.java.function.Function<T,java.lang.Iterable<R>> m, SpatialOperationConfig spatialOpConf)
Returns a new RDD by first spatially filtering the elements using the spatial operation given by spatialOpConf, then applying the function to each remaining element.
Parameters:
m - a function to apply to each element
spatialOpConf - a spatial operation used to filter records

public java.util.List<scala.Tuple2<java.lang.Double,T>> nearestNeighbors(oracle.spatial.geometry.JGeometry qryWindow, int k, double tol)
Returns the k elements closest to the given query window.
Parameters:
qryWindow - a geometry from where the nearest neighbors will be calculated
k - the number of neighbors
tol - the tolerance used

public double[] getMBR()
Gets the minimum bounding rectangle of the RDD.

public long count()
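Given the spatialRDD created in the earlier example, the query methods can be used as sketched below. The point coordinates and tolerance are illustrative, and the {minX, minY, maxX, maxY} layout of the MBR array is an assumption:

```java
import java.util.List;
import oracle.spatial.geometry.JGeometry;
import scala.Tuple2;

// Query geometry: a 2D point in SRID 8307 (longitude/latitude).
JGeometry qryWindow = JGeometry.createPoint(new double[] { -122.42, 37.77 }, 2, 8307);

// The five records closest to the point, each paired with its distance.
List<Tuple2<Double, String>> neighbors = spatialRDD.nearestNeighbors(qryWindow, 5, 0.05);
for (Tuple2<Double, String> n : neighbors) {
    System.out.println("distance=" + n._1() + " record=" + n._2());
}

// Minimum bounding rectangle of all geometries in the RDD
// (assumed {minX, minY, maxX, maxY} ordering).
double[] mbr = spatialRDD.getMBR();
```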
public org.apache.spark.sql.DataFrame createSpatialDataFrame(org.apache.spark.sql.SQLContext sqlContext, java.util.List<java.lang.String> fieldsList)
Creates a data frame based on the SpatialJavaRDD.
Parameters:
sqlContext - the sqlContext
fieldsList - the extra fields to include. The geometry will always be the first field.

Copyright © 2016 Oracle and/or its affiliates. All Rights Reserved.