About Spatial Pipeline
The spatial pipeline extends the existing scikit-learn
pipeline to include spatial information such as geometry data and spatial
weights.
The SpatialPipeline class can easily chain together both spatial and non-spatial steps, and is composed of estimators. An estimator can be one of the following:
- Transformer: An estimator with the fit and transform methods that are described in the following table.

  | Method | Description |
  | --- | --- |
  | fit | The fit method computes statistics and other properties from the training data. |
  | transform | The transform method applies the values calculated in the fit method to change the data. |
  | fit_transform | Calls the fit and transform methods sequentially with the training data. |

  One typical example of a transformer is the StandardScaler, which standardizes the data so that each feature has zero mean and unit variance. Usually, transformers are part of the pre-processing step in a pipeline.
- Classifier/Regressor: This estimator must be the last step in a pipeline. It can perform either a regression or a classification task. The methods available in a pipeline correspond to those in the final step. In this case, the pipeline has the fit, predict, and score methods along with the other methods associated with the estimator. Usually, the pipeline goes through multiple transformers before reaching this estimator.
- Composite Estimator: These estimators can combine multiple estimators and can be chained with other estimators. For example, a pre-processing pipeline can execute multiple transformations on the data and then become part of another pipeline for a regression task. There are three composite estimators:

  | Estimator | Description |
  | --- | --- |
  | SpatialPipeline | A pipeline that includes spatial information. |
  | SpatialFeatureUnion | Concatenates the resulting columns (features) from different estimators to create a single input while sharing spatial information. |
  | SpatialColumnTransformer | Selects a subset of columns (features) from the input and passes these columns to an estimator while sharing spatial information. |
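
The estimator roles described above can be illustrated with plain scikit-learn, which the spatial classes extend. The following is a minimal sketch that uses only the non-spatial scikit-learn counterparts (Pipeline, FeatureUnion, and ColumnTransformer); it does not use the spatial classes themselves, and the toy data and step names are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data (illustrative only): 6 samples, 2 features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 35.0],
              [4.0, 40.0], [5.0, 55.0], [6.0, 60.0]])
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0

# Transformer: fit learns the mean and standard deviation of each feature,
# transform applies them, and fit_transform does both in one call.
scaler = StandardScaler()
X_two_calls = scaler.fit(X).transform(X)
X_one_call = StandardScaler().fit_transform(X)
same_result = np.allclose(X_two_calls, X_one_call)

# Composite estimator: a column transformer routes column subsets
# to different transformers and concatenates their outputs.
preprocess = ColumnTransformer([
    ("scale_first", StandardScaler(), [0]),
    ("poly_second", PolynomialFeatures(degree=2, include_bias=False), [1]),
])

# Composite estimator: a feature union applies every transformer to the
# full input and concatenates the resulting columns.
union = FeatureUnion([
    ("scaled", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
])
union_shape = union.fit_transform(X).shape

# The regressor is the last step, so the pipeline exposes
# fit, predict, and score.
model = Pipeline([("preprocess", preprocess),
                  ("regressor", LinearRegression())])
model.fit(X, y)
predictions = model.predict(X)
r2 = model.score(X, y)
preprocessed_shape = model.named_steps["preprocess"].transform(X).shape
```

Because the pre-processing step is itself a composite estimator nested inside another pipeline, the sketch also shows how composite estimators can be chained with other estimators.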
A spatial pipeline can take the same input as a regular scikit-learn pipeline plus the spatial information which is required by spatial processes (spatial transformers and spatial models or predictors). This additional spatial information can be divided into two categories:
- Data location/geometries: The geometries associated with the samples in the input data, X, form a vector of geometries. This vector can be embedded in X if X is either a geopandas GeoDataFrame or a SpatialDataFrame. It can also be defined in the parameter geometries.
- Spatial parameters: These are additional parameters used to provide context about geometries (CRS), describe/quantify spatial relationships (spatial weights definition, spatial weights objects), or help perform faster spatial searches (spatial index).
The following figure shows the data flow in a spatial pipeline.
As seen in the preceding figure, the input data comprising X, y, and (optionally) spatial parameters are received by the spatial pipeline. Note that the input X can be split into X' (non-spatial data) and geometries. Then, the spatial parameters and the geometries are extracted and passed to all the spatial steps in the pipeline.
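
The data flow described above can be approximated with a small pure-Python sketch. It is a conceptual illustration only and not the library's implementation: ToyStep, split_input, and fit_all are hypothetical names, the crs keyword is an assumed parameter name, and the sketch assumes that non-spatial steps receive only X'. It also omits the passing of transformed output from one step to the next.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStep:
    """Hypothetical stand-in for a pipeline step (not a library class)."""
    name: str
    is_spatial: bool
    received: list = field(default_factory=list)

    def fit(self, X, y=None, **spatial_kwargs):
        # Record which spatial keyword arguments reached this step.
        self.received = sorted(spatial_kwargs)
        return self


def split_input(rows, geometry_key="geometry"):
    """Split X into X' (non-spatial columns) and a vector of geometries."""
    x_prime = [{k: v for k, v in row.items() if k != geometry_key}
               for row in rows]
    geometries = [row[geometry_key] for row in rows]
    return x_prime, geometries


def fit_all(steps, X, y, **spatial_params):
    """Pass geometries and spatial parameters only to the spatial steps."""
    x_prime, geometries = split_input(X)
    for step in steps:
        if step.is_spatial:
            step.fit(x_prime, y, geometries=geometries, **spatial_params)
        else:
            step.fit(x_prime, y)
    return steps


# Toy input: each sample carries one feature and a (lon, lat) point.
X = [
    {"income": 41.0, "geometry": (-71.06, 42.36)},
    {"income": 58.5, "geometry": (-73.99, 40.73)},
]
y = [0, 1]

x_prime, geometries = split_input(X)
steps = fit_all(
    [ToyStep("scaler", is_spatial=False),
     ToyStep("spatial_model", is_spatial=True)],
    X, y, crs="epsg:4326",
)
```

In this sketch the non-spatial step records no spatial arguments, while the spatial step records both the geometries and the spatial parameter.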