About Spatial Pipeline
The spatial pipeline extends the existing scikit-learn
pipeline to include spatial information such as geometry data and spatial
weights.
The SpatialPipeline class chains spatial and non-spatial steps together.
Each step is an estimator, which can be one of the following:
- Transformer: An estimator with the fit and transform methods that are described in the following table.

  | Method | Description |
  | --- | --- |
  | fit | Computes statistics and other properties from the training data. |
  | transform | Applies the values calculated in the fit method to change the data. |
  | fit_transform | Calls the fit and transform methods sequentially with the training data. |

  One typical example of a transformer is the StandardScaler, which standardizes the data so that each feature has zero mean and unit variance. Usually, transformers are part of the pre-processing step in a pipeline.
- Classifier/Regressor: This estimator must be the last step in a pipeline. It can perform either a regression or a classification task. The methods available in a pipeline correspond to those in the final step. In this case, the pipeline has the fit, predict, and score methods along with the other methods associated with the estimator. Usually, the pipeline goes through multiple transformers before reaching this estimator.
- Composite Estimator: These estimators combine multiple estimators and can be chained with other estimators. For example, a pre-processing pipeline that executes multiple transformations on the data can itself be a step in another pipeline for a regression task. There are three composite estimators:

  | Estimator | Description |
  | --- | --- |
  | SpatialPipeline | A pipeline that includes spatial information. |
  | SpatialFeatureUnion | Concatenates the resulting columns (features) from different estimators to create a single input while sharing spatial information. |
  | SpatialColumnTransformer | Selects a subset of columns (features) from the input and passes these columns to an estimator while sharing spatial information. |
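The estimator roles listed above can be illustrated with a small, dependency-free sketch. The classes ColumnScaler, MeanRegressor, and TinyPipeline are hypothetical stand-ins written for this illustration. They are not the scikit-learn or SpatialPipeline implementations, and they only mimic the fit/transform/predict contract on plain lists.

```python
from statistics import fmean, pstdev


class ColumnScaler:
    """Hypothetical transformer: scales each column to zero mean, unit variance."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means_ = [fmean(col) for col in columns]
        # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
        self.stds_ = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, X):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
            for row in X
        ]

    def fit_transform(self, X, y=None):
        # fit followed by transform on the same training data.
        return self.fit(X, y).transform(X)


class MeanRegressor:
    """Hypothetical final estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class TinyPipeline:
    """Hypothetical chain of steps; all steps before the last must be transformers."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for step in self.steps[:-1]:
            X = step.fit_transform(X, y)
        self.steps[-1].fit(X, y)
        return self

    def transform(self, X):
        # Only meaningful when every step, including the last, is a transformer.
        for step in self.steps:
            X = step.transform(X)
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def predict(self, X):
        # Available because the final step is a predictor.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y = [1.0, 2.0, 3.0]

# A pre-processing pipeline used as one step of a regression pipeline.
preprocessing = TinyPipeline([ColumnScaler()])
model = TinyPipeline([preprocessing, MeanRegressor()])
model.fit(X, y)

predictions = model.predict(X)
scaled = preprocessing.transform(X)
```

In this sketch the outer pipeline exposes predict only because its last step does, which mirrors the rule that the methods available in a pipeline correspond to those of the final step.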
A spatial pipeline can take the same input as a regular
scikit-learn pipeline, plus the spatial information
required by spatial processes (spatial transformers and spatial models or predictors).
This additional spatial information can be divided into two categories:
- Data location/geometries: The geometries associated with the samples in the input data, X, form a vector of geometries. This vector can be embedded in X if X is either a geopandas GeoDataFrame or a SpatialDataFrame. It can also be defined in the geometries parameter.
- Spatial parameters: These are additional parameters used to provide context about geometries (CRS), describe or quantify spatial relationships (spatial weights definition, spatial weights objects), or help perform faster spatial searches (spatial index).
The following figure shows the data flow in a spatial pipeline.
As seen in the preceding figure, the input data comprising
X, y, and (optionally) spatial parameters are
received by the spatial pipeline. Note that the input X can be split
into X' (non-spatial data) and geometries. Then, the spatial parameters
and the geometries are extracted and passed to all the spatial steps in the
pipeline.
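The data flow described above can be sketched in plain Python. Everything in this block is an assumption made for illustration: RecordingStep, split_input, fit_steps, the "geometry" key, and the crs keyword are hypothetical names, rows are plain dictionaries rather than a GeoDataFrame, and (x, y) tuples stand in for geometry objects. The steps only record what they receive, so the sketch shows the routing and not any real spatial computation.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingStep:
    """Hypothetical step that records the arguments passed to fit."""

    name: str
    spatial: bool
    received: dict = field(default_factory=dict)

    def fit(self, X, y, **spatial_kwargs):
        self.received = {"n_columns": len(X[0]), **spatial_kwargs}
        return self


def split_input(rows, geometry_key="geometry"):
    """Split X into X' (non-spatial values) and a vector of geometries."""
    geometries = [row[geometry_key] for row in rows]
    x_prime = [
        [value for key, value in row.items() if key != geometry_key]
        for row in rows
    ]
    return x_prime, geometries


def fit_steps(steps, X, y, **spatial_params):
    """Pass geometries and spatial parameters to the spatial steps only."""
    x_prime, geometries = split_input(X)
    for step in steps:
        if step.spatial:
            step.fit(x_prime, y, geometries=geometries, **spatial_params)
        else:
            step.fit(x_prime, y)
    return steps


X = [
    {"income": 52.0, "age": 34.0, "geometry": (-122.4, 37.8)},
    {"income": 61.0, "age": 45.0, "geometry": (-73.9, 40.7)},
]
y = [1.0, 0.0]

scaler = RecordingStep("scaler", spatial=False)
spatial_step = RecordingStep("spatial_step", spatial=True)
fit_steps([scaler, spatial_step], X, y, crs=4326)
```

After the call, the non-spatial step has seen only X' and y, while the spatial step has additionally received the geometries and the spatial parameter. Unlike a real pipeline, this sketch does not feed the output of one step into the next.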
