Spatial Pipeline

The SpatialPipeline class shares spatial information through a pipeline of transformers, other estimators, and a final estimator.

Note that the final estimator step of the pipeline is optional in this case. A typical scenario consists of a preprocessing pipeline in charge of tasks such as cleaning the data, filling missing values, and standardizing the data. The preprocessing pipeline then becomes a step of another pipeline that ends with a final estimator, either a regressor or a classifier.

The following table describes the main methods of the SpatialPipeline class.

Method	Description
`fit`	Calls the `fit` method of the pipeline transformers and the final estimator.
`fit_predict`	Calls the `fit` and `transform` methods of the pipeline transformers and the `fit` and `predict` methods of the final estimator.
`predict`	Calls the `transform` method of all the transformers in the pipeline and calls the `predict` method of the final estimator.

See the SpatialPipeline class in Python API Reference for Oracle Spatial AI for more information.
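
To make the delegation described in the table concrete, the following is a minimal pure-Python sketch of how a pipeline could forward `fit` and `predict` to its steps. It is an illustration under stated assumptions, not the oraclesai implementation: `MiniPipeline`, `Centerer`, `Doubler`, and `MeanRegressor` are hypothetical names, and the spatial information that SpatialPipeline shares between steps is left out. The usage at the end mirrors the typical scenario of a preprocessing pipeline used as a step of an outer pipeline that ends with an estimator.

```python
class Centerer:
    """Hypothetical transformer: subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Doubler:
    """Hypothetical stateless transformer: multiplies every value by 2."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [2 * x for x in X]


class MeanRegressor:
    """Hypothetical final estimator: always predicts the training-target mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MiniPipeline:
    """Toy pipeline (an assumption for illustration, not the oraclesai class)."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def _split(self):
        # Treat the last step as the final estimator only if it cannot transform.
        objs = [obj for _, obj in self.steps]
        if objs and not hasattr(objs[-1], "transform"):
            return objs[:-1], objs[-1]
        return objs, None

    def fit(self, X, y=None):
        transformers, final = self._split()
        for t in transformers:
            X = t.fit(X, y).transform(X)  # fit each step, pass its output on
        if final is not None:
            final.fit(X, y)
        return self

    def transform(self, X):
        transformers, _ = self._split()
        for t in transformers:
            X = t.transform(X)
        return X

    def predict(self, X):
        transformers, final = self._split()
        for t in transformers:
            X = t.transform(X)  # transform only; nothing is refitted here
        return final.predict(X)


# A preprocessing pipeline made only of transformers ...
preprocessing = MiniPipeline([("center", Centerer()), ("double", Doubler())])
# ... nested as the first step of a pipeline that ends with an estimator
model = MiniPipeline([("preprocessing", preprocessing), ("regressor", MeanRegressor())])
model.fit([1.0, 2.0, 3.0], y=[10.0, 20.0, 30.0])
predictions = model.predict([4.0, 5.0])
```

In this sketch, `predict` reuses the statistics learned during `fit` (the mean stored by `Centerer`), which is the reason the transformers are only transformed, not refitted, at prediction time.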

The following example uses the block_groups SpatialDataFrame and SpatialColumnTransformer to define a feature-engineering step that creates new columns representing the spatial lag of specific columns. The feature-engineering step is then added to a SpatialPipeline, along with a preprocessing step that standardizes the data and a final estimator consisting of a spatial error regression model.

from oraclesai.pipeline import SpatialColumnTransformer, SpatialPipeline
from oraclesai.weights import KNNWeightsDefinition
from oraclesai.preprocessing import SpatialLagTransformer
from oraclesai.regression import SpatialErrorRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
 
# Define target and explanatory variables
X = block_groups[["MEAN_AGE", "HOUSE_VALUE", "MEDIAN_INCOME", "geometry"]]
 
# Define spatial weights
weights_definition = KNNWeightsDefinition(k=10)
 
# Define a Spatial Lag Transformer
spatial_lag_transformer = SpatialLagTransformer(spatial_weights_definition=weights_definition)
 
# Create an instance of SpatialErrorRegressor
spatial_error_regressor = SpatialErrorRegressor(spatial_weights_definition=weights_definition)
 
# Use SpatialColumnTransformer to concatenate column subsets
feature_engineering_step = SpatialColumnTransformer([
    ("imputer", SimpleImputer(), ["MEAN_AGE", "HOUSE_VALUE"]),
    ("spatial_lag", spatial_lag_transformer, ["HOUSE_VALUE"])])
 
# Create a pipeline with three steps: Feature-Engineering, Scaler, Regressor
regression_pipeline = SpatialPipeline([
    ("feature_engineering", feature_engineering_step),
    ("scaler", StandardScaler()),
    ("regressor", spatial_error_regressor)
])
 
# Train the model
regression_pipeline.fit(X, y="MEDIAN_INCOME")
 
# Print the score of the training set
print(f"r2_score = {regression_pipeline.score(X, y='MEDIAN_INCOME')}")

The output is the R-squared metric computed by the final estimator. The score method first runs the transform method of each transformer in the pipeline and then scores the final estimator on the transformed data.

r2_score = 0.5559292598577543
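
The spatial lag created by the feature-engineering step in the example above can be pictured as a neighbor average. The following pure-Python sketch assumes point coordinates, Euclidean distance, and equally weighted k-nearest neighbors, so that the lag of an observation is the mean of the values of its k nearest neighbors. The helper names `knn_neighbors` and `spatial_lag` and the sample data are hypothetical; they are not part of oraclesai, and the library's actual computation (geometry handling, weight standardization) may differ.

```python
import math


def knn_neighbors(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def spatial_lag(values, neighbors):
    """Average the values of each observation's neighbors (equal weights)."""
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


# Hypothetical sample: four locations on a line, one value per location
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
house_value = [100.0, 200.0, 300.0, 400.0]

neighbors = knn_neighbors(points, k=2)
house_value_lag = spatial_lag(house_value, neighbors)
```

Under these assumptions, the isolated fourth location receives a lag that reflects its two closest neighbors rather than its own, much higher, value, which is the kind of neighborhood context the lagged columns add to the model.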