Spatial Pipeline

The SpatialPipeline class shares spatial information through a pipeline of transformers, other estimators, and a final estimator.

Note that the final estimator step of the pipeline is optional in this case. A typical scenario consists of a preprocessing pipeline in charge of different tasks, such as cleaning the data, filling missing values, and standardizing the data. The preprocessing pipeline then becomes part of another pipeline with a final estimator, either a regressor or a classifier.
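
The nesting pattern described above can be sketched with plain scikit-learn. This is only an analogy: it uses sklearn.pipeline.Pipeline as a stand-in for SpatialPipeline, and the toy data and the LinearRegression estimator are assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features, one missing value
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Preprocessing pipeline: every step here is a transformer
preprocessing = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# The preprocessing pipeline becomes a single step of a larger pipeline
# that ends with a final estimator (here a regressor)
model = Pipeline([
    ("preprocessing", preprocessing),
    ("regressor", LinearRegression()),
])

# The preprocessing pipeline can be used on its own...
X_clean = preprocessing.fit_transform(X)

# ...or as part of the full model
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the preprocessing steps in their own pipeline makes them reusable with different final estimators.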

The following table describes the main methods of the SpatialPipeline class.

Method       Description
fit          Calls the fit method of the pipeline transformers and the final estimator.
fit_predict  Calls the fit and transform methods of the pipeline transformers and the fit and predict methods of the final estimator.
predict      Calls the transform method of all the transformers in the pipeline and calls the predict method of the final estimator.
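
The call order described in the table can be made concrete with a toy pipeline. The sketch below is not the oraclesai implementation: MiniPipeline, AddOne, and MeanEstimator are hypothetical classes written only to log which method is called at each stage, assuming the usual scikit-learn convention that each transformer's output feeds the next step.

```python
class MiniPipeline:
    """Toy pipeline showing the call order only; not the oraclesai implementation."""

    def __init__(self, steps):
        # steps: list of (name, object) pairs; the last object is the final estimator
        self.steps = steps

    def fit(self, X, y=None):
        # Fit each transformer and pass its output on to the next step
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Only transform (no refitting), then delegate to the final estimator
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


call_log = []


class AddOne:
    """Hypothetical transformer: adds 1 to every value and logs its calls."""

    def fit(self, X, y=None):
        call_log.append("AddOne.fit")
        return self

    def transform(self, X):
        call_log.append("AddOne.transform")
        return [x + 1 for x in X]


class MeanEstimator:
    """Hypothetical estimator: always predicts the mean of the training target."""

    def fit(self, X, y=None):
        call_log.append("MeanEstimator.fit")
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        call_log.append("MeanEstimator.predict")
        return [self.mean_ for _ in X]


pipeline = MiniPipeline([("add_one", AddOne()), ("estimator", MeanEstimator())])
pipeline.fit([0, 1, 2], y=[1.0, 2.0, 3.0])
predictions = pipeline.predict([10, 20])
print(call_log)
print(predictions)
```

The log shows that fit touches every step, while predict only transforms the data before handing it to the final estimator.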

See the SpatialPipeline class in Python API Reference for Oracle Spatial AI for more information.

The following example uses the block_groups SpatialDataFrame and SpatialColumnTransformer to define a feature-engineering step that imputes missing values and creates new columns representing the spatial lag of specific columns. The feature-engineering step is then added to a SpatialPipeline, along with a preprocessing step that standardizes the data and a final estimator consisting of a spatial error regression model.

from oraclesai.pipeline import SpatialColumnTransformer, SpatialPipeline
from oraclesai.weights import KNNWeightsDefinition
from oraclesai.preprocessing import SpatialLagTransformer
from oraclesai.regression import SpatialErrorRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
 
# Select the explanatory variables, the target (MEDIAN_INCOME), and the geometry
X = block_groups[["MEAN_AGE", "HOUSE_VALUE", "MEDIAN_INCOME", "geometry"]]
 
# Define spatial weights
weights_definition = KNNWeightsDefinition(k=10)
 
# Define a Spatial Lag Transformer
spatial_lag_transformer = SpatialLagTransformer(spatial_weights_definition=weights_definition)
 
# Create an instance of SpatialErrorRegressor
spatial_error_regressor = SpatialErrorRegressor(spatial_weights_definition=weights_definition)
 
# Use SpatialColumnTransformer to concatenate column subsets
feature_engineering_step = SpatialColumnTransformer([
    ("imputer", SimpleImputer(), ["MEAN_AGE", "HOUSE_VALUE"]),
    ("spatial_lag", spatial_lag_transformer, ["HOUSE_VALUE"])])
 
# Create a pipeline with three steps: Feature-Engineering, Scaler, Regressor
regression_pipeline = SpatialPipeline([
    ("feature_engineering", feature_engineering_step),
    ("scaler", StandardScaler()),
    ("regressor", spatial_error_regressor)
])
 
# Train the model
regression_pipeline.fit(X, y="MEDIAN_INCOME")
 
# Print the score of the training set
print(f"r2_score = {regression_pipeline.score(X, y='MEDIAN_INCOME')}")

The output is the R-squared metric computed by the final estimator. Calling the score method runs the transform method of every transformer in the pipeline and then scores the transformed data with the final estimator.

r2_score = 0.5559292598577543
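
To give some intuition for the spatial lag feature used in the feature-engineering step, the following sketch computes a spatial lag by hand with NumPy. It rests on several assumptions: neighbors are the k nearest locations by Euclidean distance, weights are row-standardized so the lag is the plain average of the neighbors' values, and the coordinates and values are made up. The function knn_spatial_lag is hypothetical, and SpatialLagTransformer may differ in its details.

```python
import numpy as np


def knn_spatial_lag(coords, values, k):
    """Average of each location's k nearest neighbors (row-standardized KNN weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Pairwise Euclidean distances between all locations
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # A location is never its own neighbor
    np.fill_diagonal(dists, np.inf)

    # Indices of the k closest locations for every row
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # With equal weights of 1/k, the lag is the mean of the neighbors' values
    return values[neighbors].mean(axis=1)


# Four made-up locations on a line and a made-up HOUSE_VALUE column
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
house_value = [100.0, 200.0, 300.0, 400.0]

lag = knn_spatial_lag(coords, house_value, k=2)
print(lag)  # [250. 200. 150. 250.]
```

Each lagged value summarizes the neighborhood of a location rather than the location itself, which is why it is added as a new column next to the original one.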