Spatial Lag Transformer

The spatial lag of a particular feature reflects the average value of that feature in the neighborhood around each observation.

For example, in a given neighborhood, the spatial lag of the house price is the average house price surrounding a specific house or location. Computing the spatial lag is a feature engineering method, and the resulting values can be used directly to train any machine learning model.
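
To make the averaging concrete, the following minimal sketch computes the spatial lag by hand with NumPy. The prices and neighbor lists are made-up illustrative values, and the code does not use the oraclesai library.

```python
import numpy as np

# Hypothetical house prices (in thousands) for five houses.
prices = np.array([200.0, 220.0, 260.0, 300.0, 500.0])

# Hypothetical neighbor lists: for each house, the indices of its neighbors.
neighbors = {
    0: [1, 2],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4],
    4: [2, 3],
}

# The spatial lag of a house is the mean price of its neighbors.
# The house's own price does not enter its lag.
spatial_lag = np.array(
    [prices[neighbors[i]].mean() for i in range(len(prices))]
)

print(spatial_lag)  # [240. 230. 260. 380. 280.]
```

Note that the last house, priced at 500, receives a lag of 280 because the lag depends only on the surrounding values.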

The SpatialLagTransformer class calculates the spatial lag of training data and changes the value of an observation to its spatial lag. In other words, it changes an observation's value to the average value of its neighbors.

To create an instance of SpatialLagTransformer, it is necessary to define the spatial_weights_definition parameter, which establishes the relationship between neighboring locations.
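
As an illustration of what a spatial weights definition encodes, the sketch below builds a K-nearest-neighbor weights matrix with plain NumPy and applies it to a feature column. The helper knn_weights, the coordinates, and the values are hypothetical. Row-standardized weights (each of the k neighbors weighted 1/k) are an assumption chosen so that the lag equals the neighbor average; this is not necessarily how the library builds its weights.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized K-nearest-neighbor weights (illustrative helper).

    Assumes k is smaller than the number of points.
    """
    n = coords.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest points for every row.
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    # Each neighbor gets weight 1/k, so every row sums to 1.
    W[rows, nearest.ravel()] = 1.0 / k
    return W

# Four points on a line; the last one is far from the others.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
values = np.array([[10.0], [20.0], [30.0], [100.0]])

W = knn_weights(coords, k=2)
lag = W @ values  # matrix product = average over each point's neighbors

print(lag.ravel())  # [25. 20. 15. 25.]
```

With a different weights definition (for example, a distance band instead of a fixed k), only the matrix W changes; the lag is still the product of the weights and the feature values.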

The main methods of the class are described in the following table.

Method         Description
fit            Computes the spatial lag for all the features in the training set.
transform      Replaces each value with its spatial lag, computed according to the use_fit_lag parameter. If use_fit_lag=True, it calculates the spatial lag from the training set. Otherwise, it computes the spatial lag from the data passed into the transform method. The method returns a NumPy array.
fit_transform  Calls the fit and transform methods in sequence with the training data.
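
The fit and transform behavior in the table above can be pictured with a toy class. This is only a sketch of one plausible reading, not the oraclesai implementation: ToyLagTransformer and knn_lag are hypothetical names, passing use_fit_lag as an argument of transform is an assumption, and "calculates the spatial lag from the training set" is interpreted here as returning the lag stored during fit.

```python
import numpy as np

def knn_lag(coords, values, k):
    """Mean of each column over the k nearest neighbors of every point."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return values[nearest].mean(axis=1)

class ToyLagTransformer:
    """Illustrative stand-in; not the oraclesai class."""

    def __init__(self, k):
        self.k = k

    def fit(self, coords, values):
        # Store the spatial lag of the training set.
        self.fit_lag_ = knn_lag(coords, values, self.k)
        return self

    def transform(self, coords, values, use_fit_lag=False):
        if use_fit_lag:
            # Assumed reading: reuse the lag computed from the training set.
            return self.fit_lag_
        # Otherwise compute the lag from the data passed in.
        return knn_lag(coords, values, self.k)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
train_values = np.array([[10.0], [20.0], [30.0], [100.0]])
new_values = train_values * 2  # same locations, different feature values

toy = ToyLagTransformer(k=2).fit(coords, train_values)
from_fit = toy.transform(coords, new_values, use_fit_lag=True)
from_input = toy.transform(coords, new_values, use_fit_lag=False)

print(from_fit.ravel())    # [25. 20. 15. 25.]
print(from_input.ravel())  # [50. 40. 30. 50.]
```

In this sketch the two modes differ only in which values are averaged: those seen during fit, or those passed to transform.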

See the SpatialLagTransformer class in Python API Reference for Oracle Spatial AI for more information.

The following example uses the block_groups SpatialDataFrame and the SpatialLagTransformer class to replace the values of the MEAN_AGE and HOUSE_VALUE features with their spatial lag. Note that the MEDIAN_INCOME feature is ignored because it is defined as the target variable. The geometry feature is used to calculate the spatial lag, but it is not part of the output from the transformer.

from oraclesai.weights import KNNWeightsDefinition
from oraclesai.preprocessing import SpatialLagTransformer
 
# Define the variables
X = block_groups[["MEDIAN_INCOME", "MEAN_AGE", "HOUSE_VALUE", "geometry"]]
 
# Print original data
print(f">> Original data:\n {X[['MEAN_AGE', 'HOUSE_VALUE']].get_values()[:5]}")
 
# Define spatial weights
weights_definition = KNNWeightsDefinition(k=5)
 
# Create an instance of SpatialLagTransformer
spatial_lag_transformer = SpatialLagTransformer(spatial_weights_definition=weights_definition)
 
# Compute the spatial lag and print the first five rows
X_spatial_lag = spatial_lag_transformer.fit_transform(X, y="MEDIAN_INCOME", geometries="geometry")
print(f"\n>> Transformed data:\n {X_spatial_lag[:5, :]}")

The resulting output is a NumPy array with the spatial lag of the MEAN_AGE and HOUSE_VALUE features.

>> Original data:
 [[4.75847626e+01 4.56300000e+05]
 [3.88231812e+01 8.36300000e+05]
 [4.78076096e+01 1.12630000e+06]
 [4.65636330e+01 9.60400000e+05]
 [5.11550865e+01 1.01090000e+06]]

>> Transformed data:
 [[4.03809292e+01 6.23460000e+05]
 [3.95882790e+01 8.20100000e+05]
 [4.69466225e+01 1.22280000e+06]
 [4.25439751e+01 1.04664000e+06]
 [4.43390564e+01 1.14368000e+06]]