Spatial Error Model
The Spatial Error Model (SEM) introduces a spatial lag in the error term of the linear equation.
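By adding the spatial lag of the residual, the errors of neighboring observations influence the error of each observation. This adds an extra parameter, λ, associated with the spatial lag of the error term, as shown in the following formula:

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$

In the preceding formula, W is the spatial weights matrix, u is the spatially correlated error term, λ is the coefficient of the spatially lagged error, and ε is the remaining independent error.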
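To make the error structure concrete, the following is a minimal NumPy sketch, not the oraclesai implementation. It assumes the standard spatial error form y = Xβ + u with u = λWu + ε, builds a row-standardized k-nearest-neighbor matrix W from made-up coordinates, and simulates the spatially correlated error. All names and values (coords, lam, beta) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Hypothetical point coordinates standing in for observation locations
coords = rng.uniform(0.0, 1.0, size=(n, 2))

# Pairwise Euclidean distances; the diagonal is set to infinity so that
# an observation is never counted as its own neighbor
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

# Row-standardized k-nearest-neighbor weights: each row holds k entries of 1/k
W = np.zeros((n, n))
nearest = np.argsort(dist, axis=1)[:, :k]
W[np.arange(n)[:, None], nearest] = 1.0 / k

# Illustrative (made-up) parameters and data
lam = 0.5
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)

# u = lam * W u + eps is equivalent to (I - lam * W) u = eps
u = np.linalg.solve(np.eye(n) - lam * W, eps)
y = X @ beta + u
```

Because each row of W sums to 1, W @ u is the average error of each observation's neighbors, so λ controls how strongly that neighborhood average feeds back into the observation's own error.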
The SpatialErrorRegressor class implements the spatial
error model. To use it, you must define the
spatial_weights_definition parameter. The following table
describes the main methods of the SpatialErrorRegressor class.
| Method | Description |
|---|---|
| fit | Trains the SpatialErrorRegressor model from the given training data. The model includes a parameter for the spatial lag of the error term. |
| predict | Uses the trained parameters, including the one associated with the spatial lag of the error term, to estimate the target variable of the given data. |
| fit_predict | Calls the fit and predict methods sequentially with the training data. |
| score | Returns the R-squared statistic for the given data. |
See the SpatialErrorRegressor class in Python API Reference for Oracle Spatial AI for more information.
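As a reminder of what an R-squared statistic measures, here is a small sketch of the standard coefficient of determination. It is an assumption that the score method uses this exact definition. The r_squared helper and the sample values are hypothetical and are not part of oraclesai.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]

# Perfect predictions explain all of the variation
perfect = r_squared(y_true, y_true)

# Always predicting the mean (6.0) explains none of it
baseline = r_squared(y_true, [6.0] * 4)
```

Under this definition, a score near 1 means the predictions track the target closely, while a score near 0 means they do no better than predicting the mean.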
The following example uses the block_groups
SpatialDataFrame. It creates an instance of the
SpatialErrorRegressor class, defining the
spatial_weights_definition parameter, which establishes the
relationship between neighboring observations. Then, it adds the model to a spatial
pipeline along with a preprocessing step to standardize the data. The model is trained
using a training set (X_train) and the MEDIAN_INCOME
column as the target variable. Finally, it calls the predict and
score methods with the test set (X_test) to
estimate the values of the target variable and the model's R-squared score,
respectively.
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.weights import KNNWeightsDefinition
from oraclesai.regression import SpatialErrorRegressor
from oraclesai.pipeline import SpatialPipeline
from sklearn.preprocessing import StandardScaler
# Define features
X = block_groups[["MEDIAN_INCOME", "MEAN_AGE", "HOUSE_VALUE", "INTERNET", "geometry"]]
# Define training and test sets
X_train, X_test, _, _, _, _ = spatial_train_test_split(X, y="MEDIAN_INCOME", test_size=0.2, random_state=32)
# Create an instance of SpatialErrorRegressor
spatial_error_model = SpatialErrorRegressor(spatial_weights_definition=KNNWeightsDefinition(k=5))
# Add the model in a Spatial Pipeline along with a preprocessing step
spatial_error_pipeline = SpatialPipeline([("scaler", StandardScaler()), ("spatial_error", spatial_error_model)])
# Train the model with MEDIAN_INCOME as the target variable
spatial_error_pipeline.fit(X_train, "MEDIAN_INCOME")
# Print the predictions with the test set
spatial_error_predictions_test = spatial_error_pipeline.predict(X_test.drop(["MEDIAN_INCOME"])).flatten()
print(f"\n>> predictions (X_test):\n {spatial_error_predictions_test[:10]}")
# Print the R-squared metric with the test set
spatial_error_r2_score = spatial_error_pipeline.score(X_test, y="MEDIAN_INCOME")
print(f"\n>> r2_score (X_test):\n {spatial_error_r2_score}")

The program produces the following output:
>> predictions (X_test):
[ 92285.13545208 100551.0381313 30910.61123168 45166.3218764
177515.68764358 44088.89962954 98205.35728383 27788.19879028
72553.17695035 24875.81828048]
>> r2_score (X_test):
0.635646418630968

Note that printing the summary property of the trained model displays an extra
lambda parameter. This parameter is associated with the spatial lag
of the error term.
REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL ERROR (METHOD = FULL)
-------------------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : dep_var Number of Observations: 2750
Mean dependent var : 69703.4815 Number of Variables : 4
S.D. dependent var : 39838.5789 Degrees of Freedom : 2746
Pseudo R-squared : 0.6285
Sigma-square ML :472895616.755 Log likelihood : -31440.423
S.E of regression : 21746.163 Akaike info criterion : 62888.846
Schwarz criterion : 62912.523
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 70397.9327157 855.6991730 82.2694878 0.0000000
MEAN_AGE 4337.6721310 537.9090592 8.0639507 0.0000000
HOUSE_VALUE 20927.8165549 706.2614165 29.6318276 0.0000000
INTERNET 10643.3244395 580.3422845 18.3397363 0.0000000
lambda 0.5152500 0.0215703 23.8869736 0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
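Several figures in the report can be cross-checked from the others. The sketch below copies the numbers from the summary above and applies the usual definitions: degrees of freedom as n - k, AIC as -2 log L + 2k, the Schwarz criterion as -2 log L + k ln(n), and the z-statistic as coefficient divided by standard error. That the report uses exactly these definitions is an assumption, although the recomputed values agree with the reported ones to the displayed precision.

```python
import math

# Values copied from the summary report
n_obs = 2750                  # Number of Observations
n_vars = 4                    # Number of Variables (constant + 3 features)
log_likelihood = -31440.423   # Log likelihood

# Degrees of freedom: observations minus estimated regression coefficients
dof = n_obs - n_vars

# Information criteria from the log likelihood
aic = -2.0 * log_likelihood + 2.0 * n_vars
schwarz = -2.0 * log_likelihood + n_vars * math.log(n_obs)

# z-statistic for lambda: coefficient divided by its standard error
lam_coef, lam_se = 0.5152500, 0.0215703
z_lambda = lam_coef / lam_se

print(dof, round(aic, 3), round(schwarz, 3), round(z_lambda, 2))
```

The large z-statistic for lambda, together with its reported probability of 0.0000000, indicates that the spatial dependence in the error term is statistically significant for this data set.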