Spatial Cross-Regressive Model
The Spatial Cross-Regressive (SLX) model runs an ordinary linear regression preceded by a feature engineering step that adds features giving the data a spatial context.
This follows Tobler's law, which states that closer things are more related than distant things. The algorithm adds one or more columns containing the spatial lag of selected features, that is, the average of each feature over the neighboring observations.
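To make the notion of a spatial lag concrete, the following is a minimal sketch that averages a feature over each observation's k nearest neighbors. It assumes equal weights among the k neighbors and Euclidean distance. The helper name knn_spatial_lag and the toy data are hypothetical, and the computation inside SLXRegressor may differ in its details.

```python
import numpy as np

def knn_spatial_lag(coords, values, k):
    """Hypothetical helper: mean of `values` over each point's k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise Euclidean distances between all observations
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # An observation is not counted as its own neighbor
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest observations for each row
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return values[neighbors].mean(axis=1)

# Four toy locations on a line, with one feature value each
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
income = [10.0, 20.0, 30.0, 100.0]

lag = knn_spatial_lag(coords, income, k=2)
print(lag)  # [25. 20. 15. 25.]
```

The isolated point at (10, 0) receives a lag of 25, the average of its two closest neighbors, even though its own value is 100. The lag column therefore describes the surroundings of an observation, not the observation itself.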
The SLXRegressor class requires the definition of the spatial weights with the spatial_weights_definition parameter to establish how the neighboring observations interact. The following table describes the main methods of the SLXRegressor class.
Method | Description |
---|---|
fit | The parameters of the fit method are the same as for most of the regression algorithms, except for the column_ids parameter, which specifies the columns that are used to compute the spatial lag. The algorithm estimates the parameters of the explanatory variables plus the parameters associated with the columns added by the spatial lag. |
predict | The predict method calculates the spatial lag of the dataset using the same columns defined in the fit process and returns the value of the OLS equation evaluated in the extended dataset. By setting the |
fit_predict | Calls the fit and predict methods sequentially with the training data. |
score | Returns the R-squared statistic for the given data. By setting the |
See the SLXRegressor class in Python API Reference for Oracle Spatial AI for more information.
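The fit, predict, and score steps can be pictured as ordinary least squares on a design matrix extended with lagged columns. The sketch below is an illustration with NumPy only, not the SLXRegressor implementation. The helpers knn_weights and slx_design, the synthetic data, and the choice of row-standardized k-nearest-neighbor weights are all assumptions made for the example.

```python
import numpy as np

def knn_weights(coords, k):
    """Hypothetical helper: row-standardized k-nearest-neighbor weights (n x n)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-neighbors
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n = len(coords)
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, neighbors.ravel()] = 1.0 / k  # each row sums to 1
    return W

def slx_design(X, W, lag_columns):
    """Hypothetical helper: constant + original features + lag of selected columns."""
    lagged = W @ X[:, lag_columns]
    return np.column_stack([np.ones(len(X)), X, lagged])

# Synthetic data: 200 locations, three explanatory variables
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
X = rng.normal(size=(n, 3))
W = knn_weights(coords, k=5)

# The target depends on X and on the neighbors' average of column 0
beta = np.array([1.0, 2.0, -1.0, 0.5])  # constant + three features
theta = 3.0                             # coefficient of the lagged column
y = beta[0] + X @ beta[1:] + theta * (W @ X[:, 0]) + rng.normal(scale=0.1, size=n)

# "fit": ordinary least squares on the extended design matrix
Z = slx_design(X, W, lag_columns=[0])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

# "predict" and "score": evaluate the OLS equation and compute R-squared
pred = Z @ coef
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(coef.round(2), round(r2, 3))
```

The last entry of coef is the estimate for the lagged column and plays the same role as the SLX- coefficients in a model summary. Here only column 0 is lagged, which mirrors the way column_ids restricts the spatial lag to a subset of the features.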
The following example uses the block_groups SpatialDataFrame and the SLXRegressor class to train an SLX regression model with training data (X_train) using the MEDIAN_INCOME column as the target variable. The MEAN_AGE, MEAN_EDUCATION_LEVEL, and HOUSE_VALUE columns are used to calculate the spatial lag. Using the test set (X_test), the code calls the predict and score methods to estimate the values of the target variable and the R-squared metric, respectively.
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.weights import KNNWeightsDefinition
from oraclesai.regression import SLXRegressor
from oraclesai.pipeline import SpatialPipeline
from sklearn.preprocessing import StandardScaler
# Select the target (MEDIAN_INCOME), the explanatory variables, and the geometry
X = block_groups[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'HOUSE_VALUE', 'INTERNET', 'geometry']]
# Define the training and test sets
X_train, X_test, _, _, _, _ = spatial_train_test_split(X, y="MEDIAN_INCOME", test_size=0.2, random_state=32)
# Define the spatial weights
weights_definition = KNNWeightsDefinition(k=10)
# Create an SLX regressor model
slx_model = SLXRegressor(spatial_weights_definition=weights_definition)
# Add the model to a pipeline along with a preprocessing step
slx_pipeline = SpatialPipeline([('scale', StandardScaler()), ('slx_regression', slx_model)])
# Train the model
slx_pipeline.fit(X_train, "MEDIAN_INCOME", slx_regression__column_ids=["MEAN_AGE", "MEAN_EDUCATION_LEVEL", "HOUSE_VALUE"])
# Print the predictions with the test set
slx_predictions_test = slx_pipeline.predict(X_test.drop(["MEDIAN_INCOME"])).flatten()
print(f"\n>> predictions (X_test):\n {slx_predictions_test[:10]}")
# Print the score with the test set
slx_r2_score = slx_pipeline.score(X_test, y="MEDIAN_INCOME")
print(f"\n>> r2_score (X_test):\n {slx_r2_score}")
The program produces the following output:
>> predictions (X_test):
[102070.14467552 103393.34495125 18080.13247972 28780.88885959
166553.11466239 47847.19216301 97311.05264284 28621.06664768
86030.99787827 18315.17778001]
>> r2_score (X_test):
0.6520502048458249
Note that printing the summary property of the trained model displays additional parameters, shown in the rows prefixed with SLX-, which are associated with the spatial lag of the columns specified in the training process.
REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : dep_var Number of Observations: 2750
Mean dependent var : 69703.4815 Number of Variables : 8
S.D. dependent var : 39838.5789 Degrees of Freedom : 2742
R-squared : 0.6404
Adjusted R-squared : 0.6395
Sum squared residual:1569034862694.453 F-statistic : 697.5148
Sigma-square :572222779.976 Prob(F-statistic) : 0
S.E. of regression : 23921.178 Log likelihood : -31625.004
Sigma-square ML :570558131.889 Akaike info criterion : 63266.007
S.E of regression ML: 23886.3587 Schwarz criterion : 63313.362
------------------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 69454.0719691 458.7116277 151.4111868 0.0000000
MEAN_AGE 3407.3842392 632.8239483 5.3844110 0.0000001
MEAN_EDUCATION_LEVEL 11619.0976034 1254.9099676 9.2589093 0.0000000
HOUSE_VALUE 20550.0723247 970.0583796 21.1843666 0.0000000
INTERNET 10089.1251192 670.1690078 15.0545982 0.0000000
SLX-MEAN_AGE 106.5803082 136.9729582 0.7781120 0.4365701
SLX-MEAN_EDUCATION_LEVEL -995.5040769 172.6431756 -5.7662521 0.0000000
SLX-HOUSE_VALUE 3.1809763 136.4013684 0.0233207 0.9813962
------------------------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 9.435
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 1258.500 0.0000
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 7 1083.843 0.0000
Koenker-Bassett test 7 436.439 0.0000
DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST MI/DF VALUE PROB
Moran's I (error) 0.2586 31.945 0.0000
Lagrange Multiplier (lag) 1 1044.952 0.0000
Robust LM (lag) 1 55.266 0.0000
Lagrange Multiplier (error) 1 997.181 0.0000
Robust LM (error) 1 7.495 0.0062
Lagrange Multiplier (SARMA) 2 1052.447 0.0000
================================ END OF REPORT =====================================
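As a reading aid for the report, the short check below recomputes a few of the summary figures from other values in the same report using the standard OLS definitions. The numbers are copied from the output above and the variable names are illustrative only.

```python
import math

# Values copied from the report above
n_obs, n_vars = 2750, 8
r_squared = 0.6404
sum_squared_residual = 1569034862694.453

# Degrees of freedom = observations minus estimated parameters
dof = n_obs - n_vars

# Adjusted R-squared penalizes R-squared for the number of parameters
adj_r2 = 1 - (1 - r_squared) * (n_obs - 1) / dof

# Sigma-square and the standard error of the regression
sigma2 = sum_squared_residual / dof
se_regression = math.sqrt(sigma2)

# t-statistic = coefficient / standard error
t_mean_age = 3407.3842392 / 632.8239483
t_slx_house_value = 3.1809763 / 136.4013684

print(dof, round(adj_r2, 4), round(se_regression, 3))
print(round(t_mean_age, 4), round(t_slx_house_value, 4))
```

The eight variables are the constant, the four explanatory variables, and the three SLX- columns. In this run only SLX-MEAN_EDUCATION_LEVEL has a probability below 0.05. The probabilities for SLX-MEAN_AGE (0.4366) and SLX-HOUSE_VALUE (0.9814) indicate that those two lagged columns contribute little to this particular model.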