oraclesai.regression
- class GWRRegressor(spatial_weights_definition=None, bandwidth=None, fixed=True)
The GWR model trains a local regression model for every observation in the dataset by incorporating the target and explanatory variables from the observations within their neighborhood, allowing the relationships between the independent and dependent variables to vary by locality.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
bandwidth – scalar, default=None. Bandwidth value consisting of either a distance or K nearest neighbors. If not None, ignores the spatial_weights_definition parameter and defines spatial weights according to fixed; if fixed is True, it uses DistanceBandWeightsDefinition, otherwise, it uses KNNWeightsDefinition.
fixed – boolean, default=True. True for a distance-based kernel function and False for an adaptive (nearest neighbor) kernel function.
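As a rough illustration of the difference between a fixed (distance) bandwidth and an adaptive (K nearest neighbors) bandwidth, the sketch below selects the neighbors of one observation both ways. It is plain Python and does not call oraclesai; the helper names fixed_neighbors and knn_neighbors are hypothetical, and the library's own weights definitions may behave differently in detail.

```python
import math

def fixed_neighbors(points, i, bandwidth):
    """Indices of the points within `bandwidth` distance of points[i] (excluding i)."""
    return [j for j, p in enumerate(points)
            if j != i and math.dist(points[i], p) <= bandwidth]

def knn_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]

# Fixed bandwidth: the neighborhood size depends on how dense the data is.
print(fixed_neighbors(points, 0, bandwidth=2.5))

# Adaptive bandwidth: every observation gets exactly k neighbors.
print(knn_neighbors(points, 0, k=3))
```

With a fixed bandwidth an isolated observation may end up with few or no neighbors, whereas the adaptive form always returns k of them regardless of distance.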
- property betas
- Returns:
A 2D-array with the estimated parameters (n x k) for the trained GWR model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None)
Executes local linear regressions for every sample on the dataset, incorporating the dependent and independent variables of locations falling within a specified bandwidth. If spatial_weights_definition is defined, it ignores the parameter bandwidth and obtains the bandwidth from the spatial weights. If spatial_weights_definition is not defined, it estimates the bandwidth using the coordinates.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)
- Returns:
self. Fitted estimator.
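The idea of a local regression can be sketched in a few lines: at each location, observations are weighted by their distance to that location and a weighted least-squares line is fitted. The example below is plain Python with one explanatory variable and 1-D site coordinates; the Gaussian kernel and the helper name local_fit are assumptions made for illustration, not the kernel or API that GWRRegressor necessarily uses.

```python
import math

def local_fit(s0, sites, x, y, bandwidth):
    """Weighted least-squares line at location s0, using a Gaussian distance kernel."""
    w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in sites]
    total = sum(w)
    # Weighted means of the explanatory and target variables.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variance give the local slope.
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# A western cluster where y = 2x and an eastern cluster where y = 5x.
sites = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [0.0, 2.0, 4.0, 0.0, 5.0, 10.0]

west_intercept, west_slope = local_fit(1.0, sites, x, y, bandwidth=2.0)
east_intercept, east_slope = local_fit(11.0, sites, x, y, bandwidth=2.0)
print(round(west_slope, 2), round(east_slope, 2))
```

Because distant observations receive weights close to zero, each local fit recovers the relationship that holds in its own neighborhood, which is what lets the coefficients vary by locality.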
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None)
Evaluates the GWR model using the given data. If no model is defined returns None. It builds a local model for each row of the prediction set. Returns a 1D numpy array with the predictions of each local model.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. X is a SpatialDataFrame or GeoDataFrame)
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
sample_weight – Weighted contribution to the score for each sample.
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
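The properties of R-Squared described above (a best score of 1.0, a score of 0 for a constant model that predicts the mean of y, and negative scores for worse models) can be checked with a small, unweighted computation. This is a plain-Python sketch of the standard formula, not the library's implementation; it ignores sample_weight, and the function name r_squared is chosen here for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y, y)                    # exact predictions
constant = r_squared(y, [2.5] * 4)           # always predicts the mean of y
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])   # predictions in reverse order

print(perfect, constant, worse)  # 1.0 0.0 -3.0
```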
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class GeographicalRegressor(global_model=None, model_cls=None, spatial_weights_definition=None, bandwidth=None, fixed=True, local_weight=0.25, **kwargs)
Geographical regression algorithm. It uses a global model and several local models to perform regression.
- Parameters:
global_model – A scikit-learn estimator instance, default=None. A trained model used as global model. Local models will be of the same type as this model. Required when model_cls is None.
model_cls – Class of scikit-learn estimator, default=None. Type of the global model and local models. When model_cls is provided (instead of global_model), a global model will be trained. Required when global_model=None. model_cls creation parameters are specified as kwargs.
spatial_weights_definition – SpatialWeightsDefinition, default=None. Spatial relationship specification. This criterion is used to group data into neighborhoods and train local models.
bandwidth – int or float, default=None. Distance (fixed=True) or number of nearest neighbors (fixed=False). bandwidth + fixed is another way to set the spatial relationship specification. It is ignored if spatial_weights_definition was set.
fixed – bool, default=True. True if bandwidth represents a distance. False for number of nearest neighbors.
local_weight – float (0.0 to 1.0), default=0.25. Weight associated to the local models predictions.
kwargs – Additional parameters for the inner models created with parameter model_cls.
- fit(X, y, geometries=None, crs=None, spatial_weights=None, spatial_weights_definition=None, fit_global_model=True, n_jobs=1, backend=None, batch_size=None)
Train a geographical regression model. Internally, a global model (if fit_global_model=True) and several local models are trained. A local model is created for each neighborhood. A neighborhood is a spatial region containing multiple samples from X that are spatially related. Neighborhoods are built using the spatial relationship specified at model’s creation time (spatial_weights_definition, bandwidth) or using the spatial weights matrix object passed as parameter for training.
- Parameters:
X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Predicting data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y, the parameter y must specify the name of that column.
y – A 1d array or string. Target values. If X contains a column with the target values this parameter will specify the name of that column instead.
geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column, or it can be None (in case X has a column called ‘geometry’).
crs – pyproj.crs.CRS or string, default=None. Spatial reference system of geometries. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)
spatial_weights – SpatialWeightsDefinition or pysal weights object, default=None. A precomputed spatial weights matrix for the training data. If not None, any spatial relationship specification provided at the model’s creation time will be ignored.
fit_global_model – bool, default=True. If False, the global model will not be trained.
n_jobs – int, default=1. Number of processor cores used to parallelize local models training. Set -1 to use all the available cores.
backend – string, default=None. The Joblib backend to use when n_jobs != 1. If None, Joblib’s default backend will be used (typically loky).
batch_size – ‘auto’ or int, default=’auto’. Number of batch tasks per parallel job.
- Returns:
self. Fitted estimator.
- predict(X, y=None, geometries=None, crs=None)
Predict the target value for X using the global model and the local models that are closer to geometries. The returned prediction is calculated as follows: local_model_prediction * local_weight + global_model.prediction * (1.0 - local_weight)
- Parameters:
X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Predicting data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y (e.g., a proxy or DataFrame used for training or testing), the parameter y must specify the name of that column so it can be excluded.
y – A string or None, default=None. If X contains a column with the target values, this parameter will specify the name of that column so it can be excluded for the prediction, otherwise, this parameter is not used.
geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X samples. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column or it can be omitted (in case X has a column called ‘geometry’).
- Returns:
An array of shape n_samples, containing the predictions of the target variable for the given data.
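The blending formula given in the description of predict can be written out directly. The sketch below is plain Python with made-up prediction values; the function name blend is hypothetical and only mirrors the stated formula, local_model_prediction * local_weight + global_model.prediction * (1.0 - local_weight).

```python
def blend(local_pred, global_pred, local_weight=0.25):
    """Combine local and global predictions using the documented weighting."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0.0 and 1.0")
    return [l * local_weight + g * (1.0 - local_weight)
            for l, g in zip(local_pred, global_pred)]

local_pred = [10.0, 20.0]    # illustrative local-model predictions
global_pred = [14.0, 16.0]   # illustrative global-model predictions

print(blend(local_pred, global_pred))                    # [13.0, 17.0]
print(blend(local_pred, global_pred, local_weight=0.0))  # [14.0, 16.0]
print(blend(local_pred, global_pred, local_weight=1.0))  # [10.0, 20.0]
```

A local_weight of 0.0 reproduces the global model alone and 1.0 reproduces the local models alone; the default of 0.25 leans toward the global model.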
- score(X, y, geometries=None, sample_weight=None, crs=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Predicting data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y (e.g., a proxy or DataFrame used for training or testing), the parameter y must specify the name of that column so it can be excluded.
y – A string or None, default=None. If X contains a column with the target values, this parameter will specify the name of that column so it can be excluded for the prediction, otherwise, this parameter is not used.
geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X samples. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column or it can be omitted (in case X has a column called ‘geometry’).
crs – pyproj.crs.CRS, default=None. Coordinate reference system.
- Returns:
The R-squared metric for the given data.
- class OLSRegressor(spatial_weights_definition=None)
The Ordinary Least Squares (OLS) algorithm fits a line that minimizes the Mean Squared Error (MSE) over the training set and uses it to predict new values. By defining the parameter spatial_weights_definition, it is possible to get spatial statistics after training the model; these statistics help identify the presence of spatial dependence or spatial heterogeneity.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors. If defined it will include Spatial diagnostics, such as the Lagrange Multiplier tests and Moran’s I.
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
If the OLS model was trained specifying spatial weights then we can retrieve spatial diagnostics.
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None, spatial_weights_definition=None)
Trains the model using the training dataset.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None)
Estimates the target variable for the given data. If no model is defined, it returns None.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
sample_weight – Weighted contribution to the score for each sample.
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SLXRegressor(spatial_weights_definition=None)
The SLX regression model runs a regular Linear Regression preceded by a feature engineering step that adds features providing spatial context to the data since, according to Tobler’s law, closer things are more related than distant things. The algorithm adds one or more columns with the spatial lag of certain features, representing the average from neighboring observations.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
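To make the feature engineering step concrete, the sketch below computes the spatial lag of one column as the average over each observation's neighbors and appends it as an extra feature. It is plain Python; the neighbor lists, the column name income and the helper spatial_lag are invented for illustration, and the library derives neighbors from the spatial weights rather than from a hand-written dictionary.

```python
def spatial_lag(values, neighbors):
    """Average of each observation's neighbors; neighbors maps index -> neighbor indices."""
    return [sum(values[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(values))]

income = [10.0, 20.0, 30.0, 40.0]

# Four regions in a row: each one is a neighbor of the regions next to it.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

lag_income = spatial_lag(income, neighbors)
print(lag_income)  # [20.0, 20.0, 30.0, 30.0]

# The regression is then trained on the original column plus its spatial lag.
X_slx = [[value, lag] for value, lag in zip(income, lag_income)]
print(X_slx[1])  # [20.0, 20.0]
```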
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None, column_ids=None)
Trains an OLS model using a combination of the independent variables in X and the spatial lag of the columns specified in column_ids
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
column_ids – List of strings or list of integers, default=None. A list of column names or column indexes, indicating the columns that will be used to compute the spatial lag.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None, spatial_weights=None, use_fit_lag=False)
Estimates the target variable for the given data. If use_fit_lag=False it calculates the spatial lag from the prediction data, otherwise, it will compute the spatial lag from the training data.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
geometries – shapely array, default=None. Geometry data for each sample in X.
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
use_fit_lag – boolean, default=False. If false, it will use the spatial lag from the prediction data, otherwise, it will use the training data to calculate the spatial lag.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None, use_fit_lag=False)
Returns the R-Squared metric. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). If use_fit_lag=False it calculates the spatial lag from the given data, otherwise, it will compute the spatial lag from the training data.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame
sample_weight – Weighted contribution to the score for each sample
geometries – shapely array, default=None. Geometry data for each sample in X.
use_fit_lag – boolean, default=False. If false, it will use the spatial lag from the prediction data, otherwise, it will use the training data to calculate the spatial lag.
- Returns:
The R-Squared metric for the given data.
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SpatialAdaptiveRegressor(spatial_weights_definition=None)
Consists of an automated approach that finds the regression algorithm that best fits the data. From spatial diagnostics, the algorithm gets the Moran’s I. A positive value of Moran’s I indicates the presence of spatial dependence, or spatial clustering, and an algorithm that includes this spatial dependence is preferred. If the Moran’s I is negative, it indicates the presence of regional variance or spatial heteroskedasticity, and a local method is more suitable. If the parameter spatial_weights_definition is not specified, it suggests the OLS model.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
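For readers unfamiliar with Moran’s I, the sketch below computes the global statistic with binary neighbor weights for two toy patterns, one clustered and one alternating. It is plain Python applied to a raw list of values; the function name morans_i and the neighbor dictionary are illustrative assumptions, and the way the library computes the statistic in its diagnostics is not shown here.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights; neighbors maps index -> neighbor indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    return n * cross / (s0 * sum(d * d for d in dev))

# Four regions in a row: each one is a neighbor of the regions next to it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1.0, 1.0, 5.0, 5.0], chain)    # similar values sit together
alternating = morans_i([1.0, 5.0, 1.0, 5.0], chain)  # neighbors always differ

print(clustered > 0, alternating < 0)  # True True
```

Similar values located next to each other give a positive statistic, while values that differ systematically from their neighbors give a negative one.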
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None)
Selects the regression model that better fits the data and trains it with the given training data.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None)
Estimates the target variable for the given data. If no model is defined, it returns None.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
sample_weight – Weighted contribution to the score for each sample.
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SpatialErrorRegressor(spatial_weights_definition=None)
The Spatial Error model introduces a spatial lag in the error term of the linear equation. By adding the spatial lag in the residual, the neighbors’ errors influence the observation error; this results in an extra parameter associated with the spatial lag of the error term
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
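In the usual textbook notation, which is assumed here because the source does not spell out the equation, the Spatial Error model can be written with W as the spatial weights matrix, u as the spatially correlated error, and lambda as the extra parameter attached to the spatial lag of the error term:

```latex
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

When lambda is zero the error term reduces to the ordinary noise term and the model is a plain linear regression.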
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None)
Trains the model using the training dataset.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None)
Estimates the target variable for the given data. If no model is defined, it returns None.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
sample_weight – Weighted contribution to the score for each sample.
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SpatialFixedEffectsRegressor(spatial_weights_definition=None)
The Spatial Fixed Effects algorithm is a simplified version of Spatial Regimes, computing an intercept or constant parameter for each regime, while the other model parameters remain constant.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
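A model with one intercept per regime and a shared slope can be sketched with a single explanatory variable: the slope is estimated from deviations around each regime's means, and each intercept then follows from its regime's means. This is a plain-Python illustration under those simplifying assumptions; the helper fit_fixed_effects and the regime labels are hypothetical and say nothing about how the library estimates the model internally.

```python
def fit_fixed_effects(x, y, regimes):
    """Shared slope with one intercept per regime (single explanatory variable)."""
    groups = {}
    for xi, yi, r in zip(x, y, regimes):
        groups.setdefault(r, []).append((xi, yi))
    # Per-regime means of x and y.
    means = {r: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
             for r, pts in groups.items()}
    # Slope from deviations around the regime means, pooled over all regimes.
    num = sum((xi - means[r][0]) * (yi - means[r][1]) for xi, yi, r in zip(x, y, regimes))
    den = sum((xi - means[r][0]) ** 2 for xi, r in zip(x, regimes))
    slope = num / den
    intercepts = {r: my - slope * mx for r, (mx, my) in means.items()}
    return slope, intercepts

x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0, 10.0, 12.0, 14.0]  # y = 1 + 2x in "A", y = 10 + 2x in "B"
regimes = ["A", "A", "A", "B", "B", "B"]

slope, intercepts = fit_fixed_effects(x, y, regimes)
print(slope, intercepts)  # 2.0 {'A': 1.0, 'B': 10.0}
```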
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None, regimes=None)
Trains an OLS model where the intercept parameter changes depending on the regime, the rest of the parameters remain constant.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, regimes=None, geometries=None, spatial_weights=None)
Evaluates the model using the given data and the regimes. If no model is defined returns None. It gets the corresponding intercept for the model according to the regimes and evaluates the model using the prediction data
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
geometries – shapely array, default=None. Geometry data for each sample in X.
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, regimes=None, geometries=None, spatial_weights=None)
Returns R-Squared metric. It gets the corresponding intercept for the model according to the regimes and estimates the target variable with the given data.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables.
sample_weight – Weighted contribution to the score for each sample.
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
geometries – shapely array, default=None. Geometry data for each sample in X.
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
The R-squared metric for the given data.
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SpatialLagRegressor(spatial_weights_definition=None)
The Spatial Lag regression model considers spatial dependence over the target variable, meaning that the value of a region’s target variable is related to its neighbors’ target variable. The Spatial Lag model includes the spatial lag of the dependent variable into the linear equation; this results in an extra parameter associated with the spatial lag of the dependent variable
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
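In the usual textbook notation, which is assumed here because the source does not spell out the equation, the Spatial Lag model can be written with W as the spatial weights matrix and rho as the extra parameter attached to the spatial lag of the dependent variable:

```latex
y = \rho W y + X\beta + \varepsilon
```

The term W y is the spatial lag of the target variable, so a non-zero rho means that each observation's target value depends on the target values of its neighbors.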
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None)
Trains the model using the training dataset.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
self. Fitted estimator.
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant and rho)
- property model_type
- Returns:
The regression model defined
- predict(X, geometries=None)
Estimates the target variable for the given data. If no model is defined, it returns None.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
The prediction of the target variable for the given data.
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, geometries=None)
Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
sample_weight – Weighted contribution to the score for each sample.
geometries – shapely array, default=None. Geometry data for each sample in X.
- Returns:
The R-squared score for the given data.
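The properties of the score described above (1.0 at best, 0 for a constant mean predictor, negative for worse models) can be checked with a small NumPy sketch. The helper `r_squared` is hypothetical, and the way `sample_weight` enters the sums follows the common weighted R-squared convention; whether oraclesai applies weights exactly this way is an assumption.

```python
import numpy as np

def r_squared(y_true, y_pred, sample_weight=None):
    """Weighted coefficient of determination (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        w = np.ones_like(y_true)
    else:
        w = np.asarray(sample_weight, dtype=float)
    y_mean = np.average(y_true, weights=w)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)  # unexplained variation
    ss_tot = np.sum(w * (y_true - y_mean) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                     # 1.0: perfect prediction
print(r_squared(y, np.full(4, y.mean())))  # 0.0: constant mean predictor
print(r_squared(y, y[::-1]))               # -3.0: worse than the mean
```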
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model
- class SpatialRegimesRegressor(spatial_weights_definition=None)
The regression equation parameters are estimated according to a categorical variable called regime; this categorical variable can represent different things, such as a region in a spatial context. Neighborhoods, such as district or block names, can be used to define regimes. The model consists of linear regression models where the terms of the linear equation vary depending on the regime.
- Parameters:
spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.
- property betas
- Returns:
An array with the estimated parameters of the trained model
- property diagnostics
- Returns:
A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None
- fit(X, y, geometries=None, crs=None, spatial_weights=None, regimes=None)
The parameter regimes indicates the categorical variable used as regime. An OLS model is trained for each regime, obtaining a different set of parameters for each.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.
geometries – shapely array, default=None. Geometry data for each sample in X.
crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
- Returns:
self. Fitted estimator.
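The per-regime training step can be pictured as splitting the rows by regime label and running an ordinary least-squares fit on each group. The sketch below uses plain NumPy and ignores spatial weights entirely; `fit_regime_ols` and the "north"/"south" labels are hypothetical names for illustration, not part of oraclesai.

```python
import numpy as np

def fit_regime_ols(X, y, regimes):
    """Return {regime: coefficients}, intercept first (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    regimes = np.asarray(regimes)
    if regimes.shape[0] != X.shape[0]:
        raise ValueError("regimes must have one entry per row of X")
    betas = {}
    for r in np.unique(regimes):
        mask = regimes == r
        # Design matrix for this regime only, with a constant column.
        Xc = np.column_stack([np.ones(mask.sum()), X[mask]])
        betas[r.item()] = np.linalg.lstsq(Xc, y[mask], rcond=None)[0]
    return betas

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 1))
regimes = ["north"] * 20 + ["south"] * 20
# Each regime follows its own linear equation.
y = np.where(np.array(regimes) == "north",
             1.0 + 2.0 * X[:, 0],
             -1.0 + 0.5 * X[:, 0])
betas = fit_regime_ols(X, y, regimes)
print(sorted(betas))  # ['north', 'south']
```

Because the synthetic data are noise-free, each regime's fit recovers its own intercept and slope.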
- property k
- Returns:
The number of variables for which coefficients are estimated (including the constant, excluding lambda)
- property model_type
- Returns:
The regression model defined
- predict(X, regimes=None, geometries=None, spatial_weights=None)
Evaluates the model using the given data and the regimes. If no model is defined, it returns None. For each observation, it selects the set of parameters that corresponds to the observation's regime and uses them to compute the prediction.
- Parameters:
X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
geometries – shapely array, default=None. Geometry data for each sample in X.
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
The prediction of the target variable for the given data.
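As a rough illustration of how a regime label selects the parameters used for each row, the sketch below evaluates a linear equation per observation from a dictionary of per-regime coefficients. The function `predict_by_regime` and the coefficient values are hypothetical; the real estimator stores its parameters internally and may also use spatial weights.

```python
import numpy as np

def predict_by_regime(X, regimes, betas):
    """Evaluate each row with its regime's coefficients (illustrative only)."""
    X = np.asarray(X, dtype=float)
    if len(regimes) != X.shape[0]:
        raise ValueError("regimes must have one entry per row of X")
    # Constant column first, matching the intercept-first coefficient layout.
    Xc = np.column_stack([np.ones(X.shape[0]), X])
    # One coefficient vector per row, picked by that row's regime.
    B = np.array([betas[r] for r in regimes], dtype=float)
    return np.sum(Xc * B, axis=1)

# Hypothetical fitted parameters: [intercept, slope] per regime.
betas = {"north": [1.0, 2.0], "south": [-1.0, 0.5]}
X_new = np.array([[1.0], [1.0], [2.0]])
pred = predict_by_regime(X_new, ["north", "south", "south"], betas)
print(pred.tolist())  # [3.0, -0.5, 0.0]
```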
- property predy
- Returns:
An array with the predictions for the training data
- score(X, y, sample_weight=None, regimes=None, geometries=None, spatial_weights=None)
Returns the R-squared metric. It selects the set of model parameters that corresponds to each observation's regime and uses them to estimate the target variable for the given data.
- Parameters:
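X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables.
y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.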
sample_weight – Weighted contribution to the score for each sample.
regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.
geometries – shapely array, default=None. Geometry data for each sample in X.
spatial_weights – SpatialWeights, default=None. A spatial weights matrix.
- Returns:
The R-squared metric for the given data.
- property summary
- Returns:
The summary of the trained model
- property u
- Returns:
An array with the residuals of the trained model