oraclesai.regression

class GWRRegressor(spatial_weights_definition=None, bandwidth=None, fixed=True)

The GWR model trains a local regression model for every observation in the dataset by incorporating the target and explanatory variables from the observations within their neighborhood, allowing the relationships between the independent and dependent variables to vary by locality.
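The idea of fitting one weighted regression per location can be sketched in plain Python. This is only an illustration under stated assumptions, not the GWRRegressor implementation: the Gaussian kernel, the single explanatory variable, and the helper names `gaussian_kernel` and `local_fit` are all hypothetical choices made for the example.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly with distance from the focal location."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(coords, x, y, focal, bandwidth):
    """Weighted least squares of y on x, centred on the location `focal`.

    Returns (intercept, slope) for that location.
    """
    w = [gaussian_kernel(math.dist(c, focal), bandwidth) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Toy data: the true slope is 1 in the west (first four points), 3 in the east.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]

# One (intercept, slope) pair per observation, analogous to an n x k table.
local_betas = [local_fit(coords, x, y, c, bandwidth=2.0) for c in coords]
print([round(slope, 2) for _, slope in local_betas])
```

The local slopes stay close to 1 for the western points and close to 3 for the eastern ones, which a single global regression could not express.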

Parameters:
  • spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

  • bandwidth – scalar, default=None. Bandwidth value consisting of either a distance or K nearest neighbors. If not None, ignores the spatial_weights_definition parameter and defines spatial weights according to fixed; if fixed is True, it uses DistanceBandWeightsDefinition, otherwise, it uses KNNWeightsDefinition.

  • fixed – boolean, default=True. True for a distance-based kernel function and False for an adaptive (nearest-neighbor) kernel function.
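The difference between a fixed (distance) bandwidth and an adaptive (nearest-neighbor) bandwidth can be illustrated as follows. This is a generic sketch, not the behavior of DistanceBandWeightsDefinition or KNNWeightsDefinition; the helper names `fixed_neighbors` and `adaptive_neighbors` are hypothetical.

```python
import math

def fixed_neighbors(coords, i, distance):
    """Indices of points within `distance` of point i (fixed bandwidth)."""
    return [j for j, c in enumerate(coords)
            if j != i and math.dist(coords[i], c) <= distance]

def adaptive_neighbors(coords, i, k):
    """Indices of the k points nearest to point i (adaptive bandwidth)."""
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]

coords = [(0, 0), (1, 0), (2, 0), (10, 0), (50, 0)]
print(fixed_neighbors(coords, 3, distance=5))  # [] : the isolated point has no neighbors
print(adaptive_neighbors(coords, 3, k=2))      # [2, 1] : it still gets its 2 nearest
```

A fixed bandwidth can leave sparse areas with few or no neighbors, while an adaptive bandwidth always uses the same number of neighbors regardless of how far away they are.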

property betas
Returns:

A 2D-array with the estimated parameters (n x k) for the trained GWR model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None)

Executes local linear regressions for every sample on the dataset, incorporating the dependent and independent variables of locations falling within a specified bandwidth. If spatial_weights_definition is defined, it ignores the parameter bandwidth and obtains the bandwidth from the spatial weights. If spatial_weights_definition is not defined, it estimates the bandwidth using the coordinates.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, geometries=None)

Evaluates the GWR model using the given data. If no model is defined, it returns None. It builds a local model for each row of the prediction set. Returns a 1D numpy array with the predictions of each local model.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. X is a SpatialDataFrame or GeoDataFrame)

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.
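The three reference points mentioned above (1.0, 0.0, and negative values) can be checked with a small unweighted R-Squared calculation. This is a sketch of the standard definition, 1 - SS_res / SS_tot; it ignores sample_weight, and the helper name `r2_score` is chosen for the example only.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (unweighted)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, y_true))                # 1.0  (perfect fit)
print(r2_score(y_true, [2.5] * 4))             # 0.0  (always predicts the mean)
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```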

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • sample_weight – Weighted contribution to the score for each sample.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The R-Squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class GeographicalRegressor(global_model=None, model_cls=None, spatial_weights_definition=None, bandwidth=None, fixed=True, local_weight=0.25, **kwargs)

Geographical regression algorithm. It uses a global model and several local models to perform regression.

Parameters:
  • global_model – A scikit-learn estimator instance, default=None. A trained model used as global model. Local models will be of the same type as this model. Required when model_cls is None.

  • model_cls – Class of scikit-learn estimator, default=None. Type of the global model and local models. When model_cls is provided (instead of global_model), a global model will be trained. Required when global_model=None. model_cls creation parameters are specified as kwargs.

  • spatial_weights_definition – SpatialWeightsDefinition, default=None. Spatial relationship specification. This criterion is used to group data into neighborhoods and train local models.

  • bandwidth – int or float, default=None. Distance (fixed=True) or number of nearest neighbors (fixed=False). bandwidth + fixed is another way to set the spatial relationship specification. It is ignored if spatial_weights_definition was set.

  • fixed – bool, default=True. True if bandwidth represents a distance. False for number of nearest neighbors.

  • local_weight – float (0.0 to 1.0), default=0.25. Weight given to the local models' predictions.

  • kwargs – Additional parameters for the inner models created with parameter model_cls.

fit(X, y, geometries=None, crs=None, spatial_weights=None, spatial_weights_definition=None, fit_global_model=True, n_jobs=1, backend=None, batch_size=None)

Train a geographical regression model. Internally, a global model (if fit_global_model=True) and several local models are trained. A local model is created for each neighborhood. A neighborhood is a spatial region containing multiple samples from X that are spatially related. Neighborhoods are built using the spatial relationship specified at model’s creation time (spatial_weights_definition, bandwidth) or using the spatial weights matrix object passed as parameter for training.

Parameters:
  • X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Training data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y, the parameter y must specify the name of that column.

  • y – A 1d array or string. Target values. If X contains a column with the target values this parameter will specify the name of that column instead.

  • geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column, or it can be None (in case X has a column called ‘geometry’).

  • crs – pyproj.crs.CRS or string, default=None. Spatial reference system of geometries. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)

  • spatial_weights – SpatialWeightsDefinition or pysal weights object, default=None. A precomputed spatial weights matrix for the training data. If not None, any spatial relationship specification provided at the model’s creation time is ignored.

  • fit_global_model – bool, default=True. If False, the global model will not be trained.

  • n_jobs – int, default=1. Number of processor cores used to parallelize local models training. Set -1 to use all the available cores.

  • backend – string, default=None. The Joblib backend to use when n_jobs != 1. If None, Joblib’s default backend will be used (typically loky).

  • batch_size – 'auto' or int, default='auto'. Number of batch tasks per parallel job.

Returns:

self. Fitted estimator.

predict(X, y=None, geometries=None, crs=None)

Predict the target value for X using the global model and the local models that are closest to geometries. The returned prediction is calculated as follows: local_model_prediction * local_weight + global_model_prediction * (1.0 - local_weight)
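As a worked instance of the weighted combination described for predict, the following sketch blends one local and one global prediction. The function name `blend` and the numeric values are hypothetical; only the formula and the default local_weight of 0.25 come from the text.

```python
def blend(local_prediction, global_prediction, local_weight=0.25):
    """Weighted combination of a local and a global prediction."""
    return (local_prediction * local_weight
            + global_prediction * (1.0 - local_weight))

print(blend(120.0, 100.0))                    # 105.0
print(blend(120.0, 100.0, local_weight=1.0))  # 120.0 (local model only)
print(blend(120.0, 100.0, local_weight=0.0))  # 100.0 (global model only)
```

With the default weight, the global model contributes three quarters of the final value, so the local models act as a correction rather than a replacement.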

Parameters:
  • X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Predicting data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y (e.g., a proxy or DataFrame used for training or testing), the parameter y must specify the name of that column so it can be excluded.

  • y – A string or None, default=None. If X contains a column with the target values, this parameter will specify the name of that column so it can be excluded for the prediction, otherwise, this parameter is not used.

  • geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X samples. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column or it can be omitted (in case X has a column called ‘geometry’).

Returns:

An array of shape n_samples, containing the predictions of the target variable for the given data.

score(X, y, geometries=None, sample_weight=None, crs=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.

Parameters:
  • X – A SpatialDataFrame, DataFrame, GeoDataFrame or a 2d numpy array. Expected shape is (n_samples, n_features). Predicting data. For SpatialDataFrame or GeoDataFrame, the geometries can be found in X, as a column. If X contains the column y (e.g., a proxy or DataFrame used for training or testing), the parameter y must specify the name of that column so it can be excluded.

  • y – A string or None, default=None. If X contains a column with the target values, this parameter will specify the name of that column so it can be excluded for the prediction, otherwise, this parameter is not used.

  • geometries – A list of Shapely geometries, a string (column name) or None, default=None. The geometries associated to X samples. If X is a SpatialDataFrame or a GeoDataFrame and X contains the geometries as one of its columns, this parameter may contain the name of that column or it can be omitted (in case X has a column called ‘geometry’).

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system.

Returns:

The R-squared metric for the given data.

class OLSRegressor(spatial_weights_definition=None)

The Ordinary Least Squares (OLS) algorithm fits a line that minimizes the Mean Squared Error (MSE) on the training set and uses it to predict new values. By defining the parameter spatial_weights_definition, it is possible to get spatial statistics after training the model; these statistics help identify the presence of spatial dependence or spatial heterogeneity.
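For a single explanatory variable, the least-squares line has a closed form, sketched below. This illustrates the general technique rather than OLSRegressor itself; `fit_line`, `mse`, and the data are made up for the example.

```python
def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

def mse(x, y, intercept, slope):
    """Mean squared error of the line on the given points."""
    return sum((yi - (intercept + slope * xi)) ** 2
               for xi, yi in zip(x, y)) / len(x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))  # 0.05 1.99

# Any other line has a larger training MSE than the fitted one.
print(mse(x, y, intercept, slope) < mse(x, y, intercept, slope + 0.1))  # True
```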

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors. If defined, the trained model includes spatial diagnostics, such as the Lagrange Multiplier tests and Moran’s I.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics

If the OLS model was trained specifying spatial weights then we can retrieve spatial diagnostics.

Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None, spatial_weights_definition=None)

Trains the model using the training dataset.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, geometries=None)

Estimates the target variable for the given data. If no model is defined, it returns None.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • sample_weight – Weighted contribution to the score for each sample.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The R-Squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SLXRegressor(spatial_weights_definition=None)

The SLX regression model runs a regular linear regression preceded by a feature engineering step that adds features providing spatial context to the data since, according to Tobler’s law, closer things are more related than distant things. The algorithm adds one or more columns with the spatial lag of certain features, representing the average from neighboring observations.
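The spatial lag of a feature, taken here as the plain average over each observation's neighbors, can be computed as in the sketch below. The neighbor lists, the feature name `income`, and the helper `spatial_lag` are hypothetical; how SLXRegressor builds its neighborhoods depends on the spatial weights it is given.

```python
def spatial_lag(values, neighbors):
    """Average of each observation's neighbors (row-standardized weights)."""
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]

# Four observations on a line; each one neighbors the adjacent observations.
neighbors = [[1], [0, 2], [1, 3], [2]]
income = [10.0, 20.0, 30.0, 40.0]

lag_income = spatial_lag(income, neighbors)
print(lag_income)  # [20.0, 20.0, 30.0, 30.0]

# Augmented design matrix: the original feature plus its spatial lag.
X_slx = [[value, lag] for value, lag in zip(income, lag_income)]
print(X_slx[0])    # [10.0, 20.0]
```

A regular linear regression trained on `X_slx` then has one coefficient for the feature itself and one for the neighborhood average.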

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None, column_ids=None)

Trains an OLS model using a combination of the independent variables in X and the spatial lag of the columns specified in column_ids

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

  • column_ids – List of strings or list of integers, default=None. A list of column names or column indexes, indicating the columns that will be used to compute the spatial lag.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, geometries=None, spatial_weights=None, use_fit_lag=False)

Estimates the target variable for the given data. If use_fit_lag=False it calculates the spatial lag from the prediction data, otherwise, it will compute the spatial lag from the training data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

  • use_fit_lag – boolean, default=False. If false, it will use the spatial lag from the prediction data, otherwise, it will use the training data to calculate the spatial lag.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None, use_fit_lag=False)

Returns the R-Squared metric. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). If use_fit_lag=False it calculates the spatial lag from the given data, otherwise, it will compute the spatial lag from the training data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame

  • sample_weight – Weighted contribution to the score for each sample

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • use_fit_lag – boolean, default=False. If false, it will use the spatial lag from the prediction data, otherwise, it will use the training data to calculate the spatial lag.

Returns:

The R-Squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SpatialAdaptiveRegressor(spatial_weights_definition=None)

An automated approach that finds the regression algorithm that best fits the data. From the spatial diagnostics, the algorithm gets Moran’s I. A positive value of Moran’s I indicates the presence of spatial dependence, or spatial clustering, and an algorithm that includes this spatial dependence is preferred. If Moran’s I is negative, it indicates the presence of regional variance or spatial heteroskedasticity, and a local method is more suitable. If the parameter spatial_weights_definition is not specified, it suggests the OLS model.
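To make the sign of Moran’s I concrete, the sketch below computes the statistic for two small series using its standard definition. It is not the selection logic of SpatialAdaptiveRegressor; the dense weights matrix `W`, the helper `morans_i`, and the data are assumptions made for the example.

```python
def morans_i(values, weights):
    """Moran's I for `values` given a dense spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n))
    den = sum(zi ** 2 for zi in z)
    return (n / s0) * num / den

# Binary contiguity for four observations on a line.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(round(morans_i([1.0, 2.0, 3.0, 4.0], W), 3))  # 0.333 : similar neighbors
print(round(morans_i([1.0, 4.0, 1.0, 4.0], W), 3))  # -1.0  : dissimilar neighbors
```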

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None)

Selects the regression model that best fits the data and trains it with the given training data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, geometries=None)

Estimates the target variable for the given data. If no model is defined, it returns None.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • sample_weight – Weighted contribution to the score for each sample.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The R-Squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SpatialErrorRegressor(spatial_weights_definition=None)

The Spatial Error model introduces a spatial lag in the error term of the linear equation. By adding the spatial lag in the residual, the neighbors’ errors influence the observation error; this results in an extra parameter associated with the spatial lag of the error term
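In the usual textbook notation, which is an assumption here since the source does not give the equation, the model described above can be written with W as the spatial weights matrix, u as the spatially correlated error, and lambda as the extra parameter on the lagged error:

```latex
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

When lambda is 0 the error term reduces to the independent noise term and the model is an ordinary linear regression.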

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None)

Trains the model using the training dataset.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, geometries=None)

Estimates the target variable for the given data. If no model is defined, it returns None.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • sample_weight – Weighted contribution to the score for each sample.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The R-Squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SpatialFixedEffectsRegressor(spatial_weights_definition=None)

The Spatial Fixed Effects algorithm is a simplified version of Spatial Regimes, computing an intercept or constant parameter for each regime, while the other model parameters remain constant.
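A model with one intercept per regime and shared remaining parameters can be sketched for a single feature by demeaning within each regime. This illustrates the general fixed-effects estimator, not the SpatialFixedEffectsRegressor internals; `fit_fixed_effects`, the regime labels, and the data are hypothetical.

```python
from collections import defaultdict

def fit_fixed_effects(x, y, regimes):
    """One shared slope, one intercept per regime (single-feature case)."""
    groups = defaultdict(list)
    for xi, yi, r in zip(x, y, regimes):
        groups[r].append((xi, yi))
    # Per-regime means of x and y.
    means = {r: (sum(p[0] for p in pts) / len(pts),
                 sum(p[1] for p in pts) / len(pts))
             for r, pts in groups.items()}
    # Demean within each regime, then estimate the common slope.
    sxy = sum((xi - means[r][0]) * (yi - means[r][1])
              for xi, yi, r in zip(x, y, regimes))
    sxx = sum((xi - means[r][0]) ** 2 for xi, r in zip(x, regimes))
    slope = sxy / sxx
    intercepts = {r: my - slope * mx for r, (mx, my) in means.items()}
    return slope, intercepts

x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0, 11.0, 13.0, 15.0]  # y = 2x + 1 (north), y = 2x + 9 (south)
regimes = ["north"] * 3 + ["south"] * 3

slope, intercepts = fit_fixed_effects(x, y, regimes)
print(slope, intercepts)  # 2.0 {'north': 1.0, 'south': 9.0}
```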

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None, regimes=None)

Trains an OLS model where the intercept parameter changes depending on the regime, while the rest of the parameters remain constant.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame)

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, regimes=None, geometries=None, spatial_weights=None)

Evaluates the model using the given data and the regimes. If no model is defined, it returns None. It gets the corresponding intercept for the model according to the regimes and evaluates the model using the prediction data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, regimes=None, geometries=None, spatial_weights=None)

Returns R-Squared metric. It gets the corresponding intercept for the model according to the regimes and estimates the target variable with the given data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables.

  • sample_weight – Weighted contribution to the score for each sample.

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

The R-squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SpatialLagRegressor(spatial_weights_definition=None)

The Spatial Lag regression model considers spatial dependence over the target variable, meaning that the value of a region’s target variable is related to its neighbors’ target variable. The Spatial Lag model includes the spatial lag of the dependent variable into the linear equation; this results in an extra parameter associated with the spatial lag of the dependent variable
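In the usual textbook notation, which is an assumption here since the source does not give the equation, the model described above can be written with W as the spatial weights matrix, Wy as the spatial lag of the dependent variable, and rho as its extra parameter:

```latex
y = \rho W y + X\beta + \varepsilon
```

When rho is 0 the lag term vanishes and the model is an ordinary linear regression.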

Parameters:

spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None)

Trains the model using the training dataset.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

self. Fitted estimator.

property k
Returns:

The number of variables for which coefficients are estimated (including the constant and rho)

property model_type
Returns:

The regression model defined

predict(X, geometries=None)

Estimates the target variable for the given data. If no model is defined, it returns None.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The prediction of the target variable for the given data.

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, geometries=None)

Returns the value of the regression score function or R-Squared. Represents the proportion of the variation in the dependent variable that is predictable from the independent variable(s). Best possible score is 1.0, and it can be negative, since the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input, would get an R-Squared score of 0.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • sample_weight – Weighted contribution to the score for each sample.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

Returns:

The R-squared score for the given data.
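
The score described above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name r_squared is hypothetical, and the way sample_weight enters the sums (as a per-sample weight on both the residual and total sums of squares) is an assumption made for illustration.

```python
def r_squared(y_true, y_pred, sample_weight=None):
    """Coefficient of determination, 1 - SS_res / SS_tot, with optional weights."""
    n = len(y_true)
    # Assumption: missing weights mean every sample counts equally.
    w = [1.0] * n if sample_weight is None else list(sample_weight)
    # Weighted mean of the observed values.
    y_mean = sum(wi * yi for wi, yi in zip(w, y_true)) / sum(w)
    # Residual and total sums of squares.
    ss_res = sum(wi * (yi - pi) ** 2 for wi, yi, pi in zip(w, y_true, y_pred))
    ss_tot = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y_true))
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                      # every prediction is exact
constant = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean of y
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])     # predictions run the wrong way
```

In this toy data the exact predictions score 1.0, the constant mean prediction scores 0.0, and the reversed predictions score below zero, matching the three cases in the description.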

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model

class SpatialRegimesRegressor(spatial_weights_definition=None)

Estimates the regression equation parameters separately for each value of a categorical variable called the regime. A regime can represent different things, such as a region in a spatial context; neighborhoods, identified by district or block names, can be used to define regimes. The model consists of linear regression models whose coefficients vary depending on the regime.

Parameters:
  • spatial_weights_definition – SpatialWeightsDefinition, default=None. Specifies the spatial relationship among neighbors.

property betas
Returns:

An array with the estimated parameters of the trained model

property diagnostics
Returns:

A SpatialDiagnostics instance containing statistics of the trained model. If no model is defined, it returns None

fit(X, y, geometries=None, crs=None, spatial_weights=None, regimes=None)

Trains the model using the training dataset. The regimes parameter specifies the categorical variable used as the regime. An OLS model is trained for each regime, yielding a different set of parameters for each.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • crs – pyproj.crs.CRS, default=None. Coordinate reference system. Only used when X is a numpy array. It is ignored when CRS information is available in X (i.e. a SpatialDataFrame or GeoDataFrame).

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

Returns:

self. Fitted estimator.
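
The idea of training one OLS model per regime can be sketched in plain Python. This is a conceptual illustration only, not the code behind SpatialRegimesRegressor: the helpers fit_simple_ols and fit_by_regime are hypothetical, the sketch assumes a single explanatory variable so the closed-form solution applies, and it ignores spatial weights entirely.

```python
from collections import defaultdict


def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x with a single feature."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def fit_by_regime(x, y, regimes):
    """Fit one simple OLS model per regime label."""
    # The regimes list needs one entry per observation.
    if not (len(x) == len(y) == len(regimes)):
        raise ValueError("regimes must have one entry per row of X")
    groups = defaultdict(lambda: ([], []))
    for xi, yi, r in zip(x, y, regimes):
        groups[r][0].append(xi)
        groups[r][1].append(yi)
    # Each regime gets its own (intercept, slope) pair.
    return {r: fit_simple_ols(xs, ys) for r, (xs, ys) in groups.items()}


# Toy data: the relationship rises in "north" and falls in "south".
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0, 10.0, 9.0, 8.0]
regimes = ["north", "north", "north", "south", "south", "south"]
params = fit_by_regime(x, y, regimes)
```

A single pooled regression over these six points would blur the two opposite trends; fitting by regime recovers a separate intercept and slope for each group.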

property k
Returns:

The number of variables for which coefficients are estimated (including the constant, excluding lambda)

property model_type
Returns:

The regression model defined

predict(X, regimes=None, geometries=None, spatial_weights=None)

Evaluates the model using the given data and the regimes. If no model is defined, it returns None. For each observation, it selects the set of parameters that corresponds to the observation's regime and evaluates the model on the prediction data.

Parameters:
  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

The prediction of the target variable for the given data.
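
The regime-based parameter lookup described for predict can be sketched as follows. The function predict_by_regime and the params dictionary are hypothetical names used only for illustration; the sketch assumes a single explanatory variable and parameters stored as (intercept, slope) pairs, and it does not reflect how the library stores or applies its coefficients.

```python
def predict_by_regime(x, regimes, params):
    """Evaluate y = intercept + slope * x using each row's regime parameters."""
    if len(x) != len(regimes):
        raise ValueError("regimes must have one entry per row of X")
    predictions = []
    for xi, r in zip(x, regimes):
        # Look up the coefficients that belong to this observation's regime.
        intercept, slope = params[r]
        predictions.append(intercept + slope * xi)
    return predictions


# Hypothetical per-regime parameters: (intercept, slope).
params = {"north": (1.0, 2.0), "south": (10.0, -1.0)}
preds = predict_by_regime([3.0, 4.0], ["north", "south"], params)
```

The first observation is evaluated with the "north" coefficients and the second with the "south" coefficients, which is why the regimes list must have the same length as X.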

property predy
Returns:

An array with the predictions for the training data

score(X, y, sample_weight=None, regimes=None, geometries=None, spatial_weights=None)

Returns the R-squared metric. It selects the set of parameters for each observation according to its regime and estimates the target variable with the given data.

Parameters:
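  • X – {numpy array, geopandas dataframe, vector dataframe} of shape (n_samples, n_features). Independent variables.

  • y – {pandas.DataFrame, numpy 1D array or string}. If specified as string, X is expected to be a DataFrame.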

  • sample_weight – Weighted contribution to the score for each sample.

  • regimes – List, default=None. A list with n values with the mapping of each observation to a regime. The value of n must be equal to the number of rows in X.

  • geometries – shapely array, default=None. Geometry data for each sample in X.

  • spatial_weights – SpatialWeights, default=None. A spatial weights matrix.

Returns:

The R-squared metric for the given data.

property summary
Returns:

The summary of the trained model

property u
Returns:

An array with the residuals of the trained model