Spatial Colocation Analysis

Spatial colocation analysis measures and analyzes relationships between point features of different classes. The classes can come from the same spatial layer but, most often, come from different spatial layers.

A typical example is determining whether different restaurants, such as McDonald's and Chipotle, are colocated. Further analysis can identify whether McDonald's restaurants are colocated with metro stations and shopping centers, and how their locations relate to population density and income levels. These insights help companies with site selection and site optimization, and help them minimize costs.

Colocation analysis measures proximity patterns between two categories of point features, A and B, using the Local Colocation Quotient (LCLQ) statistic. It calculates an LCLQ score for each feature of the category of interest (category A).

  • Points of category A with an LCLQ score greater than 1 are more likely than random to have points of category B within their neighborhood.
  • Points of category A with an LCLQ score less than 1 are less likely than random to have points of category B within their neighborhood.
  • An LCLQ score equal to 1 indicates that the proportion of categories within the point's neighborhood matches the proportion of categories across the entire study area.
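This section does not give the LCLQ formula itself. As a rough illustration, the following sketch assumes a formulation that is common in the spatial statistics literature: for a point A_i, the observed share of category B among its k nearest neighbors, weighted by a Gaussian kernel whose adaptive bandwidth is the distance to the k-th neighbor, is divided by the expected share N_B / (N - 1), where N is the total number of points and N_B is the number of category B points. The function name lclq_scores, the parameter k, and the kernel choice are assumptions for illustration; this is not the oraclesai implementation.

```python
import numpy as np


def lclq_scores(points_a, points_b, k=3):
    """Sketch: local colocation quotient of each A point with respect to B.

    Assumes a Gaussian kernel over the k nearest neighbors, with an adaptive
    bandwidth equal to the distance to the k-th nearest neighbor.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    pts = np.vstack([a, b])
    # True for points of category B, False for points of category A
    is_b = np.concatenate([np.zeros(len(a), dtype=bool), np.ones(len(b), dtype=bool)])
    n = len(pts)
    # Expected share of B among the other N - 1 points: N_B / (N - 1)
    expected = is_b.sum() / (n - 1)

    scores = np.empty(len(a))
    for i in range(len(a)):
        d = np.linalg.norm(pts - pts[i], axis=1)
        d[i] = np.inf  # a point is not its own neighbor
        nbrs = np.argsort(d)[:k]  # indices of the k nearest neighbors
        bandwidth = max(d[nbrs[-1]], 1e-12)  # guard against duplicate points
        w = np.exp(-0.5 * (d[nbrs] / bandwidth) ** 2)  # Gaussian kernel weights
        observed = (w * is_b[nbrs]).sum() / w.sum()  # weighted share of B neighbors
        scores[i] = observed / expected
    return scores


# Toy data: each A point sits next to a B point; four more B points form a distant cluster.
A = [(0, 0), (0, 2)]
B = [(1, 0), (1, 2), (10, 10), (10, 11), (11, 10), (11, 11)]

print(np.round(lclq_scores(A, B, k=1), 3).tolist())  # A to B: [1.167, 1.167]
print(np.round(lclq_scores(B, A, k=1), 3).tolist())  # B to A: [3.5, 3.5, 0.0, 0.0, 0.0, 0.0]
```

In this toy data, every A point has a B point as its nearest neighbor, so both A-to-B scores exceed 1. In the other direction, only the two B points next to A points score above 1, while the distant B cluster scores 0, which also illustrates that the colocation relationship is not symmetric.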

Together with its p-value, the LCLQ score indicates whether a feature point is colocated, isolated, or undefined. The following table describes the possible scenarios.

LCLQ Type                     Description
---------------------------   -----------
Colocated - Significant       The LCLQ score is greater than 1, and the p-value is either less than 0.05 or None.
Colocated - Not Significant   The LCLQ score is greater than 1, and the p-value is equal to or greater than 0.05.
Isolated - Significant        The LCLQ score is equal to or less than 1, and the p-value is either less than 0.05 or None.
Isolated - Not Significant    The LCLQ score is equal to or less than 1, and the p-value is equal to or greater than 0.05.
Undefined                     The feature point has no neighbors from the neighboring category.
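The rules in the preceding table can be expressed as a small decision function. The following is a minimal sketch, not part of oraclesai: the name classify_lclq, the has_neighbors flag, and the alpha parameter are hypothetical. The labels COLOCATED_SIGNIFICANT and ISOLATED_SIGNIFICANT mirror the example output later in this section, while the NOT_SIGNIFICANT and UNDEFINED labels are assumed. It also assumes that a p-value of None appears as NaN once stored in a DataFrame.

```python
import math
from typing import Optional


def classify_lclq(lclq: float, p_value: Optional[float], has_neighbors: bool = True,
                  alpha: float = 0.05) -> str:
    """Sketch: map an LCLQ score and its p-value to a colocation type."""
    if not has_neighbors:
        # No neighbors from the neighboring category
        return "UNDEFINED"
    # A p-value of None (assumed to appear as NaN in a DataFrame) counts as significant
    missing_p = p_value is None or math.isnan(p_value)
    significant = missing_p or p_value < alpha
    kind = "COLOCATED" if lclq > 1 else "ISOLATED"  # a score of exactly 1 is isolated
    return f"{kind}_{'SIGNIFICANT' if significant else 'NOT_SIGNIFICANT'}"


print(classify_lclq(1.054835, None))      # COLOCATED_SIGNIFICANT
print(classify_lclq(0.999595, 0.000338))  # ISOLATED_SIGNIFICANT
print(classify_lclq(1.0, 0.05))           # ISOLATED_NOT_SIGNIFICANT
```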

The colocation relationship is not symmetric: the LCLQ scores calculated when comparing category A to category B generally differ from the LCLQ scores calculated when comparing category B to category A.

The following table describes the main parameters of colocation analysis.

Parameter                    Description
--------------------------   -----------
feature_data                 Data with point features.
spatial_weights_definition   Defines the relationship between neighboring locations. It is required because the LCLQ score of each feature is computed from its neighbors.
interest_category            Two values that indicate the field and the value of the category of interest. If both interest_category and neighbor_category are defined, the colocation analysis uses these values to select both categories from the data specified in feature_data.
neighbor_category            Two values that indicate the field and the value of the neighboring category.
neighbor_feature_data        If defined, the interest_category and neighbor_category parameters are ignored. The category of interest is the point features in feature_data, and the neighboring category is the point features in this parameter.
n_permutations               The number of permutations used to calculate the significance level of the LCLQ scores. If None, the significance level is not computed, and None is returned. Increasing the number of permutations also increases the processing time.
is_time_window_analysis      A Boolean value indicating whether time-window analysis is required.
interest_time_window         The field, start time, and end time of the category of interest.
neighbor_time_window         The field, start time, and end time of the neighboring category.
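This section does not describe how the permutations behind n_permutations are performed. As an illustration only, the following sketch shows one common approach, a conditional permutation test: the point of interest stays fixed, its k neighbor slots are filled with labels drawn at random from the other N - 1 points, and the LCLQ is recomputed for each permutation. The function name, its inputs, and the two-sided pseudo p-value are hypothetical and need not match how spatial_colocation_analysis computes p_value or t_stat. The loop also shows why processing time grows with the number of permutations.

```python
import numpy as np


def conditional_permutation_p_value(weights, other_is_b, observed_lclq,
                                    n_permutations=99, seed=0):
    """Sketch: pseudo p-value for one point's LCLQ under conditional randomization.

    weights: kernel weights of the point's k neighbors (hypothetical input).
    other_is_b: labels of the other N - 1 points (True = category B).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    labels = np.asarray(other_is_b, dtype=bool)
    expected = labels.mean()  # N_B / (N - 1)

    simulated = np.empty(n_permutations)
    for s in range(n_permutations):  # cost grows linearly with n_permutations
        # Fill the k neighbor slots with labels drawn at random, without replacement
        drawn = rng.choice(labels, size=len(w), replace=False)
        simulated[s] = (w * drawn).sum() / w.sum() / expected

    # Two-sided: count simulated scores at least as far from 1 as the observed score
    extreme = np.abs(simulated - 1) >= abs(observed_lclq - 1)
    return (extreme.sum() + 1) / (n_permutations + 1)


# 2 of the 20 other points are category B, and both neighbors of the point are B,
# so the observed LCLQ is 1.0 / (2 / 20) = 10.
labels = [True] * 2 + [False] * 18
print(conditional_permutation_p_value([1.0, 1.0], labels, observed_lclq=10.0))
```

Because only about 1 in 190 random draws places two B labels in both neighbor slots, the resulting p-value is small, which marks the observed score as significant under this sketch's assumptions.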

The result of the colocation analysis is a pandas DataFrame with the following columns:

Column      Description
---------   -----------
row_index   The observation’s index in the DataFrame containing the category of interest.
lclq        The LCLQ score.
t_stat      The t-statistic associated with the LCLQ score.
p_value     The p-value associated with the LCLQ score, indicating its significance level.
lclq_type   The colocation type, as described in the LCLQ type table.

For more information, see the spatial_colocation_analysis function in the Python API Reference for Oracle Spatial AI.

The following example splits the schools SpatialDataFrame into two SpatialDataFrame instances, X and Y, which represent two different classes, and then executes a colocation analysis between them. The example defines spatial weights because colocation analysis uses neighboring locations to calculate the LCLQ scores.

from oraclesai.weights import KernelBasedWeightsDefinition
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.analysis import spatial_colocation_analysis

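# Split the schools SpatialDataFrame (assumed to be loaded earlier) into two
# subsets that act as two classes; only the first two returned values are kept.
X, Y, _, _, _, _ = spatial_train_test_split(schools, y="", test_size=0.3, random_state=32)

# Define spatial weights: a Gaussian kernel whose adaptive bandwidth (fixed=False)
# is based on the 25 nearest neighbors (k=25)
spatial_weights_definition = KernelBasedWeightsDefinition(k=25, fixed=False, function="gaussian")

# Execute colocation analysis with X as the category of interest and Y as the
# neighboring category, using 20 permutations to estimate significance
colocation_analysis = spatial_colocation_analysis(X, spatial_weights_definition, neighbor_feature_data=Y, n_permutations=20)

# Print the results for the first ten observations in X
print(colocation_analysis[:10])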

The preceding code prints the colocation analysis results for the first ten observations in X, including each observation's LCLQ score, t-statistic, p-value, and colocation type.

   row_index      lclq    t_stat   p_value              lclq_type
0          2  1.054835       NaN       NaN  COLOCATED_SIGNIFICANT
1          3  0.948375  4.358899  0.000338   ISOLATED_SIGNIFICANT
2          4  0.944819  4.358899  0.000338   ISOLATED_SIGNIFICANT
3          6  1.113996  4.358899  0.000338  COLOCATED_SIGNIFICANT
4          7  1.013247       NaN       NaN  COLOCATED_SIGNIFICANT
5          9  0.999595 -4.358899  0.000338   ISOLATED_SIGNIFICANT
6         10  1.078993 -4.358899  0.000338  COLOCATED_SIGNIFICANT
7         11  1.012721       NaN       NaN  COLOCATED_SIGNIFICANT
8         14  0.978001  4.358899  0.000338   ISOLATED_SIGNIFICANT
9         20  0.961042 -4.358899  0.000338   ISOLATED_SIGNIFICANT
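Because the result is a pandas DataFrame, standard pandas operations can summarize or filter it. The following sketch rebuilds the ten printed rows (omitting t_stat) so that it runs on its own; with the real result, the same operations would apply to colocation_analysis directly. The variable name result is only for illustration.

```python
import numpy as np
import pandas as pd

# The ten rows printed above (t_stat omitted for brevity)
result = pd.DataFrame({
    "row_index": [2, 3, 4, 6, 7, 9, 10, 11, 14, 20],
    "lclq": [1.054835, 0.948375, 0.944819, 1.113996, 1.013247,
             0.999595, 1.078993, 1.012721, 0.978001, 0.961042],
    "p_value": [np.nan, 0.000338, 0.000338, 0.000338, np.nan,
                0.000338, 0.000338, np.nan, 0.000338, 0.000338],
    "lclq_type": ["COLOCATED_SIGNIFICANT", "ISOLATED_SIGNIFICANT", "ISOLATED_SIGNIFICANT",
                  "COLOCATED_SIGNIFICANT", "COLOCATED_SIGNIFICANT", "ISOLATED_SIGNIFICANT",
                  "COLOCATED_SIGNIFICANT", "COLOCATED_SIGNIFICANT", "ISOLATED_SIGNIFICANT",
                  "ISOLATED_SIGNIFICANT"],
})

# Number of features in each colocation type
print(result["lclq_type"].value_counts())

# Indexes (in X) of the features significantly colocated with Y
colocated = result[result["lclq_type"] == "COLOCATED_SIGNIFICANT"]
print(colocated["row_index"].tolist())  # [2, 6, 7, 10, 11]

# The three features with the highest LCLQ scores
print(result.nlargest(3, "lclq")["row_index"].tolist())  # [6, 10, 2]
```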