Spatial Colocation Analysis

Spatial colocation analysis measures relationships between point features of different classes. The classes can come from the same spatial layer or, more often, from different spatial layers.

A typical example is determining whether restaurants of different chains, such as McDonald's and Chipotle, are colocated. Further analysis can identify whether McDonald's restaurants are colocated with metro stations and shopping centers, and how their locations relate to population density and income levels. Such analyses help companies with site selection, site optimization, and cost reduction.

Colocation analysis measures proximity patterns between two categories of point features, A and B, using the Local Colocation Quotient (LCLQ) statistic. An LCLQ score is calculated for each feature of the category of interest (category A).

Points of category A with an LCLQ score greater than one are more likely (than random) to have points of category B within their neighborhood.
Points of category A with an LCLQ score less than one are less likely (than random) to have points of category B within their neighborhood.
An LCLQ score equal to one indicates that the proportion of categories within the point's neighborhood matches the proportion of the categories throughout the entire study area.
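
As a rough illustration of how such a score can arise, the following sketch computes a simplified local colocation quotient with NumPy: the kernel-weighted share of category B around each category A point, divided by the share of category B among all other points in the study area. The function name `local_colocation_quotients`, the fixed Gaussian `bandwidth`, and the toy coordinates are assumptions made for this illustration. It is not the Oracle Spatial AI implementation, which may differ in its weighting and normalization.

```python
import numpy as np

def local_colocation_quotients(coords, labels, cat_a, cat_b, bandwidth):
    """Simplified LCLQ of each cat_a point with respect to cat_b (illustrative only)."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    is_b = labels == cat_b
    # Share of category B among the other N - 1 points of the study area.
    global_share = is_b.sum() / (len(labels) - 1)
    scores = {}
    for i in np.flatnonzero(labels == cat_a):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel (assumed)
        weights[i] = 0.0  # a point is not its own neighbor
        local_share = weights[is_b].sum() / weights.sum()
        scores[int(i)] = local_share / global_share
    return scores

# Hypothetical toy data: point 0 (A) sits next to two B points,
# while points 3 and 4 (A) sit next to each other, far from any B point.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 20)]
labels = ["A", "B", "B", "A", "A", "B"]
scores = local_colocation_quotients(coords, labels, "A", "B", bandwidth=2.0)
for index, score in scores.items():
    print(index, round(score, 3))
```

In this toy data, point 0 scores above one because nearly all of its neighborhood weight falls on category B, while points 3 and 4 score below one because their nearest neighbor is another category A point.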

The LCLQ score, together with its p-value, determines whether a feature point is classified as colocated, isolated, or undefined. The following table describes the possible scenarios.

LCLQ Type	Description
Colocated - Significant	The LCLQ score is greater than `1` with either a p-value less than `0.05` or a p-value of `None`.
Colocated - Not Significant	The LCLQ score is greater than `1` with a p-value equal to or greater than `0.05`.
Isolated - Significant	The LCLQ score is equal to or less than `1` with either a p-value less than `0.05` or a p-value of `None`.
Isolated - Not Significant	The LCLQ score is equal to or less than `1` with a p-value equal to or greater than `0.05`.
Undefined	The feature point did not have any neighbors from the neighboring category.
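
The rules in the preceding table can be expressed as a small decision function. The sketch below is only an illustration: the function name `classify_lclq` and its `has_neighbors` and `alpha` arguments are hypothetical. The labels `COLOCATED_SIGNIFICANT` and `ISOLATED_SIGNIFICANT` appear in the sample output later in this section, while the remaining label strings are assumed by analogy. Treating a `NaN` p-value like `None` is also an assumption, based on the sample output.

```python
import math

def classify_lclq(lclq, p_value, has_neighbors=True, alpha=0.05):
    """Map an LCLQ score and p-value to a colocation type (illustrative only)."""
    if not has_neighbors:
        # No neighbors from the neighboring category.
        return "UNDEFINED"
    # A missing p-value (None or NaN) is grouped with the significant cases.
    no_p_value = p_value is None or math.isnan(p_value)
    significant = no_p_value or p_value < alpha
    side = "COLOCATED" if lclq > 1 else "ISOLATED"
    return f"{side}_SIGNIFICANT" if significant else f"{side}_NOT_SIGNIFICANT"

print(classify_lclq(1.113996, 0.000338))
print(classify_lclq(0.948375, 0.000338))
print(classify_lclq(1.2, 0.3))
```

Note that a score of exactly `1` falls on the isolated side, as stated in the table.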

The colocation relationship is not symmetric. The LCLQ scores calculated when comparing category A to category B generally differ from the LCLQ scores calculated when comparing category B to category A.
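
The asymmetry can be seen on a small, made-up data set. The sketch below uses a simplified quotient (the Gaussian-weighted local share of the target category divided by its share among all other points) and evaluates it in both directions. The helper `directional_quotients`, the bandwidth, and the coordinates are hypothetical and serve only to illustrate the idea. They do not reproduce the library's computation.

```python
import numpy as np

def directional_quotients(coords, labels, source, target, bandwidth=2.0):
    """Simplified quotient of each `source` point with respect to `target`."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    target_mask = labels == target
    # Share of the target category among the other N - 1 points.
    global_share = target_mask.sum() / (len(labels) - 1)
    scores = []
    for i in np.flatnonzero(labels == source):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel (assumed)
        weights[i] = 0.0  # exclude the point itself
        local_share = weights[target_mask].sum() / weights.sum()
        scores.append(float(local_share / global_share))
    return scores

# Every A point has a B point next to it, but most B points
# form a separate cluster with no A point nearby.
coords = [(0, 0), (1, 0), (0, 10), (1, 10), (50, 50), (51, 50), (50, 51)]
labels = ["A", "B", "A", "B", "B", "B", "B"]

a_to_b = directional_quotients(coords, labels, "A", "B")
b_to_a = directional_quotients(coords, labels, "B", "A")
print("A -> B:", [round(s, 3) for s in a_to_b])
print("B -> A:", [round(s, 3) for s in b_to_a])
```

Here every category A point is attracted to category B, whereas only the two category B points near an A point are attracted to category A, so the two directions yield different sets of scores.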

Some of the parameters required to execute colocation analysis are described in the following table.

Parameters	Description
`feature_data`	Data with point features.
`spatial_weights_definition`	Defines the relationship between neighboring locations. It is required because the analysis retrieves information from the neighbors of each point.
`interest_category`	Two values that indicate the field and value of the category of interest. If the `interest_category` and `neighbor_category` parameters are defined, then the colocation analysis is executed using these values from the data specified in `feature_data`.
`neighbor_category`	Two values that indicate the field and value of the neighboring category.
`neighbor_feature_data`	If defined, the `interest_category` and `neighbor_category` parameters are ignored. The category of interest consists of the point features in `feature_data`, while the other category consists of the point features defined in this parameter.
`n_permutations`	The number of permutations used to calculate the significance level of the colocation quotient scores. If `None`, then the significance level is not computed, and `None` is returned as the p-value. Increasing the number of permutations also increases the processing time.
`is_time_window_analysis`	A Boolean parameter indicating if time-window analysis is required.
`interest_time_window`	Indicates the field, `start-time`, and `end-time` of the category of interest.
`neighbor_time_window`	Indicates the field, `start-time`, and `end-time` of the neighboring category.

The result of the colocation analysis is a pandas `DataFrame` with the following columns:

Column	Description
`row_index`	The observation’s index in the `DataFrame` containing the category of interest.
`lclq`	The LCLQ score.
`t_stat`	The t-statistic associated with the LCLQ score.
`p_value`	The p-value associated with the LCLQ score, indicating the significance level of the LCLQ score.
`lclq_type`	The colocation type (as described in the earlier table).

See the `spatial_colocation_analysis` function in the Python API Reference for Oracle Spatial AI for more information.

The following example splits the schools SpatialDataFrame into two instances of SpatialDataFrame, X and Y, which represent two different classes, and then executes a colocation analysis between the two classes. The spatial weights must be defined because colocation analysis uses neighboring locations to calculate the LCLQ scores.

from oraclesai.weights import KernelBasedWeightsDefinition
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.analysis import spatial_colocation_analysis

# Split the data to create two different classes.
X, Y, _, _, _, _ = spatial_train_test_split(schools, y="", test_size=0.3, random_state=32)

# Define spatial weights
spatial_weights_definition = KernelBasedWeightsDefinition(k=25, fixed=False, function="gaussian")

# Execute colocation analysis between the two classes
colocation_analysis = spatial_colocation_analysis(X, spatial_weights_definition, neighbor_feature_data=Y, n_permutations=20)

# Print the result
print(colocation_analysis[:10])

The preceding code prints the results of the colocation analysis for the first ten observations in X, including the LCLQ score and the significance level.

   row_index      lclq    t_stat   p_value              lclq_type
0          2  1.054835       NaN       NaN  COLOCATED_SIGNIFICANT
1          3  0.948375  4.358899  0.000338   ISOLATED_SIGNIFICANT
2          4  0.944819  4.358899  0.000338   ISOLATED_SIGNIFICANT
3          6  1.113996  4.358899  0.000338  COLOCATED_SIGNIFICANT
4          7  1.013247       NaN       NaN  COLOCATED_SIGNIFICANT
5          9  0.999595 -4.358899  0.000338   ISOLATED_SIGNIFICANT
6         10  1.078993 -4.358899  0.000338  COLOCATED_SIGNIFICANT
7         11  1.012721       NaN       NaN  COLOCATED_SIGNIFICANT
8         14  0.978001  4.358899  0.000338   ISOLATED_SIGNIFICANT
9         20  0.961042 -4.358899  0.000338   ISOLATED_SIGNIFICANT
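
Because the result is a pandas `DataFrame`, it can be filtered and summarized with standard pandas operations. The following sketch rebuilds the first five rows of the preceding output by hand, so that it runs without the library, and then selects the significantly colocated observations. The variable names are illustrative. In practice, the same operations could be applied directly to `colocation_analysis`.

```python
import pandas as pd

nan = float("nan")

# First five rows of the output shown above, entered manually.
result = pd.DataFrame({
    "row_index": [2, 3, 4, 6, 7],
    "lclq": [1.054835, 0.948375, 0.944819, 1.113996, 1.013247],
    "t_stat": [nan, 4.358899, 4.358899, 4.358899, nan],
    "p_value": [nan, 0.000338, 0.000338, 0.000338, nan],
    "lclq_type": [
        "COLOCATED_SIGNIFICANT",
        "ISOLATED_SIGNIFICANT",
        "ISOLATED_SIGNIFICANT",
        "COLOCATED_SIGNIFICANT",
        "COLOCATED_SIGNIFICANT",
    ],
})

# Keep only the observations classified as significantly colocated.
colocated = result[result["lclq_type"] == "COLOCATED_SIGNIFICANT"]
colocated_rows = colocated["row_index"].tolist()

# Count the observations per colocation type.
type_counts = result["lclq_type"].value_counts()

print(colocated_rows)
print(type_counts)
```

The `row_index` values of the selected rows can then be used to look up the corresponding observations in the data containing the category of interest.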