Spatial Colocation Analysis
Spatial colocation measures and analyzes relationships between point features of different classes from the same spatial layer and, most often, from different spatial layers.
A typical example is determining whether different restaurants, such as McDonald's and Chipotle, are colocated. Further analysis can identify whether McDonald's restaurants are colocated with metro stations and shopping centers, and how their locations relate to population density and income levels. These insights help companies with site selection, site optimization, and cost reduction.
Colocation analysis is a tool that measures proximity patterns between two categories of point features, A and B, using the Local Colocation Quotient (LCLQ) statistic. For each feature of the Category of Interest (category A), it calculates its LCLQ score.
- Points of category A with an LCLQ score greater than one are more likely (than random) to have points of category B within their neighborhood.
- Points of category A with an LCLQ score less than one are less likely (than random) to have points of category B within their neighborhood.
- An LCLQ score equal to one indicates that the proportion of categories within the point's neighborhood matches the proportion of the categories throughout the entire study area.
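The interpretation above can be made concrete with a small NumPy sketch. It assumes the commonly cited formulation of the local colocation quotient (the kernel-weighted share of category B among a point's k nearest neighbors, divided by the global share N_B / (N - 1)) with a Gaussian kernel and an adaptive bandwidth. The function name `local_colocation_quotient` and its arguments are illustrative and are not part of the Oracle Spatial AI API, whose internal computation may differ.

```python
import numpy as np

def local_colocation_quotient(coords, labels, cat_a, cat_b, k=5):
    """LCLQ of every category-A point with respect to category B (illustrative)."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    n = len(coords)
    # Share of category B expected around any point if labels were spread evenly.
    global_share = np.sum(labels == cat_b) / (n - 1)
    scores = {}
    for i in np.flatnonzero(labels == cat_a):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        dist[i] = np.inf  # a point is never its own neighbor
        nearest = np.argsort(dist)[:k]
        # Adaptive bandwidth: distance to the k-th nearest neighbor
        # (assumes no duplicate coordinates, so the bandwidth is non-zero).
        bandwidth = dist[nearest].max()
        weights = np.exp(-0.5 * (dist[nearest] / bandwidth) ** 2)
        local_share = weights[labels[nearest] == cat_b].sum() / weights.sum()
        scores[int(i)] = local_share / global_share
    return scores

# One A point next to two B points, and three A points far away from any B.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 11), (11, 10)]
labels = ["A", "B", "B", "A", "A", "A"]
scores = local_colocation_quotient(coords, labels, "A", "B", k=2)
print(scores)
```

In this toy layout, the A point at the origin scores above one because both of its neighbors belong to category B, while the three distant A points score zero because none of their neighbors do.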
The LCLQ score indicates if a feature point is colocated, isolated, or undefined. The following table describes the possible scenarios.
LCLQ Type | Description |
---|---|
Colocated - Significant | The LCLQ score is greater than 1 with either a p-value less than 0.05 or a p-value of None. |
Colocated - Not Significant | The LCLQ score is greater than 1 with a p-value equal to or greater than 0.05. |
Isolated - Significant | The LCLQ score is equal to or less than 1 with either a p-value less than 0.05 or a p-value of None. |
Isolated - Not Significant | The LCLQ score is equal to or less than 1 with a p-value equal to or greater than 0.05. |
Undefined | The feature point did not have any neighbors from the neighboring category. |
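The rules in the table can be written as a small decision function. This is a sketch only: the function name `classify_lclq` and the `has_category_neighbors` flag are hypothetical, and the labels `COLOCATED_NOT_SIGNIFICANT`, `ISOLATED_NOT_SIGNIFICANT`, and `UNDEFINED` are assumed by analogy with the two labels that appear in the sample output of this section.

```python
import math

def classify_lclq(lclq, p_value=None, has_category_neighbors=True, alpha=0.05):
    """Map an LCLQ score and its p-value to a colocation type (illustrative)."""
    if not has_category_neighbors:
        # No neighbors from the neighboring category were found.
        return "UNDEFINED"
    # A missing p-value (None or NaN) is treated like a significant result,
    # as stated in the table.
    missing = p_value is None or math.isnan(p_value)
    significant = missing or p_value < alpha
    prefix = "COLOCATED" if lclq > 1 else "ISOLATED"
    return f"{prefix}_SIGNIFICANT" if significant else f"{prefix}_NOT_SIGNIFICANT"

print(classify_lclq(1.054835, float("nan")))
print(classify_lclq(0.948375, 0.000338))
print(classify_lclq(1.2, 0.2))
```

Note that a score of exactly 1 falls on the isolated side, because the table defines the isolated types as "equal to or less than 1".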
The colocation relationship is not symmetric. The LCLQ scores calculated when comparing category A to category B generally differ from the LCLQ scores calculated when comparing category B to category A.
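A toy example illustrates the asymmetry. The sketch below uses a simplified, unweighted quotient (the plain share of the target category among the k nearest neighbors, divided by its global share) rather than the kernel-weighted statistic, and the helper `knn_quotients` is hypothetical. One A point sits inside a small group of B points, while a second group of B points lies far from any A point.

```python
import numpy as np

def knn_quotients(coords, labels, source, target, k):
    """Unweighted local quotients of every `source` point toward `target`."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    global_share = np.sum(labels == target) / (len(labels) - 1)
    quotients = []
    for i in np.flatnonzero(labels == source):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        dist[i] = np.inf  # exclude the point itself
        nearest = np.argsort(dist)[:k]
        quotients.append(np.mean(labels[nearest] == target) / global_share)
    return np.array(quotients)

coords = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A"] + ["B"] * 8

a_to_b = knn_quotients(coords, labels, "A", "B", k=2)
b_to_a = knn_quotients(coords, labels, "B", "A", k=2)
print(a_to_b)  # quotient of the single A point toward B
print(b_to_a)  # quotients of the eight B points toward A
```

From the A point's perspective the neighborhood looks like the study area as a whole, so its quotient is 1. From the perspective of the B points, the four near the A point are strongly colocated with A, and the four distant ones are not colocated with A at all.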
Some of the parameters required to execute colocation analysis are described in the following table.
Parameters | Description |
---|---|
feature_data | Data with point features. |
spatial_weights_definition | Defines the relationship between neighboring locations. It is required to retrieve information from the neighbors. |
interest_category | Two values that indicate the field and value of the category of interest. If the interest_category and neighbor_category parameters are defined, then the colocation analysis is executed using these values from the data specified in feature_data. |
neighbor_category | Two values that indicate the field and value of the neighboring category. |
neighbor_feature_data | If defined, the interest_category and neighbor_category parameters are ignored. The category of interest is the point features in feature_data, while the other category is the point features defined in this parameter. |
n_permutations | The number of permutations used to calculate the significance level of the colocation quotient scores. If None, then the significance level is not computed, and None is returned. Increasing the number of permutations also increases the processing time. |
is_time_window_analysis | A Boolean parameter indicating if time-window analysis is required. |
interest_time_window | Indicates the field, start-time, and end-time of the category of interest. |
neighbor_time_window | Indicates the field, start-time, and end-time of the neighboring category. |
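To show why the number of permutations affects both the significance level and the processing time, the following sketch runs a generic label-permutation test for a single point. It is not the procedure used by `spatial_colocation_analysis` (which also reports a t-statistic and may compute significance differently). The function `permutation_p_value`, its arguments, and the one-sided pseudo p-value are assumptions made for illustration.

```python
import numpy as np

def permutation_p_value(coords, labels, index, target, k=3,
                        n_permutations=99, seed=0):
    """One-sided pseudo p-value for the share of `target` around one point."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(coords - coords[index], axis=1)
    dist[index] = np.inf  # exclude the point itself
    nearest = np.argsort(dist)[:k]
    others = np.delete(np.arange(len(labels)), index)

    observed = np.mean(labels[nearest] == target)
    at_least_as_extreme = 0
    for _ in range(n_permutations):  # cost grows linearly with n_permutations
        shuffled = labels.copy()
        # Reassign the labels of all other points; the focal point keeps its own.
        shuffled[others] = rng.permutation(labels[others])
        if np.mean(shuffled[nearest] == target) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# One A point surrounded by three B points, plus a distant 5 x 5 grid of A points.
coords = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0)]
coords += [(5 + i, 5 + j) for i in range(5) for j in range(5)]
labels = ["A"] + ["B"] * 3 + ["A"] * 25

p_near = permutation_p_value(coords, labels, index=0, target="B")
p_far = permutation_p_value(coords, labels, index=4, target="B")
print(p_near, p_far)
```

With this scheme the smallest attainable p-value is 1 / (n_permutations + 1), so more permutations give a finer significance level at the cost of more computation.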
The result of the colocation analysis consists of a Pandas DataFrame with the following columns:
Column | Description |
---|---|
row_index | The observation’s index in the DataFrame containing the category of interest. |
lclq | The LCLQ score. |
t_stat | The t-statistic associated with the LCLQ score. |
p_value | The p-value associated with the LCLQ score, indicating the significance level of the LCLQ score. |
lclq_type | The colocation type (as described in the earlier table). |
See the spatial_colocation_analysis function in Python API Reference for Oracle Spatial AI for more information.
The following example uses the schools SpatialDataFrame and splits the data into two instances of SpatialDataFrame, X and Y, which represent two different classes. It then executes a colocation analysis between the two classes. The spatial weights must be defined because colocation analysis uses neighboring locations to calculate the LCLQ scores.
```python
from oraclesai.weights import KernelBasedWeightsDefinition
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.analysis import spatial_colocation_analysis

# Split the data to create two different classes.
X, Y, _, _, _, _ = spatial_train_test_split(schools, y="", test_size=0.3, random_state=32)

# Define spatial weights
spatial_weights_definition = KernelBasedWeightsDefinition(k=25, fixed=False, function="gaussian")

# Execute colocation analysis between the two classes
colocation_analysis = spatial_colocation_analysis(X, spatial_weights_definition, neighbor_feature_data=Y, n_permutations=20)

# Print the result
print(colocation_analysis[:10])
```
The preceding code prints the results of the colocation analysis for the first ten observations in X, including the LCLQ score and its significance level.
```
   row_index      lclq     t_stat   p_value              lclq_type
0          2  1.054835        NaN       NaN  COLOCATED_SIGNIFICANT
1          3  0.948375   4.358899  0.000338   ISOLATED_SIGNIFICANT
2          4  0.944819   4.358899  0.000338   ISOLATED_SIGNIFICANT
3          6  1.113996   4.358899  0.000338  COLOCATED_SIGNIFICANT
4          7  1.013247        NaN       NaN  COLOCATED_SIGNIFICANT
5          9  0.999595  -4.358899  0.000338   ISOLATED_SIGNIFICANT
6         10  1.078993  -4.358899  0.000338  COLOCATED_SIGNIFICANT
7         11  1.012721        NaN       NaN  COLOCATED_SIGNIFICANT
8         14  0.978001   4.358899  0.000338   ISOLATED_SIGNIFICANT
9         20  0.961042  -4.358899  0.000338   ISOLATED_SIGNIFICANT
```
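Because the result is a regular DataFrame, it can be summarized with standard pandas operations. The sketch below rebuilds the first four rows of the output above by hand so that it runs without the Oracle Spatial AI library. With a real result, the same filters would presumably apply directly to the returned DataFrame.

```python
import pandas as pd

# First four rows of the printed result, re-entered manually for illustration.
result = pd.DataFrame({
    "row_index": [2, 3, 4, 6],
    "lclq": [1.054835, 0.948375, 0.944819, 1.113996],
    "t_stat": [None, 4.358899, 4.358899, 4.358899],
    "p_value": [None, 0.000338, 0.000338, 0.000338],
    "lclq_type": [
        "COLOCATED_SIGNIFICANT",
        "ISOLATED_SIGNIFICANT",
        "ISOLATED_SIGNIFICANT",
        "COLOCATED_SIGNIFICANT",
    ],
})

# Number of observations per colocation type.
type_counts = result["lclq_type"].value_counts().to_dict()

# Indices of the observations classified as significantly colocated.
colocated_rows = result.loc[
    result["lclq_type"] == "COLOCATED_SIGNIFICANT", "row_index"
].tolist()

# Observations for which no p-value was reported.
missing_p_values = int(result["p_value"].isna().sum())

print(type_counts)
print(colocated_rows)
print(missing_p_values)
```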