LISA Hotspot
Local Indicators of Spatial Association (LISAs) are widely used to identify geographical clusters and geographical outliers. This clustering approach is called LISA Hotspot clustering.
Possible use cases for this spatial clustering include finding crime hot spots to help police make staffing and patrolling decisions, and identifying patterns of car accidents or pedestrian deaths to help optimize the placement of traffic lights and the design of road networks.
The LISA Hotspot clustering algorithm performs a local spatial autocorrelation analysis, summarizing the co-variation between each observation and its immediate surroundings. It identifies areas of high values (hot spots) and areas of low values (cold spots). Each region is assigned one of four labels, one per quadrant:
- HH (High-High). A high value surrounded by high values.
- LH (Low-High). A low value surrounded by high values.
- LL (Low-Low). A low value surrounded by low values.
- HL (High-Low). A high value surrounded by low values.
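The quadrant of an observation depends on two signs: whether its own value is above or below the mean, and whether the weighted average of its neighbors' values (its spatial lag) is above or below the mean. The following is a minimal NumPy sketch of that rule, assuming a dense, row-standardized weights matrix. The helper lisa_quadrants is hypothetical and not part of the Oracle Spatial AI API; it uses the same 1 to 4 codes as labels_, and it treats a value exactly equal to the mean as low.

```python
import numpy as np


def lisa_quadrants(values, weights):
    """Return the quadrant code of each observation (hypothetical helper).

    values  : 1-D array of observations (for example, median income).
    weights : (n, n) row-standardized spatial weights matrix, where
              weights[i, j] > 0 when j is a neighbor of i.
    Codes: 1 = HH, 2 = LH, 3 = LL, 4 = HL.
    """
    z = (values - values.mean()) / values.std()  # standardized values
    lag = weights @ z                            # weighted average of neighbors' z
    high_self = z > 0
    high_neighbors = lag > 0
    quadrants = np.empty(len(z), dtype=int)
    quadrants[high_self & high_neighbors] = 1    # HH: high value, high neighbors
    quadrants[~high_self & high_neighbors] = 2   # LH: low value, high neighbors
    quadrants[~high_self & ~high_neighbors] = 3  # LL: low value, low neighbors
    quadrants[high_self & ~high_neighbors] = 4   # HL: high value, low neighbors
    return quadrants


# Six locations on a line; each neighbors the location(s) next to it.
values = np.array([10.0, 9.0, 2.0, 1.0, 8.0, 1.0])
adjacency = np.eye(6, k=1) + np.eye(6, k=-1)
weights = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardize
print(lisa_quadrants(values, weights))  # [1 1 3 3 4 2]
```

In this toy data, the two high values at the start form an HH pair, the low values in the middle form an LL pair, and the last two locations are outliers.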
The LISA Hotspot clustering algorithm computes a local Moran’s I (that is, LISA) for each location.
- A location with a positive local Moran’s I statistic indicates the presence of neighbors with similar values (either high or low values), representing hot or cold spots.
- A location with a negative local Moran’s I statistic indicates neighbor locations with different values: either a high value surrounded by low values or a low value surrounded by high values, representing spatial outliers.
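The sign rule above follows from how the statistic is built. In its usual form, I_i is the product of the observation's standardized value and the weighted sum of its neighbors' standardized values, so it is positive when both share a sign and negative otherwise. This is a minimal sketch, assuming a dense, row-standardized NumPy weights matrix; local_morans_i is a hypothetical helper, and libraries may scale I_i by a constant such as (n - 1)/n, which does not change its sign.

```python
import numpy as np


def local_morans_i(values, weights):
    """Local Moran's I for each observation (hypothetical helper).

    I_i = z_i * sum_j w_ij * z_j, where z is standardized (mean 0, variance 1)
    and weights is a row-standardized (n, n) spatial weights matrix.
    """
    z = (values - values.mean()) / values.std()
    return z * (weights @ z)


# Six locations on a line; each neighbors the location(s) next to it.
values = np.array([10.0, 9.0, 2.0, 1.0, 8.0, 1.0])
adjacency = np.eye(6, k=1) + np.eye(6, k=-1)
weights = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardize

local_i = local_morans_i(values, weights)
print(np.sign(local_i))  # positive: similar neighbors; negative: spatial outliers
```

The first four locations (HH and LL) get a positive statistic, and the last two (HL and LH) get a negative one.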
The LISAHotspotClustering class implements LISA Hotspot clustering. The following table describes its parameters.
| Parameter | Description |
|---|---|
| column | Identifies the column whose data the algorithm uses to run the LISA Hotspot clustering method. If the column is not defined, the algorithm expects a dataset with a single column. |
| spatial_weights_definition | Defines the relationship between neighboring locations. It is required to retrieve information from the neighbors. |
| max_p_value | Only regions with a p-value below this threshold are labeled. Regions with a p-value equal to or greater than the threshold are assigned the label -1. |
| supported_quadrants | A list of quadrants whose observations are labeled. Values indicate quadrant location: 1 (High-High), 2 (Low-High), 3 (Low-Low), 4 (High-Low). The remaining observations are assigned the label -1. |
| seed | Seeds the conditional randomization, ensuring reproducible results. |
| n_jobs | The maximum number of concurrently running jobs. |
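The interplay of max_p_value, supported_quadrants, and seed can be illustrated with a small NumPy sketch. It computes pseudo p-values by conditional randomization: each observation's own value is held fixed while its neighbors' values are replaced by values drawn at random from the remaining observations, and the p-value is the share of simulated statistics at least as extreme as the observed one. The function lisa_labels, its permutations argument, and the direction-based counting rule are assumptions for illustration, not the Oracle Spatial AI implementation.

```python
import numpy as np


def lisa_labels(values, weights, max_p_value=0.05, supported_quadrants=(1, 2, 3, 4),
                permutations=999, seed=None):
    """Hypothetical sketch of how the parameters interact (not the Oracle code).

    Returns (labels, local_i, p_values). labels holds quadrant codes 1-4, or -1
    where p >= max_p_value or the quadrant is not in supported_quadrants.
    """
    rng = np.random.default_rng(seed)  # same seed -> same random draws
    z = (values - values.mean()) / values.std()
    lag = weights @ z
    local_i = z * lag
    # 1 = HH, 4 = HL for high z; 2 = LH, 3 = LL for low z
    quadrants = np.where(z > 0, np.where(lag > 0, 1, 4), np.where(lag > 0, 2, 3))

    p_values = np.empty(len(z))
    for i in range(len(z)):
        neighbors = np.flatnonzero(weights[i])
        others = np.delete(z, i)  # z_i stays fixed (conditional randomization)
        simulated = np.array([
            z[i] * (weights[i, neighbors]
                    @ rng.choice(others, size=len(neighbors), replace=False))
            for _ in range(permutations)
        ])
        # Count simulated statistics at least as extreme, in the observed direction.
        if local_i[i] >= 0:
            extreme = np.sum(simulated >= local_i[i])
        else:
            extreme = np.sum(simulated <= local_i[i])
        p_values[i] = (extreme + 1) / (permutations + 1)

    keep = (p_values < max_p_value) & np.isin(quadrants, supported_quadrants)
    labels = np.where(keep, quadrants, -1)
    return labels, local_i, p_values


# Six locations on a line; each neighbors the location(s) next to it.
values = np.array([10.0, 9.0, 2.0, 1.0, 8.0, 1.0])
adjacency = np.eye(6, k=1) + np.eye(6, k=-1)
weights = adjacency / adjacency.sum(axis=1, keepdims=True)
labels, local_i, p_values = lisa_labels(values, weights, max_p_value=0.05,
                                        supported_quadrants=[1, 3], seed=42)
print(labels, p_values.round(3))
```

With so few observations, toy data like this mostly yields -1 labels. The smallest attainable pseudo p-value is 1/(permutations + 1), so the number of permutations also bounds how strict max_p_value can usefully be.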
Once an instance of LISAHotspotClustering is created, the clustering
algorithm is executed by calling the fit method. The label assigned to
each observation can be retrieved with the labels_ property. The
following table describes the main properties available from this class.
| Property | Description |
|---|---|
| labels_ | The label assigned to each observation. Labels indicate quadrant location: 1 (High-High), 2 (Low-High), 3 (Low-Low), 4 (High-Low). Depending on the parameters passed to the object (max_p_value, supported_quadrants), observations can have the label -1. |
| regions_ | A dictionary representing observations with the same label that are geographically connected according to the spatial weights. |
| Is | The Local Moran’s I of each observation. |
| ps | The p-value of the Local Moran’s I of each observation. |
See the LISAHotspotClustering class in the Python API Reference for Oracle Spatial AI for more information.
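The regions_ property groups same-label observations that are connected through the spatial weights. One way to compute such groups is a breadth-first search that only follows neighbor links between observations sharing a label, as in the sketch below. It assumes a dense NumPy weights matrix whose nonzero pattern is symmetric. The function name, the {region_id: (label, indices)} output format, and the choice to skip observations labeled -1 are all hypothetical; the actual layout of regions_ may differ.

```python
from collections import deque

import numpy as np


def same_label_regions(labels, weights):
    """Connected components of the weights graph, restricted to links between
    observations with the same label (hypothetical helper).

    Returns {region_id: (label, sorted observation indices)}; -1 is skipped.
    """
    n = len(labels)
    seen = np.zeros(n, dtype=bool)
    regions = {}
    for start in range(n):
        if seen[start] or labels[start] == -1:
            continue
        seen[start] = True
        members, queue = [], deque([start])
        while queue:  # breadth-first search over same-label neighbors
            i = queue.popleft()
            members.append(int(i))
            for j in np.flatnonzero(weights[i]):
                if not seen[j] and labels[j] == labels[start]:
                    seen[j] = True
                    queue.append(j)
        regions[len(regions)] = (int(labels[start]), sorted(members))
    return regions


# Six locations on a line; location 2 is unlabeled (-1).
labels = np.array([1, 1, -1, 3, 3, 1])
adjacency = np.eye(6, k=1) + np.eye(6, k=-1)
print(same_label_regions(labels, adjacency))
# {0: (1, [0, 1]), 1: (3, [3, 4]), 2: (1, [5])}
```

Location 5 has label 1 but is not adjacent to locations 0 and 1, so it forms its own region.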
The following example executes a hot spot analysis over the MEDIAN_INCOME column of the block_groups SpatialDataFrame. It does not require defining a target variable when passing the training data to the fit method. The labels_ property returns the label assigned to each observation, with hot spots marked with 1, cold spots with 3, and outliers represented by 2 and 4. Because max_p_value is set to 0.05, observations with a p-value of 0.05 or greater are labeled -1.
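```python
from oraclesai.weights import DistanceBandWeightsDefinition
from oraclesai.clustering import LISAHotspotClustering

# Assumes block_groups is a SpatialDataFrame that has already been loaded.
# Select the variables and project the data to EPSG:3857 (units in meters).
X = block_groups[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'geometry']].to_crs('epsg:3857')

# Create an instance of LISAHotspotClustering: label only observations with a
# p-value below 0.05, using neighbors within a distance band of 2500 (CRS units).
lisa_model = LISAHotspotClustering(column="MEDIAN_INCOME",
                                   max_p_value=0.05,
                                   spatial_weights_definition=DistanceBandWeightsDefinition(threshold=2500))

# Train the model (no target variable is needed)
lisa_model.fit(X)

# Print the labels of the first ten observations
print(f"labels = {lisa_model.labels_[:10]}")
```

The program prints the labels of the first ten observations.

```
labels = [ 2 2 1 1 1 1 -1 -1 -1 -1]
```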
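For a quick overview of the result, the label codes can be counted and mapped to their quadrant names. This small sketch uses the ten labels printed above as hard-coded input; with a fitted model, you would pass lisa_model.labels_ instead. The LABEL_NAMES mapping and the summarize_labels function are illustrative names, not part of the library.

```python
import numpy as np

# Label codes as documented for labels_; -1 marks unlabeled observations.
LABEL_NAMES = {1: "HH (hot spot)", 2: "LH (outlier)", 3: "LL (cold spot)",
               4: "HL (outlier)", -1: "not labeled"}


def summarize_labels(labels):
    """Count how many observations received each label code."""
    codes, counts = np.unique(np.asarray(labels), return_counts=True)
    return {LABEL_NAMES[int(c)]: int(n) for c, n in zip(codes, counts)}


# The ten labels printed above
print(summarize_labels([2, 2, 1, 1, 1, 1, -1, -1, -1, -1]))
# {'not labeled': 4, 'HH (hot spot)': 4, 'LH (outlier)': 2}
```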