LISA Hotspot

Local Indicators of Spatial Association (LISAs) are widely used to identify geographical clusters and geographical outliers. This clustering approach is called LISA Hotspot clustering.

Possible use cases for this spatial clustering include finding crime hot spots to help police make staffing and patrolling decisions, and identifying patterns of car accidents or pedestrian deaths to help optimize the placement of traffic lights and the layout of road networks.

The LISA Hotspot clustering algorithm performs a local autocorrelation analysis that summarizes the co-variation between each observation and its immediate surroundings. It identifies areas of high values (hot spots) and areas of low values (cold spots). Each region is assigned one of four labels, one for each quadrant:

  1. HH (High-High). A high value surrounded by high values.
  2. LH (Low-High). A low value surrounded by high values.
  3. LL (Low-Low). A low value surrounded by low values.
  4. HL (High-Low). A high value surrounded by low values.
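
The quadrant assignment can be illustrated with a minimal NumPy sketch. It assumes a binary adjacency matrix in which every location has at least one neighbor, and it compares each value, and the average of its neighbors, with the global mean. The function name quadrant_labels and the rule that ties count as "low" are assumptions made for illustration; they are not taken from the oraclesai API.

```python
import numpy as np

def quadrant_labels(values, weights):
    """Return 1 (HH), 2 (LH), 3 (LL), or 4 (HL) for each location."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = values - values.mean()                 # deviation from the global mean
    lag = (weights @ z) / weights.sum(axis=1)  # average deviation of the neighbors
    labels = np.empty(len(values), dtype=int)
    labels[(z > 0) & (lag > 0)] = 1    # high value, high neighbors
    labels[(z <= 0) & (lag > 0)] = 2   # low value, high neighbors
    labels[(z <= 0) & (lag <= 0)] = 3  # low value, low neighbors
    labels[(z > 0) & (lag <= 0)] = 4   # high value, low neighbors
    return labels

# Hypothetical data: six locations on a line, each adjacent to the next one.
values = [9, 2, 9, 9, 1, 1]
weights = np.eye(6, k=1) + np.eye(6, k=-1)
print(quadrant_labels(values, weights).tolist())  # [4, 2, 1, 4, 3, 3]
```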

The LISA Hotspot clustering algorithm computes a local Moran’s I (that is, LISA) for each location.

  • A positive local Moran’s I statistic indicates that a location has neighbors with similar values (either high or low), representing hot or cold spots.
  • A negative local Moran’s I statistic indicates that a location has neighbors with different values: either a high value surrounded by low values or a low value surrounded by high values. These locations are spatial outliers.
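
To make the sign interpretation concrete, the following sketch computes one common formulation of the local Moran’s I: the deviation of each value from the mean, multiplied by the row-standardized average deviation of its neighbors, divided by the variance. The exact scaling used by the library may differ, and the function name local_morans_i and the sample data are hypothetical.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I per location, using row-standardized weights."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = values - values.mean()                 # deviation from the global mean
    lag = (weights @ z) / weights.sum(axis=1)  # neighbor average (assumes no isolated locations)
    m2 = np.mean(z ** 2)                       # variance of the values
    return z * lag / m2

# Hypothetical data: six locations on a line, each adjacent to the next one.
values = [9, 2, 9, 9, 1, 1]
weights = np.eye(6, k=1) + np.eye(6, k=-1)
local_i = local_morans_i(values, weights)

# Positive entries have similar neighbors; negative entries are outliers.
print(np.sign(local_i).astype(int).tolist())  # [-1, -1, 1, -1, 1, 1]
```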

The LISAHotspotClustering class implements LISA Hotspot clustering. The following table describes its parameters.

Parameter                   Description
column                      Indicates the column whose data the algorithm uses to run the LISA Hotspot clustering method. If the column is not defined, the algorithm expects a dataset with a single column.
spatial_weights_definition  Defines the relationship between neighboring locations. It is required to retrieve information from the neighbors.
max_p_value                 The significance threshold. Regions with a p-value below this threshold keep their quadrant label. Regions with a p-value equal to or greater than this threshold are assigned the label -1.
supported_quadrants         A list indicating that only observations from these quadrants are labeled. Values indicate quadrant location: 1 (High-High), 2 (Low-High), 3 (Low-Low), 4 (High-Low). The remaining observations are assigned the label -1.
seed                        Ensures reproducibility of conditional randomization.
n_jobs                      The maximum number of concurrently running jobs.
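
The interaction between max_p_value, supported_quadrants, and seed can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function lisa_labels, its permutations argument, and the one-sided pseudo p-value are hypothetical choices. Conditional randomization here means holding the value at one location fixed while shuffling the remaining values among the other locations.

```python
import numpy as np

def lisa_labels(values, weights, max_p_value=0.05,
                supported_quadrants=(1, 2, 3, 4), permutations=999, seed=None):
    """Quadrant labels filtered by a permutation-based pseudo p-value."""
    rng = np.random.default_rng(seed)  # a fixed seed makes the shuffles repeatable
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # assumes every location has a neighbor
    lag = w @ z
    observed = z * lag  # unscaled local statistic (scaling does not change the ranking)
    quadrant = np.select(
        [(z > 0) & (lag > 0), (z <= 0) & (lag > 0), (z <= 0) & (lag <= 0)],
        [1, 2, 3], default=4)
    p_values = np.empty(len(z))
    for i in range(len(z)):
        others = np.delete(z, i)    # values that get shuffled
        w_i = np.delete(w[i], i)    # weights from i to every other location
        sims = np.array([z[i] * (w_i @ rng.permutation(others))
                         for _ in range(permutations)])
        # Count simulated statistics at least as extreme as the observed one.
        if observed[i] >= 0:
            extreme = np.sum(sims >= observed[i])
        else:
            extreme = np.sum(sims <= observed[i])
        p_values[i] = (extreme + 1) / (permutations + 1)
    keep = (p_values < max_p_value) & np.isin(quadrant, supported_quadrants)
    return np.where(keep, quadrant, -1), p_values

# Hypothetical data: 20 locations on a line, high values first, low values last.
values = np.r_[np.linspace(8, 10, 10), np.linspace(1, 3, 10)]
weights = np.eye(20, k=1) + np.eye(20, k=-1)
labels, p_values = lisa_labels(values, weights, permutations=199, seed=42)
print(labels)
```

Calling the function twice with the same seed returns identical labels, and passing supported_quadrants=[1] leaves only hot spots labeled; every other observation receives -1.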

Once an instance of LISAHotspotClustering is created, the clustering algorithm is executed by calling the fit method. The label assigned to each observation can be retrieved with the labels_ property. The following table describes the main properties available from this class.

Property  Description
labels_   The label assigned to each observation. Labels indicate quadrant location: 1 (High-High), 2 (Low-High), 3 (Low-Low), 4 (High-Low). Depending on the parameters passed to the object, observations can have the label -1.
regions_  A dictionary representing observations with the same label that are geographically connected according to the spatial weights.
Is        The local Moran’s I of each observation.
ps        The p-value of the local Moran’s I of each observation.
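
The idea behind regions_ (observations that share a label and are connected through the spatial weights) can be sketched as a connected-components search. The dictionary layout, the function name connected_regions, and the choice to skip observations labeled -1 are assumptions made for this illustration; the actual structure of regions_ is described in the API reference.

```python
from collections import deque

def connected_regions(labels, neighbors):
    """Group observations that share a label and are linked through neighbors."""
    regions = {}
    visited = set()
    for start, label in enumerate(labels):
        if label == -1 or start in visited:
            continue
        # Breadth-first search restricted to neighbors with the same label.
        queue = deque([start])
        visited.add(start)
        members = []
        while queue:
            current = queue.popleft()
            members.append(current)
            for other in neighbors.get(current, ()):
                if other not in visited and labels[other] == label:
                    visited.add(other)
                    queue.append(other)
        regions[len(regions)] = {"label": label, "members": sorted(members)}
    return regions

# Hypothetical data: six locations on a line, each adjacent to the next one.
labels = [1, 1, 3, 1, 1, -1]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(labels)]
             for i in range(len(labels))}
regions = connected_regions(labels, neighbors)
print(regions)
```

In this sample the two groups of hot spots are reported as separate regions because a cold spot lies between them.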

See the LISAHotspotClustering class in Python API Reference for Oracle Spatial AI for more information.

The following example executes a hot spot analysis over the MEDIAN_INCOME column of the block_groups SpatialDataFrame. It does not require defining a target variable when passing the training data to the fit method. The labels_ property returns the label assigned to each observation, with hot spots marked with 1, cold spots with 3, and outliers represented by 2 and 4.

from oraclesai.weights import DistanceBandWeightsDefinition 
from oraclesai.clustering import LISAHotspotClustering 

# Define variables and CRS 
X = block_groups[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'geometry']].to_crs('epsg:3857') 

# Create an instance of LISAHotspotClustering
lisa_model = LISAHotspotClustering(
    column="MEDIAN_INCOME",
    max_p_value=0.05,
    spatial_weights_definition=DistanceBandWeightsDefinition(threshold=2500))

# Train the model
lisa_model.fit(X) 

# Print the labels
print(f"labels = {lisa_model.labels_[:10]}")

The program prints the labels of the first ten observations.

labels = [ 2  2  1  1  1  1 -1 -1 -1 -1]
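
As a possible follow-up, the labels can be summarized by quadrant. The sketch below copies the ten labels printed above into a NumPy array instead of reading them from the model; the QUADRANT_NAMES mapping is a hypothetical helper, not part of the library.

```python
import numpy as np

# Hypothetical helper that maps each label to a readable description.
QUADRANT_NAMES = {
    1: "HH (hot spot)",
    2: "LH (outlier)",
    3: "LL (cold spot)",
    4: "HL (outlier)",
    -1: "unlabeled (-1)",
}

# The ten labels printed by the example above.
labels = np.array([2, 2, 1, 1, 1, 1, -1, -1, -1, -1])

# Count how many observations fall under each label.
unique_labels, counts = np.unique(labels, return_counts=True)
summary = {QUADRANT_NAMES[int(label)]: int(count)
           for label, count in zip(unique_labels, counts)}
print(summary)
# {'unlabeled (-1)': 4, 'HH (hot spot)': 4, 'LH (outlier)': 2}
```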