Local Outlier Factor

Local Outlier Factor (LOF) computes a score for each observation that represents the local deviation of the density of that observation with respect to its neighbors.

The LOF score depends on how isolated an observation is with respect to its surrounding neighborhood. The larger the LOF score, the more isolated the observation.

Using the k-nearest neighbors, the algorithm compares the local density of a sample to the local densities of its neighbors. Those samples with a significantly lower density than their neighbors are considered outliers.

In a spatial context, the LOF score helps to identify geographically isolated samples. For example, when analyzing locations with a high concentration of car accidents, isolated accident locations can be labeled as outliers and targeted for further examination.

The LOF method consists of the following steps (see [2] for more details):

  1. Define a method to measure the distance between two observations, according to either features or geography. If spatial weights are defined, then the distance comes from the corresponding weight, except for binary weights, where the distance is calculated based on the geometries.
  2. Define a method that computes the k-distance of an observation. The k-distance is the distance to the farthest neighbor of the observation, which is the k-th nearest neighbor when KNN is used. If spatial weights are defined, then the neighboring observations are obtained according to the spatial weights, and the distance between two observations comes from the corresponding weight, except for binary weights, where the distance is calculated based on their geometries.
  3. Define a method to compute the reachability-distance between two observations.


    [Illustration: compute_reachability_dist.png]

  4. Compute the Local Reachability Density (LRD) of each observation according to the following formula, where Nbi represents the neighbors of the i-th observation. If spatial weights are defined, then the neighbors of the i-th observation come from the spatial weights. Otherwise, the algorithm uses the k-nearest neighbors method.


    [Illustration: local_reachability_distance.png]

  5. Compute the LOF score of each observation.


    [Illustration: lof_score.png]
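The five steps above can be sketched in plain Python. This is a minimal illustration, not the oraclesai implementation. It assumes Euclidean distance between feature vectors, k-nearest neighbors instead of spatial weights, and the standard LOF definitions from the original LOF paper, which the illustrations are assumed to match: reach-dist(a, b) = max(k-distance(b), d(a, b)), LRD(i) is the inverse of the mean reachability distance from i to its neighbors, and LOF(i) is the mean of LRD(j) / LRD(i) over the neighbors j of i. The function name lof_scores and the sample points are hypothetical.

```python
import math


def lof_scores(points, k):
    """Brute-force LOF for a small list of points (illustrative sketch only).

    Simplifications: ties at the k-distance are not expanded into extra
    neighbors, and duplicate points (zero reachability distance) are not handled.
    """
    n = len(points)

    # Step 1: Euclidean distance between every pair of observations.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 2: k nearest neighbors and k-distance of each observation.
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Step 3: reachability distance from observation a to observation b.
    def reach_dist(a, b):
        return max(k_dist[b], dist[a][b])

    # Step 4: LRD is the inverse of the mean reachability distance to the neighbors.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach)

    # Step 5: LOF is the mean ratio between the neighbors' LRD and the observation's LRD.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical sample: a 3 x 3 grid of points plus one isolated point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

In this sample the isolated point at (10, 10) receives by far the largest score, while the grid points stay near 1.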

To use LocalOutlierFactor for novelty detection, set the novelty parameter to True and call the predict method to determine whether new, unseen observations are outliers.
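As an illustration of the novelty workflow, the sketch below uses the LocalOutlierFactor estimator from scikit-learn, which exposes a novelty parameter and a predict method with the same names. Treating it as a stand-in for the oraclesai class is an assumption; the oraclesai signature and its handling of spatial weights are not shown here. The training data and the two new points are synthetic.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor  # scikit-learn, used as a stand-in

# Synthetic training data: 200 points around the origin.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Two unseen observations: one inside the cloud, one far away from it.
X_new = np.array([[0.1, -0.2], [8.0, 8.0]])

# novelty=True enables predict on data that was not part of the training set.
model = LocalOutlierFactor(n_neighbors=20, novelty=True)
model.fit(X_train)

# predict returns 1 for inliers and -1 for outliers.
labels = model.predict(X_new)
print(labels)
```

With novelty enabled, the model is fitted once on reference data and then scores new observations against that fixed neighborhood structure.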

See the LocalOutlierFactor class in Python API Reference for Oracle Spatial AI for more information.

The following example uses a dataset based on reports of car accidents in a city. It contains the location and severity of each accident. The example first creates an instance of SpatialDataFrame based on a database table.

from oraclesai import SpatialDataFrame, DBSpatialDataset
import oml

accidents_pdf = SpatialDataFrame.create(DBSpatialDataset(table='chicago_accidents', schema='oml_user'))

The goal is to identify outliers in accidents where people get injured based on location. The dataset contains a categorical variable, INJURY_RATING, that indicates the severity of the car accident. The example focuses on accidents with INJURY_RATING greater than or equal to 3.

The following code uses the LocalOutlierFactor class to calculate the LOF score of each observation based on the neighboring locations defined by the spatial_weights_definition parameter.

from oraclesai.weights import KNNWeightsDefinition
from oraclesai.outliers import LocalOutlierFactor

# Get records with INJURY_RATING >= 3
accidents_injury_pdf = accidents_pdf[accidents_pdf["INJURY_RATING"] >= 3]

# Keep columns INJURY_RATING and geometry, and reproject to EPSG:3857 (a projected coordinate system)
X = accidents_injury_pdf[["INJURY_RATING", "geometry"]].to_crs("epsg:3857")

# Create an instance of the LOF model defining the spatial weights
slof_model = LocalOutlierFactor(spatial_weights_definition=KNNWeightsDefinition(k=20))

# Train the model
slof_model.fit(X)

# Get and print the LOF scores for each observation
slof_scores = -1 * slof_model.negative_outlier_factor_
print(slof_scores[:10])

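The program prints the LOF scores of the first 10 observations. Because the negative_outlier_factor_ property returns the negative of the LOF score, the code multiplies it by -1. Scores close to 1 indicate observations whose local density is similar to that of their neighbors, while scores well above 1 indicate potential outliers.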

[1.12403072 1.49269168 1.18196622 1.23728049 0.89957071 1.00487086
 1.03445893 0.98740889 1.01636585 1.00944292]
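One way to act on these scores is to rank the observations and flag those above a cutoff. The sketch below reuses the ten scores printed above. The cutoff of 1.2 is an arbitrary value chosen for illustration, not a recommendation from the library.

```python
import numpy as np

# LOF scores of the first 10 observations, copied from the output above.
slof_scores = np.array([1.12403072, 1.49269168, 1.18196622, 1.23728049, 0.89957071,
                        1.00487086, 1.03445893, 0.98740889, 1.01636585, 1.00944292])

# Hypothetical cutoff: flag observations whose score exceeds it.
threshold = 1.2
flagged = np.flatnonzero(slof_scores > threshold)

# Positions of the observations ordered from most to least isolated.
ranked = np.argsort(-slof_scores)

print(flagged)      # positions of the flagged observations
print(ranked[:3])   # positions of the three highest scores
```

A suitable cutoff depends on the dataset, so inspecting the distribution of scores before fixing a value is usually worthwhile.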