Expectation Maximization for Anomaly Detection

EM identifies anomalies based on probability density, ensuring accurate anomaly detection for better data integrity.

An object is identified as an outlier in an EM Anomaly model if its anomaly probability is greater than 0.5. A label of 1 denotes normal, while a label of 0 denotes anomaly. The EM technique models the underlying data distribution of a data set, and the probability density of a data record is translated into an anomaly probability.

The following example displays the code snippet used for anomaly detection using the Expectation Maximization algorithm. Specify the EMCS_OUTLIER_RATE setting to capture the desired rate of outliers in the training data set.


-- SET OUTLIER RATE IN SETTINGS TABLE - DEFAULT IS 0.05
--

BEGIN DBMS_DATA_MINING.DROP_MODEL('CUSTOMERS360MODEL_AD');
EXCEPTION WHEN OTHERS THEN NULL; END;
/
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')         := 'ALGO_EXPECTATION_MAXIMIZATION';
  v_setlst('PREP_AUTO')         := 'ON';
  v_setlst('EMCS_OUTLIER_RATE') := '0.1';
        
  DBMS_DATA_MINING.CREATE_MODEL2(
        MODEL_NAME          => 'CUSTOMERS360MODEL_AD',
        MINING_FUNCTION     => 'CLASSIFICATION',
        DATA_QUERY          => 'SELECT * FROM CUSTOMERS360_V',
        CASE_ID_COLUMN_NAME => 'CUST_ID',
        SET_LIST            => v_setlst,
        TARGET_COLUMN_NAME  => NULL); -- NULL target indicates anomaly detection      
END;
/