Deletions

You apply deletions during subsequent loads to reflect removals from the source (tracked or audited) or to clean up as explicitly instructed.

Deletion propagation includes:
  • Implicit Propagation: Deletions flow through the lineage to downstream datasets.
  • Explicit Prevention: Deletions are restricted from propagating.

Track Upstream Removals

TRACKDELETES: Automatically detects removals by comparing the current dataset with the original upstream copy.

Syntax:

See IMPORT Statement.

You can track removals in:
  • The imported source itself (the default behavior).
  • A named track set, that's tracked separately with a unique name.
Example:
IMPORT SOURCE SALES WITH TRACKDELETES
IMPORT SOURCE SALES WITH TRACKDELETES IN [SALESREMOVALS] AS SALES_DEL1 // Named Track Set

Note:

If the upstream system already tracks or audits deletions, you can leverage that audit to improve performance and reduce the cost of comparing and tracking.

Hard Deletions

You can permanently delete records from data warehouse datasets through propagation or explicit application.

You can apply deletions:
  • With Tracked Imported Source (default behavior): TRACKDELETES
    Example: Propagate automatically in SALES_F :
    IMPORT SOURCE SALES WITH TRACKDELETES
    DEFINE DATASET SALES_F FROM SALES END
    
  • Using a Named Tracked Set
    You can apply this to both the imported source and the dataset by using:
    • Then Delete: Explicit directive that you can apply to an Imported Source in IMPORT SOURCE statement.

      See IMPORT Statement.

      Example:
      IMPORT SOURCE DEL_SALES //delete records
      //Clean up SALES with DEL_SALES
      IMPORT SOURCE SALES THEN DELETE [DEL_SALES [SALES_ID] MATCHING [SALES_ID]] 
      
      // Delete for keys in DEL_SALES are automatically propagated to NET_SALES_F
      DEFINE DATASET NET_SALES_F FROM SALES END 
    • Delete Source: Explicit deletion that you can apply to a dataset in the Dataset Definition.

      The delete set (substraend) can be a delete named track set or can be any dataset.

      Syntax: See Generic Dataset Definition.

      Note:

      DELETESOURCE overrides all propagation for that dataset.

      Example: Subtraend from TRACKDELETES:

      IMPORT SOURCE SALES WITH TRACKDELETES IN [REMOVALS]
      DEFINE DATASET SALES_F
         ROWSOURCE SALES;
         THIS = SALES;
         DELETESOURCE REMOVALS [SALES_ID] MATCHING [SALES_ID]
      END
      

      Example: Subtraend from imported source, using DELETESOURCE in the dataset:

      IMPORT SOURCE SALES_REMOVALS //Audited from upstream, no tracking
      IMPORT SOURCE SALES
      DEFINE DATASET SALES_F
         ROWSOURCE SALES;
         THIS = SALES;
         DELETESOURCE SALES_REMOVALS [SALES_ID] MATCHING [SALES_ID];
      END
      

Soft Deletions

Instead of being physically removed, Data Augmentation Scripts flags removal records as deleted. The flag name defaults to ISDELETED if you don't specify otherwise.

Note:

  • You must filter out soft-deleted records when necessary from queries and downstream datasets.
  • Soft deletes retain data but require efficient query filtering to maintain system efficiency.

Syntax: See IMPORT Statement.

Similar to hard deletes, soft deletes utilize TRACKDELETE, THEN DELETE, and DELETESOURCE.

  • TRACKDELETE and Soft Delete

    The following are examples of soft delete on importing.

    Example: flag name not specified (defaulted) with TRACKDELETES.
    IMPORT SOURCE CUSTOMERS DELETETYPE[ SOFT ] WITH PRIMARYKEY[CUST_ID] TRACKDELETES
    DEFINE UPDATEABLE DATASET CUSTOMERS_SD_D FROM CUSTOMERS END
    
    Example: flag name provided
    IMPORT SOURCE SALES DELETETYPE[ SOFT[ISTRANDELETED] ] TRACKDELETES[ IN [THDELETE] ] 
    DEFINE UPDATEABLE DATASET CUSTOMERS_SD_D FROM CUSTOMERS END
    
  • THEN DELETE and Soft Delete
    Example: Soft delete on importing, using other imported data.
    IMPORT SOURCE SALESDEL
    IMPORT SOURCE SALES DELETETYPE[ SOFT ] THEN DELETE [ SALESDEL [SALES_ID] MATCHING [SALES_ID] ]
    DEFINE DATASET SALES_SD_F FROM SALES END
    
  • DELETESOURCE and Soft Delete

    Syntax: See Generic Dataset Definition.

    Example: Soft delete on dataset.
    IMPORT SOURCE SALESDEL
    IMPORT SOURCE SALES
    DEFINE UPDATEABLE DATASET SALES_F
        ROWSOURCE SALES;
        THIS = SALES;
        DELETETYPE[ SOFT[SALESDELETED] ];
        DELETESOURCE SALESDEL[SALE_ID] MATCHING [SALE_ID];
    END
    

Propagation Control

Propagation control consists of:

  • DISABLE DELETEPROPAGATION

    You can use disable DELETEPROPAGATION to prevent deletions from cascading to downstream datasets.

    • Default Behavior: By default, Data Augmentation Scripts applies removals to downstream datasets, unless you define propagation.
    • Disabled Propagation: Data Augmentation Scripts doesn't propagate deletions automatically, but still applies explicit deletions that use DELETESOURCE.
    Syntax:
    disable_delete_propagation ::= DISABLE DELETEPROPAGATION FOR { ALL DATASETS | DATASETS '[' table_name ']' }
    
    Example: Disable delete propagation for all datasets.
    DISABLE DELETEPROPAGATION FOR ALL DATASETS
    
    Example: Disable delete propagation for a product dimension dataset.
    DISABLE DELETEPROPAGATION FOR DATASETS[PRODUCTS_D]
    
  • REFRESH ON DELETES

    You can controls how deletions impact the dataset.

    Syntax: See Generic Dataset Definition.

    Example: In the following example, for incremental runs, only Sales deletes are considered.
    IMPORT SOURCE SALES WITH TRACKDELETES
    IMPORT SOURCE PRODUCTS
    DEFINE DATASET SALES_F
       ROWSOURCE PRODUCTS  INNER JOIN SALES  ON SALES.PROD_ID = PRODUCTS.PROD_ID;
       THIS = SALES;
       REFRESH ON DELETES IN[SALES];
       REFRESH ON UPSERTS IN[PRODUCTS]
    END
    
  • STABILITYPERIOD

    You can restrict processing data changes within a defined timeframe (or sliding window) since the previous load. You can apply the restriction to either the record’s Initial Extract Date (IED) or Last Updated Date (LUD) of upstream data.

    Syntax: See IMPORT Statement.

    The following is the change tracking behavior based on Last Updated Date (LUD):
    • If you haven't identified a LUD for the source, only changes to records created in the last n days at extract time are tracked.
    • If you've identified a LUD defined for the source, both changes since the last extract and changes to records created in the last n days at extract time are tracked.
    The following example specifies the Initial Extract Date- (IED) but not a Last Updated Date (LUD) :
    IMPORT SOURCE CUSTOMERS WITH IED [CUST_EFF_FROM] STABILITYPERIOD[30, CUST_EFF_FROM] TRACKDELETES
    DEFINE DATASET CUSTOMERS_D
       ROWSOURCE CUSTOMERS;
       THIS = CUSTOMERS;
    END
    
    The example code specifies that only changes (including deletions) in SALES from the last 30 days be processed through propagation in each load.

    Note:

    Even when you don't explicitly define the Last Updated Date (LUD) column in an IMPORT definition, if an incremental key is defined in the source metadata, Data Augmentation Scripts automatically uses it as the Last Updated Date (LUD). To override this behavior, you can use LUD[NULL].
    The following example ignores the Last Updated Date (LUD):
    IMPORT SOURCE CUSTOMERS WITH IED [CUST_EFF_FROM] LUD[null] STABILITYPERIOD[30, CUST_EFF_FROM] TRACKDELETES
    

Choose the Best Deletion Strategy

The following table maps a deletion scenario with the corresponding recommended deletion method:
Scenario Recommended Deletion Method
Data must be completely removed. Hard delete (THEN DELETE, DELETESOURCE, TRACKDELETES)
Need to retain deleted records for historical tracking. Soft delete (ISDELETED flag)
Source system doesn't track deletions. TRACKDELETES
Deletions must be explicitly provided by a named dataset. DELETESOURCE
Avoid cascading deletions in downstream datasets. DISABLE DELETEPROPAGATION
Optimize performance by applying necessary deletions on imported sources THEN DELETE