Deletions
You apply deletions during subsequent loads to reflect removals from the source (tracked or audited) or to clean up as explicitly instructed.
- Implicit Propagation: Deletions flow through the lineage to downstream datasets.
- Explicit Prevention: Deletions are restricted from propagating.
Track Upstream Removals
TRACKDELETES: Automatically detects removals by comparing the current dataset with the original upstream copy.
Syntax:
See IMPORT Statement.
- The imported source itself (the default behavior).
- A named track set, that's tracked separately with a unique name.
IMPORT SOURCE SALES WITH TRACKDELETES
IMPORT SOURCE SALES WITH TRACKDELETES IN [SALESREMOVALS] AS SALES_DEL1 // Named Track SetNote:
If the upstream system already tracks or audits deletions, you can leverage that audit to improve performance and reduce the cost of comparing and tracking.Hard Deletions
You can permanently delete records from data warehouse datasets through propagation or explicit application.
- With Tracked Imported Source (default behavior): TRACKDELETES Example: Propagate automatically in
SALES_F:IMPORT SOURCE SALES WITH TRACKDELETES DEFINE DATASET SALES_F FROM SALES END - Using a Named Tracked SetYou can apply this to both the imported source and the dataset by using:
- Then Delete: Explicit directive that you can apply to an Imported Source in IMPORT SOURCE statement.
See IMPORT Statement.
Example:IMPORT SOURCE DEL_SALES //delete records //Clean up SALES with DEL_SALES IMPORT SOURCE SALES THEN DELETE [DEL_SALES [SALES_ID] MATCHING [SALES_ID]] // Delete for keys in DEL_SALES are automatically propagated to NET_SALES_F DEFINE DATASET NET_SALES_F FROM SALES END - Delete Source: Explicit deletion that you can apply to a dataset in the Dataset Definition.
The delete set (substraend) can be a delete named track set or can be any dataset.
Syntax: See Generic Dataset Definition.
Note:
DELETESOURCE overrides all propagation for that dataset.Example: Subtraend from TRACKDELETES:
IMPORT SOURCE SALES WITH TRACKDELETES IN [REMOVALS] DEFINE DATASET SALES_F ROWSOURCE SALES; THIS = SALES; DELETESOURCE REMOVALS [SALES_ID] MATCHING [SALES_ID] ENDExample: Subtraend from imported source, using DELETESOURCE in the dataset:
IMPORT SOURCE SALES_REMOVALS //Audited from upstream, no tracking IMPORT SOURCE SALES DEFINE DATASET SALES_F ROWSOURCE SALES; THIS = SALES; DELETESOURCE SALES_REMOVALS [SALES_ID] MATCHING [SALES_ID]; END
- Then Delete: Explicit directive that you can apply to an Imported Source in IMPORT SOURCE statement.
Soft Deletions
Instead of being physically removed, Data Augmentation Scripts flags removal records as deleted. The flag name defaults to ISDELETED if you don't specify otherwise.
Note:
- You must filter out soft-deleted records when necessary from queries and downstream datasets.
- Soft deletes retain data but require efficient query filtering to maintain system efficiency.
Syntax: See IMPORT Statement.
Similar to hard deletes, soft deletes utilize TRACKDELETE, THEN DELETE, and DELETESOURCE.
- TRACKDELETE and Soft Delete
The following are examples of soft delete on importing.
Example: flag name not specified (defaulted) withTRACKDELETES.IMPORT SOURCE CUSTOMERS DELETETYPE[ SOFT ] WITH PRIMARYKEY[CUST_ID] TRACKDELETES DEFINE UPDATEABLE DATASET CUSTOMERS_SD_D FROM CUSTOMERS ENDExample: flag name providedIMPORT SOURCE SALES DELETETYPE[ SOFT[ISTRANDELETED] ] TRACKDELETES[ IN [THDELETE] ] DEFINE UPDATEABLE DATASET CUSTOMERS_SD_D FROM CUSTOMERS END - THEN DELETE and Soft Delete Example: Soft delete on importing, using other imported data.
IMPORT SOURCE SALESDEL IMPORT SOURCE SALES DELETETYPE[ SOFT ] THEN DELETE [ SALESDEL [SALES_ID] MATCHING [SALES_ID] ] DEFINE DATASET SALES_SD_F FROM SALES END - DELETESOURCE and Soft Delete
Syntax: See Generic Dataset Definition.
Example: Soft delete on dataset.IMPORT SOURCE SALESDEL IMPORT SOURCE SALES DEFINE UPDATEABLE DATASET SALES_F ROWSOURCE SALES; THIS = SALES; DELETETYPE[ SOFT[SALESDELETED] ]; DELETESOURCE SALESDEL[SALE_ID] MATCHING [SALE_ID]; END
Propagation Control
Propagation control consists of:
- DISABLE DELETEPROPAGATION
You can use disable
DELETEPROPAGATIONto prevent deletions from cascading to downstream datasets.- Default Behavior: By default, Data Augmentation Scripts applies removals to downstream datasets, unless you define propagation.
- Disabled Propagation: Data Augmentation Scripts doesn't propagate deletions automatically, but still applies explicit deletions that use
DELETESOURCE.
Syntax:disable_delete_propagation ::= DISABLE DELETEPROPAGATION FOR { ALL DATASETS | DATASETS '[' table_name ']' }Example: Disable delete propagation for all datasets.DISABLE DELETEPROPAGATION FOR ALL DATASETSExample: Disable delete propagation for a product dimension dataset.DISABLE DELETEPROPAGATION FOR DATASETS[PRODUCTS_D] - REFRESH ON DELETES
You can controls how deletions impact the dataset.
Syntax: See Generic Dataset Definition.
Example: In the following example, for incremental runs, only Sales deletes are considered.IMPORT SOURCE SALES WITH TRACKDELETES IMPORT SOURCE PRODUCTS DEFINE DATASET SALES_F ROWSOURCE PRODUCTS INNER JOIN SALES ON SALES.PROD_ID = PRODUCTS.PROD_ID; THIS = SALES; REFRESH ON DELETES IN[SALES]; REFRESH ON UPSERTS IN[PRODUCTS] END - STABILITYPERIOD
You can restrict processing data changes within a defined timeframe (or sliding window) since the previous load. You can apply the restriction to either the record’s Initial Extract Date (IED) or Last Updated Date (LUD) of upstream data.
Syntax: See IMPORT Statement.
The following is the change tracking behavior based on Last Updated Date (LUD):- If you haven't identified a LUD for the source, only changes to records created in the last
ndays at extract time are tracked. - If you've identified a LUD defined for the source, both changes since the last extract and changes to records created in the last
ndays at extract time are tracked.
The following example specifies the Initial Extract Date- (IED) but not a Last Updated Date (LUD) :IMPORT SOURCE CUSTOMERS WITH IED [CUST_EFF_FROM] STABILITYPERIOD[30, CUST_EFF_FROM] TRACKDELETES DEFINE DATASET CUSTOMERS_D ROWSOURCE CUSTOMERS; THIS = CUSTOMERS; ENDThe example code specifies that only changes (including deletions) in SALES from the last 30 days be processed through propagation in each load.Note:
Even when you don't explicitly define the Last Updated Date (LUD) column in an IMPORT definition, if an incremental key is defined in the source metadata, Data Augmentation Scripts automatically uses it as the Last Updated Date (LUD). To override this behavior, you can use LUD[NULL].The following example ignores the Last Updated Date (LUD):IMPORT SOURCE CUSTOMERS WITH IED [CUST_EFF_FROM] LUD[null] STABILITYPERIOD[30, CUST_EFF_FROM] TRACKDELETES - If you haven't identified a LUD for the source, only changes to records created in the last
Choose the Best Deletion Strategy
| Scenario | Recommended Deletion Method |
|---|---|
| Data must be completely removed. | Hard delete (THEN DELETE, DELETESOURCE, TRACKDELETES)
|
| Need to retain deleted records for historical tracking. | Soft delete (ISDELETED flag)
|
| Source system doesn't track deletions. | TRACKDELETES |
| Deletions must be explicitly provided by a named dataset. | DELETESOURCE |
| Avoid cascading deletions in downstream datasets. | DISABLE DELETEPROPAGATION |
| Optimize performance by applying necessary deletions on imported sources | THEN DELETE |