Assumptions

  1. The Primary Key (PK) and Surrogate Key (SK) columns must be mapped; otherwise, the SCD will fail.
  2. Because Hive does not provide PK functionality, map an ID column as the PK. The STG and DIM tables are matched on this column for TYPE1 and TYPE2.
  3. The SK column in the destination (DIM) table is always of data type INT or BIGINT.
  4. The DIM_SCD_SEEDED table is created automatically. You must insert its data manually, as listed in the following table (Table 2-2).

    Table 2-2 Seeded Key and Code

    Seeded_SKEY   Seeded_CODE   Seeded_DESC
    0             MSG           Missing
    -1            OTH           OTHER
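
The manual insert described in assumption 4 could be scripted along the following lines. This is a minimal sketch, not the product's own procedure: the helper name `build_seeded_insert` is hypothetical, and the column order (key, code, description) is an assumption that should be checked against the generated DIM_SCD_SEEDED table. The statement it builds uses the `INSERT INTO TABLE ... VALUES` form, which is available in Hive 0.14 and later; verify it against your Hive version before running it.

```python
# Seeded rows copied from the Seeded Key and Code table:
# (seeded key, seeded code, seeded description).
SEEDED_ROWS = [
    (0, "MSG", "Missing"),
    (-1, "OTH", "OTHER"),
]


def build_seeded_insert(table="DIM_SCD_SEEDED", rows=SEEDED_ROWS):
    """Return a HiveQL INSERT statement for the seeded rows.

    Assumption: the table's columns are ordered key, code, description.
    """
    def literal(value):
        # Integers are emitted bare; strings are single-quoted,
        # with embedded single quotes backslash-escaped.
        if isinstance(value, int):
            return str(value)
        return "'" + str(value).replace("'", "\\'") + "'"

    values = ",\n  ".join(
        "(" + ", ".join(literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO TABLE {table} VALUES\n  {values}"


if __name__ == "__main__":
    # Print the statement so it can be reviewed and then run in Hive.
    print(build_seeded_insert())
```

Generating the statement rather than typing it by hand keeps the seeded keys (0 and -1) and their codes in one place, so the same values can be reused across environments.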