14 Appendix Example of ETL

The transformed entity (node/edge) are compared and the changes are saved in form of insert and delete and then the full table is also updated so that graph can load from them in case of PGX server failure or restart.

For Example:

For the account Node: fcdm_account, the Batch 1 Transformed source data is displayed as provided in the table:

table: Node: fcdm_account

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191101

     Account

AMLBMAC664         

BENOY

1

FCDM  

15-11-15

1000000191102

    Account

AMLBMAC420

PERRY

9

FCDM  

28-05-15

1000000191103

    Account

AMLBMAC504

RAMESH

5

FCDM

18-12-14

1000000191104

    Account

AMLBMAC654

DURKA

4

FCDM

14-12-14

On Batch 1 (first batch of ETL), since earlier data are absent, all the entries are also listed as 'insert' and saved into hive tables with name like '<entity_provider_name>_insert', example: 'fcdm_acct_insert' and delete tables with name like '<entity_provider_name>_delete' are not created.

table: fcdm_account_insert

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191101

     Account

AMLBMAC664         

BENOY

1

FCDM  

15-11-15

1000000191102

    Account

AMLBMAC420

PERRY

9

FCDM  

28-05-15

1000000191103

    Account

AMLBMAC504

RAMESH

5

FCDM

18-12-14

1000000191104

    Account

AMLBMAC654

DURKA

4

FCDM

14-12-14

table: fcdm_account

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191101

     Account

AMLBMAC664         

BENOY

1

FCDM  

15-11-15

1000000191102

    Account

AMLBMAC420

PERRY

9

FCDM  

28-05-15

1000000191103

    Account

AMLBMAC504

RAMESH

5

FCDM

18-12-14

1000000191104

    Account

AMLBMAC654

DURKA

4

FCDM

14-12-14

On subsequent batch for graph, for all the nodes or edges, the comparison is made between previous batch full data and current batch full data to identify insert and delete. The update are considered as deletion of old and addition of new.

For the account Node: fcdm_account, the Batch 2 Transformed source data is displayed as provided in the table:

table: Node: fcdm_account

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191101

     Account

AMLBMAC664         

BENOY

1

FCDM  

15-11-15

1000000191102

    Account

AMLBMAC420

PERRY

9

FCDM  

28-05-15

1000000191104

    Account

AMLBMAC654

DURGA

4

FCDM

14-12-14

1000000191105

    Account

XXXACFRKITINGAC-009

THOMAS

7

FCDM  

05-02-15

Here node id ending with:

·        103 has been removed

·        105 has been added

·        104 has been updated

table: fcdm_account_insert

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191104

    Account

AMLBMAC654

DURGA

4

FCDM

14-12-14

1000000191105

    Account

XXXACFRKITINGAC-009

THOMAS

7

FCDM  

05-02-15

table: fcdm_account_delete

 

 node_id      

1000000191103

1000000191104

 

table: Node: fcdm_account

 

 node_id      

label  

original_id        

name  

risk

source

date      

1000000191101

     Account

AMLBMAC664         

BENOY

1

FCDM  

15-11-15

1000000191102

    Account

AMLBMAC420

PERRY

9

FCDM  

28-05-15

1000000191104

    Account

AMLBMAC654

DURGA

4

FCDM

14-12-14

1000000191105

    Account

XXXACFRKITINGAC-009

THOMAS

7

FCDM  

05-02-15