The transformed entity (node/edge) are compared and the changes are saved in form of insert and delete and then the full table is also updated so that graph can load from them in case of PGX server failure or restart.
For Example:
For the account Node: fcdm_account, the Batch 1 Transformed source data is displayed as provided in the table:
table: Node: fcdm_account
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191101 |
Account |
AMLBMAC664 |
BENOY |
1 |
FCDM |
15-11-15 |
1000000191102 |
Account |
AMLBMAC420 |
PERRY |
9 |
FCDM |
28-05-15 |
1000000191103 |
Account |
AMLBMAC504 |
RAMESH |
5 |
FCDM |
18-12-14 |
1000000191104 |
Account |
AMLBMAC654 |
DURKA |
4 |
FCDM |
14-12-14 |
On Batch 1 (first batch of ETL), since earlier data are absent, all the entries are also listed as 'insert' and saved into hive tables with name like '<entity_provider_name>_insert', example: 'fcdm_acct_insert' and delete tables with name like '<entity_provider_name>_delete' are not created.
table: fcdm_account_insert
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191101 |
Account |
AMLBMAC664 |
BENOY |
1 |
FCDM |
15-11-15 |
1000000191102 |
Account |
AMLBMAC420 |
PERRY |
9 |
FCDM |
28-05-15 |
1000000191103 |
Account |
AMLBMAC504 |
RAMESH |
5 |
FCDM |
18-12-14 |
1000000191104 |
Account |
AMLBMAC654 |
DURKA |
4 |
FCDM |
14-12-14 |
table: fcdm_account
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191101 |
Account |
AMLBMAC664 |
BENOY |
1 |
FCDM |
15-11-15 |
1000000191102 |
Account |
AMLBMAC420 |
PERRY |
9 |
FCDM |
28-05-15 |
1000000191103 |
Account |
AMLBMAC504 |
RAMESH |
5 |
FCDM |
18-12-14 |
1000000191104 |
Account |
AMLBMAC654 |
DURKA |
4 |
FCDM |
14-12-14 |
On subsequent batch for graph, for all the nodes or edges, the comparison is made between previous batch full data and current batch full data to identify insert and delete. The update are considered as deletion of old and addition of new.
For the account Node: fcdm_account, the Batch 2 Transformed source data is displayed as provided in the table:
table: Node: fcdm_account
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191101 |
Account |
AMLBMAC664 |
BENOY |
1 |
FCDM |
15-11-15 |
1000000191102 |
Account |
AMLBMAC420 |
PERRY |
9 |
FCDM |
28-05-15 |
1000000191104 |
Account |
AMLBMAC654 |
DURGA |
4 |
FCDM |
14-12-14 |
1000000191105 |
Account |
XXXACFRKITINGAC-009 |
THOMAS |
7 |
FCDM |
05-02-15 |
Here node id ending with:
· 103 has been removed
· 105 has been added
· 104 has been updated
table: fcdm_account_insert
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191104 |
Account |
AMLBMAC654 |
DURGA |
4 |
FCDM |
14-12-14 |
1000000191105 |
Account |
XXXACFRKITINGAC-009 |
THOMAS |
7 |
FCDM |
05-02-15 |
table: fcdm_account_delete
node_id |
---|
1000000191103 |
1000000191104 |
table: Node: fcdm_account
node_id |
label |
original_id |
name |
risk |
source |
date |
---|---|---|---|---|---|---|
1000000191101 |
Account |
AMLBMAC664 |
BENOY |
1 |
FCDM |
15-11-15 |
1000000191102 |
Account |
AMLBMAC420 |
PERRY |
9 |
FCDM |
28-05-15 |
1000000191104 |
Account |
AMLBMAC654 |
DURGA |
4 |
FCDM |
14-12-14 |
1000000191105 |
Account |
XXXACFRKITINGAC-009 |
THOMAS |
7 |
FCDM |
05-02-15 |