Running an Incremental update

This topic describes how to run an Incremental update operation.

This procedure assumes that the data set has been configured for Incremental updates (that is, a record identifier has been configured).

The example in this procedure omits the --table and --database flags, so the command runs against the original Hive table from which the data set was created.

To run an Incremental update on a data set:

  1. Obtain the Data Set Logical Name of the data set you want to incrementally update:
    1. In Studio, go to Project Settings and then Data Set Manager.
    2. In the Data Set Manager, select the data set and expand the options next to its name.
    3. Get the value from the Data Set Logical Name field.
  2. From a Linux command prompt, change to the $BDD_HOME/dataprocessing/edp_cli directory.
  3. Run the DP CLI with the --incrementalUpdate flag, the Data Set Logical Name, and the filter predicate. For example:
    ./data_processing_CLI --incrementalUpdate 10801:WarrantyClaims "claim_amount > 1000"
If the workflow was successfully invoked, the DP CLI prints these messages at the end of the stdout output:
...
jobId: b6a9fab0-7ca0-4d35-9950-a066520dd948
data_processing_CLI finished with state SUCCESS
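When the command is run from a script, the two closing lines shown above can be read from the captured stdout. The following is a minimal sketch only: the function names dp_job_id and dp_final_state are hypothetical helpers (they are not part of the DP CLI), and the sketch assumes the closing lines have exactly the form shown in the sample output.

```shell
#!/bin/sh
# Sketch: extract the job ID and final state from captured DP CLI stdout.
# Assumption: the output ends with "jobId: ..." and
# "data_processing_CLI finished with state ..." lines as in the sample above.

# Print the job ID found in the DP CLI output read from stdin.
dp_job_id() {
    sed -n 's/^jobId: *//p' | tail -n 1
}

# Print the final state (for example, SUCCESS) found in the output read from stdin.
dp_final_state() {
    sed -n 's/^data_processing_CLI finished with state *//p' | tail -n 1
}

# Hypothetical usage, run from the $BDD_HOME/dataprocessing/edp_cli directory:
#   out=$(./data_processing_CLI --incrementalUpdate 10801:WarrantyClaims "claim_amount > 1000")
#   [ "$(printf '%s\n' "$out" | dp_final_state)" = "SUCCESS" ] || echo "update did not succeed" >&2
```

A SUCCESS state here reflects only what the DP CLI printed. The YARN and Dgraph HDFS Agent checks described in this topic still apply.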
The YARN Application Overview page should have a State of "FINISHED" and a FinalStatus of "SUCCEEDED". The Name field will have an entry similar to this example:
EDP: IncrementalUpdateConfig{collectionId=MdexCollectionIdentifier{
   databaseName=default_edp_e6bfc4c3-24cc-4141-96fe-53cb808b788d, 
   collectionName=default_edp_e6bfc4c3-24cc-4141-96fe-53cb808b788d}, 
   jobId=b6a9fab0-7ca0-4d35-9950-a066520dd948, 
   whereClause=claim_amount > 1000}
Note the following about the Name information:
  • IncrementalUpdateConfig identifies the workflow type as an Incremental update.
  • whereClause lists the filter predicate used in the command.

You can also check the Dgraph HDFS Agent log for the status of the Dgraph ingest operation.

If the Incremental update finds no records that match the filter predicate, no Dgraph ingest operation is performed and the DP log contains this entry:
Nothing to be done here since no records are returned for the filter predicate.
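
A script can test for this entry to tell a run that matched no records from one that ingested data. The following is a minimal sketch: the function name no_records_matched is a hypothetical helper, and the location of the DP log file is not given in this topic, so the path must be supplied by the caller.

```shell
#!/bin/sh
# Sketch: report whether a DP log file contains the "no records" entry.
# Assumption: the caller knows the path of the DP log for the run in question.

# no_records_matched LOG_FILE
# Returns 0 if the entry is present in LOG_FILE, non-zero otherwise.
no_records_matched() {
    grep -qF 'Nothing to be done here since no records are returned for the filter predicate.' "$1"
}

# Hypothetical usage:
#   if no_records_matched /path/to/dp.log; then
#       echo "No records matched the filter predicate; no Dgraph ingest was performed."
#   fi
```

The fixed-string match (grep -F) is used so that any prefix the logger adds to the line, such as a timestamp, does not affect the result.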

Note that future Incremental updates on this data set will continue to use the same Data Set Logical Name. You will also use this name if you set up an Incremental update cron job for this data set.
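
As an illustration of how the same Data Set Logical Name could be reused in a scheduled job, the following is a minimal sketch of a wrapper function that a cron-invoked script might call. The function name run_incremental_update, the script and log paths, and the schedule are all hypothetical placeholders. The sketch assumes only the command form shown in the procedure above. Because cron typically starts jobs with a minimal environment, the BDD installation path is passed as an argument instead of being read from $BDD_HOME.

```shell
#!/bin/sh
# Sketch of a wrapper for running the Incremental update from cron.
# Assumption: the DP CLI is invoked exactly as in the procedure above.

# run_incremental_update BDD_HOME LOGICAL_NAME PREDICATE
# Runs the DP CLI from its own directory and returns whatever exit
# status the CLI returns (or 1 if the directory cannot be entered).
run_incremental_update() {
    bdd_home=$1
    logical_name=$2
    predicate=$3
    (
        # Subshell keeps the caller's working directory unchanged.
        cd "$bdd_home/dataprocessing/edp_cli" || exit 1
        ./data_processing_CLI --incrementalUpdate "$logical_name" "$predicate"
    )
}

# Hypothetical crontab entry (schedule and paths are placeholders):
#   0 2 * * * /path/to/incremental_update.sh >> /path/to/incremental_update.log 2>&1
# where incremental_update.sh defines the function above and calls, for example:
#   run_incremental_update /path/to/BDD_HOME 10801:WarrantyClaims "claim_amount > 1000"
```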