This topic describes how to run an Incremental update operation.
This procedure assumes that the data set has been configured for Incremental updates (that is, a record identifier has been configured).
Note that the example in the procedure omits the --table and --database flags, which means that the command runs against the original Hive table from which the data set was created.
To run an Incremental update on a data set:
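As an illustration, an Incremental update run with a filter predicate on claim_date might look like the following. The installation path and Data Set Logical Name are hypothetical, and --incrementalUpdate is assumed here to be the DP CLI flag for this operation; substitute the values for your environment.

```shell
# Hypothetical path and Logical Name; --incrementalUpdate is assumed to be
# the DP CLI flag that launches an Incremental update workflow.
cd /localdisk/Oracle/Middleware/BDD/dataprocessing/edp_cli
./data_processing_CLI --incrementalUpdate default_edp_2c08eb40-8eff-4c7e-b05e-2e451434936d \
  "claim_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')"
```

The second argument is the filter predicate, which is passed through to Hive to select the records to be updated.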
...
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: web2014.example.com
     ApplicationMaster RPC port: 0
     queue: root.fcalvill
     start time: 1437415956086
     final status: SUCCEEDED
     tracking URL: http://web2014.example.com:8088/proxy/application_1436970078353_0041/A
     user: fcalvill
data_processing_CLI finished with state SUCCESS
EDP: IncrementalUpdateConfig{collectionId=MdexCollectionIdentifier{
   databaseName=default_edp_2c08eb40-8eff-4c7e-b05e-2e451434936d,
   collectionName=default_edp_2c08eb40-8eff-4c7e-b05e-2e451434936d},
   whereClause=claim_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}
In this output:
- IncrementalUpdateConfig is the name of the type of Incremental workflow.
- whereClause lists the filter predicate used in the command.

You can also check the Dgraph HDFS Agent log for the status of the Dgraph ingest operation.
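The whereClause compares the claim_date field against the epoch value produced by Hive's unix_timestamp() function. The following minimal Python sketch (with hypothetical sample records, and UTC assumed for the timestamp conversion) illustrates what that filter predicate selects:

```python
from datetime import datetime, timezone

# Hypothetical records; claim_date is stored as epoch seconds, as Hive's
# unix_timestamp() would produce (UTC assumed here for illustration).
records = [
    {"id": 1, "claim_date": datetime(2005, 6, 15, tzinfo=timezone.utc).timestamp()},
    {"id": 2, "claim_date": datetime(2006, 3, 1, tzinfo=timezone.utc).timestamp()},
    {"id": 3, "claim_date": datetime(2007, 9, 30, tzinfo=timezone.utc).timestamp()},
]

# Equivalent of unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')
cutoff = datetime.strptime("2006-01-01 00:00:00", "%Y-%m-%d %H:%M:%S") \
    .replace(tzinfo=timezone.utc).timestamp()

# The whereClause: claim_date >= cutoff
updated = [r["id"] for r in records if r["claim_date"] >= cutoff]
print(updated)  # -> [2, 3]
```

Only records with a claim_date on or after the cutoff are picked up by the update; records 1 is filtered out, so only records 2 and 3 would be re-processed.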
If the Incremental update determines that there are no records that fit the filter predicate criteria, the DP CLI exits gracefully with a message that no records are to be updated.
Note that future Incremental updates on this data set will continue to use the same Data Set Logical Name. You will also use this name if you set up an Incremental update cron job for this data set.
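As an illustration, a cron entry for a nightly Incremental update might look like the following. The installation path, log location, Logical Name, and the --incrementalUpdate flag are assumptions for the sketch; use the values from your own environment.

```shell
# Hypothetical crontab entry: run an Incremental update nightly at 1:00 AM
0 1 * * * cd /localdisk/Oracle/Middleware/BDD/dataprocessing/edp_cli && ./data_processing_CLI --incrementalUpdate default_edp_2c08eb40-8eff-4c7e-b05e-2e451434936d "claim_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')" >> /tmp/dp_cli_incremental.log 2>&1
```

Redirecting stdout and stderr to a log file preserves the workflow output shown above for later inspection.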