This topic describes the syntax of the --incrementalUpdate flag.
./data_processing_CLI --incrementalUpdate <logicalName> <filter>
or
./data_processing_CLI --incrementalUpdate <logicalName> <filter> --table <tableName>
or
./data_processing_CLI --incrementalUpdate <logicalName> <filter> --table <tableName> --database <dbName>
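When the CLI is driven from a script, the three forms above differ only in their optional trailing arguments. The Python sketch below assembles the argument list for each form. The helper name build_incremental_args is hypothetical, and the check that --database is used only together with --table simply mirrors the three forms shown here; it is not a documented CLI rule. The sketch only builds the list and does not run the CLI. Passing such a list to subprocess.run without shell=True delivers the filter predicate to the CLI as a single argument, so no extra shell quoting is needed.

```python
from typing import List, Optional

CLI = "./data_processing_CLI"

def build_incremental_args(logical_name: str, filter_predicate: str,
                           table: Optional[str] = None,
                           database: Optional[str] = None) -> List[str]:
    """Build argv for one of the three --incrementalUpdate forms (hypothetical helper)."""
    if database is not None and table is None:
        # Assumption: only accept the forms listed in this topic, where
        # --database always appears together with --table.
        raise ValueError("--database is only shown together with --table")
    args = [CLI, "--incrementalUpdate", logical_name, filter_predicate]
    if table is not None:
        args += ["--table", table]
    if database is not None:
        args += ["--database", database]
    return args

args = build_incremental_args("10133:WarrantyClaims", "claimyear > 1970")
print(args)
# e.g. subprocess.run(args, check=True) on a host where the CLI is installed
```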
The logicalName value is available in the Data Set Logical Name property in Studio. For details, see Obtaining the Data Set Logical Name.
Filter predicate format
The filter predicate has this format:
"columnName operator filterValue"
where:
columnName is the name of a column in the source Hive table.
operator is one of the following comparison operators: =, <>, >, >=, <, <=
filterValue is a primitive value. Only primitive data types are supported: integers (TINYINT, SMALLINT, INT, and BIGINT), floating-point numbers (FLOAT and DOUBLE), Booleans (BOOLEAN), and strings (STRING). Expressions (such as "amount+1") are not supported.
Enclose the entire filter predicate in either double quotes or single quotes. If you need quotes within the filter predicate, use the other quotation style. For example, if you enclose the filter predicate in double quotes, use single quotes within the predicate itself.
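The predicate rules can be checked before a command is run. The Python sketch below is a hypothetical client-side helper, not part of the CLI: it accepts only the six listed comparison operators and primitive values, and it renders string values in single quotes so that the whole predicate can then be enclosed in double quotes on the command line. Two details are assumptions rather than documented behavior: string values are written as single-quoted literals, and Booleans are written as the Hive-style literals TRUE and FALSE. The column names region, active, and amount in the usage are hypothetical.

```python
from typing import Union

OPERATORS = {"=", "<>", ">", ">=", "<", "<="}
Primitive = Union[bool, int, float, str]

def render_value(value: Primitive) -> str:
    """Render a primitive filter value as predicate text."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"  # assumed Hive-style Boolean literal
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if "'" in value or '"' in value:
            raise ValueError("this sketch does not escape quotes inside string values")
        return f"'{value}'"  # single quotes inside; the predicate itself goes in double quotes
    raise TypeError(f"unsupported filter value type: {type(value).__name__}")

def build_predicate(column: str, operator: str, value: Primitive) -> str:
    """Build 'columnName operator filterValue', rejecting unsupported operators."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{column} {operator} {render_value(value)}"

print(build_predicate("claimyear", ">", 1970))
print(build_predicate("region", "=", "EMEA"))
```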
If columnName is configured as a DATE or TIMESTAMP data type, you can use the unix_timestamp date function with one of these syntaxes:
columnName operator unix_timestamp(dateValue)
columnName operator unix_timestamp(dateValue, dateFormat)
If dateFormat is not specified, one of two default formats is used:
date-time format: yyyy-MM-dd HH:mm:ss
time-only format: HH:mm:ss
The date-time format is used for columns that map to Dgraph mdex:dateTime attributes, while the time-only format is used for columns that map to Dgraph mdex:time attributes.
If you specify dateFormat, use a pattern as described in the Java SimpleDateFormat tutorial: http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html
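unix_timestamp converts a date string into seconds since the Unix epoch. As a rough local analogue, the Python sketch below maps the date-time and time-only formats listed above to Python strptime directives and computes the epoch value. It rests on two assumptions: the Java-to-Python pattern mapping is written by hand (Python does not read SimpleDateFormat patterns), and values are computed in UTC, whereas Hive's unix_timestamp interprets the string in the session's local time zone, so real values can differ by the time zone offset. The function name to_epoch_seconds is hypothetical.

```python
from datetime import datetime, timezone

# Hand-written mapping of the two Java patterns to Python strptime directives.
JAVA_TO_PY = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",  # date-time (mdex:dateTime columns)
    "HH:mm:ss": "%H:%M:%S",                      # time-only (mdex:time columns)
}

def to_epoch_seconds(value: str, java_pattern: str = "yyyy-MM-dd HH:mm:ss") -> int:
    """Approximate unix_timestamp(value) in UTC; raises ValueError on a mismatch."""
    parsed = datetime.strptime(value, JAVA_TO_PY[java_pattern])
    if java_pattern == "HH:mm:ss":
        # strptime fills a missing date with 1900-01-01, while Java's
        # SimpleDateFormat fills it with 1970-01-01; align with Java here.
        parsed = parsed.replace(year=1970, month=1, day=1)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

print(to_epoch_seconds("2006-01-01 00:00:00"))   # 1136073600
print(to_epoch_seconds("20:00:00", "HH:mm:ss"))  # 72000
```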
Note on data types in the filter predicate
Pay close attention to the Hive column data types when you construct a filter for an incremental update, because the result of a comparison depends on the data type of the column. This is especially important for columns of type STRING, because a string comparison can return different results than a numeric comparison of the same values.
For example, consider this command:
./data_processing_CLI --incrementalUpdate 10133:WarrantyClaims "age<18"
For this command, the number of records that pass the filter depends on whether the "age" column is of a numeric type or of type STRING.
Also keep in mind that if the data set was originally created with File Upload in Studio, all columns of its underlying Hive table are of type STRING.
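The difference can be illustrated outside Hive. The Python sketch below applies age < 18 to the same hypothetical ages twice: once as integers, and once as strings, as they would be stored after a File Upload. It uses Python's own comparison rules, where strings are compared character by character; it illustrates why the two comparison types can disagree, not how the CLI or Hive coerces a STRING column against a numeric literal.

```python
# Hypothetical ages; after a File Upload these values would be stored as STRING.
ages = [5, 17, 18, 25, 100, 150]

# Numeric comparison: 5 and 17 are less than 18.
numeric_matches = [a for a in ages if a < 18]

# String comparison is character by character: "100" < "18" because "0" < "8",
# while "5" > "18" because "5" > "1".
string_matches = [str(a) for a in ages if str(a) < "18"]

print(numeric_matches)  # [5, 17]
print(string_matches)   # ['17', '100', '150']
```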
Examples
./data_processing_CLI --incrementalUpdate 10133:WarrantyClaims "claimyear > 1970"
In this example, only the records of claims made after 1970 are processed.
This example uses the unix_timestamp function with a supplied date-time format:
./data_processing_CLI --incrementalUpdate 10133:WarrantyClaims \
"factsales_shipdatekey_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')"
Another example of the unix_timestamp function with a supplied date-time format:
./data_processing_CLI --incrementalUpdate 10133:WarrantyClaims \
"creation_date >= unix_timestamp('2015-06-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')"
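This example uses the unix_timestamp function with a date that does not contain a time:
./data_processing_CLI --incrementalUpdate 10133:WarrantyClaims \
"claim_date >= unix_timestamp('2000-01-01')"
Because no dateFormat is supplied, the default date-time format (yyyy-MM-dd HH:mm:ss) is applied, and the command fails with this error:
16:41:29.375 main ERROR: Failed to parse date / time value '2000-01-01' using the format 'yyyy-MM-dd HH:mm:ss'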
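The failure comes from a value that lacks the time part required by yyyy-MM-dd HH:mm:ss. The Python sketch below reproduces the same mismatch with strptime, used here as an analogue of the Java pattern rather than the CLI's own parser, and shows that a pattern matching the value (Java yyyy-MM-dd, Python %Y-%m-%d) parses it. By the same reasoning, supplying the format explicitly, as in unix_timestamp('2000-01-01', 'yyyy-MM-dd'), should avoid the error; that command is an untested assumption and is not one of the examples in this topic. The helper name parses is hypothetical.

```python
from datetime import datetime

def parses(value: str, py_format: str) -> bool:
    """Return True if strptime accepts value with py_format (it rejects missing parts)."""
    try:
        datetime.strptime(value, py_format)
        return True
    except ValueError:
        return False

# Date-only value against the default date-time pattern (Java yyyy-MM-dd HH:mm:ss).
print(parses("2000-01-01", "%Y-%m-%d %H:%M:%S"))  # False
# Same value against a pattern that matches it (Java yyyy-MM-dd).
print(parses("2000-01-01", "%Y-%m-%d"))           # True
```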