Incremental flag syntax

The DP CLI flag syntax for an incremental update operation takes one of the following forms:

./data_processing_CLI --incrementalUpdate <dsKey> <filter>

or

./data_processing_CLI --incrementalUpdate <dsKey> <filter> --table <tableName>

or

./data_processing_CLI --incrementalUpdate <dsKey> <filter> --table <tableName> --database <dbName>

where:

--incrementalUpdate (abbreviated as -incremental) is mandatory and specifies the data set key (dsKey) of the data set to be updated.
filter is a filter predicate that limits the records to be selected from the Hive table.
--table (abbreviated as -t) is optional and specifies a Hive table to be used for the source data. This flag allows you to override the source Hive table that was used to create the original data set (the name of the original Hive table is stored in the data set's metadata).
--database (abbreviated as -d) is optional and specifies the database of the Hive table specified with the --table flag. This flag allows you to override the database that was used to create the original data set. The --database flag can be used only if the --table flag is also used.

The dsKey value is available in the Data Set Key property in Studio. For details, see Obtaining data set keys.
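
The flag rules above can be sketched as a small shell helper that assembles the command line and rejects a database given without a table. This is only an illustration: build_incremental_cmd is a hypothetical function (not part of the DP CLI), the table name sales_2015 and the database name default are made-up values, and the helper assumes the filter itself contains no double quotes. It prints the command rather than running it.

```shell
# Sketch only: build_incremental_cmd is a hypothetical helper, not a DP CLI feature.
# Usage: build_incremental_cmd <dsKey> <filter> [tableName [dbName]]
build_incremental_cmd() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 4 ]; then
        echo "usage: build_incremental_cmd <dsKey> <filter> [tableName [dbName]]" >&2
        return 1
    fi
    ds_key=$1
    filter=$2
    table=${3:-}
    database=${4:-}

    # Mirror the documented rule: --database is valid only together with --table.
    if [ -n "$database" ] && [ -z "$table" ]; then
        echo "error: a database can be given only together with a table" >&2
        return 1
    fi

    # Assumes the filter contains no double quotes (display purposes only).
    cmd="./data_processing_CLI --incrementalUpdate $ds_key \"$filter\""
    if [ -n "$table" ]; then
        cmd="$cmd --table $table"
    fi
    if [ -n "$database" ]; then
        cmd="$cmd --database $database"
    fi
    printf '%s\n' "$cmd"
}

# Hypothetical table and database names, shown as a dry run.
build_incremental_cmd edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
    "birthyear > 1970" sales_2015 default
```

Passing an empty table name together with a database name makes the helper return a non-zero status, which matches the constraint that --database cannot be used on its own.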

Filter predicate format

The filter predicate is mandatory and must be a single, simple Boolean expression (compound expressions are not supported), with this format:

"columnName operator filterValue"

where:

columnName is the name of a column in the source Hive table.
operator is one of the following comparison operators:
- =
- <>
- >
- >=
- <
- <=
filterValue is a primitive value. Only primitive data types are supported, which are: integers (TINYINT, SMALLINT, INT, and BIGINT), floating point numbers (FLOAT and DOUBLE), Booleans (BOOLEAN), and strings (STRING). Note that expressions (such as "amount+1") are not supported.

You should enclose the entire filter predicate in either double quotes or single quotes. If you need to use quotes within the filter predicate, use the other quotation format. For example, if you use double quotes to enclose the filter predicate, then use single quotes within the predicate itself.
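
As a brief illustration of the two quoting styles, the following sketch stores the same predicate both ways and prints what the CLI would receive. The column name state and the value CA are hypothetical, and the sketch assumes that a string filterValue is written in quotes inside the predicate.

```shell
# Hypothetical STRING column "state"; only the outer/inner quote styles differ.
FILTER_DQ="state = 'CA'"     # double quotes outside, single quotes inside
FILTER_SQ='state = "CA"'     # single quotes outside, double quotes inside

# Show the predicate text exactly as it would be passed as one argument.
printf '%s\n' "$FILTER_DQ"   # prints: state = 'CA'
printf '%s\n' "$FILTER_SQ"   # prints: state = "CA"
```

Keep in mind that the shell still expands variables and command substitutions inside double quotes, but not inside single quotes, so a predicate containing a $ character is safer in single quotes.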

If columnName is configured as a DATE or TIMESTAMP data type, you can use the unix_timestamp date function, with one of these syntaxes:

columnName operator unix_timestamp(dateValue)

columnName operator unix_timestamp(dateValue, dateFormat)

If dateFormat is not specified, then the DP CLI uses one of two default date formats:

// date-time format:
yyyy-MM-dd HH:mm:ss

// time-only format:
HH:mm:ss

The date-time format is used for columns that map to Dgraph mdex:dateTime attributes, while the time-only format is used for columns that map to Dgraph mdex:time attributes.

If dateFormat is specified, use a pattern described here: http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html
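
When relying on the default date-time format, it can help to check the shape of the date value before submitting the command. The sketch below is one possible pre-check, not something the DP CLI provides: make_datetime_filter is a hypothetical helper that verifies only the digit layout of yyyy-MM-dd HH:mm:ss (not whether the date is a real calendar date) and then prints a single-argument unix_timestamp predicate.

```shell
# Sketch only: make_datetime_filter is a hypothetical helper.
# Usage: make_datetime_filter <columnName> <operator> <dateValue>
make_datetime_filter() {
    column=$1
    operator=$2
    value=$3
    case $value in
        # Same layout as the default date-time format yyyy-MM-dd HH:mm:ss.
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]" "[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
            printf "%s %s unix_timestamp('%s')\n" "$column" "$operator" "$value"
            ;;
        *)
            echo "error: '$value' does not match yyyy-MM-dd HH:mm:ss" >&2
            return 1
            ;;
    esac
}

make_datetime_filter creation_date '>=' '2015-06-01 20:00:00'
# prints: creation_date >= unix_timestamp('2015-06-01 20:00:00')
```

A date-only value such as 2000-01-01 is rejected by this check, which would catch that kind of mismatch before the CLI reports a parse error.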

Examples

Example 1: If the Hive "birthyear" column contains a year of birth for a person, then the command can be:

./data_processing_CLI --incrementalUpdate edp_cli_edp_f35ddabb-f011 "birthyear > 1970"

In the example, only the records of persons born after 1970 are processed.

Example 2: Using the unix_timestamp function with a supplied date-time format:

./data_processing_CLI --incrementalUpdate edp_cli_edp_f35ddabb-f011-427f-b4ff-a2e6e3f3f016_12266 \
"factsales_shipdatekey_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')"

Example 3: Another example of using the unix_timestamp function with a supplied date-time format:

./data_processing_CLI --incrementalUpdate edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
"creation_date >= unix_timestamp('2015-06-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')"

Example 4: An invalid use of the unix_timestamp function, in which the date value does not contain a time and no dateFormat is supplied:

./data_processing_CLI --incrementalUpdate edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
"claim_date >= unix_timestamp('2000-01-01')"

The error will be:

16:41:29.375 main ERROR: Failed to parse date / time value '2000-01-01' using the format 'yyyy-MM-dd HH:mm:ss'
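
Based on the syntax described earlier, there appear to be two ways to correct Example 4: add a time so that the value matches the default date-time format, or keep the date-only value and supply a matching dateFormat. The sketch below prints both variants as a dry run. The yyyy-MM-dd pattern is an assumption drawn from the SimpleDateFormat conventions referenced above and has not been verified against the DP CLI.

```shell
DS_KEY="edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163"

# Option 1: add a time so the value matches the default yyyy-MM-dd HH:mm:ss format.
FILTER_DEFAULT="claim_date >= unix_timestamp('2000-01-01 00:00:00')"

# Option 2 (assumed pattern): keep the date-only value and pass a matching dateFormat.
FILTER_CUSTOM="claim_date >= unix_timestamp('2000-01-01', 'yyyy-MM-dd')"

# Dry run: print the command lines instead of invoking the CLI.
echo ./data_processing_CLI --incrementalUpdate "$DS_KEY" "\"$FILTER_DEFAULT\""
echo ./data_processing_CLI --incrementalUpdate "$DS_KEY" "\"$FILTER_CUSTOM\""
```

To run one of the variants for real, pass the variable directly as the filter argument, for example "$FILTER_CUSTOM" in double quotes after the data set key.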