This topic describes the syntax of the
--incrementalUpdate flag.
The DP CLI syntax for an incremental update operation takes one of
the following forms:
./data_processing_CLI --incrementalUpdate <dsKey> <filter>
or
./data_processing_CLI --incrementalUpdate <dsKey> <filter> --table <tableName>
or
./data_processing_CLI --incrementalUpdate <dsKey> <filter> --table <tableName> --database <dbName>
where:
- --incrementalUpdate
(abbreviated as
-incremental) is mandatory and specifies the
data set key (dsKey) of the data set to be updated.
- filter is a filter predicate that limits the
records to be selected from the Hive table.
- --table
(abbreviated as
-t) is optional and specifies a Hive table to
be used for the source data. This flag allows you to override the source Hive
table that was used to create the original data set (the name of the original
Hive table is stored in the data set's metadata).
- --database
(abbreviated as
-d) is optional and specifies the database of
the Hive table specified with the
--table flag. This flag allows you to override
the database that was used to create the original data set. The
--database flag can be used only if the
--table flag is also used.
The
dsKey value is available in the
Data Set Key property in Studio. For details,
see
Obtaining data set keys.
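The three command forms above differ only in their optional trailing flags, so a small wrapper can assemble them and enforce the rule that --database requires --table. The following is a minimal sketch, not part of the DP CLI: the function name build_incremental_cmd and the table and database names are hypothetical, and the function only prints the command line for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DP CLI): prints the command line
# for an incremental update so it can be reviewed before it is run.
# Usage: build_incremental_cmd <dsKey> <filter> [tableName] [dbName]
build_incremental_cmd() {
    ds_key=$1
    filter=$2
    table=${3:-}
    database=${4:-}

    # The data set key and the filter predicate are both mandatory.
    if [ -z "$ds_key" ] || [ -z "$filter" ]; then
        echo "error: dsKey and filter are mandatory" >&2
        return 1
    fi
    # --database can be used only if --table is also used.
    if [ -n "$database" ] && [ -z "$table" ]; then
        echo "error: --database requires --table" >&2
        return 1
    fi

    # For display only: assumes the filter contains no double quotes.
    cmd="./data_processing_CLI --incrementalUpdate $ds_key \"$filter\""
    if [ -n "$table" ]; then
        cmd="$cmd --table $table"
    fi
    if [ -n "$database" ]; then
        cmd="$cmd --database $database"
    fi
    printf '%s\n' "$cmd"
}

# Example call; the table and database names are placeholders.
build_incremental_cmd edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
    "birthyear > 1970" my_table my_database
```

Printing the command first makes it easy to confirm the flag order before it is pasted into a terminal on the machine where the DP CLI is installed.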
Filter predicate format
The filter predicate is mandatory and must be a single, simple Boolean
expression (compound expressions are not supported), with this format:
"columnName operator filterValue"
where:
- columnName is the
name of a column in the source Hive table.
- operator is one of
the following comparison operators:
- filterValue is a
primitive value. Only primitive data types are supported, which are: integers
(TINYINT,
SMALLINT,
INT, and
BIGINT), floating point numbers
(FLOAT and
DOUBLE), Booleans (BOOLEAN), and
strings (STRING). Note that expressions (such as "amount+1")
are not supported.
You should enclose the entire filter predicate in either double quotes
or single quotes. If you need to use quotes within the filter predicate, use
the other quotation format. For example, if you use double quotes to enclose
the filter predicate, then use single quotes within the predicate itself.
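The quoting rule can be illustrated with two equivalent ways of holding a predicate in a shell variable. This is only a sketch: the variable names filter_a and filter_b are hypothetical, and the predicate uses the unix_timestamp form described later in this topic because it needs quoted values inside the predicate.

```shell
#!/bin/sh
# Outer double quotes, inner single quotes.
filter_a="creation_date >= unix_timestamp('2015-06-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')"

# Outer single quotes, inner double quotes.
filter_b='creation_date >= unix_timestamp("2015-06-01 20:00:00", "yyyy-MM-dd HH:mm:ss")'

# Print both predicates exactly as the CLI would receive them.
printf '%s\n' "$filter_a"
printf '%s\n' "$filter_b"
```

When a variable such as filter_a is passed to the CLI, keeping it in double quotes ("$filter_a") ensures that the shell hands over the whole predicate as a single argument.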
If
columnName is configured as a
DATE or
TIMESTAMP data type, you can use the
unix_timestamp date function, with one of these
syntaxes:
columnName operator unix_timestamp(dateValue)
columnName operator unix_timestamp(dateValue, dateFormat)
If
dateFormat is not specified, then the DP CLI uses
one of two default date formats:
// date-time format:
yyyy-MM-dd HH:mm:ss
// time-only format:
HH:mm:ss
The date-time format is used for columns that map to Dgraph
mdex:dateTime attributes, while the time-only format
is used for columns that map to Dgraph
mdex:time attributes.
If
dateFormat is specified, use a pattern described
here:
http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html
Examples
Example 1: If the Hive "birthyear" column contains a year of
birth for a person, then the command can be:
./data_processing_CLI --incrementalUpdate edp_cli_edp_f35ddabb-f011 "birthyear > 1970"
In the example, only the records of persons born after 1970 are
processed.
Example 2: Using the
unix_timestamp function with a supplied date-time
format:
./data_processing_CLI --incrementalUpdate edp_cli_edp_f35ddabb-f011-427f-b4ff-a2e6e3f3f016_12266 \
"factsales_shipdatekey_date >= unix_timestamp('2006-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')"
Example 3: Another example of using the
unix_timestamp function with a supplied date-time
format:
./data_processing_CLI --incrementalUpdate edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
"creation_date >= unix_timestamp('2015-06-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')"
Example 4: An invalid example of using the
unix_timestamp function with a date that does not
contain a time:
./data_processing_CLI --incrementalUpdate edp_cli_edp_a4d38974-3bab-4ced-8166-9b0f46a59d2c_10163 \
"claim_date >= unix_timestamp('2000-01-01')"
The error will be:
16:41:29.375 main ERROR: Failed to parse date / time value '2000-01-01' using the format 'yyyy-MM-dd HH:mm:ss'
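One way to avoid the error in Example 4 is to check the date value before building the predicate. The sketch below is an assumption-laden illustration, not DP CLI functionality: the function name has_default_datetime_shape is hypothetical, the check only tests the shape of the string against the default yyyy-MM-dd HH:mm:ss format (it does not verify that the date is a real calendar date), and the padding step assumes the input is a date-only value in yyyy-MM-dd form.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if the value has the shape
# of the default date-time format yyyy-MM-dd HH:mm:ss. Digits and
# separators are checked; calendar validity is not.
has_default_datetime_shape() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

date_value='2000-01-01'
if ! has_default_datetime_shape "$date_value"; then
    # Assumption: the value is date-only, so pad it with midnight.
    date_value="$date_value 00:00:00"
fi

# Build the predicate without a dateFormat, relying on the default format.
filter="claim_date >= unix_timestamp('$date_value')"
printf '%s\n' "$filter"
```

The alternative is to keep the date-only value and supply a matching dateFormat as the second argument of unix_timestamp.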