Naming convention for source data files

Whether you are using a random or deterministic distribution strategy, it is strongly recommended that you use a timestamp format as the naming scheme for the update source data files.

This format ensures that Forge processes the files in the order in which they were created.
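The ordering property follows from the timestamp being fixed-width, with fields running from most significant (year) to least significant (second). As a result, sorting the filenames as plain strings also sorts them chronologically. The following Python sketch illustrates this with hypothetical filenames; it is only a demonstration of the naming property, not a description of how Forge orders files internally.

```python
# Hypothetical update filenames in YYYYMMDDHHNNSS.ext form, listed out of order.
filenames = [
    "20051023161408.txt",
    "20050717151408.txt",
    "20051023090000.txt",
]

# Every field is zero-padded and ordered from year down to second,
# so a plain string sort is also a sort by creation time.
ordered = sorted(filenames)
print(ordered)
```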

For both strategies, a Perl expression in the record manipulator can use the timestamp portion of the filename to name the output record file.

Random distribution format

For a random distribution strategy, a suggested format is:

YYYYMMDDHHNNSS.ext

where YYYY is the four-digit year, MM is the two-digit month, DD is the two-digit day, HH is the two-digit hour, NN is the two-digit minute, and SS is the two-digit second, as in this example:

20051023161408.txt

These files may contain new records that are distributed randomly to the Agraph partitions.
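As an illustration, a script that produces update files could build a name in this format from the file's creation time. The following is a minimal Python sketch; the function name update_filename and the default extension are assumptions made for the example, not part of the product.

```python
from datetime import datetime

def update_filename(created: datetime, ext: str = "txt") -> str:
    """Return a name in the YYYYMMDDHHNNSS.ext format (hypothetical helper)."""
    # %M is the minute field (NN in the format above); %m is the month (MM).
    return f"{created.strftime('%Y%m%d%H%M%S')}.{ext}"

print(update_filename(datetime(2005, 10, 23, 16, 14, 8)))  # 20051023161408.txt
```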

Deterministic distribution format

For a deterministic distribution strategy, a suggested format is:

YYYYMMDDHHNNSS-partX.ext

where X is the number of the Agraph partition for which these records are intended. For example, records in this source data file are intended for partition 3:

20050717151408-part3.txt

The Perl expression in the record manipulator parses the filename for the partition number and uses it to assign new records to that partition.

The expression also uses the timestamp and -partX information for the name of the output record file. For example, the above input filename generates this output record file:

20050717151408-part3.records.xml
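In the pipeline itself this logic lives in the Perl expression of the record manipulator. The Python sketch below only illustrates the parsing it performs: extracting the partition number and deriving the output record file name. The pattern and the function name parse_update_filename are assumptions made for the example.

```python
import re

# Matches YYYYMMDDHHNNSS-partX.ext: 14 timestamp digits,
# the partition number X, and the file extension.
NAME_PATTERN = re.compile(r"^(\d{14})-part(\d+)\.(\w+)$")

def parse_update_filename(filename):
    """Return (partition_number, output_record_filename) (hypothetical helper)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not in YYYYMMDDHHNNSS-partX.ext form: {filename!r}")
    timestamp, partition, _ext = match.groups()
    # Keep the timestamp and -partX information in the output name.
    output_name = f"{timestamp}-part{partition}.records.xml"
    return int(partition), output_name

print(parse_update_filename("20050717151408-part3.txt"))
# (3, '20050717151408-part3.records.xml')
```

Rejecting names that do not match the pattern, rather than guessing a partition, is a design choice of this sketch: a misnamed file then fails visibly instead of being routed to the wrong partition.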

Keep in mind that if you pre-partition your baseline source files, you should also pre-partition the records to be added. That is, all ADD (or ADD_OR_REPLACE) records for the partition 0 Dgraph should be in one file, all records for the partition 1 Dgraph should be in a second file, and so on.
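One way to picture this pre-partitioning step is a script that groups the records to be added by their target partition and assigns each group its own file name. The Python sketch below assumes each record arrives as a (partition, record) pair. That representation, the function name group_by_partition, and the sample data are hypothetical and serve only to illustrate the one-file-per-partition rule.

```python
from collections import defaultdict

def group_by_partition(records, timestamp, ext="txt"):
    """Map each YYYYMMDDHHNNSS-partX.ext name to the records for partition X."""
    files = defaultdict(list)
    for partition, record in records:
        files[f"{timestamp}-part{partition}.{ext}"].append(record)
    return dict(files)

# Hypothetical (partition, record) pairs to be added.
updates = [(0, "rec-a"), (1, "rec-b"), (0, "rec-c")]
print(group_by_partition(updates, "20050717151408"))
```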