About the Bulk Load API

Data records can be added or replaced through the Dgraph Bulk Load Interface.

In addition to the Data Ingest API, you can use the Bulk Load Interface to ingest records into an Endeca data domain. The Bulk Load API is a collection of Java classes packaged in a single endeca_bulk_load.jar file, which ships in the Endeca Server's apis directory. The Javadoc for the Bulk Load API is located in the apis/doc/bulk_load directory.

Bulk Load characteristics

The characteristics of the interface are:
  • The API can load data source records only. It cannot load PDRs, DDRs, managed attribute values, the GCR, or the Dgraph configuration documents.
  • Existing records in the Endeca data domain are replaced, not updated. That is, the replace operation is not additive. Therefore, the key-value pair list of the incoming record will completely replace the key-value pair list of the existing record.
  • A primary-key attribute (also called the record spec) is required for each record to be added or replaced.
  • If an assignment is for a standard attribute (property) that does not exist in the Endeca data domain, the new standard attribute is automatically created with system default values for the PDR. For these default values, see Default values for new attributes.
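The replace-not-update rule above can be illustrated with a minimal sketch. It does not use the Bulk Load API: the ReplaceSemantics class, the addOrReplace method, and the in-memory map standing in for a data domain are all hypothetical, and the attribute names are made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: a plain map stands in for the data domain (an assumption). */
final class ReplaceSemantics {

    /**
     * Stores the record under its primary key. An existing record with the
     * same key is replaced wholesale; its old assignments are not merged in.
     */
    static void addOrReplace(Map<String, Map<String, List<String>>> store,
                             String primaryKey,
                             Map<String, List<String>> assignments) {
        store.put(primaryKey, Map.copyOf(assignments));
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> store = new HashMap<>();

        // First load: the record has two attributes.
        addOrReplace(store, "1001",
                Map.of("Name", List.of("Widget"), "Color", List.of("red")));

        // Second load with the same primary key carries only Name.
        addOrReplace(store, "1001", Map.of("Name", List.of("Widget v2")));

        // Color is gone because the incoming list replaced the old one.
        System.out.println(store.get("1001").containsKey("Color")); // false
        System.out.println(store.get("1001").get("Name"));          // [Widget v2]
    }
}
```

The practical consequence is that an incoming record should carry every assignment the record is meant to keep, not only the changed ones.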
The interface rejects non-XML 1.0 characters upon ingest. That is, a valid character for ingest must be a character according to production 2 of the XML 1.0 specification. If an invalid character is detected, an exception is thrown with this error message:
Character <c> is not legal in XML 1.0

The record with the invalid character is rejected.
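Because a single invalid character causes the whole record to be rejected, it can help to screen values before handing them to the interface. The following is a minimal sketch of such a check, based on the Char production (production 2) of the XML 1.0 specification. The Xml10Chars class and its method names are hypothetical helpers, not part of the Bulk Load API.

```java
/** Sketch: pre-validate values against XML 1.0 production 2 (Char). */
final class Xml10Chars {

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first illegal character, or -1 if there is none. */
    static int firstIllegalIndex(String value) {
        for (int i = 0; i < value.length(); ) {
            // codePointAt joins valid surrogate pairs; an unpaired surrogate
            // comes back as itself and fails the range check above.
            int cp = value.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("plain text"));    // -1
        System.out.println(firstIllegalIndex("bad\u0001char")); // 3
    }
}
```

Whether to drop, replace, or report an offending character is a decision for the calling application; the sketch only locates it.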

Post-ingest behavior

There are two operations that must occur at some time after each bulk-load ingest:
  • A merge of the ingested records to a single generation, which re-indexes the database to optimize query performance.
  • A rebuild of the aspell spelling dictionary, so that the newly added data is available for the spelling Did You Mean (DYM) and autocorrect features.
The BulkIngester constructor's doFinalMerge parameter allows you to set when the post-ingest merge occurs:
  • If set to true, the merge is forced immediately after ingest. This behavior is intended to maximize query performance at the end of a single, large, homogeneous data update that would occur during a regularly scheduled update window.
  • If set to false, a merge is not forced at the end of an update, but instead relies on the regular background merge process to keep the generations in order over time. This behavior is more suitable for parallel heterogeneous data updates where low overall update latency is paramount.
The BulkIngester constructor's doUpdateDictionary parameter lets you specify when the aspell spelling dictionary is updated:
  • A setting of true means a dictionary update is forced immediately after the ingest.
  • A setting of false means the dictionary update is disabled. You can later update the dictionary. For information, see the Oracle Endeca Server Administrator's Guide.

If you are doing multiple, consecutive bulk-load operations, you can set both parameters to false for every operation except the last one, so that the merge and the dictionary update run once, after the final load.
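The flag choice for a series of consecutive loads can be sketched as follows. The BulkLoadFlags class and its forLoad method are hypothetical helpers. The two values they produce are meant to be passed as the doFinalMerge and doUpdateDictionary arguments of the BulkIngester constructor. The constructor's remaining arguments are not covered in this section, so the sketch stops at computing the flags; consult the Javadoc for the full signature.

```java
/** Sketch: choose doFinalMerge / doUpdateDictionary for the n-th of several loads. */
final class BulkLoadFlags {
    final boolean doFinalMerge;
    final boolean doUpdateDictionary;

    private BulkLoadFlags(boolean doFinalMerge, boolean doUpdateDictionary) {
        this.doFinalMerge = doFinalMerge;
        this.doUpdateDictionary = doUpdateDictionary;
    }

    /**
     * Returns false/false for every load except the last one, which gets
     * true/true so the merge and dictionary rebuild happen once at the end.
     *
     * @param index      zero-based position of this load in the series
     * @param totalLoads number of consecutive loads planned
     */
    static BulkLoadFlags forLoad(int index, int totalLoads) {
        if (totalLoads < 1 || index < 0 || index >= totalLoads) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        boolean isLast = (index == totalLoads - 1);
        return new BulkLoadFlags(isLast, isLast);
    }

    public static void main(String[] args) {
        int totalLoads = 3;
        for (int i = 0; i < totalLoads; i++) {
            BulkLoadFlags f = forLoad(i, totalLoads);
            System.out.println("load " + i
                    + ": doFinalMerge=" + f.doFinalMerge
                    + ", doUpdateDictionary=" + f.doUpdateDictionary);
        }
    }
}
```

With three loads, only the third one gets true for both flags. If the loads are instead independent, parallel updates where low latency matters more than immediate query performance, leaving doFinalMerge set to false throughout and relying on the background merge may be the better fit.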