Data records can be added or replaced via the Dgraph's Bulk Load
Interface.
In addition to the Data Ingest API, you can use the Bulk Load Interface to
ingest records into an Endeca data store. The Bulk Load API is a collection of
Java classes packaged in a single endeca_bulk_load.jar file, which ships in the
Endeca Server's apis directory. The Javadoc for the Bulk Load API is located in
the apis/doc/bulk_load directory.
Bulk Load characteristics
The characteristics of the interface are:
- The API can load data source records only. It cannot load PDRs, DDRs,
managed attribute values, the GCR, or the Dgraph configuration documents.
- Existing records in the Endeca data store are replaced, not updated. That
is, the replace operation is not additive. Therefore, the key/value pair list
of the incoming record will completely replace the key/value pair list of the
existing record.
- A primary-key attribute (also called the record spec) is required for each
record to be added or replaced.
- If an assignment is for a standard attribute (property) that does not exist
in the Endeca data store, the new standard attribute is automatically created
with system default values for the PDR. For these default values, see
"Default values for new attributes in the data store".
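The replace semantics described in the list above can be illustrated with a
minimal sketch that models the data store as an in-memory map. It does not use
the Bulk Load API: the ReplaceSemanticsSketch class, its methods, and the
attribute names are all hypothetical, and serve only to show that an incoming
record's key/value pairs overwrite the existing record's list instead of being
merged into it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of replace (non-additive) semantics.
// This is NOT the Bulk Load API; it only illustrates the behavior.
class ReplaceSemanticsSketch {
    // Maps a primary-key value (record spec) to that record's key/value pairs.
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Adds the record, or replaces the whole existing record with the same key.
    void addOrReplace(String primaryKey, Map<String, String> assignments) {
        if (primaryKey == null || primaryKey.isEmpty()) {
            throw new IllegalArgumentException("A primary-key value is required");
        }
        // Copy so that later changes to the caller's map do not affect the store.
        store.put(primaryKey, new HashMap<>(assignments));
    }

    // Returns the stored key/value pairs, or null if the record does not exist.
    Map<String, String> get(String primaryKey) {
        return store.get(primaryKey);
    }

    public static void main(String[] args) {
        ReplaceSemanticsSketch dataStore = new ReplaceSemanticsSketch();

        Map<String, String> first = new HashMap<>();
        first.put("Color", "red");
        first.put("Size", "large");
        dataStore.addOrReplace("rec-1", first);

        Map<String, String> second = new HashMap<>();
        second.put("Color", "blue");
        dataStore.addOrReplace("rec-1", second);

        // "Size" is gone: the incoming record replaced the old one entirely.
        System.out.println(dataStore.get("rec-1"));
    }
}
```

To keep an attribute across a replace, the incoming record has to carry that
assignment again, because nothing from the old record is retained.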
The interface rejects non-XML 1.0 characters upon ingest. That is, a character
is valid for ingest only if it matches production 2 (Char) of the XML 1.0
specification. If an invalid character is detected, an exception is thrown
with this error message:
Character <c> is not legal in XML 1.0
The record with the invalid character is rejected.
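Because a single invalid character causes the whole record to be rejected, it
can be useful to check values before they are sent. The following is a minimal
sketch of such a pre-ingest check against production 2 of XML 1.0. The
Xml10CharCheck class and its method names are hypothetical and are not part of
the Bulk Load API; only the character ranges come from the XML 1.0
specification.

```java
// Hypothetical pre-ingest helper; not part of the Bulk Load API.
class Xml10CharCheck {
    // Returns true if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the index (in UTF-16 code units) of the first illegal
    // character in the value, or -1 if every character is legal.
    static int firstIllegalIndex(String value) {
        int i = 0;
        while (i < value.length()) {
            // An unpaired surrogate is returned as-is and fails the check.
            int codePoint = value.codePointAt(i);
            if (!isXml10Char(codePoint)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        String clean = "Blue widget";
        String dirty = "Blue" + (char) 0x1 + "widget";
        System.out.println(firstIllegalIndex(clean));
        System.out.println(firstIllegalIndex(dirty));
    }
}
```

Whether to strip, substitute, or skip offending values is an application
decision; this sketch only locates them.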
Post-ingest behavior
Two operations must occur at some point after each bulk-load ingest:
- A merge of the ingested records into a single generation, which re-indexes
the database to optimize query performance.
- A rebuild of the aspell spelling dictionary, so that the newly added data is
available for spelling Did You Mean (DYM) suggestions and autocorrect.
The BulkIngester constructor's doFinalMerge parameter lets you set when the
post-ingest merge occurs:
- If set to true, the merge is forced immediately after ingest. This behavior
is intended to maximize query performance at the end of a single, large,
homogeneous data update that would occur during a regularly scheduled update
window.
- If set to false, a merge is not forced at the end of an update. Instead, the
regular background merge process keeps the generations in order over time.
This behavior is more suitable for parallel heterogeneous data updates where
low overall update latency is paramount.
The BulkIngester constructor's doUpdateDictionary parameter lets you specify
when the aspell spelling dictionary is updated:
- A setting of true means a dictionary update is forced immediately after the
ingest.
- A setting of false means the dictionary update is disabled. You can later
update the dictionary via the updateaspell administrative operation, which is
described in the Oracle Endeca Server Administrator's Guide.
If you are doing multiple, consecutive bulk-load operations, you can set both
parameters to false on all except the last one, so that the merge and the
dictionary update each run only once.
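The pattern for consecutive loads can be sketched as a small helper that
chooses the two flag values for each load in a sequence. The PostIngestFlags
class and its forLoad method are hypothetical and are not part of the Bulk
Load API. The BulkIngester constructor call itself is omitted because its full
signature is not given here; consult the Javadoc in apis/doc/bulk_load for it.

```java
// Hypothetical helper that picks doFinalMerge and doUpdateDictionary
// values for each load in a sequence of consecutive bulk loads.
class PostIngestFlags {
    final boolean doFinalMerge;
    final boolean doUpdateDictionary;

    PostIngestFlags(boolean doFinalMerge, boolean doUpdateDictionary) {
        this.doFinalMerge = doFinalMerge;
        this.doUpdateDictionary = doUpdateDictionary;
    }

    // Both flags are true only for the last load (index == total - 1).
    static PostIngestFlags forLoad(int index, int total) {
        if (total <= 0 || index < 0 || index >= total) {
            throw new IllegalArgumentException("index must be in [0, total)");
        }
        boolean isLast = (index == total - 1);
        return new PostIngestFlags(isLast, isLast);
    }

    public static void main(String[] args) {
        int total = 3;
        for (int i = 0; i < total; i++) {
            PostIngestFlags flags = forLoad(i, total);
            // Pass flags.doFinalMerge and flags.doUpdateDictionary to the
            // BulkIngester constructor for this load (call omitted here).
            System.out.println("load " + i
                    + ": doFinalMerge=" + flags.doFinalMerge
                    + ", doUpdateDictionary=" + flags.doUpdateDictionary);
        }
    }
}
```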