Integrator provides two loader components for loading
data into an Endeca data domain: the
Bulk Add/Replace Records component (often called the
"Bulk Loader") and the
Add/Update Records component.
While data is loading, at least a portion of processing resources is devoted
to load processing, so query performance is reduced for the duration of the
load. The
Add/Update Records component uses the Data Ingest
Web Service, which consumes only a portion of processing resources. Thus, while
query performance is reduced, some query processing can continue. The Bulk
Loader uses the Endeca server's Bulk Load API, which consumes all processing
resources while running. Query processing is essentially placed on hold during
bulk loading and resumes once the bulk load processing is complete.
Use the
Bulk Add/Replace Records component to load data in
bulk when it is acceptable for the visibility of updates to be delayed and for
query processing to stop during the load processing.
Specific situations when you would choose this component include:
- You are performing the
initial load of records into the data domain, whether or not an attribute
schema has been configured. If the schema has not been configured (in other
words, if no PDRs have been loaded) and no user data has been loaded
previously, all new properties are created with default system values.
- You are adding new records
to the data domain any time after the initial upload. Any new standard
attributes that do not exist in the data domain are automatically created with
default system values.
- You want to replace existing
records in the data domain. When you use the
Bulk Add/Replace Records component, if a loaded
record matches an existing record, the loaded record overwrites (completely
replaces) the existing record.
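The whole-record replacement described above can be pictured with a small in-memory model. This is only an illustrative sketch: the `bulk_add_replace` function, the `domain` dictionary, and the `id` key are hypothetical names, and nothing here calls the Bulk Load API or any Integrator component.

```python
# Toy model of a data domain: primary-key value -> record (a dict).
# Hypothetical illustration only; it does not call any Endeca API.

def bulk_add_replace(domain, loaded_records, key="id"):
    """Add new records; completely replace any record whose key already exists."""
    for record in loaded_records:
        # Whole-record overwrite: attributes missing from the loaded
        # record do not survive from the existing record.
        domain[record[key]] = dict(record)
    return domain


domain = {1: {"id": 1, "color": "red", "size": "M"}}
bulk_add_replace(domain, [{"id": 1, "color": "blue"}, {"id": 2, "color": "green"}])
print(domain[1])  # the earlier "size" assignment is no longer present
```

In this model, record 1 loses its earlier "size" value because the loaded record replaces it outright rather than being merged into it.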
Note that this component cannot be used to load:
- The Global Configuration
Record (GCR)
- Property Description Records
(PDRs)
- Dimension Description
Records (DDRs)
- Managed attribute values
(mvals)
- Any data domain
configuration documents
Use the
Add/Update Records component to add or update
small numbers of records, or to add or modify records when you want query
processing to continue during the load (with reduced performance) and can
accept a longer loading time in exchange.
Specific situations when you would choose this component include:
- You are loading the
attribute schema.
- You are incrementally
updating the data domain with new records after the initial load of records.
Any new attributes that do not exist in the data domain are automatically
created with system defaults.
- You are incrementally
updating existing records in the data domain following initial upload. The
behavior depends on whether the standard attribute is single-assign or multi-assign.
If the single-assign property is configured as false (which is the default),
uploaded data is totally additive; in other words, the loaded key-value pair
is merged into the existing record. If the single-assign property is
configured as true and additional values are uploaded for an existing
assignment, the operation fails.
- You are adding new records
to the data domain at any time.
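The incremental-update behavior in the list above (additive for multi-assign attributes, failure for single-assign attributes that already hold a value) can be sketched as follows. The `add_update` function and its parameters are hypothetical and exist only for illustration; the treatment of re-uploaded duplicate values as a no-op is an assumption, not something the component is documented to do.

```python
# Toy model of one record: attribute name -> list of assigned values.
# Hypothetical illustration only; it does not call the Data Ingest Web Service.

def add_update(record, attribute, new_values, single_assign=False):
    """Merge new_values into record[attribute], or fail for single-assign."""
    existing = record.get(attribute, [])
    additional = []
    for value in new_values:
        # Assumption: a value that is already assigned is skipped silently.
        if value not in existing and value not in additional:
            additional.append(value)
    # A single-assign attribute may hold at most one value in this model.
    if single_assign and len(existing) + len(additional) > 1:
        raise ValueError(
            f"{attribute!r} is single-assign; cannot add {additional!r}"
        )
    record[attribute] = existing + additional
    return record


record = {"id": ["1"], "tag": ["sale"]}
add_update(record, "tag", ["new"])  # multi-assign: values are merged
try:
    add_update(record, "id", ["2"], single_assign=True)
except ValueError as err:
    print("update rejected:", err)
```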
Note that this component cannot be used to load:
- The Global Configuration
Record (GCR)
- Managed attribute values
(mvals)
- Any data domain
configuration documents
In general:
- Choose the
Bulk Add/Replace Records component to load, add,
or replace a large number of records when it is acceptable for query processing
and the visibility of changes to be delayed. (For this reason, consider
scheduling such operations outside of business hours or during weekends.)
- Choose the
Add/Update Records component to load, add, or
update smaller numbers of records, when you want query processing to continue
during load processing (as noted above, query processing performance will still
be reduced) and you want the changes to be visible immediately.
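The guidance above can be summarized as a small decision helper. This is a sketch under stated assumptions: `choose_loader`, `can_load`, and the record-type labels are hypothetical names, and the `large_load_threshold` default is an arbitrary placeholder, since no numeric cutoff between "large" and "smaller" loads is given here.

```python
BULK = "Bulk Add/Replace Records"
ADD_UPDATE = "Add/Update Records"

# Record types each component cannot load, taken from the lists above.
# The short labels are hypothetical shorthand for this sketch.
CANNOT_LOAD = {
    BULK: {"GCR", "PDR", "DDR", "mval", "config document"},
    ADD_UPDATE: {"GCR", "mval", "config document"},
}


def can_load(component, record_type):
    """Return True if the component is not barred from loading record_type."""
    return record_type not in CANNOT_LOAD[component]


def choose_loader(record_count, queries_must_continue,
                  loading_schema=False, large_load_threshold=100_000):
    """Pick a loader component from the rules of thumb above.

    large_load_threshold is an arbitrary placeholder, not a documented value.
    """
    # The attribute schema (PDRs) cannot go through the Bulk Loader, and
    # bulk loading puts query processing on hold while it runs.
    if loading_schema or queries_must_continue:
        return ADD_UPDATE
    return BULK if record_count >= large_load_threshold else ADD_UPDATE


print(choose_loader(5_000_000, queries_must_continue=False))
print(choose_loader(5_000_000, queries_must_continue=True))
```

Tune the threshold to your own data volumes and maintenance windows; the helper only encodes the trade-off between load speed and query availability.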