Bulk Add/Replace Records component

The Bulk Add/Replace Records component adds new records or replaces existing records in an Endeca data domain.

The Bulk Add/Replace Records component uses the Endeca Server's Bulk Load Interface. (Other components use the Data Ingest Web Service [DIWS].) The Bulk Load Interface defines the basic characteristics of this component:
  • The component can load data source records only.
    Thus, you cannot use this component to load:
    • Property Description Records (PDRs)
    • Dimension Description Records (DDRs)
    • Managed attribute values (MVals)
    • Global Configuration Record (GCR)
    • Data domain configuration documents
  • Existing records in the Endeca data domain are replaced, not updated. In other words, the load operation is a replace operation, not an append operation.
  • A primary-key attribute (also called a record spec) is required for each record to be added or replaced.
  • If an assignment (key-value pair) specifies a standard attribute (property) that does not exist in the Endeca data domain, a new standard attribute is automatically created with system default values for the PDR (see Standard attribute default values).
  • The component does not send records in batches. It employs a single streaming connection to the data domain.
Note: String data submitted for ingest must consist of valid XML characters. For details see Valid characters for ingest.
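As a rough illustration of the note above, source strings can be checked before they reach the component. The sketch below assumes that "valid XML characters" means the XML 1.0 Char production; the authoritative list is in Valid characters for ingest. The function names are hypothetical and are not part of any Endeca API.

```python
import re

# Matches any character outside the XML 1.0 "Char" production.
# Assumption: this is what "valid XML characters" means for ingest.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(value: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for characters XML 1.0 disallows."""
    return [(m.start(), m.group()) for m in _INVALID_XML_CHARS.finditer(value)]


def strip_invalid_xml_chars(value: str) -> str:
    """Return the string with disallowed characters removed."""
    return _INVALID_XML_CHARS.sub("", value)
```

Whether to strip, replace, or reject offending values is a data-quality decision for the graph designer; this sketch only shows how the characters could be detected.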

Post ingest behavior

The default behavior of the Bulk Load Interface is to force a merge to a single generation at the end of every bulk-load ingest operation. This behavior is intended to maximize query performance at the end of a single, large, homogeneous data update that would occur during a regularly scheduled update window.

The Post Ingest Query Optimization property of the Bulk Add/Replace Records component allows you to control when the post-ingest merge occurs:
  • If this property is set to true (the default), the merge is forced immediately after ingest.
  • If this property is set to false, a merge is not forced at the end of an update. Instead, the data domain relies on the regular background merge process to keep the generations in order over time. This behavior is more suitable for parallel heterogeneous data updates where low overall update latency is paramount.
The Post Ingest Dictionary Update property controls when the spelling dictionary is updated:
  • If this property is set to true (the default), a dictionary update is forced immediately after the ingest.
  • If this property is set to false, dictionary update is disabled. You can update the dictionary manually at a later time. For details about updating the dictionary, see "Updating spelling dictionaries for a data domain" in the Oracle Endeca Server Cluster Guide.

Input metadata schema

The input metadata schema for the Bulk Add/Replace Records component is not fixed. Each metadata field represents a property on a data domain record.

The metadata type of the Integrator field (as shown in the Edit Metadata dialog on the edge connecting to the connector) translates to the mdex property type. For example, the Integrator integer data type translates to the mdex:int data type. Note that you must override this behavior to support Integrator non-native types (such as mdex:duration, mdex:time, and mdex:geocode). For details, see Creating mdexType Custom properties.
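The type translation described above can be pictured as a lookup with per-field overrides. In the sketch below, only the integer to mdex:int pair comes from this topic; the other default entries, the field names, and the `overrides` dictionary (which stands in for the mdexType custom properties mechanism) are assumptions made for illustration.

```python
# Default Integrator-to-mdex translation. Only "integer" is stated in this
# topic; the remaining entries are assumed for the sake of the example.
DEFAULT_MDEX_TYPES = {
    "integer": "mdex:int",      # stated in the text
    "string": "mdex:string",    # assumed
    "long": "mdex:long",        # assumed
    "boolean": "mdex:boolean",  # assumed
}


def resolve_mdex_type(field_name, integrator_type, overrides=None):
    """Pick the mdex type for a metadata field, honoring explicit overrides."""
    overrides = overrides or {}
    if field_name in overrides:
        # Non-native types such as mdex:duration must be requested explicitly.
        return overrides[field_name]
    try:
        return DEFAULT_MDEX_TYPES[integrator_type]
    except KeyError:
        raise ValueError(
            f"No default mdex type for Integrator type {integrator_type!r}"
        ) from None


# Hypothetical field names: one uses the default, one needs an override.
overrides = {"ShipDuration": "mdex:duration"}
print(resolve_mdex_type("Quantity", "integer", overrides))
print(resolve_mdex_type("ShipDuration", "string", overrides))
```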

Use cases

Use the Bulk Add/Replace Records component to load data in bulk when it is acceptable for the updates to become visible with a delay and for query processing to stop while data is loaded.

Use this component for the following cases:
  • Full index initial load of records when no schema has been loaded. In this scenario, the Endeca data domain has no user data records and also has no user-created schema (in other words, no existing PDRs). In this case, all new properties (including the primary-key properties) are created with system default values.
  • Full index initial load of records, after you have loaded the record schema.
  • Adding more new records to the Endeca data domain any time after the initial loading of records. As in the initial load case, new standard attributes that do not exist in the data domain are automatically created with default system values.
  • Replacing existing records in the Endeca data domain any time after the initial loading of records. In this case, all the key-value pairs of the existing record are replaced with the key-value pairs of the input file.
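The add and replace cases above can be modeled with a small in-memory sketch. It is a toy model, not the Bulk Load Interface: the data domain is a dictionary keyed by the record spec, the schema is a dictionary of attribute names, and all function and field names are hypothetical.

```python
def bulk_add_or_replace(domain, schema, records, spec_attribute):
    """Toy model of add/replace semantics keyed on the record spec."""
    for record in records:
        if spec_attribute not in record:
            raise ValueError(f"record has no primary key {spec_attribute!r}")
        for name in record:
            # Unknown standard attributes are created with default settings.
            schema.setdefault(name, "system defaults")
        # The whole record is replaced; old assignments are not merged in.
        domain[record[spec_attribute]] = dict(record)


domain, schema = {}, {}
bulk_add_or_replace(domain, schema, [{"id": "1", "color": "red", "size": "L"}], "id")
bulk_add_or_replace(domain, schema, [{"id": "1", "color": "blue"}], "id")
print(domain["1"])  # the "size" assignment is gone after the replace
```

Note that in this model the `size` attribute remains in the schema even though no record uses it any longer; only the record's assignments are replaced.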

Configuration properties

Note: For details about visual properties for all connectors, see Visual properties of components. For details about configuration properties common to all connectors, see Common configuration properties of components.

The following table describes the configuration properties available for the Bulk Add/Replace Records component.

Table 1. Bulk Add/Replace Records properties
| Name | Description | Valid Values | Example |
|---|---|---|---|
| Endeca Server Host | Identifies the machine on which the Endeca Server is running. | The name or IP address of the machine. You can use localhost. | MyEndecaServer; 255.255.255.0 |
| Endeca Server Port | Identifies the port on which the Endeca Server is listening. | Valid ports. The default Endeca Server port is 7001, but it can be changed to another port. | 7001 |
| Endeca Server Context Root | Identifies the WebLogic application root context of the Endeca Server. | Valid root context names in WebLogic. | /endeca-server |
| Data Domain Name | Name of the data domain that will be modified. The data domain should be running when the graph containing the connector is run. | Valid data domain names. | quickstart |
| Spec Attribute | Specifies the primary key (record spec) for the records on which the operation will be performed. | Name of the primary key. If the primary key does not exist in the data domain, the property is created automatically with system default values. | FactSales_OrderNumber |
| Post Ingest Query Optimization | Specifies whether to merge records immediately after the ingest operation is complete or to use the standard background merge process. | Checked (True; default) or Unchecked (False). | |
| Post Ingest Dictionary Update | Specifies whether to update the spelling dictionaries automatically immediately after the ingest operation is complete. If the dictionaries are not updated when the ingest operation is complete, you can update them later using an operation of the Manage Web Service in the Endeca Server. | Checked (True; default) or Unchecked (False). | |
| SSL Enabled | Enables or disables SSL for the component. SSL should only be enabled when the Endeca Server to which you are connecting has SSL enabled. | Checked (True) or Unchecked (False). | |
| Stop after this many errors | Specifies the maximum number of ingest errors allowed in a single load operation. If this number of errors occurs, the ingest operation is terminated. | Either 0 (no errors are allowed) or a positive integer. | 0; 15 |
| Multi-assign delimiter | Sets the character that separates multi-assign values in a property in a source record. Keep in mind that this delimiter is different from the delimiter that separates property fields on the source record. See also Multi-assign delimiter. | A single character that is the multi-assign delimiter. The default is the Unicode DELETE character (\U007F). You do not have to use this field if your data does not include multi-assign properties. | |
| Timeout (ms) | Specifies the timeout of operations of the component. If timeouts occur when running graphs, operations on the Endeca Server may be taking too long. Change the value of this parameter to allow more time for operations on the Endeca Server. The default configures a one-minute timeout. | Integers. | 60000 |
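To make the constraints in Table 1 concrete, the sketch below models the properties as a plain settings object with a few sanity checks and shows how a multi-assign value would be split on the delimiter. The class and attribute names are hypothetical. The SSL default, the context root default (taken from the example column), and the requirement that the timeout be positive are assumptions; the port default, the error-limit rule, and the delimiter default come from the table.

```python
from dataclasses import dataclass


@dataclass
class BulkLoadSettings:
    host: str
    data_domain: str
    spec_attribute: str
    port: int = 7001                        # documented default port
    context_root: str = "/endeca-server"    # example value, used as a default here
    post_ingest_query_optimization: bool = True
    post_ingest_dictionary_update: bool = True
    ssl_enabled: bool = False               # assumed default
    max_errors: int = 0                     # "Stop after this many errors"
    multi_assign_delimiter: str = "\u007f"  # Unicode DELETE
    timeout_ms: int = 60000                 # one minute

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535")
        if self.max_errors < 0:
            problems.append("max_errors must be 0 or a positive integer")
        if len(self.multi_assign_delimiter) != 1:
            problems.append("multi_assign_delimiter must be a single character")
        if self.timeout_ms <= 0:
            problems.append("timeout_ms must be positive")  # assumed rule
        return problems

    def split_multi_assign(self, raw: str) -> list[str]:
        """Split one source field into its multi-assign values."""
        return raw.split(self.multi_assign_delimiter)


settings = BulkLoadSettings(
    host="localhost",
    data_domain="quickstart",
    spec_attribute="FactSales_OrderNumber",
)
print(settings.validate())
print(settings.split_multi_assign("red\u007fblue"))
```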

Data domain status after a failed ingest operation

When a bulk load ingest operation is terminated because of an error, records that were ingested before the error should be included in the data domain. Although the data domain may accept queries on the ingested records, you should consider the data domain to be in an inconsistent state. To restore a consistent state, review the logs to determine the problems that caused the bulk load operation to fail, correct these problems, then reload the data.
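The partial state described above can be reproduced with a toy loader. This is a simulation under stated assumptions, not the Bulk Load Interface: a record without a primary key counts as an ingest error, and the load is terminated once the error count reaches the configured limit (with a limit of 0, on the first error). All names are hypothetical.

```python
def simulated_bulk_load(domain, records, spec_attribute, max_errors=0):
    """Load records into a dict; return (added, rejected, completed)."""
    added = rejected = 0
    for record in records:
        key = record.get(spec_attribute)
        if key is None:
            rejected += 1
            # Assumed rule: 0 means the first error terminates the load.
            if rejected >= max(max_errors, 1):
                return added, rejected, False
            continue
        domain[key] = dict(record)
        added += 1
    return added, rejected, True


domain = {}
records = [{"id": "1"}, {"id": "2"}, {"name": "no key"}, {"id": "3"}]
print(simulated_bulk_load(domain, records, "id"))
print(sorted(domain))  # records loaded before the error are still present
```

After the terminated run, the domain holds the first two records but not the fourth, which is why the data should be reloaded once the cause of the error is fixed.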

Output Ports

Each Information Discovery component that modifies record data in the data domain (adding or removing records or key/value pairs) has two output ports:
  • Port 0 returns status information describing batches of records that were successfully ingested.
  • Port 1 returns error information describing batches of records that the data domain failed to ingest. Each output record to the port corresponds to a failed batch, not to individual records.
Table 2. Port 0 metadata
| Port Field Name | Data Type | Description | Example |
|---|---|---|---|
| Records added | Long | Number of records added to the data domain | 984341 |
| Records Queued | Long | Number of records queued for processing but not processed | 1568 |
| Records Rejected | Long | Number of records submitted that were not added to the data domain | 24836 |
| State | String | Data domain status string returned by the Bulk Load API | |
Table 3. Port 1 metadata
| Port Field Name | Data Type | Description | Example |
|---|---|---|---|
| Fault Message | String | Error message returned by the Endeca Server | |
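A downstream step might read the Port 0 fields and flag loads that need a closer look. The sketch below assumes each Port 0 record arrives as a dictionary keyed by the field names in Table 2; the class, the function names, and the rule that any rejected or still-queued record deserves attention are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PortZeroStatus:
    records_added: int
    records_queued: int
    records_rejected: int
    state: str


def parse_port_zero(row: dict) -> PortZeroStatus:
    """Build a status object from a Port 0 record (field names per Table 2)."""
    return PortZeroStatus(
        records_added=int(row["Records added"]),
        records_queued=int(row["Records Queued"]),
        records_rejected=int(row["Records Rejected"]),
        state=str(row.get("State", "")),
    )


def needs_attention(status: PortZeroStatus) -> bool:
    """Assumed rule: rejected or unprocessed records warrant a log review."""
    return status.records_rejected > 0 or status.records_queued > 0


row = {"Records added": 984341, "Records Queued": 1568, "Records Rejected": 24836}
status = parse_port_zero(row)
print(status.records_added, needs_attention(status))
```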