The BulkIngester class is the primary entry point for the client-side Bulk Load
Interface for loading data into an Endeca data domain. BulkIngester makes a
socket connection to the Endeca data domain and spawns a thread to handle
replies. Its sendRecord() method sends the provided record over the wire to the
data domain.
Clients of this interface must:
- Define classes that implement the four callback interfaces (ErrorCallback,
FinishedCallback, AbortCallback, and StatusCallback) and perform the
appropriate action when their handler methods are called (which happens in the
response thread).
- Instantiate a BulkIngester object with the parameters required by the
constructor.
- Call the begin() method to start the response thread. If begin() is not
called, an IOException will be thrown.
- Call sendRecord() repeatedly to send Record objects to the Endeca data domain.
- When finished sending records, call endIngest() to terminate the response
thread and close the socket.
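The call order in the steps above can be sketched as follows. This is a minimal illustration, not the library's own code: the Ingester interface and the RecordingIngester fake are hypothetical stand-ins for BulkIngester, records are plain strings instead of Record objects, and the fake assumes that the IOException for a missing begin() surfaces on sendRecord(). The try/finally block is a design choice of the sketch so that endIngest() runs even if a send fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the three BulkIngester methods named in the steps.
// The real signatures and checked exceptions may differ.
interface Ingester {
    void begin() throws IOException;
    void sendRecord(String record) throws IOException;
    void endIngest() throws IOException;
}

// In-memory fake that logs each call so the ordering can be inspected.
class RecordingIngester implements Ingester {
    final List<String> calls = new ArrayList<>();
    private boolean begun = false;

    public void begin() {
        begun = true;
        calls.add("begin");
    }

    public void sendRecord(String record) throws IOException {
        // Assumption: sending before begin() is what triggers the IOException.
        if (!begun) {
            throw new IOException("begin() was not called");
        }
        calls.add("send:" + record);
    }

    public void endIngest() {
        begun = false;
        calls.add("endIngest");
    }
}

class IngestLifecycle {
    // Runs begin, then every sendRecord, then endIngest, even if a send fails.
    static void ingestAll(Ingester ingester, List<String> records) throws IOException {
        ingester.begin();
        try {
            for (String record : records) {
                ingester.sendRecord(record);
            }
        } finally {
            ingester.endIngest();
        }
    }

    public static void main(String[] args) throws IOException {
        RecordingIngester fake = new RecordingIngester();
        ingestAll(fake, Arrays.asList("Widget", "Thing"));
        System.out.println(fake.calls); // [begin, send:Widget, send:Thing, endIngest]
    }
}
```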
Defining callback interfaces
The BulkIngester constructor requires an implementation of each of the four
callback interfaces as parameters:
- ErrorCallback handles error conditions. The handleError() method is called
when errors occur during the ingest operation.
- FinishedCallback is called when the Dgraph reports that it has finished with
the ingestion. No further records will be accepted without calling begin()
again.
- AbortCallback handles abort conditions. An abort condition can happen either
in BulkIngester or on the Dgraph.
- StatusCallback handles status updates, including the number of successfully
ingested records and the number of rejected records.
ErrorCallback is especially useful, as it reports the reason that a record was
rejected. The sample program defines this callback as:
ErrorCallback errorCallback = new ErrorCallback() {
    // Interface methods are implicitly public, so the override must be public.
    public void handleError(String reason, Record reject) {
        System.out.println("Record "
            + reject.getSpec().getName()
            + " rejected: " + reason);
    }
};
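Because the handler methods run in the response thread, the thread that sends records may need a safe way to observe what the callbacks report. The sketch below shows one possible approach using a CountDownLatch and atomic counters. The FinishedHandler and StatusHandler interfaces, along with their method names and parameters, are assumptions made for illustration. They are not the signatures of the real FinishedCallback and StatusCallback interfaces.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins: method names and parameters are assumptions,
// not the real FinishedCallback and StatusCallback signatures.
interface FinishedHandler { void handleFinish(); }
interface StatusHandler { void handleStatus(long ingested, long rejected); }

class IngestMonitor implements FinishedHandler, StatusHandler {
    private final CountDownLatch finished = new CountDownLatch(1);
    private final AtomicLong ingested = new AtomicLong();
    private final AtomicLong rejected = new AtomicLong();

    // Called on the response thread: store the latest counts.
    public void handleStatus(long ingestedSoFar, long rejectedSoFar) {
        ingested.set(ingestedSoFar);
        rejected.set(rejectedSoFar);
    }

    // Called on the response thread: release any thread waiting in awaitFinished().
    public void handleFinish() {
        finished.countDown();
    }

    // Called on the sending thread: returns true if the finish arrived in time.
    boolean awaitFinished(long timeout, TimeUnit unit) throws InterruptedException {
        return finished.await(timeout, unit);
    }

    long ingested() { return ingested.get(); }
    long rejected() { return rejected.get(); }
}

class IngestMonitorDemo {
    public static void main(String[] args) throws InterruptedException {
        IngestMonitor monitor = new IngestMonitor();
        // A plain thread plays the role of the response thread here.
        Thread responseThread = new Thread(() -> {
            monitor.handleStatus(2, 0);
            monitor.handleFinish();
        });
        responseThread.start();
        boolean done = monitor.awaitFinished(5, TimeUnit.SECONDS);
        System.out.println("finished=" + done
                + " ingested=" + monitor.ingested()
                + " rejected=" + monitor.rejected());
    }
}
```

In a real client, the same object could be wrapped by whatever callback implementations the constructor expects, so that the sending thread blocks on awaitFinished() instead of polling.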
Instantiating a BulkIngester object
The BulkIngester constructor requires ten parameters, in this order:
- host – the name (a String) of the machine on which the Endeca data domain is
running. It can be obtained by using the allocateBulkLoadPort operation of the
Manage Web Service.
- port – the bulk load port (an int) of the Endeca data domain. It can be
obtained by using the allocateBulkLoadPort operation of the Manage Web Service.
- useSSL – a boolean to specify whether to use SSL for the connection.
- doFinalMerge – a boolean that specifies whether a merge is forced immediately
after ingest.
- doUpdateDictionary – a boolean that specifies whether the aspell dictionary is
updated immediately after ingest.
- timeout – the timeout in milliseconds (an int) for connecting to the Endeca
data domain.
- errorCallback – the ErrorCallback object.
- finishedCallback – the FinishedCallback object.
- abortCallback – the AbortCallback object.
- statusCallback – the StatusCallback object.
The sample program constructs the BulkIngester as follows:
BulkIngester ingester = new BulkIngester(
    "endecaserver.example.com", // host
    1234,   // port
    false,  // useSSL
    true,   // doFinalMerge
    true,   // doUpdateDictionary
    90000,  // timeout in ms
    errorCallback,
    finishedCallback,
    abortCallback,
    statusCallback);
Beginning and ending the ingest
After the client program has made a connection to the Endeca data domain, the
ingest process requires the use of these BulkIngester methods in this order:
- The begin() method starts the ingest process.
- A series of sendRecord() calls actually sends the Record objects to the data
domain.
- The endIngest() method terminates the ingest process.
The sample program, which ingests only two records, is coded as follows:
Record widget = makeProductRecord("Widget", 12, 99.95);
Record thing = makeProductRecord("Thing", 110, 3.14);
ingester.begin();
ingester.sendRecord(widget);
ingester.requestStatusUpdate();
ingester.sendRecord(thing);
ingester.endIngest();
Note that the requestStatusUpdate() method is used to retrieve the status of
the ingest operation.
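For larger ingests, a client might request a status update at regular intervals instead of after a single record. The following is a minimal sketch of that idea under stated assumptions: StatusAwareIngester is a hypothetical stand-in for the two BulkIngester methods involved, records are plain strings, and the interval is an arbitrary value chosen by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two BulkIngester methods used in the loop.
interface StatusAwareIngester {
    void sendRecord(String record);
    void requestStatusUpdate();
}

class PeriodicStatus {
    // Sends every record and requests a status update after each full
    // group of `interval` records.
    static void sendWithStatus(StatusAwareIngester ingester,
                               List<String> records, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int sent = 0;
        for (String record : records) {
            ingester.sendRecord(record);
            sent++;
            if (sent % interval == 0) {
                ingester.requestStatusUpdate();
            }
        }
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        // Logging fake that records the order of calls.
        StatusAwareIngester fake = new StatusAwareIngester() {
            public void sendRecord(String record) { log.add("send:" + record); }
            public void requestStatusUpdate() { log.add("status"); }
        };
        sendWithStatus(fake, Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(log);
    }
}
```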