Sending records to the data domain

The BulkIngester class is the primary entry point to the client-side Bulk Load Interface, which loads data into an Endeca data domain.

BulkIngester makes a socket connection to the Endeca data domain and spawns a thread to handle replies. Its sendRecord() method sends the provided record over the wire to the data domain.

Clients of this interface must:
  1. Define classes that implement the four callback interfaces (ErrorCallback, FinishedCallback, AbortCallback, and StatusCallback), and perform the appropriate action when their handler methods are called (which happens in the response thread).
  2. Instantiate a BulkIngester object with the appropriate parameters required by the constructor.
  3. Call the begin() method to start the response thread. If begin() has not been called by the time records are sent, an IOException is thrown.
  4. Call sendRecord() repeatedly to send Record objects to the Endeca data domain.
  5. When finished sending records, call endIngest() to terminate the response thread and close the socket.
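The call order in steps 3 through 5 can be sketched without a running data domain. The example below is only an illustration: FakeIngester is a hypothetical stand-in that tracks state and stores strings, not the real BulkIngester, and it assumes that the IOException is raised by sendRecord() when begin() has not been called. It wraps the send loop in try/finally so that endIngest() runs even if a send fails; whether that is the right cleanup for the real class should be checked against the API documentation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the begin / sendRecord / endIngest call order, using a stand-in ingester. */
public class IngestLifecycleSketch {

    /** Hypothetical stand-in for BulkIngester; it only tracks state and stores records. */
    static class FakeIngester {
        private boolean started = false;
        final List<String> sent = new ArrayList<>();

        void begin() {
            started = true;
        }

        void sendRecord(String record) throws IOException {
            // Assumption: sending before begin() is what triggers the IOException.
            if (!started) {
                throw new IOException("begin() was not called");
            }
            sent.add(record);
        }

        void endIngest() {
            started = false;
        }
    }

    /** Sends all records and always calls endIngest() once begin() has returned. */
    static int ingestAll(FakeIngester ingester, List<String> records) throws IOException {
        ingester.begin();
        try {
            for (String record : records) {
                ingester.sendRecord(record);
            }
        } finally {
            ingester.endIngest();
        }
        return ingester.sent.size();
    }

    public static void main(String[] args) throws IOException {
        FakeIngester ingester = new FakeIngester();
        int count = ingestAll(ingester, Arrays.asList("Widget", "Thing"));
        System.out.println(count + " records sent");
    }
}
```

With the real class, the same shape would apply to Record objects and a BulkIngester constructed with the callbacks described in the following sections.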

Defining callback interfaces

The BulkIngester constructor requires an object implementing each of the four callback interfaces:
  • ErrorCallback handles error conditions. The handleError() method is called when errors occur during the ingest operation.
  • FinishedCallback is called when the Dgraph reports that it has finished with the ingestion. No further records will be accepted without calling begin() again.
  • AbortCallback handles abort conditions. An abort condition can happen either in BulkIngester or on the Dgraph.
  • StatusCallback handles status updates, including the number of successfully ingested records and the number of rejected records.
ErrorCallback is especially useful, as it reports the reason that a record was rejected. The sample program defines this callback as:
ErrorCallback errorCallback = new ErrorCallback() {
    // Interface methods must be declared public in the implementing class.
    @Override
    public void handleError(String reason, Record reject) {
        System.out.println("Record "
                + reject.getSpec().getName()
                + " rejected: " + reason);
    }
};
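Because the handler methods run in the response thread, a callback that stores rejections for later inspection needs a thread-safe container. The following is a minimal sketch of that idea, not part of the sample program. To stay self-contained it declares its own stand-in Record and ErrorCallback types (hypothetical, modeled only on the handleError(String, Record) shape shown above) and simulates the response thread with an ordinary Thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch: collecting rejected records from a handler that runs on another thread. */
public class RejectCollectorSketch {

    /** Hypothetical stand-in for the library's Record type. */
    static class Record {
        final String name;

        Record(String name) {
            this.name = name;
        }
    }

    /** Stand-in mirroring the handleError(String, Record) shape used in the sample. */
    interface ErrorCallback {
        void handleError(String reason, Record reject);
    }

    /** Stores each rejection in a thread-safe queue so another thread can read it later. */
    static class CollectingErrorCallback implements ErrorCallback {
        private final Queue<String> rejections = new ConcurrentLinkedQueue<>();

        @Override
        public void handleError(String reason, Record reject) {
            rejections.add(reject.name + ": " + reason);
        }

        int count() {
            return rejections.size();
        }

        Queue<String> rejections() {
            return rejections;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CollectingErrorCallback callback = new CollectingErrorCallback();
        // Simulate the response thread invoking the handler.
        Thread responseThread = new Thread(
                () -> callback.handleError("missing primary key", new Record("Widget")));
        responseThread.start();
        responseThread.join();
        System.out.println(callback.count() + " rejected: " + callback.rejections().peek());
    }
}
```

The sending thread can then check count() after the ingest ends instead of relying on console output.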

Instantiating a BulkIngester object

The BulkIngester constructor requires ten parameters, in this order:
  • host – the name (a String) of the machine on which the Endeca data domain is running. It can be obtained by using the allocateBulkLoadPort operation of the Manage Web Service.
  • port – the bulk load port (an int) of the Endeca data domain. It can be obtained by using the allocateBulkLoadPort operation of the Manage Web Service.
  • useSSL – a boolean to specify whether to use SSL for the connection.
  • doFinalMerge – a boolean that specifies whether a merge is forced immediately after ingest.
  • doUpdateDictionary – a boolean that specifies whether the aspell dictionary is updated immediately after ingest.
  • timeout – the timeout in milliseconds (an int) for connecting to the Endeca data domain.
  • errorCallback – the ErrorCallback object.
  • finishedCallback – the FinishedCallback object.
  • abortCallback – the AbortCallback object.
  • statusCallback – the StatusCallback object.
The sample program constructs the BulkIngester as follows:
BulkIngester ingester = new BulkIngester(
        "endecaserver.example.com",  // host
        1234,        // port
        false,       // useSSL
        true,        // doFinalMerge
        true,        // doUpdateDictionary
        90000,       // timeout in ms
        errorCallback,
        finishedCallback,
        abortCallback,
        statusCallback);

Beginning and ending the ingest

After the client program has made a connection to the Endeca data domain, the ingest process requires the use of these BulkIngester methods in this order:
  1. The begin() method starts the ingest process.
  2. A series of sendRecord() calls actually sends the Record objects to the data domain.
  3. The endIngest() method terminates the ingest process.
The sample program, which ingests only two records, is coded as follows:
Record widget = makeProductRecord("Widget", 12, 99.95);
Record thing = makeProductRecord("Thing", 110, 3.14);

ingester.begin();
ingester.sendRecord(widget);
ingester.requestStatusUpdate();
ingester.sendRecord(thing);
ingester.endIngest();

Note that the requestStatusUpdate() method requests the current status of the ingest operation. The status is reported through the StatusCallback handler, which runs in the response thread.
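One way to make status updates available to the sending thread is to have the StatusCallback implementation store the latest counts in atomic fields. The sketch below shows the idea under explicit assumptions: the StatusCallback interface declared here is a hypothetical stand-in (the real method name and parameters are not shown in this section and may differ), and each update is assumed to carry cumulative totals rather than increments.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: keeping the latest ingest status where the sending thread can read it. */
public class StatusTrackerSketch {

    /** Hypothetical stand-in; the real StatusCallback signature may differ. */
    interface StatusCallback {
        void handleStatus(long ingested, long rejected);
    }

    /** Stores the most recent totals so that another thread can read them safely. */
    static class LatestStatus implements StatusCallback {
        private final AtomicLong ingested = new AtomicLong();
        private final AtomicLong rejected = new AtomicLong();

        @Override
        public void handleStatus(long ingestedSoFar, long rejectedSoFar) {
            // Assumption: updates carry cumulative totals, so overwrite rather than add.
            ingested.set(ingestedSoFar);
            rejected.set(rejectedSoFar);
        }

        long ingested() {
            return ingested.get();
        }

        long rejected() {
            return rejected.get();
        }
    }

    public static void main(String[] args) {
        LatestStatus status = new LatestStatus();
        status.handleStatus(1, 0);  // as if a status reply arrived after the first record
        status.handleStatus(2, 0);  // and again after the second record
        System.out.println("ingested=" + status.ingested() + " rejected=" + status.rejected());
    }
}
```

The two counters are updated separately, so a reader could briefly see one new value and one old value; if a consistent pair matters, a single immutable snapshot object held in an AtomicReference would be a reasonable alternative.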