com.endeca.BulkLoad
Class BulkIngester

java.lang.Object
  extended by com.endeca.BulkLoad.BulkIngester

public class BulkIngester
extends java.lang.Object

The primary entry point for the client-side Bulk Load Interface, which loads data into an Endeca data domain. It makes a socket connection to the data domain and spawns a thread to handle replies. Clients of this interface must:

  1. Define classes that implement the four callback interfaces (ErrorCallback, FinishedCallback, AbortCallback, and StatusCallback) and do something useful when their handler methods are called, which happens in the response thread.
  2. Instantiate a BulkIngester object with the data domain hostname and port number (obtainable through the allocateBulkPort method of the manage web service), the four callback objects, and the other arguments defined in the constructor.
  3. Call the begin method to start the response thread. If begin is not called first, sendRecord and endIngest throw an IllegalStateException.
  4. Call sendRecord repeatedly to send Data.Record objects to the data domain.
  5. When finished sending records, call endIngest to terminate the response thread and close the socket.
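
Steps 3 to 5 above can be sketched as a begin / send / end sequence. The sketch below is illustrative only: so that it compiles without the Endeca client library, it uses a hypothetical IngestSession interface that mirrors the documented method names begin, sendRecord, and endIngest, plus an in-memory RecordingSession fake. Neither type is part of the Bulk Load API; with the real library, a BulkIngester instance would take the place of the session.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in (not part of the Bulk Load API) that mirrors the
// documented BulkIngester method names so the sketch compiles on its own.
interface IngestSession<R> {
    void begin() throws IOException;
    void sendRecord(R rec) throws IOException;
    void endIngest() throws IOException, InterruptedException;
}

class IngestLifecycleSketch {
    /** Runs steps 3 to 5: begin, send each record, then end the ingest. */
    static <R> int ingestAll(IngestSession<R> session, Iterable<R> records)
            throws IOException, InterruptedException {
        int sent = 0;
        session.begin();                  // step 3: start the response thread
        try {
            for (R rec : records) {
                session.sendRecord(rec);  // step 4: send one record at a time
                sent++;
            }
        } finally {
            session.endIngest();          // step 5: stop the thread, close the socket
        }
        return sent;
    }

    /** In-memory fake used only to exercise the sketch; it records call order. */
    static final class RecordingSession implements IngestSession<String> {
        final List<String> calls = new ArrayList<>();
        private boolean begun = false;

        @Override public void begin() {
            if (begun) throw new IllegalStateException("begin() already called");
            begun = true;
            calls.add("begin");
        }

        @Override public void sendRecord(String rec) {
            if (!begun) throw new IllegalStateException("begin() not called");
            calls.add("sendRecord:" + rec);
        }

        @Override public void endIngest() {
            if (!begun) throw new IllegalStateException("begin() not called");
            calls.add("endIngest");
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingSession session = new RecordingSession();
        int sent = ingestAll(session, Arrays.asList("r1", "r2"));
        System.out.println(sent + " records sent, calls = " + session.calls);
    }
}
```

Placing endIngest in a finally block is one possible design choice for making sure the socket is closed even when sendRecord fails. Whether a failed ingest should instead be cancelled with sendCancel first is a decision this sketch does not attempt to make.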


Constructor Summary
protected BulkIngester(java.io.DataInputStream din, java.io.DataOutputStream dout, ErrorCallback errorCallback, FinishedCallback finishedCallback, AbortCallback abortCallback, StatusCallback statusCallback)
          Alternative constructor for unit testing only.
  BulkIngester(java.lang.String host, int port, javax.net.SocketFactory socketFactory, java.lang.String transactionId, boolean doFinalMerge, boolean doUpdateSpellingDictionary, long socketTimeoutMillis, ErrorCallback errorCallback, FinishedCallback finishedCallback, AbortCallback abortCallback, StatusCallback statusCallback)
          Creates a BulkIngester object which can be used to send data to the Data Domain using the Bulk Ingest protocol.
 
Method Summary
 void begin()
          Spawn a thread to asynchronously read the data domain's responses.
 void endIngest()
          Terminates the response thread and closes the socket.
 void requestStatusUpdate()
          Poll the data domain for its current status.
 void sendCancel()
          Tell the data domain that the client wants to cancel the bulk ingest process.
 void sendRecord(Data.Record rec)
          Sends the provided record over the wire to the data domain.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BulkIngester

public BulkIngester(java.lang.String host,
                    int port,
                    javax.net.SocketFactory socketFactory,
                    java.lang.String transactionId,
                    boolean doFinalMerge,
                    boolean doUpdateSpellingDictionary,
                    long socketTimeoutMillis,
                    ErrorCallback errorCallback,
                    FinishedCallback finishedCallback,
                    AbortCallback abortCallback,
                    StatusCallback statusCallback)
             throws java.io.IOException
Creates a BulkIngester object which can be used to send data to the Data Domain using the Bulk Ingest protocol.

Parameters:
host - hostname of the Data Domain. Cannot be null.
port - bulk ingest port of the Data Domain.
socketFactory - SocketFactory to use to create socket for bulk ingest. Cannot be null.
transactionId - The transaction to perform this ingest against. Can be null to signify no outer transaction.
doFinalMerge - Whether to instruct the dgraph to perform a final merge after ingest. Setting this to true degrades ingest performance. If you are performing multiple bulk ingests, set this to false for all but the last ingest job.
doUpdateSpellingDictionary - Whether to instruct the dgraph to update the spelling dictionaries after ingest. Setting this to true degrades ingest performance. If you are performing multiple bulk ingests, set this to false for all but the last ingest job.
socketTimeoutMillis - Timeout for connection to dgraph (in milliseconds).
errorCallback - Callback to invoke when there is an error from the dgraph. Not all errors are fatal. Cannot be null.
finishedCallback - Callback to invoke when the bulk ingest process finishes successfully. Cannot be null.
abortCallback - Callback to invoke when there is a fatal ingest error. Cannot be null.
statusCallback - Callback that is invoked periodically with ingest statistics. Cannot be null.
Throws:
java.io.IOException - if we could not create a connection to the dgraph.
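
The advice for doFinalMerge and doUpdateSpellingDictionary (false for every job except the last) can be expressed as a small planning helper. This is a minimal sketch: the JobFlags holder and the planFlags method are hypothetical names introduced for illustration, not part of the Bulk Load API. Each planned pair would be passed as the two boolean arguments when constructing the BulkIngester for that job.

```java
import java.util.ArrayList;
import java.util.List;

class IngestFlagPlanSketch {
    /** Hypothetical holder for the two boolean constructor arguments of one job. */
    static final class JobFlags {
        final boolean doFinalMerge;
        final boolean doUpdateSpellingDictionary;

        JobFlags(boolean doFinalMerge, boolean doUpdateSpellingDictionary) {
            this.doFinalMerge = doFinalMerge;
            this.doUpdateSpellingDictionary = doUpdateSpellingDictionary;
        }
    }

    /**
     * Plans the flags for a sequence of bulk ingest jobs: both flags are
     * false for every job except the last one, where both are true.
     */
    static List<JobFlags> planFlags(int jobCount) {
        if (jobCount < 0) {
            throw new IllegalArgumentException("jobCount must not be negative");
        }
        List<JobFlags> plan = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            boolean isLast = (i == jobCount - 1);
            plan.add(new JobFlags(isLast, isLast));
        }
        return plan;
    }
}
```

Deferring both operations to the last job means the merge and the dictionary update each run once rather than once per job, which is the performance concern the parameter descriptions raise.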

BulkIngester

protected BulkIngester(java.io.DataInputStream din,
                       java.io.DataOutputStream dout,
                       ErrorCallback errorCallback,
                       FinishedCallback finishedCallback,
                       AbortCallback abortCallback,
                       StatusCallback statusCallback)
Alternative constructor for unit testing only. Does not use sockets. Do not use outside of testing.

Parameters:
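din - input stream used in place of the socket's input stream, from which responses are read.
dout - output stream used in place of the socket's output stream, to which messages are written.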
errorCallback - ErrorCallback object to handle error conditions.
finishedCallback - FinishedCallback object to be called when ingestion finishes.
abortCallback - AbortCallback object to handle aborts.
statusCallback - StatusCallback object to handle status updates.

Method Detail

begin

public void begin()
           throws java.io.IOException
Spawn a thread to asynchronously read the data domain's responses.

Throws:
java.lang.IllegalStateException - if begin() has already been called.
java.io.IOException - when messages could not be sent to the dgraph.

endIngest

public void endIngest()
               throws java.io.IOException,
                      java.lang.InterruptedException
Terminates the response thread and closes the socket. Must be called after the client finishes sending data. This method does not return until the ingest has been terminated.

Throws:
java.lang.IllegalStateException - if you haven't called begin() first.
java.io.IOException - when messages could not be sent to the dgraph.
java.lang.InterruptedException - when blocking was interrupted.

sendRecord

public void sendRecord(Data.Record rec)
                throws java.io.IOException
Sends the provided record over the wire to the data domain. This will block if the data domain's input queue is full.

Parameters:
rec - The record to send.
Throws:
java.io.IOException - when messages could not be sent to the dgraph.
java.lang.IllegalStateException - if begin() has not been called.
java.lang.IllegalArgumentException - if rec has not been initialized.

requestStatusUpdate

public void requestStatusUpdate()
                         throws java.io.IOException
Poll the data domain for its current status. This method blocks only while the request is sent. The reply is received asynchronously.

Throws:
java.io.IOException - when messages could not be sent to the dgraph.
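
Because requestStatusUpdate blocks only on send, one way to use it is to interleave status requests with record sends. The sketch below requests an update after every fixed number of records. It is an assumption-laden illustration: StatusPollingTarget is a hypothetical interface that mirrors the documented sendRecord and requestStatusUpdate method names, and the interval policy is not something the Bulk Load API prescribes.

```java
import java.io.IOException;

// Hypothetical stand-in (not part of the Bulk Load API) for the two
// documented BulkIngester calls used by this sketch.
interface StatusPollingTarget<R> {
    void sendRecord(R rec) throws IOException;
    void requestStatusUpdate() throws IOException;
}

class StatusPollingSketch {
    /**
     * Sends every record and requests a status update each time another
     * `interval` records have been sent. Returns the number of status
     * requests that were issued.
     */
    static <R> int sendWithStatusPolls(StatusPollingTarget<R> target,
                                       Iterable<R> records,
                                       int interval) throws IOException {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int sent = 0;
        int polls = 0;
        for (R rec : records) {
            target.sendRecord(rec);
            sent++;
            if (sent % interval == 0) {
                // Only the request is sent here; the reply is handled
                // asynchronously rather than by this call.
                target.requestStatusUpdate();
                polls++;
            }
        }
        return polls;
    }
}
```

With seven records and an interval of three, the helper issues two status requests (after the third and sixth records). The replies themselves are not returned by this helper, since the method description states that receive is handled asynchronously.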

sendCancel

public void sendCancel()
                throws java.io.IOException
Tell the data domain that the client wants to cancel the bulk ingest process. After a cancel, the data domain may be left in an inconsistent state, depending on the structure of the data being ingested, so a transaction rollback is recommended.

Throws:
java.io.IOException - when messages could not be sent to the dgraph.