Dgraph HDFS Agent logging

The Dgraph HDFS Agent writes its stdout/stderr output to a log file.

The --out flag of the Dgraph HDFS Agent specifies the path and file name of this stdout/stderr log file. The same log file is used for both import (ingest) and export operations.

The name and location of the output log file are set at installation time via the AGENT_OUT_FILE parameter of the bdd.conf configuration file. Typically, the log name is dgraphHDFSAgent.out and the location is the $BDD_HOME/logs directory.
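If you need to locate the log file from a script, the following Python sketch looks up AGENT_OUT_FILE in the text of a bdd.conf file. It assumes that bdd.conf consists of KEY=VALUE lines, that lines starting with # are comments, and that the value may reference environment variables such as $BDD_HOME; the function name read_agent_out_file is hypothetical and is not part of the product.

```python
import os
import re


def read_agent_out_file(conf_text, env=None):
    """Return the AGENT_OUT_FILE value found in bdd.conf-style text, or None.

    Assumes KEY=VALUE lines; this is an illustration, not a product API.
    """
    env = os.environ if env is None else env
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AGENT_OUT_FILE":
            value = value.strip().strip('"').strip("'")
            # Expand $VAR and ${VAR}; unknown variables are left unchanged.
            return re.sub(
                r"\$\{?(\w+)\}?",
                lambda m: env.get(m.group(1), m.group(0)),
                value,
            )
    return None
```

For example, passing the text of bdd.conf together with a mapping such as {"BDD_HOME": "/opt/bdd"} (an illustrative path) returns the expanded log path if the parameter is present.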

The Dgraph HDFS Agent log is especially important to check if you experience problems with loading records at the end of a Data Processing workflow. Errors received from the Dgraph (such as rejected records) are logged here.

Ingest operation messages

The following are sample messages for a successful ingest operation for a data set. The messages have been edited for readability:
New import request received: MdexCollectionIdentifier{
   databaseName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   collectionName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c}, 
   ... 
   requestOrigin: FROM_DATASET
Received request for database edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c
Starting ingest for: MdexCollectionIdentifier{
   databaseName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   collectionName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c}, 
   ...
   requestOrigin: FROM_DATASET
Finished reading 9983 records for MdexCollectionIdentifier{
   databaseName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   collectionName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c}, 
   ... 
   requestOrigin: FROM_DATASET
createBulkIngester edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c
sendRecordsToIngester 9983
closeBulkIngester
Ingest finished with 9983 records committed and 0 records rejected. 
   Status: INGEST_FINISHED. 
   Request info: MdexCollectionIdentifier{
   databaseName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   collectionName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c}, 
   location: /user/bdd/edp/data/.dataIngestSwamp/..., 
   user name: fcalvill, 
   notification: {"workflowName":"CLIDataLoad",
   "sourceDatabaseName":null,
   "sourceDatasetKey":null,
   "targetDatabaseName":
   "edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c",
   "targetDatasetKey":"edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c",
   "ecid":"0000LMSUWCm7ADkpSw4Eyc1NSxM1000000",
   "status":"IN_PROGRESS",
   "startTime":1467209085630,
   "timestamp":1467209136298,
   "progressPercentage":0.0,
   "errorMessage":null,
   "trackingUrl":null,
   "properties":{"dataSetDisplayName":"WarrantyClaims",
   "isCli":"true"}}, 
   actualEcid: 0000LMSUWCm7ADkpSw4Eyc1NSxM1000000, 
   requestOrigin: FROM_DATASET
Notification server url: http://busgg2014.us.oracle.com:7003/bdd/v1/api/workflows
About to send notification
Terminating
Notification{workflowName=CLIDataLoad, 
   sourceDatabaseName=null, sourceDatasetKey=null, 
   targetDatabaseName=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   targetDatasetKey=edp_cli_edp_4dd5ac28-2e85-4efc-a3c2-391b6a78f69c, 
   ecid=0000LMSUWCm7ADkpSw4Eyc1NSxM1000000, 
   status=SUCCEEDED, 
   startTime=1467209085630, 
   timestamp=1467209222088, 
   progressPercentage=100.0, 
   errorMessage=null, 
   properties={dataSetDisplayName=WarrantyClaims, isCli=true}}
Notification sent successfully
Terminating
...
Some events in the sample log are:
  1. The Data Processing workflow has written a set of Avro files in the /user/bdd/edp/data/.dataIngestSwamp directory in HDFS.
  2. The Dgraph HDFS Agent starts an ingest operation for the data set.
  3. The Dgraph HDFS Agent reads 9983 records from the Avro files.
  4. The createBulkIngester operation is used to instantiate a Bulk Load ingester instance for the data set.
  5. The sendRecordsToIngester operation sends the 9983 records to the Dgraph's ingester.
  6. The Bulk Load instance is closed with the closeBulkIngester operation.
  7. The Status: INGEST_FINISHED message signals the end of the ingest operation. The message also lists the number of successfully committed records and the number of rejected records. In addition, the Dgraph HDFS Agent notifies Studio that the ingest has finished, at which point Studio updates the status attribute of the DataSet Inventory with the final status of the ingest operation. The status should be FINISHED for a successful ingest or ERROR if an error occurred.
  8. The Dgraph HDFS Agent sends a final notification to Studio that the workflow has finished, with a status of SUCCEEDED.

Note that throughout the workflow, the Dgraph HDFS Agent continually sends notification updates to Studio, so that Studio can report the progress of the workflow to the end user.
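When checking many ingest operations, it can help to extract the committed and rejected counts from the log automatically. The following Python sketch searches log text for the "Ingest finished with ..." message shown in the sample and picks up the Status value that follows it. Because the sample messages were edited for readability, the exact layout of real log lines (timestamps, line breaks) may differ; the function name ingest_summaries is hypothetical.

```python
import re

# Patterns taken from the sample messages in this topic.
SUMMARY = re.compile(
    r"Ingest finished with (\d+) records committed and (\d+) records rejected"
)
STATUS = re.compile(r"Status: (\w+)")


def ingest_summaries(log_text):
    """Return a list of (committed, rejected, status) tuples, one per summary."""
    results = []
    for match in SUMMARY.finditer(log_text):
        # The Status field follows the summary sentence in the sample log.
        status = STATUS.search(log_text, match.end())
        results.append(
            (
                int(match.group(1)),
                int(match.group(2)),
                status.group(1) if status else None,
            )
        )
    return results
```

A non-zero rejected count in any tuple indicates that the log should be searched for the corresponding rejection messages.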

Rejected records

It is possible for a record to contain data that cannot be ingested or that can even crash the Dgraph. Typically, the invalid data consists of invalid XML characters. In this case, the Dgraph cannot remove or cleanse the invalid data; it can only skip the record that contains it. The interface rejects non-XML 1.0 characters upon ingest. That is, a valid character for ingest must be a character according to production 2 (Char) of the XML 1.0 specification. If an invalid character is detected, the record with the invalid character is rejected with this error message in the Dgraph HDFS Agent log:
Received error message from server: Record rejected: Character <c> is not legal in XML 1.0
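To find offending values in source data before loading, you can test strings against production 2 of the XML 1.0 specification, which allows #x9, #xA, #xD, and the ranges #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. The following Python sketch applies that rule; it is a client-side pre-check written for illustration, not the Dgraph's own validation code, and the function names are hypothetical.

```python
def is_xml10_char(ch):
    """Return True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(value):
    """Return (index, character) of the first illegal character, or None."""
    for index, ch in enumerate(value):
        if not is_xml10_char(ch):
            return index, ch
    return None
```

For instance, a value containing a vertical tab (#xB) would be flagged, while tabs, line feeds, and carriage returns pass the check.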

A source record can also be rejected if it is too large. The maximum size of a source record is 128 MB. An attempt to ingest a source record larger than 128 MB fails and an error is returned (with the primary key of the rejected record), but the bulk load ingest process continues after that rejected record.

Logging for new and deleted attributes

The Dgraph HDFS Agent logs the names of attributes being created or deleted as a result of transforms. For example:
Finished reading 499 records for Collection name: default_edp_2a0122f2-4d15-46bf-9669-21333442f10b
Adding attributes to collection: default_edp_2a0122f2-4d15-46bf-9669-21333442f10b
  [NumInStock]
Added attributes to collection: default_edp_2a0122f2-4d15-46bf-9669-21333442f10b
...
Deleting attributes from collection: default_edp_2a0122f2-4d15-46bf-9669-21333442f10b
  [OldPrice2]
Deleted attributes from collection: default_edp_2a0122f2-4d15-46bf-9669-21333442f10b

In the example, the NumInStock attribute was added to the data set and the OldPrice2 attribute was deleted.
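The attribute messages can be collected from the log with a small script. The following Python sketch assumes that log lines look like the sample above: an "Adding attributes to collection:" or "Deleting attributes from collection:" line followed by a line with the attribute names in square brackets. Treating multiple names as comma-separated is an assumption, since the sample shows only one name per list; the function name attribute_changes is hypothetical.

```python
import re

# Header lines that announce an attribute change (taken from the sample log).
HEADER = re.compile(
    r"(Adding attributes to|Deleting attributes from) collection: (\S+)"
)
# The following line lists the names in square brackets, e.g. "  [NumInStock]".
NAMES = re.compile(r"^\s*\[(.*)\]\s*$")


def attribute_changes(log_text):
    """Return a list of (action, collection, [attribute names]) tuples."""
    changes = []
    pending = None  # (action, collection) waiting for its bracketed name line
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            action = "added" if header.group(1).startswith("Adding") else "deleted"
            pending = (action, header.group(2))
            continue
        names = NAMES.match(line)
        if names and pending:
            # Assumption: several names would be separated by commas.
            attrs = [n.strip() for n in names.group(1).split(",") if n.strip()]
            changes.append((pending[0], pending[1], attrs))
        pending = None
    return changes
```

Run against the sample messages, this sketch reports NumInStock as added and OldPrice2 as deleted for the same collection.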