This example gives an overview of the various DP logs that are generated when you run a workflow with the DP CLI.
The example assumes that the Hive administrator has created a table named masstowns, which contains information about towns and cities in Massachusetts. The workflow is run with the DP CLI (described in DP Command Line Interface Utility) using this command:
./data_processing_CLI --database default --table masstowns --maxRecords 1000
The --table flag specifies the name of the Hive table, the --database flag states that the table is in the Hive database named "default", and the --maxRecords flag sets the sample size to a maximum of 1,000 records.
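If the command needs to be launched from a script, the three flags described above can be assembled into an argument list. The sketch below is illustrative only: the helper name build_cli_command is hypothetical, and it assumes the data_processing_CLI script is in the current working directory, as in the command shown above. It builds the arguments but does not run them.

```python
from typing import List


def build_cli_command(table: str, database: str = "default",
                      max_records: int = 1000) -> List[str]:
    """Assemble the DP CLI argument list (hypothetical helper).

    Only the flags shown in this example are used:
    --database, --table, and --maxRecords.
    """
    return [
        "./data_processing_CLI",  # assumed to be in the current directory
        "--database", database,
        "--table", table,
        "--maxRecords", str(max_records),
    ]


cmd = build_cli_command("masstowns")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the stdout discussed in the next section.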
Command stdout
...
EdpEnvConfig{endecaServer=http://web07.example.oracle.com:7003/endeca-server/, edpDataDir=/user/bdd/edp/data, ...
ProvisionDataSetFromHiveConfig{hiveDatabaseName=default, hiveTableName=masstowns,
    newCollectionId=MdexCollectionIdentifier{databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e},
    runEnrichment=false, maxRecordsForNewDataSet=1000, disableTextSearch=false,
    languageOverride=en, operation=PROVISION_DATASET_FROM_HIVE, transformScript=,
    accessType=public_default, autoEnrichPluginExcludes=[Ljava.lang.String;@71034e3b}
ProvisionDataSetFromHiveConfig{notificationName=CLIDATALOAD,
    ecid=0000LM3rDDu7ADkpSw4Eyc1NROXb000001, startTime=1466796128122,
    properties={dataSetDisplayName=Taxi_Data, isCli=true}}
New collection name = MdexCollectionIdentifier{
    databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}
data_processing_CLI finished with state SUCCESS
...
The operation field lists the operation type of the Data Processing workflow. In this example, the operation is PROVISION_DATASET_FROM_HIVE, which means that it will create a new BDD data set from a Hive table.
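When the stdout is captured to a file or a string, the operation type can be picked out with a regular expression. This is a minimal sketch, not part of the product: the function name find_operation is hypothetical, and it assumes the field always appears as operation=NAME with an upper-case, underscore-separated name, as in the excerpt above.

```python
import re
from typing import Optional

# Matches a field such as "operation=PROVISION_DATASET_FROM_HIVE".
OPERATION_RE = re.compile(r"\boperation=([A-Z_]+)")


def find_operation(stdout_text: str) -> Optional[str]:
    """Return the first operation type found in the text, or None."""
    match = OPERATION_RE.search(stdout_text)
    return match.group(1) if match else None


sample = ("languageOverride=en, operation=PROVISION_DATASET_FROM_HIVE, "
          "transformScript=, accessType=public_default")
print(find_operation(sample))
```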
$BDD_HOME/logs/edp logs
The $BDD_HOME/logs/edp directory has three logs. One is owned by the user who ran the DP CLI, while the other two are owned by the user yarn.
YARN logs
EDP: ProvisionDataSetFromHiveConfig{hiveDatabaseName=default, hiveTableName=masstowns, newCollectionId=MdexCollectionIdentifier{ databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e, collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}}
ProvisionDataSetFromHiveConfig is the type of DP workflow that was run.
hiveDatabaseName lists the name of the Hive database (default in this example).
hiveTableName lists the name of the Hive table that was provisioned (masstowns in this example).
newCollectionId lists the name of the new data set and its Dgraph database (both names are the same).
Clicking on History in the Tracking UI field displays the job history. The information in the Application Overview panel includes the name of the user who ran the job, the final status of the job, and the elapsed time of the job. FAILED jobs will have error information in the Diagnostics field.
Clicking on logs in the Logs field displays the stdout and stderr output. The stderr output will be especially useful for FAILED jobs. In addition, the stdout section has a link (named Click here for the full log) that displays more detailed output information.
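The fields of the EDP: line described in this section can also be extracted in a script, for example to find the name of the new data set for a given Hive table. The following is a sketch under stated assumptions: the function name parse_edp_line is hypothetical, and it assumes each value runs up to the next comma, closing brace, or whitespace, as in the excerpt above.

```python
import re
from typing import Dict


def parse_edp_line(line: str) -> Dict[str, str]:
    """Extract selected key=value fields from an 'EDP:' line (hypothetical helper)."""
    wanted = ("hiveDatabaseName", "hiveTableName", "databaseName", "collectionName")
    fields = {}
    for name in wanted:
        # \b keeps "databaseName" from matching inside another identifier;
        # the value ends at the next comma, closing brace, or whitespace.
        match = re.search(r"\b" + name + r"=\s*([^,}\s]+)", line)
        if match:
            fields[name] = match.group(1)
    return fields


edp_line = (
    "EDP: ProvisionDataSetFromHiveConfig{hiveDatabaseName=default, "
    "hiveTableName=masstowns, newCollectionId=MdexCollectionIdentifier{ "
    "databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e, "
    "collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}}"
)
edp_fields = parse_edp_line(edp_line)
print(edp_fields["hiveTableName"], "->", edp_fields["collectionName"])
```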
Dgraph HDFS Agent log
Received request for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
Starting ingest for: MdexCollectionIdentifier{
    databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}, ...
createBulkIngester edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
Finished reading 1004 records for MdexCollectionIdentifier{
    databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}, ...
sendRecordsToIngester 1004
closeBulkIngester
Ingest finished with 1004 records committed and 0 records rejected. Status: INGEST_FINISHED.
    Request info: MdexCollectionIdentifier{
    databaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    collectionName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e}, ...
Notification server url: http://busgg2014.us.oracle.com:7003/bdd/v1/api/workflows
About to send notification
Terminating
Notification{workflowName=CLIDataLoad, sourceDatabaseName=null, sourceDatasetKey=null,
    targetDatabaseName=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    targetDatasetKey=edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e,
    ecid=0000LM3rDDu7ADkpSw4Eyc1NROXb000001, status=SUCCEEDED,
    startTime=1466796128122, timestamp=1466796195365, progressPercentage=100.0,
    errorMessage=null, properties={dataSetDisplayName=masstowns, isCli=true}}
Notification sent successfully
Terminating
The ingest operation is complete when the final Status: INGEST_FINISHED message is written to the log.
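A script that waits for the ingest to finish can look for that summary message and read the committed and rejected counts from it. This is a minimal sketch: the function name ingest_result is hypothetical, and it assumes the message wording matches the excerpt above, with the Status: field following the record counts (separated only by whitespace).

```python
import re
from typing import Optional, Tuple

INGEST_RE = re.compile(
    r"Ingest finished with (\d+) records committed and (\d+) records rejected\.\s*"
    r"Status: (\w+)"
)


def ingest_result(log_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (committed, rejected, status) from the last ingest summary, or None."""
    matches = INGEST_RE.findall(log_text)
    if not matches:
        return None  # no summary yet: the ingest has not finished (or was not logged)
    committed, rejected, status = matches[-1]
    return int(committed), int(rejected), status


agent_log = (
    "sendRecordsToIngester 1004\n"
    "closeBulkIngester\n"
    "Ingest finished with 1004 records committed and 0 records rejected. "
    "Status: INGEST_FINISHED.\n"
)
print(ingest_result(agent_log))
```

A result whose status is INGEST_FINISHED and whose rejected count is 0 corresponds to the successful case shown in this example.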
Dgraph out log
The Dgraph out log (dgraph.out) will have these bulk_ingest messages:
Start ingest for collection: edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
Starting a bulk ingest operation for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
batch 0 finish BatchUpdating status Success for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
Ending bulk ingest at client's request for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e - finalizing changes
Bulk ingest completed: Added 1004 records and rejected 0 records, for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
Ingest end - 0.584MB in 2.010sec = 0.291MB/sec for database edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e
At this point, the data set records are in the Dgraph and the data set can be viewed in Studio.
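To confirm from a script that a particular data set reached this point, the Bulk ingest completed message for its Dgraph database can be matched in the Dgraph out log. The sketch below is illustrative: the function name bulk_ingest_summary is hypothetical, and it assumes each message occupies a single line with the wording shown above.

```python
import re
from typing import Optional, Tuple

BULK_RE = re.compile(
    r"Bulk ingest completed: Added (\d+) records and rejected (\d+) records, "
    r"for database (\S+)"
)


def bulk_ingest_summary(log_text: str, database: str) -> Optional[Tuple[int, int]]:
    """Return (added, rejected) for the given database, or None if not found."""
    for match in BULK_RE.finditer(log_text):
        if match.group(3) == database:
            return int(match.group(1)), int(match.group(2))
    return None


db_name = "edp_cli_edp_ac680edd-c25f-4b9d-8cab-11441c5a3d2e"
dgraph_out = (
    "Starting a bulk ingest operation for database " + db_name + "\n"
    "Bulk ingest completed: Added 1004 records and rejected 0 records, "
    "for database " + db_name + "\n"
)
print(bulk_ingest_summary(dgraph_out, db_name))
```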
Studio log
Similar to workflows run from the DP CLI, Studio-generated workflows also produce logs in the $BDD_HOME/logs/edp directory, as well as YARN logs, Dgraph HDFS Agent logs, and Dgraph out logs.
In addition, Studio workflows are logged in the $BDD_DOMAIN/servers/<serverName>/logs/bdd-studio.log file.