This topic describes how to run a Refresh update operation.
To run a Refresh update on a data set:
[2016-06-24T09:56:22.963-04:00] [DataProcessing] [INFO] [] [org.apache.spark.Logging$class] [tid:main] [userID:fcalvill]
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 10.152.105.219
    ApplicationMaster RPC port: 0
    queue: root.fcalvill
    start time: 1466776490743
    final status: SUCCEEDED
    tracking URL: http://bus2014.example.com:8088/proxy/application_1466716670116_0002/A
    user: fcalvill
Refreshing existing collection: MdexCollectionIdentifier{ databaseName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f, collectionName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f}
Collection key for new record: MdexCollectionIdentifier{ databaseName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e, collectionName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e}
data_processing_CLI finished with state SUCCESS
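If the stdout is captured to a string (for example, by a wrapper script), the status lines can be checked programmatically. The following is a minimal Python sketch, not part of the product: the function name summarize_refresh_stdout and the dictionary keys are hypothetical, and the patterns assume the stdout wording matches the sample shown above.

```python
import re

# Sample stdout copied from the example above (status lines only).
SAMPLE_STDOUT = (
    "final status: SUCCEEDED "
    "tracking URL: http://bus2014.example.com:8088/proxy/application_1466716670116_0002/A "
    "user: fcalvill "
    "Refreshing existing collection: MdexCollectionIdentifier{ "
    "databaseName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f, "
    "collectionName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f} "
    "Collection key for new record: MdexCollectionIdentifier{ "
    "databaseName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e, "
    "collectionName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e} "
    "data_processing_CLI finished with state SUCCESS"
)

# Pattern for the collectionName inside an MdexCollectionIdentifier{...} value.
_COLLECTION = r"MdexCollectionIdentifier\{\s*databaseName=[^,]+,\s*collectionName=([^}\s]+)\s*\}"

# Hypothetical helper: the keys below are illustrative names, not product fields.
_PATTERNS = {
    "final_status": r"final status:\s*(\w+)",
    "refreshed_collection": r"Refreshing existing collection:\s*" + _COLLECTION,
    "new_collection": r"Collection key for new record:\s*" + _COLLECTION,
    "cli_state": r"data_processing_CLI finished with state\s+(\w+)",
}


def summarize_refresh_stdout(stdout: str) -> dict:
    """Return the status fields found in the stdout, or None where a field is absent."""
    summary = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, stdout)
        summary[key] = match.group(1) if match else None
    return summary


summary = summarize_refresh_stdout(SAMPLE_STDOUT)
refresh_ok = summary["final_status"] == "SUCCEEDED" and summary["cli_state"] == "SUCCESS"
```

Here refresh_ok is true only when both the YARN final status and the CLI state report success. A missing line yields None for that key rather than raising an error.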
EDP: DatasetRefreshConfig{hiveDatabase=, hiveTable=, collectionToRefresh=MdexCollectionIdentifier{databaseName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f, collectionName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f}, newCollectionId=MdexCollectionIdentifier{databaseName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e, collectionName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e}, op=REFRESH_DATASET}
hiveDatabase and hiveTable are blank because the --database and --table flags were not used. In this case, the Refresh update operation uses the same Hive table and database that were used when the data set was first created.

collectionToRefresh is the name of the data set that was refreshed. This name is the same as the "Refreshing existing collection" field in the stdout listed above.

newCollectionId is an internal name for the refreshed data set. This name does not appear in the Studio UI; the original Data Set Logical Name continues to be used because it is a persistent name. This name is also the same as the "Collection key for new record" field in the stdout listed above.

You can also check the Dgraph HDFS Agent log for the status of the Dgraph ingest operation.
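The DatasetRefreshConfig entry can also be read programmatically. The following is a minimal Python sketch under stated assumptions: the function name parse_refresh_config and the key used_original_hive_source are hypothetical, and the patterns assume the entry has the same layout as the sample in this topic.

```python
import re

# Sample entry copied from this topic.
SAMPLE_ENTRY = (
    "EDP: DatasetRefreshConfig{hiveDatabase=, hiveTable=, "
    "collectionToRefresh=MdexCollectionIdentifier{"
    "databaseName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f, "
    "collectionName=edp_cli_edp_ad9a93eb-fbec-49ca-bdc9-8ac897dd5c8f}, "
    "newCollectionId=MdexCollectionIdentifier{"
    "databaseName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e, "
    "collectionName=refreshed_edp_a284bd0c-23fe-4d26-9e92-cbfc22b1555e}, "
    "op=REFRESH_DATASET}"
)


def _scalar(entry: str, name: str):
    """Return the text after 'name=' up to the next comma or closing brace."""
    match = re.search(r"\b" + re.escape(name) + r"=([^,}]*)", entry)
    return match.group(1).strip() if match else None


def _collection_name(entry: str, name: str):
    """Return the collectionName inside the MdexCollectionIdentifier for 'name'."""
    pattern = (r"\b" + re.escape(name)
               + r"=MdexCollectionIdentifier\{[^}]*collectionName=([^,}\s]+)")
    match = re.search(pattern, entry)
    return match.group(1) if match else None


# Hypothetical helper, not part of the product.
def parse_refresh_config(entry: str) -> dict:
    config = {
        "hiveDatabase": _scalar(entry, "hiveDatabase"),
        "hiveTable": _scalar(entry, "hiveTable"),
        "collectionToRefresh": _collection_name(entry, "collectionToRefresh"),
        "newCollectionId": _collection_name(entry, "newCollectionId"),
        "op": _scalar(entry, "op"),
    }
    # Blank Hive fields mean --database and --table were not passed, so the
    # refresh reused the Hive source from the data set's original creation.
    config["used_original_hive_source"] = (
        config["hiveDatabase"] == "" and config["hiveTable"] == ""
    )
    return config


config = parse_refresh_config(SAMPLE_ENTRY)
```

For the sample entry, config["used_original_hive_source"] is true because both Hive fields are blank. The collectionToRefresh and newCollectionId values can then be compared with the corresponding stdout fields.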
Note that future Refresh updates on this data set will continue to use the same Data Set Logical Name. You will also use this name if you set up a Refresh update cron job for this data set.