DP CLI flags

The DP CLI has a number of runtime flags that control its behavior.

To list these flags, run the CLI with the --help flag. Each flag has a full name that begins with two dashes (such as --maxRecords) and an abbreviated form that begins with one dash (such as -m).
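As an illustration of the two naming forms, the following minimal Python sketch expands abbreviated flags to their full names. The alias pairs are copied from the flag table in this section (a subset only). The helper name to_long_form and the sample values default, masstowns, and 1000 are hypothetical and are not part of the DP CLI.

```python
# Short-to-long flag aliases, copied from the flag table in this section (subset).
ALIASES = {
    "-a": "--all",
    "-bl": "--blackList",
    "-d": "--database",
    "-e": "--runEnrichment",
    "-ep": "--excludePlugins",
    "-h": "--help",
    "-m": "--maxRecords",
    "-mwt": "--maxWaitTime",
    "-t": "--table",
    "-v": "--versionNumber",
    "-wl": "--whiteList",
}


def to_long_form(args):
    """Return a copy of args with each known abbreviated flag replaced by its full name.

    Hypothetical helper for illustration only. Matching is exact, so a
    differently cased token such as "-M" is left unchanged, which mirrors
    the fact that flag names are case sensitive.
    """
    return [ALIASES.get(arg, arg) for arg in args]


if __name__ == "__main__":
    # Sample values (default, masstowns, 1000) are made up for the example.
    print(to_long_form(["-d", "default", "-t", "masstowns", "-m", "1000"]))
```

The sketch does a simple token-by-token lookup, so it assumes that no flag value happens to be spelled exactly like an abbreviated flag.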

The --devHelp flag displays flags that are intended for use by Oracle internal developers and support personnel. These flags are therefore not documented in this guide.

Note: All flag names are case sensitive.

The CLI flags are:
CLI Flag Description
-a, --all Runs data processing on all Hive tables in all Hive databases.
-bl, --blackList <blFile> Specifies the file name for the blacklist used to filter out Hive tables. The tables in this list are ignored and not provisioned. Must be used with the --database flag.
-clean, --cleanAbortedJobs Cleans up artifacts left over from incomplete workflows.
-d, --database <dbName> Runs Data Processing using the specified Hive database. If a Hive table is not specified, runs on all Hive tables in the Hive database (note that tables with the skipAutoProvisioning property set to true will not be provisioned).

For Refresh and Incremental updates, can be used to override the default database in the data set's metadata.

-devHelp, --devHelp Displays usage information for flags intended to be used by Oracle support personnel.
-disableSearch, --disableSearch Turns off Dgraph indexing for search. This means that DP Discovery disables record search and value search on all attributes, regardless of the average string length of their values. This flag can be used only for provisioning workflows (for new data sets created from Hive tables) and for refresh workflows (with the --refreshData flag). It cannot be combined with the --incrementalUpdate flag.
-e, --runEnrichment Runs the Data Enrichment modules (except for the modules that never automatically run during the sampling phase). Overrides the runEnrichment property in the edp.properties configuration file.

You can also exclude some modules with the CLI --excludePlugins flag.

-ep, --excludePlugins <exList> Specifies a list of Data Enrichment modules to exclude when Data Enrichments are run.
-h, --help Displays usage information for flags intended to be used by customers.
-incremental, --incrementalUpdate <logicalName> <filter> Performs an incremental update on a BDD data set from the original Hive table, using a filter predicate to select the new records. Optionally, you can use the --table and --database flags.
-m, --maxRecords <num> Specifies the maximum number of records in the sample for a data set (that is, the number of records sampled from the source Hive table). In effect, this sets the maximum number of records in a BDD data set. This setting controls the sample size for all new data sets, and it also controls the sample size that results from transform operations (such as a Refresh update on a data set that contains a transformation script). Overrides the CLI maxRecordsForNewDataSet property in the edp.properties configuration file.
-mwt, --maxWaitTime <secs> Specifies the maximum time (in seconds) to wait for the processing of each table to complete. The next table is processed after this interval elapses or as soon as data ingest completes.

This flag controls the pace of table processing and prevents the Hadoop and Spark cluster nodes, as well as the Dgraph cluster nodes, from being flooded with a large number of simultaneous requests.

-ping, --pingCheck Checks (pings) the status of the components that Data Processing needs.
-refresh, --refreshData <logicalName> Performs a full data refresh on a BDD data set from the original Hive table. Optionally, you can use the --table and --database flags.
-t, --table <tableName> Runs data processing on the specified Hive table. If a Hive database is not specified, assumes the default database. Note that the table is skipped in these cases: it does not exist, is empty, or has the table property skipAutoProvisioning set to true.

For Refresh and Incremental updates, can be used to override the default source table in the data set's metadata.

-v, --versionNumber Prints the version number of the current iteration of the Data Processing component within Big Data Discovery.
-wl, --whiteList <wlFile> Specifies the file name for the whitelist used to select qualified Hive tables for processing. Each table on this list is processed by the Data Processing component and is ingested into the Dgraph as a BDD data set. Must be used with the --database flag.
UpgradeDatasetInventory <fromVersion> Upgrades the DataSet Inventory from a given BDD version to the latest version. Note that this subcommand is called by the upgrade script and should not be run interactively.
UpgradeSampleFiles <fromVersion> Upgrades the sample files (produced as a result of a previous workflow) from a given BDD version to the latest version. Note that this subcommand is called by the upgrade script and should not be run interactively.
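
The table above states a few rules about combining flags: --blackList and --whiteList must be used with --database, and --disableSearch cannot be used with --incrementalUpdate. The following minimal Python sketch checks an argument list against those three rules before a command is run. The helper name check_flags and the file name tables.txt are hypothetical. The sketch is not the CLI's own validation logic, and it only recognizes the full (two-dash) flag names.

```python
def check_flags(args):
    """Return a list of problems found in a list of full-name DP CLI arguments.

    Hypothetical helper for illustration only. It checks just the three
    combination rules stated in the flag table; it does not validate
    flag values or any other behavior of the CLI.
    """
    # Collect the tokens that look like full flag names (two leading dashes).
    present = {arg for arg in args if arg.startswith("--")}
    problems = []

    # --blackList and --whiteList must each be accompanied by --database.
    for flag in ("--blackList", "--whiteList"):
        if flag in present and "--database" not in present:
            problems.append(f"{flag} must be used with --database")

    # --disableSearch applies only to provisioning and refresh workflows.
    if "--disableSearch" in present and "--incrementalUpdate" in present:
        problems.append("--disableSearch cannot be combined with --incrementalUpdate")

    return problems


if __name__ == "__main__":
    # tables.txt is a made-up whitelist file name.
    print(check_flags(["--whiteList", "tables.txt"]))
    print(check_flags(["--database", "default", "--whiteList", "tables.txt"]))
```

An empty list means that none of the three rules is violated; it does not mean that the command line is otherwise valid.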