DP CLI configuration

The DP CLI has a configuration file, edp-cli.properties, that sets some Data Processing properties for Provisioning and Refresh update workflows.

The edp-cli.properties file is located in the $BDD_HOME/dataprocessing/edp_cli/config directory. Some of the default values for the properties are populated from the bdd.conf installation configuration file. After installation, you can change the CLI configuration parameters by opening the edp-cli.properties file with a text editor.
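
As a small illustration of the location given above, the following Python sketch builds the path to the file from the BDD_HOME value. The cli_properties_path helper and the /opt/bdd location are hypothetical and are used only for this example; they are not part of the DP CLI.

```python
import os
import posixpath


def cli_properties_path(bdd_home=None):
    """Build the edp-cli.properties path under $BDD_HOME."""
    # Fall back to the BDD_HOME environment variable when no value is passed in.
    home = bdd_home or os.environ.get("BDD_HOME")
    if not home:
        raise RuntimeError("BDD_HOME is not set")
    return posixpath.join(
        home, "dataprocessing", "edp_cli", "config", "edp-cli.properties"
    )


# /opt/bdd is a placeholder install location, not a documented default.
print(cli_properties_path("/opt/bdd"))
```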

Note: The Workflow Manager has a similar configuration file, named edp.properties, that is also used for all workflows launched by the DP CLI. For details on that file, see DP workflow properties.

Workflows that use edp-cli.properties

The following properties in the edp-cli.properties file are used by Provisioning and Refresh workflows:
  • Provisioning workflows:
    • maxRecordsForNewDataSet
    • runEnrichment
    • defaultLanguage
    • datasetAccessType
  • Refresh update workflows:
    • maxRecordsForNewDataSet
    • datasetAccessType

These workflows take the properties listed above from the edp-cli.properties file and all of their other properties from the Workflow Manager's edp.properties file. Therefore, to change other properties used by these workflows (such as Kerberos properties), or properties used by other types of workflows (such as Incremental update workflows), edit the edp.properties file.
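
The split between the two files can be pictured with a short Python sketch. It is illustrative only and does not reproduce how the DP CLI actually loads its configuration: the parse_properties and effective_properties helpers, the sample values, and the exampleWorkflowManagerProperty key are all hypothetical.

```python
# Properties each workflow type reads from edp-cli.properties (from the list above).
CLI_PROPERTIES = {
    "provisioning": {
        "maxRecordsForNewDataSet",
        "runEnrichment",
        "defaultLanguage",
        "datasetAccessType",
    },
    "refresh": {"maxRecordsForNewDataSet", "datasetAccessType"},
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_properties(workflow, cli_props, wm_props):
    """Take the listed keys from edp-cli.properties and the rest from edp.properties."""
    listed = CLI_PROPERTIES[workflow]
    merged = {k: v for k, v in wm_props.items() if k not in listed}
    merged.update({k: v for k, v in cli_props.items() if k in listed})
    return merged


# Hypothetical sample contents of edp-cli.properties.
cli_text = """
maxRecordsForNewDataSet=1000000
runEnrichment=false
defaultLanguage=en
datasetAccessType=public
"""

# Hypothetical placeholder key, not a real edp.properties property name.
wm_text = """
exampleWorkflowManagerProperty=some-value
"""

cli_props = parse_properties(cli_text)
wm_props = parse_properties(wm_text)
refresh = effective_properties("refresh", cli_props, wm_props)
provisioning = effective_properties("provisioning", cli_props, wm_props)
print(sorted(refresh))
```

In this sketch a Refresh update picks up only maxRecordsForNewDataSet and datasetAccessType from the CLI file, while a Provisioning workflow picks up all four listed properties.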

Data Processing properties for Provisioning and Refresh workflows

These properties can be changed to affect the creation of Provisioning and Refresh data sets.

Data Processing Property: maxRecordsForNewDataSet
Description: Specifies the maximum number of records to sample from the source Hive table for a new data set. In effect, this sets the maximum number of records in a BDD data set. This setting controls the sample size for all new data sets, and it also controls the sample size resulting from transform operations (such as during a Refresh update on a data set that contains a transformation script).

The default is set by the MAX_RECORDS property in the bdd.conf file. The CLI --maxRecords flag can override this setting.

Data Processing Property: runEnrichment
Description: Specifies whether to run the Data Enrichment modules. The default is set by the ENABLE_ENRICHMENTS property in the bdd.conf file.

You can override this setting by using the CLI --runEnrichment flag. The CLI --excludePlugins flag can also be used to exclude some of the Data Enrichment modules.

Data Processing Property: defaultLanguage
Description: Sets the language for all attributes in the created data set. The default is set by the LANGUAGE property in the bdd.conf file. For the supported language codes, see Supported languages.

Data Processing Property: datasetAccessType
Description: Sets the access type for the data set, which determines which Studio users can access the data set in the Studio UI. This property takes one of these case-insensitive values:
  • public means that all Studio users can access the data set. This is the default.
  • private means that only designated Studio users and groups can access the data set. The users and groups are specified in attributes set in the data set's entry in the DataSet Inventory.
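
As a rough illustration of two rules described above (the --maxRecords flag overriding maxRecordsForNewDataSet, and datasetAccessType accepting case-insensitive values), here is a minimal Python sketch. The function names and sample numbers are hypothetical, the sketch does not call the DP CLI, and treating an empty value as public is an assumption made for this example.

```python
ACCESS_TYPES = ("public", "private")


def resolve_max_records(properties, cli_max_records=None):
    """Return the --maxRecords value when given, else maxRecordsForNewDataSet."""
    if cli_max_records is not None:
        return int(cli_max_records)
    return int(properties["maxRecordsForNewDataSet"])


def normalize_access_type(value=None):
    """Validate datasetAccessType case-insensitively."""
    # Assumption for this sketch: a missing or empty value means the default, public.
    if value is None or not value.strip():
        return "public"
    normalized = value.strip().lower()
    if normalized not in ACCESS_TYPES:
        raise ValueError(
            f"datasetAccessType must be one of {ACCESS_TYPES}, got {value!r}"
        )
    return normalized


# Hypothetical sample values read from edp-cli.properties.
props = {"maxRecordsForNewDataSet": "1000000", "datasetAccessType": "Private"}

print(resolve_max_records(props))         # value from the properties file
print(resolve_max_records(props, 50000))  # value passed with --maxRecords
print(normalize_access_type(props["datasetAccessType"]))
```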