Internal settings

The third part of bdd.conf contains internal settings either required by the installer or intended for use by Oracle Support. Note that the installer will automatically add properties to this section when it runs.

Warning: Don't modify any properties in this section unless instructed to do so by Oracle Support.

DP_POOL_SIZE
The maximum number of concurrent calls Studio can make to Data Processing.

DP_TASK_QUEUE_SIZE
The maximum number of jobs Studio can add to the Data Processing queue.
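
Taken together, these two properties bound Studio's Data Processing workload. As an illustrative sketch (the values are assumptions, not documented defaults, and the key=value lines only approximate bdd.conf syntax): with the settings below, Studio makes at most 10 concurrent calls, and up to 40 further jobs can wait in the queue.

    DP_POOL_SIZE=10          # at most 10 concurrent Data Processing calls
    DP_TASK_QUEUE_SIZE=40    # at most 40 jobs waiting in the queue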

MAX_INPUT_SPLIT_SIZE
The maximum partition size used for Spark inputs, in MB. This controls the size of the blocks of data handled by Data Processing jobs.

Partition size directly affects Data Processing performance. When partitions are smaller, more jobs run in parallel and cluster resources are used more efficiently. This improves both speed and stability.

The default value is 32, which should be sufficient for most clusters, with a few exceptions:
  • If your Hadoop cluster has a very large processing capacity and most of your data sets are small (around 1GB), you can decrease this value.
  • In rare cases, when data enrichments are enabled, the enriched data set in a partition can become too large for its YARN container to handle. If this occurs, you can decrease this value to reduce the amount of memory each partition requires.

Note that this property overrides the HDFS block size used in Hadoop.
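
To make the effect concrete, here is a rough sketch (actual partition counts depend on how the files are laid out in HDFS): a 10GB data set is roughly 10,240MB, so at the default value of 32 it is read as about 10240 / 32 = 320 partitions, while halving the value to 16 yields about 640 smaller partitions, each of which needs less memory in its YARN container.

    MAX_INPUT_SPLIT_SIZE=32    # 10,240MB / 32MB, about 320 partitions
    MAX_INPUT_SPLIT_SIZE=16    # 10,240MB / 16MB, about 640 partitions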

SPARK_DYNAMIC_ALLOCATION
Determines whether Data Processing dynamically computes the resources allocated to the Spark executors during processing. This value should always be set to true; false is intended only for use by Oracle Support. When set to false, Data Processing allocates Spark resources according to the static configuration defined by the following properties (a combined sketch appears after their descriptions below):
  • SPARK_DRIVER_CORES
  • SPARK_DRIVER_MEMORY
  • SPARK_EXECUTORS
  • SPARK_EXECUTOR_CORES
  • SPARK_EXECUTOR_MEMORY

SPARK_DRIVER_CORES
The number of cores used by the Spark job driver.

SPARK_DRIVER_MEMORY
The maximum memory heap size for the Spark job driver. This must be in the same format as JVM memory settings; for example, 512m or 2g.

SPARK_EXECUTORS
The total number of Spark executors to launch.

SPARK_EXECUTOR_CORES
The number of cores for each Spark executor.

SPARK_EXECUTOR_MEMORY
The maximum memory heap size for each Spark executor. This must be in the same format as JVM memory settings; for example, 512m or 2g.
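
Taken together, a static allocation might look like the following sketch (illustrative values only, not documented defaults; per the warning above, use a static configuration only under Oracle Support direction):

    SPARK_DYNAMIC_ALLOCATION=false
    SPARK_DRIVER_CORES=2
    SPARK_DRIVER_MEMORY=2g     # driver heap, JVM memory format
    SPARK_EXECUTORS=4
    SPARK_EXECUTOR_CORES=4
    SPARK_EXECUTOR_MEMORY=8g   # per-executor heap, JVM memory format

With these numbers, Spark would reserve 4 x 4 = 16 executor cores and 4 x 8g = 32g of executor heap in total, plus the driver's 2 cores and 2g.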

RECORD_SEARCH_THRESHOLD
The minimum number of characters the average value of a String attribute must contain to be record searchable.

VALUE_SEARCH_THRESHOLD
The minimum number of characters the average value of a String attribute must contain to be value searchable.
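
To illustrate how the two thresholds interact (the numbers are assumptions, not documented defaults): with the settings below, a String attribute whose values average 35 characters would be value searchable but not record searchable.

    RECORD_SEARCH_THRESHOLD=200    # average length of 200+ characters: record searchable
    VALUE_SEARCH_THRESHOLD=20      # average length of 20+ characters: value searchable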

BDD_VERSION
The version of BDD. This property is intended for use by Oracle Support and shouldn't be changed.

BDD_RELEASE_VERSION
The BDD hotfix or patch version. This property is intended for use by Oracle Support and shouldn't be changed.