Attribute searchability thresholds

Two properties in the bdd.conf file set the thresholds that determine whether a string attribute is record searchable and whether it is value searchable.

After you install BDD, the bdd.conf configuration file contains two properties that control searchability for all string attributes:
  • The RECORD_SEARCH_THRESHOLD property sets the threshold to enable record search for the attribute. The attribute is configured as record searchable if its average string length is greater than the threshold.
  • The VALUE_SEARCH_THRESHOLD property sets the threshold to enable value search for the attribute. The attribute is configured as value searchable if its average string length is equal to or less than the threshold.

In both cases, "average string length" refers to the average string length of all the values in that attribute.
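
The two rules above can be sketched in a few lines of Python. This is only an illustration of the comparisons as described in this topic, not the actual Data Processing code: the function name classify_attribute and its parameters are hypothetical, and how empty attributes or null values are treated is an assumption.

```python
from statistics import mean

def classify_attribute(values, record_threshold, value_threshold):
    """Return (record_searchable, value_searchable) for one string attribute.

    Hypothetical helper: it only mirrors the two comparisons described above.
    """
    # Average string length over all values of the attribute.
    # Treating an empty attribute as length 0 is an assumption.
    avg_len = mean(len(v) for v in values) if values else 0.0
    record_searchable = avg_len > record_threshold  # strictly greater than
    value_searchable = avg_len <= value_threshold   # equal to or less than
    return record_searchable, value_searchable

# Two values of length 100 and 200 give an average length of 150.
print(classify_attribute(["a" * 100, "b" * 200], 250, 150))  # (False, True)
```

Because the two thresholds are independent, an attribute whose average string length falls between them (for example, 200 with thresholds of 250 and 150) would be neither record searchable nor value searchable under these rules.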

This topic describes how to change the installed threshold values.

To configure the attribute searchability thresholds:

  1. On the Admin Server, use a text editor to open bdd.conf in the $BDD_HOME/BDD_manager/conf directory.

    You may want to make a backup of the file before editing it.

  2. Change the value of the RECORD_SEARCH_THRESHOLD property, the VALUE_SEARCH_THRESHOLD property, or both. Each value must be a Java integer.
    For example:
    ## The threshold length of a Dgraph attribute to enable record-search
    RECORD_SEARCH_THRESHOLD=250
    
    ## The threshold length of a Dgraph attribute to enable value-search
    VALUE_SEARCH_THRESHOLD=150
  3. Go to $BDD_HOME/BDD_manager/bin and run this command:
    ./bdd-admin.sh publish-config bdd <path>
    where <path> is the absolute path to the bdd.conf file.
    The command publishes the bdd.conf file to all BDD nodes and to the HDFS /user/$HDFS_DP_USER_DIR/edp/data directory.
  4. Restart your BDD cluster so the changes take effect in the BDD nodes:
    ./bdd-admin.sh restart
    Note that you do not have to restart your Hadoop cluster.
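
Before publishing the file, it may be worth checking that the new threshold values are valid. The sketch below assumes that "Java integer" means a decimal value that fits in a 32-bit signed int; the helper name is_java_int is hypothetical and is not part of BDD.

```python
import re

# Range of a 32-bit signed Java int (assumed meaning of "Java integer").
JAVA_INT_MIN, JAVA_INT_MAX = -2**31, 2**31 - 1

def is_java_int(text):
    """Return True if text is a decimal integer within the Java int range."""
    # Optional sign followed by ASCII digits only; no decimals or spaces.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    return JAVA_INT_MIN <= int(text) <= JAVA_INT_MAX

print(is_java_int("250"))         # True
print(is_java_int("25.0"))        # False
print(is_java_int("2147483648"))  # False
```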

When a Data Processing workflow runs, it reads the bdd.conf file in HDFS to obtain the attribute searchability thresholds for its discovery phase. If the workflow cannot find or read the file, it uses a hardcoded value of 200 for both thresholds.
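
The fallback behavior can be pictured with the following Python sketch. It is not the workflow's real implementation: read_thresholds is a hypothetical function, passing None stands in for a file that could not be found or read, and falling back to 200 for a single missing property is an assumption that goes beyond what this topic states. How a malformed value is handled is not described here, so the sketch simply lets int() raise an error.

```python
FALLBACK_THRESHOLD = 200  # hardcoded fallback value stated in this topic

def read_thresholds(conf_text):
    """Extract both thresholds from bdd.conf-style text (KEY=VALUE lines).

    conf_text is None when the file could not be found or read.
    """
    thresholds = {
        "RECORD_SEARCH_THRESHOLD": FALLBACK_THRESHOLD,
        "VALUE_SEARCH_THRESHOLD": FALLBACK_THRESHOLD,
    }
    if conf_text is None:
        return thresholds  # file unavailable: both thresholds stay at 200
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() in thresholds:
            thresholds[key.strip()] = int(value.strip())
    return thresholds

print(read_thresholds(None))
# {'RECORD_SEARCH_THRESHOLD': 200, 'VALUE_SEARCH_THRESHOLD': 200}
```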