A whitelist specifies which Hive tables should be processed in Big Data Discovery, while a blacklist specifies which Hive tables should be ignored during data processing.
Both lists are optional when running the DP CLI. For example, if you manually run the DP CLI with the --table flag to process a specific table, you do not have to specify the lists.
Both default lists are essentially empty: they contain only commented-out sample regular expressions that you can use as patterns for your tables.
To pass the lists when you run the DP CLI manually, use this syntax:
--whiteList cli_whitelist.txt
--blackList cli_blacklist.txt
The --whiteList and --blackList flags each take a text file as their argument. Each file contains one or more regular expressions (regex), one pattern per line. The patterns are matched only against Hive table names, and a table name matches a list as long as at least one pattern in that list matches it.
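The file format described above can be sketched in Python. This is a minimal illustration, not the DP CLI's own parser: the `#` comment marker, the helper names `parse_patterns` and `matches_any`, the sample file contents, and the use of `re.search` rather than a full-string match are all assumptions.

```python
import re
from typing import Iterable, List


def parse_patterns(lines: Iterable[str]) -> List[re.Pattern]:
    """Compile one regex per non-empty line.

    Assumption: lines starting with '#' are comments; the real
    list files may use a different comment marker.
    """
    patterns = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and commented-out samples
        patterns.append(re.compile(text))
    return patterns


def matches_any(table_name: str, patterns: List[re.Pattern]) -> bool:
    """A table name matches the list if at least one pattern matches it."""
    return any(p.search(table_name) for p in patterns)


# Hypothetical file contents: one commented-out sample and two active patterns.
sample_file = ["# ^example.*", "^bdd.*", "", "^sales_[0-9]+$"]
patterns = parse_patterns(sample_file)

print(len(patterns))                        # 2
print(matches_any("bdd_claims", patterns))  # True
print(matches_any("sales_2016", patterns))  # True
print(matches_any("customers", patterns))   # False
```

A file that contains only commented-out lines yields an empty pattern list under this sketch, so no table name matches it. That is consistent with the default lists being described as essentially empty.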
Because the default whitelist and blacklist contain only commented-out sample regular expressions, you must edit the whitelist file to include at least one regular expression that specifies the tables to be ingested. Similarly, to exclude any tables, edit the blacklist.
For example, the following whitelist entry matches every Hive table whose name starts with bdd:
^bdd.*
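To see which names a pattern such as ^bdd.* selects, it can be tried against a few table names before it goes into the whitelist. The sketch below uses Python's `re` module; the table names are hypothetical, and the DP CLI's regex engine may differ from Python's in details.

```python
import re

# The sample whitelist entry from the text.
pattern = re.compile(r"^bdd.*")

# Hypothetical Hive table names, for illustration only.
tables = ["bdd_sales", "bddclaims", "sales_bdd", "my_bdd_table"]

# The ^ anchor ties the match to the start of the name, so names that
# merely contain or end with "bdd" are not selected.
selected = [name for name in tables if pattern.search(name)]
print(selected)  # ['bdd_sales', 'bddclaims']
```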
To summarize, the whitelist is parsed first, which generates a list of Hive tables to process, and the blacklist is parsed second, which generates a list of Hive table names to skip. Typically, the names from the blacklist modify those generated by the whitelist. If the same name appears in both lists, that table is not processed; in effect, the blacklist can "remove" names from the whitelist.
For example, assume that the whitelist file contains the first pattern below and the blacklist file contains the second:
^.*_bdd$
claims_bdd
When you run the DP CLI with both the --whiteList and --blackList flags, all the *_bdd tables will be processed except for the claims_bdd table.
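The whitelist-then-blacklist behavior can be reproduced with a short Python sketch. The `select_tables` helper and the table names are hypothetical, and the matching uses `re.search`, which is an assumption about how the DP CLI applies its patterns.

```python
import re
from typing import List


def select_tables(tables: List[str], whitelist: List[str], blacklist: List[str]) -> List[str]:
    """Apply the whitelist first, then drop anything the blacklist matches."""
    white = [re.compile(p) for p in whitelist]
    black = [re.compile(p) for p in blacklist]
    # Step 1: the whitelist produces the candidate tables.
    candidates = [t for t in tables if any(p.search(t) for p in white)]
    # Step 2: names matched by the blacklist are skipped.
    return [t for t in candidates if not any(p.search(t) for p in black)]


# Hypothetical Hive table names.
hive_tables = ["sales_bdd", "claims_bdd", "customers_bdd", "web_logs"]

processed = select_tables(hive_tables, [r"^.*_bdd$"], ["claims_bdd"])
print(processed)  # ['sales_bdd', 'customers_bdd']
```

Here web_logs never becomes a candidate because it does not match the whitelist, and claims_bdd is dropped in the second step even though the whitelist selected it.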