A whitelist specifies which Hive tables should be processed in Big Data Discovery, while a blacklist specifies which Hive tables should be ignored during data processing.
Both list files include commented-out sample regular expressions that you can use as patterns for your tables. Each file is passed to the DP CLI with its corresponding flag:
--whiteList cli_whitelist.txt
--blackList cli_blacklist.txt
Both lists are optional when running the DP CLI. However, you must use the --database flag if you want to use one or both of the lists.
If you manually run the DP CLI with the --table flag to process a specific table, the whitelist and blacklist validations will not be applied.
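The two flag rules above can be modeled with a short Python sketch. It is only an illustration of the rules as stated here, not the DP CLI's own logic: the function name lists_applied, the database name default, and the choice to raise an error when --database is missing are all assumptions.

```python
def lists_applied(args):
    """Return True if whitelist/blacklist filtering would apply for these flags.

    Models only the two rules described in the text; hypothetical helper.
    """
    uses_lists = "--whiteList" in args or "--blackList" in args
    if "--table" in args:
        # A specific table was requested, so list validation is skipped.
        return False
    if uses_lists and "--database" not in args:
        # Illustrative only: how the real CLI reports this is not stated.
        raise ValueError("--whiteList/--blackList require the --database flag")
    return uses_lists


# "default" is a hypothetical database name.
print(lists_applied(["--database", "default", "--whiteList", "cli_whitelist.txt"]))
print(lists_applied(["--table", "bdd_sales", "--whiteList", "cli_whitelist.txt"]))
```

The first call reports that the lists apply; the second reports that they do not, because --table is present.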
The --whiteList and the --blackList flags each take a text file as their argument. Each text file contains one or more regular expressions (regex), with one regex pattern per line. The patterns are matched only against Hive table names, and a match succeeds as soon as any one pattern in the file matches.
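As a rough illustration of the file format, the following Python sketch reads one regex per line and reports whether a table name matches at least one of them. Several details are assumptions rather than documented behavior: the # comment marker, the requirement that a pattern match the whole table name, and the helper names load_patterns and matches_any. Python's regex dialect may also differ from the one the DP CLI uses.

```python
import re


def load_patterns(text):
    """Compile one regex per non-empty line, skipping commented-out lines."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # '#' is an assumed comment marker
            continue
        patterns.append(re.compile(line))
    return patterns


def matches_any(table_name, patterns):
    """True if at least one pattern matches the full table name (assumed semantics)."""
    return any(p.fullmatch(table_name) for p in patterns)


# Hypothetical file contents: one commented-out sample and one active pattern.
sample_file = "# ^sample.*\n^bdd.*\n"
patterns = load_patterns(sample_file)
print(matches_any("bdd_sales", patterns))  # True
print(matches_any("web_logs", patterns))   # False
```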
Before using the default files, you must edit the whitelist to include at least one regular expression that specifies the tables to be ingested. By default, the blacklist excludes all tables with the .+ regex, so you must also edit the blacklist if you want to exclude only specific tables.
For example, suppose you wanted to process any table whose name started with bdd, such as bdd_sales. The whitelist would have this regex entry:
^bdd.*
You could then run the DP CLI with the whitelist, and not specify the blacklist.
To summarize, the whitelist is parsed first, which generates a list of Hive tables to process, and the blacklist is parsed second, which generates a list of Hive table names to skip. Typically, the names from the blacklist modify those generated by the whitelist. If the same name appears in both lists, that table is not processed. In effect, the blacklist can remove names from the whitelist.
For example, suppose the whitelist has this regex entry:
^.*_bdd$
and the blacklist has this entry:
claims_bdd
When you run the DP CLI with both the --whiteList and --blackList flags, all the *_bdd tables will be processed except for the claims_bdd table.
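The whitelist-then-blacklist order can be sketched in Python as follows. This is a minimal model of the behavior described above, not the DP CLI implementation. The function select_tables and all table names other than claims_bdd are hypothetical, and full-name matching is an assumption.

```python
import re


def select_tables(tables, whitelist, blacklist):
    """Apply the whitelist first, then remove any names the blacklist matches."""
    wl = [re.compile(p) for p in whitelist]
    bl = [re.compile(p) for p in blacklist]
    # Step 1: keep only tables matched by at least one whitelist pattern.
    candidates = [t for t in tables if any(p.fullmatch(t) for p in wl)]
    # Step 2: drop candidates matched by at least one blacklist pattern.
    return [t for t in candidates if not any(p.fullmatch(t) for p in bl)]


# Hypothetical Hive table names.
tables = ["sales_bdd", "claims_bdd", "orders_bdd", "web_logs"]
processed = select_tables(tables, [r"^.*_bdd$"], ["claims_bdd"])
print(processed)  # ['sales_bdd', 'orders_bdd']
```

Here web_logs never passes the whitelist, and claims_bdd passes the whitelist but is then removed by the blacklist.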