URL filter properties

You configure how the URL filter plugins are handled in the default.xml file.

Property Name             Property Value
urlfilter.regex.file      File name (default is crawl-urlfilter.txt). Specifies the file in the configuration directory containing regular expressions used by the urlfilter-regex (RegexURLFilter) plugin.
urlfilter.order           Space-delimited list of URL filter class names (default is empty). Specifies the order in which URL filters are applied.
urlfilter.filter-seeds    Boolean value (default is false). Specifies whether URL filtering should be applied to the seeds.
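
As an illustration, the three properties might appear in default.xml roughly as follows. This is a sketch that assumes the file uses the Hadoop/Nutch-style layout of name and value elements inside property elements. The package prefix on RegexURLFilter is assumed, and com.example.MyCustomURLFilter is a made-up placeholder for a custom plugin, not a class that ships with the product.

```xml
<configuration>
  <!-- File in the configuration directory that holds the regular expressions -->
  <property>
    <name>urlfilter.regex.file</name>
    <value>crawl-urlfilter.txt</value>
  </property>

  <!-- Space-delimited class names, applied in the order listed.
       The package name and the second class are assumptions for illustration. -->
  <property>
    <name>urlfilter.order</name>
    <value>org.apache.nutch.urlfilter.regex.RegexURLFilter com.example.MyCustomURLFilter</value>
  </property>

  <!-- Set to true to run the seeds through the URL filters as well -->
  <property>
    <name>urlfilter.filter-seeds</name>
    <value>false</value>
  </property>
</configuration>
```

Leaving urlfilter.order empty (the default) means that no explicit ordering is imposed on the filters.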

Interaction with crawl scope filtering

Keep in mind that the crawl scope filter (if configured) is applied before all other filters, including the regular expressions in the file named by urlfilter.regex.file and any custom plugins. This means that once a URL has been filtered out by the crawl scope, it cannot be added back by expressions in that file.
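
The ordering described above can be modeled with a small Python sketch. It is not the crawler's implementation: scope_filter, make_regex_filter, and apply_filters are hypothetical names, the example.com scope is invented, and the first-match-wins handling of "+" and "-" rules is an assumption about how the regular expression file is evaluated.

```python
import re
from typing import Callable, List, Optional, Tuple

# A filter returns the URL to keep it, or None to drop it (hypothetical model).
UrlFilter = Callable[[str], Optional[str]]


def scope_filter(url: str) -> Optional[str]:
    """Assumed crawl scope: only URLs under www.example.com are in scope."""
    return url if url.startswith("http://www.example.com/") else None


def make_regex_filter(rules: List[Tuple[str, str]]) -> UrlFilter:
    """Build a filter from (sign, pattern) rules; the first matching rule decides."""
    compiled = [(sign, re.compile(pattern)) for sign, pattern in rules]

    def regex_filter(url: str) -> Optional[str]:
        for sign, pattern in compiled:
            if pattern.search(url):
                return url if sign == "+" else None
        return None  # assumption: a URL that matches no rule is rejected

    return regex_filter


def apply_filters(url: str, scope: UrlFilter, others: List[UrlFilter]) -> Optional[str]:
    """Run the scope filter first, then the remaining filters in order."""
    current = scope(url)
    for url_filter in others:
        if current is None:
            break  # already rejected; later filters never see the URL
        current = url_filter(current)
    return current


# A rule that accepts every http URL still cannot rescue an out-of-scope URL.
accept_all_http = make_regex_filter([("+", r"^http://")])
in_scope = apply_filters("http://www.example.com/a", scope_filter, [accept_all_http])
out_of_scope = apply_filters("http://other.example.org/a", scope_filter, [accept_all_http])
```

In this model, in_scope keeps its URL, while out_of_scope is None even though the regular expression rule alone would have accepted it.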