Creating wildcard filters

The WildcardFilter class specifies a wildcard as an inclusion or exclusion filter.

A WildcardFilter applies a wildcard pattern to the value of a particular property. In the pattern, the question mark (?) represents exactly one character and the asterisk (*) represents any number of characters. Matching is always case-insensitive, and this behavior is not configurable; if you need case-sensitive matching, consider using a regular expression instead. In the example below, the filter is applied to the Endeca.FileSystem.Name property.
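The following Java sketch illustrates the matching rules described above by translating a wildcard into an equivalent regular expression. It is not the CAS implementation: the class WildcardSketch and its methods are hypothetical, and the sketch assumes that the pattern must match the whole property value and that * also matches an empty run of characters. The last lines show why a regular expression is the alternative when case matters.

```java
import java.util.regex.Pattern;

public class WildcardSketch {
    // Hypothetical helper: converts a wildcard into a regular expression.
    // '?' becomes "." (exactly one character); '*' becomes ".*" (any run,
    // assumed here to include the empty run). Other text is quoted literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // CASE_INSENSITIVE mirrors the documented, non-configurable behavior.
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    // Assumption: the wildcard must match the entire property value.
    static boolean matches(String wildcard, String value) {
        return toPattern(wildcard).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.doc", "Report.DOC"));       // true
        System.out.println(matches("*.doc", "Report.docx"));      // false
        System.out.println(matches("draft?.txt", "draft1.txt"));  // true
        System.out.println(matches("draft?.txt", "draft10.txt")); // false

        // A plain regular expression without CASE_INSENSITIVE is case-sensitive.
        Pattern caseSensitive = Pattern.compile(".*\\.doc");
        System.out.println(caseSensitive.matcher("Report.DOC").matches()); // false
    }
}
```

Note that "*.doc" does not match "Report.docx": because the whole value must match, a separate "*.docx" wildcard would be needed to cover newer Word files.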

To create a wildcard filter:

  1. Make sure that you have created a SourceConfig and a CrawlConfig.
  2. Instantiate a new, empty WildcardFilter object:
    WildcardFilter filter = new WildcardFilter();
  3. Call the setPropertyName() method (inherited from the Filter class) to set the name of the property against which the filter is applied:
    // filter on the file name
    filter.setPropertyName("Endeca.FileSystem.Name");
  4. Use the setWildcard() method to set the wildcard:
    // exclude Word files
    filter.setWildcard("*.doc");
  5. Use the setScope() method (inherited from the Filter class) to set the filter scope. Set the scope to FilterScope.FILE to apply the filter to files only (as in the following example), or to FilterScope.DIRECTORY to apply it to folders.
    // apply the filter to files only
    filter.setScope(FilterScope.FILE);
  6. Create a list of Filter objects and use the add() method (defined by the List interface) to add the wildcard filter to it.
    // collect the filters for the source configuration
    List<Filter> filterList = new ArrayList<Filter>();
    filterList.add(filter);
  7. Use the SourceConfig.setExcludeFilters() method to set the populated list in the SourceConfig configuration object. If this were an inclusion filter, you would use the SourceConfig.setIncludeFilters() method instead.
    // Set the filter in the source configuration.
    sourceConfig.setExcludeFilters(filterList);
  8. Use the CrawlConfig.setSourceConfig() method to set the populated SourceConfig in the main CrawlConfig configuration object.
    // Set the source config in the crawl configuration.
    crawlConfig.setSourceConfig(sourceConfig);
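To make the effect of steps 2 through 7 concrete, the following Java sketch models how a FILE-scoped exclusion filter with the wildcard *.doc would treat a few sample entries. All names in it (ExcludeFilterSketch, Scope, NameFilter, isExcluded) are hypothetical stand-ins, not CAS classes. The sketch only decides whether an entry itself is excluded; it makes no assumption about how CAS handles the contents of an excluded folder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ExcludeFilterSketch {
    // Hypothetical stand-in for FilterScope.
    enum Scope { FILE, DIRECTORY }

    // Hypothetical stand-in for a WildcardFilter on the file-name property.
    static final class NameFilter {
        final String wildcard;
        final Scope scope;

        NameFilter(String wildcard, Scope scope) {
            this.wildcard = wildcard;
            this.scope = scope;
        }

        // The filter applies only if its scope matches the entry type
        // and its wildcard matches the entry name.
        boolean appliesTo(String name, boolean isDirectory) {
            Scope entryScope = isDirectory ? Scope.DIRECTORY : Scope.FILE;
            return scope == entryScope && toRegex(wildcard).matcher(name).matches();
        }
    }

    // Wildcard to case-insensitive regex: '?' = one character, '*' = any run.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    // An entry is excluded when any exclusion filter applies to it.
    static boolean isExcluded(List<NameFilter> excludeFilters, String name, boolean isDirectory) {
        for (NameFilter f : excludeFilters) {
            if (f.appliesTo(name, isDirectory)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<NameFilter> excludeFilters = new ArrayList<>();
        excludeFilters.add(new NameFilter("*.doc", Scope.FILE));

        System.out.println(isExcluded(excludeFilters, "budget.doc", false)); // true
        System.out.println(isExcluded(excludeFilters, "notes.txt", false));  // false
        // A folder named like a Word file is not affected by a FILE-scoped filter.
        System.out.println(isExcluded(excludeFilters, "archive.doc", true)); // false
    }
}
```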

Note that the WildcardFilter class has a getWildcard() method that retrieves the wildcard value. In addition, the SourceConfig class has the getExcludeFilters() and getIncludeFilters() methods, which retrieve the filters from the source configuration.