This section addresses indexing configuration for large individual content sets, such as those containing repositories or file sets involving millions of items.

For a content set of this size, you may want to configure Search to perform parallel indexing. This configuration uses more resources than normal for estimation and indexing tasks, in exchange for shortening tasks that might otherwise take an inconveniently long time.

Multiple content sets cannot be indexed in parallel. If you have a large individual content set, configure parallel indexing by changing the defaultActivePhysicalPartitions setting in the DAF\Search\Routing\RoutingServiceIndex.properties file. The Search routing module uses the specified number of Search engines to index content.
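As an illustration, the setting described above might look like the following line in RoutingServiceIndex.properties. The value 4 is only an example and is not a recommendation; choose a value that suits your hardware.

```properties
# Illustrative value only: use four Search engines to index a single large content set.
defaultActivePhysicalPartitions=4
```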

Note that while parallel indexing can significantly improve performance, indexing speed does not scale linearly with the number of engines, because of disk and network overhead, and returns diminish as more engines are added. Moving from one engine to two may cut indexing time in half, but adding another two engines may not produce a comparable gain.

Consider using parallel indexing only if you have sufficient hardware. Each indexing engine requires one CPU, so do not set defaultActivePhysicalPartitions higher than the number of CPUs normally available, or performance will suffer. Also keep in mind that the more engines you run on a given machine, the greater the demands placed on its physical disk. Separate engines should either have separate physical disks or share a RAID array.
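The CPU guideline above can be sketched as a small sizing check. This is a minimal illustration, not part of Search: the class PartitionSizing and the method cappedPartitions are hypothetical names, and the sketch assumes that the processor count reported by the JVM is a reasonable stand-in for the number of CPUs normally available, which may not hold if other processes compete for the same machine.

```java
// Hypothetical helper, not part of the Search product: caps a requested
// engine count at the number of CPUs available, following the guideline
// that each indexing engine requires one CPU.
public final class PartitionSizing {

    private PartitionSizing() {
    }

    /**
     * Returns the requested engine count, limited to availableCpus and
     * never lower than 1.
     */
    public static int cappedPartitions(int requested, int availableCpus) {
        if (availableCpus < 1) {
            throw new IllegalArgumentException("availableCpus must be at least 1");
        }
        return Math.max(1, Math.min(requested, availableCpus));
    }

    public static void main(String[] args) {
        // Assumption: the JVM's processor count approximates the CPUs
        // normally available to the indexing engines on this machine.
        int cpus = Runtime.getRuntime().availableProcessors();
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : cpus;

        // Print a candidate line for the properties file.
        System.out.println("defaultActivePhysicalPartitions="
                + cappedPartitions(requested, cpus));
    }
}
```

Running the class with a desired engine count as its only argument prints a candidate property line. The result is only a starting point, because it takes no account of the disk and network limits described above.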

 