This section addresses indexing configuration for large individual content sets, such as those containing repositories or file sets involving millions of items.

You may want to configure Oracle ATG Web Commerce Search to perform parallel indexing. This configuration uses more resources than normal indexing does, in order to shorten indexing tasks that would otherwise take longer than is convenient.

Multiple content sets cannot be indexed in parallel. If you have a large individual content set, configure parallel indexing by changing the defaultActivePhysicalPartitions property of the \atg\Search\Routing\RoutingIndexService component. This setting applies to all search projects.
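
As an illustration only, the property might be set with a fragment like the following in the properties file that configures the RoutingIndexService component. The value 4 is an example, and where the file lives in your configuration layers is an assumption that depends on your installation.

```properties
# Example value only: index a large content set with four search engines.
# Applies to all search projects.
defaultActivePhysicalPartitions=4
```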

Search uses the number of search engines you specify to index content. These engines are distributed across the indexing environment’s available host machines. Make sure that you have enough host machines configured for the indexing environment; for example, if you configure defaultActivePhysicalPartitions=4, make sure that your machines have at least 4 free CPUs between them. Use dedicated machines for indexing; do not use machines for both indexing and answering.
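
The capacity rule above can be sketched as a quick check. This is a minimal illustration, not part of the product: the host names and free-CPU counts are hypothetical, and the function assumes one free CPU per engine, as in the example of 4 partitions needing 4 free CPUs.

```python
def has_enough_free_cpus(free_cpus_by_host, partitions):
    """Return True if the indexing hosts together have at least one
    free CPU for each requested partition (engine)."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return sum(free_cpus_by_host.values()) >= partitions


if __name__ == "__main__":
    # Hypothetical dedicated indexing hosts and their free CPU counts.
    indexing_hosts = {"index-host-1": 2, "index-host-2": 2}
    # Corresponds to defaultActivePhysicalPartitions=4.
    print(has_enough_free_cpus(indexing_hosts, 4))
```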

Parallel indexing with multiple machines requires that all machines have access to the content to be indexed. Repository content is automatically streamed to the engines; however, if you are indexing file system content, the path to the content must be a shared location accessible to all of the engine machines.

Note that while parallel indexing can significantly improve performance, there is not a 1:1 ratio between the number of engines and processing speed, due to disk and network overhead, and returns diminish as more engines are added. Moving from one to two engines may cut indexing time in half, but adding another two engines may not lead to such a significant gain.
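
The diminishing returns can be illustrated with a simple model. This sketch is not how Search measures or schedules indexing; it assumes, purely for illustration, that a fixed fraction of the work (disk and network overhead) does not speed up as engines are added, and that the rest divides evenly across engines. The function name and the figures used are hypothetical.

```python
def estimated_indexing_hours(single_engine_hours, engines, overhead_fraction):
    """Estimate indexing time when only part of the work parallelizes.

    overhead_fraction is the assumed share of the single-engine time
    (disk and network overhead) that does not shrink with more engines.
    """
    if engines < 1:
        raise ValueError("engines must be at least 1")
    if not 0 <= overhead_fraction <= 1:
        raise ValueError("overhead_fraction must be between 0 and 1")
    fixed = single_engine_hours * overhead_fraction
    parallel = single_engine_hours * (1 - overhead_fraction)
    return fixed + parallel / engines


if __name__ == "__main__":
    # Hypothetical job: 10 hours on one engine, 10% fixed overhead.
    for engines in (1, 2, 4):
        hours = estimated_indexing_hours(10, engines, 0.1)
        print(f"{engines} engine(s): {hours:.2f} hours")
```

Under these assumed figures, going from one engine to two saves 4.5 hours, while going from two to four saves only a further 2.25 hours.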