Using Multiple Runs

The multiple run feature improves job performance by taking advantage of multiple processors. It applies to both Sequence Patterns and Rule Patterns. Each run executes in a separate thread.

Figure 10-2 Rule Matcher Job Editor for Multiple Run

To create a multiple run job, follow these steps:

On the Datasets tab, select the first dataset in the Datasets list. The Multiple Run check box is enabled.
Select the Multiple Run check box.
The Multiple Run Attribute field is enabled.
Select the attribute of the dataset that is used to separate the data.
Typically, this is the attribute that represents the focus of the pattern. The Multiple Run Values and Job Thread Count fields are enabled.
In the Multiple Run Values field, enter the value separators as a comma-delimited list.
For example, for a Security-focused pattern, suppose the data miner determines that the securities can be divided alphabetically into four buckets of roughly equal size: A-G, H-M, N-S, and T-Z. In this case, populate the Multiple Run Values field with G,M,S. The first thread handles securities where the Security Internal ID is less than or equal to G. The second thread handles securities where the Security Internal ID is greater than G and less than or equal to M. The third thread handles securities where the Security Internal ID is greater than M and less than or equal to S. The fourth thread handles securities where the Security Internal ID is greater than S.
In the Job Thread Count field, enter the desired number of threads for this job.
This value should be less than or equal to the number of buckets defined by the Multiple Run Values field. In the previous example, the maximum is four threads. When choosing the value, also consider the number of processors available on the detection machine.
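The bucketing rule and thread count limit described in the steps above can be illustrated with a short Python sketch. This is not the product's implementation. The function names (parse_run_values, bucket_index, partition, run_buckets) and the sample IDs are hypothetical, the separators are assumed to be listed in ascending order, and the comparison is assumed to be a plain string comparison. Under that assumption an ID such as GOOG sorts after G and lands in the second bucket; how the product itself compares values is not stated here.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def parse_run_values(text):
    """Split a comma-delimited separator string such as "G,M,S" into a list."""
    return [part.strip() for part in text.split(",") if part.strip()]


def bucket_index(value, separators):
    """Return the bucket for a value.

    Bucket 0 holds values <= separators[0], bucket i holds values that are
    > separators[i - 1] and <= separators[i], and the last bucket holds
    values greater than the final separator.
    Assumes separators are in ascending order and compared as plain strings.
    """
    return bisect_left(separators, value)


def partition(values, separators):
    """Group values into len(separators) + 1 buckets."""
    buckets = [[] for _ in range(len(separators) + 1)]
    for value in values:
        buckets[bucket_index(value, separators)].append(value)
    return buckets


def run_buckets(buckets, thread_count, work):
    """Process each bucket with `work`, using at most `thread_count` threads."""
    if not 1 <= thread_count <= len(buckets):
        raise ValueError("thread count must be between 1 and the number of buckets")
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(work, buckets))


separators = parse_run_values("G,M,S")
sample_ids = ["AAPL", "G", "IBM", "MSFT", "S", "TSLA"]  # hypothetical IDs
buckets = partition(sample_ids, separators)
counts = run_buckets(buckets, thread_count=4, work=len)
```

Three separators yield four buckets, so a thread count of five is rejected by the check in run_buckets. A lower thread count is still valid; in this sketch the buckets are then processed by fewer worker threads, some of them one after another.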

Note:
Repeat these steps for each dataset in the job. Use the same values for the Multiple Run Attribute, Multiple Run Values, and Job Thread Count fields in every dataset.
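If the settings are recorded outside the editor, a small check along the following lines could flag datasets whose values drift apart. This is only a sketch: the MultipleRunSettings class, the inconsistent_datasets function, and the dataset and attribute names are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleRunSettings:
    """Hypothetical record of the three multiple run fields for one dataset."""
    attribute: str
    values: str
    thread_count: int


def inconsistent_datasets(settings_by_dataset):
    """Return the names of datasets whose settings differ from the first one."""
    items = list(settings_by_dataset.items())
    if not items:
        return []
    _, reference = items[0]
    return [name for name, settings in items[1:] if settings != reference]


# Hypothetical dataset names; the third entry uses a different thread count.
job_settings = {
    "Dataset 1": MultipleRunSettings("Security Internal ID", "G,M,S", 4),
    "Dataset 2": MultipleRunSettings("Security Internal ID", "G,M,S", 4),
    "Dataset 3": MultipleRunSettings("Security Internal ID", "G,M,S", 2),
}
mismatches = inconsistent_datasets(job_settings)
```

An empty result means every dataset uses the same attribute, separator list, and thread count as the first dataset.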