As described in the Result Groups section of the Standard Query chapter, ATG Search performs grouping on the raw result list in order to avoid duplicate results. The settings that control grouping, along with other controls on the selection of the final results, are described in this section.
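The values of these controls can be changed in two ways: through settings in the AEConfig.xml file, or through attributes in the query XML. Query XML attributes override settings in the AEConfig.xml file. Each setting is written as paramValue, where param is the name of the parameter and Value is the numeric value appended directly to it (for example, pool200).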
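The following Python sketch shows one way to read the paramValue notation used throughout this section. Settings are separated by commas, and each is a name followed directly by a number. The function name parse_settings is hypothetical, and this is not the parser that ATG Search itself uses.

```python
import re

# A setting is a name followed directly by a number, e.g. "pool200" or "f*1.0".
# The lazy name group stops at the first position where the rest is a number.
_SETTING = re.compile(r"^(.*?)(\d+(?:\.\d+)?)$")


def parse_settings(spec):
    """Split a comma-separated settings string into {name: value} (hypothetical helper)."""
    settings = {}
    for item in spec.split(","):
        item = item.strip()
        match = _SETTING.match(item)
        if match is None:
            raise ValueError(f"not a paramValue setting: {item!r}")
        name, number = match.groups()
        # Integers stay integers; values with a decimal point become floats.
        settings[name] = float(number) if "." in number else int(number)
    return settings
```

For example, parse_settings("f2,o10,s10,t50") yields the names f, o, s, and t with their limits. Field-specific names such as role:goal10 split at the trailing number in the same way.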
Maximum Results by Type
Each statement result has a type attribute that specifies the source of the statement. For unstructured statement results, the type is SENTENCE. For structured statement results, the type is either the field name, such as ROLE:GOAL, or simply SOLUTION. ATG Search controls how many results of each type can be returned, using the following parameters:
f2,o10,s10,t50
The parameters are:
f: Maximum number of preferred answer statements to include in the final results.
o: Maximum number of structured statement results.
s: Maximum number of unstructured statement results.
t: Maximum total number of results, across all types, that can be returned.
The o parameter is the default value for any structured statement result type. Individual fields in structured documents can be defined separately, as shown below:
role:goal10,role:symptom10,role:fact2
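The following sketch illustrates how per-type limits like these could be applied to a ranked result list. It makes several assumptions that the text above does not confirm. Each result is labeled with a hypothetical kind ("preferred", "structured", or "unstructured"). Preferred answers count only against f. Each structured type gets its own cap, which is either its field override or the o default. Field names are matched case-insensitively against lower-case override keys, as in role:goal10.

```python
from collections import Counter


def apply_type_limits(results, limits):
    """Keep results in rank order while enforcing per-type and total caps.

    results: list of (kind, type) tuples in ranked order; kind is one of the
             hypothetical labels "preferred", "structured", "unstructured".
    limits:  dict such as {"f": 2, "o": 10, "s": 10, "t": 50, "role:goal": 10}.
    """
    counts = Counter()
    kept = []
    for kind, rtype in results:
        if len(kept) >= limits["t"]:
            break  # total cap across all types reached
        if kind == "preferred":
            key, cap = "f", limits["f"]
        elif kind == "unstructured":
            key, cap = "s", limits["s"]
        else:
            # Structured: a field-specific override wins over the o default.
            key = rtype.lower()
            cap = limits.get(key, limits["o"])
        if counts[key] < cap:
            counts[key] += 1
            kept.append((kind, rtype))
    return kept
```

Under these assumptions, with role:fact2 in effect, at most two ROLE:FACT results survive, even when more of them outrank other results.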
Maximum Result Pool
The result grouping process operates on the pool of final results, the size of which is controlled by this parameter:
pool200
The pool parameter is the maximum number of each major type of result to collect for grouping: unstructured statements and structured statements. With the setting shown above, up to 200 results of each major type are collected.
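The following sketch shows the per-type collection described above. It is an illustration only. The function name build_pool and the (kind, item) tuple layout are hypothetical, and results are assumed to arrive already ranked.

```python
def build_pool(ranked, pool=200):
    """Collect up to `pool` results of each major type from a ranked list.

    ranked: list of (kind, item) tuples, where kind is the hypothetical label
            "structured" or "unstructured".
    """
    taken = {"structured": 0, "unstructured": 0}
    out = []
    for kind, item in ranked:
        # Each major type has its own quota; one type filling up does not
        # stop collection of the other.
        if taken[kind] < pool:
            taken[kind] += 1
            out.append((kind, item))
    return out
```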
Similar Statement Text Threshold
The group-by-statement feature (see Grouping by Statement in the Standard Query chapter) groups statement results by similar statement text. This feature uses a similarity metric to compute a value that quantifies how similar two statements are; this value is then compared to a numeric threshold, which is set by the following parameter:
sim1
The sim parameter sets the value of the numeric threshold, and can be any integer from 0 to 100.
The similarity metric computes what percentage of the longer statement the shorter one represents, based on a strict sub-string match that ignores case, white space, and punctuation. For example, consider these two statements:
If the installation failed, you probably have the wrong version.
The installation failed.
The first statement is 53 characters long, excluding white space and punctuation. The second statement is 21 characters long, and it is a sub-string of the first. So the similarity metric is 21/53, or about 40% (39.6%). If this value is greater than the threshold set by the sim parameter, then these statements are deemed similar for grouping.
A sim value of 0 means that a sub-string of any size will be considered similar. A value of 100 means that only identical strings are considered similar.
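The following sketch reproduces the metric as described above. The function names are hypothetical. The text says a score "greater than" the threshold counts as similar, but also says that sim=100 matches identical strings. To honor both endpoints, this sketch assumes that a score meeting the threshold counts and that a non-sub-string never matches. The engine's actual boundary behavior may differ.

```python
def normalize(text):
    # Ignore case, white space and punctuation: keep only letters and digits.
    return "".join(ch.lower() for ch in text if ch.isalnum())


def similarity(a, b):
    """Percentage the shorter normalized statement is of the longer one,
    or None when the shorter one is not a sub-string of the longer."""
    short, longer = sorted((normalize(a), normalize(b)), key=len)
    if not longer or short not in longer:
        return None
    return 100.0 * len(short) / len(longer)


def is_similar(a, b, sim):
    score = similarity(a, b)
    # Assumption: meeting the threshold counts, so sim=100 matches identical
    # strings; statements that are not sub-strings are never similar.
    return score is not None and score >= sim
```

For the two example statements, the score is 100 * 21 / 53, which rounds to 40. The pair is therefore similar at sim1, but not at sim50.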
Altering Weight by Result Type
Normally, all statement results receive the same treatment in the relevancy calculation. However, at times it may be useful to weight certain statement types higher or lower. For example, two identical statements from similar documents usually receive nearly identical relevancy. However, if the statements come from two different text fields (such as role:goal and role:fact), and there is reason to consider one field of more interest to users, varying the relevancy would be valuable. ATG Search supports these weighting factors with the following parameters:
f*1.0,o*1.0,s*1.0,ROLE:ID*2.0
The parameters are:
f*: The weight (or multiplier) of preferred answer statement relevance.
o*: The weight of structured statement results.
s*: The weight of unstructured statement results.
A weight of 1.0 means the original (pure) relevancy is used. In this example, the ROLE:ID field is double weighted, since a text search on a particular id strongly prefers the single document with that id over other documents that might refer to it.
The o* parameter is the default value for any structured statement result type. Individual structured types (or fields) can be defined separately, as shown below:
role:goal*1.2,role:symptom*1.1,role:fact*0.5
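The following sketch shows how these multipliers could scale a result's relevance. It assumes a hypothetical result kind label and case-insensitive field matching. It also assumes that a missing weight defaults to 1.0, which leaves the pure relevancy unchanged. The function names are illustrative and are not part of ATG Search.

```python
def parse_weights(spec):
    """Parse a string such as "f*1.0,o*1.0,s*1.0,ROLE:ID*2.0" into a dict."""
    weights = {}
    for item in spec.split(","):
        # Split at the last "*", so field names containing ":" stay intact.
        name, _, value = item.rpartition("*")
        weights[name.strip().lower()] = float(value)
    return weights


def weighted_relevance(relevance, kind, rtype, weights):
    """Multiply relevance by the weight for this result's kind or field."""
    if kind == "preferred":
        w = weights.get("f", 1.0)
    elif kind == "unstructured":
        w = weights.get("s", 1.0)
    else:
        # Structured: a field-specific weight wins over the o* default.
        w = weights.get(rtype.lower(), weights.get("o", 1.0))
    return relevance * w
```

With the first setting above, a ROLE:ID match with relevance 0.5 scores 1.0. A ROLE:GOAL match falls back to o* and keeps its relevance of 0.5.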
Returning Whole Fields in Result Text
Normally, the result text is the matching statement text plus some additional context for small sentences (if using the Extending Statement Result Text option). However, for structured content, which contains potentially multi-sentence fields of text, applications might want the entire text of the field returned as the result text. This behavior is controlled by the following parameter:
field0
The field parameter holds a Boolean value which, if non-zero, means that the entire field text associated with the matching statement is returned.
Displaying Document Summary Text
Normally, the result text is the matching statement text plus some additional context for small sentences. However, some applications may not want to display this text, but simply display the static summary of the retrieved document. For example, a commerce shopping site might always want to display the product description rather than a matching field of text. This behavior is controlled by the following parameter:
sum0
The sum parameter holds a Boolean value which, if non-zero, means that each result contains no matching result text and the summary text should be used in its place.
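The following sketch illustrates how the field and sum flags described in the last two subsections select the text that a result displays. The function name display_text is hypothetical. The text above does not say which flag wins when both are set, so the sketch assumes that sum takes precedence, because with sum set a result carries no matching text at all. The argument sum_ is named to avoid shadowing Python's built-in sum.

```python
def display_text(statement_text, field_text, summary_text, field=0, sum_=0):
    """Pick what a result shows, following the field and sum flags.

    Assumption: sum takes precedence over field when both are non-zero.
    """
    if sum_:
        return summary_text    # no matching result text; use the summary
    if field:
        return field_text      # whole field containing the matching statement
    return statement_text      # default: the matching statement text
```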
Returning One Result per Document or Solution
Normally, ATG Search can return several matching statements from the same retrieved document, especially if the documents are large or have repetitive content. The group-by-document feature (see Grouping by Document in the Standard Query chapter) can collect results from the same document, providing a seemingly single-result-per-document display. However, in some applications, other grouping algorithms might be required in conjunction with the desire for a single result per indexed item. For example, commerce applications might want to sort by metadata, but restrict the results to one per item. This behavior is controlled by the following parameter:
onePerDoc0
The onePerDoc parameter holds a Boolean value which, if non-zero, means that there will be only one result per document or item.
Similarly, the onePerSol parameter holds a Boolean value which, if non-zero, means that there will be only one result for a structured document (such as a Knowledge solution or Commerce item).
onePerSol0
This feature takes effect before the final result pool is constructed, so more unique retrieved documents may be collected.
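The following sketch illustrates why filtering before pool construction matters. Duplicate results from an already-seen document are skipped instead of occupying pool slots, so the pool fills with more distinct documents. The dictionary keys "doc" and "structured" are hypothetical. For brevity, the sketch uses a single pool count rather than one count per major type.

```python
def collect_pool(ranked, pool, one_per_doc=0, one_per_sol=0):
    """Fill a result pool from ranked results, optionally one per document.

    ranked: list of dicts with hypothetical keys "doc" (document id) and
            "structured" (True for structured documents such as solutions).
    """
    seen = set()
    out = []
    for r in ranked:
        if len(out) >= pool:
            break
        # onePerDoc applies to every result; onePerSol only to structured ones.
        dedupe = one_per_doc or (one_per_sol and r["structured"])
        if dedupe:
            if r["doc"] in seen:
                continue  # skipped before it can take a pool slot
            seen.add(r["doc"])
        out.append(r)
    return out
```

With a pool of two and two top-ranked results from document A, the unfiltered pool holds only A. With onePerDoc set, the pool holds one result from A and one from the next document.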