Glossary

Adaptor	See Dictionary adaptor.
BNF	Backus-Naur Form. A formal notation for describing the syntax of a language, introduced by John Backus and Peter Naur.
Boolean AND	Logical operator which evaluates to true if all of its operands also evaluate to true, otherwise it evaluates to false.
Boolean operator	Logical operator which evaluates to true or false; see Boolean AND, OR, and NOT.
Boolean NOT	Logical operator which evaluates to true if its operand evaluates to false, otherwise it evaluates to false.
Boolean OR	Logical operator which evaluates to true if any of its operands evaluate to true, otherwise it evaluates to false.
Boolean expression	Mathematical formula consisting of Boolean operators and operands, where the operands can be other Boolean expressions or simple logical tests.
Boolean mathematics	Simple approximation of Boolean expressions where a plus sign indicates a required test and a minus sign indicates a negated (or subtracted) test.
Category	Also known as a topic. Represents a classification of text content, typically by its subject matter.
Compositional phrase	Lexical phrase whose meaning is derived directly from its component terms; see Non-compositional phrase.
Content value	The importance or significance of a term in the content. High-content means the term is important, and low-content means the term is insignificant.
Derivational morphology	Analyzing a word form down to its base root by removing suffixes and prefixes. The form’s meaning and features derive from the root and the interpretation of the affixes.
Dictionary	Store of language data used for computational analysis, primarily including a large list of word roots with associated information about them.
Dictionary adaptor	Optional extension of the general purpose dictionary, containing domain-specific language data.
Document	Physical file of text content. Also used to refer to index items, which are a data structure stored in the index representing the basic unit of the index.
Document set	Hierarchical collection of index items (or documents) that can represent a physical directory, a topical category, or a metadata value. Document sets are used for sub-setting the indexed content during search or browse. Also called an item set.
Equivalent terms	Two terms that are identical in terms of meaning and usage; see Synonymous terms.
Feedback	Information or data concerning the user query that may help the end-user improve the next query. Examples include spelling suggestions, related categories, or related terms.
Field	Region or portion of the text content of an index item. For AS, fields are represented as features on the statement term vectors.
Fielded search	Searching within a region of the text content of the items, as opposed to searching across the entire content.
Group-by-document	The group-by-document algorithm reviews the list of matching statements and collapses those that come from the same source document, forming groups of document results.
High content term	Important or significant term for the indexed content, typically very specific and not referred to in high frequency.
High frequency term	Term that occurs in many documents and potentially many times within each document, generally of low importance.
Hits	Match between a query term and the indexed content, either within a statement term vector or in the whole document.
Index	Secondary repository that stores information about a document or data collection to facilitate search.
Index item	Basic unit of an index, typically represents a document file, but could also represent any database or structured content.
Index term	Word, phrase, or other text token that is used to retrieve index items or statement term vectors.
Inflectional morphology	Analyzing the various surface forms of a word, rather than more complex word formations from a base root. See Derivational morphology.
Inverted index	Look-up table of every index term to the index items and statement term vectors that they occur in. This is a critical component of the overall index of the text content.
Item set	Hierarchical collection of index items (or documents) which can represent a physical directory, a topical category, or a metadata value. Item sets are used for sub-setting the indexed content during search or browse. Also called a document set.
Keyword query	Query consisting of a list of literal terms, generally without low-content or stop words.
Language data	Information that represents a system’s knowledge about a natural language that is used to process text content written in that language. Also known as the dictionary.
Literal match	Exact match of a query term in the indexed content.
Low content term	Unimportant or insignificant term for the indexed content, typically very general and used in high frequency.
Low frequency term	Term that occurs in few documents, generally of high importance.
Match	Retrieved index item that is similar or related to the original query term.
Metadata	Information about a unit of text content, separate from the text content itself.
Morphology	Analysis of word variations and word formation.
Natural language processing (NLP)	Computational analysis of natural (human) language. For search, NLP is used to index content and interpret user queries.
Natural language query	Any search query that consists of normal terms in the end-user’s language, without any special syntax or format.
NLP	See Natural language processing.
Non-compositional phrase	Lexical phrase or construct whose meaning does not derive directly (or even indirectly) from the component terms.
Part of speech (POS)	Major syntactic category representing a word’s grammatical usage, such as noun, verb, adjective, and adverb.
Phrases	Lexical construct made up of multiple words or other tokens.
Property	Typed attribute of an index item, such as a string property named ID.
Query	End-user input entered into the search site or application.
Preferred Answer	Special kind of content, which consists of a question, an answer and a reference document, that is indexed to improve search results for specific questions. Similar to a Frequently Asked Question (FAQ).
Refinement	Process of modifying or narrowing the search to make the results more precise.
Regular expression	Query term that is treated as a character pattern that matches many index terms at once, in a similar manner to wildcards.
Relevance	Percentage of total content weight that the category has.
Relevancy score	Calculated based on how well the statement matches the query, plus how related the retrieved index item of that statement is to the query.
RelQuestSettings	Configuration option for ATG Search which contains numerous low-level query settings to adjust performance.
Request	Complex instruction for ATG Search to perform, typically to execute a search query or to retrieve an indexed item for viewing. An ATG Search request is in the form of XML since it contains much more than the end-user’s query input.
Response	Complex reply to an ATG Search request, typically containing the search results and any auxiliary information about the query, also called feedback. An ATG Search response is in the form of XML since it contains complex data relating to the retrieved results.
ResponseNumberSettings	Configuration option for ATG Search which contains numerous controls for the final search results list.
Result	Single matching statement returned by ATG Search executing the query request. The result contains information about the statement and its index item.
Root	Fundamental element of a word or form, exclusive of all endings and prefixes.
Searchable text content	Text of an index item that is indexed and therefore retrievable during search. Typically this is the body of a document file.
Segmentation	Process of identifying the words and tokens within a statement of natural language text that does not contain white space or other delimiters.
Solution	Special document that is authored in response to end-user problems or questions. The solution contains information about the issue and how to address or answer it. Solutions can be indexed for search by support personnel (assisted service) or by end-users (self-service).
Statement features	Attributes of a statement stored in the index, typically representing the fields and security zones of structured content.
Statement query	Grammatical sentence or sentence fragment that is entered as search text, as opposed to simple keyword terms or questions.
Stem	Remainder of a word form after its endings are removed, in a process called stemming. ATG Search uses morphological analysis to determine the root of a word, which often differs from its stem. The concept of a root is not straightforward, so ATG Search refers to the root that it indexes by as an index term.
Stemming	Process of stripping off the endings of word forms to reduce the number of unique index terms. This process is typically much simpler and more mechanical than robust morphological analysis.
Stop words	Low content terms that should be ignored during search.
Structured document	Document that represents structured information, typically from data repositories. ATG Search uses a special subset of XHTML to denote the structured information.
Structured index item	Structured document that has been indexed by AS. The index item represents the document and preserves its structure.
Synonymous terms	Terms that are similar in meaning or usage, generally very strongly related with respect to user queries.
Taxonomy	Hierarchical organization of categories. For AS, the taxonomy is used to classify documents and queries.
Term	Word, phrase, number, or other token treated as a unit of search and retrieval.
Term exclusion	Simple approximation of the Boolean NOT operator, where results that contain a term are excluded (or subtracted).
Term expansion	Process of selecting alternative terms to the original query terms, typically based on some form of thesaurus.
Term frequency inverse document frequency	Family of statistical formulas that measure the strength (or weight) of relationship between terms, statements, documents, or any other textual content. Term frequency is the number of times a term appears in a single unit of content, and document frequency is the number of items that a term appears in. Also called TF-IDF.
Term normalization	Process of replacing variations of the same term with a single normalized form. The variations are also called equivalent terms.
Term relationship	Link or connection between two terms, generally with some measure of strength. A thesaurus contains entries that represent term relationships.
Term vector	Ordered set of terms representing a statement, sentence, query, or any other unit of text.
Term weight	Content value of a term, typically computed using term frequency and document frequency statistics.
TF-IDF	Acronym for term frequency inverse document frequency.
Thesaurus	Collection of terms and their related terms, generally used for term expansion.
Token	Basic unit of text content, such as a word, number, or punctuation mark.
Tokenization	Process of determining the basic units, or tokens, of text content. Generally, this process uses white space and character types to determine the boundaries between tokens. Some Asian languages require a more complex segmentation process due to the lack of white space.
Topic	Also known as a category. Represents a classification of text content, organized in a taxonomy.
Unstructured document	Document that does not represent structured information, or is not indexed by ATG Search to preserve the structure. Common HTML, PDF, Word, and text documents are unstructured. ATG Search uses a special subset of XHTML to denote the structured information.
Unstructured index item	Unstructured document that has been indexed by AS. The index item represents the document and has no structured text fields.
XML request	ATG Search receives requests to perform actions in the form of XML. The major request types are query, browse, and view content.
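
As an illustration of the Term frequency inverse document frequency and Term weight entries, the following is a minimal sketch of one common textbook variant, tf * log(N / df). The function name tf_idf, the whitespace tokenization, and the sample documents are assumptions made for this example; this is not the formula or API that ATG Search uses.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Illustrative variant only: weight = tf * log(N / df), where tf is the
    count of the term in the document, df is the number of documents that
    contain the term, and N is the total number of documents.
    """
    n_docs = len(documents)
    # Naive tokenization on white space (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: count each term once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in term_freq.items()}
        )
    return weights


# Hypothetical sample content.
docs = [
    "the router resets the modem",
    "the modem light blinks",
    "the printer jams",
]
weights = tf_idf(docs)
```

In this sketch a high frequency term such as "the", which occurs in every document, receives a weight of zero, while a low frequency term such as "router" receives the highest weight in its document.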

ATG Search Query Reference Guide
