Adaptor | See Dictionary adaptor. |
BNF | Backus-Naur Form. Formal notation, introduced by John Backus and Peter Naur, for describing the syntax of a language. |
Boolean AND | Logical operator which evaluates to true if all of its operands also evaluate to true, otherwise it evaluates to false. |
Boolean operator | Logical operator which evaluates to true or false; see Boolean AND, OR, and NOT. |
Boolean NOT | Logical operator which evaluates to true if its operand evaluates to false, otherwise it evaluates to false. |
Boolean OR | Logical operator which evaluates to true if any of its operands evaluate to true, otherwise it evaluates to false. |
Boolean expression | Mathematical formula consisting of Boolean operators and operands, where the operands can be other Boolean expressions or simple logical tests. |
Boolean mathematics | Simple approximation of Boolean expressions where a plus sign indicates a required test and a minus sign indicates a negated (or subtracted) test. |
Category | Also known as a topic; represents a classification of text content, typically by its subject matter. |
Compositional phrase | Lexical phrase whose meaning is derived directly from its component terms; see Non-compositional phrase. |
Content value | The importance or significance of a term in the content. High-content means the term is important, and low-content means the term is insignificant. |
Derivational morphology | Analyzing a word form down to its base root by removing suffixes and prefixes. The form’s meaning and features derive from the root and the interpretation of the affixes. |
Dictionary | Store of language data used for computational analysis, primarily including a large list of word roots with associated information about them. |
Dictionary adaptor | Optional extension of the general purpose dictionary, containing domain-specific language data. |
Document | Physical file of text content. Also used to refer to index items, which are a data structure stored in the index representing the basic unit of the index. |
Document set | Hierarchical collection of index items (or documents) that can represent a physical directory, a topical category, or a metadata value. Document sets are used for sub-setting the indexed content during search or browse. Also called an item set. |
Equivalent terms | Two terms that are identical in meaning and usage; see Synonymous terms. |
Feedback | Information or data concerning the user query that may help the end-user improve the next query. Examples include spelling suggestions, related categories, or related terms. |
Field | Region or portion of the text content of an index item. For AS, fields are represented as features on the statement term vectors. |
Fielded search | Searching within a region of the text content of the items, as opposed to searching across the entire content. |
Group-by-document | The group-by-document algorithm reviews the list of matching statements and collapses those that come from the same source document, forming groups of document results. |
High content term | Important or significant term for the indexed content, typically very specific and not occurring with high frequency. |
High frequency term | Term that occurs in many documents and potentially many times within each document, generally of low importance. |
Hits | Match between a query term and the indexed content, either within a statement term vector or in the whole document. |
Index | Secondary repository that stores information about a document or data collection to facilitate search. |
Index item | Basic unit of an index; typically represents a document file, but could also represent database or other structured content. |
Index term | Word, phrase, or other text token that is used to retrieve index items or statement term vectors. |
Inflectional morphology | Analyzing the various surface forms of a word, rather than more complex word formations from a base root. See Derivational morphology. |
Inverted index | Look-up table of every index term to the index items and statement term vectors that they occur in. This is a critical component of the overall index of the text content. |
Item set | Hierarchical collection of index items (or documents) which can represent a physical directory, a topical category, or a metadata value. Item sets are used for sub-setting the indexed content during search or browse. Also called a document set. |
Keyword query | Query consisting of a list of literal terms, generally without low-content or stop words. |
Language data | Information that represents a system’s knowledge about a natural language that is used to process text content written in that language. Also known as the dictionary. |
Literal match | Exact match of a query term in the indexed content. |
Low content term | Unimportant or insignificant term for the indexed content, typically very general and used with high frequency. |
Low frequency term | Term that occurs in few documents, generally of high importance. |
Match | Retrieved index item that is similar or related to the original query term. |
Metadata | Information about a unit of text content, separate from the text content itself. |
Morphology | Analysis of word variations and word formation. |
Natural language processing (NLP) | Computational analysis of natural (human) language. For search, NLP is used to index content and interpret user queries. |
Natural language query | Any search query that consists of normal terms in the end-user’s language, without any special syntax or format. |
NLP | See Natural language processing. |
Non-compositional phrase | Lexical phrase or construct whose meaning does not derive directly (or even indirectly) from the component terms. |
Part of speech (POS) | Major syntactic category representing a word’s grammatical usage, such as noun, verb, adjective, and adverb. |
Phrases | Lexical constructs made up of multiple words or other tokens. |
Property | Typed attribute of an index item, such as a string property named ID. |
Query | End-user input entered into the search site or application. |
Preferred Answer | Special kind of content, which consists of a question, an answer and a reference document, that is indexed to improve search results for specific questions. Similar to a Frequently Asked Question (FAQ). |
Refinement | Process of modifying or narrowing the search to make the results more precise. |
Regular expression | Query term that is treated as a character pattern that matches many index terms at once, in a similar manner to wildcards. |
Relevance | Percentage of total content weight that the category has. |
Relevancy score | Score calculated from how well the statement matches the query, plus how closely the retrieved index item of that statement relates to the query. |
RelQuestSettings | Configuration option for ATG Search which contains numerous low-level query settings to adjust performance. |
Request | Complex instruction for ATG Search to perform, typically to execute a search query or to retrieve an indexed item for viewing. An ATG Search request is in the form of XML since it contains much more than the end-user’s query input. |
Response | Complex reply to an ATG Search request, typically containing the search results and any auxiliary information about the query, also called feedback. An ATG Search response is in the form of XML since it contains complex data relating to the retrieved results. |
ResponseNumberSettings | Configuration option for ATG Search which contains numerous controls for the final search results list. |
Result | Single matching statement returned by ATG Search executing the query request. The result contains information about the statement and its index item. |
Root | Fundamental element of a word or form, exclusive of all endings and prefixes. |
Searchable text content | Text of an index item that is indexed and therefore retrievable during search. Typically this is the body of a document file. |
Segmentation | Process of identifying the words and tokens within a statement of natural language text that does not contain white space or other delimiters. |
Solution | Special document that is authored in response to end-user problems or questions. The solution contains information about the issue and how to address or answer it. Solutions can be indexed for search by support personnel (assisted service) or by end-users (self-service). |
Statement features | Attributes of a statement stored in the index, typically representing the fields and security zones of structured content. |
Statement query | Grammatical sentence or sentence fragment that is entered as search text, as opposed to simple keyword terms or questions. |
Stem | Remainder of a word form after its endings are removed, in a process called stemming. ATG Search uses morphological analysis to determine the root of a word, which often differs from its stem. The concept of a root is not straightforward, so ATG Search refers to the root that it indexes by as an index term. |
Stemming | Process of stripping the endings off word forms to reduce the number of unique index terms. This process is typically much simpler and more mechanical than robust morphological analysis. |
Stop words | Low-content terms that should be ignored during search. |
Structured document | Document that represents structured information, typically from data repositories. ATG Search uses a special subset of XHTML to denote the structured information. |
Structured index item | Structured document that has been indexed by AS. The index item represents the document and preserves its structure. |
Synonymous terms | Terms that are similar in meaning or usage, generally very strongly related with respect to user queries. |
Taxonomy | Hierarchical organization of categories. For AS, the taxonomy is used to classify documents and queries. |
Term | Word, phrase, number, or other token treated as a unit of search and retrieval. |
Term exclusion | Simple approximation of the Boolean NOT operator, where results that contain a term are excluded (or subtracted). |
Term expansion | Process of selecting alternative terms to the original query terms, typically based on some form of thesaurus. |
Term frequency inverse document frequency | Family of statistical formulas that measure the strength (or weight) of relationship between terms, statements, documents, or any other textual content. Term frequency is the number of times a term appears in a single unit of content, and document frequency is the number of items that a term appears in. Also called TF-IDF. |
Term normalization | Process of replacing variations of the same term with a single normalized form. The variations are also called equivalent terms. |
Term relationship | Link or connection between two terms, generally with some measure of strength. A thesaurus contains entries that represent term relationships. |
Term vector | Ordered set of terms representing a statement, sentence, query, or any other unit of text. |
Term weight | Content value of a term, typically computed using term frequency and document frequency statistics. |
TF-IDF | Acronym for term frequency inverse document frequency. |
Thesaurus | Collection of terms and their related terms, generally used for term expansion. |
Token | Basic unit of text content, such as a word, number, or punctuation mark. |
Tokenization | Process of determining the basic units, or tokens, of text content. Generally, this process uses white space and character types to determine the boundaries between tokens. Some Asian languages require a more complex segmentation process due to the lack of white space. |
Topic | Also known as a category; represents a classification of text content, organized in a taxonomy. |
Unstructured document | Document that does not represent structured information, or is not indexed by ATG Search to preserve the structure. Common HTML, PDF, Word, and text documents are unstructured. ATG Search uses a special subset of XHTML to denote the structured information. |
Unstructured index item | Unstructured document that has been indexed by AS. The index item represents the document and has no structured text fields. |
XML request | ATG Search receives requests to perform actions in the form of XML. The major request types are query, browse, and view content. |
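
Several of the entries above (Tokenization, Stop words, Inverted index, Term weight, TF-IDF) describe steps that fit together in a small pipeline. The following is a minimal Python sketch of that pipeline, not ATG Search's implementation: the function names, the stop-word list, the sample items, and the particular TF-IDF formula (`tf * log(N / df)`, one member of the TF-IDF family) are all assumptions chosen for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "in"}


def tokenize(text):
    """Split text into lowercase word and number tokens.

    Uses character types (letters and digits) to find token boundaries,
    which is only adequate for white-space-delimited languages.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(items):
    """Map each index term to {item_id: term frequency in that item}."""
    index = defaultdict(dict)
    for item_id, text in items.items():
        for token in tokenize(text):
            if token in STOP_WORDS:
                continue  # stop words are not indexed
            index[token][item_id] = index[token].get(item_id, 0) + 1
    return index


def tf_idf(index, term, item_id, total_items):
    """Term weight as tf * log(N / df); one simple TF-IDF variant."""
    postings = index.get(term, {})
    tf = postings.get(item_id, 0)  # term frequency within the item
    if tf == 0:
        return 0.0
    df = len(postings)  # document frequency: items containing the term
    return tf * math.log(total_items / df)


# Hypothetical index items.
items = {
    "doc1": "The index of the search engine",
    "doc2": "Search the search index",
    "doc3": "A taxonomy of categories",
}
index = build_inverted_index(items)
```

Under these assumptions, `tf_idf(index, "taxonomy", "doc3", len(items))` is larger than `tf_idf(index, "index", "doc1", len(items))`, because "taxonomy" occurs in only one item (a low frequency term) while "index" occurs in two.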