Adaptor

See Dictionary Adaptor.

BNF

Backus-Naur Form. A formal notation for describing the syntax of a language, first introduced by John Backus and Peter Naur.

Boolean AND

Logical operator which evaluates to true if all of its operands also evaluate to true, otherwise it evaluates to false.

Boolean operator

Logical operator which evaluates to true or false; see Boolean AND, OR, and NOT.

Boolean NOT

Logical operator which evaluates to true if its operand evaluates to false, otherwise it evaluates to false.

Boolean OR

Logical operator which evaluates to true if any of its operands evaluate to true, otherwise it evaluates to false.

Boolean expression

Mathematical formula consisting of Boolean operators and operands, where the operands can be other Boolean expressions or simple logical tests.

Boolean mathematics

Simple approximation of Boolean expressions where a plus sign indicates a required test and a minus sign indicates a negated (or subtracted) test.
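
As an illustration only, the plus/minus convention can be sketched in a few lines of Python. The function names and the handling of unmarked terms (treated here as optional) are assumptions made for this example, not a description of how ATG Search evaluates queries.

```python
def parse_plus_minus(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(query, text):
    """Return True if the text satisfies the plus/minus query."""
    required, excluded, optional = parse_plus_minus(query)
    words = set(text.lower().split())
    if not required <= words:   # every + term must be present
        return False
    if excluded & words:        # no - term may be present
        return False
    # Assumption: with no required terms, one optional term must match,
    # unless the query consists of exclusions only.
    return bool(required) or bool(optional & words) or not optional
```

For example, the query `+printer -ink` accepts the text "my printer is offline" and rejects "printer ink is low".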

Category

Also known as a topic. Represents a classification of text content, typically by its subject matter.

Compositional phrase

Lexical phrase whose meaning is derived directly from its component terms; see Non-compositional phrase.

Content value

The importance or significance of a term in the content. High-content means the term is important, and low-content means the term is insignificant.

Derivational morphology

Analyzing a word form down to its base root by removing suffixes and prefixes. The form’s meaning and features derive from the root and the interpretation of the affixes.

Dictionary

Store of language data used for computational analysis, primarily including a large list of word roots with associated information about them.

Dictionary adaptor

Optional extension of the general purpose dictionary, containing domain-specific language data.

Document

Physical file of text content. Also used to refer to index items, which are a data structure stored in the index representing the basic unit of the index.

Document set

Hierarchical collection of index items (or documents) that can represent a physical directory, a topical category, or a metadata value. Document sets are used for sub-setting the indexed content during search or browse. Also called an item set.

Equivalent terms

Two terms that are identical in terms of meaning and usage; see Synonymous terms.

Feedback

Information or data concerning the user query that may help the end-user improve the next query. Examples include spelling suggestions, related categories, or related terms.

Field

Region or portion of the text content of an index item. For AS, fields are represented as features on the statement term vectors.

Fielded search

Searching within a region of the text content of the items, as opposed to searching across the entire content.

Group-by-document

The group-by-document algorithm reviews the list of matching statements and collapses those that come from the same source document, forming groups of document results.
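
A minimal sketch of this collapsing step, in Python, might look as follows. The `(document_id, statement, score)` tuple layout and the function name are hypothetical, and the input is assumed to be already ranked; the actual ATG Search data structures are not described here.

```python
def group_by_document(statement_results):
    """Collapse statement results that share a source document.

    statement_results: iterable of (document_id, statement, score) tuples,
    assumed to be in ranked order. Groups keep the order in which each
    document first appears.
    """
    groups = {}
    for document_id, statement, score in statement_results:
        groups.setdefault(document_id, []).append((statement, score))
    return list(groups.items())
```

Each element of the returned list is one document result, paired with the matching statements that came from that document.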

High content term

Important or significant term for the indexed content, typically very specific and not referred to in high frequency.

High frequency term

Term that occurs in many documents and potentially many times within each document, generally of low importance.

Hits

Match between a query term and the indexed content, either within a statement term vector or in the whole document.

Index

Secondary repository that stores information about a document or data collection to facilitate search.

Index item

Basic unit of an index. Typically represents a document file, but could also represent any database or structured content.

Index term

Word, phrase, or other text token that is used to retrieve index items or statement term vectors.

Inflectional morphology

Analyzing the various surface forms of a word, rather than more complex word formations from a base root. See Derivational morphology.

Inverted index

Look-up table of every index term to the index items and statement term vectors that they occur in. This is a critical component of the overall index of the text content.
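
The idea of a look-up table from terms to items can be sketched in Python as below. This is a simplified illustration: it maps each term only to index item identifiers (not to statement term vectors), splits on white space, and uses a hypothetical function name.

```python
from collections import defaultdict


def build_inverted_index(items):
    """Map each term to the set of item identifiers that contain it.

    items: dict of item identifier -> text content.
    """
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in text.lower().split():
            index[term].add(item_id)
    return index
```

Looking up a term is then a single dictionary access, which is what makes retrieval fast compared with scanning every item.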

Item set

Hierarchical collection of index items (or documents) which can represent a physical directory, a topical category, or a metadata value. Item sets are used for sub-setting the indexed content during search or browse. Also called a document set.

Keyword query

Query consisting of a list of literal terms, generally without low-content or stop words.

Language data

Information that represents a system’s knowledge about a natural language that is used to process text content written in that language. Also known as the dictionary.

Literal match

Exact match of a query term in the indexed content.

Low content term

Unimportant or insignificant term for the indexed content, typically very general and used in high frequency.

Low frequency term

Term that occurs in few documents and is generally of high importance.

Match

Retrieved index item that is similar or related to the original query term.

Metadata

Information about a unit of text content, separate from the text content itself.

Morphology

Analysis of word variations and word formation.

Natural language processing (NLP)

Computational analysis of natural (human) language. For search, NLP is used to index content and interpret user queries.

Natural language query

Any search query that consists of normal terms in the end-user’s language, without any special syntax or format.

NLP

See Natural language processing.

Non-compositional phrase

Lexical phrase or construct whose meaning does not derive directly (or even indirectly) from the component terms.

Part of speech (POS)

Major syntactic category representing a word’s grammatical usage, such as noun, verb, adjective, and adverb.

Phrases

Lexical construct made up of multiple words or other tokens.

Property

Typed attribute of an index item, such as a string property named ID.

Query

End-user input entered into the search site or application.

Preferred Answer

Special kind of content, which consists of a question, an answer and a reference document, that is indexed to improve search results for specific questions. Similar to a Frequently Asked Question (FAQ).

Refinement

Process of modifying or narrowing the search to make the results more precise.

Regular expression

Query term that is treated as a character pattern that matches many index terms at once, in a similar manner to wildcards.

Relevance

Percentage of total content weight that the category has.

Relevancy score

Calculated based on how well the statement matches the query, plus how related the retrieved index item of that statement is to the query.

RelQuestSettings

Configuration option for ATG Search which contains numerous low-level query settings to adjust performance.

Request

Complex instruction for ATG Search to perform, typically to execute a search query or to retrieve an indexed item for viewing. An ATG Search request is in the form of XML since it contains much more than the end-user’s query input.

Response

Complex reply to an ATG Search request, typically containing the search results and any auxiliary information about the query, also called feedback. An ATG Search response is in the form of XML since it contains complex data relating to the retrieved results.

ResponseNumberSettings

Configuration option for ATG Search which contains numerous controls for the final search results list.

Result

Single matching statement returned by ATG Search executing the query request. The result contains information about the statement and its index item.

Root

Fundamental element of a word or form, exclusive of all endings and prefixes.

Searchable text content

Text of an index item that is indexed and therefore retrievable during search. Typically this is the body of a document file.

Segmentation

Process of identifying the words and tokens within a statement of natural language text that does not contain white space or other delimiters.

Solution

Special document that is authored in response to end-user problems or questions. The solution contains information about the issue and how to address or answer it. Solutions can be indexed for search by support personnel (assisted service) or by end-users (self-service).

Statement features

Attributes of a statement stored in the index, typically representing the fields and security zones of structured content.

Statement query

Grammatical sentence or sentence fragment that is entered as search text, as opposed to simple keyword terms or questions.

Stem

Remainder of a word form after its endings are removed, in a process called Stemming. ATG Search uses morphological analysis to determine the root of a word, which often differs from its stem. The concept of a root is not straightforward, so ATG Search refers to the root that it uses for indexing as an index term.

Stemming

Process of stripping off the endings of word forms to reduce the number of unique index terms. This process is typically much simpler and more mechanical than a robust morphological analyzer.
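
To show how mechanical such a process can be, here is a deliberately naive suffix stripper in Python. The suffix list and the minimum stem length are arbitrary choices for this sketch; it does not represent any particular published stemming algorithm or the analysis ATG Search performs.

```python
# Assumed, illustrative suffix list; checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word, min_length=3):
    """Strip the first matching suffix, keeping at least min_length characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_length:
            return word[: -len(suffix)]
    return word
```

With these rules, "searching" becomes "search", but "queries" becomes "queri" and "running" becomes "runn", which illustrates why a stem often differs from the root found by morphological analysis.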

Stop words

Low content terms that should be ignored during search.

Structured document

Document that represents structured information, typically from data repositories. ATG Search uses a special subset of XHTML to denote the structured information.

Structured index item

Structured document that has been indexed by AS. The index item represents the document and preserves its structure.

Synonymous terms

Terms that are similar in meaning or usage, generally very strongly related with respect to user queries.

Taxonomy

Hierarchical organization of categories. For AS, the taxonomy is used to classify documents and queries.

Term

Word, phrase, number, or other token treated as a unit of search and retrieval.

Term exclusion

Simple approximation of the Boolean NOT operator, where results that contain a term are excluded (or subtracted).

Term expansion

Process of selecting alternative terms to the original query terms, typically based on some form of thesaurus.
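
A thesaurus-driven expansion can be sketched in Python as follows. The dictionary-based thesaurus and the function name are assumptions for this example only.

```python
def expand_terms(query_terms, thesaurus):
    """Return the query terms followed by their thesaurus alternatives.

    thesaurus: dict of term -> list of related terms (hypothetical layout).
    Order is preserved and duplicates are dropped.
    """
    expanded = []
    for term in query_terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For instance, with a thesaurus entry relating "laptop" to "notebook", the query terms "laptop" and "bag" expand to "laptop", "notebook", and "bag".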

Term frequency inverse document frequency

Family of statistical formulas that measure the strength (or weight) of relationship between terms, statements, documents, or any other textual content. Term frequency is the number of times a term appears in a single unit of content, and document frequency is the number of items that a term appears in. Also called TF-IDF.
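
One common member of this family multiplies the term frequency by the logarithm of the inverse document frequency. The Python sketch below uses that variant; it is not necessarily the formula ATG Search applies, and the function name and token-list representation are assumptions.

```python
import math


def tf_idf(term, item, items):
    """Weight of a term in one item, relative to a collection of items.

    item: list of tokens for a single unit of content.
    items: list of all such token lists (the collection).
    """
    term_frequency = item.count(term)
    document_frequency = sum(1 for other in items if term in other)
    if document_frequency == 0:
        return 0.0  # the term does not occur anywhere in the collection
    return term_frequency * math.log(len(items) / document_frequency)
```

A term that appears often in one item but in few items overall receives a high weight, while a term that appears in every item receives a weight of zero.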

Term normalization

Process of replacing variations of the same term with a single normalized form. The variations are also called equivalent terms.

Term relationship

Link or connection between two terms, generally with some measure of strength. A thesaurus contains entries that represent term relationships.

Term vector

Ordered set of terms representing a statement, sentence, query, or any other unit of text.

Term weight

Content value of a term, typically computed using term frequency and document frequency statistics.

TF-IDF

Acronym for term frequency inverse document frequency.

Thesaurus

Collection of terms and their related terms, generally used for term expansion.

Token

Basic unit of text content, such as a word, number, or punctuation mark.

Tokenization

Process of determining the basic units, or tokens, of text content. Generally, this process uses white space and character types to determine the boundaries between tokens. Some Asian languages require a more complex segmentation process due to the lack of white space.
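
For languages that separate words with white space, a tokenizer based on character types can be sketched with a single regular expression in Python. The pattern and function name are illustrative assumptions, and the sketch does not attempt the segmentation needed for text without white space.

```python
import re

# Runs of letters, runs of digits, or a single punctuation/symbol character.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def tokenize(text):
    """Split text into tokens at white space and character-type boundaries."""
    return TOKEN_PATTERN.findall(text)
```

Under this pattern, the text "ATG Search 10, please!" yields the tokens "ATG", "Search", "10", ",", "please", and "!".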

Topic

Also known as a category. Represents a classification of text content, organized in a taxonomy.

Unstructured document

Document that does not represent structured information, or is not indexed by ATG Search to preserve the structure. Common HTML, PDF, Word, and text documents are unstructured. ATG Search uses a special subset of XHTML to denote the structured information.

Unstructured index item

Unstructured document that has been indexed by AS. The index item represents the document and has no structured text fields.

XML request

ATG Search receives requests to perform actions in the form of XML. The major request types are query, browse, and view content.

 