A dimension is a collection of related dimension values, organized into a tree. Dimension values are tags, or labels, you use to classify the records in your data set.

All dimensions are visible in the Dimensions view, opened from the Project Explorer.

To open the Dimensions view:

You use the Dimension editor to create a new dimension or modify the attributes that affect how an existing dimension is evaluated and displayed.

The top of the Dimension editor contains the following information that identifies the dimension:

The lower half of the Dimension editor contains five tabs. See the tables below for details.

The Cluster Discovery tab contains the following settings:

Option

Description

Enable clustering

Specifies whether Cluster Discovery is enabled for this dimension. The checkbox also makes the following controls available.

For more information, see the Endeca Relationship Discovery Guide.

Sample size

This parameter governs how many documents are sampled. Clustering processing time and memory consumption are both roughly linear with this number; thus, lowering the value results in smaller memory consumption and faster turnaround. However, statistical errors are likely to occur when the sample size is small. Setting this value higher will overcome statistical errors for data sets where fewer terms are tagged onto each document.

Range: Integer, 50-2000 (default: 500)

Recommended value: 500

Maximum clusters

This parameter limits the number of clusters that will be generated by the MDEX Engine.

Range: Integer, 2-10 (default: 10)

Recommended value: 6

Coherence

This parameter governs whether a set of terms is coherent enough to form a cluster (that is, each cluster should have only closely related documents). Low values are permissive (that is, they do not demand much coherence) and result in fewer, larger clusters; high values are strict and result in more, smaller clusters. A value near the middle of the range is recommended.

Range: Integer, 2-10 (default: 10)

Recommended value: 6

Maximum precision

Terms that are extracted from sampled documents are filtered by their precision p (where p = the number of sampled documents that the term is tagged onto, divided by the number of all sampled documents). Terms that have too high a value of p are likely to be the search term (or synonymous with it), or to be too general to make a good clustering term. If you use the recommended tuning values of the term extractor, each term is tagged onto only roughly 1/3 of the documents, which means that the search term, if present, will have p of roughly 0.33 (more or less stringent tuning of the term extractor will change this value). There is usually a gap in the values of p between the search term and the more useful terms, which have values of approximately p = 0.25 or less.

Range: Float, 0.0 - 1.0 (default: 1.0)

Recommended value: 0.25

Maximum cluster size

Sets the maximum number of terms that can be in a cluster. Each cluster will have at least 2 terms. Because of the match-partial cluster selection mechanism, the more terms there are in a cluster, the (potentially) higher its coverage will be. On the other hand, clusters that are too large take up too much space to display and take too long for users to read.

Range: Integer, 2 - 10 (default: 10)

Recommended value: 8

Maximum cluster overlap

If two clusters overlap (that is, if the document sets that each cluster maps to overlap), then the smaller one (as measured by the estimated size of the document set it maps to) can be removed, depending on how big this overlap is. This parameter dictates the overlap above which the smaller cluster is removed.

Clusters that overlap by more than this value are removed. Thus, the default setting of 10 (out of 10) means that clusters that overlap by more than 10 out of 10 documents are removed. Because this is impossible, a setting of 10 disables cluster overlap filtering, which is the coarsest level for this filter. Lowering this parameter makes the overlap filter progressively more fine-grained. A value of 9 removes only clusters that overlap greatly; the recommended value of 5 removes only clusters that overlap by about half or more (remember that the overlap is merely estimated). Values lower than 5 make overlap filtering quite sensitive and remove clusters that overlap even by a small amount. Note that clusters that do not overlap at all are never filtered.

Range: Integer, 0-10 (default: 10)

Recommended value: 5
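
The Maximum precision filter described above can be sketched in a few lines of Python. This is only an illustration of the formula for p under stated assumptions, not the MDEX Engine's implementation: the function names, the sample documents, and the choice to treat the threshold as inclusive are all hypothetical.

```python
from collections import Counter


def term_precisions(sampled_docs):
    """Map each term to p = (sampled docs tagged with the term) / (all sampled docs)."""
    total = len(sampled_docs)
    if total == 0:
        return {}
    # set(terms) makes a term count at most once per document.
    counts = Counter(term for terms in sampled_docs for term in set(terms))
    return {term: count / total for term, count in counts.items()}


def filter_by_max_precision(sampled_docs, max_precision=0.25):
    """Keep terms whose precision is at most max_precision.

    Treating the bound as inclusive is an assumption made for this sketch.
    """
    precisions = term_precisions(sampled_docs)
    return {term: p for term, p in precisions.items() if p <= max_precision}


# Hypothetical sample: 8 documents and the terms tagged onto each of them.
docs = [
    ["wine", "red"], ["wine", "white"], ["wine", "red"], ["wine", "oak"],
    ["cork"], ["oak"], ["glass"], ["barrel"],
]
kept = filter_by_max_precision(docs, max_precision=0.25)
print(sorted(kept))  # "wine" (p = 0.5) is dropped as too general
```

With the recommended value of 0.25, the term tagged onto half of the sample is filtered out, while terms at or below the threshold remain available for clustering.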

Open the Dimension Values view from any dimension in the Dimensions view.

To view dimension values for a particular dimension:

  1. In the Dimensions view, select a dimension and click Values. The Dimension Values view opens.

  2. To expand a node in the dimension hierarchy, click the plus sign (+) next to the node's name. To collapse a node, click the minus sign (-).

The view provides buttons that allow you to add, edit, delete, or adjust the rank of dimension values. In addition, the Load and Promote buttons support two features: editing of auto-generated dimension values and working with externally generated taxonomies. This view also contains the following columns:

Set the dimension value's name, type (Exact, Range, or Sift), properties, bounds, and synonyms, and specify whether the dimension value is inert, collapsible, or both.

Option

Description

Name

A unique name for this dimension value. Dimension value names are case sensitive.

Type

Specifies the dimension value's type. A dimension value's type determines how it matches to property values during mapping.

Inert

Check Inert if the dimension value is non-navigable.

Collapsible

Check Collapsible if the dimension value is a candidate for collapsing.

Bounds

This area is used to describe dimension value bounds for dimension values of type Range or Sift.

Synonyms

Click to add synonyms for this dimension value.

Properties

Click to associate a descriptive name/value pair with this dimension value.

Note

Do not confuse these name/value pairs with source properties or Endeca properties. They are purely for descriptive information about a given dimension value.

Set a dimension value's type and navigability, add synonyms, or associate properties with the dimension value.

To configure a dimension value:

  1. In the Dimension Value editor, type the dimension value's name in the Name text box.

  2. In the Type list, choose a dimension value type: Exact, Range, or Sift. A dimension value's type determines how it matches to property values during mapping.

  3. (Optional) Check Inert if the dimension value is non-navigable.

  4. Check Collapsible if the dimension value is a candidate for collapsing.

  5. If you chose Range or Sift in step 2 above, do the following:

  6. Click Synonyms to add any dimension value synonyms.

  7. (Optional) Click Properties to associate any properties with the dimension value.

  8. Click OK.

Synonyms provide a textual way to refer to a dimension value, rather than by ID alone. A dimension value can have multiple synonyms. All synonyms that you assign to dimension values must be unique.

You specify the way each synonym is used by the MDEX Engine with the Search, Classify, and (Display) options:

Specify either of these values as the lower or upper bound for a dimension value, to indicate a value less than or greater than all other values in the range.

You can use two special values, NEG_INF and POS_INF, when creating the bounds for your dimension values. NEG_INF indicates less than all other values while POS_INF indicates greater than all other values. For example, to specify a range of greater than 100, you would use a lower bound of 100 and an upper bound of POS_INF.

Range dimension value example.

Likewise, less than 100 would use a lower bound of NEG_INF and an upper bound of 100.

Range dimension value example.

You can use POS_INF and NEG_INF with string values as well. For example, setting a lower bound of S, inclusive, and an upper bound of POS_INF, inclusive, would match all strings starting with S and going to the end of the alphabet, including values such as S, Style, Trigger, and Zzzzz.

The order of symbols depends on your locale setting, which is external to the Endeca software. On UNIX, it is determined by a set of environment variables, typically LOCALE, LANG, or LC_ALL. On Windows, there are separate system and user locales which can be set from the Regional and Language Options control panel. For example, in ASCII, using [NEG_INF, A) as the bounds includes all numerics and many symbols (the '[' symbol indicates the value is inclusive while ')' indicates it is not). Using (Z, POS_INF] includes the rest of the symbols, as well as lower-case letters. This is not the case for other encodings, such as Unicode, which intersperses symbols and numbers with letters much more than ASCII. To use NEG_INF and POS_INF effectively, you must have a good understanding of the order of symbols in your locale's encoding.
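
The bound semantics described above can be sketched as follows. This is a minimal illustration, not Endeca code: the `in_range` helper and the sentinel objects are hypothetical, and the comparison uses Python's built-in ordering (code point order for strings), which corresponds to the ASCII example in the text rather than to a locale-specific collation.

```python
# Sentinels standing in for the NEG_INF and POS_INF special values.
NEG_INF = object()
POS_INF = object()


def in_range(value, lower, upper, lower_inclusive=True, upper_inclusive=True):
    """Return True if value lies within the given bounds.

    Strings are compared in code point order, not in the
    locale-dependent order described in the text.
    """
    if lower is not NEG_INF:
        if value < lower or (value == lower and not lower_inclusive):
            return False
    if upper is not POS_INF:
        if value > upper or (value == upper and not upper_inclusive):
            return False
    return True


# Greater than 100: lower bound 100 (exclusive), upper bound POS_INF.
print(in_range(150, 100, POS_INF, lower_inclusive=False))
# Less than 100: lower bound NEG_INF, upper bound 100 (exclusive).
print(in_range(99, NEG_INF, 100, upper_inclusive=False))
# [S, POS_INF] matches strings from S to the end of the alphabet.
names = ["Route", "S", "Style", "Trigger", "Zzzzz"]
print([s for s in names if in_range(s, "S", POS_INF)])
# In code point order, digits sort before "A" and lower-case letters after "Z".
print(in_range("5", NEG_INF, "A", upper_inclusive=False))
print(in_range("apple", "Z", POS_INF, lower_inclusive=False))
```

For a locale-aware comparison in Python, the standard `locale.strxfrm` function can supply sort keys, but the results then depend on the locale configured on the machine, which is the same caveat the text raises.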

Some reasons for issues may include the way a dimension value's bounds are configured, or assignment of a dimension value to an incorrect range.

The following information will help you troubleshoot range dimension value issues:

In an auto-generated dimension, you don't have direct access to the dimension values and, hence, can't rank them manually. Instead, you must set a default rank order.

Default dimension value ranking is used with dimensions that are auto-generated.

To set a default dimension value rank order:

You can prune the list of refinement dimension values returned for a query to those values that occur most frequently in the requested navigation state.

You can limit the number of frequently-occurring (popular) refinements returned, as well as control the order in which they are returned. Note that configuring a dimension so that its dimension values are pruned according to their popularity overrides any manual or default dimension value ranking you may have specified.
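
As a rough sketch of the idea, pruning by popularity amounts to counting how often each dimension value occurs in the current record set and keeping only the most frequent ones. The function name, the record layout, and the tie-breaking rule below are assumptions made for illustration and do not reflect how the MDEX Engine computes or orders refinements internally.

```python
from collections import Counter


def popular_refinements(records, dimension, limit):
    """Return up to `limit` (value, count) pairs for a dimension, most frequent first."""
    counts = Counter()
    for record in records:
        for value in record.get(dimension, []):
            counts[value] += 1
    # Sort by descending count; break ties alphabetically (an arbitrary choice).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical records in the current navigation state.
records = [
    {"Region": ["Napa"]}, {"Region": ["Napa"]}, {"Region": ["Sonoma"]},
    {"Region": ["Napa"]}, {"Region": ["Sonoma"]}, {"Region": ["Bordeaux"]},
    {"Region": ["Rioja"]},
]
print(popular_refinements(records, "Region", 2))
```

Here only the two most frequent Region values are returned, regardless of any manual or default ranking the values might otherwise have.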

To prune dimension value refinements according to their popularity:

Load an auto-generated dimension and then promote its dimension values to make them editable.

Normally, auto-generated dimension values cannot be edited. They are generated by Forge behind the scenes and maintained in state files. With an auto-generated dimension, you can configure the dimension's behavior, but you cannot configure the behavior of individual dimension values within the dimension. Endeca's load and promote functionality, however, allows you to load an auto-generated dimension and then promote its dimension values so that they become editable.

Configuring dimensions that are composed of many dimension values as hidden improves Presentation API and MDEX Engine performance, to the extent that navigation query results no longer have to include these large dimensions. This reduces both the processing cycles and the amount of data the MDEX Engine must return.

You prevent a dimension from appearing in the navigation controls by designating it a hidden dimension. Hidden dimensions, like regular dimensions, are composed of dimension values that allow the user to refine a set of records. The difference between regular dimensions and hidden dimensions is that regular dimensions are returned for both navigation and record queries, while hidden dimensions are only returned for record queries. This means that hidden dimensions cannot be displayed as part of your navigation controls, but can be displayed as part of a record page (assuming the hidden dimension is configured to render on the record page).

Also, although hidden dimensions are not rendered in the navigation UI, records are still indexed with relevant values from these dimensions. Therefore, an end-user can search for records based on values within hidden dimensions.
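
The difference between the two query types can be summarized in a small sketch. The function and the dimension names are hypothetical, and the sketch ignores the separate setting that controls whether a dimension is rendered on the record page.

```python
def returned_dimensions(query_type, hidden_flags):
    """Return the dimension names that come back for a query type.

    hidden_flags maps a dimension name to True if the dimension is hidden.
    """
    if query_type == "record":
        # Record queries return regular and hidden dimensions alike.
        return sorted(hidden_flags)
    if query_type == "navigation":
        # Navigation queries omit hidden dimensions.
        return sorted(name for name, hidden in hidden_flags.items() if not hidden)
    raise ValueError(f"unknown query type: {query_type!r}")


# Hypothetical project with one hidden dimension.
dims = {"Wine Type": False, "Region": False, "Author": True}
print(returned_dimensions("navigation", dims))
print(returned_dimensions("record", dims))
```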

To configure a hidden dimension:


Fully implementing this feature requires additional work outside of Developer Studio. Please refer to the Endeca Basic Development Guide for details.

Relevance ranking is used to control the order of results that are returned in response to a keyword search. Record sorting is used to control the order of records that are returned in response to any type of MDEX Engine query that returns records.

Relevance ranking and record sorting are closely related features but there are some distinct differences.

Generally, if you have relevance ranking enabled, you would not specify a record sort key within a record search query because record sort keys take priority over all other types of ordering, making the relevance ranking settings useless.

Specify an explicit sort key in the MDEX Engine query, set a default sort order, or use relevance ranking (for records returned in response to record search queries).

There are three ways of controlling the order in which records are returned:

The priority of record sorting/relevance ranking is as follows:

A search interface is a named collection of properties and dimensions, each of which has its Enable Record Search option checked. Search interfaces allow your end-users to search on multiple properties and/or dimensions simultaneously. The search interface's name is used just like a normal property or dimension when performing record searches. A record search query on a search interface returns results that match any of the properties or dimensions in the interface.
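
The "match any member" behavior can be illustrated with a short sketch. It assumes that every member of the interface is already enabled for record search, and it uses case-insensitive substring matching as a stand-in for the MDEX Engine's actual text matching; the function name and the sample records are hypothetical.

```python
def record_search(records, search_interface, term):
    """Return records in which any member of the interface contains the term."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(needle in str(record.get(member, "")).lower() for member in search_interface)
    ]


# Hypothetical records; Region is deliberately not part of the interface.
records = [
    {"Name": "My Red Wine", "Description": "Soft tannins", "Region": "Napa"},
    {"Name": "Table White", "Description": "Pairs with red meat", "Region": "Sonoma"},
    {"Name": "Brut", "Description": "Dry and sparkling", "Region": "Red Hills"},
]
all_text = ["Name", "Description"]  # the search interface's members
matches = record_search(records, all_text, "red")
print([record["Name"] for record in matches])
```

The third record is not returned even though its Region contains the term, because Region is not a member of the search interface.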

A range filter allows an Endeca-enabled Web application to select a subset of the total dataset for display, based on an arbitrary, dynamic range that uses an Endeca property or dimension as the filter key.

Navigation queries that use a range filter return only those records that are included in the selected data subset, along with the refinement dimension values that are appropriate for the filtered records. Range filters are supported for:

For values of properties and dimensions of type Floating point, you can specify values using either decimal (0.00...68) or scientific notation (6.8e-10).
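
A range filter's effect on a navigation query can be sketched as a two-step operation: select the records whose key falls inside the range, then derive refinements from that subset only. The helper names, the sample records, and the inclusive bounds are assumptions for illustration; this is not the Presentation API.

```python
def apply_range_filter(records, key, lower, upper):
    """Keep records whose value for `key` lies in [lower, upper] (inclusive by assumption)."""
    return [r for r in records if key in r and lower <= r[key] <= upper]


def refinement_values(records, dimension):
    """Collect the distinct dimension values present on the given records."""
    return sorted({r[dimension] for r in records if dimension in r})


# Hypothetical records with a floating point Price property.
records = [
    {"Name": "A", "Price": 8.0, "Type": "White"},
    {"Name": "B", "Price": 12.5, "Type": "Red"},
    {"Name": "C", "Price": 18.0, "Type": "Red"},
    {"Name": "D", "Price": 25.0, "Type": "Sparkling"},
]
filtered = apply_range_filter(records, "Price", 10, 20)
print([r["Name"] for r in filtered])
print(refinement_values(filtered, "Type"))

# Decimal and scientific notation denote the same floating point value.
print(float("6.8e-10") == float("0.00000000068"))
```

Only the refinements that apply to the filtered records are offered, which mirrors the statement that range-filtered queries return the refinement dimension values appropriate for the filtered subset.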

It is important to remember that range filters are simply modifiers for a navigation query. The range filter acts in the same manner as a dimension value, even though it is not a specific system-defined dimension value. Consider the following records and examples:



Aggregated records allow you to treat a collection of separate records as one when those records share the same value for the rollup key.

An aggregated record is a collection of individual Endeca records that have been rolled up based on a rollup key (an Endeca property or dimension name). All records in the current record set that have the same value for the rollup key are collected together into an aggregated record. For example, rolling up on a Name key causes all wines in the current record set that have the value 'My Red Wine' for the Name key to be rolled up into one aggregated record.

Commonly, aggregated records are used to eliminate duplicate display entries. For example, in a music store catalog, an album with the same title may exist in several formats, with multiple prices. Each format is represented in the MDEX Engine as a distinct record. However, from a business perspective, it might be useful to treat these separate records as a single record by creating an aggregated record.

Record aggregation affects the current record set only. In other words, if you have 10,000 Endeca records total but only 3,000 are displayed in the current record set, then the aggregation affects those 3,000 records only.

The aggregated records feature requires that each record have at most one value from the dimension or Endeca property that has been specified as the rollup key. Also, if an Endeca record has a unique value for the rollup key, it is 'rolled up' into an aggregated record that contains only one sub-record.
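
Conceptually, rolling up is a group-by over the current record set. The sketch below assumes that every record carries exactly one value for the rollup key; the function name and the sample records are hypothetical.

```python
def roll_up(records, rollup_key):
    """Group the current record set into aggregated records keyed by the rollup value."""
    aggregated = {}
    for record in records:
        # Assumes each record has exactly one value for the rollup key.
        aggregated.setdefault(record[rollup_key], []).append(record)
    return aggregated


# Hypothetical current record set.
current_records = [
    {"Name": "My Red Wine", "Format": "750 ml", "Price": 20},
    {"Name": "My Red Wine", "Format": "1.5 l", "Price": 38},
    {"Name": "Table White", "Format": "750 ml", "Price": 12},
]
aggregated = roll_up(current_records, "Name")
for name, sub_records in aggregated.items():
    print(name, len(sub_records))
```

The record with a unique Name still becomes an aggregated record, with a single sub-record, as described above.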

Collapsible dimension values are non-root, non-leaf dimension values in a hierarchy that the MDEX Engine can remove from the displayed refinements.

A collapsible hierarchy is an ordinary hierarchy, in which some or all of the internal (non-root and non-leaf) dimension values are flagged as potentially collapsible. The MDEX Engine automatically removes, or collapses, these dimension values when there are only a few leaves available for refinement, creating a more streamlined, user-friendly navigation experience for your users.

For example, a dimension containing many state names could have a collapsible hierarchy introduced to group the names alphabetically. At query time, the available refinements are determined by the dimension values tagged to the records in the current navigation state. If there are many refinement values, it is easier for a user to select first a letter range, then a letter, and then the state name they want. But if there are only a few values, it is easier for the user to look at a brief list and select the state name directly. In this case, the letter-based dimension values can be collapsed, or removed, so that only the list of state names is displayed.

Dimension values that are configured as collapsible have the potential to be collapsed. Whether or not a dimension value is actually collapsed is controlled by the collapsible dimension threshold.
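
The interaction between the collapsible flag and the threshold can be sketched as follows. The exact rule the MDEX Engine applies is not given here, so this sketch assumes that collapsible internal values are removed when the number of available leaves is less than or equal to the threshold; the `DimVal` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DimVal:
    name: str
    collapsible: bool = False
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaf dimension values below (or at) a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def refinements(root, threshold):
    """Names offered as refinements under root.

    Assumption: collapsible internal values are removed when the number
    of leaves is less than or equal to the threshold.
    """
    collapse = len(leaves(root)) <= threshold
    shown = []

    def visit(node):
        for child in node.children:
            if child.children and child.collapsible and collapse:
                visit(child)  # skip the internal value and promote its children
            else:
                shown.append(child.name)

    visit(root)
    return shown


# Hypothetical State dimension grouped into collapsible letter ranges.
states = DimVal("State", children=[
    DimVal("A-M", collapsible=True, children=[DimVal("Alabama"), DimVal("Maine")]),
    DimVal("N-Z", collapsible=True, children=[DimVal("Nevada"), DimVal("Texas"), DimVal("Utah")]),
])
print(refinements(states, threshold=10))  # few leaves: state names shown directly
print(refinements(states, threshold=3))   # many leaves: letter ranges shown
```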

Auto sifting is an extension to auto-generation that positions newly generated dimension values within an existing hierarchy.

The hierarchy must be a sift hierarchy, which is a normal hierarchy that contains dimension values of type Sift. As with auto-generation, the dimension mapping in the property mapper must be configured for auto-generation.

Defining a sift dimension is a two-step process. First, you create the sift dimension hierarchy, then you specify that the dimension should use auto-generation during its mapping process.
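
The positioning step can be pictured as matching each newly generated value against the bounds of the sift dimension values. The sketch below is a simplification under stated assumptions: it uses a flat, one-level sift hierarchy, lower bounds are inclusive and upper bounds exclusive, `None` stands for an unbounded end, and strings are compared in code point order. The function name and the sample data are hypothetical.

```python
def sift(sift_values, new_values):
    """Place each new value under the first sift value whose bounds contain it.

    sift_values is a list of (name, lower, upper) tuples; lower is inclusive,
    upper is exclusive, and None means the bound is open.
    """
    placed = {name: [] for name, _, _ in sift_values}
    unplaced = []
    for value in new_values:
        for name, lower, upper in sift_values:
            above_lower = lower is None or value >= lower
            below_upper = upper is None or value < upper
            if above_lower and below_upper:
                placed[name].append(value)
                break
        else:
            # No sift value matched this new value.
            unplaced.append(value)
    return placed, unplaced


# Hypothetical sift hierarchy that groups state names alphabetically.
hierarchy = [("A-M", "A", "N"), ("N-Z", "N", None)]
placed, unplaced = sift(hierarchy, ["Texas", "Maine", "Ohio"])
print(placed)
print(unplaced)
```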

The procedure below shows you how to create a sift dimension for a hierarchy.

To create a sift dimension:

Prior to MDEX Engine version 6.1.0, each project had to have a single primary dimension, which functioned as the root for all other secondary dimensions in the project. You needed to set up precedence rules between the primary dimension and each secondary dimension to ensure that the secondary dimensions would appear in the navigation controls.

If a primary dimension was not explicitly defined in Developer Studio, Dgidx created one for you. Dgidx also created the required precedence rules between the primary dimension and the other dimensions in your project, which were, by default, considered secondary. For most projects, this was sufficient and developers did not have to worry about creating a primary dimension.

Partial updates, however, required an explicit primary dimension that had explicit precedence rules to all secondary dimensions. Starting with MDEX Engine version 6.1.0, you no longer need to specify a primary dimension. If one is specified, it is ignored by the MDEX Engine. Note that partial updates no longer require that a primary dimension be specified explicitly.

The procedure below describes how to set a primary dimension. See "About precedence rules" and "Creating, modifying, and deleting precedence rules" for information on creating precedence rules.

To set the primary dimension:

Be sure that source data reflects dimension values, that match modes are correctly set, and that source properties are mapped to a dimension or an Endeca property.

If a dimension value does not appear as expected in the client browser, verify the following:

Your source property is mapped correctly to a dimension in your pipeline's property mapper. See "Establishing a dimension mapping" for details.

The dimension is either configured for auto-generation, or the source property values on the records match the dimension values that you explicitly defined for the dimension in the Dimensions editor.

The Show with Record List and Show with Record options are set correctly in the Dimension editor.

