Dimension editor

You use the Dimension editor to create a new dimension or modify the attributes that affect how an existing dimension is evaluated and displayed.

The top of the Dimension editor contains the following information that identifies the dimension:

Option Description
Name The name of this dimension. Dimension names are case sensitive.
ID A unique system-generated identifier.
Member of this dimension group Allows you to select from existing dimension groups or add a new one.
Refinements sort order Specifies the sort type for any refinement dimension values that are returned for this dimension: Alpha, Integer, or Floating point.
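
The difference between the three sort types is easiest to see on values that look numeric. The following is a minimal sketch in Python, not MDEX Engine code; the function name `sort_refinements` and the sort-type strings are assumptions made for illustration only.

```python
def sort_refinements(values, sort_type):
    """Order refinement values by an illustrative sort type.

    sort_type is "alpha", "integer", or "float". These names are
    hypothetical stand-ins for Alpha, Integer, and Floating point.
    """
    if sort_type == "alpha":
        return sorted(values)             # plain string comparison
    if sort_type == "integer":
        return sorted(values, key=int)    # compare as whole numbers
    if sort_type == "float":
        return sorted(values, key=float)  # compare as decimal numbers
    raise ValueError(f"unknown sort type: {sort_type}")


whole = ["10", "9", "100", "25"]
alpha_order = sort_refinements(whole, "alpha")      # "10" sorts before "9"
integer_order = sort_refinements(whole, "integer")  # numeric order
float_order = sort_refinements(["1.5", "10", "9.25"], "float")
```

An alphabetical sort places "100" before "25" because strings are compared character by character, which is why a numeric sort type is usually the better fit for dimensions such as price or year.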

The lower half of the Dimension editor contains five tabs. See the tables below for details.

General

The General tab contains the following settings:

Option Description
Prepare sort offline When checked, record sorting on this dimension is optimized.
Hidden Specifies whether or not this dimension is shown in the navigation controls.
Show with record list When checked, enables this dimension to appear in the record list display. Any records that are tagged with a value from this dimension will have the value shown as part of their entry in the record list.
Show with record When checked, allows this dimension to appear on the record page. Any records that are tagged with a value from this dimension will have that value shown as part of their entry on the record page.
Language Specifies the language for this dimension so that the MDEX Engine can perform language-specific operations correctly. If your application tends to have mixed-language records, and the languages are segregated into different dimensions, setting a per-dimension language ID might be appropriate. For more information about language settings, see the Endeca Advanced Development Guide.

Search

The Search tab contains the following settings:

Option Description
Search hierarchy for dimension search When checked, allows dimension search to consider ancestor dimension values when matching a dimension search query.
Enable record search Specifies whether or not record search should be enabled for this dimension. Record search finds all records in an Endeca application that have a dimension whose value matches a term the user provides. Checking Enable record search makes the following additional options available.
Search hierarchy for record search When checked, allows record search to consider ancestor dimension values when matching a record search query. This setting is only enabled when Enable record search is checked.
Enable wildcard search When checked, indicates that a user query can contain a wildcard character (*) to match against fragments of words in a dimension value. You must enable each dimension that you want available for wildcard searching.
Important: The Dimension Search Configuration editor does not specify the same options as the Search tab of the Dimension editor. You use the Dimension Search Configuration editor to configure dimension search options for all dimensions in your project. The Search tab of the Dimension editor affects record search.
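
To illustrate what matching a wildcard query against fragments of words means, here is a small Python sketch. It only mimics the described behavior with the standard `fnmatch` module; it is not how the MDEX Engine implements wildcard search. The function name `wildcard_matches` and the case-insensitive comparison are assumptions.

```python
from fnmatch import fnmatchcase


def wildcard_matches(query, dimension_values):
    """Return the dimension values in which at least one word matches query.

    Assumption: matching is case-insensitive and done word by word.
    Note that fnmatch also treats "?" and "[...]" as special characters,
    which goes beyond the "*" wildcard described in the text.
    """
    pattern = query.lower()
    return [
        value
        for value in dimension_values
        if any(fnmatchcase(word, pattern) for word in value.lower().split())
    ]


values = ["Cabernet Sauvignon", "Merlot", "Cabernet Franc"]
prefix_hits = wildcard_matches("caber*", values)
suffix_hits = wildcard_matches("*lot", values)
```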

Advanced

The Advanced tab contains the following settings:

Option Description
Primary Specifies whether this dimension is the project's sole primary dimension. (All other dimensions are secondary.) Starting with version 6.1.0 of the MDEX Engine, a primary dimension is no longer required and this setting is ignored: the MDEX Engine treats all dimensions as secondary, no matter what you specify in this field.
Multiselect Allows the end user to select more than one dimension value from a dimension.
Enable for rollup Enables aggregated Endeca record creation by allowing rollups based on this dimension.
Compute refinement statistics Enables the computation of refinement statistics.
Collapsible dimension threshold Allows you to set your application to collapse a deep hierarchy into a shallower one when little data is available.

Dynamic Ranking

The Dynamic Ranking tab contains the following settings:

Option Description
Enable dynamic ranking When checked, indicates that the list of refinement dimension values returned for a query should be pruned to those values that occur most frequently in the requested navigation state; that is, the refinement dimension values that are most popular.
Maximum dimension values to return Sets the number of most popular dimension values to return.
Sort dimension values Establishes the sort method used for the most popular dimension values:
  • "Alphabetically" uses whatever order you've selected for the "Refinements sort order" setting at the top of the Dimension editor.
  • "Dynamically" orders the most popular refinement values according to their frequency of appearance within a data set. Dimension values that occur more frequently are returned before those that occur less frequently.
Generate "More..." dimension value When this option is checked and the actual number of refinement options exceeds the number set in "Maximum dimension values to return," an additional option called More is returned for that dimension. If the user selects the More option, the MDEX Engine returns all of the refinement options for that dimension. If Generate "More..." dimension value is not checked, only the number of dimension values defined in "Maximum dimension values to return" is displayed.
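
The interaction of these settings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MDEX Engine's implementation: the function `rank_refinements`, its parameter names, and the input dictionary of record counts are all hypothetical, and the "alpha" mode simply uses a string sort as a stand-in for the configured refinements sort order.

```python
def rank_refinements(counts, max_values, sort_mode="dynamic", generate_more=True):
    """Prune refinement values to the most frequent ones.

    counts:        dict of refinement value -> record count in the current
                   navigation state (hypothetical input).
    max_values:    stand-in for "Maximum dimension values to return".
    sort_mode:     "dynamic" (by frequency) or "alpha" (string sort here).
    generate_more: stand-in for the Generate "More..." checkbox.
    Returns (values_to_show, has_more_option).
    """
    # Most frequent first; ties are broken by name so the result is stable.
    by_frequency = sorted(counts, key=lambda v: (-counts[v], v))
    top = by_frequency[:max_values]
    if sort_mode == "alpha":
        top = sorted(top)
    # A "More" option is only offered when values were actually pruned.
    has_more = generate_more and len(counts) > max_values
    return top, has_more


counts = {"Red": 40, "White": 25, "Sparkling": 5, "Rose": 12}
dynamic_top, dynamic_more = rank_refinements(counts, 3, "dynamic")
alpha_top, alpha_more = rank_refinements(counts, 3, "alpha")
all_top, all_more = rank_refinements(counts, 4, "dynamic")
```

In both modes the same three most popular values survive the pruning step; the sort setting only changes the order in which they are presented.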

Cluster Discovery

The Cluster Discovery tab contains the following settings:

Option Description
Enable clustering

Specifies whether Cluster Discovery is enabled for this dimension. The checkbox also makes the following controls available.

For more information, see the Endeca Relationship Discovery Guide.

Sample size

This parameter governs how many documents are sampled. Clustering processing time and memory consumption are both roughly linear with this number; thus, lowering the value results in smaller memory consumption and faster turnaround. However, statistical errors are likely to occur when the sample size is small. Setting this value higher will overcome statistical errors for data sets where fewer terms are tagged onto each document.

Range: Integer, 50-2000 (default: 500)

Recommended value: 500

Maximum clusters

This parameter limits the number of clusters that will be generated by the MDEX Engine.

Range: Integer, 2-10 (default: 10)

Recommended value: 6

Coherence

This parameter governs the decision of whether a set of terms is coherent enough to form a cluster (that is, each cluster should have only closely related documents). Low values are permissive (that is, they do not demand much coherence) and result in fewer, larger clusters; high values are strict and result in more, smaller clusters. The average value is recommended.

Range: Integer, 2-10 (default: 10)

Recommended value: 6

Maximum precision

Terms that are extracted from sampled documents are filtered by their precision p (where p = number of sampled documents that this term is tagged onto divided by the number of all sampled documents). Terms that have too high a value of p are likely to be the search term (or synonymous with it) or too general to make a good clustering term. If you use the recommended tuning values of the term extractor, each term is tagged onto only roughly 1/3 of the documents, which means that the search term, if present, will have p of roughly 0.33 (more or less stringent tuning of the term extractor will change this value). There is usually a gap in the values of p between the search term and the more useful terms, which have p of approximately 0.25 or less.

Range: Float, 0.0 - 1.0 (default: 1.0)

Recommended value: 0.25
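
The precision formula above can be worked through on a tiny sample. The following Python sketch is illustrative only: the function `filter_by_precision` and its input format are hypothetical, and treating the maximum as inclusive (terms with p equal to the limit are kept) is an assumption.

```python
def filter_by_precision(doc_terms, max_precision):
    """Compute precision p for each term and keep terms with p <= max_precision.

    doc_terms: list of sets, one per sampled document, holding the terms
               tagged onto that document (hypothetical input format).
    Returns (precision_by_term, kept_terms).
    """
    total = len(doc_terms)
    counts = {}
    for terms in doc_terms:
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    # p = documents tagged with the term / all sampled documents
    precision = {term: count / total for term, count in counts.items()}
    kept = {term for term, p in precision.items() if p <= max_precision}
    return precision, kept


sample = [
    {"wine", "merlot"},
    {"wine", "oak"},
    {"wine"},
    {"wine", "tannin"},
]
precision, kept = filter_by_precision(sample, 0.25)
_, kept_default = filter_by_precision(sample, 1.0)
```

Here "wine" appears on every sampled document (p = 1.0), so it behaves like the search term and is filtered out at 0.25, while the default of 1.0 filters nothing.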

Maximum cluster size

Sets the maximum number of terms that can be in a cluster. Each cluster will have at least 2 terms. Because of the match-partial cluster selection mechanism, the more terms there are in a cluster, the (potentially) higher its coverage will be. On the other hand, clusters that are too large take up too much space to display and take too long for users to read.

Range: Integer, 2 - 10 (default: 10)

Recommended value: 8

Maximum cluster overlap

If two clusters overlap (that is, if the document sets that each cluster maps to overlap), then the smaller one (as measured by the estimated size of the document set it maps to) can be removed, depending on how big this overlap is. This parameter dictates the overlap above which the smaller cluster is removed.

The smaller of two clusters that overlap by more than this value is removed. Thus, the default setting of 10 (out of 10) means that the smaller cluster is removed only if the clusters overlap by more than 10 out of 10 documents. Because this is impossible, a setting of 10 disables cluster overlap filtering, which is the most extreme level of coarseness for this filter. Tuning this parameter down makes the overlap filtering more and more fine-grained. A value of 9 removes only clusters that greatly overlap; the recommended value of 5 removes clusters that overlap by roughly half or more (remember that the overlap is merely estimated). Lower values (less than 5) make overlap filtering quite sensitive and remove clusters that overlap even by a small amount. Note that clusters that do not overlap at all are never filtered.

Range: Integer, 0-10 (default: 10)

Recommended value: 5
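
The overlap rule can be sketched as follows. This is a simplified illustration, not the MDEX Engine's algorithm: it uses exact document sets rather than estimated sizes, and it assumes that overlap is measured as the share of the smaller cluster's documents that also belong to the larger cluster, expressed in tenths. The function name `filter_overlapping` and its input format are hypothetical.

```python
def filter_overlapping(clusters, max_overlap):
    """Drop the smaller cluster of any pair that overlaps too much.

    clusters:    dict of cluster name -> set of document ids (hypothetical).
    max_overlap: integer 0-10, stand-in for "Maximum cluster overlap".
    Returns the names of the clusters that are kept, largest first.
    """
    # Visit larger clusters first so the smaller one of a pair is the one dropped.
    ordered = sorted(clusters, key=lambda name: (-len(clusters[name]), name))
    kept = []
    for name in ordered:
        docs = clusters[name]
        removed = False
        for other in kept:
            shared = len(docs & clusters[other])
            if shared == 0:
                continue  # clusters that do not overlap are never filtered
            # Assumed measure: shared documents out of the smaller cluster,
            # compared in tenths. At max_overlap = 10 this can never be true.
            if 10 * shared > max_overlap * len(docs):
                removed = True
                break
        if not removed:
            kept.append(name)
    return kept


clusters = {
    "a": set(range(1, 11)),   # documents 1-10
    "b": set(range(6, 14)),   # documents 6-13, shares 5 of its 8 with "a"
    "c": {20, 21},            # no overlap with the others
}
kept_at_5 = filter_overlapping(clusters, 5)
kept_at_10 = filter_overlapping(clusters, 10)
kept_at_0 = filter_overlapping(clusters, 0)
```

With the recommended value of 5, cluster "b" is removed because more than half of its documents are already covered by "a"; with the default of 10, no cluster is removed.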