This section discusses tuning features related to dimensions and dimension values.
Hidden dimensions
You prevent a dimension from appearing in the navigation controls by designating it as a hidden dimension.
Hidden dimensions, like regular dimensions, are composed of dimension values that allow the user to refine a set of records. The difference between regular dimensions and hidden dimensions is that regular dimensions are returned for both navigation and record queries, while hidden dimensions are only returned for record queries and dimension search.
In cases where certain dimensions in an application are composed of many values, marking such dimensions as hidden improves Dgraph performance to the extent that queries on large dimensions are limited, reducing the processing cycles and amount of data the Dgraph must return.
Consider a case where records have dimensions that have almost—but not quite—full coverage over the records. For example, 99% of the records have a dimension value for a Location dimension, but the remaining 1% do not.
While this factor does not affect performance significantly, you can add an “n/a” dimension value to fill the gap and make the dimension have 100% coverage, if you want to let users explicitly refine to records that do not have an assignment for that dimension.
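The coverage idea above can be illustrated with a short Python sketch. It assumes a hypothetical in-memory representation in which each record is a dictionary mapping dimension names to lists of assigned dimension values. The helper names are invented for this illustration and are not part of any Endeca tool or API.

```python
from typing import Dict, List

# Hypothetical record shape: dimension name -> list of assigned dimension values.
Record = Dict[str, List[str]]


def coverage(records: List[Record], dimension: str) -> float:
    """Fraction of records that have at least one value for the dimension."""
    if not records:
        return 0.0
    tagged = sum(1 for record in records if record.get(dimension))
    return tagged / len(records)


def fill_missing(records: List[Record], dimension: str,
                 placeholder: str = "n/a") -> List[Record]:
    """Return copies of the records in which every record without a value
    for the dimension is assigned the placeholder value."""
    filled = []
    for record in records:
        copy = dict(record)
        if not copy.get(dimension):
            copy[dimension] = [placeholder]
        filled.append(copy)
    return filled


# 99 records with a Location value and 1 record without one.
records = [{"Location": ["Boston"]} for _ in range(99)] + [{}]
print(coverage(records, "Location"))
print(coverage(fill_missing(records, "Location"), "Location"))
```

In a real pipeline the placeholder value would be assigned during data processing rather than in application code. The sketch only shows the effect on coverage.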
In general, avoid using large, flat dimensions (that is, dimensions with thousands of dimension values at the same level of hierarchy).
This is doubly true if statistics are enabled for those dimensions. It is better to design dimensions that contain sensible levels of hierarchy.
For some applications with extremely large, non-hierarchical dimensions, larger values for --esampmin can meaningfully improve dynamic refinement ranking quality with minor performance cost.
When deciding whether to configure a dimension as multiselect, keep in mind that users may take longer to refine the list of results, because they can continue to refine a multiselect dimension until all leaf dimension values have been selected.
In particular, refinements for dimensions tagged as multiselect OR are expensive.
A dimension is considered to be multi-assign if there exists a record which has more than one dimension value assigned to it from that dimension.
Making a dimension multi-assign can slow down refinement computation. To improve performance, you can use multi-assign only for those dimensions for which you need it, and avoid making dimensions multi-assign where it is not useful.
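The multi-assign definition above can be checked mechanically when you audit source data. The following Python sketch is an illustration only. It assumes each record is a hypothetical dictionary mapping dimension names to lists of assigned dimension values, and the function name is invented for this example.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical record shape: dimension name -> list of assigned dimension values.
Record = Dict[str, List[str]]


def multi_assign_dimensions(records: Iterable[Record]) -> Set[str]:
    """Return the dimensions for which at least one record carries more
    than one distinct dimension value."""
    found: Set[str] = set()
    for record in records:
        for dimension, values in record.items():
            if len(set(values)) > 1:
                found.add(dimension)
    return found


# Made-up sample data: only Flavors has a record with two values.
sample_records = [
    {"Wine Type": ["Red"], "Flavors": ["Cherry", "Oak"]},
    {"Wine Type": ["White"], "Flavors": ["Citrus"]},
]
print(sorted(multi_assign_dimensions(sample_records)))
```

A dimension that never appears in the result of such a check is a candidate for remaining single-assign.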
Run-time performance of the MDEX Engine is sometimes directly related to the number of refinement dimension values being computed for display. If any refinement dimension values are being computed by the MDEX Engine but not being displayed by the application, use the Ne parameter more strictly.
The worst-case scenario for run-time performance is having a data set with a large number of dimensions, each dimension containing a large number of refinement dimension values, and setting the ENEQuery.setNavAllRefinements() method (Java), or ENEQuery.NavAllRefinements() property (.NET) to true. This combination is slow to compute and creates a page with an overwhelming number of refinement choices for the user. Oracle does not recommend using this strategy.
In general, you may want to reconsider the number of refinements you display, as well as consider implementing precedence rules.
You should only enable a dimension for dynamic statistics if you intend to use the statistics in your Guided Search-enabled Web application. Because the Dgraph performs additional computation for the statistics, there is a performance cost to enabling statistics that your application does not use.
Using dynamic refinement ranking can greatly speed up refinement computation by displaying only the top refinements for a dimension, rather than computing the exhaustive list of refinements.
To decide whether or not dynamic refinement count statistics are likely to be appropriate for a project, consider the following aspects of your configuration:
The number of dimension value refinements per page, especially dimension values assigned to large numbers of records. The more refinements are returned on each page, the more counts need to be computed, and the bigger the performance impact.
For example, if the data set has a large number of dimensions, and/or the application uses ENEQuery.setNavAllRefinements(true), then the performance impact will be larger. This is especially true if many of the dimension values are assigned to large numbers of records. This frequently happens with hierarchical dimensions. For example, it is more expensive to count Red Wines than it is to count Merlots.
The number of records in the data set. Data sets with large numbers of records will see a proportionally higher performance impact from record count statistics.
The average number of results per query. Applications that tend to perform searches that match larger numbers of records will see proportionally higher impact from refinement count statistics.
As a simple rule, add up the counts for all of the refinements on the page. The performance impact of record count statistics grows proportionally with that sum over all refinements. All of the above considerations are aspects of the application that can make that sum larger, and increase your performance slowdown related to record counts.
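The simple rule above can be expressed as a small Python sketch. It is only a relative cost proxy, not a measurement of the MDEX Engine. The function name and the record counts are made up for illustration.

```python
from typing import Dict


def refinement_count_cost(refinement_counts: Dict[str, int]) -> int:
    """Sum the record counts of all refinements shown on one page.

    Per the rule of thumb, the impact of record count statistics grows
    in proportion to this sum.
    """
    return sum(refinement_counts.values())


# Hypothetical pages: broad parent values versus narrower leaf values.
broad_page = {"Red Wines": 12000, "White Wines": 9000}
narrow_page = {"Merlot": 800, "Pinot Noir": 650}

print(refinement_count_cost(broad_page))
print(refinement_count_cost(narrow_page))
```

Comparing the sums for representative pages gives a rough sense of which pages are likely to pay the most for count statistics.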
You can speed up computation of dynamic statistics for refinements by doing the following:
Set the following options in the STATS subelement in the refinement_config.xml file:
RECORD_COUNT_DISABLE_THRESHOLD specifies the maximum number of records in a result set above which the MDEX Engine does not compute or return any dynamic statistics for that query. This speeds up processing if you do not need the counts in this case.
MAX_RECORDS_COUNT causes the MDEX Engine to stop computing dynamic statistics for a particular dimension value when it has reached the specified value. The count returned in this case is the minimum of the actual count and MAX_RECORDS_COUNT. Thus, you can set this parameter to a specific value if you do not need to know the count for a particular dimension value once it is sufficiently high.
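The effect of the two options can be modeled with a few lines of Python. This is a sketch of the behavior described above, not MDEX Engine code. The function and parameter names are hypothetical, and treating a result set exactly equal to the threshold as still eligible for counts is an assumption.

```python
from typing import Optional


def reported_count(actual_count: int,
                   result_set_size: int,
                   record_count_disable_threshold: int,
                   max_records_count: int) -> Optional[int]:
    """Model the count reported for one dimension value.

    Returns None when the result set exceeds the disable threshold (no
    statistics are computed for the query). Otherwise returns the actual
    count capped at max_records_count.
    """
    if result_set_size > record_count_disable_threshold:
        return None  # statistics are skipped for the whole query
    return min(actual_count, max_records_count)


# Small result set: the count is capped at the maximum.
print(reported_count(5000, result_set_size=20000,
                     record_count_disable_threshold=100000,
                     max_records_count=1000))

# Very large result set: no count is reported.
print(reported_count(5000, result_set_size=250000,
                     record_count_disable_threshold=100000,
                     max_records_count=1000))
```

An application that displays capped counts might render a value equal to the cap as "1000+" rather than as an exact number.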
Dynamic statistics on regular and aggregated records are expensive computations for the MDEX Engine.
You should only enable a dimension for dynamic statistics if you intend to use the statistics in your Guided Search-enabled Web application.
Similarly, you should only use the --stat-abins flag with the Dgraph to calculate aggregated record counts if you intend to use the statistics in your Guided Search-enabled Web application. Because the Dgraph does additional computation for additional statistics, there is a performance cost for those that you are not using.
In applications where record counts or aggregated record counts are not used, these lookups are unnecessary. The MDEX Engine takes more time to return navigation objects for which the number of dimension values per record is high.
The --stat-abins flag for the Dgraph lets you calculate aggregated record counts beneath a given refinement. For more information on using this flag, see the MDEX Engine Developer's Guide.
You can use --esampmin with the Dgraph to specify the minimum number of records to sample during refinement computation. The default is 0.
For most applications, larger values reduce performance without improving dynamic refinement ranking quality. For some applications with extremely large, non-hierarchical dimensions (if they cannot be avoided), larger values for --esampmin can meaningfully improve dynamic refinement ranking quality with minor performance cost.
Performance impact from displaying disabled refinements falls into three categories. They are discussed in the order of importance.
The cost of computation involved in determining the base and default navigation states.
The base and default navigation states are computed based on the top-level filters that may belong to these states. These filters are text searches, range filters, EQL filters, record filters, and selections from dimensions. The types and numbers of these top-level filters in the base and default navigation states affect the MDEX Engine processing involved in computing the default navigation state. The more filters exist in the current navigation state, the more expensive the task is; some filters, such as EQL, are more expensive to take into account than others.
The trade off between using dynamic refinement ranking and disabled refinements.
In general, these two features pursue opposite goals in the user interface. Dynamic ranking lets you intelligently return less information to users, based on the most popular dimension values. Disabled refinements let you return more information to users, based on refinements that are not available in the current navigation state but would have been available if the users had not made some of their selections.
Therefore, carefully consider your choices for the user interface of your front-end application and decide for which of your refinements you would like to have one of these user experiences:
If, for example, you want only the most popular dimension values returned for some dimensions, you need dynamic ranking for those refinements. To use it, you set the sampling size of records (with --esampmin), which directly affects performance: the smaller the sample, the quicker the computation. However, for those dimensions, the MDEX Engine then does not compute (and therefore does not return) disabled refinements.
If, on the other hand, you would like your user experience to show grayed out (disabled) refinements, and your performance allows it, you can enable them instead of dynamic ranking for those dimensions. This means that for those dimensions, you need to disable dynamic ranking. As a side effect, this involves a performance cost, since computing refinements without dynamic ranking is more expensive. In addition, with dynamic ranking disabled, the MDEX Engine will need to compute refinement counts for more dimension values.
The cost of navigation queries.
Disabled refinements computation slightly increases the navigation portion of your query processing. This increase is roughly proportional to the number of dimensions for which you request the MDEX Engine to return disabled refinements.
Dimension value properties (that is, key-value pairs that the Dgraph passes back along with a dimension value) could slightly increase the processing or querying time because additional data is moved through the system, but this effect is generally minimal.
If your Guided Search application does complex formatting on the properties, this could slow down page loads. If the properties are used to add formatting HTML or perform other trivial operations, they have minimal impact on performance.
Automatically mapping source properties can facilitate testing in the staging environment, but it is not recommended for use in the production environment.
The Property Mapper in Developer Studio enables you to automatically map source properties to Guided Search properties or dimensions if no mapping is found. (This feature is also known as Automapper.) The Property Mapper option that lets you map source properties to Guided Search properties or dimensions defines the setting that Forge uses to handle source properties that have neither explicit nor implicit mappings.
Use this option with caution because each source property that is mapped uses system resources. Ideally, you should only map source properties that you intend to use in your implementation. Many production-level implementations automatically pull and process new data when it is available. If this data has new source properties, they will be mapped and included in your MDEX Engine indices, which uses system resources unnecessarily. As a result, the Forge output is larger, the indexer is larger and the MDEX Engine has additional indices to process.
The --nostrictattrs flag for Dgidx allows you to index every property found on a record, including those properties that do not have corresponding property mapper settings. Using this flag may negatively affect performance of Dgidx and the MDEX Engine.
If a large number of unused properties are sent to Dgidx, they will get indexed and will consume system resources during the indexing process and at run-time. These properties can also affect performance of the front-end application API, because the amount of information communicated between the MDEX Engine and the API increases.