Performance impact of returning and displaying refinements

This topic summarizes the performance impact of returning and displaying refinements, refinement ordering, refinement counts, and multi-select managed attributes.

Performance impact of returning and displaying refinements

The Dgraph's run-time performance is directly related to the number of refinement values it computes for display. Request refinement values only if you plan to display them in the front-end application: any refinement values that the Dgraph computes but the application does not display degrade performance for no benefit. Attributes that contain large numbers of refinements also affect performance.
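The guideline above can be illustrated with a minimal Python sketch. It does not use any Endeca API: the record layout, the `compute_refinements` function, and the `displayed_attributes` set are hypothetical, chosen only to show that an attribute the front end does not display costs nothing when it is left out of the request.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Set

# Hypothetical record model: each record maps an attribute name to the
# list of values assigned to it (multi-assign attributes have several).
Record = Mapping[str, List[str]]


def compute_refinements(
    records: Iterable[Record], displayed_attributes: Set[str]
) -> Dict[str, Counter]:
    """Count refinement values only for attributes the UI will display.

    Attributes outside `displayed_attributes` are never visited, so no
    work is spent on values that would not be rendered.
    """
    refinements: Dict[str, Counter] = {a: Counter() for a in displayed_attributes}
    for record in records:
        for attribute in displayed_attributes:
            for value in record.get(attribute, []):
                refinements[attribute][value] += 1
    return refinements


records = [
    {"Color": ["Red"], "Size": ["M"], "Brand": ["Acme"]},
    {"Color": ["Blue"], "Size": ["L"], "Brand": ["Acme"]},
    {"Color": ["Red"], "Size": ["S"]},
]
# Only Color is shown in the front end, so only Color is computed.
print(compute_refinements(records, {"Color"}))
```

The work grows with both the number of requested attributes and the number of values each one carries, which mirrors the two costs the paragraph above describes.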

Additionally, even exposing a large number of attributes (not their individual values or refinements) within each attribute group can affect performance. For a query that returns many attributes, the Dgraph must determine whether any valid refinements exist for each attribute. Although this computation costs less than listing the actual refinements, it is not free: even when an attribute has no valid refinements, the Dgraph must check all the attribute assignments on the records to establish this.
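Why the "no refinements" case is costly can be sketched in a few lines of Python. This is a simplified, hypothetical model, not the Dgraph's implementation: here a "valid refinement" just means that at least one record in the current result set has a value for the attribute, and `has_valid_refinements` also reports how many records it had to scan.

```python
from typing import Iterable, List, Mapping, Tuple

# Hypothetical record model: attribute name -> assigned values.
Record = Mapping[str, List[str]]


def has_valid_refinements(records: Iterable[Record], attribute: str) -> Tuple[bool, int]:
    """Return (found, records_scanned) for one attribute.

    The scan can stop at the first record that assigns the attribute,
    but proving that *no* record assigns it requires visiting them all.
    """
    scanned = 0
    for record in records:
        scanned += 1
        if record.get(attribute):
            return True, scanned
    return False, scanned


records = [{"Color": ["Red"]}, {"Color": ["Blue"]}, {"Size": ["M"]}] * 1000
print(has_valid_refinements(records, "Color"))     # (True, 1): stops early
print(has_valid_refinements(records, "Material"))  # (False, 3000): full scan
```

An attribute with no valid refinements is therefore the worst case for this check, and the cost repeats for every such attribute that the query exposes.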

Finally, retrieving a full list of refinements (both suggested refinements and applied refinements, where applied refinements include both explicitly selected and implicit refinements) also has performance implications.

If you must return a large number of attributes, consider increasing the system cache to offset the performance cost. Determine the amount of free RAM on the system while the Dgraph is under load; if a significant amount of memory is free, consider increasing the cache size by that amount.
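The sizing rule above is simple arithmetic, sketched here in Python under stated assumptions: the host runs Linux, free memory is read from the `MemFree` field of `/proc/meminfo` (sampled while the Dgraph is under load), and the current cache size is supplied by hand. The helper names and the sample figures are hypothetical, and the sketch only computes a number; it does not configure the Dgraph.

```python
def parse_meminfo_kib(meminfo_text: str, field: str) -> int:
    """Return one field from /proc/meminfo-style text, in KiB."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def suggested_cache_mib(current_cache_mib: int, meminfo_text: str) -> int:
    """Current cache size plus the RAM observed free under load, in MiB."""
    free_mib = parse_meminfo_kib(meminfo_text, "MemFree") // 1024
    return current_cache_mib + free_mib


# Made-up snapshot; on a live Linux host you could use
# open("/proc/meminfo").read() while the Dgraph is under load.
sample = """MemTotal:       65536000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
"""
print(suggested_cache_mib(4096, sample))
```

`MemFree` is used rather than `MemAvailable` because the latter counts reclaimable page cache, which may already be holding data the Dgraph relies on. A more conservative variant could leave some headroom instead of adding the full free amount.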

Performance impact of refinement ordering

You can use the data domain configuration flag --refinement-sampling-min to specify the minimum number of records to sample during refinement computation (for managed attributes only). This option is useful because sampling the entire navigation state during refinement computation can be one of the more performance-intensive operations the Dgraph performs.

For most applications, larger values of --refinement-sampling-min reduce performance without improving the quality of refinement ordering. For some applications with extremely large, non-hierarchical attributes (where such attributes cannot be avoided), larger values can meaningfully improve the quality of refinement ordering at a minor performance cost. Specify this flag as follows:
endeca-cmd --put-dd-profile <profile-name> --refinement-sampling-min <number-of-records>
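The trade-off that --refinement-sampling-min controls can be illustrated with a generic Python sketch. The Dgraph's actual sampling and ordering algorithm is not described in this topic; the `order_refinements` function, its frequency-based ordering, and its `sampling_min` parameter are assumptions that only model the idea of ordering refinements from a sample of the navigation state.

```python
import random
from collections import Counter
from typing import List, Sequence


def order_refinements(
    values: Sequence[str], sampling_min: int, seed: int = 0
) -> List[str]:
    """Order one attribute's values by frequency in a random sample.

    `values` holds the attribute value of each record in the current
    navigation state. This sketch samples exactly `sampling_min` records
    (or all of them, if there are fewer), so the work grows with it.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    sample_size = min(len(values), sampling_min)
    sample = rng.sample(list(values), sample_size)
    counts = Counter(sample)
    # Most frequent first; ties broken alphabetically.
    return sorted(counts, key=lambda v: (-counts[v], v))


values = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
print(order_refinements(values, sampling_min=1000))  # exact: full sample
print(order_refinements(values, sampling_min=50))    # cheaper approximation
```

For a skewed distribution like this one, a small sample usually recovers the same order as the full count, which is consistent with the observation that larger values rarely help most applications; attributes with very many rare values are where a larger sample can change the result.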

Performance impact of refinement counts

Dynamic statistics on records are expensive to compute. Enable a managed attribute for dynamic statistics only if you intend to use the statistics. Because the Dgraph performs additional computation for each additional statistic, refinement counts that are computed but not used still incur a performance cost.
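One way to apply this guideline is a quick audit that compares the attributes enabled for dynamic statistics with the attributes whose counts the front end actually shows. The Python sketch below assumes both lists have been gathered by hand (from the data domain configuration and from the application's templates); `unused_statistics` and the attribute names are hypothetical.

```python
from typing import Set


def unused_statistics(stats_enabled: Set[str], counts_displayed: Set[str]) -> Set[str]:
    """Attributes that pay for dynamic statistics the UI never shows."""
    return stats_enabled - counts_displayed


# Hypothetical inventory collected manually.
stats_enabled = {"Color", "Size", "Brand", "Material"}
counts_displayed = {"Color", "Brand"}
print(sorted(unused_statistics(stats_enabled, counts_displayed)))
```

Any attribute in the result is a candidate for disabling dynamic statistics.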

Performance impact of multi-select managed attributes

Tagging an attribute as multi-select affects performance. Users take longer to narrow down the list of results, because selecting a value from a multi-select attribute still leaves further refinements from that attribute available. In addition, computing refinements for multi-or attributes is more expensive.
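A common reason multi-or refinements cost more in faceted search generally is shown in the Python sketch below; it is a generic technique, not the Dgraph's implementation. To keep unselected values of a multi-or attribute available, that attribute's counts are computed over records filtered by every selection *except* its own, which requires a separate filtering pass rather than reusing the shared result set. The record model and the `matches` and `refinement_counts` functions are hypothetical, and only the multi-or (not multi-and) case is modeled.

```python
from collections import Counter
from typing import Dict, List, Mapping, Optional, Set

Record = Mapping[str, List[str]]
Selections = Dict[str, Set[str]]  # attribute -> selected values


def matches(record: Record, selections: Selections, skip: Optional[str] = None) -> bool:
    """True if the record satisfies every selection except `skip`.

    Values selected within one attribute are OR-ed (multi-or);
    different attributes are AND-ed.
    """
    for attribute, chosen in selections.items():
        if attribute == skip or not chosen:
            continue
        if not chosen.intersection(record.get(attribute, [])):
            return False
    return True


def refinement_counts(records: List[Record], selections: Selections,
                      attribute: str, multi_or: bool) -> Counter:
    """Count values of `attribute` over the records that match."""
    # For a multi-or attribute, its own selections are ignored so that
    # sibling values stay selectable; this extra per-attribute filtering
    # pass is the additional cost.
    skip = attribute if multi_or else None
    counts: Counter = Counter()
    for record in records:
        if matches(record, selections, skip):
            counts.update(record.get(attribute, []))
    return counts


records = [
    {"Color": ["Red"], "Size": ["M"]},
    {"Color": ["Blue"], "Size": ["M"]},
    {"Color": ["Green"], "Size": ["L"]},
]
selections = {"Color": {"Red"}, "Size": {"M"}}
print(refinement_counts(records, selections, "Color", multi_or=False))
print(refinement_counts(records, selections, "Color", multi_or=True))
```

With `multi_or=False`, only Red remains; with `multi_or=True`, Blue is still offered as a further refinement from the same attribute, at the cost of another pass over the records.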