This section discusses the relationship between the amount of RAM, the Dgraph process's use of virtual memory, the Dgraph cache, the working set size (WSS), and the resident set size (RSS) for the Dgraph process and their effect on performance.

In general, storing information on disk rather than in memory increases disk activity, which slows down the server. Although all the information the MDEX Engine may need is stored on disk, the running MDEX Engine attempts to keep in memory as many of the structures it currently needs as possible.

The decision about what to keep in memory at any given time is based on which parts of the Dgraph are most frequently used. This affects the resident set size and the working set size of the running Dgraph; as they grow, the Dgraph consumes more RAM.

While the amount of virtual memory consumed by the Dgraph process may grow and even exceed RAM at times, it is important for performance reasons that the working set size of the Dgraph process not exceed RAM.

[Diagram not reproduced here. It illustrates the relationship between RAM, the virtual memory of the Dgraph process, and its working set size, across three distinct use cases.]
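
The relationship between virtual memory, resident memory, and RAM described above can be inspected on a running system. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` and `/proc/meminfo` are available; the function names are illustrative and not part of any Endeca tool. Note that the operating system reports the resident set size (`VmRSS`) but not the working set size, so RSS serves here only as a rough indicator of how much RAM the process currently occupies.

```python
import re


def parse_kb_fields(text):
    """Parse 'Name:   12345 kB' lines (the /proc layout) into {name: kB}."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def memory_report(status_text, meminfo_text):
    """Compare a process's virtual and resident sizes against total RAM."""
    status = parse_kb_fields(status_text)
    meminfo = parse_kb_fields(meminfo_text)
    ram_kb = meminfo["MemTotal"]
    return {
        "virtual_kb": status["VmSize"],
        "resident_kb": status["VmRSS"],
        "ram_kb": ram_kb,
        # Virtual size may exceed RAM without harming performance.
        "virtual_exceeds_ram": status["VmSize"] > ram_kb,
        # A resident fraction close to 1.0 suggests little headroom is left.
        "resident_fraction_of_ram": status["VmRSS"] / ram_kb,
    }


# Example (Linux only; dgraph_pid is whatever PID your Dgraph runs under):
#   with open(f"/proc/{dgraph_pid}/status") as s, open("/proc/meminfo") as m:
#       print(memory_report(s.read(), m.read()))
```

Because the parsing functions take plain text, they can be exercised against captured samples without access to the production host.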

The MDEX Engine cache (or the Dgraph cache) is a storage area in memory that the Dgraph uses to dynamically save potentially useful data structures, such as partial and complete results of processing queries.

Because the Dgraph has direct access to these cached structures, it does not need to repeat computational work it has already done. The structures chosen for caching enable the Dgraph to answer queries faster while using fewer server resources.

The Dgraph cache is unified and adaptive:

The default Dgraph cache size (specified by the --cmem flag) is 1024MB (1GB).

The Dgraph cache improves both throughput and latency by taking advantage of similarities between processed queries. When a query is processed, the Dgraph checks to see whether processing time can be saved by looking up the results of some or all of the query computation from an earlier query.

The Dgraph cache is used to dynamically cache query results as well as partial or intermediate results. For example, if you perform a text search query, the result is stored in the cache if it is not already there. If you then refine the results by selecting a dimension value, your original text search query is augmented with a refinement. The Dgraph can likely reuse the cached text search result from your original query and avoid recomputing it. If the navigation refinement result is also in the cache, the Engine does not need to do that work either.
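
The reuse of partial results described above can be illustrated with a toy model. This is a sketch of the general idea only, not the Dgraph's actual implementation: the class `ToyQueryCache`, its methods, and the record layout are all hypothetical. It caches the text search result and the refinement result separately, so a refined query is answered by intersecting two cached sets.

```python
class ToyQueryCache:
    """Toy model: cache intermediate results keyed by the sub-query."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts how often real work was done

    def _get_or_compute(self, key, compute):
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = compute()
        return self._cache[key]

    def text_search(self, records, term):
        """Return the IDs of records whose text contains the term."""
        return self._get_or_compute(
            ("search", term),
            lambda: frozenset(
                rid for rid, rec in records.items() if term in rec["text"]
            ),
        )

    def refine(self, records, term, dimension, value):
        """Text search narrowed by a dimension value, reusing cached parts."""
        refinement = self._get_or_compute(
            ("refine", dimension, value),
            lambda: frozenset(
                rid for rid, rec in records.items()
                if rec.get(dimension) == value
            ),
        )
        # The text search part comes from the cache if it was run before.
        return self.text_search(records, term) & refinement
```

In this model, running a text search and then refining it computes the text search only once; repeating the refined query performs no new computation at all.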

To a large extent, the contents of the Dgraph cache are self-adjusting: what information is saved there and how long it is kept is decided automatically.

However, when deploying a Dgraph you need to decide how much memory to allocate for the Dgraph cache.

Allocating more memory to the cache improves performance by increasing the amount of information that can be stored in it, so that less information has to be recomputed.

Your MDEX Engine is well-tuned only when the Dgraph cache and the file system cache are well-balanced; therefore you need to understand them both.

In some cases, such as when you are operating at large data scale, you will not have enough memory to maximize both the file system (FS) cache and the Dgraph cache. In such cases, you must divide the available memory between the internal Dgraph cache and the FS cache. No general rule exists for making this allocation; you must determine the best split experimentally.
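
When preparing such an experiment, it can help to list the candidate cache sizes that fit in RAM at all. The sketch below uses a deliberately simple budget model that is an assumption of this example, not an Endeca rule: RAM minus an operating system reserve, minus the Dgraph's non-cache footprint, minus the `--cmem` value, is treated as the memory left over for the FS cache. The function names and the default reserve of 1024 MB are hypothetical; the performance of each candidate still has to be measured.

```python
def fs_cache_headroom_mb(ram_mb, dgraph_non_cache_mb, cmem_mb,
                         os_reserve_mb=1024):
    """Memory left for the FS cache under the simple budget model (assumed)."""
    return ram_mb - os_reserve_mb - dgraph_non_cache_mb - cmem_mb


def candidate_splits(ram_mb, dgraph_non_cache_mb, cmem_candidates_mb,
                     os_reserve_mb=1024):
    """List (cmem_mb, fs_cache_mb) pairs that fit in RAM, for later testing."""
    splits = []
    for cmem_mb in cmem_candidates_mb:
        headroom = fs_cache_headroom_mb(
            ram_mb, dgraph_non_cache_mb, cmem_mb, os_reserve_mb
        )
        if headroom >= 0:  # drop candidates that would overcommit RAM
            splits.append((cmem_mb, headroom))
    return splits
```

Each surviving pair is a configuration to benchmark with representative query load; the model only rules out splits that cannot fit.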

Use the following practices for optimizing the Dgraph and the file system caches for best performance:

The amount of memory allocated to the Dgraph cache directly affects the virtual process size of the Dgraph. An example in this topic shows how to adjust the Dgraph cache.

Furthermore, because the cache is accessed frequently, the amount of virtual memory allocated to it affects the working set size of the Dgraph. If the working set size grows beyond the available RAM, virtual memory paging may occur, which can adversely affect throughput and especially maximum latency. Whether this is a problem depends on your deployment scenario.

This topic provides recommendations for estimating the requirements for physical memory for an Oracle Commerce Guided Search 6.1.x system given the anticipated growth of your data set.

The size of the Dgraph process is impacted by:

Each of these areas is discussed below in a separate section.

Partial updates can have a significant impact on RSS and WSS. The precise details of the generation merging strategy are complex and proprietary. However, the rough pattern of memory usage that you can expect to see from a Dgraph running with partial updates is as follows:

To estimate projected requirements for physical memory for an Endeca 6.1.x system, use the following recommendations:

Once you predict the growth of the resident set size, you can estimate memory requirements for your Guided Search implementation. This will make it possible to provision enough hardware to support the MDEX Engines with the projected data set growth.
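
As one way to turn such a prediction into a number, the sketch below fits a straight line to past measurements and extrapolates it. It assumes that the resident set size grows roughly linearly with the number of records, which is an assumption of this example and should be checked against your own measurements; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_rss_mb(record_counts, rss_mb, future_records):
    """Extrapolate RSS (in MB) to a future record count, assuming linearity."""
    slope, intercept = fit_line(record_counts, rss_mb)
    return slope * future_records + intercept
```

Compare the projected value, plus headroom for the Dgraph cache and the file system cache, against the RAM of the hardware you plan to provision.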

