Recommendations for identifying network problems

Often the diagnosis of slow performance comes from a query load played against the front-end application. The front-end application, or the configuration of its application server, may be the reason for the poor performance.

Alternatively, the network may be the problem, although this is less likely. (In the case of a Dgraph, unlike an Agraph, it is unusual for the network to be the bottleneck.)

To identify whether the network is a performance issue:

Compare Eneperf performance on the local host and a remote host. First, run Eneperf against the Dgraph on the Dgraph machine. Next, run the same Eneperf against the same Dgraph, but from the front-end machine (if possible), or somewhere on the other side of the network. If the difference is negligible, the network is not a problem. If Eneperf across the network is slow, you need to consider both the network itself and the application configuration.
Alternatively, you can run the Cheetah tool and compare the “Round-Trip Response Time” with the “Engine-Only Processing Time”. If the “Round-Trip Response Time” is long but the “Engine-Only Processing Time” is short, this can indicate a network problem or a problem with the configuration of the application server for the front-end application.
Measure network performance using Netperf, a freely available tool that can be used to measure bandwidth. Alternatively, you can FTP some large files across the network link. If these tools show poor throughput across the network, this can indicate a network hardware problem such as a failing network interface card (NIC) or cable.
In addition, check Eneperf statistics, the Dgraph request logs, or the Dgraph Stats page to see how much data is being transmitted back from the Dgraph on an average request. Large average result page size can saturate the network.
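
The two comparisons above come down to simple arithmetic, sketched here in Python. The function names and the sample figures are hypothetical and chosen only for illustration; take the real inputs from your own Cheetah, Eneperf, or Dgraph Stats output. The sketch counts response payload only and ignores protocol overhead, so actual link usage would be somewhat higher.

```python
def non_engine_share(round_trip_ms: float, engine_only_ms: float) -> float:
    """Fraction of the round-trip time spent outside the engine
    (network transfer plus front-end/application-server overhead)."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return max(0.0, round_trip_ms - engine_only_ms) / round_trip_ms


def link_utilization(avg_response_bytes: float,
                     requests_per_sec: float,
                     link_bits_per_sec: float) -> float:
    """Fraction of link capacity consumed by response payloads alone."""
    if link_bits_per_sec <= 0:
        raise ValueError("link_bits_per_sec must be positive")
    bits_per_sec = avg_response_bytes * 8 * requests_per_sec
    return bits_per_sec / link_bits_per_sec


# Hypothetical figures: 200 ms round trip, 20 ms engine-only time.
print(f"time outside engine: {non_engine_share(200, 20):.0%}")

# Hypothetical figures: 500 KB average response at 40 requests/second.
print(f"100 Mbit/s link: {link_utilization(500_000, 40, 100_000_000):.0%}")
print(f"1 Gbit/s link:   {link_utilization(500_000, 40, 1_000_000_000):.0%}")
```

A utilization value near or above 1.0 suggests the response sizes alone can saturate the link. A high share of time outside the engine points to the network or the front-end configuration rather than to the Dgraph.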

If your application appears to be moving too much data, you may need to change its configuration. To determine whether changes are needed, consider the following:

Is all of the data actually being used by the application? In other words, does the MDEX Engine return record fields that are then ignored by the front-end application? This is an especially serious problem with large documents.
Is your application returning unnecessary fields that could be excluded with the Select feature (described in “Controlling Record Values with the Select Feature” in the Endeca Advanced Development Guide)?
Is your application returning navigation pages that are too large? (Navigation pages are result list pages, as opposed to record detail pages.) If the application returns a lot of detailed information in the result list pages, consider reserving the details for a click-through and reducing the size of the result list pages your application returns on initial requests.
Is your application returning large numbers of records without using the bulk record API (described in “Bulk Export of Records” in the Endeca Advanced Development Guide)?
Is the network saturated? Identify the transmission speed being used and, if necessary, upgrade to Gigabit Ethernet. Ensure there is ample network bandwidth between the front-end application and the Dgraph. To identify the transmission speeds in use, work with your network administrator.
How are the NICs configured? Ensure that NIC duplex settings match between the Dgraph host and the web application client host and that both are set to full duplex. A mismatch can cause latency issues.
Could large response sizes returned by the Dgraph be saturating the network? Use Cheetah analysis to confirm whether the Dgraph is returning large responses, which can be caused by the query features you use. The way certain features are used can cause slow processing time and also saturate the network.
Do you have queries waiting in the Dgraph queue to be processed? Check the “Threading/Queuing Information” summary in Cheetah for the number of items experiencing queue issues and the number of HTTP 408 request timeout errors. Review the Dgraph setting for the number of worker threads and consider increasing it if it is set to 1. Queuing can also be caused by spikes in traffic.
Does the front-end application process the responses returned by the Dgraph quickly enough? Check CPU, memory, and disk I/O utilization on the front-end application server. Confirm that the application server is adequately tuned and that the Dgraph is not returning large responses.
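
The NIC speed and duplex checks in the list above can be scripted on hosts that expose these settings. The following Python sketch assumes a Linux host, where the kernel publishes the negotiated values under /sys/class/net/<interface>/speed (in Mbit/s) and /sys/class/net/<interface>/duplex. The function names, the interface name eth0, and the 1000 Mbit/s threshold are illustrative assumptions, not part of any Endeca tool.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_nic_settings(interface: str,
                      sysfs_root: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Read the negotiated speed and duplex for one interface from sysfs.
    A value is None if the file is missing or unreadable (for example,
    when the interface does not exist or the link is down)."""
    base = Path(sysfs_root) / interface
    settings: Dict[str, Optional[str]] = {}
    for name in ("speed", "duplex"):
        try:
            settings[name] = (base / name).read_text().strip()
        except OSError:
            settings[name] = None
    return settings


def nic_warnings(settings: Dict[str, Optional[str]],
                 min_speed_mbps: int = 1000) -> List[str]:
    """Return human-readable warnings for a non-full-duplex or slow link."""
    warnings = []
    duplex = settings.get("duplex")
    if duplex != "full":
        warnings.append(f"duplex is {duplex!r}, expected 'full'")
    try:
        speed_mbps = int(settings.get("speed"))
    except (TypeError, ValueError):
        speed_mbps = None
    if speed_mbps is None or speed_mbps < min_speed_mbps:
        warnings.append(
            f"speed is {settings.get('speed')!r} Mbit/s, "
            f"expected at least {min_speed_mbps}"
        )
    return warnings


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface that carries Dgraph traffic.
    current = read_nic_settings("eth0")
    print(current)
    for message in nic_warnings(current):
        print("WARNING:", message)
```

Run the script on both the Dgraph host and the front-end host and compare the results. The switch port settings still need to be confirmed with your network administrator.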