Load balancing and routing of requests

This topic discusses the load balancing and routing of requests from Studio nodes to the Dgraph nodes in Oracle Big Data Discovery.

Load balancing of requests

Depending on your deployment strategy, the entry point for external clients into an on-premise Big Data Discovery cluster is either any Studio-hosting node in the cluster or an external load balancer configured in front of the Studio instances.

The Big Data Discovery cluster relies on the following two levels of load balancing of requests:
  1. Load balancing of requests across the nodes hosting multiple instances of Studio. This task should be performed by an external load balancer, if you choose to use one in your deployment (an external load balancer is not included in the Big Data Discovery package).

    If an external load balancer is used, it receives all requests and distributes them across the nodes in the Big Data Discovery cluster that host the Studio application. BDD then routes each request coming from a Studio node to the appropriate Dgraph node.

    If an external load balancer is not used, external requests can be sent to any Studio node. They are then load-balanced between the nodes hosting the Dgraph.

  2. Load balancing of requests across the Dgraph nodes. This task is handled automatically by the BDD cluster: the Big Data Discovery software accepts requests from its Studio and Data Processing components on any node hosting the Dgraph, and load-balances these requests internally across the other Dgraph-hosting nodes in the cluster.

Routing of requests

The Big Data Discovery cluster automatically directs requests to the subset of the cluster nodes hosting the Dgraph instances.

The following statements describe the behavior of the BDD cluster for routing of requests to Dgraph nodes:
  • Requests can be submitted from Studio or Data Processing components to any Dgraph Gateway in the BDD cluster, which in turn routes each request to an appropriate Dgraph node.

    For example, if the request is an updating request, such as a data loading request or a configuration update, it is routed to the leader Dgraph node in the cluster. If the request is a non-updating (query processing) request, it is routed to the leader Dgraph node or to any of the follower Dgraph nodes. If a BDD cluster has only one node hosting the Dgraph, this node serves as the leader (with no followers).

  • Non-updating requests are load-balanced across the Dgraph nodes using a round-robin algorithm.
  • The Big Data Discovery cluster uses session affinity for all requests arriving from Studio to the Dgraph, relying on the session ID in the header of each Studio request. Requests with the same session ID are always routed to the same Dgraph node in the cluster. This improves query processing performance by making efficient use of the Dgraph cache, and improves the performance of caching entities (known in Studio as views).
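
The routing rules above can be illustrated with a small Python sketch. It is a toy model, not BDD code: the class name DgraphRouter, the route method, and the node and session names are all hypothetical. It also makes two assumptions that the text does not spell out: a session is bound to a node by a round-robin pick on its first non-updating request, and updating requests always go to the leader regardless of session affinity.

```python
from itertools import cycle


class DgraphRouter:
    """Toy model of the routing rules described above (hypothetical, not BDD code)."""

    def __init__(self, leader, followers=()):
        self.leader = leader
        # Non-updating requests may be served by the leader or any follower.
        self.nodes = [leader, *followers]
        self._round_robin = cycle(self.nodes)
        self._sessions = {}  # session ID -> Dgraph node bound to that session

    def route(self, session_id, updating):
        # Updating requests (data loading, configuration updates) go to the leader.
        # Assumption: this takes precedence over session affinity.
        if updating:
            return self.leader
        # Assumption: a new session is bound to the next node in round-robin order.
        if session_id not in self._sessions:
            self._sessions[session_id] = next(self._round_robin)
        # Session affinity: later requests in the session reuse the same node.
        return self._sessions[session_id]


router = DgraphRouter("dgraph-1", ["dgraph-2", "dgraph-3"])
print(router.route("session-a", updating=False))  # dgraph-1
print(router.route("session-b", updating=False))  # dgraph-2
print(router.route("session-a", updating=False))  # dgraph-1 (same session, same node)
print(router.route("session-b", updating=True))   # dgraph-1 (leader)
```

With a single Dgraph node and no followers, every request in this sketch resolves to that node, which matches the single-node case described above.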