This topic discusses how the BDD cluster deployment ensures enhanced availability of query processing.
Important: The BDD cluster deployment provides enhanced availability, but it does not provide high availability. This topic describes the cluster behavior that enables enhanced availability and notes the cases in which system administrators must take action to restore services.
The following three sections describe how the BDD cluster provides enhanced availability.
Note: This topic applies to BDD deployments with more than one running instance of the Dgraph. Although you can deploy BDD on a single node, such deployments are suitable only for development environments, because they do not guarantee the availability of query processing: if the single node hosting the lone Dgraph instance fails, the Dgraph process shuts down with it.
Availability of WebLogic Server nodes hosting Studio
When a WebLogic Server node goes down, Studio goes down with it. As long as the BDD cluster uses an external load balancer and includes more than one WebLogic Server node on which Studio is started, this does not disrupt Big Data Discovery operations.
If a WebLogic Server node hosting Studio fails, the external load balancer stops directing requests to it and relies on the remaining Studio nodes until you restart the failed node.
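The following minimal Python sketch illustrates the failover idea: a load balancer periodically health-checks each Studio node and routes requests only to nodes that respond. The host names, port, and health-check path are hypothetical, and real load balancers implement this natively; the sketch only models the behavior described above.

    import urllib.error
    import urllib.request

    # Hypothetical WebLogic Server nodes hosting Studio.
    STUDIO_NODES = ["web01.example.com:7003", "web02.example.com:7003"]

    def is_healthy(node, timeout=2.0):
        """Return True if the node answers its (hypothetical) health-check URL."""
        try:
            with urllib.request.urlopen("http://%s/bdd" % node, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def pick_backend():
        """Route to the first healthy Studio node; fail if none is available."""
        for node in STUDIO_NODES:
            if is_healthy(node):
                return node
        raise RuntimeError("No Studio node is currently available")

A node that fails its health check is simply skipped until it is restarted and passes checks again, which mirrors how the cluster stops using a failed Studio node and resumes using it after you restart it.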
Availability of Dgraph nodes
The ZooKeeper ensemble running on a subset
of Hadoop (CDH or HDP) nodes ensures the enhanced availability of the Dgraph
cluster nodes and services:
- Failure of a leader Dgraph. When the leader Dgraph of a database goes offline, the BDD cluster elects a new leader and starts sending updates to it (for a conceptual sketch of ZooKeeper-based leader election, see the example after this list). During the election, follower Dgraphs continue to maintain a consistent view of the data and to answer queries. You should manually restart the failed node with the bdd-admin script. When the Dgraph that previously held the leader role is restarted and rejoins the cluster, it becomes one of the follower Dgraphs. If it rejoins before the cluster needs to appoint a new leader, it continues to serve as the leader.
- Failure of a follower Dgraph. When a follower Dgraph goes offline, the BDD cluster routes requests to the other available Dgraphs. You should manually restart the failed node with the bdd-admin script. Once the node is restarted, it rejoins the cluster, and the cluster adjusts its routing information accordingly.
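The election behavior described above follows the standard ZooKeeper leader-election pattern. The sketch below shows that pattern using the open-source kazoo Python client; it is not BDD's internal implementation, and the ZooKeeper hosts, election path, and candidate identifier are hypothetical. Each candidate blocks until it wins the election; if the current leader's session is lost, one of the waiting candidates is elected.

    import socket
    from kazoo.client import KazooClient  # pip install kazoo

    zk = KazooClient(hosts="zk01.example.com:2181,zk02.example.com:2181")
    zk.start()

    def act_as_leader():
        # Placeholder for leader-only work, such as accepting data updates.
        # While this function runs, all other candidates wait as followers.
        print("Elected leader; now accepting updates")

    # run() blocks until this candidate wins the election, then calls
    # act_as_leader(). When that function returns or the ZooKeeper session
    # is lost, leadership passes to another waiting candidate.
    election = zk.Election("/demo/leader-election", identifier=socket.gethostname())
    election.run(act_as_leader)

Because waiting candidates hold no special state, they can keep serving read queries throughout the election, which is why follower Dgraphs continue answering queries while a new leader is chosen.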
Availability of ZooKeeper instances
The
ZooKeeper instances themselves must be highly available. The following
statements describe the requirements in detail:
- Each Hadoop node in the BDD cluster deployment can optionally be configured at deployment time to host a ZooKeeper instance. To ensure the availability of ZooKeeper, deploy the instances in a cluster of their own, known as an ensemble, by configuring a subset of the Hadoop nodes to host ZooKeeper instances at deployment time. As long as a majority of the ensemble is running, the BDD cluster can use the ZooKeeper services. Because ZooKeeper requires a majority, the optimal number of Hadoop nodes hosting ZooKeeper instances is an odd number that is at least 3 (see the quorum arithmetic sketch after this list).
- A Hadoop node hosting a ZooKeeper instance is responsible for keeping the ZooKeeper process up: it starts ZooKeeper when BDD is deployed and restarts it if it stops running.
- If you do not configure at least three Hadoop nodes to run ZooKeeper, ZooKeeper becomes a single point of failure. Should it fail, the data sets served by BDD become entirely unavailable. To recover, you must restart or replace the Hadoop node that was running the failed ZooKeeper instance (the required action depends on the nature of the failure).
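The majority requirement explains why an odd ensemble size of at least 3 is optimal. A minimal Python sketch of the quorum arithmetic:

    def failures_tolerated(n):
        """Instances that can fail while a majority (n // 2 + 1) keeps running."""
        majority = n // 2 + 1
        return n - majority  # equivalently, (n - 1) // 2

    for n in range(1, 8):
        print("%d instance(s): tolerates %d failure(s)" % (n, failures_tolerated(n)))
    # 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3

As the output shows, an ensemble of one or two instances tolerates no failures, and growing an odd-sized ensemble by one node (3 to 4, or 5 to 6) raises the quorum without raising fault tolerance; hence the recommendation of an odd number that is at least 3.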