Role of ZooKeeper

The ZooKeeper utility provides configuration management, state management, and distributed coordination services to the Dgraph nodes of the Big Data Discovery (BDD) cluster. It ensures high availability of query processing by the Dgraph nodes in the cluster.

ZooKeeper is part of the CDH package, which is assumed to be installed on all CDH nodes in the BDD cluster deployment. Although ZooKeeper is installed on all CDH nodes in the BDD cluster, it may not be running on all of them. To ensure availability of a clustered Dgraph deployment, configure an odd number (at least three) of CDH nodes to run ZooKeeper instances. This prevents ZooKeeper from being a single point of failure.

ZooKeeper has the following characteristics:

To summarize, ZooKeeper can run only while a majority of its hosting nodes are active. It is therefore recommended that ZooKeeper run on an odd number (at least three) of the CDH nodes in the deployed Big Data Discovery cluster. An even number adds no fault tolerance: four ZooKeeper nodes tolerate one failure, the same as three. You can set the number of ZooKeeper nodes during installation, when running the deployment script.
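
The majority requirement can be illustrated with a little arithmetic. The following Python sketch is not part of Big Data Discovery or ZooKeeper; the function names quorum_size and tolerated_failures are hypothetical and chosen only for illustration. It assumes a simple strict-majority rule, as described above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Return the smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Return how many nodes can fail while a majority remains active."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare ensemble sizes: an even size tolerates no more failures
    # than the odd size just below it.
    for nodes in range(1, 8):
        print(
            f"nodes={nodes} "
            f"majority={quorum_size(nodes)} "
            f"tolerated_failures={tolerated_failures(nodes)}"
        )
```

Under this rule, a single ZooKeeper node tolerates no failures, three nodes tolerate one, and five nodes tolerate two, which is why at least three nodes and an odd count are recommended.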