2 Kafka Middleware Setup

This topic provides information about the Kafka middleware setup.

2.1 Zookeeper Setup

This topic provides step-by-step instructions to install and set up ZooKeeper.

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the brokers and the cluster topology, and provides a consistent, file-system-like store for configuration information. It is also used in leadership election for broker topic-partition leaders. The steps below start a two-node ZooKeeper ensemble, with one node on each of two servers.
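
As a rough illustration of how the ensemble size relates to the majority (quorum) that ZooKeeper needs in order to stay available, the sketch below prints the quorum for a few ensemble sizes. The function name quorum_size is a hypothetical helper used only for this example and is not part of ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helper: majority (quorum) size for an ensemble of N nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3 5; do
  q=$(quorum_size "$n")
  # Number of nodes that can be down while a majority is still reachable.
  echo "ensemble=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

For the two-node ensemble used in this topic the quorum is 2, so both nodes need to be running for the ensemble to accept requests.
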
  1. Extract the ZooKeeper installation files to /tools/zookeeper on both servers.
  2. Navigate to the conf folder, /tools/zookeeper/conf.
  3. Duplicate zoo_sample.cfg and rename the copy to zookeeper1.cfg.
  4. Open zookeeper1.cfg and modify the following properties.
    dataDir=<zookeeper home directory>/data
    tickTime=2000
    clientPort=<ZooKeeper client port (2181)>
    initLimit=10
    syncLimit=5
    
    server.1=<hostname>:<peer port>:<leader port>
    #1 is the id that we put in myid file.
    
    server.2=<hostname>:<peer port>:<leader port>
    #2 is the id that we will put in myid file of second node.
    
    server.3=<hostname>:<peer port>:<leader port>
    #3 is the id that we will put in myid file of third node.

    Example:

    tickTime=2000
    initLimit=5
    syncLimit=2
    clientPort=2181
    dataDir=/tmp/zookeeper-oblm/zookeeper-node1
    server.1=server1-IP:2666:3666
    server.2=server2-IP:2667:3667

    Note:

    Update the IP value with the respective server IP.
  5. On server 2, duplicate zoo_sample.cfg in the same directory and rename the copy to zookeeper2.cfg (other names can also be used). Each ZooKeeper node uses its own configuration file.
  6. Open zookeeper2.cfg and modify the following properties.

    clientPort=2182
    dataDir=/tmp/zookeeper-oblm/zookeeper-node2
    server.1=server1-IP:2666:3666
    server.2=server2-IP:2667:3667

    Note:

    Update the IP value with the respective server IP.
  7. Copy zookeeper1.cfg and zookeeper2.cfg and save them locally.
  8. On server 1, create a file named myid in the directory /tmp/zookeeper-oblm/zookeeper-node1, open it in a text editor, enter 1, then save and close the file.
  9. On server 2, create a file named myid in the directory /tmp/zookeeper-oblm/zookeeper-node2, open it in a text editor, enter 2, then save and close the file.
  10. Run the following commands to start the ZooKeeper nodes.
    On Server 1:

    nohup ./bin/zkServer.sh start conf/zookeep

    On Server 2:

    nohup ./bin/zkServer.sh start conf/zookeep
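
Steps 3, 4 and 8 above can be sketched as a small script that writes the node configuration and the matching myid file. This is a minimal sketch, not a definitive procedure: the variable names ZK_ID, ZK_CLIENT_PORT, ZK_CONF_DIR and ZK_DATA_DIR are hypothetical, and the default configuration directory is a scratch location standing in for /tools/zookeeper/conf. The property values are the example values from step 4.

```shell
#!/bin/sh
# Sketch only: variable names and the scratch conf directory are assumptions.
set -e

ZK_ID="${ZK_ID:-1}"                          # node id, must match a server.N entry
ZK_CLIENT_PORT="${ZK_CLIENT_PORT:-2181}"     # 2181 on node 1, 2182 on node 2 in this topic
ZK_CONF_DIR="${ZK_CONF_DIR:-/tmp/zookeeper-oblm/conf-sketch}"
ZK_DATA_DIR="${ZK_DATA_DIR:-/tmp/zookeeper-oblm/zookeeper-node${ZK_ID}}"

mkdir -p "$ZK_CONF_DIR" "$ZK_DATA_DIR"

# Write the node configuration with the example values from step 4.
# Replace server1-IP and server2-IP with the respective server IPs.
cat > "$ZK_CONF_DIR/zookeeper${ZK_ID}.cfg" <<EOF
tickTime=2000
initLimit=5
syncLimit=2
clientPort=$ZK_CLIENT_PORT
dataDir=$ZK_DATA_DIR
server.1=server1-IP:2666:3666
server.2=server2-IP:2667:3667
EOF

# The myid file contains only the node id.
echo "$ZK_ID" > "$ZK_DATA_DIR/myid"

echo "wrote $ZK_CONF_DIR/zookeeper${ZK_ID}.cfg and $ZK_DATA_DIR/myid"
```

On server 2 the same sketch would be run with ZK_ID=2 and ZK_CLIENT_PORT=2182, which corresponds to steps 6 and 9.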

2.2 Kafka Setup

This topic provides step-by-step instructions to install and set up Kafka.

  1. Extract the Kafka installation files to /tools/kafka on both servers.
  2. Navigate to the Apache Kafka config folder, /tools/kafka/config.
  3. Duplicate server.properties in the config folder and rename the copy to server1.properties.
  4. Open server1.properties and modify the following properties.
    broker.id=<unique integer that identifies the Kafka broker in the cluster>
    listeners=PLAINTEXT://<hostname>:<Kafka broker listen port (9092)>
    log.dirs=<Kafka home directory>/logs
    log.retention.hours=<number of hours to keep a log file before deleting it; tertiary to the log.retention.ms property>
    log.retention.bytes=<maximum size of the log before deleting it>
    log.segment.bytes=<maximum size of a single log file>
    log.retention.check.interval.ms=<frequency in milliseconds at which the log cleaner checks whether any log is eligible for deletion>
    zookeeper.connect=<zookeeper_hostname_1>:<zookeeper_client_port>,<zookeeper_hostname_2>:<zookeeper_client_port>,<zookeeper_hostname_3>:<zookeeper_client_port>
    Example:

    broker.id=0
    port=9092
    log.dirs=/tmp/kafka-oblm/logs-node1
    zookeeper.connect=server1-IP:2181,server2-IP:2182
    num.partitions=2
    min.insync.replicas=1
    default.replication.factor=2
    offsets.topic.replication.factor=2
    transaction.state.log.replication.factor=2
    transaction.state.log.min.isr=1

    Note:

    If Apache ZooKeeper runs on a different server, update the server IP values in the zookeeper.connect property accordingly. For min.insync.replicas, a typical configuration is the replication factor minus 1.
  5. On server 2, duplicate server.properties in the same directory and rename the copy to server2.properties.
  6. Open server2.properties and modify the following properties.

    broker.id=1
    log.dirs=/tmp/kafka-oblm/logs-node2

    Note:

    By default, Apache Kafka will run on port 9092 and Apache Zookeeper will run on port 2181.
  7. Copy server1.properties and server2.properties and save them locally.
  8. To run the Kafka brokers, change to the /tools/kafka directory and run the following commands in separate terminals.
    On Server 1:

    nohup ./bin/kafka-server-start.sh config/server1.properties

    On Server 2:

    nohup ./bin/kafka-server-start.sh config/server2.properties

  9. The log settings are in the “Log Retention Policy” section of the server*.properties file attached to this document. The values in this section are the Apache defaults.
  10. At present, Kafka uses the following default value for the maximum message size: message.max.bytes=1000012
  11. To increase this limit, add the property to server*.properties and set it to the required value.
  12. To set the compression type for all data generated by the producer, add the following property to the server*.properties file.
    compression.type=none

    Note:

    The default is none (i.e. no compression). Valid values are none, gzip, snappy, lz4, or zstd.
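
The min.insync.replicas guidance from the note in step 4 (replication factor minus 1) can be checked against a properties file with a short script. This is a minimal sketch under stated assumptions: get_prop is a hypothetical helper, PROPS defaults to a scratch file that holds only the replication-related lines from the example in step 4, and in practice PROPS would point at a real server*.properties file.

```shell
#!/bin/sh
# Sketch only: PROPS and get_prop are assumptions made for this example.
set -e

PROPS="${PROPS:-/tmp/kafka-sketch/server1.properties}"

# Create a scratch file from the example values if no file is supplied.
if [ ! -f "$PROPS" ]; then
  mkdir -p "$(dirname "$PROPS")"
  cat > "$PROPS" <<'EOF'
broker.id=0
min.insync.replicas=1
default.replication.factor=2
EOF
fi

# Hypothetical helper: print the value of KEY in FILE.
# Comment lines never match; if a key is repeated, the last value is printed.
get_prop() {
  awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); val = $0 } END { print val }' "$1"
}

rf=$(get_prop "$PROPS" default.replication.factor)
isr=$(get_prop "$PROPS" min.insync.replicas)

if [ -z "$rf" ] || [ -z "$isr" ]; then
  echo "CHECK: default.replication.factor or min.insync.replicas is not set in $PROPS"
else
  expected=$(( rf - 1 ))
  if [ "$isr" -eq "$expected" ]; then
    echo "OK: min.insync.replicas=$isr equals replication factor $rf minus 1"
  else
    echo "CHECK: min.insync.replicas=$isr, replication factor minus 1 would be $expected"
  fi
fi
```

The same helper can be used to read back other values mentioned above, for example get_prop "$PROPS" message.max.bytes.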