Using Apache Kafka

Apache Kafka is an open source, distributed publish-subscribe messaging platform built to handle real-time streaming data. It supports distributed streaming, data pipelines, and replay of data feeds for fast, scalable operations. Kafka is a broker-based solution that maintains streams of data as records within a cluster of servers.
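
The publish-subscribe and replay ideas can be illustrated with a small in-memory model. This is only a conceptual sketch, not the Kafka client API: `ToyBroker`, `ToyConsumer`, and the `sensor-readings` topic are hypothetical names chosen for illustration, and a real cluster additionally partitions and replicates each topic across servers.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only list of records per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def publish(self, topic, value):
        """Append a record to the topic's log and return its offset."""
        log = self._logs[topic]
        log.append(value)
        return len(log) - 1

    def read(self, topic, offset):
        """Return all records in the topic from the given offset onward."""
        return self._logs[topic][offset:]


class ToyConsumer:
    """Subscriber that remembers its own position (offset) in a topic."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Fetch records not yet seen and advance the offset past them."""
        records = self.broker.read(self.topic, self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        """Move to a given offset; rewinding replays earlier records."""
        self.offset = offset


# Hypothetical topic and sample values, used only for this illustration.
broker = ToyBroker()
for reading in ("t=20.1", "t=20.4", "t=20.9"):
    broker.publish("sensor-readings", reading)

consumer = ToyConsumer(broker, "sensor-readings")
first_pass = consumer.poll()   # all three records
second_pass = consumer.poll()  # nothing new has been published
consumer.seek(0)               # rewind to the start of the log
replayed = consumer.poll()     # the same three records again
print(first_pass, second_pass, replayed)
```

Because records stay in the log after they are read, each consumer only needs to track its own offset, and rewinding that offset is what makes replay of a data feed possible.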

In Big Data Service clusters, Kafka isn't installed by default. To add the Kafka service, use the Add Service option in the Ambari UI.