Using Streaming with Apache Kafka
This topic describes how you can use Apache Kafka with Oracle Cloud Infrastructure Streaming.
Oracle Cloud Infrastructure Streaming lets Apache Kafka users offload the setup, maintenance, and infrastructure management that hosting their own ZooKeeper and Kafka cluster requires.
Streaming is compatible with most Kafka APIs, allowing you to use applications written for Kafka to send messages to and receive messages from the Streaming service without having to rewrite your code. See Using Kafka APIs for more information.
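As a rough illustration of what "without having to rewrite your code" usually means in practice, the sketch below changes only the client configuration. The property keys are standard Kafka client settings. The endpoint pattern, the username format, and the use of an auth token as the password are assumptions not stated in this topic, so verify them against your stream pool's Kafka connection settings. The class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch only: builds Kafka client connection settings for Streaming.
// The endpoint pattern and credential format are ASSUMPTIONS; check them
// against the Kafka connection settings shown for your stream pool.
final class StreamingKafkaSettings {

    static Properties connectionProperties(String region, String tenancy, String user,
                                           String streamPoolOcid, String authToken) {
        Properties props = new Properties();
        // Assumed bootstrap endpoint pattern for the Kafka compatibility layer.
        props.put("bootstrap.servers",
                "cell-1.streaming." + region + ".oci.oraclecloud.com:9092");
        // Standard Kafka client keys for SASL over TLS with the PLAIN mechanism.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // Assumed username format: tenancy/user/stream pool OCID.
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"" + tenancy + "/" + user + "/" + streamPoolOcid + "\" "
                + "password=\"" + authToken + "\";");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Properties props = connectionProperties(
                "us-phoenix-1", "mytenancy", "user@example.com",
                "ocid1.streampool.oc1..example", "example-auth-token");
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

The resulting Properties object can be passed to an existing Kafka producer or consumer together with the usual serializer settings.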
Streaming can also utilize the Kafka Connect ecosystem to interface directly with external sources like databases, object stores, or any microservice on the Oracle Cloud. Kafka connectors can easily and automatically create, publish to, and deliver topics while taking advantage of the Streaming service's high throughput and durability. See Using Kafka Connect for more information.
Use cases for Streaming and Kafka include:
- Move data from Streaming to Autonomous Data Warehouse via the JDBC Connector to perform advanced analytics and visualization.
- Use the Oracle GoldenGate connector for Big Data to build an event-driven application.
- Move data from Streaming to Oracle Object Storage via the HDFS/S3 Connector for long-term storage, or to run Hadoop/Spark jobs.
Requirements and Limitations
Unique Stream Names
Two streams with the same name can exist in the same compartment only if they are in different stream pools. If a compartment contains streams that share both a name and a stream pool, you can't use Kafka with Streaming until you delete the duplicated streams.
Otherwise, duplicate stream names cause an "authentication failed" error. If you do not want to delete your streams, contact the Streaming team so we can rename them without data loss.
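The uniqueness rule can be checked against an inventory of your streams before you switch to Kafka clients. The following is a minimal sketch that assumes you already have each stream's compartment, stream pool, and name as plain strings. The class name, method name, and the three-element array layout are hypothetical, and the sketch does not call any Streaming API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: flags stream names that appear more than once within the
// same compartment and the same stream pool.
final class DuplicateStreamNames {

    // Each entry is a hypothetical {compartment, streamPool, streamName} triple.
    static Set<String> conflicts(List<String[]> streams) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] s : streams) {
            String key = s[0] + "|" + s[1] + "|" + s[2];
            counts.merge(key, 1, Integer::sum);
        }
        // Keep only the keys seen more than once, sorted for stable output.
        Set<String> duplicates = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String[]> streams = List.of(
                new String[] {"compA", "pool1", "orders"},
                new String[] {"compA", "pool1", "orders"},  // conflict: same pool
                new String[] {"compA", "pool2", "orders"}); // allowed: different pool
        System.out.println(conflicts(streams));
    }
}
```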
The following Kafka APIs and features are not implemented in the Streaming service:
- Compacted topics
- Idempotent producers
- Kafka Streams
- Adding partitions to a topic
- Some administrative APIs
Load Balancing Connection Recycling
Because the Kafka protocol uses long-lived TCP connections, the Streaming Kafka compatibility layer implements a load balancing mechanism that rebalances connections across front-end nodes by periodically closing them, which forces clients to open new ones. Most Kafka SDKs handle these disconnections automatically when consuming, but producing to Streaming using the Kafka API may raise disconnection errors. You can mitigate disconnections by adding retries to your requests. Retries are part of the Kafka SDK and are enabled automatically, and you can explicitly configure their behavior.
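A minimal sketch of explicit retry settings for a producer follows. The keys `retries`, `retry.backoff.ms`, `delivery.timeout.ms`, and `enable.idempotence` are standard Kafka producer settings, but the values are illustrative assumptions rather than recommendations from this topic. Disabling `enable.idempotence` is also an assumption, made because idempotent producers are listed above as not implemented and newer Kafka clients turn that setting on by default. The class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch only: layers explicit retry settings over existing producer settings.
// All values are illustrative ASSUMPTIONS; tune them for your workload.
final class ProducerRetrySettings {

    static Properties withRetries(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Retry a failed send a bounded number of times.
        props.put("retries", "5");
        // Wait between attempts so a recycled connection can be re-established.
        props.put("retry.backoff.ms", "500");
        // Upper bound on the total time a send may take, retries included.
        props.put("delivery.timeout.ms", "120000");
        // Assumption: turn off idempotence because it is not implemented by Streaming.
        props.put("enable.idempotence", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.put("acks", "all");
        Properties props = withRetries(base);
        System.out.println(props.getProperty("retries"));
    }
}
```

Without idempotence, a retried send can write the same message more than once, so consumers that need exactly-once behavior should be prepared to deduplicate.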