Overview of Streaming with Apache Kafka

Oracle Cloud Infrastructure (OCI) Streaming with Apache Kafka is a fully managed OCI service that lets you create and run Kafka clusters in an OCI tenancy with all the functionalities of Apache Kafka.

Apache Kafka is an open source event streaming platform used to build real-time data streaming applications. Using Apache Kafka, you can:

  • Write and read streams of events
  • Store streams of events
  • Process streams of events in real-time or later

With Streaming with Apache Kafka, you get all the functionalities of Apache Kafka without the overhead of provisioning and managing the underlying infrastructure.

Image showing all the use cases of OCI Streaming with Apache Kafka.

Features

Streaming with Apache Kafka is built with the following features:

Fully managed
Streaming with Apache Kafka is fully managed and automates activities such as patching, upgrades, backups, high-availability, cross-region replication, scaling, and performance management.
Durability and availability
Each cluster is configured with high-availability and storage redundancy across availability domains or fault domains. You can either create a cluster with a single broker for development or test tenancies, or create a cluster with at least 3 brokers for production tenancies to provide high availability.
Streaming with Apache Kafka automatically detects and recovers from common cluster failure scenarios. This ensures that the producer and consumer applications experience minimal disruption during write and read operations. When Streaming with Apache Kafka detects a broker failure, it either mitigates the failure or replaces the unhealthy or unreachable broker with a new one. Where possible, it reuses the existing storage from the failed broker to minimize the amount of data that Apache Kafka needs to replicate. The availability impact is limited to the time required for Streaming with Apache Kafka to detect and recover from the failure. After recovery, producer and consumer applications can continue communicating with the same broker endpoints as before, ensuring seamless operation.
Apache Kafka compatibility
Streaming with Apache Kafka is 100% compatible with Apache Kafka APIs allowing you to use applications written for Apache Kafka without rewriting the code.
Integrated OCI services
  • Use OCI Vault to securely store and manage super user credentials
  • Use OCI Monitoring for cluster metrics
  • Use OCI Logging for cluster level logs

Use Cases

Use Streaming with Apache Kafka in the following scenarios:

Change data capture
Change data capture (CDC) is a style of application design where changes to the application's state are logged as a time-ordered sequence of records. OCI Streaming with Apache Kafka's support for cloud-scale log data storage makes it an excellent backend for an application built in this style. You can deploy any open source Kafka connectors, or Oracle Golden Gate, in a virtual machine (VM), which polls the source databases for new or changed data based on an update timestamp column and easily streams the data into OCI Streaming with Apache Kafka. For example: E-commerce companies use CDC with Kafka to track order updates in their database to initiate order processing and other order fulfillment micro services.
Image showing the change data capture architecture
Metric and log ingestion
Use OCI Streaming with Apache Kafka as the metrics or log processor from diverse sources. Log ingestion tools such as Fluentd, Logstash, or Kafka Producer API can collect logs from various applications and place them in Kafka topics for data enrichment and aggregation. Using Apache Kafka APIs, you can enrich the data by abstracting the details of logs and sending them to downstream analytics tools for further processing and advance log search capabilities.
Image showing the metrics and logs ingestion architecture
Real-time analytics
Use OCI Streaming with Apache Kafka to process and analyze continuous streams of data from IOT devices or other upstream applications for real-time insights, anomaly detection, and predictive analytics. For example, financial institutions use the service to process market data feeds, detect trading anomalies, and make real-time trading decisions. Retailers analyze customer behavior and preferences in real-time to offer personalized recommendations and promotions.
Image showing real-time analytics architecture
Web and mobile activity data ingestion
Use OCI Streaming with Apache Kafka to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. These feeds are available for subscription for a range of use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting. You can use this solution for the following uses:
  • Clickstream: Clickstream use cases involve collecting website activity data from multiple producers, and analyzing the data in real-time to provide recommendations, such as products to buy, news articles to read, and videos to watch.
  • Game analytics: Gaming companies constantly monitor network lag, user behavior, and in-game activities to offer customers in-game microtransactions, to re-balance network load, change rendering engine parameters, and more. All these actions happen in real- time, in the order of milliseconds to a few seconds.
Image showing web and mobile activity data ingestion
Messaging
Use OCI Streaming with Apache Kafka to decouple the components of large systems. For example, producers and consumers can use OCI Streaming with Apache Kafka as an asynchronous message bus and act independently and at their own pace.
Image showing messaging architecture

When to Use OCI Streaming with Apache Kafka Vs OCI Streaming

Review the details for OCI Streaming with Apache Kafka and OCI Streaming to find the best solution for your Streaming needs.

OCI Streaming OCI OCI Streaming with Apache Kafka
Recommended as a messaging bus for app-to-app communication. Ideal for small-to medium workloads with less than 500 partitions per region per tenancy. Recommended for distributed data storage and real-time data processing including CDC, stream analytics, and processing IOT data, with no limits on the number of partitions.
Managed and serverless Managed, but not serverless
Partial compatibility with Apache Kafka 100% compatible with Apache Kafka
Performance latency ~ 200 ms on average, when cluster is tuned properly. Performance latency less than 100 ms, when cluster is tuned properly.
Multi tenant: Single cluster contains several customer tenancies. Single tenant: Each cluster is dedicated to a single tenancy.
Authentication and authorization using IAM. Authentication using mTLS or SASL/SCRAM and authorization using ACL.
Soft limit 15 and hard limit 500 on partitions No limit
Storage retention 7 days No limit
No storage size limit Storage size limit 5 TB per broker
Write throughput per partition 1 MB per second and Read throughput per partition 2 MB per second. Throughput ~ 10 MB per second.

The default throughput per partition in Apache Kafka isn't a fixed, hard limit, but a combination of factors. It's estimated to be around 10 MB per second per partition. Maximum throughput per partition depends on underlying infrastructure and configurations, such as batching size, compression codec, replication factor, and acknowledgment type.

Maximum message size 1 MB Default maximum message size set to 1 MB to help brokers manage memory effectively. This can be changed in cluster configuration. No limit on maximum size, but very large messages aren't recommended and considered ineffective and anti pattern in Apache Kafka.
Scaling not supported Scale number of brokers upto 30 brokers per cluster and scale number of OCPUs upto the maximum defined by the compute shapes.
50 consumer groups per topic No limit, but the more the consumer groups, the more network usage.
Functionalities compacted topic, idempotent produce, transaction, stream API not available. Functionalities compacted topic, idempotent produce, transaction, stream API supported.
Limited low cardinality metrics available Extensive high cardinality metrics available

Resource Identifiers

Streaming with Apache Kafka supports clusters and work requests as Oracle Cloud Infrastructure resources. Most types of resources have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID). For information about the OCID format and other ways to identify resources, see Resource Identifiers.

Regions and Availability Domains

Oracle hosts its OCI services in regions and availability domains. A region is a localized geographic area, and an Availability domain is one or more data centers found within a region. Streaming with Apache Kafka is hosted in all regions of the OC1 realm.

Authentication and Authorization

Each service in Oracle Cloud Infrastructure integrates with IAM for authentication and authorization, for all interfaces (the Console, SDK or CLI, and REST API).

An administrator in your organization needs to set up groups , compartments , and policies  that control which users can access which services, which resources, and the type of access. For example, the policies control who can create new users, create and manage the cloud network, create instances, create buckets, download objects, and so on. For more information, read Getting Started with Policies.

Ways to Access Streaming with Apache Kafka

You can access Oracle Cloud Infrastructure (OCI) by using the Console (a browser-based interface), REST API, or OCI CLI. Instructions for using the Console, API, and CLI are included in topics throughout this documentation. For a list of available SDKs, see Software Development Kits and Command Line Interface.

Console: To access Streaming with Apache Kafka using the Console, you must use a supported browser. To go to the Console sign-in page, open the navigation menu at the top of this page and select Infrastructure Console. You are prompted to enter your cloud tenant, your user name, and your password.

API: To access Streaming with Apache Kafka through APIs, the REST API documentation provide the most functionality, but require programming expertise. The API Reference and Endpoints provide endpoint details and links to the available API reference documents including the Streaming with Apache Kafka API. Streaming with Apache Kafka API let you create and manage the Kafka clusters and configuration files. Use the Apache Kafka APIs for client operations.

CLI: The OCI CLI let you create and manage the Kafka clusters and configuration files. Use the Apache Kafka CLIs for client operations. Use the Cloud Shell environment to run the CLIs.