Concepts in Streaming with Apache Kafka

To better understand OCI Streaming with Apache Kafka, review some terms and concepts.

Terms

Apache Kafka
Apache Kafka is an open source event streaming platform. It runs as a distributed system consisting of servers and clients. OCI Streaming with Apache Kafka lets you run fully managed Kafka clusters in a tenancy with 100% compatibility with Apache Kafka.

You use the OCI Console, API, or CLI to create and update clusters and brokers.

Cluster
A distributed system of Apache Kafka servers, or brokers, that store and manage the streaming data. In OCI Streaming with Apache Kafka, you can create two types of clusters: starter clusters and high availability clusters.
Broker
A Kafka server that stores data and processes client requests. Depending on the workload, you can create 1 to 30 brokers in each cluster. Each broker in a cluster is a compute node that's used to store topics.

You use the Apache Kafka API or CLI to create and manage topics and partitions, and to produce and consume records.

Topic
A user-defined category in a broker where data is stored. Each topic can have many partitions for data distribution and parallel processing.
Partitions
Topics are split into a user-specified number of partitions that store records. Records are ordered within each partition, and each record is assigned a unique, sequentially increasing offset.
Records
Key-value data pairs that are written to and read from topics.
Producer
Application that writes records to topics.
Consumer
Application that reads records from topics.
Coordinator Cluster
An internal coordinator instance within each cluster that tracks activities in a cluster, such as partitions. A cluster with 2 or fewer brokers gets a single-node coordinator cluster. Larger clusters get a 3-node coordinator cluster. You can't view details of the coordinator cluster.
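
The relationship between topics, partitions, records, and offsets can be illustrated with a toy in-memory model. This is only a sketch for building intuition, not how a broker is implemented: the Topic class and its methods are hypothetical names, and CRC32 is an assumed stand-in for the partitioner that a real Kafka client uses.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the hash used by a real client.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key, value):
        """Producer side: store a record and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        # The offset is the record's position within its own partition.
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Consumer side: return all records at or after the given offset."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=3)
    print(orders.append(b"customer-1", b"created"))
    print(orders.append(b"customer-1", b"paid"))
```

Records that share a key land in the same partition, so their relative order is preserved, while offsets in different partitions are independent of each other.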

Service Limits

Limit | Value
Clusters per tenancy | 5
Brokers per cluster | 30
Brokers per tenancy | 150
Storage | Min 50 GB to max 5 TB per broker; max 150 TB per cluster
Ingress per cluster | No limit
Egress per cluster | No limit
Ingress per partition | No limit
Egress per partition | No limit
Partitions | No limit
Max client connections | No limit
Max connection attempts | No limit
Max requests (per second) | No limit
Max message size (MB) | No limit
Max request size (MB) | No limit
Max fetch bytes (MB) | No limit
API keys | Not applicable
ACLs | No limit
Configurations | 100 per tenancy
Configuration updates | No limit
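
A planned cluster can be checked against the broker and storage limits listed above before you create it. The following is a minimal sketch: check_cluster_plan is a hypothetical helper, not part of any OCI SDK, and it assumes 1 TB = 1000 GB.

```python
MAX_BROKERS_PER_CLUSTER = 30
MIN_STORAGE_PER_BROKER_GB = 50
MAX_STORAGE_PER_BROKER_GB = 5 * 1000      # 5 TB, assuming 1 TB = 1000 GB
MAX_STORAGE_PER_CLUSTER_GB = 150 * 1000   # 150 TB


def check_cluster_plan(brokers, storage_per_broker_gb):
    """Return a list of limit violations; an empty list means the plan fits."""
    problems = []
    if not 1 <= brokers <= MAX_BROKERS_PER_CLUSTER:
        problems.append(f"brokers must be 1..{MAX_BROKERS_PER_CLUSTER}, got {brokers}")
    if not MIN_STORAGE_PER_BROKER_GB <= storage_per_broker_gb <= MAX_STORAGE_PER_BROKER_GB:
        problems.append(
            f"storage per broker must be {MIN_STORAGE_PER_BROKER_GB}.."
            f"{MAX_STORAGE_PER_BROKER_GB} GB, got {storage_per_broker_gb}"
        )
    total_gb = brokers * storage_per_broker_gb
    if total_gb > MAX_STORAGE_PER_CLUSTER_GB:
        problems.append(f"cluster storage {total_gb} GB exceeds {MAX_STORAGE_PER_CLUSTER_GB} GB")
    return problems


if __name__ == "__main__":
    print(check_cluster_plan(brokers=3, storage_per_broker_gb=200))
    print(check_cluster_plan(brokers=31, storage_per_broker_gb=40))
```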

Feature Limits

Feature | Support
Exactly once semantics | Yes
Key-based compacted storage | Yes
Custom connectors | No
Native ksqlDB support | No
Public networking | No
Private networking | Yes
OAuth | No
Audit logs | Yes
Self-managed encryption keys | No
Automatic elastic scaling | No
Stream sharing | No
Client quotas | Yes

Cluster Types

In OCI Streaming with Apache Kafka, you can create two types of clusters.

Starter Cluster
Designed for testing and development purposes where high availability isn't a critical requirement. A starter cluster offers a flexible setup where the cluster can have between 1 and 30 brokers.
High Availability Cluster
Intended for production environments where high availability is essential. These clusters are configured with a minimum of 3 brokers distributed across multiple availability or fault domains to ensure redundancy and fault tolerance. The cluster can scale up to a maximum of 30 brokers, providing reliability and resiliency for mission-critical workloads.
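
The broker ranges for the two cluster types, together with the coordinator cluster sizing described under Terms, can be expressed as a small check. This is a sketch only: the function names and the cluster type strings are hypothetical and don't correspond to API values.

```python
# Allowed broker counts per cluster type, taken from the descriptions above.
BROKER_RANGE = {
    "starter": (1, 30),
    "high_availability": (3, 30),
}


def valid_broker_count(cluster_type, brokers):
    """True if the broker count is allowed for the given cluster type."""
    low, high = BROKER_RANGE[cluster_type]
    return low <= brokers <= high


def coordinator_nodes(brokers):
    """Coordinator cluster size: 1 node for 2 or fewer brokers, otherwise 3."""
    return 1 if brokers <= 2 else 3


if __name__ == "__main__":
    print(valid_broker_count("high_availability", 2))
    print(coordinator_nodes(5))
```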

Cluster Default Options

Review the following default options for a Kafka cluster.

Connectivity
By default, all Kafka clusters are created with private connectivity and can be accessed from within the VCN and subnet specified during cluster creation. If more VCNs need access to the Kafka cluster, configure VCN peering.
Broker Disk Quotas
By default, broker disk quotas are enforced to ensure stability and prevent overuse of disk resources. When a broker disk reaches 97% capacity, producer operations are rate-limited while consumer operations are unaffected and continue to function as expected. When a broker disk reaches 98% capacity, all producer operations are blocked while consumer operations can continue consuming events and committing offsets without interruption. You can change this default behavior and define custom storage quotas in a cluster configuration file.
Cluster Configuration File
When you create a Kafka cluster, the cluster configuration file is created with default properties depending on the cluster type. The property settings are designed to balance reliability, durability, and ease of use. You can use the default file or create a custom one. For a list of configurable and non-configurable properties, see Managing Cluster Configurations.
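
The default thresholds described under Broker Disk Quotas can be summarized as a simple mapping from disk usage to producer behavior. The function name and the returned labels are hypothetical and used only for illustration. Consumers are unaffected at every level.

```python
def producer_state(disk_used_percent):
    """Map broker disk usage to the default producer behavior."""
    if disk_used_percent >= 98:
        return "blocked"        # all producer operations are blocked
    if disk_used_percent >= 97:
        return "rate-limited"   # producer operations are throttled
    return "normal"


if __name__ == "__main__":
    for used in (80, 97, 98.5):
        print(used, producer_state(used))
```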

High Availability cluster

allow.everyone.if.no.acl.found=true
auto.create.topics.enable=false
leader.imbalance.per.broker.percentage=1
default.replication.factor=3
offsets.topic.replication.factor=3
min.insync.replicas=2
transaction.state.log.min.isr=2
transaction.state.log.replication.factor=3
Property | Default Value | Purpose
allow.everyone.if.no.acl.found | true | If no ACL is found for a resource, all users are allowed access to the cluster.
auto.create.topics.enable | false | Topics aren't created automatically. They must be created explicitly for better control.
leader.imbalance.per.broker.percentage | 1 | Controls the leader imbalance threshold per broker for rebalancing.
default.replication.factor | 3 | New topics are created with 3 replicas for high durability.
offsets.topic.replication.factor | 3 | The internal offsets topic is replicated 3 times for resilience.
min.insync.replicas | 2 | At least 2 replicas must acknowledge a write for durability.
transaction.state.log.min.isr | 2 | Minimum in-sync replicas for the transaction state log.
transaction.state.log.replication.factor | 3 | Replication factor for the transaction state log.

Starter cluster

allow.everyone.if.no.acl.found=true
auto.create.topics.enable=false
leader.imbalance.per.broker.percentage=1
default.replication.factor=1
offsets.topic.replication.factor=1
min.insync.replicas=1
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
Property | Default Value | Purpose
allow.everyone.if.no.acl.found | true | If no ACL is found for a resource, all users are allowed access to the cluster.
auto.create.topics.enable | false | Topics aren't created automatically. They must be created explicitly for better control.
leader.imbalance.per.broker.percentage | 1 | Controls the leader imbalance threshold per broker for rebalancing.
default.replication.factor | 1 | New topics are created with 1 replica (no redundancy).
offsets.topic.replication.factor | 1 | The internal offsets topic has a single replica.
min.insync.replicas | 1 | Only 1 replica needs to acknowledge a write.
transaction.state.log.min.isr | 1 | Minimum in-sync replicas for the transaction state log.
transaction.state.log.replication.factor | 1 | Replication factor for the transaction state log.
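
One way to compare the two sets of defaults is to ask how many replicas of a partition can be unavailable while producers that use acks=all can still write. In Apache Kafka, such a write succeeds only while at least min.insync.replicas replicas are in sync, so the margin is the replication factor minus min.insync.replicas. The sketch below applies that arithmetic to the defaults above. The dictionary and function names are hypothetical.

```python
HA_DEFAULTS = {"default.replication.factor": 3, "min.insync.replicas": 2}
STARTER_DEFAULTS = {"default.replication.factor": 1, "min.insync.replicas": 1}


def tolerated_replica_failures(config):
    """Replicas that can be lost while acks=all writes are still accepted."""
    return config["default.replication.factor"] - config["min.insync.replicas"]


if __name__ == "__main__":
    print("high availability:", tolerated_replica_failures(HA_DEFAULTS))
    print("starter:", tolerated_replica_failures(STARTER_DEFAULTS))
```

With the high availability defaults, one replica can be lost without interrupting acks=all writes. With the starter defaults there is no margin.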

Plan Cluster Size and Storage

Review the cluster sizing guidelines for optimizing and getting the most out of an OCI Streaming with Apache Kafka cluster.

General Guidelines

If you need more throughput, then you can implement the following actions:
  • Increase the number of brokers in the cluster
  • Increase the number of partitions in the cluster
  • Increase OCPU per broker
If you need lower latency, then you can implement the following actions:
  • Use larger brokers with higher memory
  • Use smaller batches and a shorter linger.ms, and optimize the network path

If durability is more important than latency, then consider setting the producer configuration parameter acks to all.

A cluster that isn't properly planned and configured performs poorly.

  • Configuring too few brokers results in low throughput, high latency, and broker overload.
  • Configuring too few partitions results in poor parallelism and low consumer usage.
  • Configuring underpowered brokers results in high fetch or produce latency, GC issues, and unstable brokers.
  • Configuring too many partitions on smaller brokers results in high memory usage, metadata bloat, and unstable rebalance.

Performance Benchmarks

Review the performance metrics on Kafka clusters with Arm-based processors, replication factor set to 3, and the producer configuration parameter acks set to 1.

Broker Configuration | Partitions | Maximum Producer Throughput | Maximum Consumer Throughput | Latency
3 brokers, each with 2 A1 OCPU, 12 GB memory, 200 GB block storage, 10 VPU | 1000 | 161.21 MB per second | 394.38 MB per second | 50.70 ms
3 brokers, each with 12 A1 OCPU, 72 GB memory, 300 GB block storage, 10 VPU | 2000 | 368.76 MB per second | 678.39 MB per second | 27.79 ms
3 brokers, each with 20 A1 OCPU, 120 GB memory, 500 GB block storage, 10 VPU | 2000 | 505.13 MB per second | 710.82 MB per second | 21.11 ms
18 brokers, each with 2 A1 OCPU, 12 GB memory, 300 GB block storage, 10 VPU | 2000 | 379.39 MB per second | 690.27 MB per second | 80.18 ms
18 brokers, each with 12 A1 OCPU, 72 GB memory, 500 GB block storage, 10 VPU | 4000 | 788.73 MB per second | 998.11 MB per second | 74.53 ms
18 brokers, each with 20 A1 OCPU, 120 GB memory, 1000 GB block storage, 10 VPU | 4000 | 1.08 GB per second | 1.15 GB per second | 71.29 ms
30 brokers, each with 2 A1 OCPU, 12 GB memory, 300 GB block storage, 10 VPU | 4000 | 617.60 MB per second | 1.02 GB per second | 98.27 ms
30 brokers, each with 12 A1 OCPU, 72 GB memory, 500 GB block storage, 10 VPU | 6000 | 1.65 GB per second | 1.34 GB per second | 65.81 ms
30 brokers, each with 20 A1 OCPU, 120 GB memory, 1000 GB block storage, 10 VPU | 6000 | 2.41 GB per second | 2.09 GB per second | 56.82 ms

The performance metrics depend on several factors, and improvements in one configuration could lead to trade-offs in others.
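
As a small worked example of such a trade-off, the following sketch divides the maximum producer throughput of the three 2 A1 OCPU benchmark rows by their broker count. The figures are copied from the table and the helper name is hypothetical. Partition counts also differ between these rows, so treat the result as a rough indication only.

```python
# (brokers, partitions, maximum producer throughput in MB per second),
# copied from the 2 A1 OCPU rows of the benchmark table.
TWO_OCPU_ROWS = [
    (3, 1000, 161.21),
    (18, 2000, 379.39),
    (30, 4000, 617.60),
]


def per_broker_throughput(total_mb_per_s, brokers):
    """Average producer throughput contributed by each broker."""
    return total_mb_per_s / brokers


if __name__ == "__main__":
    for brokers, partitions, total in TWO_OCPU_ROWS:
        share = per_broker_throughput(total, brokers)
        print(f"{brokers:>2} brokers, {partitions} partitions: {share:.2f} MB/s per broker")
```

In these runs, total throughput rises with the broker count while the per-broker share falls, so adding brokers of the same size did not scale throughput linearly.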

For example, throughput numbers vary depending on factors such as message size, compression type, batch size, and linger.ms. A higher batch size can increase throughput, but can also lead to higher latency.

Increasing the number of partitions typically improves throughput and enhances producer performance. However, it also increases the resource usage of producers and consumers and introduces more metadata overhead. This can slow down leader election and in-sync replica (ISR) management, potentially increasing the latency of produce and fetch requests.

Tune the parameters based on your requirements.

  • To optimize for lower latency, reduce batch.size and linger.ms, and avoid heavy compression.
  • To maximize throughput, increase batch.size, enable compression, and use more partitions and brokers to scale horizontally.
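
The two tuning directions above can be written down as producer configuration profiles. The property names (batch.size, linger.ms, compression.type) are standard Apache Kafka producer settings, but the values below are illustrative assumptions, not recommendations from this documentation. Measure against your own workload before adopting any of them.

```python
# Assumed example values; tune them against real workload measurements.
LOW_LATENCY = {
    "batch.size": 16384,         # small batches are sent sooner
    "linger.ms": 0,              # don't wait to fill a batch
    "compression.type": "none",  # avoid compression CPU cost
}

HIGH_THROUGHPUT = {
    "batch.size": 131072,        # larger batches amortize request overhead
    "linger.ms": 20,             # wait briefly so batches fill up
    "compression.type": "lz4",   # fewer bytes on the wire
}


def to_properties(config):
    """Render a profile as key=value lines for a producer properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(to_properties(HIGH_THROUGHPUT))
```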

Always monitor latency and throughput metrics closely and tune the cluster iteratively based on real-world workload patterns.

Migrate

You can migrate data from a self-managed Apache Kafka cluster to an OCI Streaming with Apache Kafka cluster.

The recommended solution for this migration use case is to create a MirrorMaker 2.0 connector in a Connect cluster.
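
As a rough sketch of what such a connector definition might look like, the following builds a request body in the shape that the Kafka Connect REST API accepts for creating a connector. The connector class and property names come from Apache Kafka MirrorMaker 2. The function name, cluster aliases, and bootstrap addresses are hypothetical placeholders, and the way your Connect cluster accepts connector definitions may differ.

```python
import json


def mirror_source_connector(name, source_bootstrap, target_bootstrap, topics):
    """Build a MirrorSourceConnector definition as a Connect REST payload."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
            "source.cluster.alias": "source",   # assumed alias
            "target.cluster.alias": "target",   # assumed alias
            "source.cluster.bootstrap.servers": source_bootstrap,
            "target.cluster.bootstrap.servers": target_bootstrap,
            "topics": topics,                   # comma-separated names or a regex
            "tasks.max": "1",
        },
    }


if __name__ == "__main__":
    payload = mirror_source_connector(
        name="migrate-to-oci",
        source_bootstrap="<source-bootstrap-servers>",
        target_bootstrap="<target-bootstrap-servers>",
        topics="orders.*",
    )
    print(json.dumps(payload, indent=2))
```

Security settings for each cluster, such as the SASL and TLS properties, would also need to be supplied. They are omitted here to keep the sketch short.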

The following is a sample client.properties configuration:
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=</path/to/truststore.jks>
ssl.truststore.password=<truststore-password>
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
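
Before using a file like this, it can help to confirm that every angle-bracket placeholder was replaced with a real value. The following is a minimal sketch with hypothetical helper names. The parser is deliberately simplified: it handles only key=value lines and comments, not the full Java properties syntax.

```python
import re

SAMPLE = """\
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=</path/to/truststore.jks>
ssl.truststore.password=<truststore-password>
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first "=" only, because values may contain "=".
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unfilled_placeholders(props):
    """Return the keys whose values still contain a <placeholder>."""
    return sorted(key for key, value in props.items() if re.search(r"<[^<>]+>", value))


if __name__ == "__main__":
    print(unfilled_placeholders(parse_properties(SAMPLE)))
```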