Overview of Streaming
The Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real time. Use Streaming for any use case in which data is produced and processed continually and sequentially in a publish-subscribe messaging model.
You can use Streaming for:
- Messaging
- Use Streaming to decouple the components of large systems. Producers and consumers can use Streaming as an asynchronous message bus and act independently and at their own pace.
- Metric and log ingestion
- Use Streaming as an alternative to traditional file-scraping approaches to help make critical operational data more quickly available for indexing, analysis, and visualization.
- Web or mobile activity data ingestion
- Use Streaming for capturing activity from websites or mobile apps, such as page views, searches, or other user actions. You can use this information for real-time monitoring and analytics, and in data warehousing systems for offline processing and reporting.
- Infrastructure and apps event processing
- Use Streaming as a unified entry point for cloud components to report their lifecycle events for audit, accounting, and related activities.
Streaming provides the following features:
- Fully managed
- Streaming is fully managed, from the underlying infrastructure to its provisioning, deployment, maintenance, security patching, and replication. Integration with Monitoring and default metrics make operations easy.
Oracle manages stream partitions, and consumer groups can track message offsets on your behalf.
- Durability and Availability
- Messages published to the Streaming service are synchronously replicated across three availability domains when available. In regions with a single availability domain, the data is replicated across multiple fault domains. This ensures that even the failure of an availability domain or fault domain does not result in data loss. The result is highly durable data.
Oracle Cloud Infrastructure provides a service-level agreement (SLA) for Streaming. Refer to the Oracle Cloud Infrastructure Service Level Agreement page for details.
- Security
- Streaming data is encrypted both at rest and in transit, ensuring message integrity. You can let Oracle manage encryption, or use the Oracle Cloud Infrastructure Vault service to securely store and manage your own encryption keys if you need to meet specific compliance or security standards.
Integration with Oracle Cloud Infrastructure Identity and Access Management (IAM) lets you control who and what services can access which keys and what they can do with those resources.
Private endpoints restrict access to a specified virtual cloud network (VCN) within your tenancy so that its streams cannot be accessed through the internet.
For more information, see Securing a Stream.
- Stream processing
- Streaming's integration with Oracle Cloud Infrastructure Service Connector Hub means that you can designate a stream as a data source, use Oracle Cloud Infrastructure Functions to transform the stream's messages, and output the transformed messages to Object Storage or any other supported Service Connector Hub target while maintaining Streaming's order guarantees.
- Kafka compatibility
- Streaming makes it possible to offload the setup, maintenance, and management of the infrastructure that hosting your own Apache Kafka cluster requires.
Streaming is compatible with most Kafka APIs, allowing you to use applications written for Kafka to send messages to and receive messages from the Streaming service without having to rewrite your code. See Using Kafka APIs for more information.
Streaming also takes advantage of the Kafka Connect ecosystem to interface directly with first-party and third-party products by using out-of-the-box Kafka source and sink connectors. See Using Kafka Connect for more information.
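As a rough illustration of reusing a Kafka client unchanged, the sketch below builds a settings dictionary using librdkafka-style property names. It is a minimal sketch under stated assumptions: the helper `kafka_client_settings` is hypothetical, the choice of `SASL_SSL` with the `PLAIN` mechanism is an assumption, and the endpoint and credentials are placeholders. Take the actual bootstrap server, user name format, and credential type from Using Kafka APIs.

```python
def kafka_client_settings(bootstrap_servers: str, username: str, password: str) -> dict:
    """Return librdkafka-style settings for a SASL-authenticated TLS connection.

    Hypothetical helper: none of the values here come from this page.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",  # assumption: TLS transport with SASL auth
        "sasl.mechanism": "PLAIN",        # assumption: user name and secret
        "sasl.username": username,
        "sasl.password": password,
    }


# Placeholder values only; replace them with the ones for your stream pool.
settings = kafka_client_settings(
    "streaming.example.invalid:9092", "example-user", "example-secret"
)
```

A dictionary in this shape is what Kafka clients built on librdkafka typically accept as their configuration, so the application code that produces and consumes messages would not need to change.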
How Streaming Works
Here's how Streaming works:
A producer publishes messages to a stream, which is an append-only log. These messages are distributed among Oracle-managed partitions for scalability.
Partitions allow you to distribute a stream by splitting messages across multiple nodes (or brokers). Each partition can be placed on a separate machine, allowing multiple consumers to read a stream in parallel.
A consumer reads messages from one or more partitions. Consumers can read from any partition regardless of where the partition is hosted. Each message within a stream is marked with an offset value, so a consumer can pick up where it left off if it is interrupted. Messages from a partition are guaranteed to be delivered in the same order they were produced.
Consumers can read messages explicitly by providing the partition and offset, or as a member of a consumer group, which coordinates the consumption of an entire stream by the members of the group.
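The flow above can be sketched with a toy in-memory model. This is not the Streaming API or SDK: `ToyStream` and its methods are hypothetical names, and hashing the message key to pick a partition is an assumption made for illustration. The sketch only shows how append-only partitions, offsets, and per-partition ordering fit together.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """In-memory stand-in for a stream: one append-only list per partition."""
    partition_count: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.partition_count)]

    def _partition_for(self, key: str) -> int:
        # Assumption: a stable hash of the key selects the partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.partition_count

    def publish(self, key: str, value: str) -> tuple:
        """Append a message and return its (partition, offset)."""
        partition = self._partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, limit: int = 10) -> list:
        """Return up to `limit` (offset, key, value) tuples starting at `offset`."""
        log = self.partitions[partition]
        end = min(offset + limit, len(log))
        return [(i, *log[i]) for i in range(offset, end)]


stream = ToyStream(partition_count=3)
placements = [stream.publish("user-42", f"event-{n}") for n in range(4)]
partition = placements[0][0]

# Read two messages, remember the next offset, then resume from it later.
first_batch = stream.read(partition, offset=0, limit=2)
next_offset = first_batch[-1][0] + 1
resumed = stream.read(partition, offset=next_offset)
```

Because every message with the same key lands in the same partition in this model, a consumer that resumes from `next_offset` sees the remaining messages in the order they were published.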
The following concepts are essential to understanding and working with Streaming.
- stream
- A partitioned, append-only log of messages.
- stream pool
- A grouping that you can use to organize and manage streams, including any shared Kafka or security settings.
- partition
- A section of a stream. Partitions allow you to distribute a stream by splitting messages across multiple nodes. This also allows multiple consumers to read from a stream in parallel.
- cursor
- A pointer to a location in a stream. This location could be a pointer to a specific offset or time in a partition, or to a group's current location.
- message
- A Base64-encoded message that is published to a stream. Streaming is schema-agnostic and accepts any message format, including XML, JSON, CSV, and even compressed formats such as gzip. Producers and consumers should agree upon the message format.
- producer
- An entity that publishes messages to a stream.
- consumer
- An entity that reads messages from one or more streams.
- consumer group
- A set of instances which coordinate to consume messages from all partitions in a stream. At any given time, the messages from a specific partition can only be consumed by a single consumer in the group.
- instance
- A member of a consumer group. Instances are defined when a group cursor is created. Group membership is maintained through interaction; lack of interaction results in a timeout, removing the instance from the consumer group.
- key
- An identifier used to group related messages.
- offset
- The location of a message within a partition. Each message within the partition is identified by its offset. Consumers can read messages starting from any chosen offset. You can use the offset to restart reading from a stream if interrupted.
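Because messages are Base64-encoded and Streaming is schema-agnostic, producers and consumers need a shared convention for the payload. The sketch below assumes a JSON payload and uses only the Python standard library. The helper names and the `key` and `value` field names are illustrative assumptions, not a documented request format.

```python
import base64
import json


def encode_message(key: str, payload: dict) -> dict:
    """Serialize the payload as JSON, then Base64-encode the key and value."""
    raw_value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {
        # Field names are assumptions made for this sketch.
        "key": base64.b64encode(key.encode("utf-8")).decode("ascii"),
        "value": base64.b64encode(raw_value).decode("ascii"),
    }


def decode_message(message: dict) -> tuple:
    """Reverse encode_message: Base64-decode, then parse the JSON payload."""
    key = base64.b64decode(message["key"]).decode("utf-8")
    payload = json.loads(base64.b64decode(message["value"]))
    return key, payload


encoded = encode_message("device-7", {"temperature_c": 21.5})
decoded_key, decoded_payload = decode_message(encoded)
```

The producer and consumer sides only interoperate if both apply the same serialization step (JSON here) before and after the Base64 layer.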
Benefits of Streams
Streams have several advantages over traditional messaging queues, including:
- Configurable message persistence
- You control how long your data is retained. Messages in a stream are immutable and available for the entirety of the stream's configured retention time.
- Because a stream's messages are not removed immediately when processed by consumers, you can replay any and all messages in the stream at any time within the configured retention limit.
- Message guarantees
- Each message is guaranteed to be delivered at least once. In some cases, such as a consumer's failure to commit messages before going offline, messages may be delivered multiple times.
- Order guarantees
- Messages within a stream, per partition, are always delivered in the same order that they were produced.
- Client-side cursors
- Your client applications control and track which messages are read and can move the cursor as needed for maximum flexibility.
- Horizontal scale
- Partitions let you scale throughput horizontally: because messages are split across partitions, multiple consumers can read from a stream in parallel.
- Consumer groups
- Consumer groups handle all of the coordination that is required to deliver messages to multiple consumers in a balanced manner. Because this management is handled by a consumer group on behalf of all of its members, you can enjoy reduced overhead and operational ease.
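The at-least-once guarantee means a consumer can receive the same message again if it stops before committing its position. The toy model below shows one way to tolerate that by skipping offsets that were already processed. `ToyGroupCursor` and `consume` are hypothetical names, and the commit mechanics are simplified assumptions rather than the service's actual behavior.

```python
class ToyGroupCursor:
    """Tracks the committed position for one partition on behalf of a group."""

    def __init__(self):
        self.committed = 0  # next offset to deliver after a restart

    def fetch(self, log, limit):
        """Return (offset, value) pairs starting at the committed position."""
        return list(enumerate(log))[self.committed:self.committed + limit]

    def commit(self, next_offset):
        self.committed = next_offset


def consume(log, cursor, processed, seen, crash_before_commit=False, limit=3):
    """Process one batch idempotently and return the offsets delivered."""
    batch = cursor.fetch(log, limit)
    for offset, value in batch:
        if offset in seen:
            continue  # duplicate delivery: already handled, so skip it
        seen.add(offset)
        processed.append(value)
    if batch and not crash_before_commit:
        cursor.commit(batch[-1][0] + 1)
    return [offset for offset, _ in batch]


log = ["a", "b", "c", "d"]
cursor = ToyGroupCursor()
processed, seen = [], set()

first = consume(log, cursor, processed, seen, crash_before_commit=True)
second = consume(log, cursor, processed, seen)  # same batch is delivered again
third = consume(log, cursor, processed, seen)   # continues after the commit
```

In this run the first batch is delivered twice because the first attempt never committed, yet each value is processed only once. Since messages stay in the log until the retention period ends, moving `cursor.committed` back to an earlier offset would replay them.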