Streaming Service Overview
The Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real time.
Use Streaming for messaging, ingesting application logs, operational telemetry, web click-stream data, or any other use cases in which data is produced and processed continually and sequentially in a publish-subscribe messaging model.
Here's how Streaming works: a producer publishes messages to a stream, which is an append-only log. These messages are distributed among Oracle-managed partitions using the message's key for scalability.
Partitions allow you to distribute a stream by splitting messages across multiple nodes (or brokers). Each partition can be placed on a separate machine to allow multiple consumers to read a stream in parallel. Consumers can read from any partition regardless of where the partition is hosted. All partitions associated with a stream are deleted when the stream is deleted.
A consumer reads messages from one or more partitions. Each message within a stream is marked with an offset value, so a consumer can pick up where it left off if it is interrupted. Consumers can read messages individually, or as a member of a consumer group.
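The key-based distribution described above can be illustrated with a small in-memory sketch. The partitioning algorithm that the service actually uses is Oracle-managed and is not described here, so the hash-modulo scheme and the names `partition_for` and `publish` are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (hypothetical hash-modulo scheme)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def publish(log, key: bytes, value: bytes, num_partitions: int):
    """Append a message to its partition and return (partition, offset)."""
    partition = partition_for(key, num_partitions)
    log[partition].append((key, value))
    return partition, len(log[partition]) - 1


# partition index -> append-only list of (key, value) messages
log = defaultdict(list)
first = publish(log, b"user-42", b"page_view", num_partitions=3)
second = publish(log, b"user-42", b"search", num_partitions=3)
```

Because both messages share a key, they land in the same partition at consecutive offsets, which is what lets a consumer of that partition see them in the order they were produced.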
You can use Streaming for:
- Messaging
- Use Streaming to decouple the components of large systems. Producers and consumers can use Streaming as an asynchronous message bus and act independently and at their own pace.
- Metric and log ingestion
- Use Streaming as an alternative to traditional file-scraping approaches to help make critical operational data more quickly available for indexing, analysis, and visualization.
- Web or mobile activity data ingestion
- Use Streaming for capturing activity from websites or mobile apps, such as page views, searches, or other user actions. You can use this information for real-time monitoring and analytics, and in data warehousing systems for offline processing and reporting.
- Infrastructure and apps event processing
- Use Streaming as a unified entry point for cloud components to report their lifecycle events for audit, accounting, and related activities.
Benefits of Streams
Streams have several advantages over traditional messaging queues, such as:
- Configurable message persistence
- You control how long your data is retained. Messages in a stream are available for the entirety of the stream's configured retention time.
- Because a stream's messages are not removed immediately when processed by consumers, you can replay any and all messages in the stream at any time within the configured retention limit.
- Message guarantees
- Each message is guaranteed to be delivered at least once. In some cases, such as a consumer's failure to commit messages before going offline, messages may be delivered multiple times.
- Order guarantees
- Messages within a stream, per partition, are always delivered in the same order that they were produced.
- Client-side cursors
- Your client applications control and track which messages are read and can move the cursor as needed for maximum flexibility.
- Horizontal scale
- Partitions provide an opportunity to scale up throughput to meet the needs of multiple consumers, resulting in increased flexibility.
- Consumer groups
- Consumer groups handle all of the coordination that is required to deliver messages to multiple consumers in a balanced manner. Because this management is handled by a consumer group on behalf of all of its members, you can enjoy reduced overhead and operational ease.
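The at-least-once guarantee and the replay behavior listed above can be sketched with a toy model. This is not the service's implementation: the `Partition` class, its `committed` field, and the `consume` helper are hypothetical names, and the "crash" is simulated by skipping the commit.

```python
class Partition:
    """An append-only list of messages plus one group's committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message not yet committed

    def read(self, limit):
        """Return (offset, message) pairs starting at the committed offset."""
        start = self.committed
        return list(enumerate(self.messages[start:start + limit], start=start))

    def commit(self, next_offset):
        self.committed = next_offset


def consume(partition, seen, limit, crash_before_commit=False):
    """Process one batch idempotently; optionally 'crash' before committing."""
    batch = partition.read(limit)
    delivered = [offset for offset, _ in batch]
    for offset, _ in batch:
        seen.add(offset)  # idempotent: reprocessing the same offset is harmless
    if not crash_before_commit and batch:
        partition.commit(batch[-1][0] + 1)
    return delivered


partition = Partition(["a", "b", "c"])
seen = set()
first_attempt = consume(partition, seen, limit=2, crash_before_commit=True)
second_attempt = consume(partition, seen, limit=2)  # offsets 0 and 1 again
third_attempt = consume(partition, seen, limit=2)   # only offset 2 remains
```

The first two attempts deliver the same offsets because the first one never committed. That is why consumers are usually written so that processing a message twice is harmless.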
The following concepts are essential to understanding and working with Streaming.
- stream
- A partitioned, append-only log of messages.
- stream pool
- A grouping that you can use to organize and manage streams, including any shared Kafka or security settings.
- partition
- A section of a stream. Partitions allow you to distribute a stream by splitting messages across multiple nodes. Each partition is typically placed on a separate virtual machine in order to provide durability of data. This also allows multiple consumers to read from a stream in parallel.
- cursor
- A pointer to a location in a stream. This location could be a pointer to a specific offset or time in a partition, or to a group's current location.
- message
- A Base64-encoded message that is published to a stream.
- producer
- An entity that publishes messages to a stream.
- consumer
- An entity that reads messages from one or more streams.
- consumer group
- A set of instances which coordinates messages from all of the partitions in a stream. At any given time, the messages from a specific partition can only be consumed by a single consumer in the group.
- instance
- A member of a consumer group. Instances are defined when a group cursor is created. Group membership is maintained through interaction; lack of interaction results in a timeout, removing the instance from the consumer group.
- key
- An identifier used to group related messages.
- offset
- The location of a message within a partition. Each message within the partition is identified by its offset. Consumers can read messages starting from any chosen offset. You can use the offset to restart reading from a stream if interrupted.
- You can use IAM to set permissions on the following operations: list, get, update, create, and delete streams.
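Since messages are Base64-encoded and can carry a key, a producer typically encodes both before publishing and a consumer decodes them after reading. The sketch below shows that round trip using only the Python standard library. The field names `key` and `value` and the `messages` wrapper are assumptions about the request shape, not something this page specifies; check the Streaming API reference before relying on them.

```python
import base64
import json


def encode_message(key: str, value: str) -> dict:
    """Base64-encode a key and value for publishing (field names assumed)."""
    return {
        "key": base64.b64encode(key.encode("utf-8")).decode("ascii"),
        "value": base64.b64encode(value.encode("utf-8")).decode("ascii"),
    }


def decode_message(message: dict) -> tuple:
    """Reverse encode_message: return the original (key, value) strings."""
    key = base64.b64decode(message["key"]).decode("utf-8")
    value = base64.b64decode(message["value"]).decode("utf-8")
    return key, value


# Hypothetical request body: a JSON object wrapping a list of messages.
body = json.dumps({"messages": [encode_message("user-42", "page_view")]})
round_trip = decode_message(json.loads(body)["messages"][0])
```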
Streaming provides the following features:
- Fully managed
- Streaming is fully managed, from the underlying infrastructure to its provisioning, deployment, maintenance, security patching, and replication. Integration with Monitoring and default metrics make operations easy.
Oracle manages stream partitions and consumer groups can handle your message offsets.
- Durability and Availability
- Messages published to the Streaming service are synchronously replicated across three availability domains when available. In regions with a single availability domain, the data is replicated across multiple fault domains. This ensures that even the failure of an availability domain or fault domain does not result in data loss. The result is highly durable data.
Oracle Cloud Infrastructure provides a service-level agreement (SLA) for Streaming. Refer to the Oracle Cloud Infrastructure Service Level Agreement page for details.
- Security
Streaming data is encrypted both at rest and in transit, ensuring message integrity. You can let Oracle manage encryption, or use the Oracle Cloud Infrastructure Vault service to securely store and manage your own encryption keys if you need to meet specific compliance or security standards.
Integration with Oracle Cloud Infrastructure Identity and Access Management (IAM) lets you control who and what services can access which keys and what they can do with those resources.
Private endpoints restrict access to a specified virtual cloud network (VCN) within your tenancy so that its streams cannot be accessed through the internet.
For more information, see Stream Security.
- Stream processing
- Streaming's integration with Oracle Cloud Infrastructure Service Connector Hub means that you can designate a stream as a data source, use Oracle Cloud Infrastructure Functions to transform the stream's messages, and output the transformed messages to Object Storage or any other supported Service Connector Hub target while maintaining Streaming's order guarantees.
- Kafka compatibility
- Streaming makes it possible to offload the setup, maintenance, and management of the infrastructure that hosting your own Apache Kafka cluster requires.
Streaming is compatible with most Kafka APIs, allowing you to use applications written for Kafka to send messages to and receive messages from the Streaming service without having to rewrite your code. See Using Kafka APIs for more information.
Streaming also takes advantage of the Kafka Connect ecosystem to interface directly with first-party and third-party products by using out-of-the-box Kafka source and sink connectors. See Using Kafka Connect for more information.
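As a rough illustration of what pointing an existing Kafka client at Streaming might involve, the sketch below assembles a set of client properties. The property names are standard Apache Kafka client settings. The endpoint pattern, the port, the use of SASL_SSL with the PLAIN mechanism, and the `tenancy/user/stream pool OCID` username format are assumptions that this page does not confirm, so verify them against Using Kafka APIs. The function name and all argument values are hypothetical placeholders.

```python
def kafka_client_properties(region, tenancy_name, username,
                            stream_pool_ocid, auth_token):
    """Build Kafka client properties for Streaming (values are assumptions)."""
    # Assumed endpoint pattern; confirm the real bootstrap server in the docs.
    bootstrap = f"cell-1.streaming.{region}.oci.oraclecloud.com:9092"
    # Assumed credential format: tenancy, user, and stream pool joined by "/".
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{tenancy_name}/{username}/{stream_pool_ocid}" '
        f'password="{auth_token}";'
    )
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


props = kafka_client_properties(
    region="example-region-1",            # placeholder
    tenancy_name="exampletenancy",        # placeholder
    username="example.user@example.com",  # placeholder
    stream_pool_ocid="ocid1.streampool.oc1..example",  # placeholder
    auth_token="EXAMPLE_TOKEN",           # placeholder; never hard-code secrets
)
```

Keeping the connection details in one function like this makes it easier to swap in values from a secrets store rather than embedding credentials in application code.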
Ways to Access Streaming
You can access Streaming using any of the following options, based on your preference and use case.
- Oracle Cloud Infrastructure REST APIs provide the most functionality, but require programming expertise. API Reference and Endpoints provides endpoint details and links to the available API reference documents. For general information about using the API, see REST APIs. The Streaming service is accessible with the Streaming API.
Tip: Because Streaming is compatible with the Apache Kafka API, applications written for Kafka can also access Streaming.
- Oracle Cloud Infrastructure provides SDKs so that you can interact with Streaming without having to create a framework. Basic Streaming usage examples are included with our SDKs. For more information about using the SDKs, see the SDK Guides.
- The command line interface (CLI) provides both quick access and full functionality without the need for programming. For more information, see Using the CLI.
- The Console is an easy-to-use, browser-based interface. You can use the Console to create and manage streams, stream pools, and Kafka Connect configurations, but you cannot publish or consume messages using the Console.
To access the Console, you must use a supported browser. To go to the Console sign-in page, open the navigation menu at the top of this page and click Infrastructure Console. You will be prompted to enter your cloud tenant, your user name, and your password.
To get started with Streaming, see the following topics:
- For instructions on how to create and manage streams, see Managing Streams and Managing Stream Pools.
- For information about publishing messages to a stream, see Publishing Messages.
- For information on how to consume messages, see Consuming Messages.
- For information on Apache Kafka compatibility, see Using Streaming with Apache Kafka.
- For API reference documentation, see Streaming API.
- For SDK and CLI information, see Software Development Kits and Command Line Interface.
Authentication and Authorization
Each service in Oracle Cloud Infrastructure integrates with IAM for authentication and authorization, for all interfaces (the Console, SDK or CLI, and REST API).
An administrator in your organization needs to set up groups, compartments, and policies that control which users can access which services, which resources, and the type of access. For example, the policies control who can create new users, create and manage the cloud network, launch instances, create buckets, download objects, etc. For more information, see Getting Started with Policies. For specific details about writing policies for each of the different services, see Policy Reference.
If you’re a regular user (not an administrator) who needs to use the Oracle Cloud Infrastructure resources that your company owns, contact your administrator to set up a user ID for you. The administrator can confirm which compartment or compartments you should be using.
For common policies used to authorize Streaming users, see Common Policies.
For in-depth information on granting users permissions for the Streaming service, see Details for the Streaming Service in the IAM policy reference.
Limits on Streaming Resources
The Streaming service has the following limits:
- The maximum retention period for messages in a stream is seven days. The minimum retention period is 24 hours. All messages in a stream are deleted after the retention period passes, whether or not they have been read.
- The retention period for a stream cannot be changed after creation of the stream.
- A tenancy has a default limit of five partitions (Monthly Universal Credits) or zero partitions (Pay-as-You-Go or Promo). If your throughput requires additional partitions, you can request more.
- The number of partitions for a stream cannot be changed after creation of the stream.
- A single stream can support up to 50 consumer groups reading from the stream.
- Each partition can support:
- A total data write rate of 1 MB per second. There is no limit on the number of PUT requests, provided the limit of 1 MB per second per partition is not exceeded.
- 5 GET requests per second per consumer group. Since a single stream can support up to 50 consumer groups, and a single partition in a stream can be read by at most one consumer in a consumer group, a partition can support up to 250 GET requests per second (5 GET requests per second per consumer in all 50 consumer groups).
- The maximum size of a unique message that producers can publish to a stream is 1 MB.
See Service Limits for a list of applicable limits and instructions for requesting a limit increase. To set compartment-specific limits on a resource or resource family, administrators can use compartment quotas.
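The limits above lend themselves to a quick capacity check before you create a stream, which matters because the partition count and retention period cannot be changed afterward. The sketch below encodes the stated limits as constants. The helper names are hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption.

```python
import math

MAX_WRITE_MB_PER_SEC_PER_PARTITION = 1
MAX_GETS_PER_SEC_PER_GROUP = 5
MAX_CONSUMER_GROUPS_PER_STREAM = 50
MIN_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 7 * 24
MAX_MESSAGE_BYTES = 1024 * 1024  # assumes 1 MB = 1,048,576 bytes


def partitions_needed(write_mb_per_sec: float) -> int:
    """Smallest partition count whose combined write limit covers the rate."""
    return max(1, math.ceil(write_mb_per_sec / MAX_WRITE_MB_PER_SEC_PER_PARTITION))


def max_gets_per_partition(consumer_groups: int) -> int:
    """GET requests per second one partition supports for this many groups."""
    if not 0 <= consumer_groups <= MAX_CONSUMER_GROUPS_PER_STREAM:
        raise ValueError("a stream supports at most 50 consumer groups")
    return consumer_groups * MAX_GETS_PER_SEC_PER_GROUP


def validate_stream_request(retention_hours: int, message_bytes: int) -> list:
    """Return a list of limit violations (empty when the request fits)."""
    problems = []
    if not MIN_RETENTION_HOURS <= retention_hours <= MAX_RETENTION_HOURS:
        problems.append("retention must be between 24 and 168 hours")
    if message_bytes > MAX_MESSAGE_BYTES:
        problems.append("message exceeds 1 MB")
    return problems
```

For example, `partitions_needed(3.5)` returns 4, and `max_gets_per_partition(50)` returns 250, matching the per-partition figure given above. The estimate covers write throughput only; if keys are unevenly distributed, individual partitions may still reach their limit sooner.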