1 Introduction to Oracle CQL

Oracle Continuous Query Language (Oracle CQL) is a SQL based query language, with added constructs that support streaming data. Using Oracle CQL, you can express queries on data streams using GoldenGate Stream Analytics (GGSA).

1.1 Fundamentals of Oracle CQL

Databases are best equipped to run queries over finite stored data sets. However, many modern applications require long-running queries over continuous unbounded sets of data. By design, a stored data set is appropriate when significant portions of the data are queried repeatedly and updates are relatively infrequent. In contrast, data streams represent data that is changing constantly, often exclusively through insertions of new elements. It is either unnecessary or impractical to operate on large portions of the data multiple times.

Many types of applications generate data streams as opposed to data sets, including sensor data applications, financial tickers, network performance measuring tools, network monitoring and traffic management applications, and clickstream analysis tools. Managing and processing data for these types of applications involves building data management and querying capabilities with a strong temporal focus.

To address this requirement, Oracle introduces GoldenGate Stream Analytics (GGSA), a data management infrastructure that supports the notion of streams of structured data records together with stored relations.

To provide a uniform declarative framework, Oracle offers Oracle Continuous Query Language (Oracle CQL), a query language based on SQL with added constructs that support streaming data.

Oracle CQL is designed to be:

  • Scalable with support for a large number of queries over continuous streams of data and traditional stored data sets.

  • Comprehensive to deal with complex scenarios. For example, through composability, you can create various intermediate views for querying.

Using GGSA, you can define event adapters for a variety of data sources including JMS, relational database tables, and files in the local file system. You can connect multiple input channels to an Oracle CQL processor and you can connect an Oracle CQL processor to multiple output channels. You can connect an output channel to another Oracle CQL processor, to an adapter, to a cache, or an event Bean.

For more information on these elements, see:

1.1.1 Streams and Relations

Using Oracle CQL, you can perform the following operations with streams and relations:

1.1.1.1 Streams

A stream is the principle source of data that Oracle CQL queries act on.

Stream S is a bag (or multi-set) of elements (s,T) where s is in the schema of S and T is in the time domain.

Stream elements are tuple-timestamp pairs, which can be represented as a sequence of timestamped tuple insertions. In other words, a stream is a sequence of timestamped tuples. There could be more than one tuple with the same timestamp. The tuples of an input stream are required to arrive at the system in the order of increasing timestamps. For more information, see Time.

A stream has an associated schema consisting of a set of named attributes, and all tuples of the stream conform to the schema.

The term "tuple of a stream" denotes the ordered list of data portion of a stream element, excluding timestamp data (the s of <s,t>). The following example shows how a stock ticker data stream might appear, where each stream element is made up of <timestamp value>, <stock symbol>, and <stock price>:

...
<timestampN>    NVDA,4
<timestampN+1>  ORCL,62
<timestampN+2>  PCAR,38
<timestampN+3>  SPOT,53
<timestampN+4>  PDCO,44
<timestampN+5>  PTEN,50
...

In the stream element <timestampN+1>  ORCL,62, the tuple is ORCL,62.

By definition, a stream is unbounded.

1.1.1.2 Relations and GoldenGate Stream Analytics Tuple Kind Indicator

By default, GGSA includes time stamp and a GGSA tuple kind indicator in the relations it generates.

Timestamp   Tuple Kind  Tuple
 1000:      +           ,abc,abc
 2000:      +           hihi,abchi,hiabc
 6000:      -           ,abc,abc
 7000:      -           hihi,abchi,hiabc
 8000:      +           hi1hi1,abchi1,hi1abc
 9000:      +           ,abc,abc
13000:      -           hi1hi1,abchi1,hi1abc
14000:      -           ,abc,abc
15000:      +           xyzxyz,abcxyz,xyzabc
20000:      -           xyzxyz,abcxyz,xyzabc

The GGSA tuple kind indicators are:

  • + for inserted tuple

  • - for deleted tuple

  • U for updated tuple

1.1.2 Stream-to-Relation Operators (Windows)

Oracle CQL supports stream-to-relation operations based on a sliding window. In general, S[W] is a relation. At time T the relation contains all tuples in window W applied to stream S up to T.

Queries that have the same source (stream) and window specifications are optimized by the system to share common memory space. When a new query is added with these parameters, it automatically receives the content (events) of this shared window. This optimization can cause the query to output initial events even though it might not have received newly added events.

window_type::=

Figure 1-1 window_type

window type

Oracle CQL supports the following built-in window types:

  • Range: time-based

    S[Range T], or, optionally,

    S[Range T1 Slide T2]

  • Range: time-based unbounded

    S[Range Unbounded]

  • Range: time-based now

    S[Now]

  • Range: constant value

    S[Range C on ID]

  • Tuple-based:

    S[Rows N], or, optionally,

    S[Rows N1 Slide N2]

  • Partitioned:

    S[Partition By A1 ... Ak Rows N] or, optionally,

    S[Partition By A1 ... Ak Rows N Range T], or

    S[Partition By A1 ... Ak Rows N Range T1 Slide T2]

This section describes the following stream-to-relation operator properties:

For more information, see:

1.1.2.1 Range, Rows, and Slide

The keywords Range and Rows specify how much data you want to query:

  • Range specifies as many tuples as arrive in a given time period

  • Rows specifies a number of tuples

The Slide keyword specifies how frequently you want to see output from the query, while the Range keyword specifies the time range from which to query events. Using Range and Slide together results in a set of events from which to query, with that set changing based on where the query window slides to.

So the set time is the time from which events get drawn for the query.So the time interval is the actual amount of time (as measured by event timestamps) divided by the amount of time specified for sliding. If the remainder from this is 0, then the set time is the time interval multiplied by the amount of time specified for the slide. If the remainder is greater than 0, then the set time is the time interval + 1 multiplied by the amount of time specified for the slide.

Another way to express this is the following formula: timeInterval = actualTime / slideSpecification if((actualTime % slideSpecification) == 0) // No remainder setTime = timeInterval * slideSpecification else setTime = (timeInterval + 1) * slideSpecification.

In Figure 1-2, the Range specification indicates "I want to look at 4 seconds worth of data" and the Slide specification indicates "I want a result every 4 seconds". In this case, the query returns a result at the end of each Slide specification (except for certain conditions, as Range, Rows, and Slide at Query Start-Up and for Empty Relations describes).

Figure 1-2 Range and Slide: Equal (Steady-State Condition)

Description of Figure 1-2 follows
Description of "Figure 1-2 Range and Slide: Equal (Steady-State Condition)"

In Figure 1-2, the Range specification indicates "I want to look at 8 seconds worth of data" and the Slide specification indicates "I want a result every 4 seconds". In this case, the query returns a result twice during each Range specification (except for certain conditions, as Range, Rows, and Slide at Query Start-Up and for Empty Relations describes)

Figure 1-3 Range and Slide: Different (Steady-State Condition)

Description of Figure 1-3 follows
Description of "Figure 1-3 Range and Slide: Different (Steady-State Condition)"

Table 1-1 lists the default Range, Range unit, and Slide (where applicable) for range-based and tuple-based stream-to-relation window operators:

Table 1-1 Default Range and Tuple-Based Stream-to-Relation Operators

Window Operator Default Range Default Range Unit Default Slide

Range-Based Stream-to-Relation Window Operators

Unbounded

seconds

1 nanosecond

Tuple-Based Stream-to-Relation Window Operators

N/A

N/A

1 tuple

1.1.2.1.1 Range, Rows, and Slide at Query Start-Up and for Empty Relations

Table 1-2 lists the behavior of Range, Rows, and Slide for special cases such as query start-up time and for an empty relation.

Table 1-2 Range, Rows, and Slide at Query Start-Up and Empty Relations

Operator or Function Result

COUNT(*) or COUNT(expression)

Immediately returns 0 for an empty relation (when there is no GROUP BY), before Range or Rows worth of data has accumulated and before the first Slide.

SUM(attribute) and other aggregate functions

Immediately returns null for an empty relation, before Range or Rows worth of data has accumulated and before the first Slide.

For more information and detailed examples, see:

1.1.2.2 Partition

The keyword Partition By logically separates an event stream S into different substreams based on the equality of the attributes given in the Partition By specification. For example, the S[Partition By A,C Rows 2] partition specification creates a sub-stream for every unique combination of A and C value pairs and the Rows specification is applied on these sub-streams. The Rows specification indicates "I want to look at 2 tuples worth of data".

For more information, see Range, Rows, and Slide.

1.1.2.3 Default Stream-to-Relation Operator

When you reference a stream in an Oracle CQL query where a relation is expected (most commonly in the from clause), a Range Unbounded window is applied to the stream by default. For example, the queries in the following examples are identical:

<query id="q1"><![CDATA[ 
    select * from InputChannel
]]></query>
<query id="q1"><![CDATA[ 
    IStream(select * from InputChannel[RANGE UNBOUNDED])
]]></query>

1.1.3 Stream-to-Stream Operators

Typically, you perform stream to stream operations using the following:

For more information, see:

In addition, Oracle CQL supports the following direct stream-to-stream operator:

  • MATCH_RECOGNIZE: use this clause to write various types of pattern recognition queries on the input stream. For more information, see Pattern Recognition.

1.1.4 Pattern Recognition

The Oracle CQL MATCH_RECOGNIZE construct is the principle means of performing pattern recognition.

A sequence of consecutive events or tuples in the input stream, each satisfying certain conditions constitutes a pattern. The pattern recognition functionality in Oracle CQL allows you to define conditions on the attributes of incoming events or tuples and to identify these conditions by using String names called correlation variables. The pattern to be matched is specified as a regular expression over these correlation variables and it determines the sequence or order in which conditions should be satisfied by different incoming tuples to be recognized as a valid match.

For more information, see Pattern Recognition With MATCH_RECOGNIZE.

1.1.5 Functions

Functions are similar to operators in that they manipulate data items and return a result. Functions differ from operators in the format of their arguments. This format enables them to operate on zero, one, two, or more arguments:

function(argument, argument, ...) 

A function without any arguments is similar to a pseudocolumn (refer to Pseudocolumns). However, a pseudocolumn typically returns a different value for each tuple in a relation, whereas a function without any arguments typically returns the same value for each tuple.

Oracle CQL provides a wide variety of built-in functions to perform operations on stream data, including:

  • single-row functions that return a single result row for every row of a queried stream or view

  • aggregate functions that return a single aggregate result based on group of tuples, rather than on a single tuple

  • single-row statistical and advanced arithmetic operations based on the Colt open source libraries for high performance scientific and technical computing.

  • aggregate statistical and advanced arithmetic operations based on the Colt open source libraries for high performance scientific and technical computing.

  • statistical and advanced arithmetic operations based on the java.lang.Math class

If you call an Oracle CQL function with an argument of a data type other than the data type expected by the Oracle CQL function, then GGSA attempts to convert the argument to the expected data type before performing the Oracle CQL function.

Oracle CQL provides a variety of built-in single-row functions and aggregate functions based on the Colt open source libraries for high performance scientific and technical computing. The functions which are available as part of Colt library will not support Big Decimal data type and NULL input values. Also the value computation of the functions are not incremental. See the COLT website for details.

Note:

Function names are case sensitive:

  • Built-in functions: lower case.

For more information, see:

1.1.6 Time

Timestamps are an integral part of a GGSA stream. However, timestamps do not necessarily equate to clock time. For example, time may be defined in the application domain where it is represented by a sequence number. Timestamps need only guarantee that updates arrive at the system in the order of increasing timestamp values.

Note that the timestamp ordering requirement is specific to one stream or a relation. For example, tuples of different streams could be arbitrarily interleaved. The order of processing tuples with the same time-stamps is not guaranteed in the case where multiple streams are processing. In addition, there is no defined behavior for negative timestamps. For t = 0, the event will be outputted immediately, assuming total order.

GGSA can observe the processing time and event time.

For system timestamped relations or streams, time is dependent upon the arrival of data on the relation or stream data source. GGSA generates a heartbeat on a system timestamped relation or stream if there is no activity (no data arriving on the stream or relation's source) for more than a specified time: for example, 1 minute. Either the relation or stream is populated by its specified source or GGSA generates a heartbeat every minute. This way, the relation or stream can never be more than 1 minute behind.

For system timestamped streams and relations, the system assigns time in such a way that no two events have the same value of time. However, for application timestamped streams and relations, events could have same value of time.

1.2 Oracle CQL and SQL Standards

Oracle CQL is a new technology but it is based on a subset of SQL99.

Oracle strives to comply with industry-accepted standards and participates actively in SQL standards committees. Oracle is actively pursuing Oracle CQL standardization.