15 Pattern Recognition With MATCH_RECOGNIZE

The Oracle CQL MATCH_RECOGNIZE construct and its sub-clauses perform pattern recognition in Oracle CQL queries.

15.1 Understanding Pattern Recognition With MATCH_RECOGNIZE

Pattern recognition functionality is provided in Oracle CQL using the MATCH_RECOGNIZE construct.

A sequence of consecutive events or tuples in the input stream, each satisfying certain conditions constitutes a pattern. The pattern recognition functionality in Oracle CQL allows you to define conditions on the attributes of incoming events or tuples and to identify these conditions by using String names called correlation variables (Section 15.3, "DEFINE Clause"). The pattern to be matched is specified as a regular expression over these correlation variables and it determines the sequence or order in which conditions should be satisfied by different incoming tuples to be recognized as a valid match (see Section 15.7, "PATTERN Clause"). The use of regular expressions lends increased expressibility while specifying the pattern to be recognized. You can also perform computations over the attributes of the tuples that match the pattern specification and use them in the SELECT clause of the query of which MATCH_RECOGNIZE is a part (see Section 15.5, "MEASURES Clause"). Additional clauses such as ALL MATCHES, PARTITION BY, and DURATION give you more control over the way the pattern recognition is performed over the input stream.

The output of a MATCH_RECOGNIZE query is always a stream.

pattern_recognition_clause::=

(pattern_partition_clause::=, pattern_measures_clause::=, pattern_def_dur_clause::=)

pattern_def_dur_clause::=

(pattern_clause::=, pattern_skip_match_clause::=, pattern_definition_clause::=, duration_clause::=, subset_clause::=)

Example 15-1 shows a typical MATCH_RECOGNIZE condition in a query. The query will return the MEASURES clause values in its select statement when DEFINE clause conditions are satisfied as constrained by the PATTERN clause.

Example 15-1 Pattern Matching Conditions

<query id="detectPerish"><![CDATA[ 
  select its.itemId
  from tkrfid_ItemTempStream MATCH_RECOGNIZE (
      PARTITION BY itemId
      MEASURES A.itemId as itemId
      PATTERN (A B* C)
      DEFINE
          A  AS  (A.temp >= 25),
          B  AS  ((B.temp >= 25) and (to_timestamp(B.element_time) - to_timestamp(A.element_time) < INTERVAL "0 00:00:05.00" DAY TO SECOND)),
          C  AS  (to_timestamp(C.element_time) - to_timestamp(A.element_time) >= INTERVAL "0 00:00:05.00" DAY TO SECOND)
  ) as its
]]></query>

The MATCH_RECOGNIZE construct provides the following sub-clauses:

Section 15.2, "ALL MATCHES Clause"
Section 15.3, "DEFINE Clause"
Section 15.4, "DURATION Clause"
Section 15.5, "MEASURES Clause"
Section 15.6, "PARTITION BY Clause"
Section 15.7, "PATTERN Clause"
Section 15.8, "SUBSET Clause"

For more examples, see Section 15.9, "Examples".

15.2 ALL MATCHES Clause

Use this optional clause to configure Oracle CEP to match overlapping patterns. Omitting the ALL MATCHES clause configures Oracle CEP to match only one pattern.

pattern_skip_match_clause::=

Consider the query tkpattern_q41 in Example 15-2 that uses ALL MATCHES and the data stream tkpattern_S11 in Example 15-3. Stream tkpattern_S11 has schema (c1 integer, c2 integer). The query returns the stream in Example 15-4.

The query tkpattern_q41 in Example 15-2 will report a match when the input stream values, when plotted, form the shape of the English letter W. The relation in Example 15-4 shows an example of overlapping instances of this W-pattern match.

There are two types of overlapping pattern instances:

Total: Example of total overlapping: Rows from time 3000-9000 and 4000-9000 in the input, both match the given pattern expression. Here the longest one (3000-9000) will be preferred if ALL MATCHES clause is not present.
Partial: Example of Partial overlapping: Rows from time 12000-21000 and 16000-23000 in the input, both match the given pattern expression. Here the one which appears earlier is preferred when ALL MATCHES clause is not present. This is because when ALL MATCHES clause is omitted, we start looking for the next instance of pattern match at a tuple which is next to the last tuple in the previous matched instance of the pattern.

Example 15-2 ALL MATCHES Clause Query

<query id="tkpattern_q41"><![CDATA[ 
    select 
        T.firstW, T.lastZ 
    from 
        tkpattern_S11 
    MATCH_RECOGNIZE ( 
        MEASURES A.c1 as firstW, last(Z.c1) as lastZ 
        ALL MATCHES 
        PATTERN(A W+ X+ Y+ Z+) 
        DEFINE 
            W as W.c2 < prev(W.c2), 
            X as X.c2 > prev(X.c2), 
            Y as Y.c2 < prev(Y.c2), 
            Z as Z.c2 > prev(Z.c2)
    ) as T
]]></query>

Example 15-3 ALL MATCHES Clause Stream Input

Timestamp   Tuple
 1000         1,8
 2000         2,8
 3000         3,8
 4000         4,6
 5000         5,3
 6000         6,7
 7000         7,6
 8000         8,2
 9000         9,6
10000        10,2
11000        11,9
12000        12,9
13000        13,8
14000        14,5
15000        15,0
16000        16,9
17000        17,2
18000        18,0
19000        19,2
20000        20,3
21000        21,8
22000        22,5
23000        23,9
24000        24,9
25000        25,4
26000        26,7
27000        27,2
28000        28,8
29000        29,0
30000        30,4
31000        31,4
32000        32,7
33000        33,8
34000        34,6
35000        35,4
36000        36,5
37000        37,1
38000        38,7
39000        39,5
40000        40,8
41000        41,6
42000        42,6
43000        43,0
44000        44,6
45000        45,8
46000        46,4
47000        47,3
48000        48,8
49000        49,2
50000        50,5
51000        51,3
52000        52,3
53000        53,9
54000        54,8
55000        55,5
56000        56,5
57000        57,9
58000        58,7
59000        59,3
60000        60,3

Example 15-4 ALL MATCHES Clause Stream Output

Timestamp   Tuple Kind  Tuple
 9000:      +      3,9
 9000:      +      4,9
11000:      +      6,11
11000:      +      7,11
19000:      +     12,19
19000:      +     13,19
19000:      +     14,19
20000:      +     12,20
20000:      +     13,20
20000:      +     14,20
21000:      +     12,21
21000:      +     13,21
21000:      +     14,21
23000:      +     16,23
23000:      +     17,23
28000:      +     24,28
30000:      +     26,30
38000:      +     33,38
38000:      +     34,38
40000:      +     36,40
48000:      +     42,48
50000:      +     45,50
50000:      +     46,50

As Example 15-4 shows, the ALL MATCHES clause reports all the matched pattern instances on receiving a particular input. For example, at time 20000, all of the tuples {12,20}, {13,20}, and {14,20} are output.

15.3 DEFINE Clause

Use this clause to define one or more conditions on the tuples of the underlying base stream. You refer to the conditions by using correlation names and variables such as A, B, and C in Example 15-1. You specify the pattern to be recognized as a regular expression over these correlation variables in the PATTERN clause of MATCH_RECOGNIZE condition(Section 15.7, "PATTERN Clause").

On receiving a new tuple from the base stream, the conditions of the correlation variables that are relevant at that point in time are evaluated. A tuple is said to have matched a correlation variable if it satisfies its defining condition. It is straight forward to see that a particular input can match zero, one or more correlation variables. The relevant conditions to be evaluated on receiving an input are determined by logic governed by the PATTERN clause regular expression and the state in pattern recognition process that we have reached after processing the earlier inputs.

The condition can refer to any of the attributes of the schema of the stream or view that evaluates to a stream on which the MATCH_RECOGNIZE clause is being applied.

You can refer to the attributes of a base stream directly, such as c1 < 20, or using a correlation variable (condition), such as A.c1 < 20, while defining the correlation variables.

When you refer to the attributes directly, a tuple that last matched any of the correlation variables is consulted for evaluation. Note that the definitions DEFINE A as c1 < 20 and DEFINE A as A.c1 < 20 both refer to c1 in the same tuple which is the latest input tuple. This is because on receiving an input we evaluate the condition of a correlation variable assuming that the latest input matches that correlation variable.

When you refer to the attributes using a correlation variable, such as A, the tuple which matched A last is consulted for evaluation.

A definition of one correlation variable can refer to another correlation variable, such as DEFINE B as A.c1 > 20. Here A.c1 refers the value of c1 in the tuple that last matched A. Using the PREV function, you can refer to the earlier tuples that matched the same correlation variable. For more information, see "prev".

You can also use functions over the correlation variables while defining them.

pattern_definition_clause::=

(non_mt_corrname_definition_list::=)

non_mt_corrname_definition_list::=

(correlation_name_definition::=)

correlation_name_definition::=

(correlation_name::=, non_mt_cond_list::=)

In Example 15-1, the pattern_definition_clause is:

DEFINE
    A  AS  (A.temp >= 25),
    B  AS  ((B.temp >= 25) and (to_timestamp(B.element_time) - to_timestamp(A.element_time) < INTERVAL "0 00:00:05.00" DAY TO SECOND)),
    C  AS  (to_timestamp(C.element_time) - to_timestamp(A.element_time) >= INTERVAL "0 00:00:05.00" DAY TO SECOND)

If you specify a correlation name that is not defined in the DEFINE clause, it is considered to be true for every input. As Example 15-5 shows, because correlation name A is true for every input, it is not defined in the DEFINE clause. It is an error to define a correlation name which is not used in a PATTERN clause.

Example 15-5 Undefined Correlation Name

<query id="q"><![CDATA[ 
    SELECT
        T.firstW,
        T.lastZ
    FROM
        S2
    MATCH_RECOGNIZE (
        MEASURES
            A.c1 as firstW,
            last(Z) as lastZ
        PATTERN(A W+ X+ Y+ Z+)
        DEFINE 
            W as W.c2 < prev(W.c2),
            X as X.c2 > prev(X.c2),
            Y as Y.c2 < prev(Y.c2),
            Z as Z.c2 > prev(Z.c2)
    ) as T
]]></query>

For more information, see Section 15.7, "PATTERN Clause".

15.4 DURATION Clause

The DURATION clause is an optional clause that you should use only when you are writing a query involving non-event detection. Non-event detection is the detection of a situation when a certain event which should have occurred in a particular time limit does not occur in that time frame.

duration_clause::=

(time_unit::=)

Using this clause, a match is reported only when the regular expression in the PATTERN clause is matched completely and no other event or input arrives until the duration specified in the DURATION clause expires. The duration is measured from the time of arrival of the first event in the pattern match.

This section describes:

Section 15.4.1, "Using the DURATION Clause for Fixed Duration Non-Event Detection"
Section 15.4.2, "Using the DURATION Clause for Recurring Non-Event Detection"

15.4.1 Using the DURATION Clause for Fixed Duration Non-Event Detection

The duration can be specified as a constant value, such as 10. Optionally, you may specify a time unit such as seconds or minutes (see time_unit::=); the default time unit is seconds.

Consider the query tkpattern_q59 in Example 15-6 that uses DURATION 10 to specify a delay of 10 s (10000 ms) and the data stream tkpattern_S19 in Example 15-7. Stream tkpattern_S19 has schema (c1 integer). The query returns the stream in Example 15-8.

Example 15-6 MATCH_RECOGNIZE with Fixed Duration DURATION Clause Query

<query id="BBAQuery"><![CDATA[ 
    select 
        T.p1, T.p2 
    from 
        tkpattern_S19 
    MATCH_RECOGNIZE ( 
        MEASURES A.c1 as p1, B.c1 as p2 
        include timer events 
        PATTERN(A B*) 
        duration 10 
        DEFINE A as A.c1 = 10, B as B.c1 != A.c1
    ) as T
]]></query>

Example 15-7 MATCH_RECOGNIZE with Fixed Duration DURATION Clause Stream Input

Timestamp   Tuple
 1000        10
 4000        22
 6000       444
 7000        83
 9000        88
11000        12
11000        22
11000        15
12000        13
15000        10
27000        11
28000        10
30000        18
40000        10
44000        19
52000        10
h 100000

Example 15-8 MATCH_RECOGNIZE with Fixed DURATION Clause Stream Output

Timestamp   Tuple Kind  Tuple
11000:      +           10,8825000:      +           10,38000:      +           10,1850000:      +           10,1962000:      +           10,

The tuple at time 1000 matches A.

Since the duration is 10 we output a match as soon as input at time 1000+10000=11000 is received (the one with the value 12). Since the sequence of tuples from 1000 through 9000 match the pattern AB* and nothing else a match is reported as soon as input at time 11000 is received.

The next match starts at 15000 with the tuple at that time matching A. The next tuple that arrives is at 27000. So here also we have tuples satisfying pattern AB* and nothing else and hence a match is reported at time 15000+10000=25000. Further output is generated by following similar logic.

For more information, see "Fixed Duration Non-Event Detection".

15.4.2 Using the DURATION Clause for Recurring Non-Event Detection

When you specify a MULTIPLES OF clause, it indicates recurring non-event detection. In this case an output is sent at the multiples of duration value as long as there is no event after the pattern matches completely.

Consider the query tkpattern_q75 in Example 15-9 that uses DURATION MULTIPLES OF 10 to specify a delay of 10 s (10000 ms) and the data stream tkpattern_S23 in Example 15-10. Stream tkpattern_S23 has schema (c1 integer). The query returns the stream in Example 15-11.

tkpattern.cqlx, /data/inpPattern23.txt, log/patternout75.txt

Example 15-9 MATCH_RECOGNIZE with Variable Duration DURATION MULTIPLES OF Clause Query

<query id="tkpattern_q75"><![CDATA[ 
    select 
        T.p1, T.p2, T.p3 
    from 
        tkpattern_S23 
    MATCH_RECOGNIZE ( 
        MEASURES A.c1 as p1, B.c1 as p2, sum(B.c1) as p3 
        ALL MATCHES 
        include timer events 
        PATTERN(A B*) 
        duration multiples of 10 
        DEFINE A as A.c1 = 10, B as B.c1 != A.c1
    ) as T
]]></query>

Example 15-10 MATCH_RECOGNIZE with Variable Duration DURATION MULTIPLES OF Clause Stream Input

Timestamp   Tuple
 1000        10
 4000        22
 6000       444
 7000        83
 9000        88
11000        12
11000        22
11000        15
12000        13
15000        10
27000        11
28000        10
30000        18
44000        19
62000        20
72000        10
h 120000

Example 15-11 MATCH_RECOGNIZE with Variable Duration DURATION MULTIPLES OF Clause Stream Output

Timestamp   Tuple Kind  Tuple
 11000:     +           10,88,637 25000:     +           10,, 38000:     +           10,18,18 48000:     +           10,19,37 58000:     +           10,19,37 68000:     +           10,20,57 82000:     +           10,, 92000:     +           10,,102000:     +           10,,112000:     +           10,,

The execution here follows similar logic to that of the example above for just the DURATION clause (see "Using the DURATION Clause for Fixed Duration Non-Event Detection"). The difference comes for the later outputs. The tuple at 72000 matches A and then there is nothing else after that. So the pattern AB* is matched and we get output at 82000. Since we have the MULTIPLES OF clause and duration 10 we see outputs at time 92000, 102000, and so on.

15.5 MEASURES Clause

Use this clause to define expressions over attributes of the tuples in the base stream that match the conditions (correlation variables) in the DEFINE clause and to alias these expressions so that they can suitably be used in the SELECT clause of the main query of which this MATCH_RECOGNIZE condition is a part. You can refer to the attributes of a base stream either directly or via a correlation variable.

You can use any of the Oracle CQL built-in or user-defined functions (see Section 1.1.9, "Functions").

pattern_measures_clause::=

(non_mt_measure_list::=)

non_mt_measure_list::=

(measure_column::=)

measure_column::=

(arith_expr::=, identifier::=)

In Example 15-1, the pattern_measures_clause is:

MEASURES
    A.itemId as itemId

When an attribute of an underlying stream or view that evaluates to a stream is referred to using a correlation variable, such as A.c1, the value of c1 is the value in the tuple that last matched the condition corresponding to correlation variable A. In case the definition of A is not provided in the DEFINE clause, it is considered as TRUE always. So effectively all the tuples in the input match to A. Therefore the value of A.c1 is the value of c1 in the last processed tuple.

15.6 PARTITION BY Clause

Use this optional clause to specify the stream attributes by which a MATCH_RECOGNIZE clause should partition its results.

pattern_partition_clause::=

(non_mt_attr_list::=)

In Example 15-1, the pattern_partition_clause is:

PARTITION BY 
    itemId

The partition by clause in pattern means the input stream is logically divided based on the attributes mentioned in the partition list and pattern matching is done within a partition.

Consider a stream S with schema (c1 integer, c2 integer) with the input data that Example 15-12 shows.

Example 15-12 Input Stream S1

Consider the MATCH_RECOGNIZE query that Example 15-13 shows.

Example 15-13 MATCH_RECOGNIZE Query Using Input Stream S1

select T.p1, T.p2, T.p3 from S MATCH_RECOGNIZE(
    MEASURES
        A.ELEMENT_TIME as p1,
        B.ELEMENT_TIME as p2
        B.c2 as p3
    PATTERN (A B)
    DEFINE
        A as A.c1 = 10,
        B as B.c1 = 20
) as T

This query would output the following:

3000:+ 2000, 3000, 2

If we add PARTITION BY c2 to the query that Example 15-13 shows, then the output would change to:

3000:+ 2000, 3000, 2
4000:+ 1000, 4000, 1

This is because by adding the PARTITION BY clause, matches are done within partition only. Tuples at 1000 and 4000 belong to one partition and tuples at 2000 and 3000 belong to another partition owing to the value of c2 attribute in them. In the first partition A matches tuple at 1000 and B matches tuple at 4000. Even though a tuple at 3000 matches the B definition, it is not presented as a match for the first partition since that tuple belongs to different partition.

15.7 PATTERN Clause

Use this clause to specify the pattern to be matched as a regular expression over correlation variables defined in the DEFINE clause (see Section 15.3, "DEFINE Clause"). However such a regular expression can contain some correlation variables that are not defined in the DEFINE clause and they are considered as always TRUE meaning they match every input.

pattern_clause::=

(regexp::=)

regexp::=

(correlation_name::=, pattern_quantifier::=)

correlation_name::=

(const_string::=)

pattern_quantifier::=

Table 15-1 lists the pattern quantifiers (pattern_quantifier::=) Oracle CQL supports. Use the pattern quantifiers to specify the behavior of pattern matches. The one-character pattern quantifiers are maximal or "greedy"; they will attempt to match as many instances of the regular expression on which they are applied as possible. The two-character pattern quantifiers are minimal or "reluctant"; they will attempt to match as few instances of the regular expression on which they are applied as possible.

Table 15-1 MATCH_RECOGNIZE Pattern Quantifiers

Maximal	Minimal	Description
`*`	`*?`	0 or more times
`+`	`+?`	1 or more times.
`?`	`??`	0 or 1 time.

An unquantified pattern (such as A) is assumed to have a quantifier that requires exactly one match.

In Example 15-1, the pattern_clause is:

PATTERN (A B* C)

This pattern clause means a pattern match will be recognized and reported when the following conditions are met by consecutive incoming input tuples:

A tuple matches the condition that defines correlation variable A followed by
Zero or more tuples that match the correlation variable B followed by
A tuple that matches correlation variable C.

While in state 2, if a tuple arrives that matches both the correlation variables B and C (since it satisfies the defining conditions of both of them) then as the quantifier * for B is greedy that tuple will be considered to have matched B instead of C. Thus due to the greedy property B gets preference over C and we match a greater number of B. Had the pattern expression be A B*? C, one that uses a lazy or reluctant quantifier over B, then a tuple matching both B and C will be treated as matching C only. Thus C would get preference over B and we will match fewer B.

15.8 SUBSET Clause

Using this clause, you can group together one or more correlation variables that are defined in the DEFINE clause. You can use this named subset in the MEASURES (see Section 15.5, "MEASURES Clause") and DEFINE (Section 15.3, "DEFINE Clause") clauses just like any other correlation variable.

subset_clause::=

(non_mt_subset_definition_list::=)

non_mt_subset_definition_list::=

(subset_definition::=)

subset_definition::=

(subset_name::=, non_mt_corr_list::=)

subset_name::=

(const_string::=)

non_mt_corr_list::=

(correlation_name::=)

Consider the query q55 in Example 15-14 and the data stream S11 in Example 15-15. Stream S11 has schema (c1 integer, c2 integer). This example defines subsets S1 through S6. This query outputs a match if the c2 attribute values in the input stream form the shape of the English letter W. Now suppose we want to know the sum of the values of c2 for those tuples which form the incrementing arms of this W. The correlation variable X represents tuples that are part of the first incrementing arm and Z represent the tuples that are part of the second incrementing arm. So we need some way to group the tuples that match both. Such a requirement can be captured by defining a SUBSET clause as the example shows.

Subset S4 is defined as (X,Z). It refers to the tuples in the input stream that match either X or Z. This subset is used in the MEASURES clause statement sum(S4.c2) as sumIncrArm. This computes the sum of the value of c2 attribute in the tuples that match either X or Z. A reference to S4.c2 in a DEFINE clause like S4.c2 = 10 will refer to the value of c2 in the latest among the last tuple that matched X and the last tuple that matched Z.

Subset S6 is defined as (Y). It refers to all the tuples that match correlation variable Y.

The query returns the stream in Example 15-16.

Example 15-14 MATCH_RECOGNIZE with SUBSET Clause Query

<query id="q55"><![CDATA[ 
    select
        T.firstW,
        T.lastZ,
        T.sumDecrArm,
        T.sumIncrArm,
        T.overallAvg 
    from 
        S11 
    MATCH_RECOGNIZE ( 
        MEASURES 
            S2.c1 as firstW, 
            last(S1.c1) as lastZ, 
            sum(S3.c2) as sumDecrArm, 
            sum(S4.c2) as sumIncrArm, 
            avg(S5.c2) as overallAvg 
        PATTERN(A W+ X+ Y+ Z+) 
        SUBSET S1 = (Z) S2 = (A) S3 = (A,W,Y) S4 = (X,Z) S5 = (A,W,X,Y,Z) S6 = (Y) 
        DEFINE 
            W as W.c2 < prev(W.c2), 
            X as X.c2 > prev(X.c2), 
            Y as S6.c2 < prev(Y.c2), 
            Z as Z.c2 > prev(Z.c2)
    ) as T
]]></query>

Example 15-15 MATCH_RECOGNIZE with SUBSET Clause Stream Input

Timestamp   Tuple
 1000        1,8
 2000        2,8
 3000        3,8
 4000        4,6
 5000        5,3
 6000        6,7
 7000        7,6
 8000        8,2
 9000        9,6
10000       10,2
11000       11,9
12000       12,9
13000       13,8
14000       14,5
15000       15,0
16000       16,9
17000       17,2
18000       18,0
19000       19,2
20000       20,3
21000       21,8
22000       22,5
23000       23,9
24000       24,9
25000       25,4
26000       26,7
27000       27,2
28000       28,8
29000       29,0
30000       30,4
31000       31,4
32000       32,7
33000       33,8
34000       34,6
35000       35,4
36000       36,5
37000       37,1
38000       38,7
39000       39,5
40000       40,8
41000       41,6
42000       42,6
43000       43,0
44000       44,6
45000       45,8
46000       46,4
47000       47,3
48000       48,8
49000       49,2
50000       50,5
51000       51,3
52000       52,3
53000       53,9
54000       54,8
55000       55,5
56000       56,5
57000       57,9
58000       58,7
59000       59,3
60000       60,3

Example 15-16 MATCH_RECOGNIZE with SUBSET Clause Stream Output

Timestamp   Tuple Kind  Tuple
 9000:      +           3,9,25,13,5.428571
21000:      +           12,21,24,22,4.6
28000:      +           24,28,15,15,6.0
38000:      +           33,38,19,12,5.1666665
48000:      +           42,48,13,22,5.0

For more information, see:

Section 15.7, "PATTERN Clause"
Section 15.5, "MEASURES Clause"
Section 15.3, "DEFINE Clause"

15.9 Examples

The following examples illustrate basic MATCH_RECOGNIZE practices:

"Pattern Detection"
"Pattern Detection With Partition By"
"Pattern Detection With Aggregates"
"Fixed Duration Non-Event Detection"

For more examples, see Oracle CEP Getting Started.

15.9.1 Pattern Detection

Consider the stock fluctuations that Figure 15-1 shows. This data can be represented as a stream of index number (or time) and stock price. Figure 15-1 shows a common trading behavior known as a double bottom pattern between days 1 and 9 and between days 12 and 19. This pattern can be visualized as a W-shaped change in stock price: a fall (X), a rise (Y), a fall (W), and another rise (Z).

Figure 15-1 Pattern Detection: Double Bottom Stock Fluctuations

Description of "Figure 15-1 Pattern Detection: Double Bottom Stock Fluctuations"

Example 15-17 shows a query q on stream S2 with schema c1 index number (or time) and c2 stock price. This query detects double bottom patterns on the incoming stock trades using the PATTERN clause (A W+ X+ Y+ Z+). The correlation names in this clause are:

A: corresponds to the start point of the double bottom pattern.

Because correlation name A is true for every input, it is not defined in the DEFINE clause. If you specify a correlation name that is not defined in the DEFINE clause, it is considered to be true for every input.
W+: corresponds to the first decreasing arm of the double bottom pattern.

It is defined by W.stockprice < Prev(W.stockprice). This definition implies that the current price is less than the previous one.
X+: corresponds to the first increasing arm of the double bottom pattern.
Y+: corresponds to the second decreasing arm of the double bottom pattern.
Z+: corresponds to the second increasing arm of the double bottom pattern.

Example 15-17 Simple Pattern Detection: Query

<query id="q"><![CDATA[ 
    SELECT
        T.firstW,
        T.lastZ
    FROM
        S2
    MATCH_RECOGNIZE (
        MEASURES
            A.c1 as firstW,
            last(Z) as lastZ
        PATTERN(A W+ X+ Y+ Z+)
        DEFINE 
            W as W.c2 < prev(W.c2),
            X as X.c2 > prev(X.c2),
            Y as Y.c2 < prev(Y.c2),
            Z as Z.c2 > prev(Z.c2)
    ) as T
]]></query>

15.9.2 Pattern Detection With Partition By

Consider the stock fluctuations that Figure 15-2 shows. This data can be represented as a stream of index number (or time) and stock price. In this case, the stream contains data for more than one stock ticker symbol. Figure 15-2 shows a common trading behavior known as a double bottom pattern between days 1 and 9 and between days 12 and 19 for stock BOFA. This pattern can be visualized as a W-shaped change in stock price: a fall (X), a rise (Y), a fall (W), and another rise (Z).

Figure 15-2 Pattern Detection With Partition By: Stock Fluctuations

Description of "Figure 15-2 Pattern Detection With Partition By: Stock Fluctuations"

Example 15-18 shows a query q on stream S2 with schema c1 index number (or time) and c2 stock price. This query detects double bottom patterns on the incoming stock trades using the PATTERN clause (A W+ X+ Y+ Z+). The correlation names in this clause are:

A: corresponds to the start point of the double bottom pattern.
W+: corresponds to the first decreasing arm of the double bottom pattern as defined by W.stockprice < Prev(W.stockprice), which implies that the current price is less than the previous one.
X+: corresponds to the first increasing arm of the double bottom pattern.
Y+: corresponds to the second decreasing arm of the double bottom pattern.
Z+: corresponds to the second increasing arm of the double bottom pattern.

The query partitions the input stream by stock ticker symbol using the PARTITION BY clause and applies this PATTERN clause to each logical stream.

Example 15-18 Pattern Detection With Partition By: Query

<query id="q"><![CDATA[ 
    SELECT
        T.firstW,
        T.lastZ
    FROM
        S2
    MATCH_RECOGNIZE (
        PARTITION BY
            A.ticker
        MEASURES
            A.c1 as firstW,
            last(Z) as lastZ
        PATTERN(A W+ X+ Y+ Z+)
        DEFINE
            W as W.c2 < prev(W.c2),
            X as X.c2 > prev(X.c2),
            Y as Y.c2 < prev(Y.c2),
            Z as Z.c2 > prev(Z.c2)
    ) as T
]]></query>

15.9.3 Pattern Detection With Aggregates

Consider the query q1 in Example 15-19 and the data stream S in Example 15-20. Stream S has schema (c1 integer). The query returns the stream in Example 15-21.

Example 15-19 Pattern Detection With Aggregates: Query

<query id="q1"><![CDATA[ 
    SELECT 
        T.sumB 
    FROM 
        S 
    MATCH_RECOGNIZE ( 
        MEASURES 
            sum(B.c1) as sumB 
        PATTERN(A B* C) 
        DEFINE 
            A as ((A.c1 < 50) AND (A.c1 > 35)), 
            B as B.c1 > avg(A.c1), 
            C as C.c1 > prev(C.c1)
    ) as T
]]></query>

Example 15-20 Pattern Detection With Aggregates: Stream Input

Timestamp   Tuple
 1000       40
 2000       52
 3000       60
 4000       58
 5000       57
 6000       56
 7000       55
 8000       59
 9000       30
10000       40
11000       52
12000       60
13000       58
14000       57
15000       56
16000       55
17000       30
18000       10
19000       20
20000       30
21000       10
22000       25
23000       25
24000       25
25000       25

Example 15-21 Pattern Detection With Aggregates: Stream Output

Timestamp   Tuple
8000        338
12000       52

15.9.4 Fixed Duration Non-Event Detection

Consider an object that moves among five different rooms. Each time it starts from room 1, it must reach room 5 within 5 minutes. Figure 15-3 shows the object's performance. This data can be represented as a stream of time and room number. Note that when the object started from room 1 at time 1, it reached room 5 at time 5, as expected. However, when the object started from room 1 at time 6, it failed to reach room 5 at time 11; it reached room 5 at time 12. When the object started from room 1 at time 15, it was in room 5 at time 20, as expected. However, when the object started from room 1 at time 23, it failed to reach room 5 at time 28; it reached room 5 at time 30. The successes at times 5 and 20 are considered devents: the arrival of the object in room 5 at the appropriate time. The failures at time 11 and 28 are considered non-events: the expected arrival event did not occur. Using Oracle CQL, you can query for such non-events.

Figure 15-3 Fixed Duration Non-Event Detection

Description of "Figure 15-3 Fixed Duration Non-Event Detection"

Example 15-22 shows query q on stream S (with schema c1 integer representing room number) that detects these non-events. Each time the object fails to reach room 5 within 5 minutes of leaving room 1, the query returns the time of departure from room 1.

Example 15-22 Fixed Duration Non-Event Detection: Query

<query id="q"><![CDATA[ 
select T.Atime FROM S
    MATCH_RECOGNIZE(
        MEASURES
            A.ELEMENT_TIME as Atime
       INCLUDE TIMER EVENTS
       PATTERN (A B*) 
       DURATION 5 MINUTES
       DEFINE
           A as A.c1 = 1,
           B as B.c1 != 5
    ) as T
]]></query>

For more information, see Section 15.4, "DURATION Clause".