20 Understanding High Availability

This chapter describes Oracle CEP components and design patterns that you can use to increase the availability of your Oracle CEP applications, including:

For more information on how to implement a particular high availability quality of service, see Chapter 21, "Configuring High Availability".

20.1 High Availability Architecture

Like any computing resource, Oracle CEP servers can be subject to both hardware and software faults that can lead to data loss or service loss. Oracle CEP high availability options seek to mitigate both the likelihood and the impact of such faults.

Oracle CEP supports an active-standby high availability architecture. This approach has the advantages of high performance, simplicity, and short failover time.

An Oracle CEP application that needs to be highly available is deployed to a group of two or more Oracle CEP server instances running in an Oracle CEP multi-server domain. Oracle CEP will automatically choose one server in the group to be the active primary. The remaining servers become active secondaries.

The primary and secondary servers are all configured to receive the same input events and process them in parallel, but only the primary server outputs events to the Oracle CEP application client. Depending on the quality of service you choose, the secondary servers buffer their output events using in-memory queues and the primary server keeps the secondary servers up to date with which events the primary has already output.

Figure 20-1 shows a typical configuration.

Figure 20-1 Oracle CEP High Availability: Primary and Secondary Servers

This section describes:

20.1.1 High Availability Lifecycle and Failover

Figure 20-2 shows a state diagram for the Oracle CEP high availability lifecycle. In this diagram, the state names (SECONDARY, BECOMING_PRIMARY, and PRIMARY) correspond to the Oracle CEP high availability adapter RuntimeMBean method getState return values. These states are specific to Oracle CEP.

Figure 20-2 Oracle CEP High Availability Lifecycle State Diagram

You cannot explicitly designate the server that will be the initial primary. The first server in the multi-server domain to start up becomes the primary, so you can influence primary selection by starting servers in a particular order. There is no way to force a particular running server to become the primary. If a primary fails and then comes back up, it does not automatically become the primary again; it can become the primary again only if the current primary fails, causing a failover.

This section describes the Oracle CEP high availability lifecycle in more detail, including:

20.1.1.1 Secondary Failure

In general, when a secondary server fails, there is no effect on Oracle CEP application operation as Figure 20-3 shows. Regardless of the quality of service you choose, there are no missed or duplicate events.

Figure 20-3 Secondary Failure

20.1.1.2 Primary Failure and Failover

However, when a primary server fails, as Figure 20-4 shows, Oracle CEP performs a failover that may cause missed or duplicate events, depending on the quality of service you choose.

Figure 20-4 Primary Failure and Failover

During failover, Oracle CEP automatically selects a new primary and the new primary transitions from the SECONDARY state to the BECOMING_PRIMARY state. Depending on the quality of service you choose, the new primary will not transition to PRIMARY state until a configurable readiness threshold is met. For details, see the specific quality of service option in Section 20.2, "Choosing a Quality of Service".
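The lifecycle described above can be pictured as a small state machine. The following Python sketch is only an illustrative model, not Oracle CEP code: the class names and the pending_output_events readiness condition are assumptions chosen to stand in for whatever readiness threshold the selected quality of service applies.

```python
from enum import Enum


class HaState(Enum):
    SECONDARY = "SECONDARY"
    BECOMING_PRIMARY = "BECOMING_PRIMARY"
    PRIMARY = "PRIMARY"


class HaServerModel:
    """Toy model of one server's high availability lifecycle (hypothetical)."""

    def __init__(self, pending_output_events=0):
        self.state = HaState.SECONDARY
        # Assumed readiness threshold: buffered output events still to flush.
        self.pending_output_events = pending_output_events

    def on_selected_as_primary(self):
        # Failover: the selected secondary starts the transition.
        if self.state is HaState.SECONDARY:
            self.state = HaState.BECOMING_PRIMARY
            self._check_ready()

    def flush(self, count):
        # Flushing buffered output moves the server toward readiness.
        self.pending_output_events = max(0, self.pending_output_events - count)
        self._check_ready()

    def _check_ready(self):
        # Only a server that is BECOMING_PRIMARY can be promoted to PRIMARY.
        if self.state is HaState.BECOMING_PRIMARY and self.pending_output_events == 0:
            self.state = HaState.PRIMARY
```

In this model, a server with nothing to flush moves straight through BECOMING_PRIMARY to PRIMARY, which mirrors the qualities of service that have no readiness threshold.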

20.1.1.3 Rejoining the High Availability Multi-Server Domain

When a new Oracle CEP server is added to an Oracle CEP high availability multi-server domain or an existing failed server restarts, the server will not have fully joined the Oracle CEP high availability deployment and notification groups until all applications deployed to it have fully joined. The type of application determines when it can be said to have fully joined.

If the application must generate exactly the same sequence of output events as existing secondaries (a Type 1 application), then it must be able to rebuild its internal state by processing input streams for some finite period of time (the warm-up-window period). This warm-up-window time determines the minimum time it will take for the application to fully join the Oracle CEP high availability deployment and notification groups.

If the application does not need to generate exactly the same sequence of output events as existing secondaries (a Type 2 application), then it does not require a warm-up-window time and will fully join the Oracle CEP high availability deployment and notification groups as soon as it is deployed.

For more information, see Section 20.3.2.5, "Choose an Adequate warm-up-window Time".

20.1.2 Deployment Group and Notification Group

All the servers in the multi-server domain belong to the same deployment group: this is the group to which you deploy an application. For the purposes of Oracle CEP high availability, you must deploy the same application to all the servers in this group.

By default, all the servers in the multi-server domain also belong to the same notification group. The servers listen to the notification group for membership notifications that indicate when a server has failed (and exited the group) or resumed operation (and rejoined the group), as well as for synchronization notifications from the primary.

If you need to scale your Oracle CEP high availability application, you can use the ActiveActiveGroupBean to define a notification group that allows two or more servers to function as a primary server unit while retaining the convenience of a single deployment group that spans all servers (primaries and secondaries).

You must use Oracle Coherence-based clustering to create the multi-server domain deployment group. You may use either default groups or custom groups.

For more information, see:

20.1.3 High Availability Components

To implement Oracle CEP high availability options, you configure your Event Processing Network (EPN) with a high availability input adapter after each input adapter and a high availability output adapter before each output adapter.

Figure 20-5 shows a typical EPN with all possible high availability adapters in place.

Figure 20-5 High Availability Adapters in the EPN

Note:

For simplicity, Figure 20-5 does not show channels and shows only one processor. However, the EPN may be arbitrarily complex with multiple input streams and output streams, channels, multiple processors, event beans, and so on. The only restriction is that each input adapter must be followed by a high availability input adapter and each output adapter must be preceded by a high availability output adapter. Similarly, for simplicity, a multi-server domain of only two Oracle CEP servers is shown but you may have an arbitrary number of secondary servers.

The optional high availability input adapter in the primary communicates with the corresponding high availability input adapters in each secondary to normalize event timestamps.

Oracle CEP high availability provides one type of high availability input adapter. See Section 20.1.3.1, "High Availability Input Adapter".

The high availability output adapter in the primary is responsible for outputting events to the output streams that connect the Oracle CEP application to its downstream client. The high availability output adapter in the primary also communicates with the corresponding high availability output adapters in each secondary, and, depending on the high availability quality of service you choose, may instruct the secondary output adapters to trim their in-memory queues of output events.

Oracle CEP high availability provides the following high availability output adapters:

Oracle CEP high availability also provides a notification groups Spring bean to increase scalability in JMS applications. See Section 20.1.3.5, "ActiveActiveGroupBean".

Which adapter you choose is determined by the high availability quality of service you choose. See Section 20.2, "Choosing a Quality of Service".

20.1.3.1 High Availability Input Adapter

The optional Oracle CEP high availability input adapter on the primary Oracle CEP server assigns a time (in nanoseconds) to events as they arrive at the adapter and forwards the time values assigned to events to all secondary servers. This ensures that all servers running the application use a consistent time value (and generate the same results) and avoids the need for distributed clock synchronization.
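As a rough illustration of this idea, the following Python sketch models a primary that assigns a nanosecond time to each event key and a secondary that reuses the forwarded value rather than reading its own clock. All class and method names here are hypothetical, and the transport between servers is reduced to a direct method call.

```python
import time


class PrimaryInputModel:
    """Assigns a time to each event and records it by event key (hypothetical)."""

    def __init__(self, clock=time.monotonic_ns):
        self.clock = clock
        self.assigned = {}  # event key -> time in nanoseconds

    def on_event(self, key):
        self.assigned[key] = self.clock()
        return self.assigned[key]


class SecondaryInputModel:
    """Uses the time forwarded by the primary instead of its own clock."""

    def __init__(self):
        self.forwarded = {}  # event key -> time received from the primary

    def receive_time(self, key, nanos):
        self.forwarded[key] = nanos

    def on_event(self, key):
        # Returns None if the primary's time has not arrived yet; a real
        # adapter would have to handle that case.
        return self.forwarded.get(key)
```

The sketch also shows why a unique event key matters: the key is the only thing that lets a secondary match a forwarded time value to its own copy of the event.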

Since a time value is assigned to each event before the event reaches any downstream channels in the EPN, downstream channels should be configured to use application time so that they do not assign a new time value to events as they arrive at the channel.

Input events must have a key that uniquely identifies each event in order to use this adapter.

You can configure the Oracle CEP high availability input adapter to send heartbeat events.

The Oracle CEP high availability input adapter is applicable to all high availability quality of service options. However, because the high availability input adapter increases performance overhead, it is not appropriate for some high availability quality of service options (such as Section 20.2.1, "Simple Failover" and Section 20.2.2, "Simple Failover with Buffering"). For these options, you should instead consider using application time with some incoming event property.

For more information, see:

20.1.3.2 Buffering Output Adapter

The Oracle CEP high availability buffering output adapter implements a buffered queue trimming strategy. The buffer is a sliding window of output events from the stream. The size of the window is measured in milliseconds.
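A time-based sliding window of this kind can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the adapter's implementation: the SlidingOutputBuffer class, its method names, and the choice to expire events when a new event arrives are all illustrative.

```python
from collections import deque


class SlidingOutputBuffer:
    """Keeps output events that fall within the last window_ms milliseconds."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.events = deque()  # (time_ms, event) pairs, oldest first

    def add(self, time_ms, event):
        # A window of size zero buffers nothing.
        if self.window_ms > 0:
            self.events.append((time_ms, event))
        self._expire(time_ms)

    def _expire(self, now_ms):
        # Drop events that have slid out of the window.
        while self.events and self.events[0][0] <= now_ms - self.window_ms:
            self.events.popleft()

    def drain(self):
        # On failover, a new primary would output whatever is still buffered.
        drained = [event for _, event in self.events]
        self.events.clear()
        return drained
```

A larger window_ms keeps more events available for replay at failover, at the cost of more duplicates and more memory on each secondary.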

The Oracle CEP high availability buffering output adapter is applicable to the simple failover and simple failover with buffering high availability quality of service options.

For more information, see:

20.1.3.3 Broadcast Output Adapter

The Oracle CEP high availability broadcast output adapter implements a distributed queue trimming strategy. The active primary instance broadcasts messages to the active secondary instances in the notification group telling them when to trim their local representation of the queue.

The Oracle CEP high availability broadcast output adapter is applicable to the light-weight queue trimming high availability quality of service option.

For more information, see:

20.1.3.4 Correlating Output Adapter

The Oracle CEP high availability correlating output adapter correlates two event streams, usually from JMS. This adapter correlates an inbound buffer of events with a second source of the same event stream, outputting the buffer if correlation fails after a configurable time interval. Correlated events are trimmed from the queue. Correlated events are assumed to be in-order.
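The correlation behavior can be modeled loosely as a queue whose head is removed whenever the second stream delivers the matching event. The Python sketch below is an assumption-laden illustration: the CorrelatingBufferModel class, its timeout_ms parameter, and the rule that only the queue head is compared are hypothetical simplifications of the in-order assumption.

```python
from collections import deque


class CorrelatingBufferModel:
    """Toy model of correlating buffered output with a second event source."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.queue = deque()  # (key, buffered_at_ms) pairs, in output order

    def buffer(self, key, now_ms):
        self.queue.append((key, now_ms))

    def correlate(self, key):
        # In-order assumption: the next correlated event matches the queue head.
        if self.queue and self.queue[0][0] == key:
            self.queue.popleft()
            return True
        return False

    def overdue(self, now_ms):
        # Events not correlated within the timeout would have to be output.
        return [k for k, t in self.queue if now_ms - t >= self.timeout_ms]
```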

The Oracle CEP high availability correlating output adapter is applicable to the precise recovery with JMS high availability quality of service option.

For more information, see:

20.1.3.5 ActiveActiveGroupBean

The com.oracle.cep.cluster.hagroups.ActiveActiveGroupBean is a Spring bean that allows you to partition an input stream from a JMS input adapter.

This component is applicable to precise recovery with JMS high availability quality of service only. However, it can also be used without high availability to increase Oracle CEP application scalability.

For more information, see:

20.1.4 High Availability and Scalability

If you need to scale your Oracle CEP high availability application, you can use the ActiveActiveGroupBean to define a notification group that allows two or more servers to function as a high availability unit while retaining the convenience of a single deployment group that spans all servers (primaries and secondaries).

Figure 20-6 shows three Oracle CEP application scenarios progressing from the simplest configuration, to high availability, and then to both high availability and scalability.

Figure 20-6 High Availability and Scalability

Most applications begin in a single-server domain without high availability. In this, the simplest scenario, an Oracle CEP application running on one Oracle CEP server processes an input event stream and produces output events.

In the high availability scenario, the Oracle CEP application has been configured to use Oracle CEP high availability options. This application is deployed to the deployment group of a multi-server domain composed of two servers. In this scenario, only the primary server outputs events.

In the high availability and scalability scenario, the Oracle CEP high availability application has been configured to use the ActiveActiveGroupBean to define notification groups. Each notification group contains two or more Oracle CEP servers that function as a single, high availability unit. In this scenario, only the primary server in each notification group outputs events. Should the primary server in a notification group go down, an Oracle CEP high availability failover occurs and a secondary server in that notification group is declared the new primary and resumes outputting events according to the Oracle CEP high availability quality of service you configure.

For more information, see:

20.1.5 High Availability and Oracle Coherence

Oracle CEP high availability options depend on Oracle Coherence. You cannot implement Oracle CEP high availability options without Oracle Coherence.

When considering performance tuning, be sure to evaluate your Oracle Coherence configuration in addition to your Oracle CEP application.

For more information, see:

20.2 Choosing a Quality of Service

Using Oracle CEP high availability, you may choose any of the quality of service options that Table 20-1 lists. Choose the quality of service option that suits your application's tolerance for missed and duplicate events as well as expected event throughput. Note that primary and secondary server hardware requirements increase as the quality of service becomes more precise.

Table 20-1 Oracle CEP High Availability Quality of Service

| High Availability Option | Missed Events? | Duplicate Events? | Performance Overhead |
|---|---|---|---|
| Section 20.2.1, "Simple Failover" | Yes (many) | Yes (few) | Negligible |
| Section 20.2.2, "Simple Failover with Buffering" | Yes (few) (see Footnote 1) | Yes (many) | Low |
| Section 20.2.3, "Light-Weight Queue Trimming" | No | Yes (few) | Low-Medium (see Footnote 2) |
| Section 20.2.4, "Precise Recovery with JMS" | No | No | High |

Footnote 1 If you configure a big enough buffer then there will be no missed events.

Footnote 2 The performance overhead is tunable. You can adjust the frequency of trimming to reduce the overhead, but incur a higher number of duplicates at failover.

20.2.1 Simple Failover

This high availability quality of service is characterized by the lowest performance overhead (fastest recovery time) and the least data integrity (both missed events and duplicate events are possible during failover).

The primary server outputs events and secondary servers simply discard their output events; they do not buffer output events. If the current active primary fails, a new active primary is chosen and begins sending output events once it is notified.

During failover, many events may be missed or duplicated by the new primary depending on whether it is running ahead of or behind the old primary, respectively.

During the failover window, events may be missed. For example, if you are processing 100 events per second and failover takes 10 seconds, then you miss 1000 events.
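The arithmetic in this example is simply the event rate multiplied by the failover duration. As a quick sketch for estimating exposure under other rates and failover times (the function name is illustrative, and a constant event rate is assumed):

```python
def estimated_missed_events(events_per_second, failover_seconds):
    """Rough estimate of events missed during the failover window.

    Assumes a constant input rate and that no output is produced
    for the whole failover window.
    """
    return events_per_second * failover_seconds
```

For the figures above, estimated_missed_events(100, 10) gives 1000.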

The new primary enters the PRIMARY state immediately. There is no configurable readiness threshold that must be met before the new primary transitions out of the BECOMING_PRIMARY state.

When an Oracle CEP server rejoins the multi-server domain, it is immediately available as a secondary.

To implement this high availability quality of service, you configure your EPN with a high availability buffering output adapter (with a sliding window of size zero) before each output adapter. To reduce performance overhead, rather than use a high availability input adapter, use application time with some incoming event property.

For more information, see Section 21.1.1, "How to Configure Simple Failover".

20.2.2 Simple Failover with Buffering

This high availability quality of service is characterized by a low performance overhead (faster recovery time) and increased data integrity (few or no missed events, depending on the buffer size, but many duplicate events are possible during failover).

The primary server outputs events and secondary servers buffer their output events. If the current active primary fails, a new active primary is chosen and begins sending output events once it is notified.

During the failover window, events may be missed. For example, if you are processing 100 events per second and failover takes 10 seconds, then you miss 1000 events. If the secondary buffers are large, a significant number of duplicates may be output. On the other hand, a larger buffer reduces the chances of missed events.

When an Oracle CEP server rejoins the multi-server domain, if your application is an Oracle CEP high availability Type 1 application (the application must generate exactly the same sequence of output events as existing secondaries), it must wait the warm-up-window-length time you configure for the Oracle CEP high availability output adapter before it is available as a secondary.

To implement this high availability quality of service, you configure your EPN with a high availability buffering output adapter (with a sliding window of size greater than zero) before each output adapter. To reduce performance overhead, rather than use a high availability input adapter, use application time with some incoming event property.

For more information, see:

20.2.3 Light-Weight Queue Trimming

This high availability quality of service is characterized by a low performance overhead (faster recovery time) and increased data integrity (no missed events but a few duplicate events are possible during failover).

The active primary communicates to the secondaries the events that it has actually processed. This enables the secondaries to trim their buffer of output events so that it contains only those events that have not been sent by the primary at a particular point in time. Because events are only trimmed after they have been sent by the current primary, this allows the secondary to avoid missing any output events when there is a failover.

The frequency with which the active primary sends queue trimming messages to active secondaries is configurable:

  • Every n events (n>0)

    This limits the number of duplicate output events to at most n events at failover.

  • Every n milliseconds (n>0)

The queue trimming adapter requires a way to identify events consistently among the active primary and secondaries. The recommended approach is to use application time to identify events, but any key value that uniquely identifies events will do.

The advantage of queue trimming is that output events are never lost. There is a slight performance overhead at the active primary, however, for sending the trimming messages that need to be communicated and this overhead increases as the frequency of queue trimming messages increases.
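To make the trade-off concrete, the following Python sketch simulates trimming every n events and reports what a secondary would still hold when the primary fails. It is a simplified model with assumed names: events are numbered 1, 2, 3, and so on, and the primary is assumed to send a trimming message immediately after every n-th output event.

```python
def simulate_failover(total_events, trim_every_n, fail_after):
    """Return (secondary queue, duplicates) at the moment of failover.

    total_events: events the secondary has generated so far.
    trim_every_n: the primary sends a trim message after every n outputs.
    fail_after:   number of events the primary output before failing.
    """
    secondary_queue = list(range(1, total_events + 1))
    # The last trim message covers the largest multiple of n not above fail_after.
    last_trimmed = (fail_after // trim_every_n) * trim_every_n
    secondary_queue = [e for e in secondary_queue if e > last_trimmed]
    # Events already output by the old primary but still queued are duplicates.
    duplicates = [e for e in secondary_queue if e <= fail_after]
    return secondary_queue, duplicates
```

For example, with 10 events generated, trimming every 4 events, and a failure after the primary has output 7 events, the secondary still holds events 5 through 10. Events 5, 6, and 7 would be output again as duplicates, and no event is lost.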

During failover, the new primary enters the BECOMING_PRIMARY state and will not transition into the PRIMARY state until its event queue (that it was accumulating as a secondary) has been flushed. During this transition, new input events are buffered and some duplicate events may be output.

When an Oracle CEP server rejoins the multi-server domain, if your application is an Oracle CEP high availability Type 1 application (an application that must generate exactly the same sequence of output events as existing secondaries), it must wait the warm-up-window-length time you configure for the Oracle CEP high availability output adapter before it is available as a secondary.

To implement this high availability quality of service, you configure your EPN with a high availability input adapter after each input adapter and a high availability broadcast output adapter before each output adapter.

For more information, see Section 21.1.3, "How to Configure Light-Weight Queue Trimming".

20.2.4 Precise Recovery with JMS

This high availability quality of service is characterized by a high performance overhead (slower recovery time) and maximum data integrity (no missed events and no duplicate events during failover).

This high availability quality of service is compatible with only JMS input and output adapters.

In this high availability quality of service, the concern is not transactional guarantees along the event path for a single server but guaranteeing a single output from a set of servers. To achieve this, secondary servers listen, over JMS, to the event stream being published by the primary. As Figure 20-7 shows, this incoming event stream is essentially a source of reliable queue-trimming messages that the secondaries use to trim their output queues. If JMS is configured for reliable delivery, the stream of events seen by the secondary is precisely the stream of events output by the primary, and thus failover allows the new primary to output precisely those events not delivered by the old primary.

Figure 20-7 Precise Recovery with JMS

During failover, the new primary enters the BECOMING_PRIMARY state and will not transition into the PRIMARY state until its event queue (that it was accumulating as a secondary) has been flushed. During this transition, new input events are buffered and no duplicate events are output.

When an Oracle CEP server rejoins the multi-server domain, if your application is an Oracle CEP high availability Type 1 application (the application must generate exactly the same sequence of output events as existing secondaries), it must wait the warm-up-window-length time you configure for the Oracle CEP high availability output adapter before it is available as a secondary.

To implement this high availability quality of service, you configure your EPN with a high availability input adapter after each input adapter and a high availability correlating output adapter before each output adapter.

To increase scalability, you can also use the cluster groups bean (ActiveActiveGroupBean) with this high availability quality of service.

For more information, see:

20.3 Designing an Oracle CEP Application for High Availability

Although you can implement Oracle CEP high availability declaratively, to fully benefit from the high availability quality of service you choose, you must design your Oracle CEP application to take advantage of the high availability options that Oracle CEP provides.

When designing your Oracle CEP application for high availability, consider the following:

20.3.1 Primary Oracle CEP High Availability Use Case

You can adapt Oracle CEP high availability options to various Oracle CEP application designs but in general, Oracle CEP high availability is designed for the following use case:

  • An application receives input events from one or more external systems.

  • The external systems are publish-subscribe style systems that allow multiple instances of the application to connect simultaneously and receive the same stream of messages.

  • The application does not update any external systems in a way that would cause conflicts should multiple copies of the application run concurrently.

  • The application sends output events to an external downstream system. Multiple instances of the application can connect to the downstream system simultaneously, although only one instance of the application is allowed to send messages at any one time.

Within these constraints, the following different cases are of interest:

  • The application is allowed to skip sending some output events to the downstream system when there is a failure. Duplicates are also allowed.

    For this case, the following Oracle CEP high availability quality of service options are applicable: Section 20.2.1, "Simple Failover" and Section 20.2.2, "Simple Failover with Buffering".

  • The application is allowed to send duplicate events to the downstream system, but must not skip any events when there is a failure.

    For this case, the following Oracle CEP high availability quality of service option is applicable: Section 20.2.3, "Light-Weight Queue Trimming".

  • The application must send exactly the same stream of messages/events to the downstream system when there is a failure, modulo a brief pause during which events may not be sent when there is a failure.

    For this case, the following Oracle CEP high availability quality of service option is applicable: Section 20.2.4, "Precise Recovery with JMS".

20.3.2 High Availability Design Patterns

When designing your Oracle CEP application for use with Oracle CEP high availability options, observe the following design patterns:

20.3.2.1 Select the Minimum High Availability Your Application can Tolerate

Be sure that the extra cost of precise recovery (per-node throughput decrease) is actually necessary for your application.

20.3.2.2 Use Oracle CEP High Availability Components at All Ingress and Egress Points

You must use an Oracle CEP high availability input adapter after each regular input adapter and you must use an Oracle CEP high availability output adapter before each regular output adapter.

20.3.2.3 Only Preserve What You Need

Most Oracle CEP systems are characterized by a large number of raw input events being queried to generate a smaller number of “enriched” events. In general it makes sense to only try and preserve these enriched events – both because there are fewer of them and because they are more valuable.

20.3.2.4 Limit Oracle CEP Application State

Oracle CEP systems allow you to query windows of events. It can be tempting to build systems using very large windows, but this increases the state that needs to be rebuilt when failure occurs. In general it is better to think of long-term state as something better kept in stable storage, such as a distributed cache or a database – since the high availability facilities of these technologies can be appropriately leveraged.

20.3.2.5 Choose an Adequate warm-up-window Time

When a new Oracle CEP server is added to an Oracle CEP high availability multi-server domain or an existing failed server restarts, the server will not have fully joined the Oracle CEP high availability deployment and notification groups until all applications deployed to it have fully joined. The type of application determines when it can be said to have fully joined.

Oracle CEP high availability applications can be described as Type 1 or Type 2 applications as Table 20-2 shows.

Table 20-2 Oracle CEP High Availability Application Types

| Application Type | Must generate exactly the same sequence of output events? | Must be able to rebuild internal state by processing input streams within a finite period of time? | Must wait this period of time before it has fully joined? |
|---|---|---|---|
| Type 1 | Yes | Yes | Yes |
| Type 2 | No | No | No |


For more information, see Section 20.1.1.3, "Rejoining the High Availability Multi-Server Domain".

20.3.2.5.1 Type 1 Applications

A Type 1 application requires the new secondary to generate exactly the same sequence of output events as existing secondaries once it fully joins the Oracle CEP high availability deployment and notification groups.

It is a requirement that a Type 1 application be able to rebuild its internal state by processing its input streams for some finite period of time (warm-up-window time), after which it generates exactly the same stream of output events as other secondaries running the application.

The warm-up-window time is configured on an Oracle CEP high availability output adapter. The warm-up-window time length is specified in terms of seconds or minutes. For example, if the application contains Oracle CQL queries with range-based windows of 5, 7, and 15 minutes, then the minimum warm-up-window time is 15 minutes (the maximum range-based window size). Oracle recommends that you also pad the maximum window length with a few minutes to ensure that the necessary state is available. So, in the previous example, 17 minutes or even 20 minutes would be a good length for the warm-up-window time.
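The sizing rule in this example (longest range-based window plus a few minutes of padding) can be written as a one-line calculation. In this sketch the function name and the default padding of 2 minutes are illustrative assumptions, not product defaults.

```python
def recommended_warm_up_minutes(range_window_minutes, padding_minutes=2):
    """Longest range-based window plus a safety margin, in minutes."""
    if not range_window_minutes:
        raise ValueError("at least one range-based window is required")
    return max(range_window_minutes) + padding_minutes
```

With windows of 5, 7, and 15 minutes, this gives 17 minutes with the default padding, or 20 minutes with padding_minutes=5.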

The Oracle CEP server uses system time during the warm-up-window time period, so it is not directly correlated with the application time associated with events being processed.

Type 1 applications must only be interested in events that occurred during a finite amount of time. All range-based Oracle CQL windows must be shorter than the warm-up-window time and tuple-based windows must also be qualified by time. For example, the application should only care about the last 10 events if they occurred within the last five minutes. Applications that do not have this property cannot be Type 1 applications and cannot use the warm-up-window period. For example, an application that uses a tuple-based partitioned window that has no time qualification cannot use the warm-up-window period, since an arbitrary amount of time is required to rebuild the state of the window.

If a Type 1 application uses the Oracle CEP high availability broadcast output adapter, it may trim events using a unique application-specific key, or a monotonic key like application time. Trimming events using application time is encouraged as it is more robust and less susceptible to bugs in the application that may cause an output event to fail to be generated.

For more information, see:

20.3.2.5.2 Type 2 Applications

A Type 2 application does not require the new secondary to generate exactly the same sequence of output events as existing secondaries once it fully joins the Oracle CEP high availability deployment and notification groups. It simply requires that the new cluster member generate valid output events with respect to the point in time at which it begins processing input events.

A Type 2 application does not require a warm-up-window period.

Most applications will be Type 2 applications. It is common for an application to be brought up at an arbitrary point in time (on the primary Oracle CEP server), begin processing events from input streams at that point, and generate valid output events. In other words, the input stream is not paused while the application is started and input events are constantly being generated and arriving. It is reasonable to assume that in many cases a secondary node that does the same thing, but at a slightly different time, will also produce output events that are valid from the point of view of the application, although not necessarily identical to those events produced by the primary because of slight timing differences.

For example, a financial application that only runs while the market is open might operate as a Type 2 application as follows: all servers can be brought up before the market opens and will begin processing incoming events at the same point in the market data stream. Multiple secondaries can be run to protect against failure and as long as the number of secondaries is sufficient while the market is open, there is no need to restart any secondaries that fail nor add additional secondaries, so no secondary needs to recover state.

20.3.2.6 Ensure Applications are Idempotent

You should be able to run two copies of an application on different servers without them conflicting in a shared cache or database. If you are using an external relation (such as a cache or table), then you must ensure that when an Oracle CEP server rejoins the cluster, your application accesses the same cache or table as before: it must join against the same external relation again. The data source defined on the server must not have changed, so that the application pulls data from the same data source as before.
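One way to keep two copies of an application from conflicting in a shared store is to key every write by a deterministic event identifier, so that a repeated write overwrites rather than duplicates. The following is a minimal sketch of that idea only. The class, the method names, and the in-memory map standing in for the shared cache or table are all hypothetical and are not part of the Oracle CEP API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: an in-memory map stands in for a shared cache or table.
final class IdempotentWriteDemo {

    // Writes a result keyed by a deterministic event identifier.
    // Writing the same (eventId, value) pair again leaves the store unchanged.
    static void publish(Map<String, String> store, String eventId, String value) {
        store.put(eventId, value);
    }

    public static void main(String[] args) {
        Map<String, String> sharedStore = new ConcurrentHashMap<>();

        // Two application copies (for example, primary and secondary)
        // process the same input and write the same results.
        for (String server : new String[] {"primary", "secondary"}) {
            publish(sharedStore, "ORCL@1000", "price=31.5");
            publish(sharedStore, "ORCL@1001", "price=31.6");
        }

        // The store holds one entry per distinct event, not one per server.
        System.out.println("Entries in shared store: " + sharedStore.size());
    }
}
```

An append-only write (for example, adding each result to a list) would record every event twice in this scenario, which is the kind of conflict this guideline is meant to avoid.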

20.3.2.7 Source Event Identity Externally

Many high availability solutions require that events be correlated between different servers, and to do this events need to be universally identifiable. The best way to do this is to use external information, preferably a timestamp, to seed the event, rather than relying on the Oracle CEP system to provide this.
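As a minimal sketch of this guideline, the following derives an event identifier only from values supplied by the upstream source, so every server computes the same identifier for the same event. The class name, the method name, and the field choices (a source key plus a source timestamp) are assumptions made for illustration.

```java
// Hypothetical illustration: the identifier is built only from data that
// arrives with the event, never from server-local state.
final class ExternalIdentityDemo {

    // Combines a source-supplied key and a source-supplied timestamp.
    static String eventId(String sourceKey, long sourceTimestampMillis) {
        return sourceKey + "@" + sourceTimestampMillis;
    }

    public static void main(String[] args) {
        // The same input event as seen by two different servers.
        String idOnPrimary = eventId("ORCL", 1000L);
        String idOnSecondary = eventId("ORCL", 1000L);

        System.out.println("Identifiers match: " + idOnPrimary.equals(idOnSecondary));
    }
}
```

A server-local counter, by contrast, would assign different identifiers on a server that started later or restarted, so events could not be correlated between servers.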

For more information, see Section 20.3.3.6, "Prefer Application Time".

20.3.2.8 Understand the Importance of Event Ordering

For Oracle CEP high availability quality of service options that use queue trimming, not only must primary and secondary servers generate the same output events, but they must also generate them in exactly the same order.

Primary and secondary servers must generate the same output events, and in exactly the same order, when you choose Oracle CEP high availability quality of service options that use queue trimming and equality-based event identity (that is, nonmonotonic event identifiers: event identifiers that do not increase continually). In this case, generating output events in different orders can lead to either missed output events or unnecessary duplicate output events when there is a failure.

Consider the output event streams shown in Figure 20-8. The primary has output events a, b, and c. After outputting event c, the primary sends the secondary a queue trimming message.

The secondary trims all events in its queue generated prior to event c including event c itself. In this case, the set of events trimmed will be {a, b, e, d, c} which is wrong because the primary has not yet output events d and e. If a failover occurs after processing the trimming message for event c, events will be lost.
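The scenario above can be modeled with a small sketch. It assumes the secondary queued its output in the order a, b, e, d, c and that trimming removes everything up to and including the first event equal to the trimming key. The class and method are hypothetical and only model the behavior described here; they are not the Oracle CEP adapter implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a secondary's output queue with equality-based trimming.
final class EqualityTrimDemo {

    // Removes events from the head of the queue up to and including the
    // first event equal to trimKey, and returns the removed events.
    static List<String> trimThrough(Deque<String> queue, String trimKey) {
        List<String> removed = new ArrayList<>();
        if (!queue.contains(trimKey)) {
            return removed; // the secondary has not generated this event yet
        }
        while (!queue.isEmpty()) {
            String head = queue.removeFirst();
            removed.add(head);
            if (head.equals(trimKey)) {
                break;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // The secondary generated d and e before c; the primary has output only a, b, c.
        Deque<String> secondaryQueue =
                new ArrayDeque<>(Arrays.asList("a", "b", "e", "d", "c"));

        List<String> trimmed = trimThrough(secondaryQueue, "c");

        System.out.println("Trimmed:   " + trimmed);        // [a, b, e, d, c]
        System.out.println("Remaining: " + secondaryQueue); // []
    }
}
```

After the trim, the queue no longer holds d or e, so a secondary that takes over at this point has nothing left to output for them.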

To manage event ordering, consider the following design patterns:

20.3.2.8.1 Prefer Deterministic Behavior

In order for an application to generate events in the same order when run on multiple instances, it must be deterministic. The application must not rely on things like:

  • Random number generators, which may return different results on different machines.

  • Methods like System.currentTimeMillis or System.nanoTime, which can return different results on different machines because the system clocks are not synchronized.

20.3.2.8.2 Avoid Multithreading

Because thread scheduling algorithms are very timing dependent, multithreading can be a source of nondeterministic behavior in applications. That is, different threads can be scheduled at different times on different machines.

For example, avoid creating an EPN in which multiple threads send events to an Oracle CEP high availability adapter in parallel. If a channel that dispatches events on multiple threads is an event source for an Oracle CEP high availability adapter, events are sent to the adapter in parallel by different threads, which could make the event order nondeterministic.

20.3.2.8.3 Prefer Monotonic Event Identifiers

Event identifiers may be monotonic or nonmonotonic.

A monotonic identifier is one that increases continually (such as a time value).

A nonmonotonic identifier does not increase continually and may contain duplicates.

In general, you should design your Oracle CEP application using monotonic event identifiers. Using a monotonic event identifier, the Oracle CEP high availability adapter can handle an application that may produce events out of order.
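To illustrate why a monotonic identifier tolerates out-of-order generation, the sketch below trims by comparison rather than by queue position: an event is dropped only if its identifier is less than or equal to the last identifier the primary reported as output. The Event class, its appTime field, and the trim method are hypothetical and are not the Oracle CEP adapter implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of trimming a secondary's queue by a monotonic key.
final class MonotonicTrimDemo {

    // Output event identified by a monotonic application timestamp.
    static final class Event {
        final String name;
        final long appTime;

        Event(String name, long appTime) {
            this.name = name;
            this.appTime = appTime;
        }

        @Override
        public String toString() {
            return name + "@" + appTime;
        }
    }

    // Keeps only events whose identifier is strictly greater than the
    // identifier of the last event the primary has output.
    static List<Event> trim(List<Event> queue, long lastOutputTime) {
        List<Event> kept = new ArrayList<>();
        for (Event e : queue) {
            if (e.appTime > lastOutputTime) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The secondary generated e and d before c.
        List<Event> secondaryQueue = Arrays.asList(
                new Event("a", 1), new Event("b", 2),
                new Event("e", 5), new Event("d", 4), new Event("c", 3));

        // The primary reports that it has output everything up to c (time 3).
        System.out.println("Remaining: " + trim(secondaryQueue, 3)); // [e@5, d@4]
    }
}
```

Because d and e carry identifiers greater than that of c, they survive the trim even though the secondary queued them ahead of c.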

20.3.2.9 Write Oracle CQL Queries with High Availability in Mind

Not all Oracle CQL query usage is supported when using Oracle CEP high availability. You may need to redefine your Oracle CQL queries to address these restrictions.

For more information, see Section 20.3.3, "Oracle CQL Query Restrictions".

20.3.2.10 Avoid Coupling Servers

Oracle CEP high availability performs best when servers can run without coordinating with one another. Generally this can be achieved if there is no shared state and the downstream system can tolerate duplicates. Higher levels of high availability aim to increase the fidelity of the event stream that the downstream system sees, but this increased fidelity comes with a performance penalty.

20.3.2.11 Plan for Server Recovery

When a secondary server rejoins the multi-server domain, the server must have time to rebuild the Oracle CEP application state to match that of the current primary and active secondaries as Section 20.3.2.5, "Choose an Adequate warm-up-window Time" describes.

The time it takes for a secondary server to become available as an active secondary after rejoining the multi-server domain will be a factor in the number of active secondaries you require.

If a secondary is declared to be the new primary before it is ready, the secondary will throw an exception.

20.3.3 Oracle CQL Query Restrictions

When writing Oracle CQL queries in an Oracle CEP application that uses Oracle CEP high availability options, observe the following restrictions:

For more information on Oracle CQL, see the Oracle Complex Event Processing CQL Language Reference.

20.3.3.1 Range-Based Windows

In a Type 1 application (where the application must generate exactly the same sequence of output events as existing secondaries), all range-based Oracle CQL windows must be shorter than the warm-up-window time. See also Section 20.3.2.5, "Choose an Adequate warm-up-window Time".
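The range-versus-warm-up rule lends itself to a simple design-time check. The sketch below assumes you keep a list of the range durations used by your queries alongside the configured warm-up-window time; the class and method names are hypothetical, and nothing here reads actual Oracle CEP configuration.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Hypothetical design-time check: every range-based window must be
// strictly shorter than the warm-up-window time.
final class WarmUpCheck {

    static boolean allWindowsFit(List<Duration> windowRanges, Duration warmUpWindow) {
        for (Duration range : windowRanges) {
            if (range.compareTo(warmUpWindow) >= 0) {
                return false; // this window is not shorter than the warm-up window
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Duration warmUpWindow = Duration.ofMinutes(15);
        List<Duration> ranges = Arrays.asList(Duration.ofMinutes(5), Duration.ofMinutes(10));

        System.out.println("All windows fit: " + allWindowsFit(ranges, warmUpWindow));
    }
}
```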

Channels must use application time if Oracle CQL queries contain range-based windows. See also Section 20.3.3.6, "Prefer Application Time".

For more information, see "Range-Based Stream-to-Relation Window Operators" in the Oracle Complex Event Processing CQL Language Reference.

20.3.3.2 Tuple-Based Windows

In a Type 1 application (where the application must generate exactly the same sequence of output events as existing secondaries), all tuple-based windows must also be qualified by time. See also Section 20.3.2.5, "Choose an Adequate warm-up-window Time".

For more information, see "Tuple-Based Stream-to-Relation Window Operators" in the Oracle Complex Event Processing CQL Language Reference.

20.3.3.3 Partitioned Windows

Consider avoiding partitioned windows: there are cases where a partition cannot be rebuilt. If using partitioned windows, configure a warm-up-window time long enough to give the Oracle CEP server time to rebuild the partition. See also Section 20.3.2.5, "Choose an Adequate warm-up-window Time".

For more information, see "Partitioned Stream-to-Relation Window Operators" in the Oracle Complex Event Processing CQL Language Reference.

20.3.3.4 Sliding Windows

Oracle CQL queries should not use sliding windows if new nodes that join the multi-server domain are expected to generate exactly the same output events as existing nodes.

20.3.3.5 DURATION Clause and Non-Event Detection

You must use application time if Oracle CQL queries contain a DURATION clause for non-event detection.

20.3.3.6 Prefer Application Time

In Oracle CEP each event is associated with a point in time at which the event occurred. Oracle CQL recognizes two types of time:

  • Application time: a time value assigned to each event outside of Oracle CQL by the application before the event enters the Oracle CQL processor.

  • System time: a time value associated with an event when it arrives at the Oracle CQL processor, essentially by calling System.nanoTime().

Application time is generally the best approach for applications that need to be highly available. The application time is associated with an event before the event is sent to Oracle CEP, so it is consistent across active primary and secondary instances. System time, on the other hand, can cause application instances to generate different results since the time value associated with an event can be different on each instance due to system clocks not being synchronized.
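The difference can be seen in a simplified model of a time-based window. The sketch below counts how many events fall within a fixed range ending at the most recent event, first using timestamps carried by the events (application time) and then using per-server stamps (system time). The stamp values, the class, and the method are all hypothetical; this is not how the Oracle CQL processor is implemented.

```java
// Hypothetical model: window contents computed from application time
// versus from each server's own system-time stamps.
final class TimeSourceDemo {

    // Counts stamps in the interval (latest - range, latest], where
    // latest is the stamp of the most recent event.
    static int countInLatestWindow(long[] stamps, long range) {
        long latest = stamps[stamps.length - 1];
        int count = 0;
        for (long s : stamps) {
            if (s > latest - range && s <= latest) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long range = 50;

        // Application time travels with the events, so both servers see the same values.
        long[] appTimes = {100, 200, 300};

        // System time is assigned on arrival from each server's own clock (made-up values).
        long[] primarySystemTimes = {1000, 1100, 1200};
        long[] secondarySystemTimes = {5000, 5160, 5200};

        System.out.println("Application time, either server: "
                + countInLatestWindow(appTimes, range));             // 1
        System.out.println("System time, primary:            "
                + countInLatestWindow(primarySystemTimes, range));   // 1
        System.out.println("System time, secondary:          "
                + countInLatestWindow(secondarySystemTimes, range)); // 2
    }
}
```

With application time the window holds the same events on every server. With system time the second event was stamped slightly later on the secondary, so its window holds two events where the primary's holds one.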

You can use system time for applications whose Oracle CQL queries do not use time-based windows. Applications that use only event-based windows depend only on the arrival order of events rather than the arrival time, so you may use system time in this case.

If you must use system time with Oracle CQL queries that do use time-based windows, then you must use a special Oracle CEP high availability input adapter that intercepts incoming events and assigns a consistent time that spans primary and secondary instances.