2 Getting to Know Artifacts in Oracle Stream Analytics

Stream Analytics provides various artifacts such as connections, references, streams, targets, and more. You create pipelines using these artifacts.

2.1 Understanding Different Types of Connections

A connection is a basic artifact and the first entity that you need to create in the Catalog. It is a collection of metadata (such as URLs, credentials, and the like) required to connect to an external system. A connection is the basis for creating sources (Streams, References, or Geo Fences) and Targets.

You can reuse the same connection to create multiple sources and targets. In other words, a connection can be reused to access different resources in the same system: for example, different Kafka topics in the same Kafka cluster, or different database tables in the same Oracle database.

Kafka Connection

A Kafka connection has just a single parameter, the Zookeeper server URL, in addition to the standard properties (name, description, tags) of catalog objects.

The Zookeeper URL is of the format host:port. If the port is not provided, the system assumes the default Zookeeper port, 2181. Authentication to Kafka is not supported in this release.
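
As an illustration only (not part of the connection definition itself), the following minimal Java sketch checks that a Zookeeper URL of the host:port form is reachable before you use it in a Kafka connection. The host name is hypothetical, and the Apache ZooKeeper client library is assumed to be on the classpath.

    import org.apache.zookeeper.ZooKeeper;

    public class ZookeeperUrlCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical host; 2181 is the default Zookeeper port assumed when no port is given.
            ZooKeeper zk = new ZooKeeper("kafka-host.example.com:2181", 3000, event -> { });
            System.out.println("Zookeeper session state: " + zk.getState());
            zk.close();
        }
    }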

Oracle Database Connection

To connect to an Oracle database, you must provide the following parameters (a sketch of how they map to a JDBC URL follows the list):

  • Connect using

  • Service name/SID

  • Host name

  • Port

  • Username

  • Password
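
A minimal JDBC sketch, showing how these parameters typically map to an Oracle thin-driver URL. The host, port, service name, and credentials below are hypothetical, and the Oracle JDBC driver (ojdbc) is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class OracleConnectionCheck {
        public static void main(String[] args) throws Exception {
            // Connect using: Service name. Host name, port, and service name are hypothetical values.
            String url = "jdbc:oracle:thin:@//db-host.example.com:1521/ORCLPDB1";
            try (Connection conn = DriverManager.getConnection(url, "osa_user", "osa_password")) {
                System.out.println("Connected to Oracle: " + !conn.isClosed());
            }
        }
    }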

Oracle Coherence Connection

Oracle Stream Analytics can use Oracle Coherence cache as a reference to look up data to enrich a stream that is being processed.

To create an Oracle Coherence connection, you must provide the following parameter:

  • Server URL(s)

JNDI Connection

Oracle Stream Analytics can use JNDI as a source of streaming data.

To create a JNDI connection, you must provide the following parameters:

  • JNDI Provider

  • Server Url(s)

  • Username

  • Password

  • Jndi Other Properties

Druid Connection

Oracle Stream Analytics uses a Druid connection to connect to the Druid data store that powers cubes (see Understanding Cubes).

A Druid connection has just a single parameter, the Zookeeper server URL.

2.2 Understanding Streams

A stream is a source of dynamic data. The data is flowing; it is not static or frozen. For example, stock prices of a particular company can be considered a stream because the data arrives every second or even more frequently. Another example of streaming data is the position (geographical location) of vehicles (for example, trucks), which also changes continuously as each vehicle moves. Each vehicle reports its own position to a central system periodically, for example every second, and the central system receives the position messages as a stream.

Streams can be transmitted over different network protocols and messaging systems, and in many different message formats.

For more information on the different types of streams that you can create, refer to Creating a Stream in the Using Oracle Stream Analytics guide.

To create a Kafka stream, you must create a Kafka connection first, and then select that connection in the stream creation wizard. In addition to the connection, you must specify the Kafka topic that represents the stream of data.
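
To see what such a stream consumes, the sketch below publishes one hypothetical order event to a Kafka topic using the standard Kafka Java client. The broker address and topic name are assumptions for illustration, not values taken from the stream creation wizard.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderEventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-host.example.com:9092"); // hypothetical broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String event = "{\"orderId\":\"o-1001\",\"customerId\":42,\"productId\":7,\"quantity\":2,\"unitPrice\":19.99}";
                producer.send(new ProducerRecord<>("orders", event)); // hypothetical topic name
            }
        }
    }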

2.3 Understanding References

A reference is a source of static data that provides contextual information about the event data. There are several types of references, such as a reference to a database table or to a Coherence cache.

References are used to enrich data that arrives from a Stream. For example, an Order stream contains order events, and each event contains a product Id and a customer Id. Assume that there are two database tables, one containing information about products and the other about customers. After creating two references, one for the products table and one for the customers table, Oracle Stream Analytics can use these references to enrich the incoming stream with information from these tables, such as product name, manufacturer, customer name, and address.

If a reference takes its data from a database table, a caching mechanism can be applied. (You can also use a Coherence cache directly.) By turning on caching (a configuration option of the reference), you can add a caching layer between the pipeline and the database table. This improves the performance of accessing static data, at the price of higher memory consumption by the pipeline. Once the data is loaded into the cache, the reference fetches data from the cache only. Any update to the reference table does not take effect if the expiration policy is set to Never.

Note:

To create a Database reference, you should first create a Database connection. Similarly, you have to create a Coherence connection to be able to create a Coherence reference.

2.4 Understanding Geo Fences

A geo fence is a virtual boundary in a real-world geographical area. This virtual boundary can be used to determine an object's position with respect to the geo fence.

For example, the object's position can be:

  • Near the geo fence

  • Exiting the geo fence

  • Staying within the geo fence for a specified duration

  • Entering the geo fence

  • Present inside the geo fence

2.4.1 What is a Manual Geo Fence?

User-created geo fences are called manual geo fences. You can create, edit, and update manual geo fences using the built-in map editor. Only polygon geo fences are supported.

2.4.2 What is a Database-based Geo Fence?

Geo fences whose geometry is imported from a database are known as database-based geo fences. You can view the geo fence geometry in the Geo Fence Editor, but the standard create, delete, and update operations in the editor are not allowed for database-based geo fences. Polygon and circular geo fences are supported.

2.5 Understanding Pipelines

A pipeline defines the data processing logic and is a sequence of pipeline stages. A stage can be one of the following types: Query, Pattern, Rule, Query Group, Custom, Scoring, or Target.

A pipeline starts with a stream stage, which is the only default stage. You cannot remove the default stage. A stage can have one or more children of any type such as Query, Pattern, Rule, and so on. That is, the pipeline does not have to be linear. You can have multiple branches in accordance with your use case logic. Optionally, a branch can end with one or more targets. You cannot add other stages to a target.

You can edit any stage in your draft pipeline at any time. The changes affecting downstream stages are propagated automatically.

Draft Pipelines

Pipelines in the draft state possess the following characteristics:

  • Are visible only to the owner

  • Can be edited

  • Run only while the pipeline editor is open. When you exit the pipeline editor or close your browser, the draft pipeline is removed from the Spark cluster.

  • Do not send events to a downstream target even if a target is configured

A newly created pipeline is in draft state. This is where you can explore your streams and implement the business logic. You do not have to do the implementation all at once; the pipeline will not run between your editing sessions.

Published Pipelines

Pipelines in the published state possess the following characteristics:

  • Are visible to any user

  • Cannot be edited

  • Will continue to run in the Spark cluster even after you exit the pipeline

  • Send events to a downstream target

After you have completed the implementation and are satisfied with it, you can add a target and publish your pipeline. The published pipeline runs continuously on the Spark cluster.

If you want to edit a published pipeline, you must unpublish it first.

2.6 Understanding Dashboards

Dashboards are a collection of inter-related visualizations based on a common underlying theme. For example, a Sales dashboard shows observations related to various sales activities, such as quarterly sales, prospective customers, and so on. Dashboards enable users to have a single-page view of all the important and correlated analyses, which provides meaningful insights and assists in the decision-making process.

A dashboard is a first-class citizen of the catalog. Any user with appropriate privileges can build dashboards by assembling the outcomes of different pipeline stages, without writing a single line of code. Dashboards display live data.

An Oracle Stream Analytics pipeline consists of multiple stages, where the outcome of each stage serves as input to the next. At the end of every stage, supporting inline visualizations let you see the result of the active stage. In Oracle Stream Analytics, you can visualize the outcomes of the various pipeline stages in a single place.

By combining dashboards with cubes, users can create visualizations based on the data exploration performed in the analytics section of Oracle Stream Analytics and include these visualizations in dashboards. With the dashboard feature, users can create mashup-like showcases where operational and analytical visualizations are intermixed to present a complete picture of the underlying business operation.

2.7 Understanding Cubes

A cube is a data structure that helps you quickly analyze data related to business problems on multiple dimensions. Oracle Stream Analytics cubes are powered by Druid, which is a distributed, in-memory OLAP data store.

Oracle Stream Analytics pipelines enable users to perform real-time processing of streaming data, whereas cubes are the mechanism by which users can perform interactive analysis of historical data. For this purpose, the pipeline outputs the processed data into Kafka streams, which in turn feed the cube. Using cubes, users can carry out univariate, bivariate, and multivariate data analysis. Cubes enable users to explore historical data with a rich set of 30 visualizations, ranging from simple tables, line charts, and bar charts to advanced visualizations such as sankey diagrams, box plots, and maps. Users can save the results of these cube explorations and make them available on dashboards that are embedded with Oracle Stream Analytics visualizations. This combination of dashboards and cubes serves both the operational and the strategic analytics needs of business users. The visualizations available in the cubes also have a rich set of look-and-feel properties for enhancing the value of exploratory data analysis results.

2.8 Understanding Stream Analytics Patterns

A pattern provides you with results displayed in a live output stream based on common business scenarios.

The visual representation of the event stream varies from one pattern type to another, based on the key fields you choose.

The following table lists the categories of patterns:

Category          Patterns

Enrichment        Reverse Geo Code: Near By, Left Outer Join

Outlier           Fluctuation

Inclusion         Union, Left Outer Join

Missing Event     'A' Not Followed by 'B', Detect Missing Event

Spatial           Proximity: Stream with Geo Fence, Geo Fence, Spatial: Speed, Interaction: Single Stream, Reverse Geo Code: Near By, Geo Code, Spatial: Point to Polygon, Interaction: Two Stream, Proximity: Two Stream, Direction, Reverse Geo Code: Near By Place, Proximity: Single Stream, Geo Filter

Filter            Eliminate Duplicates, Fluctuation

State             'A' Not Followed by 'B', Inverse W, Detect Missing Event, W, 'A' Followed by 'B'

Finance           Inverse W, W

Trend             'A' Not Followed by 'B', Top N, Change Detector, Up Trend, Detect Missing Event, Down Trend, 'A' Followed by 'B', Detect Duplicates, Bottom N

Shape Detector    Inverse W, W

Statistical       Correlation, Quantile

2.8.1 What is the Spatial: Speed Pattern?

This pattern lets you compute the average speed of a moving object over the selected window range.

2.8.2 What is the Geo Code Pattern?

When analyzing data, you may encounter situations where you need to obtain the latitude and longitude of a moving object based on its street address, zip code, and so on.

You can use this pattern to get geographic coordinates (like latitude and longitude) for an address.

2.8.3 What is the Interaction: Single Stream Pattern?

Two shapes are said to interact with each other if any part of one shape overlaps the other.

The Single Stream pattern lets you get the interaction of an object with every other object in the same stream.

2.8.4 What is the Interaction: Two Stream Pattern?

If two shapes interact, the distance between them is zero.

You can use the Two Stream pattern to get the interaction of an object in one stream with objects in another stream.

2.8.5 What is the Spatial: Point to Polygon Pattern?

Using this pattern, you can derive an object's shape based on its geographical coordinates and the fixed length and breadth of the object.

For example, if you know the length and breadth of a ship, you can get the shape or geofence of the ship using its position coordinates, where the coordinates keep changing as the ship moves.

2.8.6 What is the Proximity: Single Stream Pattern?

You can use this pattern to get the proximity of each object to every other object in a stream.

For example, if there is a single stream of flying airplanes and the distance buffer is specified as 1000 meters, the output in the table shows planes that are less than 1000 meters apart.

2.8.7 What is the Proximity: Two Stream Pattern?

You can use this pattern to get the proximity between objects of two streams.

The distance buffer acts as a filter in this pattern stage. For example, if there is a driver stream and a passenger stream, you can get the proximity of each passenger to every driver using a filter criterion of 'within a distance of 1 km'.

2.8.8 What is the Proximity: Stream with Geo Fence Pattern?

You can use this pattern to get the proximity of an object to a virtual boundary, or geo fence.

For example, if you have certain stores in California, you can send promotional messages as soon as a customer comes within a proximity of 1000 meters from any of the stores.

2.8.9 What is the Direction Pattern?

You can use this pattern to get the direction of a moving object.

For example, you can evaluate the direction of a moving truck.

2.8.10 What is the Geo Fence Pattern?

You can use this pattern when you want to track the relation of an object with a virtual boundary called geo fence.

Relations can be Enter, Exit, Stay, or Near with respect to a geo fence. For example, you can trigger an alert when an object enters the geo fence. You can also analyze a stream containing geo-location data. It helps in determining how events are related to a polygon in a geo fence.

The geo-location can be:

  • Near to Geo Fence

  • Exiting Geo Fence

  • Staying within Geo Fence for a specified duration

  • Entering Geo Fence

2.8.11 What is the Geo Fence Filter: Inside Pattern?

You can use this pattern to track objects inside one or more geo fences.

For example, if users move from one geographical location to another, you can send promotional messages to the users when they are inside a specified geo fence.

2.8.12 What is the Reverse Geo Code: Near By Pattern?

You can use this pattern to obtain the nearest place for the specified latitude/longitude or geographical coordinates.

2.8.13 What is the Reverse Geo Code: Near By Place Pattern?

This pattern lets you obtain the nearby location, with granular information such as city, country, and street, for the specified latitude and longitude.

2.8.14 What is the Correlation Pattern?

You can use this pattern if you need to identify the correlation between two numeric parameters. An output of 0 implies no correlation, +1 implies a perfect positive correlation, and -1 implies a perfect negative correlation.
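
The interpretation above (0 for no correlation, with +1 and -1 as the extremes) matches the standard Pearson correlation coefficient. Whether the pattern computes exactly this statistic is an assumption, but it is the usual definition of a correlation measure bounded by -1 and +1:

    r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where x_i and y_i are the paired values of the two numeric parameters within the window, and \bar{x} and \bar{y} are their means.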

2.8.15 What is the Quantile Pattern?

You should use this pattern if you need to calculate the value of a quantile function. For example, when asked for the 3rd quartile (the 75th percentile) of student scores, it could return a value of 80 to indicate that 75% of students scored less than 80.
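
As a minimal illustration of a quantile calculation, the Java sketch below uses the nearest-rank method; this method and the sample scores are assumptions for illustration, not necessarily what the pattern itself uses.

    import java.util.Arrays;

    public class QuantileSketch {
        // Nearest-rank estimate of the p-th quantile (0 < p <= 1) of a window of values.
        static double quantile(double[] window, double p) {
            double[] sorted = window.clone();
            Arrays.sort(sorted);
            int rank = (int) Math.ceil(p * sorted.length);
            return sorted[Math.max(rank - 1, 0)];
        }

        public static void main(String[] args) {
            double[] scores = {55, 62, 70, 74, 80, 85, 91, 97}; // hypothetical student scores
            System.out.println(quantile(scores, 0.75));          // 3rd quartile of the window
        }
    }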

2.8.16 What is the Detect Duplicates Pattern?

The Detect Duplicates pattern detects duplicate events in your stream according to the criteria you specify and within a specified time window. Events may be partially or fully equivalent to be considered duplicates.

You can use this pattern to understand how many duplicate events your stream has. For example, when you suspect that your aggregates are offset, you may want to check your stream for duplicate events.

2.8.17 What is the Change Detector Pattern?

The Change Detector pattern looks for changes in the values of your event fields and reports the changes once they occur within a specified range window. For example, an event arrives with value value1 for field field1. If any of the following incoming events within the specified range window contains a value different from value1, an alert is triggered. You can designate more than one field to watch for changes.

You can use it when you need to be aware of changes in a normally stable value. For example, for a sensor reading that is supposed to stay constant for certain periods of time, changes in the readings may indicate issues.

The default configuration of this pattern stage is to alert on change of any selected fields.

2.8.18 What is the W Pattern?

The W pattern, also known as a double bottom chart pattern, is used in the technical analysis of financial trading markets.

You can use this pattern to detect when an event data field value rises and falls in “W” fashion over a specified time window. For example, use this pattern when monitoring a market data feed stock price movement to determine a buy/sell/hold evaluation.

2.8.19 What is the ‘A’ Followed by ‘B’ Pattern?

The 'A' Followed by 'B' pattern looks for particular events following one another and will output an event when the specified sequence of events occurs.

You can use it when you need to be aware of a certain succession of events happening in your flow. For example, if an order status BOOKED is followed by an order status SHIPPED (skipping status PAID), you need to raise an alert.

2.8.20 What is the Top N Pattern?

The Top N pattern outputs the N events with the highest values from a collection of events arriving within a specified time window, sorted not in the default order of arrival but in the order you specify.

You can use it to get the highest values of fields in your stream within a specified time window. For example, use it to get N highest values of pressure sensor readings.
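
The selection logic can be pictured with a small Java sketch that keeps the N highest readings in a window using a min-heap. This is only an illustration of the idea with hypothetical readings, not the pattern's actual implementation.

    import java.util.PriorityQueue;

    public class TopNSketch {
        // Keeps the n highest values seen in the window.
        static PriorityQueue<Double> topN(double[] window, int n) {
            PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap: smallest kept value on top
            for (double reading : window) {
                heap.offer(reading);
                if (heap.size() > n) {
                    heap.poll(); // evict the smallest once more than n values are kept
                }
            }
            return heap;
        }

        public static void main(String[] args) {
            double[] pressures = {3.1, 9.8, 7.2, 4.4, 8.6}; // hypothetical sensor readings
            System.out.println(topN(pressures, 3));          // the three highest readings
        }
    }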

2.8.21 What is the Bottom N Pattern?

The Bottom N pattern outputs the N events with the lowest values from a collection of events arriving within a specified time window, sorted not in the default order of arrival but in the order you specify.

You can use it to get the lowest values of fields in your stream within a specified time window. For example, use it to get N lowest values of pressure sensor readings.

2.8.22 What is the Up Trend Pattern?

The Up Trend pattern detects a situation when a numeric value goes invariably up over a period of time.

You can use the pattern if you need to detect situations of a constant increase in one of your numeric values. For example, detect a constant increase in pressure from one of your sensors.

2.8.23 What is the ‘A’ Not Followed by ‘B’ Pattern?

The 'A' Not Followed by 'B' pattern will look for a missing second event in a particular combination of events and will output the first event when the expected second event does not arrive within the specified time period.

You can use it when you need to be aware of a specific event not following its predecessor in your flow. For example, if an order status BOOKED is not followed by an order status PAID within a certain time period, you may need to raise an alert.

2.8.24 What is the Down Trend Pattern?

The Down Trend pattern detects a situation when a numeric value goes invariably down over a period of time.

You can use this pattern if you need to detect situations of a constant reduction in one of your numeric values. For example, detect a constant drop in pressure from one of your sensors.

2.8.25 What is the Union Pattern?

The Union pattern merges two streams with identical shapes into one.

You can use this pattern if you have two streams with identical shapes that you want to merge into one, for example when you have two similar sensors sending data into two different streams, and you want to process the streams simultaneously, in one pipeline.

2.8.26 What is the Fluctuation Pattern?

You can use this pattern to detect when an event data field value changes in a specific upward or downward fashion within a specific time window. For example, use this pattern to check whether variations in an Oil Pressure value stay within acceptable ranges.

2.8.27 What is the Inverse W Pattern?

The Inverse W pattern, also known as a double top chart pattern, is used in the technical analysis of financial trading markets.

You can use this pattern when you want to see the financial data in a graphical form.

2.8.28 What is the Eliminate Duplicates Pattern?

The Eliminate Duplicates pattern looks for duplicate events in your stream within a specified time window and removes all but the first occurrence. A duplicate event is an event that has one or more field values identical to values of the same field(s) in another event. It is up to you to specify what fields are analyzed for duplicate values. You can configure the pattern to compare just one field or the whole event.

You can use it to get rid of noise in your stream. If you know that your stream contains duplicates that might offset your aggregates, such as counts, use the Eliminate Duplicates pattern to cleanse your data.
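
The idea of keeping only the first occurrence per key can be sketched in Java as follows. The key field (orderId), the sample values, and the in-memory set are illustrative assumptions, not the pattern's implementation.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class EliminateDuplicatesSketch {
        public static void main(String[] args) {
            // Hypothetical key values of events arriving within one time window.
            List<String> orderIds = List.of("o-1", "o-2", "o-1", "o-3", "o-2");
            Set<String> seen = new HashSet<>();
            for (String orderId : orderIds) {
                if (seen.add(orderId)) {                // true only for the first occurrence
                    System.out.println("keep " + orderId);
                }                                        // later occurrences are dropped
            }
        }
    }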

2.8.29 What is the Detect Missing Event Pattern?

The Detect Missing Event pattern discovers simple situations when an expected event is missing.

You can use this pattern if you need to detect missing events in your feed. For example, you have a feed in which multiple sensors send their readings every 5 seconds. Use this pattern to detect sensors that have stopped sending their readings, which may indicate that the sensor is broken or that there is no connection to the sensor.

2.8.30 What is the Left Outer Join Pattern?

The Left Outer join pattern joins your flow with another stream or a reference using the left outer join semantics.

You can use this pattern to join a stream with another stream or a reference using left outer join semantics. The result of this pattern always contains the data of the left side, even if the join condition does not find any matching data on the right side.

2.9 Understanding Shapes

A shape is the format of the data. In Oracle Stream Analytics, each message (or event, in stream processing terminology) in a Stream or Target must have the same format and this format must be specified when creating the Stream or Target. You can think of the shape as the streaming analogy of the database table structure for static data. Each shape consists of a number of fields and each field has a name and a data type. In the Stream creation wizard, it is possible to assign an alias to a field, so that the field can later be referenced by this user-given alias.

Assume that the stream contains data about orders. In this case, the shape may contain the following fields: an order id of type string, a customer id of type integer, a product id of type integer, a quantity of type integer, and a unit price of type Number.
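
As an illustration, the order shape described above could be represented by the following Java record. The field names and Java types are assumptions that mirror the shape definition, not artifacts generated by the product.

    // Illustrative Java representation of the order shape.
    public record OrderEvent(
            String orderId,   // order id : string
            int customerId,   // customer id : integer
            int productId,    // product id : integer
            int quantity,     // quantity : integer
            double unitPrice  // unit price : Number
    ) { }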

2.10 Understanding Target

A target represents an external system to which the results of stream processing are directed. Just like streams, targets are the links to the outside world. Streams are the input to a pipeline, whereas targets are the output. A pipeline can consume and process multiple streams.

A pipeline can also have no target, but that configuration does not really make sense, because the purpose of creating a pipeline is to process streaming data and direct the output to an external system, that is, a target.

JMS target type

To create a JMS target, you must provide the following parameters:

  • Connection

  • Jndi name

  • Data Format

Kafka target type

To create a Kafka target, you must provide the following parameters (see the consumer sketch after this list):

  • Connection

  • Topic name

  • Data Format
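
A minimal Java consumer sketch for inspecting what the pipeline writes to the target topic. The broker address, consumer group, and topic name are hypothetical, and the standard Kafka Java client is assumed to be on the classpath.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TargetTopicInspector {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-host.example.com:9092"); // hypothetical broker
            props.put("group.id", "osa-target-inspector");                 // hypothetical group
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders-out")); // hypothetical target topic
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // one output event per record, in the configured data format
                }
            }
        }
    }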

REST target type

To create a REST target, you must provide the following parameters (see the receiver sketch after this list):

  • URL

  • Custom HTTP headers

  • Batch processing

  • Data Format
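
A minimal sketch of an HTTP endpoint that could receive events from a REST target, using the JDK's built-in com.sun.net.httpserver package. The port and path are hypothetical, and a production receiver would of course do more than print the request body.

    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import com.sun.net.httpserver.HttpServer;

    public class RestTargetReceiver {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0); // hypothetical port
            server.createContext("/events", exchange -> {
                try (InputStream body = exchange.getRequestBody()) {
                    System.out.println(new String(body.readAllBytes())); // output events arrive as HTTP request bodies
                }
                exchange.sendResponseHeaders(200, -1); // 200 OK, no response body
                exchange.close();
            });
            server.start();
        }
    }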

2.11 Understanding the Predictive Model

The predictive model is an algorithm that you apply to streaming data to predict outcomes. In Oracle Stream Analytics, a predictive model is a PMML file that you upload and store in the system.

Oracle Stream Analytics supports PMML versions 3.0, 3.1, 3.2, 4.0, and 4.1. In a pipeline, you use a predictive model in a scoring stage to do probability scoring.

2.12 Understanding Custom JARs

A custom JAR is an Oracle Stream Analytics catalog artifact. It is a JAR file containing a custom event-processing algorithm implemented in Java, for example as plain Java (POJO) classes.

You need to create your own custom event-processing logic if you cannot build that logic using the Oracle Stream Analytics web tooling. This may be required in complex use cases in particular industries; generally, though, you can address complex use cases with patterns. A custom JAR contains one or more custom stage types and/or one or more custom functions. You can use a custom stage type in a Custom stage, and you can use custom functions in calculated field expressions. For details on how to build custom JARs, see the Oracle Stream Analytics Developer's Guide.
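
For orientation only, a custom function could be as simple as the plain Java class below. The class and method names are hypothetical, and the actual packaging, manifest entries, and registration metadata for custom JARs are described in the Oracle Stream Analytics Developer's Guide.

    // Hypothetical utility class that could be packaged into a custom JAR.
    public class TemperatureFunctions {

        // Converts a Celsius reading to Fahrenheit; once the JAR is registered in the
        // catalog, such a function could be referenced from a calculated field expression.
        public static double celsiusToFahrenheit(double celsius) {
            return celsius * 9.0 / 5.0 + 32.0;
        }
    }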

2.13 Understanding Export and Import

The export and import feature lets you migrate your pipeline and its contents between Oracle Stream Analytics systems (such as development and production) in a few clicks. You also have the option to migrate only selected artifacts.

To export and import a pipeline, you need administrator privileges. The export results in a ZIP file containing the metadata for the catalog object being exported, along with all its dependencies. You can import only ZIP files that you previously exported using this export/import functionality. Other types of files cannot be imported successfully.

Note:

  • You cannot import a pipeline developed with earlier versions of Oracle Stream Analytics.

  • On reimport, the existing metadata is overwritten with the newly imported metadata.

  • Artifacts of a published pipeline can’t be overwritten. You must first unpublish the pipeline and then retry.