Analyzing Data Streams

13 Analyzing Data Streams

This topic only applies to Data Integration Platform Cloud Classic.

With Stream Analytics, you can analyze complex event data streams that Data Integration Platform Cloud consumes using sophisticated correlation patterns, enrichment, and machine learning to provide insights and real-time business decisions.

Stream Analytics is available only for user-managed Data Integration Platform Cloud instances.

Topics:

Planning Your Stream Analytics Installation

This topic applies only to Oracle user-managed services.

Before you begin with Stream Analytics, make sure you meet these requirements.

Download the following required software to your Data Integration Platform Cloud VM:

JDK 8 Update 131+
Spark version 2.2.1 with Hadoop 2.7 or higher
OSA-18.1.0.0.1.zip

Ensure that you have Google Chrome version 60+ downloaded on your local machine.

It’s assumed that you have an Oracle Cloud account with Database Cloud Service, Data Integration Platform Cloud, and Storage instances provisioned. If not, see Create Instances for Data Integration Platform Cloud

Provisioning Oracle Event Hub and Big Data Cloud Instances

This topic applies only to Oracle user-managed services.

Using Oracle Stream Analytics with Oracle Data Integration Platform Cloud requires Oracle Event Hub and Oracle Big Data Cloud instances.

Complete the steps in the following sections to provision the necessary service instances needed to use Oracle Stream Analytics with Data Integration Platform Cloud.

Topics:

Provisioning an Oracle Event Hub — Dedicated Instance

This topic applies only to Oracle user-managed services.

Complete these steps to provision an Oracle Event Hub — Dedicated cloud instance.

To provision an Oracle Event Hub — Dedicated cloud instance:

Log in to Oracle Cloud My Services Dashboard.
Click Create Instance, then click All Services and select Event Hub — Dedicated.
On the Instance page of the Create New Instance wizard, enter an Instance Name, Description, and Notification Email, and then click Next.
On the Details page, complete the following fields, and then click Next:
1. Select Basic or Recommended for Deployment Type.
2. Click Edit for SSH Public Key and download new keys to your local machine.
3. Select Enable REST Access, and then enter the Username and Password credentials for REST Proxy Acesss.
Confirm your instance details and click Finish.
After your Event Hub — Dedicated instance has been provisioned successfully, click Manage this Service and select Access Rules.
Go to Actions and select Enable all rules.

Provisioning an Oracle Event Hub Cloud Instance

This topic applies only to Oracle user-managed services.

Complete these steps to provision an Oracle Event Hub Cloud instance, also referred to as your Kafka topic.

To provision an Oracle Event Hub Cloud instance:

Log in to Oracle Cloud My Services Dashboard.
Click Create Instance, then click All Services and select Event Hub.
On the Instance page of the Create New Instance wizard, enter an Instance Name, Description, and Notification Email.
Select your Oracle Event Hub — Dedicated instance from the Hosted On menu.
Enter the Number of Partitions and Retention Period in hours, and then click Next.
Confirm your instance details and click Create.

Provisioning an Oracle Big Data Cloud Instance

This topic applies only to Oracle user-managed services.

Complete these steps to provision an Oracle Big Data Cloud instance.

To provision an Oracle Big Data Cloud instance:

Log in to Oracle Cloud My Services Dashboard.
Click Create Instance, then click All Services and select Big Data Cloud.
On the Instance page of the Create New Instance wizard, enter an Instance Name, Description, and Notification Email, and then click Next.
On the Details page, complete the following fields, and then click Next:
1. Select Basic or Full for Deployment Profile.
2. Enter 3 or greater for Number of Nodes.
3. Select OC2m for Compute Shape.
4. Select 2.1 for Spark Version.
5. Click Edit for SSH Public Key and download the keys to your local machine.
6. Enter the Administrative User and Password details.
7. For Event Hub Cloud Service, select the instance you created previously.
8. Enter the Cloud Storage Container, Username, and Password.
Confirm your instance details and click Create.
After your Big Data Cloud instance has been provisioned successfully, click Manage this service, and then select Access Rules.
Click Actions, and select Enable all rules.

Click Create Rule and create each of the following rules:

Rule Name	Source	Destination	Destination Port	Protocol
Yarn_Web_UI	PUBLIC-INTERNET	bdcsce_Master	8088	TCP
Hadoop_Web_UI	PUBLIC-INTERNET	bdcsce_Master	50070	TCP
Access_to_HDFS	PUBLIC-INTERNET	bdcsce_Master	50075	TCP
Access_to_HDFS_CopyFile	PUBLIC-INTERNET	bdcsce_Master	8020	TCP
Yarn_RM_Rule	PUBLIC-INTERNET	bdcsce_Master	8032	TCP
Access_to_HDFS_DataNode	PUBLIC-INTERNET	bdcsce_Master	50010	TCP
Access_to_Zookeeper	PUBLIC-INTERNET	bdcsce_Master	2181	TCP
Access_to_Kafka_Broker	PUBLIC-INTERNET	bdcsce_Master	9092	TCP
Access_to_Kafka_Server	PUBLIC-INTERNET	bdcsce_Master	6667	TCP

Installing Oracle Stream Analytics

This topic applies only to Oracle user-managed services.

After you ensure that you have procured the required software and hardware and completed the procedures to set up the required environment for installation, use the following steps to install Oracle Stream Analytics 18.1.0.0.1:

Create a directory, for example, spark-downloads, and download Apache Spark into the newly created folder as specified below:
- Spark release: 2.2.1
- Package type: Pre-built for Apache Hadoop 2.7 and later
- Download Spark: spark-2.2.1-bin-hadoop2.7.tgz
Extract the Spark archive to a local directory.
You can see a subfolder, spark-2.2.1-bin-hadoop2.7.
Create a new directory, for example, OSA-downloads and download OSA-18.1.0.0.1.zip from Oracle eDelivery and extract it into the newly created folder.
You can find the OSA-18.1.0.0.1-README.txt file in the OSA-18.1.0.0.1 zip file.
Review the file OSA-18.1.0.0.1-README.txt.
Set the environment variables:
1. Set the SPARK_HOME environment variable in the OSA-18.1.0.0.0/osa-base/etc/osa-env.sh file to point to the directory where you have extracted the Spark archive. For example:
  SPARK_HOME=/products/spark-downloads/spark-2.2.1-bin-hadoop2.7
1. Set the JAVA_HOME environment variable in the OSA-18.1.0.0.0/osa-base/etc/osa-env.sh file to point to the directory where you have extracted the JDK archive. For example:
  JAVA_HOME=/products/java-downloads/jdk1.8.0_131

Configure your data source in OSA-18.1.0.0.1/osa-base/etc/jetty-osa-datasource.xml as per instructions in Initializing Metadata Store.

Uncomment and edit one of the two Data source configurations, either for Oracle Database or MySQL depending on the database you want to use as metadata store. The uncommented fragment for Oracle database is shown below:

<New id="osads" class="org.eclipse.jetty.plus.jndi.Resource">
     <Arg>
       <Ref refid="wac"/>
     </Arg>
     <Arg>jdbc/OSADataSource</Arg>
     <Arg>
       <New class="oracle.jdbc.pool.OracleDataSource">
         <Set name="URL">jdbc:oracle:thin:@myhost.example.com:1521:OSADB</Set>
         <Set name="User">{OSA_USER}</Set>
         <Set name="Password">{OSA_USER_PWD}</Set>
         <Set name="connectionCachingEnabled">true</Set>
         <Set name="connectionCacheProperties">
           <New class="java.util.Properties">
             <Call name="setProperty"><Arg>MinLimit</Arg><Arg>1</Arg></Call>
             <Call name="setProperty"><Arg>MaxLimit</Arg><Arg>15</Arg></Call>
             <Call name="setProperty"><Arg><InitialLimit</Arg><Arg>1</Arg></Call>
           </New>
         </Set>
       </New>
     </Arg>
</New>

Change the database host, port, SID, Oracle Stream Analytics schema user name and password fields marked in bold in the code in step 6a.

Configuring Stream Analytics

This topic applies only to Oracle user-managed services.

Now that you have Stream Analytics installed, there are a few post-installation tasks that you need to complete before you use Stream Analytics with Data Integration Platform Cloud.

Topics:

Initializing Metadata Store

This topic applies only to Oracle user-managed services.

After installing Oracle Stream Analytics, you need to configure the metadata store with the database admin credential details and the version of Oracle Stream Analytics as required.

Note:

If you do not have the database admin credentials, ask your database administrator to create a Oracle Stream Analytics database user and initialize the content under the user by using the SQL scripts available in the OSA-18.1.0.0.1/osa-base/sql folder. The Oracle Stream Analytics database username must match the one configured in jetty-osa-datasource.xml.

To initialize the metadata store, you need database admin credentials with sysdba privileges:

Change directory to OSA-18.1.0.0.1/osa-base/bin.
Execute the following command:
./start-osa.sh dbroot=<root user> dbroot_password=<db root password>

Note:
Ensure that you replace the database root username and password as required.
The following console messages indicates that the Oracle Stream Analytics schema is created and the metadata store is successfully initialized:
```
OSA DB user created: <User>
OSA schema version: 18.1.0.0.1
```
Note:
If you don’t see the above messages, check the OSA-18.1.0.0.1/osa-base/logs folder to identify the cause and potential solution.
Press enter on the console and run the OSA-18.1.0.0.1/osa-base/bin/stop-osa.sh file to complete the metadata store initialization.

Changing the Admin Password

This topic applies only to Oracle user-managed services.

Oracle Stream Analytics comes with a default user, osaadmin. You must create your own password to login by using that user ID.

Use the following steps to change the password:

Change directory to top-level folder OSA-18.1.0.0.1 and execute the following command:

java -cp ./lib/jetty-util-9.4.7.v20170914.jar org.eclipse.jetty.util.security.Password osaadmin <your new password>
You can see the following codes in the console:
```
2018-03-01 15:37:55.800:INFO::main: Logging initialized @117ms to org.eclipse.jetty.util.log.StdErrLog
alphago
OBF:1u9d1uv81uha1u9r1ugg1uuy1ua5
MD5:34d0a556209df571d311b3f41c8200f3
CRYPT:osX/8jafUvLwA
```
Make a note of the obfuscated password string, marked in bold in the code in step 1, and copy it to a notepad.
Connect to the Oracle Stream Analytics metadata store database by using your SQL tool such as SQL Plus or SQL Developer.
The credentials must match the ones provided in the OSA-18.1.0.0.1/osa-base/etc/jetty-osa-datasource.xml file.
Update the osa_users table by using the command below and replace the obfuscated password with the string that you copied in step 2:
```
update osa_users set pwd='<CopiedObfuscatedPasswordString>' where username='osaadmin'
```

Note:

This on-premise version of Oracle Stream Analytics doesn’t support role based access. All users have admin privileges. To create additional users with obfuscated passwords, see Adding Users.

Setting up Runtime for Oracle Stream Analytics Server

This topic applies only to Oracle user-managed services.

Before you start using Stream Analytics, you need to specify the runtime server, environment, and node details. You must do this procedure right after you launch Stream Analytics for the first time.

Change directories to OSA-18.1.0.0.0/osa-base/bin and execute the following command to start the Stream Analytics server: ./start-osa.sh
You’ll see the following message:
```
OSA schema version 18.1.0.0.0
The schema is preconfigured and current. No changes or updates are required.
```
Note:

If you don’t see this message, check the log file in OSA-18.1.0.0.0/osa-base/logs.
Set up access rules on your Data Integration Platform Cloud instance. Log in to Oracle Cloud, go to the Data Integration Platform Cloud Service Console, and then locate your Data Integration Platform Cloud instance under the Instances list.
Click Manage this service and select Access Rules for your Data Integration Platform Cloud instance.

Click Create Rule and create the following access rules:

Rule Name	Source	Destination	Destination Port	Protocol
Access_to_OSA_UI_1	PUBLIC-INTERNET	WLS_MS	9080	TCP
Access_to_OSA_UI_2	PUBLIC-INTERNET	WLS_ADMIN	9080	TCP
Access_to_OSA_UI_3	PUBLIC-INTERNET	WLS_ADMIN_HOST	9080	TCP

In your Chrome browser’s address bar, enter <Public IP of your Data Integration Platform Cloud instance>:9080/osa to access OSA’s login page.

Note:

You can find your Data Integration Platform Cloud instance’s public IP on its service instance detail page.
Enter the username and password credentials for the osaadmin user.

Note:

The password is a plain-text password.
After you’re logged in, click System Settings and complete the fields as follows, and then click Save:
1. Enter the host and port where Kafka ZooKeeper is running in the Kafka ZooKeeper Connection field.
  This is required to see live results while you design your pipeline.
2. Enter the Hadoop cluster public IP where YARN resource manager is running in the YARN Resource Manager URL field.
  
  (Check <Resource Manager public IP>:8088/cluster.)
3. Select HDFS for Storage.
4. Enter the <HDFS cluster public IP where Namenode is running>, plus a root folder.
  
  (Check <HDFS Hadoop cluster public IP>:50070. If the root folder does not exist, it will automatically be created but the user specified in the Hadoop authentication below must have write permissions.)
5. Select Simple for Hadoop Authentication.
6. Enter the username of a user who has write permissions for the folder specified in webhdfs://URL in the Username field.

Upgrading Oracle Stream Analytics

This topic applies only to Oracle user-managed services.

If you have an existing Oracle Stream Analytics 18.1.0.0.0 installation, use the following steps to upgrade to Oracle Stream Analytics 18.1.0.0.1:

Metadata Upgrade

Use these steps if there is no additional configuration apart from the data source configuration and environment variables:

Backup the existing Oracle Stream Analytics 18.1.0.0.0 metadata database, for example, use Oracle Database Backup tool.
Stop Oracle Stream Analytics 18.1.0.0.0.
Unzip Oracle Stream Analytics 18.1.0.0.1 installer.

Copy the data source configuration from existing installation to the new installation:

cp OSA-18.1.0.0.0/osa-base/etc/jetty-osa-datasource.xml OSA-18.1.0.0.1/osa-base/etc/jetty-osa-datasource.xml

Copy the environment variable osa-env.sh from existing installation to the new installation:
```
cp OSA-18.1.0.0.0/osa-base/etc/osa-env.sh OSA-18.1.0.0.1/osa-base/etc/osa-env.sh
```

Upgrade Using the Existing Jetty Configurations

Use these steps if you want to upgrade using the existing Jetty configurations:

Backup the existing Oracle Stream Analytics 18.1.0.0.0 metadata database, for example, use Oracle Database Backup tool.
Stop Oracle Stream Analytics 18.1.0.0.0.
Unzip Oracle Stream Analytics 18.1.0.0.1 installer.

Use the following set of commands:

#!/bin/bash
OSA_V18.1.0.0.0=<refers to the existing installation that you would like to update>
OSA_V18.1.0.0.1=<refers to the unzipped OSA 18.1.0.0.1 installer>
 
cp ${OSA_V18.1.0.0.1}/osa-base/bin/osa.installer.jar ${OSA_V18.1.0.0.0}/osa-base/bin/osa.installer.jar
cp ${OSA_V18.1.0.0.1}/osa-base/extensibility-api/osa.spark-cql.extensibility.api.jar ${OSA_V18.1.0.0.0}/osa-base/extensibility-api/osa.spark-cql.extensibility.api.jar
cp ${OSA_V18.1.0.0.1}/osa-base/lib/ext/osa.web.jetty-session.jar ${OSA_V18.1.0.0.0}/osa-base/lib/ext/osa.web.jetty-session.jar
 
cp ${OSA_V18.1.0.0.1}/osa-base/resources/libs/wlthint3client.jar ${OSA_V18.1.0.0.0}/osa-base/resources/libs/wlthint3client.jar
cp ${OSA_V18.1.0.0.1}/osa-base/resources/libs/spark-archive.jar ${OSA_V18.1.0.0.0}/osa-base/resources/libs/ #spark-acrhive.jar is a new file added in Oracle Stream Analytics 18.1.0.0.1 installer

#comment below line if you want to keep existing logging configuration
cp ${OSA_V18.1.0.0.1}/osa-base/resources/log4j2.xml ${OSA_V18.1.0.0.0}/osa-base/resources/log4j2.xml

cp ${OSA_V18.1.0.0.1}/osa-base/resources/modules/spark-osa.jar ${OSA_V18.1.0.0.0}/osa-base/resources/modules/spark-osa.jar
cp ${OSA_V18.1.0.0.1}/osa-base/sql/*.sql ${OSA_V18.1.0.0.0}/osa-base/sql/
 
rm -f ${OSA_V18.1.0.0.0}/osa-base/start.d/logging-log4j2.ini
 
cp ${OSA_V18.1.0.0.1}/osa-base/webapps/osa.web.*.war ${OSA_V18.1.0.0.0}/osa-base/webapps/
cp ${OSA_V18.1.0.0.1}/osa-base/webapps/osa.web.admin.xml ${OSA_V18.1.0.0.0}/osa-base/webapps/
#osa.web.admin.xml is a new file added in Oracle Stream Analytics 18.1.0.0.1 installer

cp ${OSA_V18.1.0.0.1}/osa-base/version.txt ${OSA_V18.1.0.0.0}/osa-base/version.txt

Administering Stream Analytics

This topic applies only to Oracle user-managed services.

Administering Stream Analytics is essential to get the required results.

Topics:

Managing Users in Stream Analytics

This topic applies only to Oracle user-managed services.

After installing Oracle Stream Analytics, the next important step is to set up and manage users to use the application.

In this release of Oracle Stream Analytics, user details are stored in a database. When you create Oracle Stream Analytics schema at the time of installation, the following database tables are populated with one record in each table:

osa_users — table containing the users
osa_user_roles — table containing the user names and their associated roles

When you execute a query to pull in all the data from the osa_users table, you can see the following:

select * from osa_users;

+----+----------+--------------------------------------+ 
| id | username | pwd                                  | 
+----+----------+--------------------------------------+ 
|  1 | osaadmin | MD5:201f00b5ca5d65a1c118e5e32431514c | 
+----+----------+--------------------------------------+

where osaadmin is the pre-configured user along with the encrypted password.

When you execute a query to pull in all the data from the osa_user_roles table, you can see the following:

select * from osa_user_roles;

+---------+---------+
| user_id | role_id |
+---------+---------+
|       1 |       1 |
+---------+---------+

where role_id of value 1 indicates that the user is an administrator.

Configuring Stream Analytics System Settings

This topic applies only to Oracle user-managed services.

Only users with the Administrator role can set the system settings in Stream Analytics.

To set/update system settings:

Click the user name in the top right corner of the screen.
Click System Settings.
The System Settings page opens.
Click Environment.
Specify the server names and URLs where the Kafka Zookeeper, Yarn Resource Manager, or Spark Standalone, Path, HA Namenodes are deployed and running. Hadoop authentication is an optional setting.
Description of the illustration system_settings_environment.png
- Kafka Zookeeper Connection — the URL where the Zookeeper server or servers are configured, separated by comma. Kakfa is used as internal transport.
- Runtime Server — the runtime server you want your Stream Analytics instance to run on
- YARN Resource Manager URL — the URL where the YARN Resource Manager is configured, if the runtime server is Yarn
- Spark REST URL — Spark standalone REST URL, if the runtime server is Spark standalone. To submit an Stream Analytics pipeline to Spark, the pipeline needs to be copied to a storage location that is accessible by all Spark nodes.
- Storage — the type of storage for pipelines
- Path — the path where the storage exists. The user who accesses this folder must have the write permission to it. The folder at this path is either created by the administrator, or Stream Analytics creates the folder at runtime.
- HA Namenodes — If the hostname in the above URL refers to a logical HA cluster, specify the actual namenodes here in the format hostname1<port>, hostname2<port> etc.
- Hadoop Authentication — the type of authentication configured on the Hadoop cluster. Stream Analytics supports only simple authentication.
Click Pipelines. Specify the various settings for the pipeline within Stream Analytics.
- Batch Duration — the default duration of the batch for each pipeline
- Executor Count — the default number of executors per pipeline
- Cores per Executor — the default number of cores. A minimum value of 2 is required.
- Executor Memory — the default allocated memory for each executor instance in megabytes
- Cores per Driver — the default number of cores
- Driver Memory — the default allocated memory per driver instance in megabytes
- High Availability — toggle the default HA value as on/off for each pipeline
Click Proxy. If you set proper proxy, the back-end system will use these settings to reach the REST target. The proxy settings are also required for geo code related spatial patterns.
Click User Management. You see the list of available users here. You can add/delete users from this screen.
Click Save.

Note:
On the Yarn Cluster, make sure that you do not set a high value for yarn.nm.liveness-monitor.expiry-interval-ms (for example, 3000 ms instead of 10 minutes). This property determines the default value for how long to wait until a node manager is considered dead. Using a high value for this property does not process the events and the events are lost.

Configuring User Preferences

This topic applies only to Oracle user-managed services.

Based on the preferences the users set in this page, the characteristics of Stream Analytics vary.

General

To set/update user preferences:

Click the user name in the top right corner of the screen.
Click Preferences. The Preferences page opens.

Provides a set of general preferences that you can view and set according to your requirements.

Description of general_pref.png follows
Description of the illustration general_pref.png

Start Page

Select if you want the Home page, the Catalog page, or the Patterns page to appear as the Start Page.

Notifications

Provides a set of notifications preferences that you can view and set according to your requirements.

Description of notifications_pref.png follows
Description of the illustration notifications_pref.png

Show Information Notifications

Select this option if you want the information notifications to appear in the pipeline. This option is selected by default.

Information Notification duration (in seconds)

Choose the number of seconds for which the notifications appear. The default value is 5.

Catalog

Provides a set of catalog preferences that you can view and set according to your requirements.

Description of catalog_pref.png follows
Description of the illustration catalog_pref.png

Default Sorting Column

Select the column by which you want the columns to be sorted. This value will be used as the default for all columns until you change the value again.

Default Sorting Order

Select the order by which you want the columns to be sorted. This value will be used as the default value for all columns until you change the value again.

Default Page Size

Select the value to be used as the default page size. Based on the value selected, the number of records that appear on a page vary. This value will be used as the default for all pages until you change the value again.

Pipeline

Provides a set of pipeline preferences that you can view and set according to your requirements.

Description of application_ua_pref.png follows
Description of the illustration application_ua_pref.png

Select Yes if you want to display the User Assistance text for the pipelines in the Pipeline Editor.

Live Output Stream

Provides a set of pipeline live output stream preferences that you can view and set according to your requirements.

Select a value that you want to be applied as the default table size for the data in Live Output Stream of a pipeline.

Timestamp

Provides a set of pipeline timestamp preferences that you can view and set according to your requirements.

Description of preferences_timestamp.png follows
Description of the illustration preferences_timestamp.png

Map

Provides a set of map preferences that you can view and set according to your requirements.

Select a value that you want to be used as the default tile layer preference in the geo fences.

Description of map_preferences.png follows
Description of the illustration map_preferences.png

Working with Stream Analytics

This topic applies only to Oracle user-managed services.

Stream Analytics has various artifacts like connections, references, streams, targets, cubes, dashboards, predictive models, custom jars, and many more. The artifacts are the important resources using which you can create pipelines.

Topics:

Home Page

The Home page is the first page that you see when you login to Stream Analytics. This page lists the industry verticals that Stream Analytics supports.

Each industry vertical has a tag associated with it and the tags are case-sensitive.

Distributed Intelligence for IOT - Acquire, analyze, and act on high-volume, high-velocity data from sensors and devices both at the edge and in the data center in real-time. Tag for this vertical is IOT.
Risk and Fraud Management - Leverage industry's best stream processing platform to assess risk and prevent financial fraud in real-time. Tag for this vertical is risk.
Transportation and Logistics - Manage fleet, track assets, and improve supply chain efficiencies by combining streaming data with Oracle's advanced spatial functions. Tag for this vertical is transportation.
Customer Experience and Consumer Analytics - Know the sentiment of your customers to reduce churn, improve loyalty, make offers, and attract customers in real-time. Tag for this vertical is customer.
Telecommunications - Pro actively monitor your networks, predict network failures, and prevent distributed denial of service type attacks. Tag for this vertical is telecom.
Retail — Understand and Apply instant Retail Shopping trends, instigate beneficial shelf life patterns and placements, be responsive to Customers cart utilization and interoperate with advanced Vending Machines. Tag for this vertical is retail.

The Home page is as shown below:

Description of home_page.png follows
Description of the illustration home_page.png

You can navigate to the Catalog or the Patterns page from the home page to get started with Stream Analytics.

About the Catalog

The Catalog page is the location where resources including pipelines, streams, references, maps, connections, targets, dashboards, predictive models, custom jars, visualizations, and cubes are listed. This is the go-to place for you to perform any tasks in Stream Analytics.

You can mark a resource as a favorite in the Catalog by clicking on the Star icon. Click the icon again to remove it from your favorites. You can also delete a resource or view its topology using the menu icon to the right of the favorite icon.

The tags applied to items in the Catalog are also listed on the screen below the left navigation pane. You can click any of these tags to display only the items with that tag in the Catalog. The tag appears at the top of the screen. Click Clear All at the top of the screen to clear the Catalog and display all the items.

You can include or exclude pipelines, streams, references, predictive models, geo fences, connections, targets, custom jars, visualizations, dashboards, and cubes using the View All link in the left panel under Show Me. When you click View All, a check mark appears beside it and all the components are displayed in the Catalog.

When you want to display or view only a few or selective items in the Catalog, deselect View All and select the individual components. Only the selected components will appear in the Catalog.

Typical Workflow for Administering Stream Analytics

The typical workflow lists the artifacts required to create a pipeline in Stream AnalyticsStream Analytics.

The prerequisites for a pipeline are:

A connection is required to create a stream, except for a file stream.
A stream is required to create a pipeline.

Creating a Connection

To create a connection:

Click Catalog in the left pane.
From the Create New Item menu, select Connection.
Provide details for the following fields on the Type Properties page and click Next:
- Name — name of the connection
- Description — description of the connection
- Tags — tags you want to use for the connection
- Connection Type — type of connection: Coherence, Database, Druid, JNDI, or Kafka
Description of the illustration create_connection_type.png
Enter Connection Details on the next screen and click Save.
When the connection type is Coherence:
- Host name — the Coherence Extend Proxy Services TCP/IP Server Socket host
- Port — the Coherence Extend Proxy Services TCP/IP Server Socket port
When the connection type is Database:
- Connect using — select the way you want to identify the database; SID or Service name
- Service name/SID — the details of the service name or SID
- Host name — the host name on which the database is running
- Port — the port on which the database is running. Usually it is 1521
- Username — the user name with which you connect to the database
- Password — the password you use to login to the database
When the connection type is Druid, provide Zookeeper URL.

When the connection type is JNDI:
- JNDI Provider — select the JNDI service provider
- Server Url(s) — the server url(s) for the JNDI connection; for example: host1:port1, host2:port2
- Username — the user name for authenticating the JNDI connection
- Password — the password for the JNDI connection
When the connection type is Kafka, provide Zookeeper URL.

A connection with the specified details is created.

Cache Configuration for Coherence

Stream Analytics requires a special coherence cache configuration and the proxy schema, so that it can connect to the coherence.

To enrich stream data with external coherence cluster reference data, you must access external coherence cluster using extend client APIs. To access external cluster as client, you need to configure cache-config with ExtendTcpCacheService and ExtendTcpInvocationService.

Configure the Coherence Cluster

Make sure that you have Coherence for Java is installed.

To configure the external cluster as client:

Create an XML file named cache-config.xml.

Copy the following XML to the file:

<?xml version="1.0"?>

<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config
   coherence-cache-config.xsd">
		<caching-scheme-mapping>
			<cache-mapping>
				<cache-name>
					externalcache*
				</cache-name>
				<schema-name>
					remote
				</schema-name>
			</cahce-mapping>
		</caching-scheme-mapping>

		<caching-schemes>
			<remote-cache-scheme>
				<scheme-name>
					remote
				</scheme-name>
				<service-name>
					ExtendTcpCacheService
				</service-name>
				<initiator-config>
					<tcp-initiator>
						<remote-addresses>
							<socket-address>
								<address>localhost	</address>
								<port>9099</port>
							</socket-address>
						</remote-addresses>
					</tcp-initiator>
					<outgoing-message-handler>
						<request-timeout>5s</request-timeout>
					</outgoing-message-handler>
				</initiator-config>
			</remote-cache-scheme>

			<remote-invocation-scheme>
				<scheme-name>extend-invocation</scheme-name>
				<service-name>ExtendTcpInvocationService</service-name>
				<initiator-config>
					<tcp-initiator>
						<remote-addresses>
							<socket-address>
								<address>localhost</address>
								<port>9099</port>
							</socket-address>
						</remote-addresses>
					</tcp-initiator>
					<outgoing-message-handler>
						<request-timeout>5s</request-timeout>
					</outgoing-message-handler>
				</initiator-config>
			</remote-invocation-scheme>
		</caching-schemes>
</cache-config>

Save and close the file.
Test the connection to the cluster.
```
InvocationService service = (InvocationService) CacheFactory.getConfigurableCacheFactory().ensureService("ExtendTcpInvocationService");
```
ensureService() will throw exception if there is no coherence cluster available with the given host and port.
Create a coherence reference using a coherence connection.
Register the coherence as reference.

The following is the sample code to register the coherence as reference:

override def initialize():Unit = {
    repartition = true
    val externalEvent = EventType("externalorders",IntAttr("orderId"), VarCharAttr("orderDesc", 20))
    val sExtSrcProps = Map(EXT_URL -> "",EXT_ENTITY_NAME -> "externalcache")
    val jExtSrcProps = new java.util.HashMap[String,String](sExtSrcProps)
    val converter = ConverterFactory(ConverterType.COHERENCE,externalEvent)
    cc.registerEventType(externalEvent)
    cc.registerRelation(externalEvent).onExternal(jExtSrcProps,ExtSourceType.COHERENCE,converter)
 }

def main(args: Array[String]) {
    cql = "istream(select R.orderId as orderId, R.orderStatus as orderStatus, Ext.orderDesc as orderDesc from orders[now] as R, externalorders as Ext where R.orderId = Ext.orderId)"
    name = "CoherenceCorrelation"
    processOrders(args)
    }
}
// EXT_URL is not used for coherence as reference , currently used for webservice & database, so this will be set to EMPTY
//EXT_ENTITY_NAME is the cache name of the external coherence cluster

For the above example, coherence cache must have key as orderId <Integer> and value as Map of values for orderId and orderDesc. A sample cache similar to the following will populate:

NamedCache cache = CacheFactory.getCache("externalcache"); 
Map<String,Object> order1 = new HashMap<String, Object>();
order1.put("orderId", new Integer(1)); 
order1.put("orderDesc", "HP Deskjet v2"); 
Map<String,Object> order2 = new HashMap<String, Object>(); 
order2.put("orderId", new Integer(2)); 
order2.put("orderDesc", "Oracle Database 12"); 
MapString,Object> order3 = new HashMap<String, Object>();
order3.put("orderId", new Integer(3)); 
order3.put("orderDesc", "Apple iPhone6s"); 
Map<String,Object> order4 = new HashMap<String, Object>();
order4.put("orderId", new Integer(4)); 
order4.put("orderDesc", "Logitech Mouse"); 
cache.put(1,order1); 
cache.put(2,order2); 
cache.put(3,order3); 
cache.put(4,order4);

Creating a Stream

A stream is a source of events with a given content (shape).

To create a stream:

Navigate to Catalog.
Select Stream in the Create New Item menu.
Provide details for the following fields on the Type Properties page and click Next:
- Name — name of the stream
- Description — description of the stream
- Tags — tags you want to use for the stream
- Stream Type — select suitable stream type. Supported types are File, GoldenGate, JMS, and Kafka.
Description of the illustration create_stream_type.png
Provide details for the following fields on the Source Details page and click Next:

When the stream type is File:
- File Path or URL — the location of the file that you want to upload
- Read whole content — select this option if you want to read the whole content of the file
- Number of events per batch — the number of events that you want to process per batch
- Loop — select this option if you want to process the file in a loop
- Data Format — the format of the data. The supported types are: CSV and JSON.
When the stream type is GoldenGate:
- Connection — the connection for the stream
- Topic name — the topic name that receives events you want to analyze
- Data Format — the format of the data. The supported types are: CSV, JSON, AVRO. AVRO is a data serialization system.
When the stream type is JMS:
- Connection — the connection for the stream
- Jndi name — the Jndi that reads messages from topics, distributed topics, queues and distributed queues
- Client ID — the client to be used for durable subscriber
- Message Selector — the message selector to filter messages. If your messaging application needs to filter the messages it receives, you can use a JMS API message selector, which allows a message consumer to specify the messages it is interested in. Message selectors assign the work of filtering messages to the JMS provider rather than to the application.
  
  A message selector is a String that contains an expression. The syntax of the expression is based on a subset of the SQL92 conditional expression syntax. The message selector in the following example selects any message that has a NewsType property that is set to the value 'Sports' or 'Opinion':
```
NewsType = ’Sports’ OR NewsType = ’Opinion’
```
  The createConsumer and createDurableSubscriber methods allow you to specify a message selector as an argument when you create a message consumer.
- Subscription ID — the subscription id for durable selector
- Data Format — the format of the data. The supported types are: CSV, JSON, AVRO, MapMessage. MapMessage is supported only for JNDI based streams.
  
  If the data format is AVRO, you must also specify the message schema by setting org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG and KEY_SERIALIZER_CLASS_CONFIG parameters as ByteArraySerializer.
  
  A MapMessage object is used to send a set of name-value pairs. The names are String objects, and the values are primitive data types in the Java programming language. The names must have a value that is not null, and not an empty string. The entries can be accessed sequentially or randomly by name. The order of the entries is undefined.
When the stream type is Kafka:
- Connection — the connection for the stream
- Topic name — the topic name that receives events you want to analyze
- Data Format — the format of the data within the stream. The supported types are: CSV, JSON, AVRO.
  
  If the data format is AVRO, you must also specify the message schema by setting org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG and KEY_SERIALIZER_CLASS_CONFIG parameters as ByteArraySerializer.
Description of the illustration create_stream_source.png
Select one of the mechanisms to define the shape on the Shape page:
- Infer Shape — detects the shape automatically from the input data stream.
  
  You can infer the shape from Kafka, JSON schema file, or CSV message/data file. You can also save the auto detected shape and use it later.
- Select Existing Shape — lets you choose one of the existing shapes from the drop-down list.
- Manual Shape — populates the existing fields and also allows you to add or remove columns from the shape. You can also update the datatype of the fields.
Description of the illustration create_stream_shape.png

A stream is created with specified details.

CSV Data for Pre-defined Formats

When your data format is CSV, select a predefined format based on the variations of CSV data that differs due to the originating source of these CSV. The following table describes the CSV data for each of these predefined formats:

CSV Predefined Format	Description
`DEFAULT`	Standard comma separated format, as for `RFC4180` but allowing empty lines
`EXCEL`	Excel file format (using a comma as the value delimiter).
`INFORMIX_UNLOAD_CSV`	Default `Informix CSV UNLOAD` format used by the `UNLOAD TO file_name` operation (escaping is disabled.) This is a comma-delimited format with a LF character as the line separator. Values are not quoted and special characters are escaped with '\'. The default NULL string is "\\N".
`MYSQL`	Default `MySQL` format used by the `SELECT INTO OUTFILE` and `LOAD DATA INFILE` operations. This is a tab-delimited format with a LF character as the line separator. Values are not quoted and special characters are escaped with '\'. The default NULL string is "\\N".
`POSTGRESQL_CSV`	Default `PostgreSQL CSV` format used by the `COPY` operation. This is a comma-delimited format with a LF character as the line separator. The default NULL string is "".
`POSTGRESQL_TEXT`	Default `PostgreSQL` text format used by the `COPY` operation. This is a tab-delimited format with a LF character as the line separator. The default NULL string is "\\N".
`RFC4180`	Comma separated format as defined by `RFC4180`
`TDF`	Tab-delimited format

Capabilities of JMS Source

The capabilities of JMS Source are listed in the following table:

Capability	Description	Comments
Ability to connect to JMS Cluster	JMS consumer should be able to connect to JMS cluster and handle JMS server fail-over
Message Format support	Map and TextMessage (JSON, CSV and AVRO)	Does not support xml and object
Message selector	JMS message selector to use to filter messages. Only messages that match the selector will produce events.
Re-connection	Reconnect to JMS server or JMS cluster
Read messages from queue/distributed queue
Read messages from topic	Read messages from JMS topic. By default the subscriber is non-durable
Support for Durable subscriber	A durable subscriber registers a durable subscription by specifying a unique identity that is retained by the JMS provider. If the consumer reconnects to JMS topic, it would read messages from where it last read.
T3 Support	Weblogic JMS Protocol

JMS Server Clean Up

When you create a JMS stream and select the durable subscription option (by providing client ID and subscription ID value), Stream Analytics creates the durable subscription (if not already present) when the pipeline using this stream is running. When you come out of the pipeline or unpublish the pipeline(or kill the running pipeline), the durable subscription remains on the JMS Server. It is advisable to delete the durable subscription from the JMS Server and clean up the resources, if you do not intend to publish this pipeline anymore.

Creating a Reference

The reference defines a read-only source of reference data to enrich a stream. A stream containing a customer name could use a reference containing customer data to add the customer’s address to the stream by doing a lookup using the customer name.

A database reference is a reference to specified table in the database. With cache enabled for database reference, when the values gets pulled from database, they are maintained in coherence cache for reference from where they can be served on next request. A database reference requires a database connection.

A coherence reference can be any external cache defined in coherence cluster that can have data from an external system.

To create a reference:

Navigate to Catalog.
Select Reference in the Create New Item menu.
Provide details for the following fields on the Type Properties page and click Next:
- Name — name of the reference
- Description — description of the reference
- Tags — tags you want to use for the reference
- Reference Type — the reference type of the reference. The supported reference types are: Coherence and Database.
  
  Description of the illustration create_reference_type.png
Provide details for the following fields on the Source Details page and click Next:

When the reference type is Coherence, enter or select appropriate values for:
- Connection — the connection for the coherence reference
  
  Description of the illustration create_reference_source.png
- Cache name — the name of the cache to enable caching for better performance at the cost of higher memory usage of the Spark applications. Caching is supported only for single equality join condition. When you update the cache, the application will receive the updated data very quickly.
Coherence reference has data in key-value pairs. Key is object type and value is Map<String,Object>. Map<String,Object> is map of attribute names and values, attributes list should match with external event type. In this release, only external schema for key and value s supported.

When the reference type is Database Table, enter or select appropriate values for:
- Connection — the connection for the database reference
- Enable Caching — select this option if you want to enable caching
- Expiry Delay — the amount of time from last update that entries will be kept by the cache before being marked as expired. Any attempt to read an expired entry will result in a reloading of the entry from the configured cache store. This field is enabled only when caching is enabled.
Provide details for the following fields on the Shape page and click Save:

When the reference type is Coherence:
- Select Existing Shape — select a shape that you want to use for the reference
  
  Remember:
  Ensure that you do not use any of the CQL reserved words as the column names. If you use the reserved keywords, you cannot deploy the pipeline.
- Manual Shape — select this option if you want to define your own shape
Note:
When you load coherence data, ensure that you include precision and scale for number type. Only when these values are specified, the join works. For example,
```
NamedCache cache  = CacheFactory.getCache("externalcachetimestamp");

        java.math.BigDecimal big10 = new java.math.BigDecimal("10",new
MathContext(58)).setScale(56, RoundingMode.HALF_UP);

        Map<String,Object> order1 = new HashMap<String, Object>();
order1.put("strValue", "Test");
order1.put("intervalValue", "+000000002 03:04:11.330000000");
        order1.put("orderTag", big10);

        cache.put(big10,order1);
```
When the reference type is Database Table:
- Shape Name — select a shape that you want to use for the reference

When the datatype of the table data is not supported, the table columns do not have auto generated datatype. Only the following datatypes are supported:

numeric
interval day to second
text
timestamp (without timezone)
date time (without timezone)

Note:
The date column cannot be mapped to timestamp. This is a limitation in the current release.

A reference is created with the specified details.

Limitations of Coherence as Reference

With coherence as reference, there are a few limitations:

You cannot test the connection
You need to specify the cache name manually
Only equal operator is allowed while establishing a correlation with coherence reference
You must use manual shape

Creating a Dashboard

Dashboard is a visualization tool that helps you look at and analyze the data related to a pipeline based on various metrics like visualizations. A dashboard can have visualizations created out of cubes as well.

Dashboard is an analytics feature. You can create dashboards in Stream Analytics to have a quick view at the metrics.

To create a dashboard:

Go to the Catalog.
Select Dashboard in the Create New Item menu.

The Create Dashboard screen appears.

Description of the illustration create_dashboard.png
Provide suitable details for the following fields:
- Name — enter a name for the dashboard. this is a mandatory field.
- Description — enter a suitable description for the dashboard. This is an optional field.
- Tags — enter or select logical tags to easily identify the dashboard in the catalog. This is an optional field.
Click Next.
Enter a custom stylesheet for the dashboard. This is an optional step.
Click Save.
You can see the dashboard in the Catalog.

After you have created the dashboard, it is just an empty dashboard. You need to start adding visualizations to the dashboard.

Editing a Dashboard

To edit a dashboard:

Click the required dashboard in the catalog.

The dashboard opens in the dashboard editor.

Description of the illustration edit_dashboard.png
Click the Add a new visualization icon to see a list of existing visualizations. Visualizations from the pipelines and as well as from the cube explorations appear here. Go through the list, select one or more visualizations and add them to the dashboard.
Click the Specify refresh interval icon to select the refresh frequency for the dashboard. This is applicable only for cube based visualizations not applicable for streaming charts created out of pipeline.

This just a client side setting and is not persisted with the Superset Version 0.17.0.
Click the Apply CSS to the dashboard icon to select a CSS. You can also edit the CSS in the live editor.

You can also see the active filter applied to the dashboard by clicking the Active dashboard filters icon. You can save the link to the dashboard or email the link to someone using the Copy the link to the clipboard and Email the link icons respectively.
Click the Save icon to save the changes you have made to the dashboard.
Hover over the added visualization, click the Explore chart icon to open the chart editor of the visualization.

Description of the illustration explore_chart.png

You can see the metadata of the visualization. You can also move the chart around the canvas, refresh it, or remove it from the dashboard.

A cube exploration looks like the following:

Description of the illustration cube_exploration.png

The various options like time granularity, group by, table timestamp format, row limit, filters, and result filters add more granularity and details to the dashboard.
Click Save as to make the following changes to the dashboard:
- Overwrite the visualization
- Overwrite the current visualization with a different name
- Add the visualization to an existing dashboard
- Add the visualization to a new dashboard

Creating a Cube

Cube is a data structure that helps in quickly analyzing the data related to a business problem on multiple dimensions.

To create a cube:

Go to the Catalog.
From the Create New Item menu, select Cube.
On the Create Cube — Type Properties screen, provide suitable details for the following fields:
- Name — enter a name for the cube. This is a mandatory field.
  Make sure that the names you use for the underlying sources for the cube like Pipeline Name, Druid Connection, and Kafka Target use names that contain alphanumeric, hyphen, and underscore characters.
- Description — enter a suitable description for the cube. This is an optional field.
- Tags — enter or select logical tags for the cube. This is an optional field.
- Source Type — select the source type from the drop-down list. Currently, Published Pipeline is the only supported type. This is a mandatory field.
Description of the illustration create_cube_type.png
Click Next and provide suitable details for the following fields on the Ingestion Details screen:
- Connection — the connection for the cube. This is a mandatory field.
- Pipelines — select a pipeline to be used as the base for the cube. This is a mandatory field.
- Kafka Target — the Kafka target for the cube. This is a mandatory field.
- Timestamp — select a column from the pipeline to be used as the timestamp. This is a mandatory field.
- Timestamp format — select or set a suitable format for the timestamp using Joda time format. This is a mandatory field. auto is the default value.
- Metrics — select metrics for creating measures
- Dimensions — select dimensions for group by
- High Cardinality Dimensions — high cardinality dimensions such as unique IDs. Hyperlog approximation will be used.
Description of the illustration cube_ingestion.png
Click Next and select the required values for the Metric on the Metric Capabilities screen.

Description of the illustration cube_metric_details.png
Click Next and make any changes, if required, on the Advanced Settings screen.
- Segment granularity — select the granularity with which you want to create segments
- Query granularity — select the minimum granularity to be able to query results and the granularity of the data inside the segment
- Task count — select the maximum number of reading tasks in a replica set. This means that the maximum number of reading tasks is taskCount*replicas and the total number of tasks (reading + publishing) is higher than this. The number of reading tasks is less than taskCount if taskCount > {numKafkaPartitions}.
- Task duration — select the length of time before tasks stop reading and begin publishing their segment. The segments are only pushed to deep storage and loadable by historical nodes when the indexing task completes.
- Maximum rows in memory — enter a number greater than or equal to 0. This number indicates the number of rows to aggregate before persisting. This number is the post-aggregation rows, so it is not equivalent to the number of input events, but the number of aggregated rows that those events result in. This is used to manage the required JVM heap size. Maximum heap memory usage for indexing scales with maxRowsInMemory*(2 + maxPendingPersists).
- Maximum rows per segment — enter a number greater than or equal to 0. This is the number of rows to aggregate into a segment; this number is post-aggregation rows.
- Immediate Persist Period — select the period that determines the rate at which intermediate persists occur. This allows the data cube is ready for query earlier before the indexing task finishes.
- Report Parse Exception — select this option to throw exceptions encountered during parsing and halt ingestion.
- Advanced IO Config — specify name-value pair in a CSV format. Available configurations are replicas, startDelay, period, useEarliestOffset, completionTimeout, and lateMessageRejectionPeriod.
- Advanced Tuning Config — specify name-value pair in CSV format. Available configurations are maxPendingPersists, handoffConditionTimeout, resetOffsetAutomatically, workerThreads, chatThreads, httpTimeout, and shutdownTimeout.
Description of the illustration cube_advanced_settings.png
Click Save to save the changes you have made.

You can see the cube you have created in the catalog.

Exploring a Cube

When you create druid based cube, you can explore data in it.

To explore a cube:

In the Catalog, click the cube that you want to explore.
The Cube Exploration canvas appears.

Construct a query by setting the various parameters.

Visualization Type — the type of visualization to be used for displaying data. The supported visualizations are:

Distribution - Bar Chart

Separator

Sunburst

Pie Chart

World Cloud

Sankey

Time Series - Line Chart

Treemap

Directed force Layout

Time Series - Dual Axis Line Chart

Calendar Heatmap

World Map

Time Series - Bar Chart

Box Plot

Filter Box

Time Series - Percent Change

Bubble Chart

iFrame

Time Series - Stacked

Bullet Chart

Streaming Chart

Table View

Big Number with Trendline

Parallel Coordinates

Markup

Big Number

Heatmap

Pivot Table

Histogram

Horizon

Time — time related form attributes like time granularity, origin (starting point of time), and time range
Group By — parameters to aggregate the query data
Not Grouped By — parameter to query atomic rows
Options
Filters — columns that you can use in filters
Result Filters — columns that you can use in result filters

Description of cube_exploration.png follows
Description of the illustration cube_exploration.png

Click Query to run the query with the defined parameters.
Click Save As to save the cube exploration. You can save it as a visualization, choose to add it to an existing dashboard, not to add to a dashboard, or to a new dashboard.

Creating a Target

The target defines a destination for output data coming from a pipeline.

To create a target:

Navigate to Catalog.
Select Target in the Create New Item menu.
Provide details for the following fields on the Type Properties page and click Save and Next:
- Name — name of the target
- Description — description of the target
- Tags — tags you want to use for the target
- Target Type — the transport type of the target. Supported types are JMS, Kafka and Rest. The target is a sink for the output event. Each type of target is a different sink system and therefore different configuration parameters are required for different types.
  
  Description of the illustration create_target_type.png
Provide details for the following fields on the Target Details page and click Next:

When the target type is JMS:
- Connection — the connection for the target
- Jndi name — the topic or queue name defined in Jndi to be used in the target
  
  Description of the illustration create_target_details.png
- Data Format — select a suitable data format. This is a mandatory field. The supported data format types are: CSV and JSON.
When the target type is Kafka:
- Connection — the connection for the target
- Topic Name — the Kafka topic to be used in the target
- Data Format — select a suitable data format. This is a mandatory field. The supported data format types are: CSV and JSON.
When the target type is REST:
- URL — enter the REST service URL. This is a mandatory field.
- Custom HTTP headers — set the custom headers for HTTP. This is an optional field.
- Batch processing — select this option to send events in batches and not one by one. Enable this option for high throughput pipelines. This is an optional field.
- Data Format — select a suitable data format. This is a mandatory field.
Click Test connection to check if the connection has been established successfully.

Testing REST targets is a heuristic process. It uses proxy settings. The testing process uses GET request to ping the given URL and returns success if the server returns OK (status code 200). The return content is of the type of application/json.
Provide details for the following fields on the Data Format page and click Next:

When the data format type is CSV:
- CSV Predefined Format — select a predefined CSV format. This supported formats are: Excel, InfomixUnload, InfomixUnloadCsv, MySQL, PostgreSQLCsv, PostgreSQLText.
- Create the header row — select this option if you want to create a header row in the target.
When the data format type is JSON:
- Create nested json object — select this option if you want a nested json object to be created for the target
  
  Description of the illustration create_target_dataformat.png
Select one of the mechanisms to define the shape on the Shape page and click Save:
- Select Existing Shape lets you choose one of the existing shapes from the drop-down list.
- Manual Shape populates the existing fields and also allows you to add or remove columns from the shape. You can also update the datatype of the fields.
  
  Description of the illustration create_target_shape.png

A target is created with specified details.

Creating Target from Pipeline Editor

Alternatively, you can also create a target from the pipeline editor. When you click Create in the target stage, you are navigated to the Create Target dialog box. Provide all the required details and complete the target creation process. When you create a target from the pipeline editor, the shape gets pre-populated with the shape from the last stage.

Creating a Geo Fence

Geo fences are further classified into two categories: manual geo fence and database-based geo fence.

Create a Manual Geo Fence

To create a manual geo fence:

Navigate to the Catalog page.
Click Create New Item and select Geo Fence from the drop-down list.

The Create Geo Fence dialog opens.
Enter a suitable name for the Geo Fence.
Select Manually Created Geo Fence as the Type.
Click Save.

The Geo Fence Editor opens. In this editor you can create the geo fence according to your requirement.
Within the Geo Fence Editor, Zoom In or Zoom Out to navigate to the required area using the zoom icons in the toolbar located on the top-left side of the screen.

You can also use the Marquee Zoom tool to zoom a specific area on the map. You can mark an area using the marquee zoom and that area in map is zoomed.
Click the Polygon Tool and mark the area around a region to create a geo fence.

Description of the illustration create_geo_fence.png
Enter a name and description, and click Save to save your changes.

Update a Manual Geo Fence

To update a manual geo fence:

Navigate to the Catalog page.
Click the name of the geo fence you want to update.

The Geo Fence Editor opens. You can edit/update the geo fence here.

Search Within a Manual Geo Fence

You can search the geo fence based on the country and a region or address. The search field allows you search within the available list of countries. When you click the search results tile in the left center of the geo fence and select any result, you are automatically zoomed in to that specific area.

Delete a Manual Geo Fence

To delete a manual geo fence:

Navigate to Catalog page.
Click Actions, then select Delete Item to delete the selected geo fence.

Create a Database-based Geo Fence

To create a database-based geo fence:

Navigate to Catalog page.
Click Create New Item and then select Geo Fence from the drop-down list.

The Create Geo Fence dialog opens.
Enter a suitable name for the geo fence.
Select Geo Fence from Database as the Type.
Click Next and select Connection.
Click Next.

All tables that have the field type as SDO_GEOMETRY appear in the drop-down list.
Select the required table to define the shape.
Click Save.

Note:

You cannot edit/update database-based geo fences.

Delete a Database-based Geo Fence

To delete a database-based geo fence:

Navigate to Catalog page.
Click Actions and then select Delete Item to delete the selected geo fence.

Display the Map Using Tile Layers

Tile layer is the base map that provides immediate geographic context. Tiles are stored in the map tile server. <ph ishcondition="Product_Family=Cloud" varref="streaming">Stream Analytics</ph><ph ishcondition="Product_Family=OnPremise" varref="osa">Oracle Stream Analytics</ph> supports two types of tile layers. Open Street Maps tile layer is a free map. And, Elocation tile layer is an Oracle tile layer. These tile layers contains huge amount of data pertaining to:

Roads, railways, waterways, etc.
Restaurants, shops, stations, ATMs, and more
Walking and cycling paths
Buildings, campuses, etc.

You can choose if you would like to see the map in Elocation tile layer or Open Street Maps tile layer. To set your preference:

Click the user name in the top right corner of the screen.
Click Preferences. The Preferences page opens.
Click Map.
Under Tile Layer, choose Open Street Maps Tile Layer option from the drop-down list.

Description of the illustration tilelayer_1.png
Click Save. The map looks like this:

Description of the illustration tilelayer_2.png
To display the map in Elocation tile layer, follow steps 1 to 3.
From the Tile Layer drop-down list, choose Elocation Tile Layer.
Click Save. The map looks like this:

Description of the illustration tilelayer_4.png

Creating a Predictive Model

To create a predictive model:

In the Create New Item menu, select Predictive Model (Beta).
The Create Predictive Model page opens.
Under Type Properties do the following and then click Next:
1. In the Name field, enter a meaningful name for your PMML model.
2. In the Predictive Model Type drop-down list, select PMML Model.
  
  Note:
  Only PMML Models up to version 4.1 are supported in this release.
Under Predictive Model Details, do the following and click Save:
1. For Predictive Model URL, upload your PMML file.
2. In the Model Version field, enter the version of this artifact. For example, 1.0.
3. (Optional) In the Version Description, enter a meaningful description for your PMML file.
4. In the Algorithm field, accept the default. The algorithm is derived from the PMML file you have uploaded.
5. (Optional) In the Tool drop-down list, select the tool with which you created your PMML file.
  
  Description of the illustration pmml_filledup.png

Your predictive model has been created. It is displayed in the Catalog if you have selected the Predictive Models option under Show Me.

Description of predictive_model.png follows
Description of the illustration predictive_model.png

Limited Support for Predictive Models

The menu commands for creating Predictive Models and Scoring Stages are marked Beta, for example, Predictive Model (Beta). The Beta label indicates that the functionality has been tested, but is not fully supported. The import and scoring of Predictive Models might contain undocumented limitations and you should use them as is.

Creating a Custom Jar

A custom jar is a user-supplied Jar archive containing Java classes for custom stage types or custom functions that will be used within a pipeline.

To create a custom jar:

In the Create New Item menu, select Custom Jar.
The Import a jar for custom stages and functions wizard appears.
On the Type Properties page, enter/select suitable values and click Next:
1. In the Name field, enter a meaningful name for the custom jar you are trying to import into the application.
2. In the Description field, provide a suitable description.
3. In the Tags field, select one or more of existing tags, or enter your own tags.
4. In the Custom Jar Type drop-down list, select Custom Jar.
Description of the illustration create_custom_jar.png
On the Custom Jar Details page, click Upload file, select the jar file that you want to import into the application, and then click Save.
Make sure that the jar file you select for uploading is a valid jar file and includes all the required dependencies.

Creating a Pipeline

A pipeline is a Spark application where you implement your business logic. It can have multiple stages such as a query stage, a pattern stage, a business rule stage, a query group stage, a custom stage and many more.

To create a pipeline:

Navigate to Catalog.
Select Pipeline in the Create New Item menu.
Provide details for the following fields and click Save:
- Name — name of the pipeline
- Description — description of the pipeline
- Tags — tags you want to use for the pipeline
- Stream — the stream you want to use for the pipeline
  
  Description of the illustration create_application.png

A pipeline is created with specified details.

Configuring a Pipeline

You can configure the pipeline to use various stages like query, pattern, rules, query group, scoring, and custom stage from custom jars.

Exporting and Importing a Pipeline and Its Dependent Artifacts

The export and import feature lets you migrate your pipeline and its contents between Stream Analytics systems (such as development and production) in a matter of few clicks. You also have the option to migrate only select artifacts. You can import a pipeline developed with the latest version of Stream Analytics. On re-import, the existing metadata is overwritten with the newly imported metadata if the pipeline is not published. You can delete the imported artifacts by right-clicking them and selecting Delete.

You can export and import pipelines and artifacts except for the following:

Cubes
Dashboards
Custom Stages
Visualizations
File Streams
Predictive Models

In your Stream Analytics instance, under Catalog, right-click the pipeline or artifact that you want to export to another instance of the Stream Analytics and then select Export.

Description of the illustration export_menuoption.png

Your items are exported as a ZIP file.
Go to the Stream Analytics instance to which you want to import the exported metadata.
On the toolbar, click Import.

Description of the illustration import_button.png
In the Import dialog box, click Select file and then select the exported ZIP file.

Description of the illustration import_dialog.png
Click Import.

Description of the illustration importinprogress.png

When the metadata is imported successfully, a message similar to the following appears:

Description of the illustration importresults.png

Publishing a Pipeline

You must publish a pipeline to make the pipeline available for all the users of Stream Analytics and send data to targets.

A published pipeline will continue to run on your Spark cluster after you exit the Pipeline Editor, unlike the draft pipelines which are undeployed to release resources.

To publish a pipeline:

Open a draft pipeline in the Pipeline Editor.
Click Publish.
The Pipeline Settings dialog box opens.
Update any required settings.

Note:
Make sure to allot more memory to executors in the scenarios where you have large windows.
Click Publish to publish the pipeline.
A confirmation message appears when the pipeline is published.

You can also publish a pipeline from the Catalog using the Publish option in the Actions menu.

Using the Topology Viewer

Topology is a graphical representation and illustration of the connected entities and the dependencies between the artifacts.

The topology viewer helps you in identifying the dependencies that a selected entity has on other entities. Understanding the dependencies helps you in being cautious while deleting or undeploying an entity. Stream Analytics supports two contexts for the topology — Immediate Family and Extended Family.

You can launch the Topology viewer in any of the following ways:

Select Show topology from the Catalog Actions menu to launch the Topology Viewer for the selected entity.

Description of the illustration show_topology_catalog_actions_menu.png
Click the Show Topology icon in the Pipeline Editor.

Description of the illustration show_topology_exploration_editor.png

Click the Show Topology icon at the top-right corner of the editor to open the topology viewer.By default, the topology of the entity from which you launch the Topology Viewer is displayed. The context of this topology is Immediate Family, which indicates that only the immediate dependencies and connections between the entity and other entities are shown. You can switch the context of the topology to display the full topology of the entity from which you have launched the Topology Viewer. The topology in an Extended Family context displays all the dependencies and connections in the topology in a hierarchical manner.

Note:

The entity for which the topology is shown has a grey box surrounding it in the Topology Viewer.

Immediate Family

Immediate Family context displays the dependencies between the selected entity and its child or parent.

The following figure illustrates how a topology looks in the Immediate Family.

Description of topology_viewer_immediate.png follows
Description of the illustration topology_viewer_immediate.png

Extended Family

Extended Family context displays the dependencies between the entities in a full context, that is if an entity has a child entity and a parent entity, and the parent entity has other dependencies, all the dependencies are shown in the Full context.

The following figure illustrates how a topology looks in the Extended Family.

Description of topology_viewer_full.png follows
Description of the illustration topology_viewer_full.png

Working with Patterns

This topic applies only to Oracle user-managed services.

Patterns are a stage within a pipeline. When working from a pattern, you need to specify a few key fields to discover an interesting result. You can create pattern stages within the pipeline. Patterns are not stand-alone artifacts, they need to be embedded within a pipeline.

Topics:

About Stream Analytics Patterns

This topic applies only to Oracle user-managed services.

The visual representation of the event stream varies from one pattern type to another based on the key fields you choose.

Click Patterns on the Home page to see all the available patterns. Use the filters at left to view different categories of pattern. You can see full descriptions and learn more about each pattern by clicking the user assistant icon. Click again to hide the extra information.

A pattern provides you with the results displayed in a live output stream based on common business scenarios.

Note:

While entering data in the fields for a specific pattern, ensure that the data you enter corresponds to the datatype of the field. If there is a mismatch between the entered data and the datatype, the pattern will not deploy and throw an error.

You can include or exclude patterns based on their categories using the View All link in the left panel under Show Me. When you click View All, a check mark appears next to it and all the patterns are displayed on the page.

When you want to display/view only a few/selective patterns, deselect View All and select the individual patterns. Only the selected patterns are shown in the catalog.

Description of patterns_showtime.png follows
Description of the illustration patterns_showtime.png

The following table lists the categories of patterns:

Category	Pattern
Enrichment	Reverse Geo Code: Near By Left Outer Join
Outlier	Fluctuation
Inclusion	Union Left Outer Join
Missing Event	'A' Not Followed by 'B' Detect Missing Event
Spatial	Proximity: Stream with Geo Fence Geo Fence Spatial: Speed Interaction: Single Stream Reverse Geo Code: Near By Geo Code Spatial: Point to Polygon Interaction: Two Stream Proximity: Two Stream Direction Reverse Geo Code: Near By Place Proximity: Single Stream Geo Filter
Filter	Eliminate Duplicates Fluctuation
State	'A' Not Followed by 'B' Inverse W Detect Missing Event W 'A' Followed by 'B' ‘B’ Not Preceded by ‘A’
Finance	Inverse W W
Trend	'A' Not Followed by 'B Top N Change Detector Up Trend Detect Missing Event Down Trend 'A' Followed by 'B' Detect Duplicates Bottom N
Shape Detector	Inverse W W
Statistical	Correlation Quantile

Creating a Pipeline Using a Pattern

This topic applies only to Oracle user-managed services.

Instead of creating a pattern stage from within a pipeline, you can also create a pipeline for a pattern directly.

To create a pipeline using a pattern:

Click Patterns in the left tree on the Home page.
The Patterns page appears.
Scroll through the list of available patterns and select the required pattern.
Click Use this pattern within the selected pattern tile.

The Create pipeline using <Pattern> dialog box appears.
Fill in the details for the metadata in the Pipeline section.
Enter details for the Pattern Stage.
Click Save.
The pipeline editor opens where you can specify the parameters required for the pattern. The pipeline also appears in the Catalog.

Understanding Expression Builder Functions

This topic applies only to Oracle user-managed services.

Expression Builder is an editor that allows you to build expressions using various existing functions. The expressions help you in achieving the required results for your pipelines.

Topics:

What are Bessel Functions?

The mathematical cylinder functions for integers are known as Bessel functions.

The following Bessel functions are supported in this release:

Function Name	Description
`BesselI0(x)`	Returns the modified Bessel function of order 0 of the double argument as a double
`BesselI0_exp(x)`	Returns the exponentially scaled modified Bessel function of order 0 of the double argument as a double
`BesselI1(x)`	Returns the modified Bessel function of order 1 of the double argument as a double
`BesselI1_exp(x)`	Returns the exponentially scaled modified Bessel function of order 1 of the double argument as a double
`BesselJ(x,x)`	Returns the Bessel function of the first kind of order n of the argument as a double
`BesselK(x,x)`	Returns the modified Bessel function of the third kind of order n of the argument as a double
`BesselK0_exp(x)`	Returns the exponentially scaled modified Bessel function of the third kind of order 0 of the double argument as a double
`BesselK1_exp(x)`	Returns the exponentially scaled modified Bessel function of the third kind of order 1 of the double argument as a double
`BesselY(x)`	Returns the Bessel function of the second kind of order n of the double argument as a double

What are Conversion Functions?

The conversion functions help in converting values from one data type to other.

The following conversion functions are supported in this release:

Function Name	Description
`bigdecimal(value1)`	Converts the given value to bigdecimal
`boolean(value1)`	Converts the given value to logical
`date(value1,value2)`	Converts the given value to datetime
`double(value1)`	Converts the given value to double
`float(value1)`	Converts the given value to float
`int(value1)`	Converts the given value to integer
`long(value1)`	Converts the given value to long
`string(value1,value2)`	Converts the given value to string

What are Date Functions?

The following date functions are supported in this release:

Function Name	Description
`day(date)`	Returns day of the date
`eventtimestamp()`	Returns event timestamp from stream
`hour(date)`	Returns hour of the date
`minute(date)`	Returns minute of the date
`month(date)`	Returns month of the date
`nanosecond(date)`	Returns nanosecond of the date
`second(date)`	Returns second of the date
`systimestamp()`	Returns the system’s timestamp on which the application is running
`timeformat(value1,value2)`	Returns the provided timestamp in required time format
`timestamp()`	Returns the current output time
`year(date)`	Returns year of the date

What are Geometry Functions?

The Geometry functions allow you to convert the given values into a geometrical shape.

The following interval functions are supported in this release:

Function Name Description

Function Name	Description
`CreatePoint(lat,long,SRID)`	Returns a 2–dimensional point type geometry from the given latitude and longitude. The default SRID is 8307. The return value is of the datatype `sdo geometry`.
`distance(lat1,long1,lat2,long2,SRID)`	Returns distance between the first set of latitude, longitude and the second set of latitude, longitude values. The default SRID is 8307. The return value is of the datatype `double`.

CreatePoint(lat,long,SRID)

Returns a 2–dimensional point type geometry from the given latitude and longitude. The default SRID is 8307.

The return value is of the datatype sdo geometry.

distance(lat1,long1,lat2,long2,SRID)

Returns distance between the first set of latitude, longitude and the second set of latitude, longitude values. The default SRID is 8307.

The return value is of the datatype double.

What are Interval Functions?

The Interval functions help you in calculating time interval from given values.

The following interval functions are supported in this release:

Function Name Description

Function Name	Description
`numtodsinterval(n,interval_unit)`	Converts the given value to an `INTERVAL DAY TO SECOND` literal. The value of the `interval_unit` specifies the unit of `n` and must resolve to one of the string values: DAY, HOUR, MINUTE, or SECOND. The return value is of the datatype `interval`.
`to_dsinterval(string)`	Converts a string in format `DD HH:MM:SS` into a `INTERVAL DAY TO SECOND` data type. The `DD` indicates the number of days between 0 to 99. The `HH:MM:SS` indicates the number of hours, minutes and seconds in the interval from 0:0:0 to 23:59:59.999999. The seconds part can accept upto six decimal places. The return value is of the datatype `interval`.

numtodsinterval(n,interval_unit)

Converts the given value to an INTERVAL DAY TO SECOND literal. The value of the interval_unit specifies the unit of n and must resolve to one of the string values: DAY, HOUR, MINUTE, or SECOND.

The return value is of the datatype interval.

to_dsinterval(string)

Converts a string in format DD HH:MM:SS into a INTERVAL DAY TO SECOND data type. The DD indicates the number of days between 0 to 99. The HH:MM:SS indicates the number of hours, minutes and seconds in the interval from 0:0:0 to 23:59:59.999999. The seconds part can accept upto six decimal places.

The return value is of the datatype interval.

What are Math Functions?

The math functions allow you to perform various mathematical operations and calculations ranging from simple to complex.

The following math functions are supported in this release:

Function Name	Description
`IEEEremainder(value1,value2)`	Computes the remainder operation on two arguments as prescribed by the IEEE 754 standard
`abs(value1)`	Returns the absolute value of a number
`acos(value1)`	Returns arc cosine of a value
`asin(value1)`	Returns arc sine of a value
`atan(value1)`	Returns arc tangent of a value
`atan2(arg1,arg2)`	Returns polar angle of a point (`arg2`, `arg1`)
`binomial(base,power)`	Returns binomial coefficient of the base raised to the specified power
`bitMaskWithBitsSetFromTo(x)`	BitMask with BitsSet (From, To)
`cbrt(value1)`	Returns cubic root of the specified value
`ceil(value1)`	Rounds to ceiling
`copySign(value1,value2)`	Returns the first floating-point argument with the sign of the second floating-point argument
`cos(value1)`	Returns cosine of a value
`cosh(value1)`	Returns cosine hyperbolic of a value
`exp(x)`	Returns exponent of a value
`expm1(x)`	More precise equivalent of `exp(x)`; Returns 1 when x is around zero
`factorial(value1)`	Returns factorial of a natural number
`floor(value1)`	Rounds to floor
`getExponent(value1)`	Returns the unbiased exponent used in the representation of a double
`getSeedAtRowColumn(value1,value2)`	Returns a deterministic seed as an integer from a (seemingly gigantic) matrix of predefined seeds
`hash(value1)`	Returns an integer hashcode for the specified double value
`hypot(value1,value2)`	Returns square root of sum of squares of the two arguments
`leastSignificantBit(value1)`	Returns the least significant 64 bits of this UUID's 128 bit value
`log(value1,value2)`	Calculates the log value of the given argument to the given base, where `value 1` is the value and `value 2` is the base
`log1(value1)`	Returns the natural logarithm of a number
`log10(value1)`	Calculates the log value of the given argument to base 10
`log2(value1)`	Calculates the log value of the given argument to base 2
`logFactorial(value1)`	Returns the natural logarithm (base e) of the factorial of its integer argument as a double
`longFactorial(value1)`	Returns the factorial of its integer argument (in the range k >= 0 && k < 21) as a long
`maximum(value1,value2)`	Returns the maximum of 2 arguments
`minimum(value1,value2)`	Returns the minimum of 2 arguments
`mod(value1,value2)`	Returns modulo of a number
`mosttSignificantBit(value1)`	Returns the most significant 64 bits of this UUID's 128 bit value
`nextAfter(value1,value2)`	Returns the floating-point number adjacent to the first argument in the direction of the second argument
`nextDown(value1)`	Returns the floating-point value adjacent to the input argument in the direction of negative infinity
`nextUp(value1)`	Returns the floating-point value adjacent to the input argument in the direction of positive infinity
`Pow(m,n)`	Returns m raised to the nth power
`rint(value1)`	Returns the double value that is closest in value to the argument and is equal to a mathematical integer
`round(value1)`	Rounds to the nearest integral value
`Scalb(d,scaleFactor)`	Returns d × 2scaleFactor rounded as if performed by a single correctly rounded floating-point multiply to a member of the double value set
`signum(value1)`	Returns signum of an argument as a double value
`sin(value1)`	Returns sine of a value
`sinh(value1)`	Returns sine hyperbolic of a value
`sqrt(value1)`	Returns square root of a value
`stirlingCorrection(value1)`	Returns the correction term of the Stirling approximation of the natural logarithm (base e) of the factorial of the integer argument as a double
`tan(value1)`	Returns tangent of a value
`tanh(value1)`	Returns tangent hyperbolic of a value
`toDegrees(value1)`	Converts the argument value to degrees
`toRadians(value1)`	Returns the measurement of the angle in radians
`ulp(value1)`	Returns the size of an ulp of the argument

What are Null-related Functions?

The following null-related functions are supported in this release:

Function Name	Description
`nvl(value1,value2)`	Replaces null with a value of the same type

What are Statistical Functions?

Statistical functions help you in calculating the statistics of different values.

The following statistical functions are supported in this release:

Function Name	Description
`beta1(value1,value2,value3)`	Returns the area from zero to `value3` under the beta density function
`betaComplemented(value1,value2,value3)`	Returns the area under the right hand tail (from `value3` to infinity) of the beta density function
`binomial2(value1,value2,value3)`	Returns the sum of the terms 0 through `value1` of the Binomial probability density. All arguments must be positive.
`binomialComplemented(value1,value2,value3)`	Returns the sum of the terms `value1+1` through `value2` of the binomial probability density. All arguments must be positive.
`chiSquare(value1,value2)`	Returns the area under the left hand tail (from 0 to `value2`) of the chi square probability density function with `value1` degrees of freedom. The arguments must both be positive.
`chiSquareComplemented(value1,value2)`	Returns the area under the right hand tail (from `value2` to infinity) of the chi square probability density function with `value1` degrees of freedom. The arguments must both be positive.
`errorFunction(value1)`	Returns the error function of the normal distribution
`errorFunctionComplemented(value1)`	Returns the complementary error function of the normal distribution
`gamma(value1,value2,value3)`	Returns the gamma function of the arguments
`gammaComplemented(value1,value2,value3)`	Returns the integral from `value3` to infinity of the gamma probability density function
`incompleteBeta(value1,value2,value3)`	Returns the incomplete beta function evaluated from zero to `value3`
`incompleteGamma(value1,value2)`	Returns the incomplete gamma function
`incompleteGammaComplement(value1,value2)`	Returns the complemented incomplete gamma function
`logGamma(value1)`	Returns the natural logarithm of the gamma function
`negativeBinomial(value1,value2,value3)`	Returns the sum of the terms 0 through `value1` of the negative binomial distribution. All arguments must be positive.
`negativeBinomialComplemented(value1,value2,value3)`	Returns the sum of the terms `value1+1` to infinity of the negative binomial distribution. All arguments must be positive.
`normal(value1,value2,value3)`	Returns the area under the normal (Gaussian) probability density function, integrated from minus infinity to `value1` (assumes mean is zero, variance is one)
`normalInverse(value1)`	Returns the value for which the area under the normal (Gaussian) probability density function is equal to the argument `value1` (assumes mean is zero, variance is one)
`poisson(value1,value2)`	Returns the sum of the first `value1` terms of the Poisson distribution. The arguments must both be positive.
`poissonComplemented(value1,value2)`	Returns the sum of the terms `value1+1` to infinity of the poisson distribution
`studentT(value1,value2)`	Returns the integral from minus infinity to `value2` of the Student-t distribution with `value1` > 0 degrees of freedom
`studentTInverse(value1,value2)`	Returns the value, for which the area under the Student-t probability density function is equal to `1-value1/2`. The function uses the studentT function to determine the return value iteratively.

What are String Functions?

The following String functions are supported in this release:

Function Name	Description
`coalesce(value1,...)`	Returns the first non-null expression in the list. If all expressions evaluate to null, then the COALESCE function will return null
`concat(value1,...)`	Returns concatenation of values converted to strings
`indexof(string,match)`	Returns first index of `\'match\'` in `\'string\'`or 1 if not found
`initcap(value1)`	Returns a specified text expression, with the first letter of each word in uppercase and all other letters in lowercase
`length(value1)`	Returns the length of the specified string
`like(value1,value2)`	Returns a matching pattern
`lower(value1)`	Converts the given string to lower case
`lpad(value1,value2,value3)`	Pads the left side of a string with a specific set of characters (when `string1` is not null)
`ltrim(value1,value2)`	Removes all specified characters from the left hand side of a string
`replace(string,match,replacement)`	Replaces all `\'match\'` with `\'replacement\'` in `\'string\'`
`rpad(value1,value2,value3)`	Pads the right side of a string with a specific set of characters (when `string1` is not null)
`rtrim(value1,value2)`	Removes all specified characters from the right hand side of a string
`substr(string,from)`	Returns substring of a 'string' when indices are between 'from' (inclusive) and up to the end of the string
`substring(string,from,to)`	Returns substring of a \'string\' when indices are between \'from\' (inclusive) and \'to\' (exclusive)
`translate(value1,value2,value3)`	Replaces a sequence of characters in a string with another set of characters. However, it replaces a single character at a time.
`upper(value1)`	Converts given string to uppercase